Programming with Python

Session Overview

  1. What is programming with Python?

  2. Basic math with Python

  3. Types of variables

  4. Importing packages

  5. NumPy

  6. For loops, if statements, and logicals

  7. Creating Functions

  8. Reading in a NetCDF file

What is programming with Python?

Python is an open-source programming language and one of the most widely used languages in the world. Why it’s awesome:

  • Free to use 💸

  • Extensive ecosystem of libraries 📚

  • Huge community of users 👭

  • Tools for reproducible research 🧑‍🔬

Often, code is written in a text editor and then run from a command-line interface. Jupyter Notebooks 📓 allow us to write and run code within a single document, with explanatory text embedded alongside the code.

Visual Studio Code (VSCode) is an easy-to-use development environment with extensions for every major programming language. We will be using VSCode for this workshop.

Basic Mathematical Operations 📝

Operation        Operator   Example    Value
Addition         +          2 + 3      5
Subtraction      -          2 - 3      -1
Multiplication   *          2 * 3      6
Division         /          7 / 3      2.33333
Modulus          %          7 % 3      1
Exponentiation   **         2 ** 0.5   1.41421

We will enter our expressions in code cells. Hit shift + enter or press the “Run” button to execute the code in the cell.

[6]:
23
[6]:
23
[7]:
-15 + 23.42
[7]:
8.420000000000002
[8]:
8 ** 3
[8]:
512

Python uses typical order of operations - PEMDAS ✏️

[9]:
(2 + 3 + 4) / 3
[9]:
3.0
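
As a quick illustration of the order of operations, the sketch below evaluates similar expressions with and without parentheses. The variable names are made up for this example.

```python
# Multiplication happens before addition unless parentheses say otherwise
without_parens = 2 + 3 * 4      # 3 * 4 is evaluated first
with_parens = (2 + 3) * 4       # the parentheses are evaluated first

# Exponentiation happens before multiplication
exponent_first = 2 * 3 ** 2     # 3 ** 2 is evaluated first

print(without_parens, with_parens, exponent_first)
```

When in doubt, adding parentheses makes the intended order explicit.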

Variables

A variable is a place to store a value or object, so it can be referred to later in our code. To define a variable, we use an assignment statement.

For example, in the assignment statement zebra = 23 - 14, zebra is bound to 9 (the value), not 23 - 14 (the expression).

Example

Before we assign it a value, a variable is undefined.

[10]:
temp_in_c
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[10], line 1
----> 1 temp_in_c

NameError: name 'temp_in_c' is not defined
[ ]:
temp_in_c = 5
temp_in_c
5
[ ]:
temp_in_f = temp_in_c * 9/5 + 32
temp_in_f
41.0

Any time we use temp_in_f in an expression, 41.0 is substituted for it.

[ ]:
temp_in_f * -4
-164.0

The above expression does not change the value of temp_in_f, because we did not assign the result back to temp_in_f.

[ ]:
temp_in_f
41.0
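
Because a variable is bound to a value rather than to the expression that produced it, changing temp_in_c later does not update temp_in_f. A minimal sketch of this behavior (old_temp_in_f is a name introduced only for this example):

```python
temp_in_c = 5
temp_in_f = temp_in_c * 9/5 + 32    # temp_in_f is bound to the value 41.0

temp_in_c = 30                      # reassign temp_in_c
old_temp_in_f = temp_in_f           # still 41.0; nothing was recomputed
print(old_temp_in_f)

temp_in_f = temp_in_c * 9/5 + 32    # run the assignment again to update it
print(temp_in_f)
```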

Naming variables

Give your variables helpful names so that you and your collaborators know what they refer to.

  • Variable names can contain uppercase and lowercase letters, digits, and underscores

    • they cannot start with a number

    • they are case sensitive!

    • no character limit!

Examples of valid but poor variable names:

[ ]:
six = 15
[ ]:
hours = 60 * 60 * 24 * 365

Examples of assignment statements that are valid and use good variable names:

[ ]:
seconds_per_hour = 60 * 60
hours_per_year = 24 * 365
seconds_per_year = seconds_per_hour * hours_per_year

Variable Types

What’s the difference?

[ ]:
4 / 2
2.0
[ ]:
5 - 3
2

To us, 2.0 and 2 are the same number. But to Python, they are values of two different types.

Two numeric variable types: int and float

  • int: an integer of any size

  • float: a number with a decimal point

Integers int:

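  • If you add, subtract, or multiply ints, the result is another int. The same holds for exponentiation when the exponent is a non-negative int (a negative exponent gives a float)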

  • int precision is exact (i.e., 5 is exactly 5)

Use type() to check the data type of a value

[ ]:
type(2 ** 300)
int

Floats float:

  • Specified using a decimal point

  • Might be printed using scientific notation

[ ]:
3.2 + 2.5
5.7
[ ]:
type(5.7)
float
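
Unlike ints, floats are stored with limited precision, which is why -15 + 23.42 printed as 8.420000000000002 earlier. The sketch below shows the same effect with smaller numbers and one common way to compare floats, using math.isclose from the standard library. The variable names are made up for this example.

```python
import math

total = 0.1 + 0.2
print(total)                        # close to, but not exactly, 0.3
print(total == 0.3)                 # exact comparison fails
print(math.isclose(total, 0.3))     # comparison with a small tolerance succeeds

big_int = 2 ** 300                  # ints stay exact at any size
big_float = 2.0 ** 300              # floats are approximate
print(type(big_int), type(big_float))
print(big_float)                    # printed using scientific notation
```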

Strings str 🧶

A string is a snippet of text.

  • Enclosed by either single quotes (') or double quotes (")

  • Can be any length

[ ]:
"My string"
'My string'
[ ]:
type("My string")
str

Note: Python automatically determines types

String arithmetic

When using the + symbol between strings, the operation is called concatenation

[ ]:
s1 = 'send'
s2 = 'waves'
[ ]:
s1 + s2
'sendwaves'
[ ]:
s1 + ' ' + s2
'send waves'

String functions

You can use special functions on strings

Examples: upper, title, replace, but there are many more

[ ]:
s1.upper()
'SEND'
[ ]:
s1.title()
'Send'
[ ]:
s1.replace('s', 'b')
'bend'

We can look at the length of our string

[ ]:
len(s1)
4

Converting between data types

  • Mixing ints and floats in an expression results in a float

  • A value can be converted to an int, float, or str

  • Some strings can be converted to int and float

[ ]:
int(2.0 + 3)
5
[ ]:
str(3)
'3'
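
The sketch below tries a few more conversions, including strings that can and cannot be converted to numbers. The variable names are made up for this example, and try/except is used only so the failed conversion does not stop the cell.

```python
whole = int("42")          # the string "42" becomes the int 42
decimal = float("3.5")     # the string "3.5" becomes the float 3.5
truncated = int(3.9)       # int() drops the fractional part; it does not round
text = str(3) + "3"        # two strings are concatenated, not added

print(whole + 1, decimal * 2, truncated, text)

# Not every string can be converted to a number
try:
    int("three")
except ValueError as err:
    print("Could not convert:", err)
```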

A note on built-in Python Functions

Functions in Python work much like mathematical functions: they take inputs and produce a result

  • Input values to functions are called arguments

  • Calling a function asks the function to execute code on the given arguments

Python comes with a number of built-in functions such as int, float, str, type, and len (which we have already used)

  • Type ? after a function’s name to see its documentation, or use the help function

[ ]:
str?
Init signature: str(self, /, *args, **kwargs)
Docstring:
str(object='') -> str
str(bytes_or_buffer[, encoding[, errors]]) -> str

Create a new string object from the given object. If encoding or
errors is specified, then the object must expose a data buffer
that will be decoded using the given encoding and error handler.
Otherwise, returns the result of object.__str__() (if defined)
or repr(object).
encoding defaults to 'utf-8'.
errors defaults to 'strict'.
Type:           type
Subclasses:     StrEnum, DeferredConfigString, _rstr, LSString, include, Keys, InputMode, ColorDepth, CompleteStyle, FoldedCase, ...
[ ]:
help(len)
Help on built-in function len in module builtins:

len(obj, /)
    Return the number of items in a container.

Booleans

Conditional statements check whether a statement is True or False

The result can be stored in a variable called a boolean (bool for short).

Booleans can only be True or False

Comparison Operators

Symbol   Meaning
==       equal to
!=       not equal to
<        less than
<=       less than or equal to
>        greater than
>=       greater than or equal to

[ ]:
a = (5 == 6)
a
False
[ ]:
type(a)
bool
[ ]:
b = 9 + 10 < 21
b
True

Lists

A list is used to store multiple values. To create a new list, use [square brackets]

Lists are a sequence of any type of object

[ ]:
temp_list = [38, 33, 40, 34, 26, 23, 34]
[ ]:
type(temp_list)
list

To find the average temperature, we can divide the sum of the temperatures by the number of temperatures recorded, using the built-in functions sum and len:

[ ]:
sum(temp_list) / len(temp_list)
32.57142857142857

A single list can store elements of different types

[ ]:
mixed_temp = [68, 'sixty', 68.9, 62]
mixed_temp
[68, 'sixty', 68.9, 62]

A note on Lists

  • Lists are slow for data processing

  • To work with datasets, we want to use arrays instead

  • To gain this additional functionality, we need to import a library

Importing Packages

Python doesn’t have everything we need built in. Without reinventing the wheel, we can utilize tools already developed

  • We import packages (AKA libraries) through import statements

  • Syntax for calling functions: package_name.function()

[ ]:
import numpy as np # numpy is usually imported as np
from numpy import ones_like
As seen above, instead of importing a complete library, we can import only the function we need by using:
from package import function
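
The three import styles can be compared side by side. This sketch uses the standard-library math package (chosen here only because it needs no installation); the alias m is an arbitrary choice for illustration.

```python
import math                  # import the whole package
import math as m             # import the package under a shorter alias
from math import sqrt        # import a single function from the package

# All three lines call the same function
print(math.sqrt(16))
print(m.sqrt(16))
print(sqrt(16))
```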

Useful Packages:

Package      Purpose
numpy        Numerical operations
matplotlib   Plotting and visualization
netCDF4      Using netCDF files
pandas       Data analysis and manipulation
xarray       Labeled multi-dimensional arrays

Packages have their own associated documentation.

NumPy

NumPy provides support for arrays and matrix operations and is one of the most heavily used numerical libraries in Python.

Arrays

Arrays are collections of values (similar to lists), but are optimized for numerical computations

  • Store elements of a single, uniform data type

The simplest way to create an array is to pass a list of numbers as the input to np.array()

[ ]:
my_list = [0., 1., 2., 3., 4., 5.]
my_array = np.array(my_list)
my_array
array([0., 1., 2., 3., 4., 5.])

NumPy arrays have the type numpy.ndarray

[ ]:
type(my_array)
numpy.ndarray
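
Because an array stores a single data type, NumPy converts the elements to a common type when the input list is mixed. A small sketch, with array names made up for this example:

```python
import numpy as np

int_array = np.array([1, 2, 3])        # all ints
mixed_array = np.array([1, 2.5, 3])    # ints and a float together

print(int_array.dtype)      # an integer type (the exact name depends on the platform)
print(mixed_array.dtype)    # every element was converted to float
print(mixed_array)
```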

Positions

Each element of an array has a position

Python is “0-indexed”

  • This means that the position of the first element in an array is 0, not 1.

[ ]:
my_array[0]
np.float64(0.0)

A negative index counts backward from the end (i.e., my_array[-1] is the last element in the array)

[ ]:
my_array[-1]
np.float64(5.0)

Array-number arithmetic

Arrays make it easy to apply the same operation to every element. This is known as broadcasting.

[ ]:
# Increase all temperatures by 3 degrees
my_array + 3
array([3., 4., 5., 6., 7., 8.])
[ ]:
# halve all temperatures
my_array / 2
array([0. , 0.5, 1. , 1.5, 2. , 2.5])

Is my_array changed?

[ ]:
my_array # no!
array([0., 1., 2., 3., 4., 5.])
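
To keep the result of an array operation, assign it to a variable. The sketch below also contrasts this with a plain list, where the same operator means something different. The names temps_c, temps_list, and warmer are made up for this example.

```python
import numpy as np

temps_c = np.array([0., 1., 2., 3., 4., 5.])
temps_list = [0., 1., 2., 3., 4., 5.]

warmer = temps_c + 3        # adds 3 to every element and returns a new array
print(warmer)
print(temps_c)              # the original array is unchanged

temps_c = temps_c + 3       # reassign the name to keep the result
print(temps_c)

# The same operator behaves differently on a plain list
repeated = temps_list * 2   # repeats the list instead of doubling each value
print(repeated)
```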

Slicing Arrays

When working with NumPy arrays, slicing allows us to extract a portion of the data using:

  • array[start:stop] -> starts at start index up to but not including the stop index

  • array[:stop] -> starts at the beginning (index 0) and goes up to stop - 1

  • array[start:] -> starts at start and goes all the way to the end

  • array[:] -> gives you the entire array

[ ]:
pizza = np.arange(0, 8) # (can also use np.linspace(0, 7, 8))
print(pizza)
print("Number of pizza slices:", len(pizza))
[0 1 2 3 4 5 6 7]
Number of pizza slices: 8
[ ]:
# I want to eat slices 1-4
print(pizza[1:5])
[1 2 3 4]
[ ]:
# I want to eat up to slice 2
print(pizza[:3])
[0 1 2]
[ ]:
# I want to eat all slices except 0-3
print(pizza[4:])
[4 5 6 7]
[ ]:
# I want the whole pizza
print(pizza[:])
[0 1 2 3 4 5 6 7]
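
The same start:stop notation also works on plain Python lists and strings, and it can be combined with negative positions. A short sketch with made-up example values:

```python
slices = [0, 1, 2, 3, 4, 5, 6, 7]
word = "seawater"

middle = slices[1:5]      # positions 1, 2, 3, 4
last_two = slices[-2:]    # from the second-to-last element to the end
front = word[:3]          # the first three characters
back = word[3:]           # everything from position 3 onward

print(middle, last_two, front, back)
```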

Array Methods

NumPy comes with many functions that can be used with arrays

[ ]:
print("The max of my array is:", np.max(my_array))
The max of my array is: 5.0

We can call NumPy functions on an array with np.function(array) or array.function().

[ ]:
print("The minimum of my array is:", my_array.min())
print("The mean of my array is:", my_array.mean())
The minimum of my array is: 0.0
The mean of my array is: 2.5

np.append()

We can add to the end of our array with np.append.

[ ]:
np.append(pizza, 8.)
array([0., 1., 2., 3., 4., 5., 6., 7., 8.])
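
Note that np.append returns a new array and leaves the original unchanged, just like the arithmetic above. A minimal sketch (bigger_pizza is a name made up for this example):

```python
import numpy as np

pizza = np.arange(0, 8)

bigger_pizza = np.append(pizza, 8)   # a new array with one extra element
print(len(pizza))                    # the original still has 8 elements
print(len(bigger_pizza))             # the new array has 9

pizza = np.append(pizza, 8)          # reassign to keep the longer array
print(pizza)
```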

Activity

Suppose a coastal town is experiencing a storm. On Day 1, it rains 1mm. Each day after that, rainfall total increases by 1mm. If this continues for 30 days, how much total rain falls in that month in centimeters? Save this value as rain_total.

Hint: Use np.arange and .sum()

[ ]:
rain_total = np.arange(1,31).sum() / 10 # in cm
rain_total
np.float64(46.5)

If statements and For loops

What is an If statement?

An if statement lets you make decisions in your code

  • Checks whether a condition is True or False

  • If True, a block of code is run

An if statement evaluates a conditional statement and runs the indented block only when the result is True. The syntax is:

if (conditional statement):
    code to execute
[ ]:
temp_today = 22

if (temp_today > 20):
    print("It's warm today!")
It's warm today!

Adding else and elif

You can also use elif, short for else if, to check multiple conditions

An else statement catches everything else

[ ]:
temp_today = 19

if temp_today > 20:
    print("It's hot!")
elif temp_today > 18:
    print("It's warm")
else:
    print("it's chilly")
It's warm
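
Conditions are checked from top to bottom, and only the first branch whose condition is True is run, so the order of the checks matters. The sketch below deliberately puts the checks in the wrong order to show the pitfall; the variable label is made up for this example.

```python
temp_today = 25

if temp_today > 18:
    label = "warm"
elif temp_today > 20:
    label = "hot"       # never reached: any value above 20 is also above 18
else:
    label = "chilly"

print(label)            # 25 degrees is labeled "warm", not "hot"
```

Checking the most restrictive condition first, as in the cell above, avoids this.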

for loops

A for loop is used to repeat a block of code a certain number of times (perhaps for iterating over elements in a list, array, or range of numbers). range is commonly used in for loops, and is similar to np.arange().

[ ]:
for i in range(0, 10):  # i is the loop variable, which will range between 0 and 9
    print(i)
0
1
2
3
4
5
6
7
8
9
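
Like np.arange, range accepts a start, a stop (not included), and an optional step. Wrapping it in list() is one way to see the values it produces. The variable names are made up for this example.

```python
first_five = list(range(5))           # start defaults to 0
evens = list(range(2, 10, 2))         # start at 2, stop before 10, step by 2
countdown = list(range(10, 0, -3))    # a negative step counts down

print(first_five)
print(evens)
print(countdown)
```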

We can also iterate over lists/arrays

[ ]:
colors = ["Red", "Orange", "Yellow", "Green", "Blue", "Purple"]
for c in colors:
    print(c)
Red
Orange
Yellow
Green
Blue
Purple

if statements in for loops:

for loops and if statements can be combined!

[ ]:
temps = np.arange(15.,25.,2)

for t in temps:
    if t > 20:
        print("It's hot", t)
    else:
        print("It's cold", t)
It's cold 15.0
It's cold 17.0
It's cold 19.0
It's hot 21.0
It's hot 23.0

Creating counters

A counter is a variable you use to keep track of how many times something happens. The general format is:

  • Start the counter at 0

  • Add 1 every time something meets a condition

  • Short hand notation for updating variables with add/subtract/multiply is: +=, -=, *=, etc.

Let’s try to find how many days are warm enough to swim without a wetsuit

[ ]:
count_warm = 0

for t in temps:
    if t > 20:
        count_warm += 1

print("Number of warm days: ", count_warm)
Number of warm days:  2
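
The same pattern works for running totals, not only counts. The sketch below, which uses a plain list with the same values as temps, keeps a counter and a running sum in one loop; total_temp and mean_temp are names made up for this example.

```python
temps = [15.0, 17.0, 19.0, 21.0, 23.0]

count_warm = 0
total_temp = 0.0

for t in temps:
    total_temp += t          # same as total_temp = total_temp + t
    if t > 20:
        count_warm += 1      # same as count_warm = count_warm + 1

mean_temp = total_temp / len(temps)
print("Number of warm days:", count_warm)
print("Mean temperature:", mean_temp)
```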

Activity

Scenario: You are tasked to monitor local buoys. Each buoy records significant wave height in meters. Your task is to:

  • Print out the height of each wave

  • Use an if statement to flag dangerous waves (greater than 2.5 meters)

  • Count how many waves are dangerous

[ ]:

wave_heights = [1.2, 2.7, 3.1, 0.9, 2.0, 2.6, 1.8, 3.5, 2.3, 2.3, 3.1, 0.8, 1.9, 2.5, 1.7]
# 1. Loop through each wave height
# 2. Print the wave height
# 3. If the height is > 2.5, print a warning
# 4. Count how many are dangerous

Solution

[ ]:
# Solution

# Wave heights recorded by different buoys (in meters)
wave_heights = [1.2, 2.7, 3.1, 0.9, 2.0, 2.6, 1.8, 3.5, 2.3]

# Initialize counter for dangerous waves
danger_count = 0

# Loop through each wave height
for wave in wave_heights:
    if wave > 2.5:
        print(f"Wave height: {wave} m —  Danger!")
        danger_count += 1  # Increment counter
    else:
        print(f"Wave height: {wave} m")

# Print total number of dangerous waves
print(f"\nTotal dangerous waves: {danger_count}")
Wave height: 1.2 m
Wave height: 2.7 m —  Danger!
Wave height: 3.1 m —  Danger!
Wave height: 0.9 m
Wave height: 2.0 m
Wave height: 2.6 m —  Danger!
Wave height: 1.8 m
Wave height: 3.5 m —  Danger!
Wave height: 2.3 m

Total dangerous waves: 4

Creating Functions

Up until this point, we have used existing functions to learn Python

We can also define our own functions

Basic Syntax

def function_name(argument):
    # comment
    result = 1 + argument
    return result
  • def = tells Python we are defining a function

  • function_name = name of the function (you choose!)

  • argument = input(s) to the function

  • comment = text explaining the code

  • return = sends back a result

  • variables defined inside a function only exist inside the function; to keep a value, return it and store the result
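
The last point can be seen in a short sketch. The function add_one is a made-up example that follows the basic syntax above, and try/except is used only so the error does not stop the cell.

```python
def add_one(argument):
    # result is created inside the function
    result = 1 + argument
    return result

answer = add_one(4)     # store the returned value under a new name
print(answer)

# result itself is not available outside the function
try:
    print(result)
except NameError as err:
    print("NameError:", err)
```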

[ ]:
def greeting():
    print("Hello!")
    return

greeting()
Hello!
[ ]:
# Convert Celsius to Fahrenheit
def c_to_f(temp_c):
    temp_f = (temp_c * 9/5) + 32
    return temp_f
[ ]:
c_to_f(20)
68.0
[ ]:
# add takes 2 arguments
def add(a,b):
    """ adds two numbers"""
    return a+b
add(2,2)
4
[ ]:
def pythagorean(a, b):
    '''Computes the hypotenuse length of a right triangle with legs a and b.'''

    c = (a ** 2 + b ** 2) ** 0.5

    return c
[ ]:
pythagorean(3,4)
5.0

Reading In Data: Scripps Pier Temperature

The data for this exercise is taken from the Scripps Pier. The data is stored in netCDF format, so we will import some tools from the netCDF4 Python package. In case you aren’t familiar, NetCDF is a file format that stores data together with its metadata. This allows us to have a temperature time series with an associated time, depth, lat, and lon.

[1]:
from netCDF4 import Dataset

# open the dataset in read mode, we will not be editing it
ds = Dataset("python_programming/scripps_pier-2023.nc", mode='r')

print(ds)
<class 'netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    dimensions(sizes): time(125305), maxStrlen64(64)
    variables(dimensions): int64 time(time), float32 temperature(time), float32 conductivity(time), float32 pressure(time), float32 salinity(time), float32 chlorophyll_raw(time), float32 chlorophyll(time), int8 temperature_flagPrimary(time), int8 temperature_flagSecondary(time), int8 conductivity_flagPrimary(time), int8 conductivity_flagSecondary(time), int8 pressure_flagPrimary(time), int8 pressure_flagSecondary(time), int8 salinity_flagPrimary(time), int8 salinity_flagSecondary(time), int8 chlorophyll_flagPrimary(time), int8 chlorophyll_flagSecondary(time), float32 sigmat(time), float32 diagnosticVoltage(time), float32 currentDraw(time), float32 aux1(time), float32 aux3(time), float32 aux4(time), |S1 instrument1(maxStrlen64), |S1 instrument2(maxStrlen64), |S1 platform1(maxStrlen64), |S1 station(maxStrlen64), float32 lat(), float32 lon(), float32 depth(), float64 crs()
    groups:
[ ]:
# Print the available variable names with keys():
print(ds.variables.keys())
dict_keys(['time', 'temperature', 'conductivity', 'pressure', 'salinity', 'chlorophyll_raw', 'chlorophyll', 'temperature_flagPrimary', 'temperature_flagSecondary', 'conductivity_flagPrimary', 'conductivity_flagSecondary', 'pressure_flagPrimary', 'pressure_flagSecondary', 'salinity_flagPrimary', 'salinity_flagSecondary', 'chlorophyll_flagPrimary', 'chlorophyll_flagSecondary', 'sigmat', 'diagnosticVoltage', 'currentDraw', 'aux1', 'aux3', 'aux4', 'instrument1', 'instrument2', 'platform1', 'station', 'lat', 'lon', 'depth', 'crs'])

We can extract the temperature data using its name, temperature, and the [:] operator to get all of the values.

[ ]:
# Extract data
temp_nc = ds.variables['temperature'][:]
temp_nc
masked_array(data=[15.1105, 15.1084, 15.0969, ..., 16.6199, 16.6152,
                   16.6175],
             mask=False,
       fill_value=np.float64(1e+20),
            dtype=float32)

We can see that this has some extra metadata in it, so let’s convert it to a regular NumPy array.

[ ]:
temp = np.array(temp_nc)
temp
array([15.1105, 15.1084, 15.0969, ..., 16.6199, 16.6152, 16.6175],
      shape=(125305,), dtype=float32)
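
The conversion is straightforward here because the output above shows mask=False, meaning no values are masked. As a hedged sketch of what would happen otherwise, the example below builds a small masked array by hand (the readings are invented, not taken from the pier data) and compares np.array with the masked array's filled method.

```python
import numpy as np

# Hypothetical readings; the second one is flagged as missing
readings = np.ma.masked_array([15.1, 99.9, 15.3], mask=[False, True, False])

masked_mean = readings.mean()       # the masked value is ignored
print(masked_mean)

plain = np.array(readings)          # the mask is dropped, so 99.9 comes back
print(plain)

filled = readings.filled(np.nan)    # replace masked entries with NaN instead
print(filled)
print(np.nanmean(filled))           # NaN-aware mean skips the missing value
```

If a dataset does contain masked values, filling them with NaN before converting is one way to avoid treating placeholder numbers as real measurements.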

Convert all temperatures to Fahrenheit and assign the result to a new variable, temp_far

  • Hint: $^\circ\mathrm{F} = \left(\frac{9}{5} \times {}^\circ\mathrm{C}\right) + 32$

[ ]:
# convert all temperatures to Fahrenheit
temp_far = (9/5) * temp + 32
#or
temp_far = c_to_f(temp)
temp_far
array([59.198902, 59.19512 , 59.17442 , ..., 61.91582 , 61.90736 ,
       61.9115  ], shape=(125305,), dtype=float32)
[ ]:
time_nc = ds.variables['time']
time_nc
<class 'netCDF4.Variable'>
int64 time(time)
    units: minutes since 2023-01-01 00:01:00
    calendar: proleptic_gregorian
unlimited dimensions: time
current shape = (125305,)
filling on, default _FillValue of -9223372036854775806 used

The metadata can be useful! Here it tells us that our units of time are minutes since Jan 1, 2023 at 12:01 AM. Let’s extract the time data, the same way we did for temperature.

[ ]:
# create an array
time = np.array(time_nc[:])
time
array([     0,      4,      8, ..., 504299, 504303, 504307],
      shape=(125305,))
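
Using the units from the metadata, a time value can be turned into a calendar date with the standard-library datetime module. This is a minimal sketch, assuming the reference time 2023-01-01 00:01:00 shown above; the list minutes holds a few values copied from the time array output, and reference and dates are names made up for this example.

```python
from datetime import datetime, timedelta

# Reference time from the units: "minutes since 2023-01-01 00:01:00"
reference = datetime(2023, 1, 1, 0, 1)

# A few values copied from the time array (first three and last)
minutes = [0, 4, 8, 504307]

dates = []
for m in minutes:
    # When looping over the NumPy array itself, converting each value
    # with int() first is a safe choice before passing it to timedelta
    dates.append(reference + timedelta(minutes=int(m)))

for m, d in zip(minutes, dates):
    print(m, "->", d)
```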

What is the time between data points? (How frequently do we get our temperature measurements?)

[ ]:
dt = time[1] - time[0]

print('Temperature is measured every', dt, 'minutes')
Temperature is measured every 4 minutes

How many data points are there per day?

[ ]:
data_per_hr = 60/dt
data_per_day = data_per_hr * 24
data_per_day
np.float64(360.0)

Print the first hour’s worth of data

[ ]:
temp[:data_per_hr]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[157], line 1
----> 1 temp[:data_per_hr]

TypeError: slice indices must be integers or None or have an __index__ method

Whoops! We can’t index with a float. We need to convert to an int.

[ ]:
temp[:int(data_per_hr)]
array([15.1105, 15.1084, 15.0969, 15.0874, 15.0832, 15.0824, 15.0818,
       15.0799, 15.0732, 15.0711, 15.0799, 15.0741, 15.0745, 15.0684,
       15.095 ], dtype=float32)
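
Another option is floor division (//), which keeps the result an integer when both inputs are integers, so no conversion is needed afterward. A small sketch with plain Python values (values is a stand-in list, not the temperature data):

```python
dt = 4                               # minutes between measurements

per_hr_float = 60 / dt               # true division always gives a float
per_hr_int = 60 // dt                # floor division of two ints gives an int
print(per_hr_float, per_hr_int)

values = list(range(100))            # stand-in for the temperature array
first_hour = values[:per_hr_int]     # an int works as a slice index
print(len(first_hour))
```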

Compute the daily mean temperature for each day

Steps:

  1. Define a function that takes in a 1D array of temperatures and returns a list of daily means

  2. use a for loop to slice the array into chunks of 360

  3. Compute and store the mean for each day

[ ]:
## Starter Code

def compute_daily_means(temp_array):
    daily_means = []  # Start with an empty list

    # Calculate how many full days are in the dataset
    num_days = ...  # Hint: use len(temp_array) and floor division // so the result is an int

    # Loop through each day
    for i in range(num_days):
        # HINT: slice the array to get one day's worth of temperatures
        start = i * ...
        end = start + ...
        day_temps = temp_array[start:end]

        # Compute the daily mean
        mean = ...

        # Append the result to daily_means
        daily_means = ...

    return daily_means

# Run your function on the data and print results
daily_avgs = compute_daily_means(temp)

daily_avgs

Potential solution:

[ ]:
def compute_daily_means(temp_array):
    daily_means = []  # empty list
    num_days = len(temp_array) // 360  # number of full days (360 samples per day)

    for i in range(num_days):
        start = i * 360
        end = start + 360
        day_temps = temp_array[start:end]
        daily_mean = np.mean(day_temps)
        daily_means.append(daily_mean)

    return daily_means

# Run the function
daily_means = compute_daily_means(temp)
daily_means[:10]


[np.float32(14.855899),
 np.float32(14.873767),
 np.float32(14.834493),
 np.float32(14.812212),
 np.float32(14.873991),
 np.float32(14.855767),
 np.float32(14.821135),
 np.float32(14.878883),
 np.float32(14.8914795),
 np.float32(14.778408)]
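
As an alternative to the loop, the array can be reshaped so that each row holds one day, and the mean can then be taken along each row. This is a sketch on synthetic data rather than the pier dataset; fake_temp and samples_per_day are names made up for this example, and the approach assumes evenly spaced samples with no gaps.

```python
import numpy as np

samples_per_day = 360

# Synthetic stand-in for the temperature array:
# three full days plus 25 leftover samples
fake_temp = np.concatenate([
    np.full(samples_per_day, 14.0),
    np.full(samples_per_day, 15.0),
    np.full(samples_per_day, 16.0),
    np.full(25, 99.0),               # partial day, dropped below
])

num_days = len(fake_temp) // samples_per_day           # number of full days
full_days = fake_temp[:num_days * samples_per_day]     # drop the leftover samples

# One row per day, then average across each row
daily_means = full_days.reshape(num_days, samples_per_day).mean(axis=1)
print(daily_means)
```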

Acknowledgements

Some of the material in this lesson is derived from the Software Carpentry Lessons for Python Programming and Plotting https://swcarpentry.github.io/python-novice-inflammation/reference/ and HDSI at UC San Diego https://datascience.ucsd.edu/