Programming with Python¶
Session Overview¶
What is programming with Python?
Basic math with Python
Types of variables
Importing packages
Numpy
For loops, if statements, and logicals
Creating Functions
Reading in a NetCDF file
What is programming with Python?¶
Python is an open-source programming language and one of the most widely used languages in the world. Why it’s awesome:
Free to use 💸
Extensive ecosystem of libraries 📚
Huge community of users 👭
Tools for reproducible research 🧑‍🔬
Often, code is written in a text editor, then run in a command-line interface. Jupyter Notebooks 📓 allow us to write and run code within a single document, and to mix text with code.
Visual Studio Code (VSCode) is an easy-to-use development environment with extensions for every major programming language. We will be using VSCode for this workshop.
Basic Mathematical Operations 📝¶
| Operation | Operator | Example | Value |
|---|---|---|---|
| Addition | + | 2 + 3 | 5 |
| Subtraction | - | 2 - 3 | -1 |
| Multiplication | * | 2 * 3 | 6 |
| Division | / | 7 / 2 | 3.5 |
| Modulus | % | 7 % 2 | 1 |
| Exponentiation | ** | 2 ** 3 | 8 |
We will enter our expressions in code cells. Press Shift + Enter or click the “Run” button to execute the code in the cell.
[6]:
23
[6]:
23
[7]:
-15 + 23.42
[7]:
8.420000000000002
[8]:
8 ** 3
[8]:
512
Python uses typical order of operations - PEMDAS ✏️
[9]:
(2 + 3 + 4) / 3
[9]:
3.0
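As a small sketch of how the order of operations plays out (the numbers are chosen only for illustration), compare the same expression with and without parentheses:

```python
# Multiplication is evaluated before addition
no_parens = 2 + 3 * 4

# Parentheses force the addition to happen first
with_parens = (2 + 3) * 4

# Exponentiation is evaluated before the leading minus sign is applied
negative_square = -2 ** 2

print(no_parens, with_parens, negative_square)
```

When in doubt, adding parentheses makes the intended order explicit.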
Variables¶
A variable is a place to store a value or object, so it can be referred to later in our code. To define a variable, we use an assignment statement, for example zebra = 23 - 14.
In this example, zebra is bound to 9 (the value), not 23 - 14 (the expression).
Example¶
Before we assign it a value, a variable is undefined.
[10]:
temp_in_c
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[10], line 1
----> 1 temp_in_c
NameError: name 'temp_in_c' is not defined
[ ]:
temp_in_c = 5
temp_in_c
5
[ ]:
temp_in_f = temp_in_c * 9/5 + 32
temp_in_f
41.0
Any time we use temp_in_f in an expression, 41.0 is substituted for it.
[ ]:
temp_in_f * -4
-164.0
The above expression does not change the value of temp_in_f, because we did not reassign temp_in_f
[ ]:
temp_in_f
41.0
Naming variables¶
Give your variables helpful names so that you and your collaborators know what they refer to
Variable names can contain uppercase and lowercase letters, digits, and underscores
They cannot start with a digit
They are case sensitive!
There is no character limit!
Examples of valid but poor variable names:
[ ]:
six = 15
[ ]:
hours = 60 * 60 * 24 * 365
Examples of assignment statements that are valid and use good variable names:
[ ]:
seconds_per_hour = 60 * 60
hours_per_year = 24 * 365
seconds_per_year = seconds_per_hour * hours_per_year
Variable Types¶
What’s the difference?
[ ]:
4 / 2
2.0
[ ]:
5 - 3
2
To us, 2.0 and 2 are the same number. But to Python, they are different types of values.
Two numeric variable types: int and float¶
int: an integer of any size
float: a number with a decimal point
Integers int:¶
If you add, subtract, multiply, or exponentiate ints (with a non-negative exponent), the result is another int
int precision is exact (i.e., 5 is exactly 5)
Use type() to check a value’s data type
[ ]:
type(2 ** 300)
int
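The sketch below, using arbitrary example numbers, illustrates when an operation on ints stays an int and when it becomes a float. It also uses the floor-division operator `//`, which is not introduced above but is part of standard Python.

```python
# Adding, subtracting, multiplying ints gives an int
product = 6 * 7

# Division with / always gives a float, even when the result is whole
true_division = 6 / 3

# Floor division with // keeps an int result for int inputs
floor_division = 7 // 2

# A negative exponent produces a float
negative_power = 2 ** -1

print(type(product), type(true_division), type(floor_division), type(negative_power))
```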
Floats float:¶
Specified using a decimal point
Might be printed using scientific notation
[ ]:
3.2 + 2.5
5.7
[ ]:
type(5.7)
float
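Unlike ints, floats are stored with limited precision, which is why results such as 8.420000000000002 can appear. A minimal sketch of how one might compare floats safely, using the standard-library `math.isclose` function:

```python
import math

total = 0.1 + 0.2

# Exact comparison can fail because of tiny rounding errors
exactly_equal = (total == 0.3)

# math.isclose checks whether two floats are equal within a small tolerance
close_enough = math.isclose(total, 0.3)

# round() is handy for display purposes
rounded = round(total, 2)

print(exactly_equal, close_enough, rounded)
```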
Strings str 🧶¶
A string is a snippet of text.
Enclosed by either single quotes (') or double quotes (")
Can be any length
[ ]:
"My string"
'My string'
[ ]:
type("My string")
str
Note: Python automatically determines types
String arithmetic¶
When using the + symbol between strings, the operation is called concatenation
[ ]:
s1 = 'send'
s2 = 'waves'
[ ]:
s1 + s2
'sendwaves'
[ ]:
s1 + ' ' + s2
'send waves'
String functions¶
Strings come with special functions, called methods, that are written after a dot
Examples: upper, title, replace, but there are many more
[ ]:
s1.upper()
'SEND'
[ ]:
s1.title()
'Send'
[ ]:
s1.replace('s', 'b')
'bend'
We can look at the length of our string
[ ]:
len(s1)
4
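One detail worth illustrating: string methods return a new string and leave the original untouched. The sketch below reuses `s1` from above; the other variable names are only for illustration.

```python
s1 = 'send'

# upper() returns a new string; s1 itself is not modified
shouted = s1.upper()

# Methods can be chained, each one acting on the result of the previous one
chained = s1.replace('s', 'b').title()

# Strings can also be indexed by position, starting at 0
first_letter = s1[0]
last_letter = s1[-1]

print(shouted, s1, chained, first_letter, last_letter)
```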
Converting between data types¶
Mixing ints and floats in an expression results in a float
A value can be converted to an int, float, or str
Some strings can be converted to int and float
[ ]:
int(2.0 + 3)
5
[ ]:
str(3)
'3'
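A few more conversions, sketched with made-up values, show what works and what does not. Note that `int()` truncates a float rather than rounding it, and that a string which does not look like a number raises a `ValueError`.

```python
# Strings that look like numbers can be converted
from_int_string = int("42") + 1
from_float_string = float("3.5") * 2

# int() drops the decimal part instead of rounding
truncated = int(2.9)

# Converting numbers to strings lets us concatenate them
joined = str(3) + str(4)

# A string that is not a number cannot be converted
try:
    int("three")
    conversion_failed = False
except ValueError as err:
    conversion_failed = True
    print("Could not convert:", err)

print(from_int_string, from_float_string, truncated, joined)
```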
A note on built-in Python Functions¶
Functions in Python work the same way mathematical functions do
Input values to functions are called arguments
Calling a function asks the function to execute code on the given arguments
Python comes with a number of built-in functions such as int, float, str, type, and len (which we have already used)
In a Jupyter notebook, type ? after a function’s name to see its documentation, or use the help function
[ ]:
str?
Init signature: str(self, /, *args, **kwargs)
Docstring:
str(object='') -> str
str(bytes_or_buffer[, encoding[, errors]]) -> str
Create a new string object from the given object. If encoding or
errors is specified, then the object must expose a data buffer
that will be decoded using the given encoding and error handler.
Otherwise, returns the result of object.__str__() (if defined)
or repr(object).
encoding defaults to 'utf-8'.
errors defaults to 'strict'.
Type: type
Subclasses: StrEnum, DeferredConfigString, _rstr, LSString, include, Keys, InputMode, ColorDepth, CompleteStyle, FoldedCase, ...
[ ]:
help(len)
Help on built-in function len in module builtins:
len(obj, /)
Return the number of items in a container.
Print statements¶
To look at the contents of a variable, we do not always have to put the variable name on the last line of a cell
We can use the built-in print function instead
[ ]:
phrase = "world!"
print(phrase)
world!
We can also print text before the variable by passing both to print, separated by a comma
[ ]:
print("Hello", phrase)
Hello world!
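Another common way to combine text and variables is an f-string, which is used later in this lesson. A minimal sketch, reusing the temperature values from the Variables section:

```python
phrase = "world!"
temp_in_c = 5
temp_in_f = temp_in_c * 9/5 + 32

# Prefix the string with f and put variable names inside curly braces
message = f"Hello {phrase}"

# A format such as :.1f controls how many decimal places are shown
report = f"{temp_in_c} C is {temp_in_f:.1f} F"

print(message)
print(report)
```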
Booleans¶
Comparisons check whether a statement is either True or False
The result is a value called a boolean (bool for short), which can be stored in a variable.
Booleans can only be True or False
Comparison Operators
| Symbol | Meaning |
|---|---|
| == | equal to |
| != | not equal to |
| < | less than |
| <= | less than or equal to |
| > | greater than |
| >= | greater than or equal to |
[ ]:
a = (5 == 6)
a
False
[ ]:
type(a)
bool
[ ]:
b = 9 + 10 < 21
b
True
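Booleans can be combined with the logical operators `and`, `or`, and `not`. The sketch below uses hypothetical variable names and thresholds purely for illustration.

```python
temp_today = 22
is_raining = False

# and: both comparisons must be True
is_mild = temp_today > 18 and temp_today < 25

# not: flips True to False and False to True
good_beach_day = is_mild and not is_raining

# or: at least one comparison must be True
is_extreme = temp_today < 0 or temp_today > 35

print(is_mild, good_beach_day, is_extreme)
```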
Lists¶
A list is used to store multiple values. To create a new list, use [square brackets]
Lists are a sequence of any type of object
[ ]:
temp_list = [38, 33, 40, 34, 26, 23, 34]
[ ]:
type(temp_list)
list
To find the average temperature, we can divide the sum of the temperatures by the number of temperatures recorded, using the built-in functions sum and len:
[ ]:
sum(temp_list) / len(temp_list)
32.57142857142857
A single list can store elements of different types
[ ]:
mixed_temp = [68, 'sixty', 68.9, 62]
mixed_temp
[68, 'sixty', 68.9, 62]
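Lists can also be indexed by position and extended with the `append` method, which is used in the final exercise of this lesson. A short sketch using `temp_list` from above (the appended value 29 is made up):

```python
temp_list = [38, 33, 40, 34, 26, 23, 34]

# Positions start at 0; negative positions count from the end
first_temp = temp_list[0]
last_temp = temp_list[-1]

# append adds one element to the end of the list, changing it in place
temp_list.append(29)

print(first_temp, last_temp, len(temp_list))
```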
A note on Lists¶
Lists are slow for data processing
To work with datasets, we want to use arrays instead
To gain this additional functionality, we need to import a library
Importing Packages¶
Python doesn’t have everything we need built in. Rather than reinventing the wheel, we can use tools that others have already developed
We import packages (AKA libraries) through import statements
Syntax for calling functions:
package_name.function()
[ ]:
import numpy as np # numpy is usually imported as np
from numpy import ones_like
Syntax for importing a single function:
from package import function
Useful Packages:
| Package | Purpose |
|---|---|
| numpy | Numerical operations |
| matplotlib | Plotting and visualization |
| netCDF4 | Using netCDF files |
| pandas | Data analysis and manipulation |
| xarray | Labeled multi-dimensional arrays |
Packages have their own associated documentation.
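The different import styles can be sketched as follows. The example uses the standard-library `math` module alongside NumPy simply because it is always available; it is not part of the workshop's package list.

```python
# Import a whole package and call its contents with package_name.function()
import math

# Import a package under a shorter alias
import numpy as np

# Import a single function so it can be called without the package name
from math import sqrt

circle_constant = math.pi
root = sqrt(16)
ones = np.ones_like([1, 2, 3])

print(circle_constant, root, ones)
```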
NumPy¶
NumPy provides support for arrays and matrix operations and is one of the most heavily used numerical libraries in Python.
Arrays¶
Arrays are collections of values (similar to lists), but are optimized for numerical computations
Store elements of a single, uniform data type
The simplest way to create an array is to pass a list of numbers as the input to np.array()
[ ]:
my_list = [0., 1., 2., 3., 4., 5.]
my_array = np.array(my_list)
my_array
array([0., 1., 2., 3., 4., 5.])
NumPy arrays have the type numpy.ndarray
[ ]:
type(my_array)
numpy.ndarray
Positions¶
Each element of an array has a position
Python is “0-indexed”
This means that the position of the first element in an array is 0, not 1.
[ ]:
my_array[0]
np.float64(0.0)
A negative index counts backward from the end (i.e., variable[-1] is the last element in the array)
[ ]:
my_array[-1]
np.float64(5.0)
Array-number arithmetic¶

[ ]:
# Increase all temperatures by 3 degrees
my_array + 3
array([3., 4., 5., 6., 7., 8.])
[ ]:
# halve all temperatures
my_array / 2
array([0. , 0.5, 1. , 1.5, 2. , 2.5])
Is my_array changed?
[ ]:
my_array # no!
array([0., 1., 2., 3., 4., 5.])
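To see why arrays are preferred for numerical work, the sketch below contrasts the same operation on a list and on an array. The values are illustrative only.

```python
import numpy as np

my_list = [0., 1., 2.]
my_array = np.array(my_list)

# Multiplying a list repeats it
doubled_list = my_list * 2

# Multiplying an array multiplies every element
doubled_array = my_array * 2

# Two arrays of the same length are combined element by element
summed = my_array + np.array([10., 20., 30.])

print(doubled_list)
print(doubled_array)
print(summed)
```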
Slicing Arrays¶
When working with NumPy arrays, slicing allows us to extract a portion of the data using:
array[start:stop] -> starts at the start index and goes up to, but not including, the stop index
array[:stop] -> starts at the beginning (index 0) and goes up to stop - 1
array[start:] -> starts at start and goes all the way to the end
array[:] -> gives you the entire array
[ ]:
pizza = np.arange(0, 8) # (can also use np.linspace(0, 7, 8))
print(pizza)
print("Number of pizza slices:", len(pizza))
[0 1 2 3 4 5 6 7]
Number of pizza slices: 8
[ ]:
# I want to eat slices 1-4
print(pizza[1:5])
[1 2 3 4]
[ ]:
# I want to eat up to slice 2
print(pizza[:3])
[0 1 2]
[ ]:
# I want to eat all slices except 0-3
print(pizza[4:])
[4 5 6 7]
[ ]:
# I want the whole pizza
print(pizza[:])
[0 1 2 3 4 5 6 7]
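Slices also accept negative indices and an optional third value, the step (`array[start:stop:step]`). The step is not covered above, so treat this as a small optional extension using the same `pizza` array:

```python
import numpy as np

pizza = np.arange(0, 8)

# The last three slices, counting from the end
last_three = pizza[-3:]

# Every other slice: a step of 2
every_other = pizza[::2]

# A step of -1 walks through the array backward
reversed_pizza = pizza[::-1]

print(last_three, every_other, reversed_pizza)
```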
Array Methods¶
NumPy comes with many functions that can be used with arrays
A full list of methods can be found in the NumPy documentation
[ ]:
print("The max of my array is:", np.max(my_array))
The max of my array is: 5.0
Many NumPy functions can be called on an array either as np.function(array) or as array.function().
[ ]:
print("The minimum of my array is:", my_array.min())
print("The mean of my array is:", my_array.mean())
The minimum of my array is: 0.0
The mean of my array is: 2.5
np.append()¶
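We can add to the end of our array with np.append. It returns a new array and leaves the original unchanged. Because we append the float 8., the integers in pizza are converted to floats.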
[ ]:
np.append(pizza, 8.)
array([0., 1., 2., 3., 4., 5., 6., 7., 8.])
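Because `np.append` returns a new array, the result has to be assigned to a name to be kept. A minimal sketch with the `pizza` array (the name `bigger_pizza` is only for illustration):

```python
import numpy as np

pizza = np.arange(0, 8)

# The result is a new array; pizza still has 8 elements
bigger_pizza = np.append(pizza, 8)
print(len(pizza), len(bigger_pizza))

# To update pizza itself, reassign the name to the returned array
pizza = np.append(pizza, [8, 9])
print(pizza)
```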
Activity¶
Suppose a coastal town is experiencing a storm. On Day 1, it rains 1 mm. Each day after that, the daily rainfall increases by 1 mm. If this continues for 30 days, how much total rain falls in that month, in centimeters? Save this value as rain_total.
Hint: Use np.arange and .sum()
[ ]:
rain_total = np.arange(1,31).sum() / 10 # in cm
rain_total
np.float64(46.5)
If statements and For loops¶
What is an If statement?¶
An if statement lets you make decisions in your code
Checks whether a condition is True or False
If True, a block of code is run
An if statement evaluates a condition and runs the indented block beneath it only if the condition is True. The parentheses around the condition are optional. The syntax is:
if (conditional statement):
    code to execute
[ ]:
temp_today = 22
if (temp_today > 20):
    print("It's warm today!")
It's warm today!
Adding else and elif¶
You can also use elif, short for else if, to check multiple conditions
An else statement catches everything else
[ ]:
temp_today = 19
if temp_today > 20:
    print("It's hot!")
elif temp_today > 18:
    print("It's warm")
else:
    print("it's chilly")
It's warm
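Python checks the conditions from top to bottom and runs only the first branch that is True, so the order of the conditions matters. The sketch below, with an illustrative temperature of 25, shows a badly ordered version next to a corrected one.

```python
temp_today = 25

# Badly ordered: 25 > 18 is True, so the "hot" branch is never reached
if temp_today > 18:
    label = "warm"
elif temp_today > 20:
    label = "hot"
else:
    label = "chilly"

# Better: test the most restrictive condition first
if temp_today > 20:
    fixed_label = "hot"
elif temp_today > 18:
    fixed_label = "warm"
else:
    fixed_label = "chilly"

print(label, fixed_label)
```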
for loops¶
A for loop is used to repeat a block of code a certain number of times (perhaps for iterating over elements in a list, array, or range of numbers). range is commonly used in for loops, and is similar to np.arange().
[ ]:
for i in range(0, 10): # i is the loop variable, which will range between 0 and 9
    print(i)
0
1
2
3
4
5
6
7
8
9
We can also iterate over lists/arrays
[ ]:
colors = ["Red", "Orange", "Yellow", "Green", "Blue", "Purple"]
for c in colors:
    print(c)
Red
Orange
Yellow
Green
Blue
Purple
if statements in for loops:¶
for loops and if statements can be combined!
[ ]:
temps = np.arange(15.,25.,2)
for t in temps:
    if t > 20:
        print("It's hot", t)
    else:
        print("It's cold", t)
It's cold 15.0
It's cold 17.0
It's cold 19.0
It's hot 21.0
It's hot 23.0
Creating counters¶
A counter is a variable you use to keep track of how many times something happens. The general format is:
Start the counter at 0
Add 1 every time something meets a condition
Shorthand notation for updating variables with add/subtract/multiply is: +=, -=, *=, etc.
Let’s try to find how many days are warm enough to swim without a wetsuit
[ ]:
count_warm = 0
for t in temps:
    if t > 20:
        count_warm += 1
print("Number of warm days:", count_warm)
Number of warm days: 2
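With NumPy arrays, the same count can also be obtained without a loop, because a comparison applied to an array returns an array of booleans. This is offered as an alternative sketch, not a replacement for the loop above.

```python
import numpy as np

temps = np.arange(15., 25., 2)

# Comparing an array to a number gives one True/False per element
is_warm = temps > 20

# True counts as 1 and False as 0, so the sum is the number of warm days
count_warm = int(is_warm.sum())

# A boolean array can also be used to select the matching elements
warm_temps = temps[is_warm]

print(is_warm, count_warm, warm_temps)
```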
Activity¶
Scenario: You are tasked to monitor local buoys. Each buoy records significant wave height in meters. Your task is to:
Print out the height of each wave
Use an if statement to flag dangerous waves (greater than 2.5 meters)
Count how many waves are dangerous
[ ]:
wave_heights = [1.2, 2.7, 3.1, 0.9, 2.0, 2.6, 1.8, 3.5, 2.3]
# 1. Loop through each wave height
# 2. Print the wave height
# 3. If the height is > 2.5, print a warning
# 4. Count how many are dangerous
Solution¶
[ ]:
# Solution
# Wave heights recorded by different buoys (in meters)
wave_heights = [1.2, 2.7, 3.1, 0.9, 2.0, 2.6, 1.8, 3.5, 2.3]
# Initialize counter for dangerous waves
danger_count = 0
# Loop through each wave height
for wave in wave_heights:
    if wave > 2.5:
        print(f"Wave height: {wave} m - Danger!")
        danger_count += 1 # Increment counter
    else:
        print(f"Wave height: {wave} m")
# Print total number of dangerous waves
print(f"\nTotal dangerous waves: {danger_count}")
Wave height: 1.2 m
Wave height: 2.7 m - Danger!
Wave height: 3.1 m - Danger!
Wave height: 0.9 m
Wave height: 2.0 m
Wave height: 2.6 m - Danger!
Wave height: 1.8 m
Wave height: 3.5 m - Danger!
Wave height: 2.3 m
Total dangerous waves: 4
Creating Functions¶
Up until this point, we have used existing functions to learn Python
We can also define our own functions
Basic Syntax¶
def function_name(argument):
    # comment
    result = 1 + argument
    return result
def = tells Python we are defining a function
function_name = name of the function (you choose!)
argument = input(s) to the function
comment = text explaining the code
return = sends back a result
Variables defined inside a function only exist inside the function; they must be returned to save them
[ ]:
def greeting():
    print("Hello!")
    return
greeting()
Hello!
[ ]:
# Convert Celsius to Fahrenheit
def c_to_f(temp_c):
    temp_f = (temp_c * 9/5) + 32
    return temp_f
[ ]:
c_to_f(20)
68.0
[ ]:
# add takes 2 arguments
def add(a,b):
    """ adds two numbers"""
    return a+b
add(2,2)
4
[ ]:
def pythagorean(a, b):
    '''Computes the hypotenuse length of a right triangle with legs a and b.'''
    c = (a ** 2 + b ** 2) ** 0.5
    return c
[ ]:
pythagorean(3,4)
5.0
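The point that variables defined inside a function only exist inside it can be sketched with `c_to_f` from above. The names `room_temp_f` and `temp_f_visible` are introduced here only for the demonstration.

```python
def c_to_f(temp_c):
    temp_f = (temp_c * 9/5) + 32  # temp_f exists only while the function runs
    return temp_f

# To keep the result, store the returned value under a name of our own
room_temp_f = c_to_f(20)
print(room_temp_f)

# Asking for temp_f outside the function raises a NameError
try:
    temp_f
    temp_f_visible = True
except NameError as err:
    temp_f_visible = False
    print("Outside the function:", err)
```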
Reading In Data: Scripps Pier Temperature¶
The data for this exercise is taken from the Scripps Pier. The data is stored in netCDF format, so we will import some tools from the netCDF4 Python package. In case you aren’t familiar, NetCDF is a file format that stores data together with its metadata. This allows us to have a temperature time series with an associated time, depth, lat, and lon.
[1]:
from netCDF4 import Dataset
# open the dataset in read mode, we will not be editing it
ds = Dataset("python_programming/scripps_pier-2023.nc", mode='r')
print(ds)
<class 'netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
dimensions(sizes): time(125305), maxStrlen64(64)
variables(dimensions): int64 time(time), float32 temperature(time), float32 conductivity(time), float32 pressure(time), float32 salinity(time), float32 chlorophyll_raw(time), float32 chlorophyll(time), int8 temperature_flagPrimary(time), int8 temperature_flagSecondary(time), int8 conductivity_flagPrimary(time), int8 conductivity_flagSecondary(time), int8 pressure_flagPrimary(time), int8 pressure_flagSecondary(time), int8 salinity_flagPrimary(time), int8 salinity_flagSecondary(time), int8 chlorophyll_flagPrimary(time), int8 chlorophyll_flagSecondary(time), float32 sigmat(time), float32 diagnosticVoltage(time), float32 currentDraw(time), float32 aux1(time), float32 aux3(time), float32 aux4(time), |S1 instrument1(maxStrlen64), |S1 instrument2(maxStrlen64), |S1 platform1(maxStrlen64), |S1 station(maxStrlen64), float32 lat(), float32 lon(), float32 depth(), float64 crs()
groups:
[ ]:
# Print the available variable names with keys():
print(ds.variables.keys())
dict_keys(['time', 'temperature', 'conductivity', 'pressure', 'salinity', 'chlorophyll_raw', 'chlorophyll', 'temperature_flagPrimary', 'temperature_flagSecondary', 'conductivity_flagPrimary', 'conductivity_flagSecondary', 'pressure_flagPrimary', 'pressure_flagSecondary', 'salinity_flagPrimary', 'salinity_flagSecondary', 'chlorophyll_flagPrimary', 'chlorophyll_flagSecondary', 'sigmat', 'diagnosticVoltage', 'currentDraw', 'aux1', 'aux3', 'aux4', 'instrument1', 'instrument2', 'platform1', 'station', 'lat', 'lon', 'depth', 'crs'])
We can extract the temperature data using its name, temperature, and the [:] operator to get all of the values.
[ ]:
# Extract data
temp_nc = ds.variables['temperature'][:]
temp_nc
masked_array(data=[15.1105, 15.1084, 15.0969, ..., 16.6199, 16.6152,
16.6175],
mask=False,
fill_value=np.float64(1e+20),
dtype=float32)
We can see that this is a masked array, which carries some extra metadata (a mask and a fill value). Since mask=False, no values are masked, so let’s convert it to a regular NumPy array.
[ ]:
temp = np.array(temp_nc)
temp
array([15.1105, 15.1084, 15.0969, ..., 16.6199, 16.6152, 16.6175],
shape=(125305,), dtype=float32)
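For files that do contain masked (missing) values, converting straight to a regular array discards the mask. The sketch below uses a small made-up masked array, where 99.0 stands in for a hypothetical bad reading, to show the difference.

```python
import numpy as np

# Hypothetical readings; the third value is flagged as missing by the mask
readings = np.ma.masked_array([15.1, 15.2, 99.0, 15.3],
                              mask=[False, False, True, False])

# Masked-array methods skip the masked value
masked_mean = readings.mean()

# np.array drops the mask, so the bad value comes back
plain = np.array(readings)

# filled() replaces masked values, here with NaN, before converting
filled = readings.filled(np.nan)
nan_mean = np.nanmean(filled)

print(masked_mean, plain, filled, nan_mean)
```

In this lesson's dataset the mask is False everywhere, so the plain conversion is safe.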
Convert all temperatures to Fahrenheit and assign the result to a new variable temp_far
[ ]:
# convert all temperatures to Fahrenheit
temp_far = (9/5) * temp + 32
#or
temp_far = c_to_f(temp)
temp_far
array([59.198902, 59.19512 , 59.17442 , ..., 61.91582 , 61.90736 ,
61.9115 ], shape=(125305,), dtype=float32)
[ ]:
time_nc = ds.variables['time']
time_nc
<class 'netCDF4.Variable'>
int64 time(time)
units: minutes since 2023-01-01 00:01:00
calendar: proleptic_gregorian
unlimited dimensions: time
current shape = (125305,)
filling on, default _FillValue of -9223372036854775806 used
The metadata can be useful! Here it tells us that our units of time are minutes since Jan 1, 2023 at 12:01 AM. Let’s extract the time data the same way we did for temperature.
[ ]:
# create an array
time = np.array(time_nc[:])
time
array([ 0, 4, 8, ..., 504299, 504303, 504307],
shape=(125305,))
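Since the units are minutes since 2023-01-01 00:01:00, the integer times can be turned into calendar timestamps. The following is one possible sketch using NumPy's `datetime64` and `timedelta64` types on a few values copied from the array above; it is not part of the original workshop code.

```python
import numpy as np

# A few of the time values shown above (minutes since the reference time)
time = np.array([0, 4, 8, 504307])

# Reference time taken from the units attribute
reference = np.datetime64("2023-01-01T00:01")

# Interpret the integers as minutes and add them to the reference time
timestamps = reference + time.astype("timedelta64[m]")

print(timestamps)
```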
What is the time between data points? (How frequently do we get our temperature measurements?)
[ ]:
dt = time[1] - time[0]
print('Temperature is measured every', dt, 'minutes')
Temperature is measured every 4 minutes
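Looking only at the first two points assumes the spacing never changes. A hedged way to check the whole record is `np.diff`, sketched here on a short made-up time array with one missing sample:

```python
import numpy as np

# Hypothetical times in minutes; the sample at 16 minutes is missing
time = np.array([0, 4, 8, 12, 20, 24])

# Difference between each pair of neighboring times
steps = np.diff(time)

# Which step sizes occur, and how often
unique_steps, counts = np.unique(steps, return_counts=True)

print(steps)
print(unique_steps, counts)
```

If more than one step size shows up, the record has gaps and fixed-size chunks will not line up exactly with calendar days.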
How many data points are there per day?
[ ]:
data_per_hr = 60/dt
data_per_day = data_per_hr * 24
data_per_day
np.float64(360.0)
Print the first hour’s worth of data
[ ]:
temp[:data_per_hr]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[157], line 1
----> 1 temp[:data_per_hr]
TypeError: slice indices must be integers or None or have an __index__ method
Whoops! We can’t index with a float. We need to convert to an int.
[ ]:
temp[:int(data_per_hr)]
array([15.1105, 15.1084, 15.0969, 15.0874, 15.0832, 15.0824, 15.0818,
15.0799, 15.0732, 15.0711, 15.0799, 15.0741, 15.0745, 15.0684,
15.095 ], dtype=float32)
Compute the daily mean temperature for each day
Steps:
Define a function that takes in a 1D array of temperatures and returns a list of daily means
Use a for loop to slice the array into chunks of 360 (one day of measurements)
Compute and store the mean for each day
[ ]:
## Starter Code
def compute_daily_means(temp_array):
    daily_means = []  # Start with an empty list
    # Calculate how many full days are in the dataset
    num_days = ...  # Hint: use len(temp_array) and floor division //
    # Loop through each day
    for i in range(num_days):
        # HINT: slice the array to get one day's worth of temperatures
        start = i * ...
        end = start + ...
        day_temps = temp_array[start:end]
        # Compute the daily mean
        mean = ...
        # Append the result to daily_means
        ...  # Hint: use daily_means.append(...)
    return daily_means
# Run your function on the data and print results
daily_avgs = compute_daily_means(temp)
daily_avgs
Potential solution:
[ ]:
def compute_daily_means(temp_array):
    daily_means = []  # empty list
    num_days = len(temp_array) // 360  # number of full days (360 samples per day)
    for i in range(num_days):
        start = i * 360
        end = start + 360
        day_temps = temp_array[start:end]
        daily_mean = np.mean(day_temps)
        daily_means.append(daily_mean)
    return daily_means
# Run the function
daily_means = compute_daily_means(temp)
daily_means[:10]
[np.float32(14.855899),
np.float32(14.873767),
np.float32(14.834493),
np.float32(14.812212),
np.float32(14.873991),
np.float32(14.855767),
np.float32(14.821135),
np.float32(14.878883),
np.float32(14.8914795),
np.float32(14.778408)]
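As an alternative to the loop, the same daily means can be computed by reshaping the array so that each row holds one day. This is a sketch on synthetic random data (not the Scripps Pier file), assuming 360 samples per day and dropping any incomplete final day.

```python
import numpy as np

samples_per_day = 360

# Synthetic temperatures: 3 full days plus 25 extra samples
rng = np.random.default_rng(0)
temp = 15 + rng.random(3 * samples_per_day + 25)

# Keep only the samples that make up full days
num_days = len(temp) // samples_per_day
full_days = temp[:num_days * samples_per_day]

# One row per day, then average across each row
daily_means = full_days.reshape(num_days, samples_per_day).mean(axis=1)

print(daily_means)
```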
Acknowledgements¶
Some of the material in this lesson is derived from the Software Carpentry Lessons for Python Programming and Plotting https://swcarpentry.github.io/python-novice-inflammation/reference/ and HDSI at UC San Diego https://datascience.ucsd.edu/