{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Programming with Python" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Session Overview" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1) What is programming with Python?\n", "\n", "2) Basic math with Python\n", "\n", "3) Types of variables\n", "\n", "4) Importing packages\n", "\n", "5) NumPy\n", "\n", "6) For loops, if statements, and logical comparisons\n", "\n", "7) Creating Functions\n", "\n", "8) Reading in a NetCDF file" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What is programming with Python?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Python** is an open-source programming language and one of the most widely used languages in the world. Why it's awesome:\n", "- Free to use 💸\n", "- Extensive ecosystem of libraries 📚\n", "- Huge community of users 👭\n", "- Tools for reproducible research 🧑‍🔬\n", "\n", "\n", "Often, code is written in a text editor and then run from a command-line interface. **Jupyter Notebooks 📓** let us write and run code within a single document, mixing formatted text with code and its output. \n", "\n", "![Alt text](python_programming/fig1.png)\n", "\n", "**Visual Studio Code (VSCode)** is an easy-to-use development environment with extensions for every major programming language. 
We will be using VSCode for this workshop." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Basic Mathematical Operations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "| Operation | Operator | Example | Value |\n", "|------------------|----------|--------------|----------|\n", "| Addition | `+` | `2 + 3` | `5` |\n", "| Subtraction | `-` | `2 - 3` | `-1` |\n", "| Multiplication | `*` | `2 * 3` | `6` |\n", "| Division | `/` | `7 / 3` | `2.33333`|\n", "| Modulus (remainder) | `%` | `7 % 3` | `1` |\n", "| Exponentiation | `**` | `2 ** 0.5` | `1.41421`|\n", "\n", "\n", "We will enter our expressions in **code cells**. Press **Shift + Enter** or click the \"Run\" button to execute the code in a cell.\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "23" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "23" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "8.420000000000002" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "-15 + 23.42" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "512" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "8 ** 3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Python follows the standard order of operations (PEMDAS) ✏️**" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3.0" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(2 + 3 + 4) / 3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Variables" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A **variable** is a name that stores a value or object so that we can refer to it later in our code. 
To define a variable, we use an **assignment statement** of the form `name = expression`.\n", "\n", "![Alt text](python_programming/fig2.png)\n", "\n", "In the example above, `zebra` is bound to `9` (the value), not to `23 - 14` (the expression).\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example\n", "\n", "Before we assign it a value, a variable is undefined, and using it raises a `NameError`." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "ename": "NameError", "evalue": "name 'temp_in_c' is not defined", "output_type": "error", "traceback": [ "\u001b[31m---------------------------------------------------------------------------\u001b[39m", "\u001b[31mNameError\u001b[39m Traceback (most recent call last)", "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[10]\u001b[39m\u001b[32m, line 1\u001b[39m\n\u001b[32m----> \u001b[39m\u001b[32m1\u001b[39m \u001b[43mtemp_in_c\u001b[49m\n", "\u001b[31mNameError\u001b[39m: name 'temp_in_c' is not defined" ] } ], "source": [ "temp_in_c" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "temp_in_c = 5\n", "temp_in_c" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "41.0" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "temp_in_f = temp_in_c * 9/5 + 32\n", "temp_in_f" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Any time we use `temp_in_f` in an expression, `41.0` is substituted for it." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-164.0" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "temp_in_f * -4" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The expression above **does not change** the value of `temp_in_f`, because we did not reassign `temp_in_f`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "41.0" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "temp_in_f" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Naming variables\n", "\n", "Give your variables helpful names so that you and your collaborators know what they refer to.\n", "\n", "- Variable names can contain uppercase and lowercase letters, digits, and underscores\n", "    - they **cannot** start with a digit\n", "    - they are case-sensitive (`temp` and `Temp` are different variables)\n", "    - there is no length limit\n", "\n", "Examples of **valid** but **poor** variable names:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "six = 15" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "hours = 60 * 60 * 24 * 365" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Examples of assignment statements that are **valid** and use **good** variable names:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "seconds_per_hour = 60 * 60\n", "hours_per_year = 24 * 365\n", "seconds_per_year = seconds_per_hour * hours_per_year" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Variable Types\n", "\n", "What's the difference?" 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2.0" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "4 / 2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "5 - 3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To us, `2.0` and `2` are the same number, but Python treats them as different types: division with `/` always produces a `float`, while subtracting two integers produces an `int`.\n", "\n", "### Two numeric variable types: `int` and `float`\n", "- `int`: an integer of any size\n", "- `float`: a number with a decimal point" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Integers `int`:\n", "- If you add, subtract, or multiply `int`s, or raise an `int` to a non-negative `int` power, the result is another `int`\n", "- `int` precision is exact (i.e., 5 is exactly 5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use `type()` to check the data type of a value:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "int" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(2 ** 300)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Floats `float`:\n", "- Specified using a **decimal** point\n", "- Might be printed using scientific notation (e.g., `0.00001` is displayed as `1e-05`)\n", "- Have limited precision, so results can be slightly off (e.g., `-15 + 23.42` gave `8.420000000000002` above)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5.7" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "3.2 + 2.5" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "float" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(5.7)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Strings `str` 🧶\n", "A string is a 
snippet of text.\n", "- Enclosed by either single quotes (') or double quotes (\")\n", "- Can be any length" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'My string'" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"My string\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "str" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(\"My string\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note: Python determines the type of a value automatically; we never have to declare it.**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### String arithmetic\n", "When the `+` operator is used between two strings, it joins them together. This operation is called **concatenation**." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s1 = 'send'\n", "s2 = 'waves'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'sendwaves'" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s1 + s2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'send waves'" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s1 + ' ' + s2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### String functions\n", "Strings come with built-in **methods**: functions that belong to a string and are called with a dot, as in `s1.upper()`.\n", "\n", "Examples: `upper`, `title`, `replace`, but there are [many more](https://docs.python.org/3/library/stdtypes.html#string-methods)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'SEND'" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s1.upper()" ] }, { "cell_type": "code", "execution_count": null, 
"metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Send'" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s1.title()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'bend'" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s1.replace('s', 'b')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can find the length of a string (its number of characters) with `len`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(s1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Converting between data types\n", "\n", "- Mixing `int`s and `float`s in an expression results in a **`float`**\n", "- A value can be converted with `int()`, `float()`, or `str()`\n", "- `int()` drops the decimal part of a `float` (e.g., `int(2.9)` is `2`)\n", "- Strings that look like numbers can be converted to `int` or `float` (e.g., `float('3.5')` is `3.5`)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "int(2.0 + 3)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'3'" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "str(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A note on built-in Python Functions\n", "\n", "Functions in Python work much like mathematical functions: they take inputs and produce an output\n", "- Input values to functions are called **arguments**\n", "- **Calling** a function asks the function to execute its code on the given arguments\n", "\n", "Python comes with a number of built-in functions such as `int`, `float`, `str`, `type`, and `len` (which we have already used)\n", "- In Jupyter, type `?` after a function's name to see its documentation, or use 
the `help` function." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[31mInit signature:\u001b[39m str(self, /, *args, **kwargs)\n", "\u001b[31mDocstring:\u001b[39m \n", "str(object='') -> str\n", "str(bytes_or_buffer[, encoding[, errors]]) -> str\n", "\n", "Create a new string object from the given object. If encoding or\n", "errors is specified, then the object must expose a data buffer\n", "that will be decoded using the given encoding and error handler.\n", "Otherwise, returns the result of object.__str__() (if defined)\n", "or repr(object).\n", "encoding defaults to 'utf-8'.\n", "errors defaults to 'strict'.\n", "\u001b[31mType:\u001b[39m type\n", "\u001b[31mSubclasses:\u001b[39m StrEnum, DeferredConfigString, _rstr, LSString, include, Keys, InputMode, ColorDepth, CompleteStyle, FoldedCase, ..." ] } ], "source": [ "str?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on built-in function len in module builtins:\n", "\n", "len(obj, /)\n", " Return the number of items in a container.\n", "\n" ] } ], "source": [ "help(len)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Print statements\n", "To see the value of a variable, we don't have to write its name on the last line of a cell.\n", "\n", "We can use the built-in `print` function instead, anywhere in a cell:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "world!\n" ] } ], "source": [ "phrase = \"world!\"\n", "print(phrase)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also print other text before the variable by passing several arguments to `print`, which separates them with spaces:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hello world!\n" ] } ], 
"source": [ "print(\"Hello\", phrase)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Booleans\n", "A comparison checks whether a statement is `True` or `False`.\n", "\n", "The result is called a **boolean** (`bool` for short), and it can be stored in a variable like any other value.\n", "\n", "Booleans can only be `True` or `False`.\n", "\n", "**Comparison Operators**\n", "\n", "| Symbol | Meaning |\n", "|--------|--------------------------|\n", "| `==` | equal to |\n", "| `!=` | not equal to |\n", "| `<` | less than |\n", "| `<=` | less than or equal to |\n", "| `>` | greater than |\n", "| `>=` | greater than or equal to |\n", "\n", "Note that `==` tests whether two values are equal, while a single `=` assigns a value to a variable." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = (5 == 6)\n", "a" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "bool" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(a)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b = 9 + 10 < 21\n", "b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Lists\n", "A list is used to store multiple values. 
To create a new list, put comma-separated values inside `[square brackets]`.\n", "\n", "A list is an ordered sequence that can hold any type of object." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "temp_list = [38, 33, 40, 34, 26, 23, 34]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "list" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(temp_list)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To find the average temperature, we can divide the **sum of the temperatures** by the **number of temperatures recorded**, using the built-in functions `sum` and `len`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "32.57142857142857" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sum(temp_list) / len(temp_list)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A single list can store elements of different types:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[68, 'sixty', 68.9, 62]" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mixed_temp = [68, 'sixty', 68.9, 62]\n", "mixed_temp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### A note on Lists\n", "- Lists are slow for numerical work on large datasets\n", "- For data analysis, we usually use arrays instead\n", "- NumPy arrays are not built into Python, so we need to import a library to use them" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Importing Packages" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Python doesn't have everything we need built in. 
Instead of reinventing the wheel, we can use tools that others have already developed:\n", "- We load **packages (also called libraries)** with **import statements**\n", "- Functions from a package are called with `package_name.function()`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np  # numpy is usually imported as np\n", "from numpy import ones_like" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As shown above, instead of importing a complete library, we can import only the function we need with:\\\n", "`from package import function`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Useful packages:\n", "\n", "| Package | Purpose |\n", "|-----------|-----------------------------|\n", "| numpy | Numerical operations |\n", "| matplotlib| Plotting and visualization |\n", "| netCDF4 | Reading and writing netCDF files |\n", "| pandas | Data analysis and manipulation|\n", "| xarray | Labeled multi-dimensional arrays |\n", "\n", "Each package has its own **documentation**, which describes the functions it provides." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## **NumPy**\n", "\n", "NumPy provides support for arrays and matrix operations, and it is one of the most widely used math libraries in Python." 
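, "\n", "\n", "Arrays also behave differently from lists in arithmetic. The small sketch below (`temps_list` and `temps_array` are hypothetical names chosen for illustration) shows that `*` repeats a list, while on a NumPy array the same operation is applied to every element:\n", "\n", "```python
import numpy as np

temps_list = [20, 22, 25]
temps_array = np.array(temps_list)

# With a list, * 2 repeats the list instead of doubling each value
print(temps_list * 2)   # [20, 22, 25, 20, 22, 25]

# With an array, * 2 multiplies every element by 2
print(temps_array * 2)  # [40 44 50]
```"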
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **Arrays**\n", "\n", "Arrays are collections of values (similar to lists) that are optimized for numerical computations:\n", "- They store elements of a single, uniform data type" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The simplest way to create an array is to pass a list of numbers as the input to `np.array()`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0., 1., 2., 3., 4., 5.])" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "my_list = [0., 1., 2., 3., 4., 5.]\n", "my_array = np.array(my_list)\n", "my_array" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NumPy arrays have the type `numpy.ndarray`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "numpy.ndarray" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(my_array)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Positions\n", "Each element of an array has a position, called its **index**.\n", "\n", "Python is \"0-indexed\":\n", "- This means that the position of the first element in an array is 0, not 1. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "np.float64(0.0)" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "my_array[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Negative indices count backward from the end (e.g., `my_array[-1]` is the **last element** of the array)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "np.float64(5.0)" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "my_array[-1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Array-number arithmetic\n", "\n", "Arrays make it easy to apply the same operation to every element at once. When an array is combined with a single number, NumPy **broadcasts** that number across every element of the array." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([3., 4., 5., 6., 7., 8.])" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Increase all temperatures by 3 degrees\n", "my_array + 3" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0. , 0.5, 1. , 1.5, 2. , 2.5])" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Halve all temperatures\n", "my_array / 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Is `my_array` changed?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0., 1., 2., 3., 4., 5.])" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "my_array  # no!" 
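] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Just as with `temp_in_f` earlier, to keep the result of an array operation we have to assign it to a variable. A minimal sketch follows; `warmer_array` is a hypothetical name chosen for illustration, and `my_array` is recreated so the snippet runs on its own:\n", "\n", "```python
import numpy as np

my_array = np.array([0., 1., 2., 3., 4., 5.])

# Save the result under a new name; my_array itself is unchanged
warmer_array = my_array + 3
print(warmer_array)  # [3. 4. 5. 6. 7. 8.]
print(my_array)      # [0. 1. 2. 3. 4. 5.]

# Reassigning the same name replaces its old values with the new result
my_array = my_array + 3
print(my_array)      # [3. 4. 5. 6. 7. 8.]
```"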
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Slicing Arrays\n", "When working with NumPy arrays, slicing allows us to extract a portion of the data using:\n", "- `array[start:stop]` -> starts at index `start` and goes **up to but not including** index `stop`\n", "- `array[:stop]` -> starts at the beginning (index 0) and goes up to `stop - 1`\n", "- `array[start:]` -> starts at `start` and goes **all the way to the end**\n", "- `array[:]` -> gives you the **entire array**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0 1 2 3 4 5 6 7]\n", "Number of pizza slices: 8\n" ] } ], "source": [ "pizza = np.arange(0, 8)  # integers 0 to 7 (np.linspace(0, 7, 8) gives the same values as floats)\n", "print(pizza)\n", "print(\"Number of pizza slices:\", len(pizza))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1 2 3 4]\n" ] } ], "source": [ "# I want to eat slices 1 through 4\n", "print(pizza[1:5])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0 1 2]\n" ] } ], "source": [ "# I want to eat slices 0 through 2\n", "print(pizza[:3])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[4 5 6 7]\n" ] } ], "source": [ "# I want to eat all slices except 0-3\n", "print(pizza[4:])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0 1 2 3 4 5 6 7]\n" ] } ], "source": [ "# I want the whole pizza\n", "print(pizza[:])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Array Methods\n", "NumPy comes with many functions that work on arrays\n", "- A full list of methods can be found in the NumPy 
[documentation](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The max of my array is: 5.0\n" ] } ], "source": [ "print(\"The max of my array is:\", np.max(my_array))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Many NumPy functions can be called either as `np.function(array)` or as an array method, `array.function()`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The minimum of my array is: 0.0\n", "The mean of my array is: 2.5\n" ] } ], "source": [ "print(\"The minimum of my array is:\", my_array.min())\n", "print(\"The mean of my array is:\", my_array.mean())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### np.append()\n", "We can add values to the end of an array with `np.append`. It returns a **new** array (here converted to `float`, because we appended `8.`), and `pizza` itself is not changed." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0., 1., 2., 3., 4., 5., 6., 7., 8.])" ] }, "execution_count": 128, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.append(pizza, 8.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Activity" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Suppose a coastal town is experiencing a storm. On Day 1, it rains **1 mm**. Each day after that, the daily rainfall is 1 mm more than the day before. If this continues for 30 days, how much rain falls in total, in **centimeters**? Save this value as `rain_total`. 
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hint: Use `np.arange` and `.sum()`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "np.float64(46.5)" ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rain_total = np.arange(1, 31).sum() / 10  # daily totals of 1 to 30 mm, summed and converted to cm\n", "rain_total" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## If statements and For loops" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### What is an `if` statement?\n", "An `if` statement lets your code make decisions:\n", "- It checks whether a condition is `True` or `False`\n", "- If the condition is `True`, a block of code is run\n", "\n", "The syntax is:\n", "\n", "```\n", "if condition:\n", "    code to execute\n", "```\n", "\n", "The colon and the indentation are required: every indented line below the `if` belongs to the block that runs when the condition is `True`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "It's warm today!\n" ] } ], "source": [ "temp_today = 22\n", "\n", "if temp_today > 20:\n", "    print(\"It's warm today!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Adding `else` and `elif`\n", "You can also use `elif`, short for **else if**, to check additional conditions in order\n", "\n", "An `else` block catches everything else: it runs only if none of the conditions above it are `True`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "It's warm\n" ] } ], "source": [ "temp_today = 19\n", "\n", "if temp_today > 20:\n", "    print(\"It's hot!\")\n", "elif temp_today > 18:\n", "    print(\"It's warm\")\n", "else:\n", "    print(\"It's chilly\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `for` loops\n", "\n", "A `for` loop repeats a block of code once for each item in a sequence, such as a list, an array, or a range of numbers. 
`range` is commonly used in `for` loops and works much like `np.arange()`: `range(0, 10)` gives the integers 0 through 9." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0\n", "1\n", "2\n", "3\n", "4\n", "5\n", "6\n", "7\n", "8\n", "9\n" ] } ], "source": [ "for i in range(0, 10):  # i is the loop variable; it takes the values 0 through 9\n", "    print(i)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also loop directly over the items in a list or array:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Red\n", "Orange\n", "Yellow\n", "Green\n", "Blue\n", "Purple\n" ] } ], "source": [ "colors = [\"Red\", \"Orange\", \"Yellow\", \"Green\", \"Blue\", \"Purple\"]\n", "for c in colors:\n", "    print(c)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### `if` statements **in** `for` loops:\n", "`for` loops and `if` statements can be combined!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "It's cold 15.0\n", "It's cold 17.0\n", "It's cold 19.0\n", "It's hot 21.0\n", "It's hot 23.0\n" ] } ], "source": [ "temps = np.arange(15., 25., 2)  # 15 to 23 in steps of 2\n", "\n", "for t in temps:\n", "    if t > 20:\n", "        print(\"It's hot\", t)\n", "    else:\n", "        print(\"It's cold\", t)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### **Creating counters**\n", "A counter is a variable you use to **keep track of how many times something happens**. The general format is:\n", "- Start the counter at 0\n", "- Add `1` every time something meets a condition\n", "- Shorthand notation updates a variable in place: `count += 1` is the same as `count = count + 1`, and `-=`, `*=`, etc. work the same way for other operations." 
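, "\n", "\n", "A minimal sketch of the shorthand operators, using a hypothetical variable `total`; each comment shows the equivalent longer form:\n", "\n", "```python
total = 10

total += 5  # same as total = total + 5, so total is now 15
total -= 3  # same as total = total - 3, so total is now 12
total *= 2  # same as total = total * 2, so total is now 24

print(total)  # 24
```"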
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's find how many days are warm enough to swim without a wetsuit (temperature above 20):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of warm days: 2\n" ] } ], "source": [ "count_warm = 0\n", "\n", "for t in temps:\n", "    if t > 20:\n", "        count_warm += 1\n", "\n", "print(\"Number of warm days:\", count_warm)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### **Activity**\n", "\n", "Scenario: You have been asked to monitor local buoys. Each buoy records significant wave height in meters. Your task is to:\n", "- Print out the height of each wave\n", "- Use an `if` statement to flag **dangerous waves** (greater than 2.5 meters)\n", "- Count how many waves are dangerous" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "wave_heights = [1.2, 2.7, 3.1, 0.9, 2.0, 2.6, 1.8, 3.5, 2.3, 2.3, 3.1, 0.8, 1.9, 2.5, 1.7]\n", "\n", "# 1. Loop through each wave height\n", "\n", "# 2. Print the wave height\n", "\n", "# 3. If the height is > 2.5, print a warning\n", "\n", "# 4. 
Count how many are dangerous" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Solution" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Wave height: 1.2 m\n", "Wave height: 2.7 m - Danger!\n", "Wave height: 3.1 m - Danger!\n", "Wave height: 0.9 m\n", "Wave height: 2.0 m\n", "Wave height: 2.6 m - Danger!\n", "Wave height: 1.8 m\n", "Wave height: 3.5 m - Danger!\n", "Wave height: 2.3 m\n", "Wave height: 2.3 m\n", "Wave height: 3.1 m - Danger!\n", "Wave height: 0.8 m\n", "Wave height: 1.9 m\n", "Wave height: 2.5 m\n", "Wave height: 1.7 m\n", "\n", "Total dangerous waves: 5\n" ] } ], "source": [ "# Solution\n", "\n", "# Wave heights recorded by different buoys (in meters)\n", "wave_heights = [1.2, 2.7, 3.1, 0.9, 2.0, 2.6, 1.8, 3.5, 2.3, 2.3, 3.1, 0.8, 1.9, 2.5, 1.7]\n", "\n", "# Initialize counter for dangerous waves\n", "danger_count = 0\n", "\n", "# Loop through each wave height\n", "for wave in wave_heights:\n", "    if wave > 2.5:\n", "        print(f\"Wave height: {wave} m - Danger!\")\n", "        danger_count += 1  # Increment counter\n", "    else:\n", "        print(f\"Wave height: {wave} m\")\n", "\n", "# Print total number of dangerous waves\n", "print(f\"\\nTotal dangerous waves: {danger_count}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating Functions\n", "Up until this point, we have used existing functions to learn Python.\n", "\n", "We can also define our *own* functions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Basic Syntax\n", "\n", "```python\n", "def function_name(argument):\n", "    # comment\n", "    result = 1 + argument\n", "    return result\n", "```\n", "- `def` = tells Python we are defining a function\n", "- `function_name` = name of the function (you choose!)\n", "- `argument` = input(s) to the function\n", "- `# comment` = a note explaining the code; Python ignores it\n", "- `return` = sends back a result\n", "- Variables defined inside a function exist only inside that function; to use a result outside the function, `return` it" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": 
"stdout", "output_type": "stream", "text": [ "Hello!\n" ] } ], "source": [ "def greeting():\n", "    print(\"Hello!\")\n", "    return\n", "\n", "greeting()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Convert Celsius to Fahrenheit\n", "def c_to_f(temp_c):\n", "    temp_f = (temp_c * 9/5) + 32\n", "    return temp_f" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "68.0" ] }, "execution_count": 135, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c_to_f(20)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4" ] }, "execution_count": 136, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# add takes 2 arguments\n", "def add(a, b):\n", "    \"\"\"Adds two numbers.\"\"\"\n", "    return a + b\n", "\n", "add(2, 2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def pythagorean(a, b):\n", "    '''Computes the hypotenuse length of a right triangle with legs a and b.'''\n", "\n", "    c = (a ** 2 + b ** 2) ** 0.5\n", "\n", "    return c" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5.0" ] }, "execution_count": 138, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pythagorean(3, 4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reading In Data: Scripps Pier Temperature\n", "\n", "The data for this exercise come from the Scripps Pier. They are stored in NetCDF format, so we will import some tools from the `netCDF4` Python package. In case you aren't familiar, NetCDF is a file format that stores data together with metadata. This lets us keep a temperature time series along with its associated time, depth, latitude, and longitude."
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "root group (NETCDF4 data model, file format HDF5):\n", " dimensions(sizes): time(125305), maxStrlen64(64)\n", " variables(dimensions): int64 time(time), float32 temperature(time), float32 conductivity(time), float32 pressure(time), float32 salinity(time), float32 chlorophyll_raw(time), float32 chlorophyll(time), int8 temperature_flagPrimary(time), int8 temperature_flagSecondary(time), int8 conductivity_flagPrimary(time), int8 conductivity_flagSecondary(time), int8 pressure_flagPrimary(time), int8 pressure_flagSecondary(time), int8 salinity_flagPrimary(time), int8 salinity_flagSecondary(time), int8 chlorophyll_flagPrimary(time), int8 chlorophyll_flagSecondary(time), float32 sigmat(time), float32 diagnosticVoltage(time), float32 currentDraw(time), float32 aux1(time), float32 aux3(time), float32 aux4(time), |S1 instrument1(maxStrlen64), |S1 instrument2(maxStrlen64), |S1 platform1(maxStrlen64), |S1 station(maxStrlen64), float32 lat(), float32 lon(), float32 depth(), float64 crs()\n", " groups: \n" ] } ], "source": [ "import numpy as np\n", "from netCDF4 import Dataset\n", "\n", "# Open the dataset in read mode; we will not be editing it\n", "ds = Dataset(\"python_programming/scripps_pier-2023.nc\", mode='r')\n", "\n", "print(ds)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "dict_keys(['time', 'temperature', 'conductivity', 'pressure', 'salinity', 'chlorophyll_raw', 'chlorophyll', 'temperature_flagPrimary', 'temperature_flagSecondary', 'conductivity_flagPrimary', 'conductivity_flagSecondary', 'pressure_flagPrimary', 'pressure_flagSecondary', 'salinity_flagPrimary', 'salinity_flagSecondary', 'chlorophyll_flagPrimary', 'chlorophyll_flagSecondary', 'sigmat', 'diagnosticVoltage', 'currentDraw', 'aux1', 'aux3', 'aux4', 'instrument1', 'instrument2', 'platform1', 'station', 'lat', 'lon', 'depth', 'crs'])\n" ] } ], "source": [ "# Print the available variable names with keys():\n", "print(ds.variables.keys())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can extract the temperature data using its name, `temperature`, and the `[:]` slice to get all of the values." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "masked_array(data=[15.1105, 15.1084, 15.0969, ..., 16.6199, 16.6152,\n", " 16.6175],\n", " mask=False,\n", " fill_value=np.float64(1e+20),\n", " dtype=float32)" ] }, "execution_count": 145, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Extract all temperature values\n", "temp_nc = ds.variables['temperature'][:]\n", "temp_nc" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The result is a masked array, which carries extra metadata (a mask and a fill value), so let's convert it to a regular NumPy array." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([15.1105, 15.1084, 15.0969, ..., 16.6199, 16.6152, 16.6175],\n", " shape=(125305,), dtype=float32)" ] }, "execution_count": 146, "metadata": {}, "output_type": "execute_result" } ], "source": [ "temp = np.array(temp_nc)\n", "temp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Exercise\n", "\n", "Convert all temperatures to Fahrenheit and assign the result to a new variable `temp_far`.\n", "- Hint: $^\\circ F = \\frac{9}{5} \\times {}^\\circ C + 32$" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([59.198902, 59.19512 , 59.17442 , ..., 61.91582 , 61.90736 ,\n", " 61.9115 ], shape=(125305,), dtype=float32)" ] }, "execution_count": 151, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Convert all temperatures to Fahrenheit\n", "temp_far = (9/5) * temp + 32\n", "\n", "# or, equivalently, reuse the c_to_f function defined earlier\n", "temp_far = c_to_f(temp)\n", "temp_far" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "#### Let's look at the time variable" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\n", "int64 time(time)\n", " units: minutes since 2023-01-01 00:01:00\n", " calendar: proleptic_gregorian\n", "unlimited dimensions: time\n", "current shape = (125305,)\n", "filling on, default _FillValue of -9223372036854775806 used" ] }, "execution_count": 153, "metadata": {}, "output_type": "execute_result" } ], "source": [ "time_nc = ds.variables['time']\n", "time_nc" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The metadata can be useful! Here it tells us that the time values are in minutes since January 1, 2023 at 12:01 AM. Let's extract the time data the same way we did for temperature." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0, 4, 8, ..., 504299, 504303, 504307],\n", " shape=(125305,))" ] }, "execution_count": 154, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Extract the time values into a NumPy array\n", "time = np.array(time_nc[:])\n", "time" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What is the time between data points? (How frequently do we get our temperature measurements?)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Temperature is measured every 4 minutes\n" ] } ], "source": [ "dt = time[1] - time[0]\n", "\n", "print('Temperature is measured every', dt, 'minutes')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### **Activity**\n", "\n", "**How many data points are there per day?**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "np.float64(360.0)" ] }, "execution_count": 156, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data_per_hr = 60/dt\n", "data_per_day = data_per_hr * 24\n", "data_per_day" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Print the first hour's worth of data**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "ename": "TypeError", "evalue": "slice indices must be integers or None or have an __index__ method", "output_type": "error", "traceback": [ "\u001b[31m---------------------------------------------------------------------------\u001b[39m", "\u001b[31mTypeError\u001b[39m Traceback (most recent call last)", "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[157]\u001b[39m\u001b[32m, line 1\u001b[39m\n\u001b[32m----> \u001b[39m\u001b[32m1\u001b[39m \u001b[43mtemp\u001b[49m\u001b[43m[\u001b[49m\u001b[43m:\u001b[49m\u001b[43mdata_per_hr\u001b[49m\u001b[43m]\u001b[49m\n", "\u001b[31mTypeError\u001b[39m: slice indices must be integers or None or have an __index__ method" ] } ], "source": [ "temp[:data_per_hr]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Whoops! `60/dt` returns a float, and slice indices must be integers. We need to convert it to an `int`."
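] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A minimal sketch of why the slice failed, run on a small stand-in array rather than the pier data (`dt_demo` and `temps_demo` are hypothetical names for illustration). True division `/` always returns a float, even when the result is a whole number. Floor division `//` of two integers returns an integer, which NumPy accepts as a slice index." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "# Hypothetical stand-ins for dt and temp from the cells above\n", "dt_demo = np.int64(4)  # sampling interval in minutes, an integer like time[1] - time[0]\n", "temps_demo = np.arange(30, dtype=np.float32)  # stand-in for the temperature array\n", "\n", "per_hr_float = 60 / dt_demo  # true division returns a float: 15.0\n", "per_hr_int = 60 // dt_demo  # floor division of two integers returns an integer: 15\n", "print(type(per_hr_float), type(per_hr_int))\n", "\n", "# An integer is a valid slice index\n", "first_hour = temps_demo[:per_hr_int]\n", "print(len(first_hour))\n", "\n", "# A float is not, even when its value is a whole number\n", "try:\n", "    temps_demo[:per_hr_float]\n", "except TypeError as err:\n", "    print(\"TypeError:\", err)"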
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([15.1105, 15.1084, 15.0969, 15.0874, 15.0832, 15.0824, 15.0818,\n", " 15.0799, 15.0732, 15.0711, 15.0799, 15.0741, 15.0745, 15.0684,\n", " 15.095 ], dtype=float32)" ] }, "execution_count": 158, "metadata": {}, "output_type": "execute_result" } ], "source": [ "temp[:int(data_per_hr)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Exercise\n", "\n", "Compute the mean temperature for each day.\n", "\n", "Steps:\n", "1) Define a function that takes in a 1D array of temperatures and returns a list of daily means.\n", "2) Use a for loop to slice the array into chunks of 360 values (one day of 4-minute samples, assuming the record has no gaps).\n", "3) Compute and store the mean for each chunk." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "## Starter Code\n", "\n", "def compute_daily_means(temp_array):\n", "    daily_means = []  # Start with an empty list\n", "\n", "    # Calculate how many full days are in the dataset\n", "    num_days = ...  # Hint: use len(temp_array) and integer division //\n", "\n", "    # Loop through each day\n", "    for i in range(num_days):\n", "        # HINT: slice the array to get one day's worth of temperatures\n", "        start = i * ...\n", "        end = start + ...\n", "        day_temps = temp_array[start:end]\n", "\n", "        # Compute the daily mean\n", "        mean = ...\n", "\n", "        # Append the result to daily_means\n", "        daily_means.append(...)\n", "\n", "    return daily_means\n", "\n", "# Run your function on the data and display the results\n", "daily_avgs = compute_daily_means(temp)\n", "\n", "daily_avgs\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Potential solution:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[np.float32(14.855899),\n", " np.float32(14.873767),\n", " np.float32(14.834493),\n", " np.float32(14.812212),\n", " np.float32(14.873991),\n", " np.float32(14.855767),\n", " np.float32(14.821135),\n", " np.float32(14.878883),\n", " np.float32(14.8914795),\n", " np.float32(14.778408)]" ] }, "execution_count": 110, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def compute_daily_means(temp_array):\n", "    daily_means = []  # empty list\n", "    num_days = len(temp_array) // 360  # number of full days (integer division)\n", "\n", "    for i in range(num_days):\n", "        start = i * 360\n", "        end = start + 360\n", "        day_temps = temp_array[start:end]\n", "        daily_mean = np.mean(day_temps)\n", "        daily_means.append(daily_mean)\n", "\n", "    return daily_means\n", "\n", "# Run the function\n", "daily_means = compute_daily_means(temp)\n", "daily_means[:10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Acknowledgements" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some of the material in this lesson is derived from the Software Carpentry lessons for Python programming and plotting (https://swcarpentry.github.io/python-novice-inflammation/reference/) and from HDSI at UC San Diego (https://datascience.ucsd.edu/)." ] } ], "metadata": { 
"kernelspec": { "display_name": "sio_software", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.5" } }, "nbformat": 4, "nbformat_minor": 2 }