{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Gridded Data with Xarray\n",
"\n",
"[Xarray](https://docs.xarray.dev/en/stable/) is a very powerful tool for exploring multi-dimensional (esp. geospatial) data in a way that is efficient and robust to making coding mistakes. Pandas provided us with a way to look at tabular data, Xarray takes this further and provides a framework for N-dimensional data. One of the coolest Xarray features, is the [integration with Dask](https://docs.xarray.dev/en/stable/user-guide/dask.html). This will allow us to easily parallelize our Xarray analysis!\n",
"\n",
"When using Xarray, our data will be stored in `DataArrays` which are collected together into a `DataSet`. A nice example of this is in the context of a climate model:\n",
"\n",
"1) DataSet - contains all possible coordinates on the model grid and provides a list of all model variables (DataArrays)\n",
"\n",
"2) DataArray - an individual model variable (e.g., sea surface temperature), the variable's coordinates on the model grid, and any additional meta data about that specific variable\n",
"\n",
"In the graphic below, you can see that `temperature` and `precipitation` are both variables with coordinates `lat` and `lon`. In this simple example temperature and precip each have 3 dimensions. Not only does the DataSet store all of this information, it also relates these two variables to each other by understanding that they share coordinates `lat` and `lon`. We will use some test data to inspect this further. Any NetCDF can be read into an Xarray dataset. Many popular models facilitate reading raw output directly into Xarray."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import xarray as xr\n",
"\n",
"# load a sample dataset from the xarray library \n",
"ds = xr.tutorial.load_dataset(\"air_temperature\")"
]
},
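{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the `Dataset`/`DataArray` relationship described above concrete, here is a minimal sketch on a made-up grid. The name `toy`, the grid sizes, and the random values are illustrative assumptions and are not part of the tutorial data loaded above; the point is only that two variables can share one set of `lat`/`lon` coordinates."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import xarray as xr\n",
"\n",
"# Hypothetical toy grid (2 times x 3 lats x 4 lons); names and values are made up\n",
"rng = np.random.default_rng(0)\n",
"dims = (\"time\", \"lat\", \"lon\")\n",
"toy = xr.Dataset(\n",
"    data_vars={\n",
"        \"temperature\": (dims, 15 + 8 * rng.standard_normal((2, 3, 4))),\n",
"        \"precipitation\": (dims, rng.random((2, 3, 4))),\n",
"    },\n",
"    coords={\n",
"        \"time\": np.array([\"2013-01-01\", \"2013-01-02\"], dtype=\"datetime64[ns]\"),\n",
"        \"lat\": [10.0, 20.0, 30.0],\n",
"        \"lon\": [200.0, 210.0, 220.0, 230.0],\n",
"    },\n",
")\n",
"\n",
"# Indexing the Dataset by name returns a single DataArray\n",
"temperature = toy[\"temperature\"]\n",
"print(type(temperature).__name__, temperature.dims)\n",
"\n",
"# Both variables are tied to the same lat coordinate values\n",
"print(temperature[\"lat\"].equals(toy[\"precipitation\"][\"lat\"]))"
]
},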
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**TIP** If you realize you need to use Pandas for a particular problem, never fear! You can convert between Pandas and Xarray with a single line of code."
]
},
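{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough sketch of that round trip, the cell below converts a tiny, made-up `Dataset` to a Pandas `DataFrame` and back. The name `small` and its values are hypothetical stand-ins for real data; `Dataset.to_dataframe()` and `DataFrame.to_xarray()` are the conversion calls being illustrated."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import xarray as xr\n",
"\n",
"# Hypothetical 2 x 3 grid standing in for real data\n",
"small = xr.Dataset(\n",
"    {\"air\": ((\"lat\", \"lon\"), np.arange(6.0).reshape(2, 3))},\n",
"    coords={\"lat\": [10.0, 20.0], \"lon\": [200.0, 210.0, 220.0]},\n",
")\n",
"\n",
"# Xarray -> Pandas: the dimensions become a (lat, lon) MultiIndex\n",
"df = small.to_dataframe()\n",
"print(df.head())\n",
"\n",
"# Pandas -> Xarray: the index levels become dimensions again\n",
"roundtrip = df.to_xarray()\n",
"print(roundtrip.equals(small))"
]
},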
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
<xarray.Dataset>\n", "Dimensions: (lat: 25, time: 10, lon: 53)\n", "Coordinates:\n", " * lat (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0\n", " * time (time) datetime64[ns] 2013-01-01 ... 2013-01-03T06:00:00\n", " * lon (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0\n", "Data variables:\n", " air (lat, time, lon) float32 241.2 242.5 243.5 ... 296.9 296.8 297.1
<xarray.Dataset>\n", "Dimensions: (lat: 25, time: 2920, lon: 53)\n", "Coordinates:\n", " * lat (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0\n", " * lon (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0\n", " * time (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00\n", "Data variables:\n", " air (time, lat, lon) float32 241.2 242.5 243.5 ... 296.5 296.2 295.7\n", "Attributes:\n", " Conventions: COARDS\n", " title: 4x daily NMC reanalysis (1948)\n", " description: Data is from NMC initialized reanalysis\\n(4x/day). These a...\n", " platform: Model\n", " references: http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...
<xarray.DataArray 'lon' (lon: 53)>\n", "array([200. , 202.5, 205. , 207.5, 210. , 212.5, 215. , 217.5, 220. , 222.5,\n", " 225. , 227.5, 230. , 232.5, 235. , 237.5, 240. , 242.5, 245. , 247.5,\n", " 250. , 252.5, 255. , 257.5, 260. , 262.5, 265. , 267.5, 270. , 272.5,\n", " 275. , 277.5, 280. , 282.5, 285. , 287.5, 290. , 292.5, 295. , 297.5,\n", " 300. , 302.5, 305. , 307.5, 310. , 312.5, 315. , 317.5, 320. , 322.5,\n", " 325. , 327.5, 330. ], dtype=float32)\n", "Coordinates:\n", " * lon (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0\n", "Attributes:\n", " standard_name: longitude\n", " long_name: Longitude\n", " units: degrees_east\n", " axis: X
<xarray.DataArray 'time' (time: 2920)>\n", "array(['2013-01-01T00:00:00.000000000', '2013-01-01T06:00:00.000000000',\n", " '2013-01-01T12:00:00.000000000', ..., '2014-12-31T06:00:00.000000000',\n", " '2014-12-31T12:00:00.000000000', '2014-12-31T18:00:00.000000000'],\n", " dtype='datetime64[ns]')\n", "Coordinates:\n", " * time (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00\n", "Attributes:\n", " standard_name: time\n", " long_name: Time
<xarray.DataArray 'air' (lat: 25, lon: 53)>\n", "array([[241.2 , 242.5 , 243.5 , ..., 232.79999, 235.5 ,\n", " 238.59999],\n", " [243.79999, 244.5 , 244.7 , ..., 232.79999, 235.29999,\n", " 239.29999],\n", " [250. , 249.79999, 248.89 , ..., 233.2 , 236.39 ,\n", " 241.7 ],\n", " ...,\n", " [296.6 , 296.19998, 296.4 , ..., 295.4 , 295.1 ,\n", " 294.69998],\n", " [295.9 , 296.19998, 296.79 , ..., 295.9 , 295.9 ,\n", " 295.19998],\n", " [296.29 , 296.79 , 297.1 , ..., 296.9 , 296.79 ,\n", " 296.6 ]], dtype=float32)\n", "Coordinates:\n", " * lat (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0\n", " * lon (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0\n", " time datetime64[ns] 2013-01-01\n", "Attributes:\n", " long_name: 4xDaily Air temperature at sigma level 995\n", " units: degK\n", " precision: 2\n", " GRIB_id: 11\n", " GRIB_name: TMP\n", " var_desc: Air temperature\n", " dataset: NMC Reanalysis\n", " level_desc: Surface\n", " statistic: Individual Obs\n", " parent_stat: Other\n", " actual_range: [185.16 322.1 ]
<xarray.DataArray 'air' (time: 2920, lat: 25)>\n", "array([[242.5 , 241.09999, 242.2 , ..., 292.79 , 293.79 ,\n", " 295.5 ],\n", " [244.29999, 242.2 , 242.09999, ..., 293. , 294.19998,\n", " 295.79 ],\n", " [246.79999, 242.39 , 243.7 , ..., 292.29 , 293. ,\n", " 295. ],\n", " ...,\n", " [235.98999, 241.89 , 251.29 , ..., 296.29 , 297.69 ,\n", " 298.29 ],\n", " [237.09 , 239.89 , 250.29 , ..., 296.49 , 297.88998,\n", " 298.29 ],\n", " [238.89 , 238.59 , 246.59 , ..., 297.19 , 297.69 ,\n", " 298.09 ]], dtype=float32)\n", "Coordinates:\n", " * lat (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0\n", " lon float32 220.0\n", " * time (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00\n", "Attributes:\n", " long_name: 4xDaily Air temperature at sigma level 995\n", " units: degK\n", " precision: 2\n", " GRIB_id: 11\n", " GRIB_name: TMP\n", " var_desc: Air temperature\n", " dataset: NMC Reanalysis\n", " level_desc: Surface\n", " statistic: Individual Obs\n", " parent_stat: Other\n", " actual_range: [185.16 322.1 ]
<xarray.DataArray 'air' (time: 2920, lat: 17, lon: 17)>\n", "array([[[273.69998, 273.6 , 273.79 , ..., 273. , 275.5 ,\n", " 276. ],\n", " [274.79 , 275.19998, 275.6 , ..., 270.19998, 272.79 ,\n", " 274.9 ],\n", " [275.9 , 276.9 , 276.9 , ..., 271.1 , 271.6 ,\n", " 272.79 ],\n", " ...,\n", " [295.4 , 295.69998, 295.79 , ..., 290.19998, 290. ,\n", " 289.9 ],\n", " [297. , 296.69998, 296.1 , ..., 290.79 , 290.9 ,\n", " 290.69998],\n", " [296.6 , 296.19998, 296.4 , ..., 292. , 292.1 ,\n", " 291.79 ]],\n", "\n", " [[272.1 , 272.69998, 273.19998, ..., 270.19998, 272.79 ,\n", " 273.6 ],\n", " [274. , 274.4 , 275.1 , ..., 267. , 270.29 ,\n", " 272.5 ],\n", " [275.6 , 276.1 , 276.29 , ..., 267.79 , 269.19998,\n", " 270.6 ],\n", "...\n", " [290.88998, 291.49 , 293.19 , ..., 291.69 , 291.38998,\n", " 290.79 ],\n", " [291.59 , 291.69 , 293.59 , ..., 292.79 , 292.59 ,\n", " 292.59 ],\n", " [293.69 , 293.88998, 295.38998, ..., 295.09 , 294.59 ,\n", " 295.09 ]],\n", "\n", " [[272.59 , 271.99 , 272.19 , ..., 274.19 , 275.38998,\n", " 273.88998],\n", " [274.29 , 274.49 , 275.59 , ..., 269.38998, 272.88998,\n", " 274.69 ],\n", " [276.79 , 277.49 , 277.99 , ..., 264.59 , 266.88998,\n", " 269.69 ],\n", " ...,\n", " [291.49 , 291.38998, 292.38998, ..., 291.59 , 291.19 ,\n", " 290.99 ],\n", " [292.88998, 292.09 , 292.99 , ..., 293.49 , 292.88998,\n", " 292.88998],\n", " [293.79 , 293.69 , 295.09 , ..., 295.38998, 294.79 ,\n", " 294.79 ]]], dtype=float32)\n", "Coordinates:\n", " * lat (lat) float32 60.0 57.5 55.0 52.5 50.0 ... 30.0 27.5 25.0 22.5 20.0\n", " * lon (lon) float32 200.0 202.5 205.0 207.5 ... 232.5 235.0 237.5 240.0\n", " * time (time) datetime64[ns] 2013-01-01 ... 
2014-12-31T18:00:00\n", "Attributes:\n", " long_name: 4xDaily Air temperature at sigma level 995\n", " units: degK\n", " precision: 2\n", " GRIB_id: 11\n", " GRIB_name: TMP\n", " var_desc: Air temperature\n", " dataset: NMC Reanalysis\n", " level_desc: Surface\n", " statistic: Individual Obs\n", " parent_stat: Other\n", " actual_range: [185.16 322.1 ]
<xarray.DataArray 'UVEL' (time: 366, Z: 1, YC: 84, XG: 241)>\n", "[7409304 values with dtype=float32]\n", "Coordinates: (12/13)\n", " iter (time) int64 ...\n", " * time (time) timedelta64[ns] 00:01:12 00:02:24 ... 01:10:48 01:12:00\n", " * YC (YC) float64 -3.917 -3.75 -3.583 -3.417 ... 9.417 9.583 9.75 9.917\n", " * XG (XG) float64 210.0 210.2 210.3 210.5 ... 249.5 249.7 249.8 250.0\n", " * Z (Z) float64 -1.0\n", " dyG (YC, XG) float32 ...\n", " ... ...\n", " rAw (YC, XG) float32 ...\n", " drF (Z) float32 ...\n", " PHrefC (Z) float32 ...\n", " hFacW (Z, YC, XG) float32 ...\n", " maskW (Z, YC, XG) bool ...\n", " rhoRef (Z) float32 ...\n", "Attributes:\n", " standard_name: UVEL\n", " long_name: Zonal Component of Velocity (m/s)\n", " units: m/s\n", " mate: VVEL
<xarray.DataArray 'VVEL' (time: 366, Z: 1, YG: 85, XC: 240)>\n", "[7466400 values with dtype=float32]\n", "Coordinates: (12/13)\n", " iter (time) int64 ...\n", " * time (time) timedelta64[ns] 00:01:12 00:02:24 ... 01:10:48 01:12:00\n", " * Z (Z) float64 -1.0\n", " drF (Z) float32 ...\n", " PHrefC (Z) float32 ...\n", " rhoRef (Z) float32 ...\n", " ... ...\n", " * YG (YG) float64 -4.0 -3.833 -3.667 -3.5 ... 9.5 9.667 9.833 10.0\n", " dxG (YG, XC) float32 ...\n", " dyC (YG, XC) float32 ...\n", " rAs (YG, XC) float32 ...\n", " hFacS (Z, YG, XC) float32 ...\n", " maskS (Z, YG, XC) bool ...\n", "Attributes:\n", " standard_name: VVEL\n", " long_name: Meridional Component of Velocity (m/s)\n", " units: m/s\n", " mate: UVEL
<xarray.DataArray (time: 366, Z: 22, YC: 84, XC: 240)>\n", "array([[[[ 1.37244683e-06, 1.43047191e-06, 1.46951675e-06, ...,\n", " 6.62662956e-07, 1.10988776e-06, 1.41652595e-06],\n", " [ 5.94653841e-07, 6.68495829e-07, 7.41130521e-07, ...,\n", " 1.22283927e-06, 1.87492367e-06, 2.07832477e-06],\n", " [-3.75415624e-07, -3.15581644e-07, -2.50060850e-07, ...,\n", " 1.70110923e-06, 1.76351773e-06, 1.99446072e-06],\n", " ...,\n", " [ 8.62729678e-07, 7.14634893e-07, 3.88255700e-07, ...,\n", " -3.11771328e-06, -2.89566060e-06, -2.37975678e-06],\n", " [ 8.57911502e-08, -4.61088696e-08, -3.47979068e-07, ...,\n", " -3.19846913e-06, -2.62534672e-06, -2.25218719e-06],\n", " [-7.58210774e-07, -6.46966441e-07, -7.30312934e-07, ...,\n", " -2.86520299e-06, -2.44957505e-06, -2.19216781e-06]],\n", "\n", " [[ 1.34057188e-06, 1.40016880e-06, 1.44013302e-06, ...,\n", " 7.09427923e-07, 1.17261857e-06, 1.48308425e-06],\n", " [ 5.64541381e-07, 6.34911260e-07, 7.06026526e-07, ...,\n", " 1.25195004e-06, 1.92384437e-06, 2.14392071e-06],\n", " [-4.11960059e-07, -3.54990362e-07, -2.90499941e-07, ...,\n", " 1.67858138e-06, 1.79416759e-06, 2.04223534e-06],\n", "...\n", " -8.71693283e-06, -5.95823076e-06, -6.15254976e-06],\n", " [-6.71484213e-06, -4.20082051e-06, -2.26667885e-06, ...,\n", " 3.52229357e-09, 1.22546305e-06, -1.09015275e-06],\n", " [-2.41673092e-06, -2.31488502e-06, -1.91019740e-06, ...,\n", " 9.23616972e-06, 7.55049496e-06, 6.20488800e-06]],\n", "\n", " [[-1.69971406e-06, -1.02415493e-06, -4.37987836e-07, ...,\n", " 7.77255536e-07, 9.61086997e-08, -7.09721562e-08],\n", " [-5.18033232e-07, -3.21178746e-08, 5.64866127e-07, ...,\n", " 2.97049564e-06, 3.35908271e-06, 3.85124167e-06],\n", " [ 2.11237398e-06, 2.65563722e-06, 3.24382063e-06, ...,\n", " 9.82252459e-07, 1.40929524e-06, 8.60606008e-07],\n", " ...,\n", " [-4.70574696e-06, -3.26171971e-06, -2.07992093e-06, ...,\n", " -8.10987694e-06, -5.83119663e-06, -5.26532494e-06],\n", " [-5.91146409e-06, -3.32312925e-06, -2.49081700e-06, 
...,\n", " -2.19475260e-06, -7.33080412e-07, -2.52351583e-06],\n", " [-2.57763872e-06, -3.84479017e-06, -2.77097206e-06, ...,\n", " 6.42505483e-06, 4.72095462e-06, 3.74518891e-06]]]],\n", " dtype=float32)\n", "Coordinates:\n", " * time (time) timedelta64[ns] 00:01:12 00:02:24 ... 01:10:48 01:12:00\n", " * YC (YC) float64 -3.917 -3.75 -3.583 -3.417 ... 9.417 9.583 9.75 9.917\n", " * Z (Z) float64 -1.0 -3.0 -5.0 -7.0 -9.0 ... -68.5 -76.0 -85.0 -95.0\n", " * XC (XC) float64 210.1 210.2 210.4 210.6 ... 249.4 249.6 249.8 249.9\n", " rA (YC, XC) float32 3.425e+08 3.425e+08 ... 3.382e+08 3.382e+08\n", " Depth (YC, XC) float32 4.65e+03 4.491e+03 ... 3.44e+03 3.665e+03\n", " dxF (YC, XC) float32 1.849e+04 1.849e+04 ... 1.825e+04 1.825e+04\n", " dyF (YC, XC) float32 1.853e+04 1.853e+04 ... 1.853e+04 1.853e+04
<xarray.DataArray 'THETA' (time: 366, Z: 22, YC: 84, XC: 240)>\n", "dask.array<xarray-THETA, shape=(366, 22, 84, 240), dtype=float32, chunksize=(1, 22, 84, 240), chunktype=numpy.ndarray>\n", "Coordinates: (12/14)\n", " iter (time) int64 dask.array<chunksize=(1,), meta=np.ndarray>\n", " * time (time) timedelta64[ns] 00:01:12 00:02:24 ... 01:10:48 01:12:00\n", " * YC (YC) float64 -3.917 -3.75 -3.583 -3.417 ... 9.417 9.583 9.75 9.917\n", " * Z (Z) float64 -1.0 -3.0 -5.0 -7.0 -9.0 ... -68.5 -76.0 -85.0 -95.0\n", " drF (Z) float32 dask.array<chunksize=(22,), meta=np.ndarray>\n", " PHrefC (Z) float32 dask.array<chunksize=(22,), meta=np.ndarray>\n", " ... ...\n", " rA (YC, XC) float32 dask.array<chunksize=(84, 240), meta=np.ndarray>\n", " Depth (YC, XC) float32 dask.array<chunksize=(84, 240), meta=np.ndarray>\n", " hFacC (Z, YC, XC) float32 dask.array<chunksize=(22, 84, 240), meta=np.ndarray>\n", " maskC (Z, YC, XC) bool dask.array<chunksize=(22, 84, 240), meta=np.ndarray>\n", " dxF (YC, XC) float32 dask.array<chunksize=(84, 240), meta=np.ndarray>\n", " dyF (YC, XC) float32 dask.array<chunksize=(84, 240), meta=np.ndarray>\n", "Attributes:\n", " standard_name: THETA\n", " long_name: Potential Temperature\n", " units: degC