Gridded Data with Xarray¶
Xarray is a powerful tool for exploring multi-dimensional (especially geospatial) data in a way that is both efficient and robust to coding mistakes. Pandas gave us a way to work with tabular data; Xarray takes this further and provides a framework for N-dimensional data. One of the coolest Xarray features is its integration with Dask, which lets us easily parallelize our Xarray analysis!
When using Xarray, our data will be stored in DataArrays, which are collected together into a DataSet. A nice example of this is in the context of a climate model:
DataSet - contains all possible coordinates on the model grid and provides a list of all model variables (DataArrays)
DataArray - an individual model variable (e.g., sea surface temperature), the variable’s coordinates on the model grid, and any additional meta data about that specific variable
In the graphic below, you can see that temperature and precipitation are both variables with coordinates lat and lon. In this simple example, temperature and precip each have 3 dimensions. Not only does the DataSet store all of this information, it also relates the two variables to each other by recognizing that they share the coordinates lat and lon. We will use some test data to inspect this further. Any NetCDF file can be read into an Xarray dataset, and many popular models facilitate reading raw output directly into Xarray.
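To make this structure concrete, here is a minimal toy DataSet built by hand (the variable names and values below are invented for illustration):

```python
import numpy as np
import xarray as xr

# two variables sharing the same time/lat/lon coordinates
coords = {"time": np.arange(4), "lat": [10.0, 20.0], "lon": [100.0, 110.0, 120.0]}
temperature = xr.DataArray(
    np.random.rand(4, 2, 3), dims=("time", "lat", "lon"), coords=coords,
    attrs={"units": "degC"},
)
precipitation = xr.DataArray(
    np.random.rand(4, 2, 3), dims=("time", "lat", "lon"), coords=coords,
    attrs={"units": "mm/day"},
)
toy = xr.Dataset({"temperature": temperature, "precipitation": precipitation})
```

Because both DataArrays share coordinates, the DataSet knows they live on the same grid.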
[1]:
import matplotlib.pyplot as plt
import xarray as xr
# load a sample dataset from the xarray library
ds = xr.tutorial.load_dataset("air_temperature")
TIP If you realize you need to use Pandas for a particular problem, never fear! You can convert between Pandas and Xarray with a single line of code.
[2]:
df = ds.isel(time=slice(10)).to_dataframe() # to dataframe from xarray
df.to_xarray() # to xarray from dataframe
[2]:
<xarray.Dataset> Dimensions: (lat: 25, time: 10, lon: 53) Coordinates: * lat (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0 * time (time) datetime64[ns] 2013-01-01 ... 2013-01-03T06:00:00 * lon (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0 Data variables: air (lat, time, lon) float32 241.2 242.5 243.5 ... 296.9 296.8 297.1
Inspecting our DataSet¶
[3]:
ds
[3]:
<xarray.Dataset> Dimensions: (lat: 25, time: 2920, lon: 53) Coordinates: * lat (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0 * lon (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0 * time (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00 Data variables: air (time, lat, lon) float32 241.2 242.5 243.5 ... 296.5 296.2 295.7 Attributes: Conventions: COARDS title: 4x daily NMC reanalysis (1948) description: Data is from NMC initialized reanalysis\n(4x/day). These a... platform: Model references: http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...
We can see that our variable air has 3 dimensions (lat, lon, time), defined by coordinates of the same name. We can inspect each of those coordinates to see their ranges.
[4]:
ds.lon
[4]:
<xarray.DataArray 'lon' (lon: 53)> array([200. , 202.5, 205. , 207.5, 210. , 212.5, 215. , 217.5, 220. , 222.5, 225. , 227.5, 230. , 232.5, 235. , 237.5, 240. , 242.5, 245. , 247.5, 250. , 252.5, 255. , 257.5, 260. , 262.5, 265. , 267.5, 270. , 272.5, 275. , 277.5, 280. , 282.5, 285. , 287.5, 290. , 292.5, 295. , 297.5, 300. , 302.5, 305. , 307.5, 310. , 312.5, 315. , 317.5, 320. , 322.5, 325. , 327.5, 330. ], dtype=float32) Coordinates: * lon (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0 Attributes: standard_name: longitude long_name: Longitude units: degrees_east axis: X
[5]:
ds.time
[5]:
<xarray.DataArray 'time' (time: 2920)> array(['2013-01-01T00:00:00.000000000', '2013-01-01T06:00:00.000000000', '2013-01-01T12:00:00.000000000', ..., '2014-12-31T06:00:00.000000000', '2014-12-31T12:00:00.000000000', '2014-12-31T18:00:00.000000000'], dtype='datetime64[ns]') Coordinates: * time (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00 Attributes: standard_name: time long_name: Time
Attributes (attrs) provide additional metadata in the form of strings.
[6]:
ds.attrs
[6]:
{'Conventions': 'COARDS',
'title': '4x daily NMC reanalysis (1948)',
'description': 'Data is from NMC initialized reanalysis\n(4x/day). These are the 0.9950 sigma level values.',
'platform': 'Model',
'references': 'http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.html'}
[7]:
ds.lon.attrs
[7]:
{'standard_name': 'longitude',
'long_name': 'Longitude',
'units': 'degrees_east',
'axis': 'X'}
Selecting (Slicing) Sub-sets of Data with Labels¶
Xarray is great for reproducible science! Instead of having to remember or hard-code the order and size of your data's dimensions, you can select subsets of data by name and value. isel() selects by integer index, while sel() selects by dimension label and coordinate value.
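These selection patterns work on any labeled data. Here is a small synthetic sketch (coordinate values invented) showing the difference:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(12.0).reshape(3, 4),
    dims=("lat", "lon"),
    coords={"lat": [60.0, 40.0, 20.0], "lon": [200.0, 210.0, 220.0, 230.0]},
)
first_lat = da.isel(lat=0)                  # by position: the first latitude row
near = da.sel(lon=214.0, method="nearest")  # by label: nearest longitude to 214.0
sub = da.sel(lat=slice(60, 20), lon=slice(200, 220))  # lat is decreasing, so slice 60 -> 20
```

Selecting with labels means the code still works if the dimension order of the underlying array changes.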
[8]:
# Select the first day
ds.air.isel(time=0)
[8]:
<xarray.DataArray 'air' (lat: 25, lon: 53)> array([[241.2 , 242.5 , 243.5 , ..., 232.79999, 235.5 , 238.59999], [243.79999, 244.5 , 244.7 , ..., 232.79999, 235.29999, 239.29999], [250. , 249.79999, 248.89 , ..., 233.2 , 236.39 , 241.7 ], ..., [296.6 , 296.19998, 296.4 , ..., 295.4 , 295.1 , 294.69998], [295.9 , 296.19998, 296.79 , ..., 295.9 , 295.9 , 295.19998], [296.29 , 296.79 , 297.1 , ..., 296.9 , 296.79 , 296.6 ]], dtype=float32) Coordinates: * lat (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0 * lon (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0 time datetime64[ns] 2013-01-01 Attributes: long_name: 4xDaily Air temperature at sigma level 995 units: degK precision: 2 GRIB_id: 11 GRIB_name: TMP var_desc: Air temperature dataset: NMC Reanalysis level_desc: Surface statistic: Individual Obs parent_stat: Other actual_range: [185.16 322.1 ]
[9]:
# Extract data at the longitude that is nearest 220.0E
ds.air.sel(lon=220.0,method='nearest')
[9]:
<xarray.DataArray 'air' (time: 2920, lat: 25)> array([[242.5 , 241.09999, 242.2 , ..., 292.79 , 293.79 , 295.5 ], [244.29999, 242.2 , 242.09999, ..., 293. , 294.19998, 295.79 ], [246.79999, 242.39 , 243.7 , ..., 292.29 , 293. , 295. ], ..., [235.98999, 241.89 , 251.29 , ..., 296.29 , 297.69 , 298.29 ], [237.09 , 239.89 , 250.29 , ..., 296.49 , 297.88998, 298.29 ], [238.89 , 238.59 , 246.59 , ..., 297.19 , 297.69 , 298.09 ]], dtype=float32) Coordinates: * lat (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0 lon float32 220.0 * time (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00 Attributes: long_name: 4xDaily Air temperature at sigma level 995 units: degK precision: 2 GRIB_id: 11 GRIB_name: TMP var_desc: Air temperature dataset: NMC Reanalysis level_desc: Surface statistic: Individual Obs parent_stat: Other actual_range: [185.16 322.1 ]
[10]:
# Extract a sub-region of data by selecting slices of latitude and longitude
ds.air.sel(lat=slice(60, 20), lon=slice(200, 240)) # keep in mind that latitude is decreasing -- the slice is from 60 to 20
[10]:
<xarray.DataArray 'air' (time: 2920, lat: 17, lon: 17)> array([[[273.69998, 273.6 , 273.79 , ..., 273. , 275.5 , 276. ], [274.79 , 275.19998, 275.6 , ..., 270.19998, 272.79 , 274.9 ], [275.9 , 276.9 , 276.9 , ..., 271.1 , 271.6 , 272.79 ], ..., [295.4 , 295.69998, 295.79 , ..., 290.19998, 290. , 289.9 ], [297. , 296.69998, 296.1 , ..., 290.79 , 290.9 , 290.69998], [296.6 , 296.19998, 296.4 , ..., 292. , 292.1 , 291.79 ]], [[272.1 , 272.69998, 273.19998, ..., 270.19998, 272.79 , 273.6 ], [274. , 274.4 , 275.1 , ..., 267. , 270.29 , 272.5 ], [275.6 , 276.1 , 276.29 , ..., 267.79 , 269.19998, 270.6 ], ... [290.88998, 291.49 , 293.19 , ..., 291.69 , 291.38998, 290.79 ], [291.59 , 291.69 , 293.59 , ..., 292.79 , 292.59 , 292.59 ], [293.69 , 293.88998, 295.38998, ..., 295.09 , 294.59 , 295.09 ]], [[272.59 , 271.99 , 272.19 , ..., 274.19 , 275.38998, 273.88998], [274.29 , 274.49 , 275.59 , ..., 269.38998, 272.88998, 274.69 ], [276.79 , 277.49 , 277.99 , ..., 264.59 , 266.88998, 269.69 ], ..., [291.49 , 291.38998, 292.38998, ..., 291.59 , 291.19 , 290.99 ], [292.88998, 292.09 , 292.99 , ..., 293.49 , 292.88998, 292.88998], [293.79 , 293.69 , 295.09 , ..., 295.38998, 294.79 , 294.79 ]]], dtype=float32) Coordinates: * lat (lat) float32 60.0 57.5 55.0 52.5 50.0 ... 30.0 27.5 25.0 22.5 20.0 * lon (lon) float32 200.0 202.5 205.0 207.5 ... 232.5 235.0 237.5 240.0 * time (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00 Attributes: long_name: 4xDaily Air temperature at sigma level 995 units: degK precision: 2 GRIB_id: 11 GRIB_name: TMP var_desc: Air temperature dataset: NMC Reanalysis level_desc: Surface statistic: Individual Obs parent_stat: Other actual_range: [185.16 322.1 ]
Interpolating (or Resampling) Data¶
[11]:
# interpolate the air temperature to a new longitude (or new set of longitudes)
ds.air.interp(lon=215.7)  # linear interpolation; pass a list of values for multiple new points
Plotting in Xarray (so easy!)¶
[12]:
ds.air.isel(time=0).plot()
[12]:
<matplotlib.collections.QuadMesh at 0x177c91dc0>

[13]:
import cmocean.cm as cmo
# Let's use a non-default colormap
ds.air.isel(time=0).plot(cmap=cmo.thermal)
[13]:
<matplotlib.collections.QuadMesh at 0x177c52870>

[14]:
ds.air.sel(lon=220.0,method='nearest').plot(y='lat',x='time',cmap=cmo.thermal)
[14]:
<matplotlib.collections.QuadMesh at 0x318691dc0>

Statistics in Xarray¶
[15]:
ds.air.sel(lon=220.0,method='nearest').mean(dim='time').plot()
[15]:
[<matplotlib.lines.Line2D at 0x319317230>]

[16]:
del ds
A More Exciting Xarray Example¶
Let’s look at a more interesting and realistic example:
[17]:
import xarray as xr
ds = xr.open_dataset('TPOSE6_Daily_2012_surface.nc')
What about something harder? Gradients? Integrals? … Enter xgcm.
Calculate Divergence with xgcm¶
xgcm is a Python package built around Xarray. With it you create a Grid object, which can then be used to perform more complicated mathematical operations while remaining aware of complex grid geometries. The Grid is aware of the different coordinates and their relationships to each other in space. This can be really useful for model output or observations where not all variables are located at the same points in space.
[18]:
import xgcm
# create the grid object from our dataset
grid = xgcm.Grid(ds, periodic=['X','Y'])
grid
[18]:
<xgcm.Grid>
Y Axis (periodic, boundary=None):
* center YC --> outer
* outer YG --> center
X Axis (periodic, boundary=None):
* center XC --> outer
* outer XG --> center
Z Axis (not periodic, boundary=None):
* center Z
T Axis (not periodic, boundary=None):
* center time
[19]:
# inspect some of our variables
ds.UVEL # zonal velocity
[19]:
<xarray.DataArray 'UVEL' (time: 366, Z: 1, YC: 84, XG: 241)> [7409304 values with dtype=float32] Coordinates: (12/13) iter (time) int64 ... * time (time) timedelta64[ns] 00:01:12 00:02:24 ... 01:10:48 01:12:00 * YC (YC) float64 -3.917 -3.75 -3.583 -3.417 ... 9.417 9.583 9.75 9.917 * XG (XG) float64 210.0 210.2 210.3 210.5 ... 249.5 249.7 249.8 250.0 * Z (Z) float64 -1.0 dyG (YC, XG) float32 ... ... ... rAw (YC, XG) float32 ... drF (Z) float32 ... PHrefC (Z) float32 ... hFacW (Z, YC, XG) float32 ... maskW (Z, YC, XG) bool ... rhoRef (Z) float32 ... Attributes: standard_name: UVEL long_name: Zonal Component of Velocity (m/s) units: m/s mate: VVEL
[20]:
ds.VVEL # meridional velocity
[20]:
<xarray.DataArray 'VVEL' (time: 366, Z: 1, YG: 85, XC: 240)> [7466400 values with dtype=float32] Coordinates: (12/13) iter (time) int64 ... * time (time) timedelta64[ns] 00:01:12 00:02:24 ... 01:10:48 01:12:00 * Z (Z) float64 -1.0 drF (Z) float32 ... PHrefC (Z) float32 ... rhoRef (Z) float32 ... ... ... * YG (YG) float64 -4.0 -3.833 -3.667 -3.5 ... 9.5 9.667 9.833 10.0 dxG (YG, XC) float32 ... dyC (YG, XC) float32 ... rAs (YG, XC) float32 ... hFacS (Z, YG, XC) float32 ... maskS (Z, YG, XC) bool ... Attributes: standard_name: VVEL long_name: Meridional Component of Velocity (m/s) units: m/s mate: UVEL
We can see that zonal and meridional velocity are computed on different grid points: (XG, YC) vs. (XC, YG). This is not uncommon, but it means we can't simply add/subtract/multiply the velocities, because they are not located at the same points in space. How would we handle this without xgcm?
[21]:
# interpolate zonal velocity to XC by averaging adjacent XG points (note: this loses a point at the boundary)
u_at_c = 0.5 * (ds.UVEL.values[..., :-1] + ds.UVEL.values[..., 1:])
We would have to do this for meridional velocity too.
[22]:
# interpolate meridional velocity to YC by averaging adjacent YG points (same boundary problem)
v_at_c = 0.5 * (ds.VVEL.values[..., :-1, :] + ds.VVEL.values[..., 1:, :])
We would also have to decide how to handle the boundaries and implement that ourselves; otherwise these new variables will have different sizes. And if we switch to a model or data format where the coordinate names are slightly different, we would have to rewrite all of our code. Instead, we can use our Grid object. If the naming or structure of the grid changes in the future, we only need to update the Grid to reflect the correct relationships.
[23]:
# interpolate U along the X axis: this will interpolate from XG to the other X coordinate, XC
U = grid.interp(ds.UVEL, 'X')
# interpolate V along the Y axis: this will interpolate from YG to the other Y coordinate, YC
V = grid.interp(ds.VVEL, 'Y')
display(U.dims)
display(V.dims)
('time', 'Z', 'YC', 'XC')
('time', 'Z', 'YC', 'XC')
Even better, the Grid object will automatically handle the interpolation for us, further reducing the complexity of the code and the opportunities for mistakes.
[24]:
# Let's skip the interpolations and compute the quantities of interest directly on their original coordinates.
u_transport = ds.UVEL * ds.dyG * ds.hFacW * ds.drF
v_transport = ds.VVEL * ds.dxG * ds.hFacS * ds.drF
display(u_transport.dims)
display(v_transport.dims)
# The Grid object will automatically handle the coordinate transformations for us!
div_uv = (grid.diff(u_transport, 'X') + grid.diff(v_transport, 'Y')) / ds.rA # calculate the divergence of the flow
display(div_uv.dims)
('time', 'Z', 'YC', 'XG')
('time', 'Z', 'YG', 'XC')
('time', 'Z', 'YC', 'XC')
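Conceptually, grid.diff along an axis is just a difference between adjacent faces, wrapped periodically. Here is a stripped-down numpy sketch on a toy C-grid (uniform spacing assumed; the real calculation above also weights by the cell geometry dyG, dxG, hFacW, hFacS, drF, and rA):

```python
import numpy as np

ny, nx = 4, 6
dx = dy = 1.0
u = np.ones((ny, nx))    # uniform zonal flow on x-faces (synthetic values)
v = np.zeros((ny, nx))   # no meridional flow on y-faces

# face-to-face differences with periodic wrap-around, like grid.diff(..., 'X'/'Y')
du = np.roll(u, -1, axis=1) - u
dv = np.roll(v, -1, axis=0) - v
div = du / dx + dv / dy  # divergence at cell centers; zero for a uniform flow
```

A uniform flow is non-divergent, so every entry of div comes out (numerically) zero.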
[25]:
import cmocean.cm as cmo
# plot divergence at the surface
div_uv.isel(time=0,Z=0).plot(cmap=cmo.balance, robust=True,
cbar_kwargs={'label': 'Divergence (1/s)'})
[25]:
<matplotlib.collections.QuadMesh at 0x32c25f200>

The cell above shows that we can take the divergence of the flow, while accounting for the grid geometry, in only 3 lines of code! This framework is also robust against coding errors and misunderstandings of the grid itself. For example:
[26]:
# multiply ds.VVEL by ds.dxC instead of ds.dxG and see what happens
v_transport = ds.VVEL * ds.dxC * ds.hFacS * ds.drF
print(v_transport)
Xarray will not blindly combine variables of the same size when it knows they are not located at the same points in space. The result here is a nonsensical object with two X dimensions, and any attempt to analyze or plot it will fail, forcing you to see your error. That said, nothing is fool-proof: Xarray and xgcm cannot prevent mistakes entirely. Instead, the aim is to spend more time on errors like “our question is not formulated properly” rather than “the third dimension of my data isn't the size I expected; which of my indices was wrong?”.
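You can see this protection with two tiny synthetic arrays: multiplying variables that live on different dimensions produces an outer product carrying both dimensions, rather than silently lining the values up:

```python
import numpy as np
import xarray as xr

a = xr.DataArray(np.ones(3), dims="XC")  # values at cell centers
b = xr.DataArray(np.ones(3), dims="XG")  # same size, but at cell faces
prod = a * b
# prod now has dims ('XC', 'XG'): a 3x3 object with two X dimensions
```

The extra dimension is the tell-tale sign that two staggered variables were combined without interpolating them onto common points first.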
EOF decomposition (a.k.a. PCA) with xeofs¶
Xarray automatically sped up with Dask¶
Let’s load a larger dataset (~2.5 GB). Here is where Xarray really shines! The data can easily be chunked into Dask arrays. If we spin up a Dask client with the 3 lines of code we learned earlier, our Xarray analysis will be parallelized over the chunks. We can see the size of the chunks by inspecting the DataSet.
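As a minimal sketch (synthetic shapes; requires dask to be installed), chunking splits an array into blocks that Dask can process in parallel:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.zeros((6, 4, 5)), dims=("time", "lat", "lon"))
chunked = da.chunk({"time": 1})  # one chunk per time step
print(chunked.chunks)
```

Each chunk here is one time step of the full lat/lon field, so time steps can be processed independently by different workers.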
[1]:
import xarray as xr
import xgcm
ds = xr.open_dataset('TPOSE6_Daily_2012.nc')
grid = xgcm.Grid(ds, periodic=['X','Y'])
grid
[1]:
<xgcm.Grid>
Y Axis (periodic, boundary=None):
* center YC --> outer
* outer YG --> center
Z Axis (not periodic, boundary=None):
* center Z
T Axis (not periodic, boundary=None):
* center time
X Axis (periodic, boundary=None):
* center XC --> outer
* outer XG --> center
[2]:
%%time
# calculate the divergence of the flow at all depths
u_transport = ds.UVEL * ds.dyG * ds.hFacW * ds.drF
v_transport = ds.VVEL * ds.dxG * ds.hFacS * ds.drF
div_uv = (grid.diff(u_transport, 'X') + grid.diff(v_transport, 'Y')) / ds.rA # calculate the divergence of the flow
div_uv.compute()
CPU times: user 321 ms, sys: 585 ms, total: 907 ms
Wall time: 926 ms
[2]:
<xarray.DataArray (time: 366, Z: 22, YC: 84, XC: 240)> array([[[[ 1.37244683e-06, 1.43047191e-06, 1.46951675e-06, ..., 6.62662956e-07, 1.10988776e-06, 1.41652595e-06], [ 5.94653841e-07, 6.68495829e-07, 7.41130521e-07, ..., 1.22283927e-06, 1.87492367e-06, 2.07832477e-06], [-3.75415624e-07, -3.15581644e-07, -2.50060850e-07, ..., 1.70110923e-06, 1.76351773e-06, 1.99446072e-06], ..., [ 8.62729678e-07, 7.14634893e-07, 3.88255700e-07, ..., -3.11771328e-06, -2.89566060e-06, -2.37975678e-06], [ 8.57911502e-08, -4.61088696e-08, -3.47979068e-07, ..., -3.19846913e-06, -2.62534672e-06, -2.25218719e-06], [-7.58210774e-07, -6.46966441e-07, -7.30312934e-07, ..., -2.86520299e-06, -2.44957505e-06, -2.19216781e-06]], [[ 1.34057188e-06, 1.40016880e-06, 1.44013302e-06, ..., 7.09427923e-07, 1.17261857e-06, 1.48308425e-06], [ 5.64541381e-07, 6.34911260e-07, 7.06026526e-07, ..., 1.25195004e-06, 1.92384437e-06, 2.14392071e-06], [-4.11960059e-07, -3.54990362e-07, -2.90499941e-07, ..., 1.67858138e-06, 1.79416759e-06, 2.04223534e-06], ... 
-8.71693283e-06, -5.95823076e-06, -6.15254976e-06], [-6.71484213e-06, -4.20082051e-06, -2.26667885e-06, ..., 3.52229357e-09, 1.22546305e-06, -1.09015275e-06], [-2.41673092e-06, -2.31488502e-06, -1.91019740e-06, ..., 9.23616972e-06, 7.55049496e-06, 6.20488800e-06]], [[-1.69971406e-06, -1.02415493e-06, -4.37987836e-07, ..., 7.77255536e-07, 9.61086997e-08, -7.09721562e-08], [-5.18033232e-07, -3.21178746e-08, 5.64866127e-07, ..., 2.97049564e-06, 3.35908271e-06, 3.85124167e-06], [ 2.11237398e-06, 2.65563722e-06, 3.24382063e-06, ..., 9.82252459e-07, 1.40929524e-06, 8.60606008e-07], ..., [-4.70574696e-06, -3.26171971e-06, -2.07992093e-06, ..., -8.10987694e-06, -5.83119663e-06, -5.26532494e-06], [-5.91146409e-06, -3.32312925e-06, -2.49081700e-06, ..., -2.19475260e-06, -7.33080412e-07, -2.52351583e-06], [-2.57763872e-06, -3.84479017e-06, -2.77097206e-06, ..., 6.42505483e-06, 4.72095462e-06, 3.74518891e-06]]]], dtype=float32) Coordinates: * time (time) timedelta64[ns] 00:01:12 00:02:24 ... 01:10:48 01:12:00 * YC (YC) float64 -3.917 -3.75 -3.583 -3.417 ... 9.417 9.583 9.75 9.917 * Z (Z) float64 -1.0 -3.0 -5.0 -7.0 -9.0 ... -68.5 -76.0 -85.0 -95.0 * XC (XC) float64 210.1 210.2 210.4 210.6 ... 249.4 249.6 249.8 249.9 rA (YC, XC) float32 3.425e+08 3.425e+08 ... 3.382e+08 3.382e+08 Depth (YC, XC) float32 4.65e+03 4.491e+03 ... 3.44e+03 3.665e+03 dxF (YC, XC) float32 1.849e+04 1.849e+04 ... 1.825e+04 1.825e+04 dyF (YC, XC) float32 1.853e+04 1.853e+04 ... 1.853e+04 1.853e+04
[3]:
del ds
[ ]:
# spin up our Dask client
from dask.distributed import LocalCluster, Client
cluster = LocalCluster()
client = Client(cluster)
client
[4]:
ds = xr.open_dataset('TPOSE6_Daily_2012.nc').chunk({'time':1})
grid = xgcm.Grid(ds, periodic=['X','Y'])
grid
[4]:
<xgcm.Grid>
Y Axis (periodic, boundary=None):
* center YC --> outer
* outer YG --> center
Z Axis (not periodic, boundary=None):
* center Z
T Axis (not periodic, boundary=None):
* center time
X Axis (periodic, boundary=None):
* center XC --> outer
* outer XG --> center
[5]:
ds.THETA
[5]:
<xarray.DataArray 'THETA' (time: 366, Z: 22, YC: 84, XC: 240)> dask.array<xarray-THETA, shape=(366, 22, 84, 240), dtype=float32, chunksize=(1, 22, 84, 240), chunktype=numpy.ndarray> Coordinates: (12/14) iter (time) int64 dask.array<chunksize=(1,), meta=np.ndarray> * time (time) timedelta64[ns] 00:01:12 00:02:24 ... 01:10:48 01:12:00 * YC (YC) float64 -3.917 -3.75 -3.583 -3.417 ... 9.417 9.583 9.75 9.917 * Z (Z) float64 -1.0 -3.0 -5.0 -7.0 -9.0 ... -68.5 -76.0 -85.0 -95.0 drF (Z) float32 dask.array<chunksize=(22,), meta=np.ndarray> PHrefC (Z) float32 dask.array<chunksize=(22,), meta=np.ndarray> ... ... rA (YC, XC) float32 dask.array<chunksize=(84, 240), meta=np.ndarray> Depth (YC, XC) float32 dask.array<chunksize=(84, 240), meta=np.ndarray> hFacC (Z, YC, XC) float32 dask.array<chunksize=(22, 84, 240), meta=np.ndarray> maskC (Z, YC, XC) bool dask.array<chunksize=(22, 84, 240), meta=np.ndarray> dxF (YC, XC) float32 dask.array<chunksize=(84, 240), meta=np.ndarray> dyF (YC, XC) float32 dask.array<chunksize=(84, 240), meta=np.ndarray> Attributes: standard_name: THETA long_name: Potential Temperature units: degC
[ ]:
%%time
# calculate the divergence of the flow at all depths
u_transport = ds.UVEL * ds.dyG * ds.hFacW * ds.drF
v_transport = ds.VVEL * ds.dxG * ds.hFacS * ds.drF
div_uv = (grid.diff(u_transport, 'X') + grid.diff(v_transport, 'Y')) / ds.rA # calculate the divergence of the flow
div_uv.compute()
[ ]:
client.close()
See this page for a more detailed discussion of best practices with Xarray and Dask.
NOTE Take a look at all of the cool Xarray related projects! Most of them are for geoscience applications. There is also an extensive set of Xarray tutorials.