Gridded Data with Xarray

Xarray is a powerful tool for exploring multi-dimensional (especially geospatial) data in a way that is efficient and robust to coding mistakes. Where Pandas gave us a way to work with tabular data, Xarray takes this further and provides a framework for N-dimensional data. One of the coolest Xarray features is its integration with Dask, which will allow us to easily parallelize our Xarray analysis!

When using Xarray, our data are stored in DataArrays, which are collected together into a Dataset. A nice example of this is in the context of a climate model:

  1. Dataset - contains all possible coordinates on the model grid and provides a list of all model variables (DataArrays)

  2. DataArray - an individual model variable (e.g., sea surface temperature), the variable’s coordinates on the model grid, and any additional metadata about that specific variable

In the graphic below, you can see that temperature and precipitation are both variables with coordinates lat and lon. In this simple example temperature and precipitation each have 3 dimensions. Not only does the Dataset store all of this information, it also relates the two variables to each other by recognizing that they share the coordinates lat and lon. We will use some test data to inspect this further. Any NetCDF file can be read into an Xarray Dataset, and many popular models facilitate reading raw output directly into Xarray.

[Figure: schematic of an Xarray Dataset containing temperature and precipitation DataArrays that share lat and lon coordinates]
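As a minimal illustration of that structure, we can build a tiny Dataset by hand (all values and names here are made up for the sketch):

```python
import numpy as np
import xarray as xr

lat = [10.0, 20.0]
lon = [100.0, 110.0, 120.0]
time = np.arange(3)

# two variables that share the same time/lat/lon coordinates
temperature = xr.DataArray(
    20.0 + np.zeros((3, 2, 3)),
    dims=("time", "lat", "lon"),
    coords={"time": time, "lat": lat, "lon": lon},
    attrs={"units": "degC"},
)
precipitation = xr.DataArray(
    np.zeros((3, 2, 3)),
    dims=("time", "lat", "lon"),
    coords={"time": time, "lat": lat, "lon": lon},
    attrs={"units": "mm/day"},
)

ds_toy = xr.Dataset({"temperature": temperature, "precipitation": precipitation})
```

Printing `ds_toy` lists both DataArrays under "Data variables", with the shared coordinates appearing only once.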

[1]:
import matplotlib.pyplot as plt
import xarray as xr

# load a sample dataset from the xarray library
ds = xr.tutorial.load_dataset("air_temperature")

TIP If you realize you need Pandas for a particular problem, never fear! You can convert between Pandas and Xarray with a single line of code.

[2]:
df = ds.isel(time=slice(10)).to_dataframe() # to dataframe from xarray
df.to_xarray() # to xarray from dataframe
[2]:
<xarray.Dataset>
Dimensions:  (lat: 25, time: 10, lon: 53)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2013-01-03T06:00:00
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
Data variables:
    air      (lat, time, lon) float32 241.2 242.5 243.5 ... 296.9 296.8 297.1

Inspecting our Dataset

[3]:
ds
[3]:
<xarray.Dataset>
Dimensions:  (lat: 25, time: 2920, lon: 53)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lat, lon) float32 241.2 242.5 243.5 ... 296.5 296.2 295.7
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...

We can see that our variable air has 3 dimensions (lat, lon, time) defined by coordinates of the same name. We can inspect each of those coordinates to see their ranges.

[4]:
ds.lon
[4]:
<xarray.DataArray 'lon' (lon: 53)>
array([200. , 202.5, 205. , 207.5, 210. , 212.5, 215. , 217.5, 220. , 222.5,
       225. , 227.5, 230. , 232.5, 235. , 237.5, 240. , 242.5, 245. , 247.5,
       250. , 252.5, 255. , 257.5, 260. , 262.5, 265. , 267.5, 270. , 272.5,
       275. , 277.5, 280. , 282.5, 285. , 287.5, 290. , 292.5, 295. , 297.5,
       300. , 302.5, 305. , 307.5, 310. , 312.5, 315. , 317.5, 320. , 322.5,
       325. , 327.5, 330. ], dtype=float32)
Coordinates:
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
Attributes:
    standard_name:  longitude
    long_name:      Longitude
    units:          degrees_east
    axis:           X
[5]:
ds.time
[5]:
<xarray.DataArray 'time' (time: 2920)>
array(['2013-01-01T00:00:00.000000000', '2013-01-01T06:00:00.000000000',
       '2013-01-01T12:00:00.000000000', ..., '2014-12-31T06:00:00.000000000',
       '2014-12-31T12:00:00.000000000', '2014-12-31T18:00:00.000000000'],
      dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Attributes:
    standard_name:  time
    long_name:      Time

Attributes (attrs) provide additional metadata in the form of strings.

[6]:
ds.attrs
[6]:
{'Conventions': 'COARDS',
 'title': '4x daily NMC reanalysis (1948)',
 'description': 'Data is from NMC initialized reanalysis\n(4x/day).  These are the 0.9950 sigma level values.',
 'platform': 'Model',
 'references': 'http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.html'}
[7]:
ds.lon.attrs
[7]:
{'standard_name': 'longitude',
 'long_name': 'Longitude',
 'units': 'degrees_east',
 'axis': 'X'}

Selecting (Slicing) Subsets of Data with Labels

Xarray is great for doing reproducible science! Instead of having to remember or hard-code the order and size of your data dimensions, you can select subsets of data by dimension name and coordinate value. isel() selects by integer index, while sel() selects by coordinate label.
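To make the isel()/sel() distinction concrete before applying it to our dataset, here is a tiny synthetic check (made-up values):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.array([10.0, 20.0, 30.0]),
    dims="lon",
    coords={"lon": [200.0, 202.5, 205.0]},
)

by_index = da.isel(lon=1)     # positional: the second element
by_label = da.sel(lon=202.5)  # label-based: the coordinate value 202.5
# both select the same point (value 20.0)
```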

[8]:
# Select the first day
ds.air.isel(time=0)
[8]:
<xarray.DataArray 'air' (lat: 25, lon: 53)>
array([[241.2    , 242.5    , 243.5    , ..., 232.79999, 235.5    ,
        238.59999],
       [243.79999, 244.5    , 244.7    , ..., 232.79999, 235.29999,
        239.29999],
       [250.     , 249.79999, 248.89   , ..., 233.2    , 236.39   ,
        241.7    ],
       ...,
       [296.6    , 296.19998, 296.4    , ..., 295.4    , 295.1    ,
        294.69998],
       [295.9    , 296.19998, 296.79   , ..., 295.9    , 295.9    ,
        295.19998],
       [296.29   , 296.79   , 297.1    , ..., 296.9    , 296.79   ,
        296.6    ]], dtype=float32)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
    time     datetime64[ns] 2013-01-01
Attributes:
    long_name:     4xDaily Air temperature at sigma level 995
    units:         degK
    precision:     2
    GRIB_id:       11
    GRIB_name:     TMP
    var_desc:      Air temperature
    dataset:       NMC Reanalysis
    level_desc:    Surface
    statistic:     Individual Obs
    parent_stat:   Other
    actual_range:  [185.16 322.1 ]
[9]:
# Extract data at the longitude that is nearest 220.0E
ds.air.sel(lon=220.0,method='nearest')
[9]:
<xarray.DataArray 'air' (time: 2920, lat: 25)>
array([[242.5    , 241.09999, 242.2    , ..., 292.79   , 293.79   ,
        295.5    ],
       [244.29999, 242.2    , 242.09999, ..., 293.     , 294.19998,
        295.79   ],
       [246.79999, 242.39   , 243.7    , ..., 292.29   , 293.     ,
        295.     ],
       ...,
       [235.98999, 241.89   , 251.29   , ..., 296.29   , 297.69   ,
        298.29   ],
       [237.09   , 239.89   , 250.29   , ..., 296.49   , 297.88998,
        298.29   ],
       [238.89   , 238.59   , 246.59   , ..., 297.19   , 297.69   ,
        298.09   ]], dtype=float32)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
    lon      float32 220.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Attributes:
    long_name:     4xDaily Air temperature at sigma level 995
    units:         degK
    precision:     2
    GRIB_id:       11
    GRIB_name:     TMP
    var_desc:      Air temperature
    dataset:       NMC Reanalysis
    level_desc:    Surface
    statistic:     Individual Obs
    parent_stat:   Other
    actual_range:  [185.16 322.1 ]
[10]:
# Extract a sub-region of data by selecting slices of latitude and longitude
ds.air.sel(lat=slice(60, 20), lon=slice(200, 240)) # keep in mind that latitude is decreasing -- the slice is from 60 to 20
[10]:
<xarray.DataArray 'air' (time: 2920, lat: 17, lon: 17)>
array([[[273.69998, 273.6    , 273.79   , ..., 273.     , 275.5    ,
         276.     ],
        [274.79   , 275.19998, 275.6    , ..., 270.19998, 272.79   ,
         274.9    ],
        [275.9    , 276.9    , 276.9    , ..., 271.1    , 271.6    ,
         272.79   ],
        ...,
        [295.4    , 295.69998, 295.79   , ..., 290.19998, 290.     ,
         289.9    ],
        [297.     , 296.69998, 296.1    , ..., 290.79   , 290.9    ,
         290.69998],
        [296.6    , 296.19998, 296.4    , ..., 292.     , 292.1    ,
         291.79   ]],

       [[272.1    , 272.69998, 273.19998, ..., 270.19998, 272.79   ,
         273.6    ],
        [274.     , 274.4    , 275.1    , ..., 267.     , 270.29   ,
         272.5    ],
        [275.6    , 276.1    , 276.29   , ..., 267.79   , 269.19998,
         270.6    ],
...
        [290.88998, 291.49   , 293.19   , ..., 291.69   , 291.38998,
         290.79   ],
        [291.59   , 291.69   , 293.59   , ..., 292.79   , 292.59   ,
         292.59   ],
        [293.69   , 293.88998, 295.38998, ..., 295.09   , 294.59   ,
         295.09   ]],

       [[272.59   , 271.99   , 272.19   , ..., 274.19   , 275.38998,
         273.88998],
        [274.29   , 274.49   , 275.59   , ..., 269.38998, 272.88998,
         274.69   ],
        [276.79   , 277.49   , 277.99   , ..., 264.59   , 266.88998,
         269.69   ],
        ...,
        [291.49   , 291.38998, 292.38998, ..., 291.59   , 291.19   ,
         290.99   ],
        [292.88998, 292.09   , 292.99   , ..., 293.49   , 292.88998,
         292.88998],
        [293.79   , 293.69   , 295.09   , ..., 295.38998, 294.79   ,
         294.79   ]]], dtype=float32)
Coordinates:
  * lat      (lat) float32 60.0 57.5 55.0 52.5 50.0 ... 30.0 27.5 25.0 22.5 20.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 232.5 235.0 237.5 240.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Attributes:
    long_name:     4xDaily Air temperature at sigma level 995
    units:         degK
    precision:     2
    GRIB_id:       11
    GRIB_name:     TMP
    var_desc:      Air temperature
    dataset:       NMC Reanalysis
    level_desc:    Surface
    statistic:     Individual Obs
    parent_stat:   Other
    actual_range:  [185.16 322.1 ]
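The comment about decreasing latitude matters: slice() follows the stored order of the coordinate, so with a descending coordinate the slice bounds must also descend. A quick synthetic check (made-up values):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(5.0),
    dims="lat",
    coords={"lat": [50.0, 40.0, 30.0, 20.0, 10.0]},  # stored in decreasing order
)

sub = da.sel(lat=slice(40, 20))  # bounds given in the stored (descending) order
print(sub.lat.values)            # [40. 30. 20.]

empty = da.sel(lat=slice(20, 40))  # ascending bounds select nothing here
```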

Interpolating (or Resampling) Data

[11]:
# interpolate the air temperature to a new longitude (or a new set of longitudes)
ds.air.interp(lon=220.8)                  # a single new longitude
ds.air.interp(lon=[210.1, 215.4, 220.8])  # a set of new longitudes

Plotting in Xarray (so easy!)

[12]:
ds.air.isel(time=0).plot()
[12]:
<matplotlib.collections.QuadMesh at 0x177c91dc0>
[plot: map of air temperature (lat vs. lon) at the first time step]
[13]:
import cmocean.cm as cmo

# Let's use a non-default colormap
ds.air.isel(time=0).plot(cmap=cmo.thermal)
[13]:
<matplotlib.collections.QuadMesh at 0x177c52870>
[plot: the same air temperature map using the cmocean 'thermal' colormap]
[14]:
ds.air.sel(lon=220.0,method='nearest').plot(y='lat',x='time',cmap=cmo.thermal)
[14]:
<matplotlib.collections.QuadMesh at 0x318691dc0>
[plot: air temperature at lon=220 as a function of time (x) and latitude (y)]

Statistics in Xarray

[15]:
ds.air.sel(lon=220.0,method='nearest').mean(dim='time').plot()
[15]:
[<matplotlib.lines.Line2D at 0x319317230>]
[plot: time-mean air temperature at lon=220 vs. latitude (line plot)]
[16]:
del ds

A More Exciting Xarray Example

Let’s look at a more interesting and realistic example:

[17]:
import xarray as xr
ds = xr.open_dataset('TPOSE6_Daily_2012_surface.nc')

What about something harder? Gradients? Integrals? … Enter xgcm.

Calculate Divergence with xgcm

xgcm is a Python package built around Xarray. With it you create a Grid object that can perform more complicated mathematical operations while remaining aware of complex grid geometries. The Grid knows about the different coordinates and their relationships to each other in space. This is especially useful for model output or observations where not all variables live on the same points in space.

[18]:
import xgcm

# create the grid object from our dataset
grid = xgcm.Grid(ds, periodic=['X','Y'])
grid
[18]:
<xgcm.Grid>
Y Axis (periodic, boundary=None):
  * center   YC --> outer
  * outer    YG --> center
X Axis (periodic, boundary=None):
  * center   XC --> outer
  * outer    XG --> center
Z Axis (not periodic, boundary=None):
  * center   Z
T Axis (not periodic, boundary=None):
  * center   time
[19]:
# inspect some of our variables
ds.UVEL # zonal velocity
[19]:
<xarray.DataArray 'UVEL' (time: 366, Z: 1, YC: 84, XG: 241)>
[7409304 values with dtype=float32]
Coordinates: (12/13)
    iter     (time) int64 ...
  * time     (time) timedelta64[ns] 00:01:12 00:02:24 ... 01:10:48 01:12:00
  * YC       (YC) float64 -3.917 -3.75 -3.583 -3.417 ... 9.417 9.583 9.75 9.917
  * XG       (XG) float64 210.0 210.2 210.3 210.5 ... 249.5 249.7 249.8 250.0
  * Z        (Z) float64 -1.0
    dyG      (YC, XG) float32 ...
    ...       ...
    rAw      (YC, XG) float32 ...
    drF      (Z) float32 ...
    PHrefC   (Z) float32 ...
    hFacW    (Z, YC, XG) float32 ...
    maskW    (Z, YC, XG) bool ...
    rhoRef   (Z) float32 ...
Attributes:
    standard_name:  UVEL
    long_name:      Zonal Component of Velocity (m/s)
    units:          m/s
    mate:           VVEL
[20]:
ds.VVEL # meridional velocity
[20]:
<xarray.DataArray 'VVEL' (time: 366, Z: 1, YG: 85, XC: 240)>
[7466400 values with dtype=float32]
Coordinates: (12/13)
    iter     (time) int64 ...
  * time     (time) timedelta64[ns] 00:01:12 00:02:24 ... 01:10:48 01:12:00
  * Z        (Z) float64 -1.0
    drF      (Z) float32 ...
    PHrefC   (Z) float32 ...
    rhoRef   (Z) float32 ...
    ...       ...
  * YG       (YG) float64 -4.0 -3.833 -3.667 -3.5 ... 9.5 9.667 9.833 10.0
    dxG      (YG, XC) float32 ...
    dyC      (YG, XC) float32 ...
    rAs      (YG, XC) float32 ...
    hFacS    (Z, YG, XC) float32 ...
    maskS    (Z, YG, XC) bool ...
Attributes:
    standard_name:  VVEL
    long_name:      Meridional Component of Velocity (m/s)
    units:          m/s
    mate:           UVEL

We can see that zonal and meridional velocity are computed on different grid points: (XG, YC) vs. (XC, YG). This is not uncommon, but it means we can’t simply add/subtract/multiply the velocities, because they are not located at the same points in space. How would we handle this without xgcm?

[21]:
# interpolate zonal velocity to XC (one manual approach: average adjacent XG points;
# note that we drop the coordinates entirely and ignore the boundary)
u_on_xc = 0.5 * (ds.UVEL.values[..., :-1] + ds.UVEL.values[..., 1:])

We would have to do this for meridional velocity too.

[22]:
# interpolate meridional velocity to YC (the same manual approach, along the Y axis)
v_on_yc = 0.5 * (ds.VVEL.values[..., :-1, :] + ds.VVEL.values[..., 1:, :])

We would also have to decide how to handle the boundaries and implement that, otherwise these new variables will have different sizes. And if we switch to a model or data format where the coordinate names are slightly different, we will have to rewrite all of our code. Instead, we can use our Grid object. If the naming or structure of the grid changes in the future, we only need to update the Grid to reflect the correct relationships.

[23]:
# interpolate U along the X axis: this will interpolate from XG to the other X coordinate, XC
U = grid.interp(ds.UVEL, 'X')
# interpolate V along the Y axis: this will interpolate from YG to the other Y coordinate, YC
V = grid.interp(ds.VVEL, 'Y')

display(U.dims)
display(V.dims)
('time', 'Z', 'YC', 'XC')
('time', 'Z', 'YC', 'XC')

Even better, the Grid object will handle the interpolation for us automatically, further reducing the complexity of the code and the opportunities for mistakes.

[24]:
# Let's say we don't do the interpolations; we compute the quantities of interest directly on their original coordinates.
u_transport = ds.UVEL * ds.dyG * ds.hFacW * ds.drF
v_transport = ds.VVEL * ds.dxG * ds.hFacS * ds.drF
display(u_transport.dims)
display(v_transport.dims)

# The Grid object will automatically handle the coordinate transformations for us!
div_uv = (grid.diff(u_transport, 'X') + grid.diff(v_transport, 'Y')) / ds.rA  # calculate the divergence of the flow
display(div_uv.dims)
('time', 'Z', 'YC', 'XG')
('time', 'Z', 'YG', 'XC')
('time', 'Z', 'YC', 'XC')
[25]:
import cmocean.cm as cmo

# plot divergence at the surface
div_uv.isel(time=0,Z=0).plot(cmap=cmo.balance, robust=True,
                             cbar_kwargs={'label': 'Divergence (1/s)'})
[25]:
<matplotlib.collections.QuadMesh at 0x32c25f200>
[plot: map of surface flow divergence using the cmocean 'balance' colormap]

The cell above shows that we can take the divergence of the flow, while accounting for the grid geometry, in only 3 lines of code! This framework is also robust to mistakes arising from coding errors or from a misunderstanding of the grid itself. For example:

[26]:
# multiply ds.VVEL by ds.dxC instead of ds.dxG and see what happens
v_transport = ds.VVEL * ds.dxC * ds.hFacS * ds.drF
print(v_transport)

Xarray will not blindly combine variables of the same size when it knows they are not on the same points in space. The result here is a nonsensical object with two X dimensions, and any attempt to analyze or plot it will fail, forcing you to notice your error. That said, nothing is fool-proof: Xarray and xgcm cannot prevent mistakes entirely. Instead, the aim is to spend more time on errors like “our question is not formulated properly” rather than “the third dimension of my data isn’t the size I expected it to be; which of my indices was wrong?”.
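That behavior is easy to demonstrate without the model output: multiplying two arrays whose dimensions have different names makes Xarray broadcast them into a 2-D result rather than combining them pointwise (synthetic values):

```python
import numpy as np
import xarray as xr

on_xg = xr.DataArray(np.ones(3), dims="XG", coords={"XG": [0.0, 1.0, 2.0]})
on_xc = xr.DataArray(np.ones(3), dims="XC", coords={"XC": [0.5, 1.5, 2.5]})

product = on_xg * on_xc
print(product.dims)  # ('XG', 'XC') -- both X dimensions survive, a red flag
```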

EOF decomposition (a.k.a. PCA) with xeofs
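As a stand-in sketch, the core idea behind an EOF (PCA) decomposition can be illustrated with plain NumPy on a synthetic (time, space) field; xeofs wraps this kind of computation in an Xarray-aware, labeled interface:

```python
import numpy as np
import xarray as xr

# synthetic (time, space) field -- stand-in for a real anomaly dataset
rng = np.random.default_rng(0)
da = xr.DataArray(rng.standard_normal((100, 6)), dims=("time", "space"))

anom = da - da.mean("time")                  # remove the time mean
u, s, vt = np.linalg.svd(anom.values, full_matrices=False)

explained = s**2 / np.sum(s**2)              # fraction of variance per mode
eof1 = vt[0]                                 # leading spatial pattern (EOF 1)
pc1 = u[:, 0] * s[0]                         # leading principal component (PC 1)
```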

Xarray automatically sped up with Dask

Let’s load a larger dataset (~2.5 GB). Here is where Xarray really shines! The data can easily be chunked into Dask arrays. If we spin up a Dask client with the 3 lines of code we learned earlier, our Xarray analysis will be parallelized over the chunks. We can see the size of the chunks by inspecting the Dataset.
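The chunking itself is a one-liner. A small synthetic sketch (requires dask to be installed; the array sizes here are made up):

```python
import numpy as np
import xarray as xr

# build a small in-memory dataset and split it into one dask chunk per 2 time steps
ds_small = xr.Dataset(
    {"air": (("time", "lat", "lon"), np.zeros((8, 4, 5)))}
).chunk({"time": 2})

# each chunk spans 2 time steps and the full lat/lon extent
print(ds_small.air.chunks)
```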

[1]:
import xarray as xr
import xgcm

ds = xr.open_dataset('TPOSE6_Daily_2012.nc')
grid = xgcm.Grid(ds, periodic=['X','Y'])
grid
[1]:
<xgcm.Grid>
Y Axis (periodic, boundary=None):
  * center   YC --> outer
  * outer    YG --> center
Z Axis (not periodic, boundary=None):
  * center   Z
T Axis (not periodic, boundary=None):
  * center   time
X Axis (periodic, boundary=None):
  * center   XC --> outer
  * outer    XG --> center
[2]:
%%time

# calculate the divergence of the flow at all depths
u_transport = ds.UVEL * ds.dyG * ds.hFacW * ds.drF
v_transport = ds.VVEL * ds.dxG * ds.hFacS * ds.drF

div_uv = (grid.diff(u_transport, 'X') + grid.diff(v_transport, 'Y')) / ds.rA  # calculate the divergence of the flow
div_uv.compute()
CPU times: user 321 ms, sys: 585 ms, total: 907 ms
Wall time: 926 ms
[2]:
<xarray.DataArray (time: 366, Z: 22, YC: 84, XC: 240)>
array([[[[ 1.37244683e-06,  1.43047191e-06,  1.46951675e-06, ...,
           6.62662956e-07,  1.10988776e-06,  1.41652595e-06],
         [ 5.94653841e-07,  6.68495829e-07,  7.41130521e-07, ...,
           1.22283927e-06,  1.87492367e-06,  2.07832477e-06],
         [-3.75415624e-07, -3.15581644e-07, -2.50060850e-07, ...,
           1.70110923e-06,  1.76351773e-06,  1.99446072e-06],
         ...,
         [ 8.62729678e-07,  7.14634893e-07,  3.88255700e-07, ...,
          -3.11771328e-06, -2.89566060e-06, -2.37975678e-06],
         [ 8.57911502e-08, -4.61088696e-08, -3.47979068e-07, ...,
          -3.19846913e-06, -2.62534672e-06, -2.25218719e-06],
         [-7.58210774e-07, -6.46966441e-07, -7.30312934e-07, ...,
          -2.86520299e-06, -2.44957505e-06, -2.19216781e-06]],

        [[ 1.34057188e-06,  1.40016880e-06,  1.44013302e-06, ...,
           7.09427923e-07,  1.17261857e-06,  1.48308425e-06],
         [ 5.64541381e-07,  6.34911260e-07,  7.06026526e-07, ...,
           1.25195004e-06,  1.92384437e-06,  2.14392071e-06],
         [-4.11960059e-07, -3.54990362e-07, -2.90499941e-07, ...,
           1.67858138e-06,  1.79416759e-06,  2.04223534e-06],
...
          -8.71693283e-06, -5.95823076e-06, -6.15254976e-06],
         [-6.71484213e-06, -4.20082051e-06, -2.26667885e-06, ...,
           3.52229357e-09,  1.22546305e-06, -1.09015275e-06],
         [-2.41673092e-06, -2.31488502e-06, -1.91019740e-06, ...,
           9.23616972e-06,  7.55049496e-06,  6.20488800e-06]],

        [[-1.69971406e-06, -1.02415493e-06, -4.37987836e-07, ...,
           7.77255536e-07,  9.61086997e-08, -7.09721562e-08],
         [-5.18033232e-07, -3.21178746e-08,  5.64866127e-07, ...,
           2.97049564e-06,  3.35908271e-06,  3.85124167e-06],
         [ 2.11237398e-06,  2.65563722e-06,  3.24382063e-06, ...,
           9.82252459e-07,  1.40929524e-06,  8.60606008e-07],
         ...,
         [-4.70574696e-06, -3.26171971e-06, -2.07992093e-06, ...,
          -8.10987694e-06, -5.83119663e-06, -5.26532494e-06],
         [-5.91146409e-06, -3.32312925e-06, -2.49081700e-06, ...,
          -2.19475260e-06, -7.33080412e-07, -2.52351583e-06],
         [-2.57763872e-06, -3.84479017e-06, -2.77097206e-06, ...,
           6.42505483e-06,  4.72095462e-06,  3.74518891e-06]]]],
      dtype=float32)
Coordinates:
  * time     (time) timedelta64[ns] 00:01:12 00:02:24 ... 01:10:48 01:12:00
  * YC       (YC) float64 -3.917 -3.75 -3.583 -3.417 ... 9.417 9.583 9.75 9.917
  * Z        (Z) float64 -1.0 -3.0 -5.0 -7.0 -9.0 ... -68.5 -76.0 -85.0 -95.0
  * XC       (XC) float64 210.1 210.2 210.4 210.6 ... 249.4 249.6 249.8 249.9
    rA       (YC, XC) float32 3.425e+08 3.425e+08 ... 3.382e+08 3.382e+08
    Depth    (YC, XC) float32 4.65e+03 4.491e+03 ... 3.44e+03 3.665e+03
    dxF      (YC, XC) float32 1.849e+04 1.849e+04 ... 1.825e+04 1.825e+04
    dyF      (YC, XC) float32 1.853e+04 1.853e+04 ... 1.853e+04 1.853e+04
[3]:
del ds
[ ]:
# spin up our Dask client
from dask.distributed import LocalCluster, Client
cluster = LocalCluster()
client = Client(cluster)
client
[4]:
ds = xr.open_dataset('TPOSE6_Daily_2012.nc').chunk({'time':1})
grid = xgcm.Grid(ds, periodic=['X','Y'])
grid
[4]:
<xgcm.Grid>
Y Axis (periodic, boundary=None):
  * center   YC --> outer
  * outer    YG --> center
Z Axis (not periodic, boundary=None):
  * center   Z
T Axis (not periodic, boundary=None):
  * center   time
X Axis (periodic, boundary=None):
  * center   XC --> outer
  * outer    XG --> center
[5]:
ds.THETA
[5]:
<xarray.DataArray 'THETA' (time: 366, Z: 22, YC: 84, XC: 240)>
dask.array<xarray-THETA, shape=(366, 22, 84, 240), dtype=float32, chunksize=(1, 22, 84, 240), chunktype=numpy.ndarray>
Coordinates: (12/14)
    iter     (time) int64 dask.array<chunksize=(1,), meta=np.ndarray>
  * time     (time) timedelta64[ns] 00:01:12 00:02:24 ... 01:10:48 01:12:00
  * YC       (YC) float64 -3.917 -3.75 -3.583 -3.417 ... 9.417 9.583 9.75 9.917
  * Z        (Z) float64 -1.0 -3.0 -5.0 -7.0 -9.0 ... -68.5 -76.0 -85.0 -95.0
    drF      (Z) float32 dask.array<chunksize=(22,), meta=np.ndarray>
    PHrefC   (Z) float32 dask.array<chunksize=(22,), meta=np.ndarray>
    ...       ...
    rA       (YC, XC) float32 dask.array<chunksize=(84, 240), meta=np.ndarray>
    Depth    (YC, XC) float32 dask.array<chunksize=(84, 240), meta=np.ndarray>
    hFacC    (Z, YC, XC) float32 dask.array<chunksize=(22, 84, 240), meta=np.ndarray>
    maskC    (Z, YC, XC) bool dask.array<chunksize=(22, 84, 240), meta=np.ndarray>
    dxF      (YC, XC) float32 dask.array<chunksize=(84, 240), meta=np.ndarray>
    dyF      (YC, XC) float32 dask.array<chunksize=(84, 240), meta=np.ndarray>
Attributes:
    standard_name:  THETA
    long_name:      Potential Temperature
    units:          degC
[ ]:
%%time

# calculate the divergence of the flow at all depths
u_transport = ds.UVEL * ds.dyG * ds.hFacW * ds.drF
v_transport = ds.VVEL * ds.dxG * ds.hFacS * ds.drF

div_uv = (grid.diff(u_transport, 'X') + grid.diff(v_transport, 'Y')) / ds.rA  # calculate the divergence of the flow
div_uv.compute()
[ ]:
client.close()

See the Xarray documentation for a more detailed discussion of best practices when using Xarray with Dask.

NOTE Take a look at all of the cool Xarray-related projects! Most of them are for geoscience applications. There is also an extensive set of Xarray tutorials.