Quickstart tutorial#

So you’ve recently finished a new ACCESS model run and you want to create an Intake-ESM datastore for that run? This tutorial demonstrates how you can do that using Builders from the access-nri-intake Python package. You can download the Jupyter notebook rendered below from here and run it yourself in an NCI ARE instance.

Note

If you don’t know what a Builder is, see Datastore Builders.

import warnings
warnings.filterwarnings("ignore") # Suppress warnings for these docs

Building an Intake-ESM datastore#

In this tutorial, we’ll build an Intake-ESM datastore for an ACCESS-OM2 model run that is currently not included in the ACCESS-NRI catalog. The base output directory for this model run is:

/g/data/ik11/outputs/access-om2/1deg_iamip2_CMCC-ESM2ssp126

which comprises about 150 GB of netCDF files. Because this is an ACCESS-OM2 run, we’ll use the AccessOm2Builder.

import os

from access_nri_intake.source.builders import AccessOm2Builder

Building the Intake-ESM catalog should be as simple as passing the model run base output directory to the Builder and calling .build(). The build is parallelized, so it will be faster if you throw more resources at it. The following was run using an XX-Large normalbw ARE instance (28 CPUs). Note that warnings are thrown below because core metadata is missing from (and thus inferred for) some of the files in this model output.

%%time

builder = AccessOm2Builder(
    path="/g/data/ik11/outputs/access-om2/1deg_iamip2_CMCC-ESM2ssp126"
).build()
/g/data/hh5/public/apps/miniconda3/envs/analysis3-23.07/lib/python3.10/site-packages/access_nri_intake/source/utils.py:37: UserWarning: Time coordinate does not include bounds information. Guessing start and end times.
  warnings.warn(
CPU times: user 1.48 s, sys: 676 ms, total: 2.16 s
Wall time: 4.81 s

The previous cell builds the Intake-ESM datastore in memory. We’ll want to save it somewhere so we can reuse and share it. The following cell will create two new files (a .json and a .csv file) in your current working directory. These files are how Intake-ESM datastores are stored on disk.

builder.save(
    name="mydatastore", 
    description="An example datastore for ACCESS-OM2 1deg_iamip2_CMCC-ESM2ssp126",
)
Successfully wrote ESM catalog json file to: file:///home/599/ds0092/mydatastore.json
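If you’re curious what those two files contain, the sketch below shows the general shape of the pair. The field names follow the esm-collection-spec that Intake-ESM uses; the values here are purely illustrative, not the exact contents written by `builder.save`.

```python
import json

# Illustrative sketch of the on-disk pair: the .json file describes the
# collection and points at a companion .csv, which lists one row per file
# with its parsed attributes (path, file_id, frequency, variables, ...).
esmcol = {
    "esmcat_version": "0.1.0",
    "id": "mydatastore",
    "description": "An example datastore for ACCESS-OM2 1deg_iamip2_CMCC-ESM2ssp126",
    "catalog_file": "mydatastore.csv",  # the companion .csv catalog
    "attributes": [{"column_name": "file_id"}, {"column_name": "frequency"}],
    "assets": {"column_name": "path", "format": "netcdf"},
}
print(json.dumps(esmcol, indent=2))
```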

Note

All access-nri-intake Builders require that the base output path is provided. Some also have additional optional arguments. For example, AccessCm2Builder and AccessEsm15Builder can also receive an optional ensemble parameter that can be used to create datastores of ensemble outputs (see Datastore Builders).

Using your Intake-ESM datastore#

Now we can use our Intake-ESM datastore to query and load the model data. Only the basics are shown in this tutorial. You can read the Intake-ESM documentation here.

We can load the datastore directly using intake.

import intake

cat = intake.open_esm_datastore(
    "./mydatastore.json", 
    columns_with_iterables=["variable"]  # the "variable" column holds lists of variable names
)
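The `columns_with_iterables` argument tells Intake-ESM that each cell in the `variable` column is a list (a single file can hold several variables), so searches should match against the list’s elements rather than the whole cell. A rough stdlib sketch of that matching logic (this mimics the behaviour; it is not Intake-ESM’s code):

```python
# Each row's "variable" cell is a list; a search for "temp" should match
# rows whose list contains "temp", not rows whose cell merely equals it.
rows = [
    {"file_id": "ocean_3d_temp", "variable": ["temp"]},
    {"file_id": "ocean_scalar", "variable": ["temp_global_ave", "salt_global_ave"]},
]
matches = [r["file_id"] for r in rows if "temp" in r["variable"]]
print(matches)  # only the dataset whose variable list contains exactly "temp"
```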

We can see what datasets are available in our datastore by looking at the output of the keys() method. Here, a “dataset” is a set of contiguous files that we can load and combine together using xarray. It’s good to check that these make sense when creating new datastores.

cat.keys()
['iceh_XXX_daily.1day',
 'ocean_2d_area_t.fx',
 'ocean_2d_area_u.fx',
 'ocean_2d_drag_coeff.fx',
 'ocean_2d_dxt.fx',
 'ocean_2d_dxu.fx',
 'ocean_2d_dyt.fx',
 'ocean_2d_dyu.fx',
 'ocean_2d_geolat_c.fx',
 'ocean_2d_geolat_t.fx',
 'ocean_2d_geolon_c.fx',
 'ocean_2d_geolon_t.fx',
 'ocean_2d_ht.fx',
 'ocean_2d_hu.fx',
 'ocean_2d_kmt.fx',
 'ocean_2d_kmu.fx',
 'ocean_2d_mld_1_daily_mean_ym_XXXX_XX.1day',
 'ocean_2d_surface_salt_1_daily_mean_ym_XXXX_XX.1day',
 'ocean_2d_surface_temp_1_daily_mean_ym_XXXX_XX.1day',
 'ocean_3d_salt_1_yearly_mean_ym_XXXX_XX.1yr',
 'ocean_3d_temp_1_yearly_mean_ym_XXXX_XX.1yr',
 'ocean_scalar_1_monthly_ym_XXXX_XX.1mon',
 'oceanbgc_2d_npp1_1_daily_mean_y_XXXX.1day',
 'oceanbgc_2d_npp2d_1_daily_mean_y_XXXX.1day',
 'oceanbgc_2d_pprod_gross_2d_1_daily_mean_y_XXXX.1day',
 'oceanbgc_2d_radbio1_1_daily_mean_y_XXXX.1day',
 'oceanbgc_2d_stf09_1_daily_mean_y_XXXX.1day',
 'oceanbgc_2d_surface_adic_1_daily_mean_ym_XXXX_XX.1day',
 'oceanbgc_2d_surface_alk_1_daily_mean_ym_XXXX_XX.1day',
 'oceanbgc_2d_surface_det_1_daily_mean_ym_XXXX_XX.1day',
 'oceanbgc_2d_surface_fe_1_daily_mean_ym_XXXX_XX.1day',
 'oceanbgc_2d_surface_no3_1_daily_mean_ym_XXXX_XX.1day',
 'oceanbgc_2d_surface_phy_1_daily_mean_ym_XXXX_XX.1day',
 'oceanbgc_2d_surface_zoo_1_daily_mean_ym_XXXX_XX.1day',
 'oceanbgc_2d_wdet100_1_daily_mean_y_XXXX.1day',
 'oceanbgc_3d_adic_1_yearly_mean_y_XXXX.1yr',
 'oceanbgc_3d_alk_1_yearly_mean_y_XXXX.1yr',
 'oceanbgc_3d_caco3_1_yearly_mean_y_XXXX.1yr',
 'oceanbgc_3d_det_1_yearly_mean_y_XXXX.1yr',
 'oceanbgc_3d_dic_1_yearly_mean_y_XXXX.1yr',
 'oceanbgc_3d_fe_1_yearly_mean_y_XXXX.1yr',
 'oceanbgc_3d_no3_1_yearly_mean_y_XXXX.1yr',
 'oceanbgc_3d_npp3d_1_yearly_mean_y_XXXX.1yr',
 'oceanbgc_3d_o2_1_yearly_mean_y_XXXX.1yr',
 'oceanbgc_3d_phy_1_yearly_mean_y_XXXX.1yr',
 'oceanbgc_3d_pprod_gross_1_yearly_mean_y_XXXX.1yr',
 'oceanbgc_3d_zoo_1_yearly_mean_y_XXXX.1yr',
 'oceanbgc_scalar_1_monthly_ym_XXXX_XX.1mon']

All access-nri-intake Builders label datasets using a file identifier and a frequency: file_id.frequency. The file_id is parsed from the filenames, with the “X”s standing in for timestamps that were included in the filenames for some of the data. The dataset labels above look sensible for the ACCESS-OM2 data we’re working with here, which have a separate set of files for each output variable.
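Because every key follows the file_id.frequency convention, it’s straightforward to split and group the keys yourself when sanity-checking a new datastore. A small stdlib sketch (the keys below are a hypothetical subset of those listed above):

```python
# Group dataset keys by their frequency suffix to eyeball the datastore layout.
keys = [
    "ocean_2d_area_t.fx",
    "ocean_3d_temp_1_yearly_mean_ym_XXXX_XX.1yr",
    "iceh_XXX_daily.1day",
]
by_freq = {}
for key in keys:
    file_id, _, freq = key.rpartition(".")  # split on the last "."
    by_freq.setdefault(freq, []).append(file_id)
print(by_freq)
```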

Note

If your dataset labels don’t look right, please open an issue here.

It’s easy to search for datasets in the datastore containing a particular variable and load them as xarray Datasets. (Note: for analysing large datasets, you may want to first start a Dask cluster.)

ds = cat.search(variable="temp").to_dask()
ds["temp"].isel(time=-1, st_ocean=0).plot()
<matplotlib.collections.QuadMesh at 0x152cfff52860>
[Plot: surface temperature at the final time step]
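Once `to_dask()` returns an xarray Dataset, all of the usual xarray operations apply. The sketch below uses a tiny synthetic stand-in for the loaded dataset (the real one would be lazily backed by Dask) to show, for example, a time mean:

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the dataset returned by to_dask(): 3 time steps,
# 2 ocean depth levels, values 0..5 for easy checking.
ds = xr.Dataset(
    {"temp": (("time", "st_ocean"), np.arange(6.0).reshape(3, 2))},
    coords={"time": [0, 1, 2], "st_ocean": [0.5, 1.5]},
)
mean_temp = ds["temp"].mean("time")  # average over the time dimension
print(mean_temp.values)  # -> [2. 3.]
```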

Once you’ve created a datastore for your model run and you think it’s working as expected, please consider adding it to the ACCESS-NRI catalog so that others can easily find and use your great data - see Adding datastores to the catalog.