Datastore Builders

The access-nri-intake Python package, which provides the ACCESS-NRI catalog, also includes a set of Intake-ESM datastore Builders for different ACCESS model outputs. In general, building an Intake-ESM datastore for your ACCESS model output should be as simple as passing the base directory of your output to the appropriate Builder.

The access-nri-intake package is installed in the xp65 analysis environment; alternatively, you can install it into your own environment (see Installing the catalog for details). The Builders can be imported from the access_nri_intake.source.builders submodule.

There are currently eight Builders available. Their core public APIs are given below; their full APIs can be found in API for access_nri_intake.source.
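Picking the appropriate Builder amounts to matching your model output to one of the classes documented on this page. The sketch below wraps that choice in a hypothetical `BUILDERS` mapping and `build_datastore` helper (neither is part of access-nri-intake); only the module path, the class names and the `build()`/`save()` calls come from the documentation here. The import is deferred with `importlib` so the helper can be defined even where access-nri-intake is not installed.

```python
import importlib

# Model output -> Builder class name, taken from the classes documented on this page.
# The mapping itself is a hypothetical convenience, not part of access-nri-intake.
BUILDERS = {
    "ACCESS-OM2": "AccessOm2Builder",
    "ACCESS-ESM1.5": "AccessEsm15Builder",
    "ACCESS-ESM1.6": "AccessEsm16Builder",
    "ACCESS-CM2": "AccessCm2Builder",
    "ACCESS-OM3": "AccessOm3Builder",
    "MOM6": "Mom6Builder",
    "ROMSIceShelf": "ROMSBuilder",
    "WOA": "WoaBuilder",
}

BUILDERS_MODULE = "access_nri_intake.source.builders"


def get_builder(model):
    """Return the Builder class for a model name, importing access-nri-intake lazily."""
    try:
        class_name = BUILDERS[model]
    except KeyError:
        raise ValueError(
            f"No Builder listed for {model!r}; choose from {sorted(BUILDERS)}"
        ) from None
    module = importlib.import_module(BUILDERS_MODULE)
    return getattr(module, class_name)


def build_datastore(model, path, name, description, directory=None, **init_kwargs):
    """Crawl `path`, build the datastore and save it under `directory`.

    Extra keyword arguments (e.g. `ensemble` for the ESM1.5, ESM1.6 and CM2
    Builders) are passed straight to the Builder's constructor.
    """
    builder_cls = get_builder(model)
    builder = builder_cls(path, **init_kwargs)
    builder.build()  # parse and validate all assets found under `path`
    builder.save(name=name, description=description, directory=directory)
    return builder
```

For example, `build_datastore("ACCESS-ESM1.5", ["/path/to/run1", "/path/to/run2"], name="my_ensemble", description="Two-member test ensemble", ensemble=True)` would forward `ensemble=True` to AccessEsm15Builder; the paths and names are placeholders.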

Note

These Builders are used by ACCESS-NRI to create the ACCESS-NRI catalog.

ACCESS-OM2 output: AccessOm2Builder

class access_nri_intake.source.builders.AccessOm2Builder(paths, storage_options=None, depth=0, exclude_patterns=None, include_patterns=None, joblib_parallel_kwargs=None)

Intake-ESM datastore builder for ACCESS-OM2 COSIMA datasets

Attributes:
columns_with_iterables

Return a set of the columns that have iterables

exclude_patterns
include_patterns
joblib_parallel_kwargs
storage_options
valid_assets

Return the list of valid assets that have been parsed and validated

Methods

build()

Builds a datastore from a list of netCDF files or zarr stores.

clean_dataframe()

Clean the dataframe by excluding invalid assets and removing duplicate entries.

parse()

Parse metadata from assets.

parse_filename_freq(filename[, frequencies])

Parse an ACCESS model filename and return a file id and any time information

parse_ncfile(file[, time_dim])

Get Intake-ESM datastore entry info from a netCDF file

parser(file)

Parse info from a file asset

save(name, description[, directory, use_parquet])

Save datastore contents to a file.

validate_parser()

Run the parser on a single file and check the schema of the info being parsed

get_assets

__init__(path, **kwargs)

Initialise an AccessOm2Builder

Parameters:
path: str or list of str

Path or list of paths to crawl for assets/files.

build()

Builds a datastore from a list of netCDF files or zarr stores.

save(name, description, directory=None, use_parquet=False)

Save datastore contents to a file.

Parameters:
name: str

The name of the file to save the datastore to.

description: str

Detailed multi-line description of the collection.

directory: str, optional

The directory to save the datastore to. If None, use the current directory.

use_parquet: bool, optional

Whether to save the datastore as a parquet file. Defaults to False, which saves as a CSV file. Parquet is both faster and saves space, but unlike CSV is not human-readable.
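After saving, the datastore can be reopened with intake-esm. A minimal sketch, assuming `save()` writes its JSON description to `<directory>/<name>.json` (falling back to the current directory when `directory` is None); `datastore_json_path` and `save_and_open` are hypothetical helpers. Passing the Builder's `columns_with_iterables` tells intake-esm which columns hold iterables, so they are read back as such rather than as plain strings.

```python
import os


def datastore_json_path(name, directory=None):
    """Path of the JSON file that save() is assumed to write (hypothetical helper)."""
    # Assumption: save() writes <directory>/<name>.json, defaulting to the current directory.
    return os.path.join(directory if directory is not None else os.getcwd(), f"{name}.json")


def save_and_open(builder, name, description, directory=None, use_parquet=True):
    """Save a built datastore and reopen it with intake-esm."""
    import intake  # deferred so this module can be imported without intake-esm installed

    builder.save(
        name=name, description=description, directory=directory, use_parquet=use_parquet
    )
    return intake.open_esm_datastore(
        datastore_json_path(name, directory),
        # Columns listed here (e.g. the per-file variable list) are parsed back into iterables.
        columns_with_iterables=list(builder.columns_with_iterables),
    )
```

Parquet is the default in this sketch only because it is faster and smaller; pass `use_parquet=False` if you want a human-readable CSV file instead.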

ACCESS-ESM1.5 output: AccessEsm15Builder

class access_nri_intake.source.builders.AccessEsm15Builder(paths, storage_options=None, depth=0, exclude_patterns=None, include_patterns=None, joblib_parallel_kwargs=None)

Intake-ESM datastore builder for ACCESS-ESM1.5 datasets

Attributes:
columns_with_iterables

Return a set of the columns that have iterables

exclude_patterns
include_patterns
joblib_parallel_kwargs
storage_options
valid_assets

Return the list of valid assets that have been parsed and validated

Methods

build()

Builds a datastore from a list of netCDF files or zarr stores.

clean_dataframe()

Clean the dataframe by excluding invalid assets and removing duplicate entries.

parse()

Parse metadata from assets.

parse_filename_freq(filename[, frequencies])

Parse an ACCESS model filename and return a file id and any time information

parse_ncfile(file[, time_dim])

Get Intake-ESM datastore entry info from a netCDF file

parser(file)

Parse info from a file asset

save(name, description[, directory, use_parquet])

Save datastore contents to a file.

validate_parser()

Run the parser on a single file and check the schema of the info being parsed

get_assets

__init__(path, ensemble, **kwargs)

Initialise an AccessEsm15Builder

Parameters:
path: str or list of str

Path or list of paths to crawl for assets/files.

ensemble: boolean

Whether to treat each path as a separate member of an ensemble to join along a new member dimension
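When `ensemble` is True, every path passed to the Builder becomes one ensemble member. A minimal sketch, assuming each member run sits in its own subdirectory of a common base directory (this layout, `find_member_paths` and `build_esm15_ensemble` are hypothetical); the same pattern should carry over to the other Builders that take an `ensemble` argument.

```python
from pathlib import Path


def find_member_paths(base_dir, pattern="*"):
    """Return sorted member run directories under base_dir that match pattern.

    Hypothetical helper: assumes one subdirectory per ensemble member;
    plain files are ignored.
    """
    return sorted(str(p) for p in Path(base_dir).glob(pattern) if p.is_dir())


def build_esm15_ensemble(base_dir, pattern="*"):
    """Build an AccessEsm15Builder datastore with one member per matching directory."""
    # Deferred import so find_member_paths works without access-nri-intake installed.
    from access_nri_intake.source.builders import AccessEsm15Builder

    paths = find_member_paths(base_dir, pattern)
    if not paths:
        raise FileNotFoundError(f"No member directories matching {pattern!r} in {base_dir}")
    builder = AccessEsm15Builder(paths, ensemble=True)
    builder.build()
    return builder
```

Sorting the paths keeps the member order reproducible between runs, since directory listing order is not guaranteed.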

build()

Builds a datastore from a list of netCDF files or zarr stores.

save(name, description, directory=None, use_parquet=False)

Save datastore contents to a file.

Parameters:
name: str

The name of the file to save the datastore to.

description: str

Detailed multi-line description of the collection.

directory: str, optional

The directory to save the datastore to. If None, use the current directory.

use_parquet: bool, optional

Whether to save the datastore as a parquet file. Defaults to False, which saves as a CSV file. Parquet is both faster and saves space, but unlike CSV is not human-readable.

ACCESS-ESM1.6 output: AccessEsm16Builder

class access_nri_intake.source.builders.AccessEsm16Builder(paths, storage_options=None, depth=0, exclude_patterns=None, include_patterns=None, joblib_parallel_kwargs=None)

Intake-ESM datastore builder for ACCESS-ESM1.6 datasets

Attributes:
columns_with_iterables

Return a set of the columns that have iterables

exclude_patterns
include_patterns
joblib_parallel_kwargs
storage_options
valid_assets

Return the list of valid assets that have been parsed and validated

Methods

build()

Builds a datastore from a list of netCDF files or zarr stores.

clean_dataframe()

Clean the dataframe by excluding invalid assets and removing duplicate entries.

parse()

Parse metadata from assets.

parse_filename_freq(filename[, frequencies])

Parse an ACCESS model filename and return a file id and any time information

parse_ncfile(file[, time_dim])

Get Intake-ESM datastore entry info from a netCDF file

parser(file)

Get the realm and member/experiment id from the file name

save(name, description[, directory, use_parquet])

Save datastore contents to a file.

validate_parser()

Run the parser on a single file and check the schema of the info being parsed

get_assets

__init__(path, ensemble, **kwargs)

Initialise an AccessEsm16Builder

Parameters:
path: str or list of str

Path or list of paths to crawl for assets/files.

ensemble: boolean

Whether to treat each path as a separate member of an ensemble to join along a new member dimension

build()

Builds a datastore from a list of netCDF files or zarr stores.

save(name, description, directory=None, use_parquet=False)

Save datastore contents to a file.

Parameters:
name: str

The name of the file to save the datastore to.

description: str

Detailed multi-line description of the collection.

directory: str, optional

The directory to save the datastore to. If None, use the current directory.

use_parquet: bool, optional

Whether to save the datastore as a parquet file. Defaults to False, which saves as a CSV file. Parquet is both faster and saves space, but unlike CSV is not human-readable.

ACCESS-CM2 output: AccessCm2Builder

class access_nri_intake.source.builders.AccessCm2Builder(paths, storage_options=None, depth=0, exclude_patterns=None, include_patterns=None, joblib_parallel_kwargs=None)

Intake-ESM datastore builder for ACCESS-CM2 datasets

Attributes:
columns_with_iterables

Return a set of the columns that have iterables

exclude_patterns
include_patterns
joblib_parallel_kwargs
storage_options
valid_assets

Return the list of valid assets that have been parsed and validated

Methods

build()

Builds a datastore from a list of netCDF files or zarr stores.

clean_dataframe()

Clean the dataframe by excluding invalid assets and removing duplicate entries.

parse()

Parse metadata from assets.

parse_filename_freq(filename[, frequencies])

Parse an ACCESS model filename and return a file id and any time information

parse_ncfile(file[, time_dim])

Get Intake-ESM datastore entry info from a netCDF file

parser(file)

Parse info from a file asset

save(name, description[, directory, use_parquet])

Save datastore contents to a file.

validate_parser()

Run the parser on a single file and check the schema of the info being parsed

get_assets

__init__(path, ensemble, **kwargs)

Initialise an AccessCm2Builder

Parameters:
path: str or list of str

Path or list of paths to crawl for assets/files.

ensemble: boolean

Whether to treat each path as a separate member of an ensemble to join along a new member dimension

build()

Builds a datastore from a list of netCDF files or zarr stores.

save(name, description, directory=None, use_parquet=False)

Save datastore contents to a file.

Parameters:
name: str

The name of the file to save the datastore to.

description: str

Detailed multi-line description of the collection.

directory: str, optional

The directory to save the datastore to. If None, use the current directory.

use_parquet: bool, optional

Whether to save the datastore as a parquet file. Defaults to False, which saves as a CSV file. Parquet is both faster and saves space, but unlike CSV is not human-readable.

ACCESS-OM3 output: AccessOm3Builder

class access_nri_intake.source.builders.AccessOm3Builder(paths, storage_options=None, depth=0, exclude_patterns=None, include_patterns=None, joblib_parallel_kwargs=None)

Intake-ESM datastore builder for ACCESS-OM3 COSIMA datasets

Attributes:
columns_with_iterables

Return a set of the columns that have iterables

exclude_patterns
include_patterns
joblib_parallel_kwargs
storage_options
valid_assets

Return the list of valid assets that have been parsed and validated

Methods

build()

Builds a datastore from a list of netCDF files or zarr stores.

clean_dataframe()

Clean the dataframe by excluding invalid assets and removing duplicate entries.

parse()

Parse metadata from assets.

parse_filename_freq(filename[, frequencies])

Parse an ACCESS model filename and return a file id and any time information

parse_ncfile(file[, time_dim])

Get Intake-ESM datastore entry info from a netCDF file

parser(file)

Parse info from a file asset

save(name, description[, directory, use_parquet])

Save datastore contents to a file.

validate_parser()

Run the parser on a single file and check the schema of the info being parsed

get_assets

__init__(path, **kwargs)

Initialise an AccessOm3Builder

Parameters:
path: str or list of str

Path or list of paths to crawl for assets/files.

build()

Builds a datastore from a list of netCDF files or zarr stores.

save(name, description, directory=None, use_parquet=False)

Save datastore contents to a file.

Parameters:
name: str

The name of the file to save the datastore to.

description: str

Detailed multi-line description of the collection.

directory: str, optional

The directory to save the datastore to. If None, use the current directory.

use_parquet: bool, optional

Whether to save the datastore as a parquet file. Defaults to False, which saves as a CSV file. Parquet is both faster and saves space, but unlike CSV is not human-readable.

MOM6 output: Mom6Builder

class access_nri_intake.source.builders.Mom6Builder(paths, storage_options=None, depth=0, exclude_patterns=None, include_patterns=None, joblib_parallel_kwargs=None)

Intake-ESM datastore builder for MOM6 COSIMA datasets

Attributes:
columns_with_iterables

Return a set of the columns that have iterables

exclude_patterns
include_patterns
joblib_parallel_kwargs
storage_options
valid_assets

Return the list of valid assets that have been parsed and validated

Methods

build()

Builds a datastore from a list of netCDF files or zarr stores.

clean_dataframe()

Clean the dataframe by excluding invalid assets and removing duplicate entries.

parse()

Parse metadata from assets.

parse_filename_freq(filename[, frequencies])

Parse an ACCESS model filename and return a file id and any time information

parse_ncfile(file[, time_dim])

Get Intake-ESM datastore entry info from a netCDF file

parser(file)

Parse info from a file asset

save(name, description[, directory, use_parquet])

Save datastore contents to a file.

validate_parser()

Run the parser on a single file and check the schema of the info being parsed

get_assets

__init__(path, **kwargs)

Initialise a Mom6Builder

Parameters:
path: str or list of str

Path or list of paths to crawl for assets/files.

build()

Builds a datastore from a list of netCDF files or zarr stores.

save(name, description, directory=None, use_parquet=False)

Save datastore contents to a file.

Parameters:
name: str

The name of the file to save the datastore to.

description: str

Detailed multi-line description of the collection.

directory: str, optional

The directory to save the datastore to. If None, use the current directory.

use_parquet: bool, optional

Whether to save the datastore as a parquet file. Defaults to False, which saves as a CSV file. Parquet is both faster and saves space, but unlike CSV is not human-readable.

ROMSIceShelf output: ROMSBuilder

class access_nri_intake.source.builders.ROMSBuilder(paths, storage_options=None, depth=0, exclude_patterns=None, include_patterns=None, joblib_parallel_kwargs=None)

Intake-ESM datastore builder for ROMS datasets

See bkgf/ROMSIceShelf for details on the ROMSIceShelf model.

Attributes:
columns_with_iterables

Return a set of the columns that have iterables

exclude_patterns
include_patterns
joblib_parallel_kwargs
storage_options
valid_assets

Return the list of valid assets that have been parsed and validated

Methods

build()

Builds a datastore from a list of netCDF files or zarr stores.

clean_dataframe()

Clean the dataframe by excluding invalid assets and removing duplicate entries.

parse()

Parse metadata from assets.

parse_filename_freq(filename[, frequencies])

Parse an ACCESS model filename and return a file id and any time information

parse_ncfile(file[, time_dim])

Get Intake-ESM datastore entry info from a netCDF file

parser(file)

Parse info from a file asset

save(name, description[, directory, use_parquet])

Save datastore contents to a file.

validate_parser()

Run the parser on a single file and check the schema of the info being parsed

get_assets

__init__(path, **kwargs)

Initialise a ROMSBuilder

Parameters:
path: str or list of str

Path or list of paths to crawl for assets/files.

build()

Builds a datastore from a list of netCDF files or zarr stores.

save(name, description, directory=None, use_parquet=False)

Save datastore contents to a file.

Parameters:
name: str

The name of the file to save the datastore to.

description: str

Detailed multi-line description of the collection.

directory: str, optional

The directory to save the datastore to. If None, use the current directory.

use_parquet: bool, optional

Whether to save the datastore as a parquet file. Defaults to False, which saves as a CSV file. Parquet is both faster and saves space, but unlike CSV is not human-readable.

World Ocean Atlas output: WoaBuilder

class access_nri_intake.source.builders.WoaBuilder(paths, storage_options=None, depth=0, exclude_patterns=None, include_patterns=None, joblib_parallel_kwargs=None)

Intake-ESM datastore builder for WOA datasets

Attributes:
columns_with_iterables

Return a set of the columns that have iterables

exclude_patterns
include_patterns
joblib_parallel_kwargs
storage_options
valid_assets

Return the list of valid assets that have been parsed and validated

Methods

build()

Builds a datastore from a list of netCDF files or zarr stores.

clean_dataframe()

Clean the dataframe by excluding invalid assets and removing duplicate entries.

parse()

Parse metadata from assets.

parse_filename_freq(filename[, frequencies])

Parse an ACCESS model filename and return a file id and any time information

parse_ncfile(file[, time_dim])

Get Intake-ESM datastore entry info from a netCDF file

parser(file)

Override the parser method to add a grid id to the output dictionary.

save(name, description[, directory, use_parquet])

Save datastore contents to a file.

validate_parser()

Run the parser on a single file and check the schema of the info being parsed

get_assets

__init__(path, **kwargs)

Initialise a WoaBuilder

Parameters:
path: str or list of str

Path or list of paths to crawl for assets/files.

build()

Builds a datastore from a list of netCDF files or zarr stores.

save(name, description, directory=None, use_parquet=False)

Save datastore contents to a file.

Parameters:
name: str

The name of the file to save the datastore to.

description: str

Detailed multi-line description of the collection.

directory: str, optional

The directory to save the datastore to. If None, use the current directory.

use_parquet: bool, optional

Whether to save the datastore as a parquet file. Defaults to False, which saves as a CSV file. Parquet is both faster and saves space, but unlike CSV is not human-readable.

Note

If you have ACCESS model output that isn’t compatible with the existing set of Builders, check out the Creating a new Builder section or open an issue here.