Datastore Builders

The same Python package that includes the ACCESS-NRI catalog, access-nri-intake, also includes a set of Intake-ESM datastore Builders for different ACCESS model outputs. In general, building an Intake-ESM datastore for your ACCESS model output should be as simple as passing your output base directory to an appropriate Builder.

The access-nri-intake package is installed in the xp65 analysis environment, or users can install it into their own environment (see Installing the catalog for details). The Builders can be imported from the access_nri_intake.source.builders submodule.
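The workflow described above can be sketched as follows. This is a minimal illustration, not the package's own recipe: it assumes access-nri-intake is installed and that a Builder accepts the `path` keyword shown in the `__init__` signatures below. The helper names `build_datastore` and `example`, the output directory, and the datastore name are hypothetical placeholders.

```python
def build_datastore(builder_cls, path, name, description):
    """Build a datastore with the given Builder class and save it.

    builder_cls: any Builder from access_nri_intake.source.builders.
    path: base directory (or list of directories) of model output.
    """
    builder = builder_cls(path=path)  # set up the crawl of `path` for assets
    builder.build()                   # parse metadata and assemble the datastore
    # With the default arguments, save() writes a CSV-backed datastore
    # to the current directory.
    builder.save(name=name, description=description)
    return builder


def example():
    # Imported inside the function so that build_datastore can be defined
    # and reused even where the package is not installed.
    from access_nri_intake.source.builders import AccessOm2Builder

    # Hypothetical output directory and datastore name.
    return build_datastore(
        AccessOm2Builder,
        path="/path/to/your/access-om2/output",
        name="my_om2_datastore",
        description="An example ACCESS-OM2 datastore",
    )
```

Calling `example()` would only succeed on a system where the package is installed and the placeholder path is replaced with a real output directory.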

There are currently eight Builders available. Their core public APIs are given below (their full APIs can be found in API for access_nri_intake.source).
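When the model type is only known at run time, one option is a small lookup from model name to Builder class. The sketch below is an illustration rather than part of the package: the class names are those documented on this page, while `BUILDER_NAMES` and `get_builder_class` are hypothetical names introduced here.

```python
import importlib

# Model output type -> Builder class name, as documented on this page.
BUILDER_NAMES = {
    "ACCESS-OM2": "AccessOm2Builder",
    "ACCESS-ESM1.5": "AccessEsm15Builder",
    "ACCESS-ESM1.6": "AccessEsm16Builder",
    "ACCESS-CM2": "AccessCm2Builder",
    "ACCESS-OM3": "AccessOm3Builder",
    "MOM6": "Mom6Builder",
    "ROMSIceShelf": "ROMSBuilder",
    "World Ocean Atlas": "WoaBuilder",
}


def get_builder_class(model):
    """Return the Builder class for a model name (hypothetical helper)."""
    try:
        class_name = BUILDER_NAMES[model]
    except KeyError:
        known = ", ".join(sorted(BUILDER_NAMES))
        raise ValueError(
            f"No Builder listed for {model!r}; known models: {known}"
        ) from None
    # Imported lazily so the lookup table is usable without the package installed.
    module = importlib.import_module("access_nri_intake.source.builders")
    return getattr(module, class_name)
```

An unknown model name fails before any import is attempted, so a typo is reported with the list of supported names rather than as an import error.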

Note

These Builders are used by ACCESS-NRI to create the ACCESS-NRI catalog.

ACCESS-OM2 output: `AccessOm2Builder`#

class access_nri_intake.source.builders.AccessOm2Builder(paths, storage_options=None, depth=0, exclude_patterns=None, include_patterns=None, joblib_parallel_kwargs=None)

Intake-ESM datastore builder for ACCESS-OM2 COSIMA datasets

Attributes:

columns_with_iterables: Return a set of the columns that have iterables
exclude_patterns
include_patterns
joblib_parallel_kwargs
storage_options
valid_assets: Return the list of valid assets that have been parsed and validated

Methods

`build`()	Builds a datastore from a list of netCDF files or zarr stores.
`clean_dataframe`()	Clean the dataframe by excluding invalid assets and removing duplicate entries.
`parse`()	Parse metadata from assets.
`parse_filename_freq`(filename[, frequencies])	Parse an ACCESS model filename and return a file id and any time information
`parse_ncfile`(file[, time_dim])	Get Intake-ESM datastore entry info from a netcdf file
`parser`(file)	Parse info from a file asset
`save`(name, description[, directory, use_parquet])	Save datastore contents to a file.
`validate_parser`()	Run the parser on a single file and check the schema of the info being parsed

get_assets

__init__(path, **kwargs)

Initialise an AccessOm2Builder

Parameters:

path: str or list of str: Path or list of paths to crawl for assets/files.

build(): Builds a datastore from a list of netCDF files or zarr stores.

save(name, description, directory=None, use_parquet=False)

Save datastore contents to a file.

Parameters:

name: str: The name of the file to save the datastore to.
description: str: Detailed multi-line description of the collection.
directory: str, optional: The directory to save the datastore to. If None, use the current directory.
use_parquet: bool, optional: Whether to save the datastore as a parquet file. Defaults to False, which saves as a CSV file. Parquet is faster and more compact than CSV but, unlike CSV, is not human-readable.
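The optional `save` parameters can be combined as in the sketch below, which writes a parquet-backed datastore into a chosen directory. It is an illustration under stated assumptions: `save_datastore` is a hypothetical helper, and creating the directory beforehand is a defensive choice made here, not documented behaviour of `save`.

```python
from pathlib import Path


def save_datastore(builder, name, description, directory=None, use_parquet=False):
    """Save an already-built datastore (hypothetical helper).

    builder must be a Builder on which build() has already been called.
    """
    if directory is not None:
        # Assumption: create the target directory up front rather than
        # relying on save() to do so.
        Path(directory).mkdir(parents=True, exist_ok=True)
    builder.save(
        name=name,
        description=description,
        directory=directory,      # None means the current directory
        use_parquet=use_parquet,  # False means CSV
    )
```

For a small datastore that people will inspect by hand, the CSV default is usually the more convenient choice. Parquet is the better fit when the datastore is large and only read programmatically.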

ACCESS-ESM1.5 output: `AccessEsm15Builder`#

class access_nri_intake.source.builders.AccessEsm15Builder(paths, storage_options=None, depth=0, exclude_patterns=None, include_patterns=None, joblib_parallel_kwargs=None)

Intake-ESM datastore builder for ACCESS-ESM1.5 datasets

Attributes:

columns_with_iterables: Return a set of the columns that have iterables
exclude_patterns
include_patterns
joblib_parallel_kwargs
storage_options
valid_assets: Return the list of valid assets that have been parsed and validated

Methods

`build`()	Builds a datastore from a list of netCDF files or zarr stores.
`clean_dataframe`()	Clean the dataframe by excluding invalid assets and removing duplicate entries.
`parse`()	Parse metadata from assets.
`parse_filename_freq`(filename[, frequencies])	Parse an ACCESS model filename and return a file id and any time information
`parse_ncfile`(file[, time_dim])	Get Intake-ESM datastore entry info from a netcdf file
`parser`(file)	Parse info from a file asset
`save`(name, description[, directory, use_parquet])	Save datastore contents to a file.
`validate_parser`()	Run the parser on a single file and check the schema of the info being parsed

get_assets

__init__(path, ensemble, **kwargs)

Initialise an AccessEsm15Builder

Parameters:

path: str or list of str: Path or list of paths to crawl for assets/files.
ensemble: boolean: Whether to treat each path as a separate member of an ensemble to join along a new member dimension

build(): Builds a datastore from a list of netCDF files or zarr stores.

save(name, description, directory=None, use_parquet=False)

Save datastore contents to a file.

Parameters:

name: str: The name of the file to save the datastore to.
description: str: Detailed multi-line description of the collection.
directory: str, optional: The directory to save the datastore to. If None, use the current directory.
use_parquet: bool, optional: Whether to save the datastore as a parquet file. Defaults to False, which saves as a CSV file. Parquet is faster and more compact than CSV but, unlike CSV, is not human-readable.
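The `ensemble` parameter matters when several runs should be catalogued together. The sketch below passes one path per ensemble member and sets `ensemble=True`. It assumes the keyword names shown in the `__init__` signature above. The helper name `build_ensemble_datastore` and the check for at least two members are hypothetical additions for illustration.

```python
def build_ensemble_datastore(builder_cls, member_paths, name, description):
    """Build and save one datastore covering several ensemble members.

    builder_cls: a Builder whose __init__ takes `path` and `ensemble`,
        for example AccessEsm15Builder.
    member_paths: one output base directory per ensemble member.
    """
    paths = [str(p) for p in member_paths]
    if len(paths) < 2:
        # Assumption made for this sketch: a single path is not an ensemble.
        raise ValueError("Expected at least two member paths for an ensemble")
    # ensemble=True treats each path as a separate member.
    builder = builder_cls(path=paths, ensemble=True)
    builder.build()
    builder.save(name=name, description=description)
    return builder
```

For a single run, pass `ensemble=False` to the Builder directly instead of using this helper.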

ACCESS-ESM1.6 output: `AccessEsm16Builder`#

class access_nri_intake.source.builders.AccessEsm16Builder(paths, storage_options=None, depth=0, exclude_patterns=None, include_patterns=None, joblib_parallel_kwargs=None)

Intake-ESM datastore builder for ACCESS-ESM1.6 datasets

Attributes:

columns_with_iterables: Return a set of the columns that have iterables
exclude_patterns
include_patterns
joblib_parallel_kwargs
storage_options
valid_assets: Return the list of valid assets that have been parsed and validated

Methods

`build`()	Builds a datastore from a list of netCDF files or zarr stores.
`clean_dataframe`()	Clean the dataframe by excluding invalid assets and removing duplicate entries.
`parse`()	Parse metadata from assets.
`parse_filename_freq`(filename[, frequencies])	Parse an ACCESS model filename and return a file id and any time information
`parse_ncfile`(file[, time_dim])	Get Intake-ESM datastore entry info from a netcdf file
`parser`(file)	Get the realm and member/experiment id from the file name
`save`(name, description[, directory, use_parquet])	Save datastore contents to a file.
`validate_parser`()	Run the parser on a single file and check the schema of the info being parsed

get_assets

__init__(path, ensemble, **kwargs)

Initialise an AccessEsm16Builder

Parameters:

path: str or list of str: Path or list of paths to crawl for assets/files.
ensemble: boolean: Whether to treat each path as a separate member of an ensemble to join along a new member dimension

build(): Builds a datastore from a list of netCDF files or zarr stores.

save(name, description, directory=None, use_parquet=False)

Save datastore contents to a file.

Parameters:

name: str: The name of the file to save the datastore to.
description: str: Detailed multi-line description of the collection.
directory: str, optional: The directory to save the datastore to. If None, use the current directory.
use_parquet: bool, optional: Whether to save the datastore as a parquet file. Defaults to False, which saves as a CSV file. Parquet is faster and more compact than CSV but, unlike CSV, is not human-readable.

ACCESS-CM2 output: `AccessCm2Builder`#

class access_nri_intake.source.builders.AccessCm2Builder(paths, storage_options=None, depth=0, exclude_patterns=None, include_patterns=None, joblib_parallel_kwargs=None)

Intake-ESM datastore builder for ACCESS-CM2 datasets

Attributes:

columns_with_iterables: Return a set of the columns that have iterables
exclude_patterns
include_patterns
joblib_parallel_kwargs
storage_options
valid_assets: Return the list of valid assets that have been parsed and validated

Methods

`build`()	Builds a datastore from a list of netCDF files or zarr stores.
`clean_dataframe`()	Clean the dataframe by excluding invalid assets and removing duplicate entries.
`parse`()	Parse metadata from assets.
`parse_filename_freq`(filename[, frequencies])	Parse an ACCESS model filename and return a file id and any time information
`parse_ncfile`(file[, time_dim])	Get Intake-ESM datastore entry info from a netcdf file
`parser`(file)	Parse info from a file asset
`save`(name, description[, directory, use_parquet])	Save datastore contents to a file.
`validate_parser`()	Run the parser on a single file and check the schema of the info being parsed

get_assets

__init__(path, ensemble, **kwargs)

Initialise an AccessCm2Builder

Parameters:

path: str or list of str: Path or list of paths to crawl for assets/files.
ensemble: boolean: Whether to treat each path as a separate member of an ensemble to join along a new member dimension

build(): Builds a datastore from a list of netCDF files or zarr stores.

save(name, description, directory=None, use_parquet=False)

Save datastore contents to a file.

Parameters:

name: str: The name of the file to save the datastore to.
description: str: Detailed multi-line description of the collection.
directory: str, optional: The directory to save the datastore to. If None, use the current directory.
use_parquet: bool, optional: Whether to save the datastore as a parquet file. Defaults to False, which saves as a CSV file. Parquet is faster and more compact than CSV but, unlike CSV, is not human-readable.

ACCESS-OM3 output: `AccessOm3Builder`#

class access_nri_intake.source.builders.AccessOm3Builder(paths, storage_options=None, depth=0, exclude_patterns=None, include_patterns=None, joblib_parallel_kwargs=None)

Intake-ESM datastore builder for ACCESS-OM3 COSIMA datasets

Attributes:

columns_with_iterables: Return a set of the columns that have iterables
exclude_patterns
include_patterns
joblib_parallel_kwargs
storage_options
valid_assets: Return the list of valid assets that have been parsed and validated

Methods

`build`()	Builds a datastore from a list of netCDF files or zarr stores.
`clean_dataframe`()	Clean the dataframe by excluding invalid assets and removing duplicate entries.
`parse`()	Parse metadata from assets.
`parse_filename_freq`(filename[, frequencies])	Parse an ACCESS model filename and return a file id and any time information
`parse_ncfile`(file[, time_dim])	Get Intake-ESM datastore entry info from a netcdf file
`parser`(file)	Parse info from a file asset
`save`(name, description[, directory, use_parquet])	Save datastore contents to a file.
`validate_parser`()	Run the parser on a single file and check the schema of the info being parsed

get_assets

__init__(path, **kwargs)

Initialise an AccessOm3Builder

Parameters:

path: str or list of str: Path or list of paths to crawl for assets/files.

build(): Builds a datastore from a list of netCDF files or zarr stores.

save(name, description, directory=None, use_parquet=False)

Save datastore contents to a file.

Parameters:

name: str: The name of the file to save the datastore to.
description: str: Detailed multi-line description of the collection.
directory: str, optional: The directory to save the datastore to. If None, use the current directory.
use_parquet: bool, optional: Whether to save the datastore as a parquet file. Defaults to False, which saves as a CSV file. Parquet is faster and more compact than CSV but, unlike CSV, is not human-readable.

MOM6 output: `Mom6Builder`#

class access_nri_intake.source.builders.Mom6Builder(paths, storage_options=None, depth=0, exclude_patterns=None, include_patterns=None, joblib_parallel_kwargs=None)

Intake-ESM datastore builder for MOM6 COSIMA datasets

Attributes:

columns_with_iterables: Return a set of the columns that have iterables
exclude_patterns
include_patterns
joblib_parallel_kwargs
storage_options
valid_assets: Return the list of valid assets that have been parsed and validated

Methods

`build`()	Builds a datastore from a list of netCDF files or zarr stores.
`clean_dataframe`()	Clean the dataframe by excluding invalid assets and removing duplicate entries.
`parse`()	Parse metadata from assets.
`parse_filename_freq`(filename[, frequencies])	Parse an ACCESS model filename and return a file id and any time information
`parse_ncfile`(file[, time_dim])	Get Intake-ESM datastore entry info from a netcdf file
`parser`(file)	Parse info from a file asset
`save`(name, description[, directory, use_parquet])	Save datastore contents to a file.
`validate_parser`()	Run the parser on a single file and check the schema of the info being parsed

get_assets

__init__(path, **kwargs)

Initialise a Mom6Builder

Parameters:

path: str or list of str: Path or list of paths to crawl for assets/files.

build(): Builds a datastore from a list of netCDF files or zarr stores.

save(name, description, directory=None, use_parquet=False)

Save datastore contents to a file.

Parameters:

name: str: The name of the file to save the datastore to.
description: str: Detailed multi-line description of the collection.
directory: str, optional: The directory to save the datastore to. If None, use the current directory.
use_parquet: bool, optional: Whether to save the datastore as a parquet file. Defaults to False, which saves as a CSV file. Parquet is faster and more compact than CSV but, unlike CSV, is not human-readable.

ROMSIceShelf output: `ROMSBuilder`#

class access_nri_intake.source.builders.ROMSBuilder(paths, storage_options=None, depth=0, exclude_patterns=None, include_patterns=None, joblib_parallel_kwargs=None)

Intake-ESM datastore builder for ROMS datasets

See bkgf/ROMSIceShelf for details on the ROMSIceShelf model.

Attributes:

columns_with_iterables: Return a set of the columns that have iterables
exclude_patterns
include_patterns
joblib_parallel_kwargs
storage_options
valid_assets: Return the list of valid assets that have been parsed and validated

Methods

`build`()	Builds a datastore from a list of netCDF files or zarr stores.
`clean_dataframe`()	Clean the dataframe by excluding invalid assets and removing duplicate entries.
`parse`()	Parse metadata from assets.
`parse_filename_freq`(filename[, frequencies])	Parse an ACCESS model filename and return a file id and any time information
`parse_ncfile`(file[, time_dim])	Get Intake-ESM datastore entry info from a netcdf file
`parser`(file)	Parse info from a file asset
`save`(name, description[, directory, use_parquet])	Save datastore contents to a file.
`validate_parser`()	Run the parser on a single file and check the schema of the info being parsed

get_assets

__init__(path, **kwargs)

Initialise a ROMSBuilder

Parameters:

path: str or list of str: Path or list of paths to crawl for assets/files.

build(): Builds a datastore from a list of netCDF files or zarr stores.

save(name, description, directory=None, use_parquet=False)

Save datastore contents to a file.

Parameters:

name: str: The name of the file to save the datastore to.
description: str: Detailed multi-line description of the collection.
directory: str, optional: The directory to save the datastore to. If None, use the current directory.
use_parquet: bool, optional: Whether to save the datastore as a parquet file. Defaults to False, which saves as a CSV file. Parquet is faster and more compact than CSV but, unlike CSV, is not human-readable.

World Ocean Atlas output: `WoaBuilder`#

class access_nri_intake.source.builders.WoaBuilder(paths, storage_options=None, depth=0, exclude_patterns=None, include_patterns=None, joblib_parallel_kwargs=None)

Intake-ESM datastore builder for WOA datasets

Attributes:

columns_with_iterables: Return a set of the columns that have iterables
exclude_patterns
include_patterns
joblib_parallel_kwargs
storage_options
valid_assets: Return the list of valid assets that have been parsed and validated

Methods

`build`()	Builds a datastore from a list of netCDF files or zarr stores.
`clean_dataframe`()	Clean the dataframe by excluding invalid assets and removing duplicate entries.
`parse`()	Parse metadata from assets.
`parse_filename_freq`(filename[, frequencies])	Parse an ACCESS model filename and return a file id and any time information
`parse_ncfile`(file[, time_dim])	Get Intake-ESM datastore entry info from a netcdf file
`parser`(file)	Overwrite the parser method to add a grid id to the output dictionary.
`save`(name, description[, directory, use_parquet])	Save datastore contents to a file.
`validate_parser`()	Run the parser on a single file and check the schema of the info being parsed

get_assets

__init__(path, **kwargs)

Initialise a WoaBuilder

Parameters:

path: str or list of str: Path or list of paths to crawl for assets/files.

build(): Builds a datastore from a list of netCDF files or zarr stores.

save(name, description, directory=None, use_parquet=False)

Save datastore contents to a file.

Parameters:

name: str: The name of the file to save the datastore to.
description: str: Detailed multi-line description of the collection.
directory: str, optional: The directory to save the datastore to. If None, use the current directory.
use_parquet: bool, optional: Whether to save the datastore as a parquet file. Defaults to False, which saves as a CSV file. Parquet is faster and more compact than CSV but, unlike CSV, is not human-readable.

Note

If you have ACCESS model output that isn’t compatible with the existing set of Builders, check out the Creating a new Builder section or open an issue here.
