access_nri_intake.source.builders

access_nri_intake.source.builders#

Builders for generating Intake-ESM datastores

Note: The {**default_kwargs, **kwargs} pattern is repeated throughout the builders. The default kwargs are similar from builder to builder but not identical, so unifying them would take substantial extra effort and would probably leave the code more complex than it is now.
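As a minimal sketch of the pattern the note refers to: in a dict display, later entries override earlier ones, so caller-supplied keyword arguments take precedence over the defaults. The function name `merge_kwargs` and the default values below are illustrative assumptions, not the builders' actual defaults.

```python
def merge_kwargs(**kwargs):
    """Merge caller kwargs over illustrative defaults (hypothetical values)."""
    # Hypothetical defaults; each real builder defines its own set.
    default_kwargs = {
        "depth": 0,
        "data_format": "netcdf",
        "include_patterns": ["*.nc"],
    }
    # Later unpacking wins: entries in kwargs override those in default_kwargs.
    return {**default_kwargs, **kwargs}


merged = merge_kwargs(depth=2, storage_options={"anon": True})
print(merged["depth"])        # caller value replaces the default
print(merged["data_format"])  # untouched default is kept
```

Because each builder's defaults differ slightly, the merge has to be written per builder, which is the duplication the note describes.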

Classes#

AccessOm2Builder

Intake-ESM datastore builder for ACCESS-OM2 COSIMA datasets

AccessOm3Builder

Intake-ESM datastore builder for ACCESS-OM3 COSIMA datasets

Mom6Builder

Intake-ESM datastore builder for MOM6 COSIMA datasets

AccessEsm15Builder

Intake-ESM datastore builder for ACCESS-ESM1.5 datasets

AccessCm2Builder

Intake-ESM datastore builder for ACCESS-CM2 datasets

AccessEsm16Builder

Intake-ESM datastore builder for ACCESS-ESM1.6 datasets

OnlineMltBuilder

Builder for the Mixed Layer Tracer Budget Diagnostics dataset

AccessCm3Builder

Intake-ESM datastore builder for ACCESS-CM3 datasets

ROMSBuilder

Intake-ESM datastore builder for ROMS datasets

WoaBuilder

Intake-ESM datastore builder for WOA datasets

Cmip6Builder

Intake-ESM datastore builder for CMIP6 datasets

Module Contents#

class access_nri_intake.source.builders.AccessOm2Builder(path, **kwargs)#

Bases: BaseBuilder

Intake-ESM datastore builder for ACCESS-OM2 COSIMA datasets

Initialise an AccessOm2Builder

Parameters:
path: str or list of str

Path or list of paths to crawl for assets/files.

classmethod parser(file)#

Parse info from a file asset

Parameters:
file: str

The path to the file

PATTERNS: list = ['*.nc']#
paths#
depth = 0#
exclude_patterns = None#
include_patterns = None#
data_format = 'netcdf'#
groupby_attrs = None#
aggregations = None#
storage_options = None#
joblib_parallel_kwargs#
parse()#

Parse metadata from assets.

save(name, description, directory=None, use_parquet=False)#

Save datastore contents to a file.

Parameters:
name: str

The name of the file to save the datastore to.

description: str

Detailed multi-line description of the collection.

directory: str, optional

The directory to save the datastore to. If None, use the current directory.

use_parquet: bool, optional

Whether to save the datastore as a parquet file. Defaults to False, which saves as a CSV file. Parquet is both faster and saves space, but unlike CSV is not human-readable.

validate_parser()#

Run the parser on a single file and check the schema of the info being parsed

build()#

Builds a datastore from a list of netCDF files or zarr stores.

property columns_with_iterables#

Return a set of the columns that have iterables

classmethod parse_filename_freq(filename, frequencies=FREQUENCIES)#

Parse an ACCESS model filename and return a file id and any time information

Parameters:
filename: str

The filename to parse with the extension removed

frequencies: dict, optional

A dictionary of regex patterns to match against the filename to determine the frequency

redaction_fill: str, optional

The character to replace time information with. Defaults to “X”

Returns:
frequency: str | None

The frequency of the file if available in the filename, otherwise None
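The following is a rough sketch of how a dictionary of regex patterns could be matched against a filename to infer a frequency. The contents of `FREQUENCIES` are not shown in this reference, so `EXAMPLE_FREQUENCIES`, its layout (frequency label mapped to a pattern), and the helper `guess_frequency` are all hypothetical stand-ins rather than the library's implementation.

```python
import re

# Hypothetical stand-in for FREQUENCIES: frequency label -> regex pattern.
# The real mapping used by the builders may differ in layout and content.
EXAMPLE_FREQUENCIES = {
    "1day": r"(^|[_\-.])(daily|1day)($|[_\-.])",
    "1mon": r"(^|[_\-.])(monthly|month|1mon)($|[_\-.])",
    "1yr": r"(^|[_\-.])(yearly|annual|1yr)($|[_\-.])",
}


def guess_frequency(filename, frequencies=EXAMPLE_FREQUENCIES):
    """Return the first frequency label whose pattern matches, else None."""
    for label, pattern in frequencies.items():
        if re.search(pattern, filename, flags=re.IGNORECASE):
            return label
    return None


# Filenames are given with the extension already removed.
print(guess_frequency("ocean_daily_3d_temp"))
print(guess_frequency("iceh.1985-03"))
```

When no pattern matches, the sketch returns None, mirroring the documented `str | None` return type.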

classmethod parse_ncfile(file, time_dim='time')#

Get Intake-ESM datastore entry info from a netcdf file

Parameters:
file: str

The path to the netcdf file

time_dim: str

The name of the time dimension

Returns:
output_nc_info: _NCFileInfo

A dataclass containing the information parsed from the file

Raises:
EmptyFileError: If the file contains no variables
property valid_assets: list[str]#

Return the list of valid assets that have been parsed and validated

get_assets()#
clean_dataframe()#

Clean the dataframe by excluding invalid assets and removing duplicate entries.

class access_nri_intake.source.builders.AccessOm3Builder(path, **kwargs)#

Bases: BaseBuilder

Intake-ESM datastore builder for ACCESS-OM3 COSIMA datasets

Initialise an AccessOm3Builder

Parameters:
path: str or list of str

Path or list of paths to crawl for assets/files.

classmethod parser(file)#

Parse info from a file asset

Parameters:
file: str

The path to the file

PATTERNS: list = ['*.nc']#
paths#
depth = 0#
exclude_patterns = None#
include_patterns = None#
data_format = 'netcdf'#
groupby_attrs = None#
aggregations = None#
storage_options = None#
joblib_parallel_kwargs#
parse()#

Parse metadata from assets.

save(name, description, directory=None, use_parquet=False)#

Save datastore contents to a file.

Parameters:
name: str

The name of the file to save the datastore to.

description: str

Detailed multi-line description of the collection.

directory: str, optional

The directory to save the datastore to. If None, use the current directory.

use_parquet: bool, optional

Whether to save the datastore as a parquet file. Defaults to False, which saves as a CSV file. Parquet is both faster and saves space, but unlike CSV is not human-readable.

validate_parser()#

Run the parser on a single file and check the schema of the info being parsed

build()#

Builds a datastore from a list of netCDF files or zarr stores.

property columns_with_iterables#

Return a set of the columns that have iterables

classmethod parse_filename_freq(filename, frequencies=FREQUENCIES)#

Parse an ACCESS model filename and return a file id and any time information

Parameters:
filename: str

The filename to parse with the extension removed

frequencies: dict, optional

A dictionary of regex patterns to match against the filename to determine the frequency

redaction_fill: str, optional

The character to replace time information with. Defaults to “X”

Returns:
frequency: str | None

The frequency of the file if available in the filename, otherwise None

classmethod parse_ncfile(file, time_dim='time')#

Get Intake-ESM datastore entry info from a netcdf file

Parameters:
file: str

The path to the netcdf file

time_dim: str

The name of the time dimension

Returns:
output_nc_info: _NCFileInfo

A dataclass containing the information parsed from the file

Raises:
EmptyFileError: If the file contains no variables
property valid_assets: list[str]#

Return the list of valid assets that have been parsed and validated

get_assets()#
clean_dataframe()#

Clean the dataframe by excluding invalid assets and removing duplicate entries.

class access_nri_intake.source.builders.Mom6Builder(path, **kwargs)#

Bases: BaseBuilder

Intake-ESM datastore builder for MOM6 COSIMA datasets

Initialise a Mom6Builder

Parameters:
path: str or list of str

Path or list of paths to crawl for assets/files.

classmethod parser(file)#

Parse info from a file asset

Parameters:
file: str

The path to the file

PATTERNS: list = ['*.nc']#
paths#
depth = 0#
exclude_patterns = None#
include_patterns = None#
data_format = 'netcdf'#
groupby_attrs = None#
aggregations = None#
storage_options = None#
joblib_parallel_kwargs#
parse()#

Parse metadata from assets.

save(name, description, directory=None, use_parquet=False)#

Save datastore contents to a file.

Parameters:
name: str

The name of the file to save the datastore to.

description: str

Detailed multi-line description of the collection.

directory: str, optional

The directory to save the datastore to. If None, use the current directory.

use_parquet: bool, optional

Whether to save the datastore as a parquet file. Defaults to False, which saves as a CSV file. Parquet is both faster and saves space, but unlike CSV is not human-readable.

validate_parser()#

Run the parser on a single file and check the schema of the info being parsed

build()#

Builds a datastore from a list of netCDF files or zarr stores.

property columns_with_iterables#

Return a set of the columns that have iterables

classmethod parse_filename_freq(filename, frequencies=FREQUENCIES)#

Parse an ACCESS model filename and return a file id and any time information

Parameters:
filename: str

The filename to parse with the extension removed

frequencies: dict, optional

A dictionary of regex patterns to match against the filename to determine the frequency

redaction_fill: str, optional

The character to replace time information with. Defaults to “X”

Returns:
frequency: str | None

The frequency of the file if available in the filename, otherwise None

classmethod parse_ncfile(file, time_dim='time')#

Get Intake-ESM datastore entry info from a netcdf file

Parameters:
file: str

The path to the netcdf file

time_dim: str

The name of the time dimension

Returns:
output_nc_info: _NCFileInfo

A dataclass containing the information parsed from the file

Raises:
EmptyFileError: If the file contains no variables
property valid_assets: list[str]#

Return the list of valid assets that have been parsed and validated

get_assets()#
clean_dataframe()#

Clean the dataframe by excluding invalid assets and removing duplicate entries.

class access_nri_intake.source.builders.AccessEsm15Builder(path, ensemble, **kwargs)#

Bases: BaseBuilder

Intake-ESM datastore builder for ACCESS-ESM1.5 datasets

Initialise an AccessEsm15Builder

Parameters:
path: str or list of str

Path or list of paths to crawl for assets/files.

ensemble: boolean

Whether to treat each path as a separate member of an ensemble to join along a new member dimension

classmethod parser(file)#

Parse info from a file asset

Parameters:
file: str

The path to the file

PATTERNS: list = ['*.nc']#
paths#
depth = 0#
exclude_patterns = None#
include_patterns = None#
data_format = 'netcdf'#
groupby_attrs = None#
aggregations = None#
storage_options = None#
joblib_parallel_kwargs#
parse()#

Parse metadata from assets.

save(name, description, directory=None, use_parquet=False)#

Save datastore contents to a file.

Parameters:
name: str

The name of the file to save the datastore to.

description: str

Detailed multi-line description of the collection.

directory: str, optional

The directory to save the datastore to. If None, use the current directory.

use_parquet: bool, optional

Whether to save the datastore as a parquet file. Defaults to False, which saves as a CSV file. Parquet is both faster and saves space, but unlike CSV is not human-readable.

validate_parser()#

Run the parser on a single file and check the schema of the info being parsed

build()#

Builds a datastore from a list of netCDF files or zarr stores.

property columns_with_iterables#

Return a set of the columns that have iterables

classmethod parse_filename_freq(filename, frequencies=FREQUENCIES)#

Parse an ACCESS model filename and return a file id and any time information

Parameters:
filename: str

The filename to parse with the extension removed

frequencies: dict, optional

A dictionary of regex patterns to match against the filename to determine the frequency

redaction_fill: str, optional

The character to replace time information with. Defaults to “X”

Returns:
frequency: str | None

The frequency of the file if available in the filename, otherwise None

classmethod parse_ncfile(file, time_dim='time')#

Get Intake-ESM datastore entry info from a netcdf file

Parameters:
file: str

The path to the netcdf file

time_dim: str

The name of the time dimension

Returns:
output_nc_info: _NCFileInfo

A dataclass containing the information parsed from the file

Raises:
EmptyFileError: If the file contains no variables
property valid_assets: list[str]#

Return the list of valid assets that have been parsed and validated

get_assets()#
clean_dataframe()#

Clean the dataframe by excluding invalid assets and removing duplicate entries.

class access_nri_intake.source.builders.AccessCm2Builder(path, ensemble, **kwargs)#

Bases: AccessEsm15Builder

Intake-ESM datastore builder for ACCESS-CM2 datasets

Initialise an AccessEsm15Builder

Parameters:
path: str or list of str

Path or list of paths to crawl for assets/files.

ensemble: boolean

Whether to treat each path as a separate member of an ensemble to join along a new member dimension

classmethod parser(file)#

Parse info from a file asset

Parameters:
file: str

The path to the file

PATTERNS: list = ['*.nc']#
paths#
depth = 0#
exclude_patterns = None#
include_patterns = None#
data_format = 'netcdf'#
groupby_attrs = None#
aggregations = None#
storage_options = None#
joblib_parallel_kwargs#
parse()#

Parse metadata from assets.

save(name, description, directory=None, use_parquet=False)#

Save datastore contents to a file.

Parameters:
name: str

The name of the file to save the datastore to.

description: str

Detailed multi-line description of the collection.

directory: str, optional

The directory to save the datastore to. If None, use the current directory.

use_parquet: bool, optional

Whether to save the datastore as a parquet file. Defaults to False, which saves as a CSV file. Parquet is both faster and saves space, but unlike CSV is not human-readable.

validate_parser()#

Run the parser on a single file and check the schema of the info being parsed

build()#

Builds a datastore from a list of netCDF files or zarr stores.

property columns_with_iterables#

Return a set of the columns that have iterables

classmethod parse_filename_freq(filename, frequencies=FREQUENCIES)#

Parse an ACCESS model filename and return a file id and any time information

Parameters:
filename: str

The filename to parse with the extension removed

frequencies: dict, optional

A dictionary of regex patterns to match against the filename to determine the frequency

redaction_fill: str, optional

The character to replace time information with. Defaults to “X”

Returns:
frequency: str | None

The frequency of the file if available in the filename, otherwise None

classmethod parse_ncfile(file, time_dim='time')#

Get Intake-ESM datastore entry info from a netcdf file

Parameters:
file: str

The path to the netcdf file

time_dim: str

The name of the time dimension

Returns:
output_nc_info: _NCFileInfo

A dataclass containing the information parsed from the file

Raises:
EmptyFileError: If the file contains no variables
property valid_assets: list[str]#

Return the list of valid assets that have been parsed and validated

get_assets()#
clean_dataframe()#

Clean the dataframe by excluding invalid assets and removing duplicate entries.

class access_nri_intake.source.builders.AccessEsm16Builder(path, ensemble, **kwargs)#

Bases: AccessEsm15Builder

Intake-ESM datastore builder for ACCESS-ESM1.6 datasets

Initialise an AccessEsm15Builder

Parameters:
path: str or list of str

Path or list of paths to crawl for assets/files.

ensemble: boolean

Whether to treat each path as a separate member of an ensemble to join along a new member dimension

PATH_REGEX = '.*/output\\d+/([^/]*)(?:/[^/]*)?/.*\\.nc'#
REALM_MAPPING#
classmethod parser(file)#

Get the realm and member/experiment id from the file name
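To illustrate what PATH_REGEX matches, here is a small sketch that applies the pattern (copied from the attribute above, with the repr escaping undone) to a few made-up paths. The helper name and the example paths are hypothetical, and how the parser uses the captured component together with REALM_MAPPING is not shown in this reference.

```python
import re

# Pattern copied from AccessEsm16Builder.PATH_REGEX (listed above in repr form).
PATH_REGEX = r".*/output\d+/([^/]*)(?:/[^/]*)?/.*\.nc"


def first_component_after_output(path):
    """Return the captured path component, or None if the path does not match."""
    match = re.match(PATH_REGEX, path)
    return match.group(1) if match else None


# Hypothetical paths, for illustration only.
print(first_component_after_output("/exp/output000/ocean/ocean_month.nc"))
# -> ocean
print(first_component_after_output("/exp/output012/atmosphere/netCDF/aiihca.pa-0101.nc"))
# -> atmosphere
print(first_component_after_output("/exp/restart000/ocean/ocean_temp_salt.res.nc"))
# -> None (no output<N> directory in the path)
```

The single capture group picks out the directory immediately below `output<N>`, and the optional non-capturing group allows one further directory level before the file name.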

PATTERNS: list = ['*.nc']#
paths#
depth = 0#
exclude_patterns = None#
include_patterns = None#
data_format = 'netcdf'#
groupby_attrs = None#
aggregations = None#
storage_options = None#
joblib_parallel_kwargs#
parse()#

Parse metadata from assets.

save(name, description, directory=None, use_parquet=False)#

Save datastore contents to a file.

Parameters:
name: str

The name of the file to save the datastore to.

description: str

Detailed multi-line description of the collection.

directory: str, optional

The directory to save the datastore to. If None, use the current directory.

use_parquet: bool, optional

Whether to save the datastore as a parquet file. Defaults to False, which saves as a CSV file. Parquet is both faster and saves space, but unlike CSV is not human-readable.

validate_parser()#

Run the parser on a single file and check the schema of the info being parsed

build()#

Builds a datastore from a list of netCDF files or zarr stores.

property columns_with_iterables#

Return a set of the columns that have iterables

classmethod parse_filename_freq(filename, frequencies=FREQUENCIES)#

Parse an ACCESS model filename and return a file id and any time information

Parameters:
filename: str

The filename to parse with the extension removed

frequencies: dict, optional

A dictionary of regex patterns to match against the filename to determine the frequency

redaction_fill: str, optional

The character to replace time information with. Defaults to “X”

Returns:
frequency: str | None

The frequency of the file if available in the filename, otherwise None

classmethod parse_ncfile(file, time_dim='time')#

Get Intake-ESM datastore entry info from a netcdf file

Parameters:
file: str

The path to the netcdf file

time_dim: str

The name of the time dimension

Returns:
output_nc_info: _NCFileInfo

A dataclass containing the information parsed from the file

Raises:
EmptyFileError: If the file contains no variables
property valid_assets: list[str]#

Return the list of valid assets that have been parsed and validated

get_assets()#
clean_dataframe()#

Clean the dataframe by excluding invalid assets and removing duplicate entries.

class access_nri_intake.source.builders.OnlineMltBuilder(path, ensemble, **kwargs)#

Bases: AccessEsm16Builder

Builder for the Mixed Layer Tracer Budget Diagnostics dataset located at /g/data/av17/access-nri/OM2/025deg_jra55_iaf_cycle6_online_mlt generated by Ryan Holmes (ryan.holmes@bom.gov.au)

The dataset consists of a trimmed-down repeat of an existing experiment with additional diagnostics added.

These files are not added to the datastore:

- output*/o2i.nc: these files have no calendar attribute on the ‘time’ axis

Initialise an AccessEsm15Builder

Parameters:
path: str or list of str

Path or list of paths to crawl for assets/files.

ensemble: boolean

Whether to treat each path as a separate member of an ensemble to join along a new member dimension

PATH_REGEX = '.*/(?:output\\d+|post_processed_diags|.*)/([^/]*)(?:/[^/]*)?/.*\\.nc'#
REALM_MAPPING#
classmethod parser(file)#

Get the realm and member/experiment id from the file name
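The sketch below compares this PATH_REGEX with the one on AccessEsm16Builder, using a made-up path that has a `post_processed_diags` directory instead of an `output<N>` directory. Both patterns are copied from this reference with the repr escaping undone; the path is hypothetical, and the way the parser uses the captured component is not shown here.

```python
import re

# Patterns copied from the two PATH_REGEX attributes in this reference.
ESM16_REGEX = r".*/output\d+/([^/]*)(?:/[^/]*)?/.*\.nc"
ONLINE_MLT_REGEX = (
    r".*/(?:output\d+|post_processed_diags|.*)/([^/]*)(?:/[^/]*)?/.*\.nc"
)

# Hypothetical path, for illustration only.
path = "/exp/post_processed_diags/ocean/mlt_budget.nc"

esm16_match = re.match(ESM16_REGEX, path)
online_match = re.match(ONLINE_MLT_REGEX, path)

print(esm16_match)            # None: the path has no output<N> directory
print(online_match.group(1))  # ocean: component after post_processed_diags
```

Note that the trailing `.*` alternative in the OnlineMltBuilder pattern accepts any directory name in that position, so this pattern is considerably more permissive than the parent class's.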

PATTERNS: list = ['*.nc']#
paths#
depth = 0#
exclude_patterns = None#
include_patterns = None#
data_format = 'netcdf'#
groupby_attrs = None#
aggregations = None#
storage_options = None#
joblib_parallel_kwargs#
parse()#

Parse metadata from assets.

save(name, description, directory=None, use_parquet=False)#

Save datastore contents to a file.

Parameters:
name: str

The name of the file to save the datastore to.

description: str

Detailed multi-line description of the collection.

directory: str, optional

The directory to save the datastore to. If None, use the current directory.

use_parquet: bool, optional

Whether to save the datastore as a parquet file. Defaults to False, which saves as a CSV file. Parquet is both faster and saves space, but unlike CSV is not human-readable.

validate_parser()#

Run the parser on a single file and check the schema of the info being parsed

build()#

Builds a datastore from a list of netCDF files or zarr stores.

property columns_with_iterables#

Return a set of the columns that have iterables

classmethod parse_filename_freq(filename, frequencies=FREQUENCIES)#

Parse an ACCESS model filename and return a file id and any time information

Parameters:
filename: str

The filename to parse with the extension removed

frequencies: dict, optional

A dictionary of regex patterns to match against the filename to determine the frequency

redaction_fill: str, optional

The character to replace time information with. Defaults to “X”

Returns:
frequency: str | None

The frequency of the file if available in the filename, otherwise None

classmethod parse_ncfile(file, time_dim='time')#

Get Intake-ESM datastore entry info from a netcdf file

Parameters:
file: str

The path to the netcdf file

time_dim: str

The name of the time dimension

Returns:
output_nc_info: _NCFileInfo

A dataclass containing the information parsed from the file

Raises:
EmptyFileError: If the file contains no variables
property valid_assets: list[str]#

Return the list of valid assets that have been parsed and validated

get_assets()#
clean_dataframe()#

Clean the dataframe by excluding invalid assets and removing duplicate entries.

class access_nri_intake.source.builders.AccessCm3Builder(path, **kwargs)#

Bases: BaseBuilder

Intake-ESM datastore builder for ACCESS-CM3 datasets

Initialise an AccessCm3Builder

Parameters:
path: str or list of str

Path or list of paths to crawl for assets/files.

classmethod parser(file)#

Parse info from a file asset

Parameters:
file: str

The path to the file

PATTERNS: list = ['*.nc']#
paths#
depth = 0#
exclude_patterns = None#
include_patterns = None#
data_format = 'netcdf'#
groupby_attrs = None#
aggregations = None#
storage_options = None#
joblib_parallel_kwargs#
parse()#

Parse metadata from assets.

save(name, description, directory=None, use_parquet=False)#

Save datastore contents to a file.

Parameters:
name: str

The name of the file to save the datastore to.

description: str

Detailed multi-line description of the collection.

directory: str, optional

The directory to save the datastore to. If None, use the current directory.

use_parquet: bool, optional

Whether to save the datastore as a parquet file. Defaults to False, which saves as a CSV file. Parquet is both faster and saves space, but unlike CSV is not human-readable.

validate_parser()#

Run the parser on a single file and check the schema of the info being parsed

build()#

Builds a datastore from a list of netCDF files or zarr stores.

property columns_with_iterables#

Return a set of the columns that have iterables

classmethod parse_filename_freq(filename, frequencies=FREQUENCIES)#

Parse an ACCESS model filename and return a file id and any time information

Parameters:
filename: str

The filename to parse with the extension removed

frequencies: dict, optional

A dictionary of regex patterns to match against the filename to determine the frequency

redaction_fill: str, optional

The character to replace time information with. Defaults to “X”

Returns:
frequency: str | None

The frequency of the file if available in the filename, otherwise None

classmethod parse_ncfile(file, time_dim='time')#

Get Intake-ESM datastore entry info from a netcdf file

Parameters:
file: str

The path to the netcdf file

time_dim: str

The name of the time dimension

Returns:
output_nc_info: _NCFileInfo

A dataclass containing the information parsed from the file

Raises:
EmptyFileError: If the file contains no variables
property valid_assets: list[str]#

Return the list of valid assets that have been parsed and validated

get_assets()#
clean_dataframe()#

Clean the dataframe by excluding invalid assets and removing duplicate entries.

class access_nri_intake.source.builders.ROMSBuilder(path, **kwargs)#

Bases: BaseBuilder

Intake-ESM datastore builder for ROMS datasets

See bkgf/ROMSIceShelf for details on the ROMSIceShelf model.

Initialise a ROMSBuilder

Parameters:
path: str or list of str

Path or list of paths to crawl for assets/files.

classmethod parser(file)#

Parse info from a file asset

Parameters:
file: str

The path to the file

PATTERNS: list = ['*.nc']#
paths#
depth = 0#
exclude_patterns = None#
include_patterns = None#
data_format = 'netcdf'#
groupby_attrs = None#
aggregations = None#
storage_options = None#
joblib_parallel_kwargs#
parse()#

Parse metadata from assets.

save(name, description, directory=None, use_parquet=False)#

Save datastore contents to a file.

Parameters:
name: str

The name of the file to save the datastore to.

description: str

Detailed multi-line description of the collection.

directory: str, optional

The directory to save the datastore to. If None, use the current directory.

use_parquet: bool, optional

Whether to save the datastore as a parquet file. Defaults to False, which saves as a CSV file. Parquet is both faster and saves space, but unlike CSV is not human-readable.

validate_parser()#

Run the parser on a single file and check the schema of the info being parsed

build()#

Builds a datastore from a list of netCDF files or zarr stores.

property columns_with_iterables#

Return a set of the columns that have iterables

classmethod parse_filename_freq(filename, frequencies=FREQUENCIES)#

Parse an ACCESS model filename and return a file id and any time information

Parameters:
filename: str

The filename to parse with the extension removed

frequencies: dict, optional

A dictionary of regex patterns to match against the filename to determine the frequency

redaction_fill: str, optional

The character to replace time information with. Defaults to “X”

Returns:
frequency: str | None

The frequency of the file if available in the filename, otherwise None

classmethod parse_ncfile(file, time_dim='time')#

Get Intake-ESM datastore entry info from a netcdf file

Parameters:
file: str

The path to the netcdf file

time_dim: str

The name of the time dimension

Returns:
output_nc_info: _NCFileInfo

A dataclass containing the information parsed from the file

Raises:
EmptyFileError: If the file contains no variables
property valid_assets: list[str]#

Return the list of valid assets that have been parsed and validated

get_assets()#
clean_dataframe()#

Clean the dataframe by excluding invalid assets and removing duplicate entries.

class access_nri_intake.source.builders.WoaBuilder(path, **kwargs)#

Bases: BaseBuilder

Intake-ESM datastore builder for WOA datasets

Initialise a WoaBuilder

Parameters:
path: str or list of str

Path or list of paths to crawl for assets/files.

classmethod parser(file)#

Overwrite the parser method to add a grid id to the output dictionary.

PATTERNS: list = ['*.nc']#
paths#
depth = 0#
exclude_patterns = None#
include_patterns = None#
data_format = 'netcdf'#
groupby_attrs = None#
aggregations = None#
storage_options = None#
joblib_parallel_kwargs#
parse()#

Parse metadata from assets.

save(name, description, directory=None, use_parquet=False)#

Save datastore contents to a file.

Parameters:
name: str

The name of the file to save the datastore to.

description: str

Detailed multi-line description of the collection.

directory: str, optional

The directory to save the datastore to. If None, use the current directory.

use_parquet: bool, optional

Whether to save the datastore as a parquet file. Defaults to False, which saves as a CSV file. Parquet is both faster and saves space, but unlike CSV is not human-readable.

validate_parser()#

Run the parser on a single file and check the schema of the info being parsed

build()#

Builds a datastore from a list of netCDF files or zarr stores.

property columns_with_iterables#

Return a set of the columns that have iterables

classmethod parse_filename_freq(filename, frequencies=FREQUENCIES)#

Parse an ACCESS model filename and return a file id and any time information

Parameters:
filename: str

The filename to parse with the extension removed

frequencies: dict, optional

A dictionary of regex patterns to match against the filename to determine the frequency

redaction_fill: str, optional

The character to replace time information with. Defaults to “X”

Returns:
frequency: str | None

The frequency of the file if available in the filename, otherwise None

classmethod parse_ncfile(file, time_dim='time')#

Get Intake-ESM datastore entry info from a netcdf file

Parameters:
file: str

The path to the netcdf file

time_dim: str

The name of the time dimension

Returns:
output_nc_info: _NCFileInfo

A dataclass containing the information parsed from the file

Raises:
EmptyFileError: If the file contains no variables
property valid_assets: list[str]#

Return the list of valid assets that have been parsed and validated

get_assets()#
clean_dataframe()#

Clean the dataframe by excluding invalid assets and removing duplicate entries.

class access_nri_intake.source.builders.Cmip6Builder(path, ensemble, **kwargs)#

Bases: BaseBuilder

Intake-ESM datastore builder for CMIP6 datasets

Initialise a Cmip6Builder

Parameters:
path: str or list of str

Path or list of paths to crawl for assets/files.

ensemble: bool = True#
classmethod parser(file)#

No need to do much here - just parse the netCDF file and return the info as a dictionary. The realm is obtained from the file metadata following ACCESS-NRI/access-nri-intake-catalog#478.

PATTERNS: list = ['*.nc']#
paths#
depth = 0#
exclude_patterns = None#
include_patterns = None#
data_format = 'netcdf'#
groupby_attrs = None#
aggregations = None#
storage_options = None#
joblib_parallel_kwargs#
parse()#

Parse metadata from assets.

save(name, description, directory=None, use_parquet=False)#

Save datastore contents to a file.

Parameters:
name: str

The name of the file to save the datastore to.

description: str

Detailed multi-line description of the collection.

directory: str, optional

The directory to save the datastore to. If None, use the current directory.

use_parquet: bool, optional

Whether to save the datastore as a parquet file. Defaults to False, which saves as a CSV file. Parquet is both faster and saves space, but unlike CSV is not human-readable.

validate_parser()#

Run the parser on a single file and check the schema of the info being parsed

build()#

Builds a datastore from a list of netCDF files or zarr stores.

property columns_with_iterables#

Return a set of the columns that have iterables

classmethod parse_filename_freq(filename, frequencies=FREQUENCIES)#

Parse an ACCESS model filename and return a file id and any time information

Parameters:
filename: str

The filename to parse with the extension removed

frequencies: dict, optional

A dictionary of regex patterns to match against the filename to determine the frequency

redaction_fill: str, optional

The character to replace time information with. Defaults to “X”

Returns:
frequency: str | None

The frequency of the file if available in the filename, otherwise None

classmethod parse_ncfile(file, time_dim='time')#

Get Intake-ESM datastore entry info from a netcdf file

Parameters:
file: str

The path to the netcdf file

time_dim: str

The name of the time dimension

Returns:
output_nc_info: _NCFileInfo

A dataclass containing the information parsed from the file

Raises:
EmptyFileError: If the file contains no variables
property valid_assets: list[str]#

Return the list of valid assets that have been parsed and validated

get_assets()#
clean_dataframe()#

Clean the dataframe by excluding invalid assets and removing duplicate entries.