access_nri_intake.source.builders

Builders for generating Intake-ESM datastores

Classes#

AccessOm2Builder

Intake-ESM datastore builder for ACCESS-OM2 COSIMA datasets

AccessOm3Builder

Intake-ESM datastore builder for ACCESS-OM3 COSIMA datasets

Mom6Builder

Intake-ESM datastore builder for MOM6 COSIMA datasets

AccessEsm15Builder

Intake-ESM datastore builder for ACCESS-ESM1.5 datasets

AccessCm2Builder

Intake-ESM datastore builder for ACCESS-CM2 datasets

Module Contents#

class access_nri_intake.source.builders.AccessOm2Builder(path, **kwargs)#

Bases: BaseBuilder

Intake-ESM datastore builder for ACCESS-OM2 COSIMA datasets

Initialise an AccessOm2Builder

Parameters:
path: str or list of str

Path or list of paths to crawl for assets/files.

PATTERNS#
classmethod parser(file)#

Parse info from a file asset

Parameters:
file: str

The path to the file

TIME_PARSER#
paths#
depth = 0#
exclude_patterns = None#
include_patterns = None#
data_format = 'netcdf'#
groupby_attrs = None#
aggregations = None#
storage_options = None#
joblib_parallel_kwargs#
parse()#

Parse metadata from assets.

save(name, description, directory=None)#

Save datastore contents to a file.

Parameters:
name: str

The name of the file to save the datastore to.

description: str

Detailed multi-line description of the collection.

directory: str, optional

The directory to save the datastore to. If None, use the current directory.

validate_parser()#

Run the parser on a single file and check the schema of the info being parsed

build()#

Builds a datastore from a list of netCDF files or zarr stores.
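
A typical workflow creates a builder, calls build(), and then calls save(). The sketch below uses only the signatures documented on this page. The wrapper function build_om2_datastore and its argument values are hypothetical, and it assumes that access_nri_intake is installed and that the given paths contain ACCESS-OM2 output.

```python
def build_om2_datastore(paths, name, description, directory=None):
    """Hypothetical helper: build an Intake-ESM datastore and save it to disk."""
    # Imported lazily so the helper can be defined without the package installed.
    from access_nri_intake.source.builders import AccessOm2Builder

    builder = AccessOm2Builder(path=paths)  # path: str or list of str
    builder.build()  # build the datastore from the files found under `paths`
    builder.save(name=name, description=description, directory=directory)
    return builder
```

A call might look like build_om2_datastore("/path/to/output", name="my_experiment", description="Example run"), where every value is a placeholder. Leaving directory as None saves to the current directory, as described for save().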

property columns_with_iterables#

Return a set of the columns that have iterables

classmethod parse_filename(filename, patterns=None, frequencies=FREQUENCIES, redaction_fill='X')#

Parse an ACCESS model filename and return a file id and any time information

Parameters:
filename: str

The filename to parse with the extension removed

patterns: list of str, optional

A list of regex patterns to match against the filename. If None, use the class PATTERNS

frequencies: dict, optional

A dictionary of regex patterns to match against the filename to determine the frequency

redaction_fill: str, optional

The character to replace time information with. Defaults to “X”

Returns:
file_id: str

The file id, constructed by redacting time information and replacing characters that are not valid in a Python identifier with underscores

timestamp: str | None

A string of the redacted time information (e.g. “1990-01”) if available, otherwise None

frequency: str | None

The frequency of the file if available in the filename, otherwise None
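
The redaction described above can be illustrated with a small standalone sketch. This is not the library's implementation: the function sketch_parse_filename, the patterns in EXAMPLE_PATTERNS, and the frequency labels in EXAMPLE_FREQUENCIES are all assumptions made for illustration, and the real class-level PATTERNS and FREQUENCIES may differ.

```python
import re

# Hypothetical patterns for illustration only; the builders' real PATTERNS
# and FREQUENCIES are not reproduced here.
EXAMPLE_PATTERNS = [r"(\d{4}[-_]\d{2})"]  # matches e.g. "1990-01" or "1990_01"
EXAMPLE_FREQUENCIES = {r"monthly": "1mon", r"daily": "1day"}


def sketch_parse_filename(filename, patterns=EXAMPLE_PATTERNS,
                          frequencies=EXAMPLE_FREQUENCIES, redaction_fill="X"):
    """Return (file_id, timestamp, frequency) for a filename without extension."""
    # Take the frequency from the first frequency pattern that matches.
    frequency = None
    for pattern, label in frequencies.items():
        if re.search(pattern, filename):
            frequency = label
            break

    # Redact the first time-like match and remember what was redacted.
    timestamp = None
    file_id = filename
    for pattern in patterns:
        match = re.search(pattern, file_id)
        if match:
            timestamp = match.group(1)
            redaction = re.sub(r"\d", redaction_fill, timestamp)
            file_id = file_id[:match.start(1)] + redaction + file_id[match.end(1):]
            break

    # Replace characters that are not valid in a Python identifier.
    file_id = re.sub(r"[^0-9a-zA-Z_]", "_", file_id)
    return file_id, timestamp, frequency


print(sketch_parse_filename("ocean_monthly_1990-01"))
# ('ocean_monthly_XXXX_XX', '1990-01', '1mon')
```

Because the time information is replaced by a fixed fill character, files that differ only in their date share one file id, which is what allows them to be grouped together.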

classmethod parse_ncfile(file, time_dim='time')#

Get Intake-ESM datastore entry info from a netCDF file

Parameters:
file: str

The path to the netCDF file

time_dim: str

The name of the time dimension

Returns:
output_nc_info: _NCFileInfo

A dataclass containing the information parsed from the file

Raises:
EmptyFileError: If the file contains no variables
get_assets()#
clean_dataframe()#

Clean the dataframe by excluding invalid assets and removing duplicate entries.
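
The two cleaning steps can be pictured with a plain-Python sketch over a list of parsed records. This is an illustration under stated assumptions, not the library's code: the marker key INVALID_KEY and the function sketch_clean are hypothetical, and the real method operates on a dataframe rather than a list of dictionaries.

```python
# Hypothetical marker key; how the library flags unparseable files is not
# documented in this section.
INVALID_KEY = "INVALID_ASSET"


def sketch_clean(records):
    """Drop records flagged as invalid, then drop exact duplicates (order kept)."""
    seen = set()
    cleaned = []
    for record in records:
        if INVALID_KEY in record:
            continue  # the parser could not handle this file
        # Values must be hashable (e.g. strings) for this fingerprint to work.
        fingerprint = tuple(sorted(record.items()))
        if fingerprint in seen:
            continue  # an identical entry has already been kept
        seen.add(fingerprint)
        cleaned.append(record)
    return cleaned


rows = [
    {"path": "a.nc", "file_id": "a"},
    {"INVALID_ASSET": "bad.nc"},
    {"path": "a.nc", "file_id": "a"},
    {"path": "b.nc", "file_id": "b"},
]
print(sketch_clean(rows))
# [{'path': 'a.nc', 'file_id': 'a'}, {'path': 'b.nc', 'file_id': 'b'}]
```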

class access_nri_intake.source.builders.AccessOm3Builder(path, **kwargs)#

Bases: BaseBuilder

Intake-ESM datastore builder for ACCESS-OM3 COSIMA datasets

Initialise an AccessOm3Builder

Parameters:
path: str or list of str

Path or list of paths to crawl for assets/files.

PATTERNS#
classmethod parser(file)#

Parse info from a file asset

Parameters:
file: str

The path to the file

TIME_PARSER#
paths#
depth = 0#
exclude_patterns = None#
include_patterns = None#
data_format = 'netcdf'#
groupby_attrs = None#
aggregations = None#
storage_options = None#
joblib_parallel_kwargs#
parse()#

Parse metadata from assets.

save(name, description, directory=None)#

Save datastore contents to a file.

Parameters:
name: str

The name of the file to save the datastore to.

description: str

Detailed multi-line description of the collection.

directory: str, optional

The directory to save the datastore to. If None, use the current directory.

validate_parser()#

Run the parser on a single file and check the schema of the info being parsed

build()#

Builds a datastore from a list of netCDF files or zarr stores.

property columns_with_iterables#

Return a set of the columns that have iterables

classmethod parse_filename(filename, patterns=None, frequencies=FREQUENCIES, redaction_fill='X')#

Parse an ACCESS model filename and return a file id and any time information

Parameters:
filename: str

The filename to parse with the extension removed

patterns: list of str, optional

A list of regex patterns to match against the filename. If None, use the class PATTERNS

frequencies: dict, optional

A dictionary of regex patterns to match against the filename to determine the frequency

redaction_fill: str, optional

The character to replace time information with. Defaults to “X”

Returns:
file_id: str

The file id, constructed by redacting time information and replacing characters that are not valid in a Python identifier with underscores

timestamp: str | None

A string of the redacted time information (e.g. “1990-01”) if available, otherwise None

frequency: str | None

The frequency of the file if available in the filename, otherwise None

classmethod parse_ncfile(file, time_dim='time')#

Get Intake-ESM datastore entry info from a netCDF file

Parameters:
file: str

The path to the netCDF file

time_dim: str

The name of the time dimension

Returns:
output_nc_info: _NCFileInfo

A dataclass containing the information parsed from the file

Raises:
EmptyFileError: If the file contains no variables
get_assets()#
clean_dataframe()#

Clean the dataframe by excluding invalid assets and removing duplicate entries.

class access_nri_intake.source.builders.Mom6Builder(path, **kwargs)#

Bases: BaseBuilder

Intake-ESM datastore builder for MOM6 COSIMA datasets

Initialise a Mom6Builder

Parameters:
path: str or list of str

Path or list of paths to crawl for assets/files.

PATTERNS#
TIME_PARSER#
classmethod parser(file)#

Parse info from a file asset

Parameters:
file: str

The path to the file

paths#
depth = 0#
exclude_patterns = None#
include_patterns = None#
data_format = 'netcdf'#
groupby_attrs = None#
aggregations = None#
storage_options = None#
joblib_parallel_kwargs#
parse()#

Parse metadata from assets.

save(name, description, directory=None)#

Save datastore contents to a file.

Parameters:
name: str

The name of the file to save the datastore to.

description: str

Detailed multi-line description of the collection.

directory: str, optional

The directory to save the datastore to. If None, use the current directory.

validate_parser()#

Run the parser on a single file and check the schema of the info being parsed

build()#

Builds a datastore from a list of netCDF files or zarr stores.

property columns_with_iterables#

Return a set of the columns that have iterables

classmethod parse_filename(filename, patterns=None, frequencies=FREQUENCIES, redaction_fill='X')#

Parse an ACCESS model filename and return a file id and any time information

Parameters:
filename: str

The filename to parse with the extension removed

patterns: list of str, optional

A list of regex patterns to match against the filename. If None, use the class PATTERNS

frequencies: dict, optional

A dictionary of regex patterns to match against the filename to determine the frequency

redaction_fill: str, optional

The character to replace time information with. Defaults to “X”

Returns:
file_id: str

The file id, constructed by redacting time information and replacing characters that are not valid in a Python identifier with underscores

timestamp: str | None

A string of the redacted time information (e.g. “1990-01”) if available, otherwise None

frequency: str | None

The frequency of the file if available in the filename, otherwise None

classmethod parse_ncfile(file, time_dim='time')#

Get Intake-ESM datastore entry info from a netCDF file

Parameters:
file: str

The path to the netCDF file

time_dim: str

The name of the time dimension

Returns:
output_nc_info: _NCFileInfo

A dataclass containing the information parsed from the file

Raises:
EmptyFileError: If the file contains no variables
get_assets()#
clean_dataframe()#

Clean the dataframe by excluding invalid assets and removing duplicate entries.

class access_nri_intake.source.builders.AccessEsm15Builder(path, ensemble, **kwargs)#

Bases: BaseBuilder

Intake-ESM datastore builder for ACCESS-ESM1.5 datasets

Initialise an AccessEsm15Builder

Parameters:
path: str or list of str

Path or list of paths to crawl for assets/files.

ensemble: boolean

Whether to treat each path as a separate member of an ensemble to join along a new member dimension
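
As a hedged illustration of the ensemble flag, the sketch below passes one path per ensemble member and sets ensemble=True. The wrapper build_esm15_ensemble and its argument values are hypothetical, and it assumes that access_nri_intake is installed and that each path holds the output of one ACCESS-ESM1.5 member.

```python
def build_esm15_ensemble(member_paths, name, description, directory=None):
    """Hypothetical helper: build one datastore spanning several ensemble members."""
    # Imported lazily so the helper can be defined without the package installed.
    from access_nri_intake.source.builders import AccessEsm15Builder

    # With ensemble=True, each entry in member_paths is treated as a separate
    # member to be joined along a new member dimension.
    builder = AccessEsm15Builder(path=list(member_paths), ensemble=True)
    builder.build()
    builder.save(name=name, description=description, directory=directory)
    return builder
```

With ensemble=False, the same list of paths would instead be treated as a single set of assets rather than as separate members.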

PATTERNS#
classmethod parser(file)#

Parse info from a file asset

Parameters:
file: str

The path to the file

TIME_PARSER#
paths#
depth = 0#
exclude_patterns = None#
include_patterns = None#
data_format = 'netcdf'#
groupby_attrs = None#
aggregations = None#
storage_options = None#
joblib_parallel_kwargs#
parse()#

Parse metadata from assets.

save(name, description, directory=None)#

Save datastore contents to a file.

Parameters:
name: str

The name of the file to save the datastore to.

description: str

Detailed multi-line description of the collection.

directory: str, optional

The directory to save the datastore to. If None, use the current directory.

validate_parser()#

Run the parser on a single file and check the schema of the info being parsed

build()#

Builds a datastore from a list of netCDF files or zarr stores.

property columns_with_iterables#

Return a set of the columns that have iterables

classmethod parse_filename(filename, patterns=None, frequencies=FREQUENCIES, redaction_fill='X')#

Parse an ACCESS model filename and return a file id and any time information

Parameters:
filename: str

The filename to parse with the extension removed

patterns: list of str, optional

A list of regex patterns to match against the filename. If None, use the class PATTERNS

frequencies: dict, optional

A dictionary of regex patterns to match against the filename to determine the frequency

redaction_fill: str, optional

The character to replace time information with. Defaults to “X”

Returns:
file_id: str

The file id, constructed by redacting time information and replacing characters that are not valid in a Python identifier with underscores

timestamp: str | None

A string of the redacted time information (e.g. “1990-01”) if available, otherwise None

frequency: str | None

The frequency of the file if available in the filename, otherwise None

classmethod parse_ncfile(file, time_dim='time')#

Get Intake-ESM datastore entry info from a netCDF file

Parameters:
file: str

The path to the netCDF file

time_dim: str

The name of the time dimension

Returns:
output_nc_info: _NCFileInfo

A dataclass containing the information parsed from the file

Raises:
EmptyFileError: If the file contains no variables
get_assets()#
clean_dataframe()#

Clean the dataframe by excluding invalid assets and removing duplicate entries.

class access_nri_intake.source.builders.AccessCm2Builder(path, ensemble, **kwargs)#

Bases: AccessEsm15Builder

Intake-ESM datastore builder for ACCESS-CM2 datasets

Initialise an AccessEsm15Builder

Parameters:
path: str or list of str

Path or list of paths to crawl for assets/files.

ensemble: boolean

Whether to treat each path as a separate member of an ensemble to join along a new member dimension

PATTERNS#
classmethod parser(file)#

Parse info from a file asset

Parameters:
file: str

The path to the file

TIME_PARSER#
paths#
depth = 0#
exclude_patterns = None#
include_patterns = None#
data_format = 'netcdf'#
groupby_attrs = None#
aggregations = None#
storage_options = None#
joblib_parallel_kwargs#
parse()#

Parse metadata from assets.

save(name, description, directory=None)#

Save datastore contents to a file.

Parameters:
name: str

The name of the file to save the datastore to.

description: str

Detailed multi-line description of the collection.

directory: str, optional

The directory to save the datastore to. If None, use the current directory.

validate_parser()#

Run the parser on a single file and check the schema of the info being parsed

build()#

Builds a datastore from a list of netCDF files or zarr stores.

property columns_with_iterables#

Return a set of the columns that have iterables

classmethod parse_filename(filename, patterns=None, frequencies=FREQUENCIES, redaction_fill='X')#

Parse an ACCESS model filename and return a file id and any time information

Parameters:
filename: str

The filename to parse with the extension removed

patterns: list of str, optional

A list of regex patterns to match against the filename. If None, use the class PATTERNS

frequencies: dict, optional

A dictionary of regex patterns to match against the filename to determine the frequency

redaction_fill: str, optional

The character to replace time information with. Defaults to “X”

Returns:
file_id: str

The file id, constructed by redacting time information and replacing characters that are not valid in a Python identifier with underscores

timestamp: str | None

A string of the redacted time information (e.g. “1990-01”) if available, otherwise None

frequency: str | None

The frequency of the file if available in the filename, otherwise None

classmethod parse_ncfile(file, time_dim='time')#

Get Intake-ESM datastore entry info from a netCDF file

Parameters:
file: str

The path to the netCDF file

time_dim: str

The name of the time dimension

Returns:
output_nc_info: _NCFileInfo

A dataclass containing the information parsed from the file

Raises:
EmptyFileError: If the file contains no variables
get_assets()#
clean_dataframe()#

Clean the dataframe by excluding invalid assets and removing duplicate entries.