access_nri_intake.source.builders
Builders for generating Intake-ESM datastores
Classes
- AccessOm2Builder: Intake-ESM datastore builder for ACCESS-OM2 COSIMA datasets
- AccessOm3Builder: Intake-ESM datastore builder for ACCESS-OM3 COSIMA datasets
- Mom6Builder: Intake-ESM datastore builder for MOM6 COSIMA datasets
- AccessEsm15Builder: Intake-ESM datastore builder for ACCESS-ESM1.5 datasets
- AccessCm2Builder: Intake-ESM datastore builder for ACCESS-CM2 datasets
Module Contents
- class access_nri_intake.source.builders.AccessOm2Builder(path, **kwargs)#
Bases:
BaseBuilder
Intake-ESM datastore builder for ACCESS-OM2 COSIMA datasets
Initialise an AccessOm2Builder
- Parameters:
- path: str or list of str
Path or list of paths to crawl for assets/files.
- PATTERNS#
- classmethod parser(file)#
Parse info from a file asset
- Parameters:
- file: str
The path to the file
- TIME_PARSER#
- paths#
- depth = 0#
- exclude_patterns = None#
- include_patterns = None#
- data_format = 'netcdf'#
- groupby_attrs = None#
- aggregations = None#
- storage_options = None#
- joblib_parallel_kwargs#
- parse()#
Parse metadata from assets.
- save(name, description, directory=None)#
Save datastore contents to a file.
- Parameters:
- name: str
The name of the file to save the datastore to.
- description: str
Detailed multi-line description of the collection.
- directory: str, optional
The directory to save the datastore to. If None, use the current directory.
- validate_parser()#
Run the parser on a single file and check the schema of the info being parsed
- build()#
Builds a datastore from a list of netCDF files or zarr stores.
- property columns_with_iterables#
Return a set of the columns that have iterables
- classmethod parse_filename(filename, patterns=None, frequencies=FREQUENCIES, redaction_fill='X')#
Parse an ACCESS model filename and return a file id and any time information
- Parameters:
- filename: str
The filename to parse with the extension removed
- patterns: list of str, optional
A list of regex patterns to match against the filename. If None, use the class PATTERNS
- frequencies: dict, optional
A dictionary of regex patterns to match against the filename to determine the frequency
- redaction_fill: str, optional
The character to replace time information with. Defaults to “X”
- Returns:
- file_id: str
The file id constructed by redacting time information and replacing characters that are not valid in a Python identifier with underscores
- timestamp: str | None
A string of the redacted time information (e.g. “1990-01”) if available, otherwise None
- frequency: str | None
The frequency of the file if available in the filename, otherwise None
- classmethod parse_ncfile(file, time_dim='time')#
Get Intake-ESM datastore entry info from a netcdf file
- Parameters:
- file: str
The path to the netcdf file
- time_dim: str
The name of the time dimension
- Returns:
- output_nc_info: _NCFileInfo
A dataclass containing the information parsed from the file
- Raises:
- EmptyFileError: If the file contains no variables
- get_assets()#
- clean_dataframe()#
Clean the dataframe by excluding invalid assets and removing duplicate entries.
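The `parse_filename` contract described above (redact time information, replace characters that are not valid in a Python identifier with underscores, and report a frequency when the filename contains one) can be illustrated with a small standalone sketch. This is not the library's implementation: the function name `sketch_parse_filename`, the single date pattern in `SKETCH_PATTERNS`, and the contents of `SKETCH_FREQUENCIES` are assumptions chosen purely for illustration.

```python
import re

# Hypothetical stand-ins for the class PATTERNS and the FREQUENCIES default.
SKETCH_PATTERNS = [r"\d{4}[-_]\d{2}"]  # matches e.g. "1990_01" or "1990-01"
SKETCH_FREQUENCIES = {"monthly": "1mon", "daily": "1day"}


def sketch_parse_filename(filename, patterns=None, frequencies=None, redaction_fill="X"):
    """Return (file_id, timestamp, frequency) for a filename without its extension."""
    patterns = SKETCH_PATTERNS if patterns is None else patterns
    frequencies = SKETCH_FREQUENCIES if frequencies is None else frequencies

    # Look for a frequency pattern anywhere in the filename.
    frequency = None
    for freq_pattern, label in frequencies.items():
        if re.search(freq_pattern, filename):
            frequency = label
            break

    # Redact the first time-like match, remembering what was removed.
    timestamp = None
    file_id = filename
    for pattern in patterns:
        match = re.search(pattern, file_id)
        if match:
            found = match.group(0)
            timestamp = found.replace("_", "-")
            redacted = re.sub(r"\d", redaction_fill, found)
            file_id = file_id[: match.start()] + redacted + file_id[match.end():]
            break

    # Replace characters that are not valid in a Python identifier.
    file_id = re.sub(r"[^0-9a-zA-Z_]", "_", file_id)
    return file_id, timestamp, frequency
```

Under these assumptions, `sketch_parse_filename("iceh.1990-01")` returns `("iceh_XXXX_XX", "1990-01", None)`: the date is redacted, the dot becomes an underscore, and no frequency is reported because none of the sketch's frequency patterns appear in the name.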
- class access_nri_intake.source.builders.AccessOm3Builder(path, **kwargs)#
Bases:
BaseBuilder
Intake-ESM datastore builder for ACCESS-OM3 COSIMA datasets
Initialise an AccessOm3Builder
- Parameters:
- path: str or list of str
Path or list of paths to crawl for assets/files.
- PATTERNS#
- classmethod parser(file)#
Parse info from a file asset
- Parameters:
- file: str
The path to the file
- TIME_PARSER#
- paths#
- depth = 0#
- exclude_patterns = None#
- include_patterns = None#
- data_format = 'netcdf'#
- groupby_attrs = None#
- aggregations = None#
- storage_options = None#
- joblib_parallel_kwargs#
- parse()#
Parse metadata from assets.
- save(name, description, directory=None)#
Save datastore contents to a file.
- Parameters:
- name: str
The name of the file to save the datastore to.
- description: str
Detailed multi-line description of the collection.
- directory: str, optional
The directory to save the datastore to. If None, use the current directory.
- validate_parser()#
Run the parser on a single file and check the schema of the info being parsed
- build()#
Builds a datastore from a list of netCDF files or zarr stores.
- property columns_with_iterables#
Return a set of the columns that have iterables
- classmethod parse_filename(filename, patterns=None, frequencies=FREQUENCIES, redaction_fill='X')#
Parse an ACCESS model filename and return a file id and any time information
- Parameters:
- filename: str
The filename to parse with the extension removed
- patterns: list of str, optional
A list of regex patterns to match against the filename. If None, use the class PATTERNS
- frequencies: dict, optional
A dictionary of regex patterns to match against the filename to determine the frequency
- redaction_fill: str, optional
The character to replace time information with. Defaults to “X”
- Returns:
- file_id: str
The file id constructed by redacting time information and replacing characters that are not valid in a Python identifier with underscores
- timestamp: str | None
A string of the redacted time information (e.g. “1990-01”) if available, otherwise None
- frequency: str | None
The frequency of the file if available in the filename, otherwise None
- classmethod parse_ncfile(file, time_dim='time')#
Get Intake-ESM datastore entry info from a netcdf file
- Parameters:
- file: str
The path to the netcdf file
- time_dim: str
The name of the time dimension
- Returns:
- output_nc_info: _NCFileInfo
A dataclass containing the information parsed from the file
- Raises:
- EmptyFileError: If the file contains no variables
- get_assets()#
- clean_dataframe()#
Clean the dataframe by excluding invalid assets and removing duplicate entries.
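The two steps that `clean_dataframe` is documented to perform, excluding invalid assets and removing duplicate entries, can be sketched without any dependencies. The real method operates on the builder's dataframe; the version below works on a plain list of dictionaries instead, and the function name `sketch_clean_records` and the `INVALID_ASSET` marker key are assumptions made for this illustration.

```python
def sketch_clean_records(records, invalid_key="INVALID_ASSET"):
    """Drop records flagged as invalid, then drop exact duplicates, keeping order."""
    seen = set()
    cleaned = []
    for record in records:
        # Step 1: skip anything the parser flagged as an invalid asset.
        if record.get(invalid_key):
            continue
        # Step 2: skip records whose contents were already seen.
        fingerprint = tuple(sorted(record.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(record)
    return cleaned


records = [
    {"path": "a.nc", "file_id": "ocean"},
    {"path": "a.nc", "file_id": "ocean"},      # exact duplicate
    {"INVALID_ASSET": "b.nc"},                 # flagged by the (hypothetical) parser
    {"path": "c.nc", "file_id": "ice"},
]
cleaned = sketch_clean_records(records)
```

Filtering before deduplicating means a flagged record never counts as the "first" copy of a valid one.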
- class access_nri_intake.source.builders.Mom6Builder(path, **kwargs)#
Bases:
BaseBuilder
Intake-ESM datastore builder for MOM6 COSIMA datasets
Initialise a Mom6Builder
- Parameters:
- path: str or list of str
Path or list of paths to crawl for assets/files.
- PATTERNS#
- TIME_PARSER#
- classmethod parser(file)#
Parse info from a file asset
- Parameters:
- file: str
The path to the file
- paths#
- depth = 0#
- exclude_patterns = None#
- include_patterns = None#
- data_format = 'netcdf'#
- groupby_attrs = None#
- aggregations = None#
- storage_options = None#
- joblib_parallel_kwargs#
- parse()#
Parse metadata from assets.
- save(name, description, directory=None)#
Save datastore contents to a file.
- Parameters:
- name: str
The name of the file to save the datastore to.
- description: str
Detailed multi-line description of the collection.
- directory: str, optional
The directory to save the datastore to. If None, use the current directory.
- validate_parser()#
Run the parser on a single file and check the schema of the info being parsed
- build()#
Builds a datastore from a list of netCDF files or zarr stores.
- property columns_with_iterables#
Return a set of the columns that have iterables
- classmethod parse_filename(filename, patterns=None, frequencies=FREQUENCIES, redaction_fill='X')#
Parse an ACCESS model filename and return a file id and any time information
- Parameters:
- filename: str
The filename to parse with the extension removed
- patterns: list of str, optional
A list of regex patterns to match against the filename. If None, use the class PATTERNS
- frequencies: dict, optional
A dictionary of regex patterns to match against the filename to determine the frequency
- redaction_fill: str, optional
The character to replace time information with. Defaults to “X”
- Returns:
- file_id: str
The file id constructed by redacting time information and replacing characters that are not valid in a Python identifier with underscores
- timestamp: str | None
A string of the redacted time information (e.g. “1990-01”) if available, otherwise None
- frequency: str | None
The frequency of the file if available in the filename, otherwise None
- classmethod parse_ncfile(file, time_dim='time')#
Get Intake-ESM datastore entry info from a netcdf file
- Parameters:
- file: str
The path to the netcdf file
- time_dim: str
The name of the time dimension
- Returns:
- output_nc_info: _NCFileInfo
A dataclass containing the information parsed from the file
- Raises:
- EmptyFileError: If the file contains no variables
- get_assets()#
- clean_dataframe()#
Clean the dataframe by excluding invalid assets and removing duplicate entries.
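The shape of the `parse_ncfile` contract above (return a dataclass of parsed information, raise an error when the file contains no variables) can be sketched as follows. Nothing here reads a real netCDF file: a dictionary mapping variable names to dimension tuples stands in for the opened dataset, and `SketchEmptyFileError`, `SketchFileInfo`, and `sketch_parse_file` are hypothetical names, not the library's `EmptyFileError` or `_NCFileInfo`.

```python
from dataclasses import dataclass, field


class SketchEmptyFileError(Exception):
    """Hypothetical stand-in for the EmptyFileError named above."""


@dataclass
class SketchFileInfo:
    """Hypothetical, much-reduced stand-in for _NCFileInfo."""
    path: str
    variables: list = field(default_factory=list)
    has_time: bool = False


def sketch_parse_file(path, dataset, time_dim="time"):
    """Summarise `dataset`, a dict mapping variable names to dimension tuples."""
    if not dataset:
        # Mirrors the documented behaviour: a file with no variables is an error.
        raise SketchEmptyFileError(f"{path} contains no variables")
    return SketchFileInfo(
        path=path,
        variables=sorted(dataset),
        has_time=any(time_dim in dims for dims in dataset.values()),
    )


info = sketch_parse_file("ocean.nc", {"temp": ("time", "lat"), "area": ("lat",)})
```

Raising a dedicated exception for empty files lets a caller tell "nothing to catalogue" apart from a genuine parsing failure.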
- class access_nri_intake.source.builders.AccessEsm15Builder(path, ensemble, **kwargs)#
Bases:
BaseBuilder
Intake-ESM datastore builder for ACCESS-ESM1.5 datasets
Initialise an AccessEsm15Builder
- Parameters:
- path: str or list of str
Path or list of paths to crawl for assets/files.
- ensemble: boolean
Whether to treat each path as a separate member of an ensemble to join along a new member dimension
- PATTERNS#
- classmethod parser(file)#
Parse info from a file asset
- Parameters:
- file: str
The path to the file
- TIME_PARSER#
- paths#
- depth = 0#
- exclude_patterns = None#
- include_patterns = None#
- data_format = 'netcdf'#
- groupby_attrs = None#
- aggregations = None#
- storage_options = None#
- joblib_parallel_kwargs#
- parse()#
Parse metadata from assets.
- save(name, description, directory=None)#
Save datastore contents to a file.
- Parameters:
- name: str
The name of the file to save the datastore to.
- description: str
Detailed multi-line description of the collection.
- directory: str, optional
The directory to save the datastore to. If None, use the current directory.
- validate_parser()#
Run the parser on a single file and check the schema of the info being parsed
- build()#
Builds a datastore from a list of netCDF files or zarr stores.
- property columns_with_iterables#
Return a set of the columns that have iterables
- classmethod parse_filename(filename, patterns=None, frequencies=FREQUENCIES, redaction_fill='X')#
Parse an ACCESS model filename and return a file id and any time information
- Parameters:
- filename: str
The filename to parse with the extension removed
- patterns: list of str, optional
A list of regex patterns to match against the filename. If None, use the class PATTERNS
- frequencies: dict, optional
A dictionary of regex patterns to match against the filename to determine the frequency
- redaction_fill: str, optional
The character to replace time information with. Defaults to “X”
- Returns:
- file_id: str
The file id constructed by redacting time information and replacing characters that are not valid in a Python identifier with underscores
- timestamp: str | None
A string of the redacted time information (e.g. “1990-01”) if available, otherwise None
- frequency: str | None
The frequency of the file if available in the filename, otherwise None
- classmethod parse_ncfile(file, time_dim='time')#
Get Intake-ESM datastore entry info from a netcdf file
- Parameters:
- file: str
The path to the netcdf file
- time_dim: str
The name of the time dimension
- Returns:
- output_nc_info: _NCFileInfo
A dataclass containing the information parsed from the file
- Raises:
- EmptyFileError: If the file contains no variables
- get_assets()#
- clean_dataframe()#
Clean the dataframe by excluding invalid assets and removing duplicate entries.
- class access_nri_intake.source.builders.AccessCm2Builder(path, ensemble, **kwargs)#
Bases:
AccessEsm15Builder
Intake-ESM datastore builder for ACCESS-CM2 datasets
Initialise an AccessCm2Builder
- Parameters:
- path: str or list of str
Path or list of paths to crawl for assets/files.
- ensemble: boolean
Whether to treat each path as a separate member of an ensemble to join along a new member dimension
- PATTERNS#
- classmethod parser(file)#
Parse info from a file asset
- Parameters:
- file: str
The path to the file
- TIME_PARSER#
- paths#
- depth = 0#
- exclude_patterns = None#
- include_patterns = None#
- data_format = 'netcdf'#
- groupby_attrs = None#
- aggregations = None#
- storage_options = None#
- joblib_parallel_kwargs#
- parse()#
Parse metadata from assets.
- save(name, description, directory=None)#
Save datastore contents to a file.
- Parameters:
- name: str
The name of the file to save the datastore to.
- description: str
Detailed multi-line description of the collection.
- directory: str, optional
The directory to save the datastore to. If None, use the current directory.
- validate_parser()#
Run the parser on a single file and check the schema of the info being parsed
- build()#
Builds a datastore from a list of netCDF files or zarr stores.
- property columns_with_iterables#
Return a set of the columns that have iterables
- classmethod parse_filename(filename, patterns=None, frequencies=FREQUENCIES, redaction_fill='X')#
Parse an ACCESS model filename and return a file id and any time information
- Parameters:
- filename: str
The filename to parse with the extension removed
- patterns: list of str, optional
A list of regex patterns to match against the filename. If None, use the class PATTERNS
- frequencies: dict, optional
A dictionary of regex patterns to match against the filename to determine the frequency
- redaction_fill: str, optional
The character to replace time information with. Defaults to “X”
- Returns:
- file_id: str
The file id constructed by redacting time information and replacing characters that are not valid in a Python identifier with underscores
- timestamp: str | None
A string of the redacted time information (e.g. “1990-01”) if available, otherwise None
- frequency: str | None
The frequency of the file if available in the filename, otherwise None
- classmethod parse_ncfile(file, time_dim='time')#
Get Intake-ESM datastore entry info from a netcdf file
- Parameters:
- file: str
The path to the netcdf file
- time_dim: str
The name of the time dimension
- Returns:
- output_nc_info: _NCFileInfo
A dataclass containing the information parsed from the file
- Raises:
- EmptyFileError: If the file contains no variables
- get_assets()#
- clean_dataframe()#
Clean the dataframe by excluding invalid assets and removing duplicate entries.