access_nri_intake.source.builders
Builders for generating Intake-ESM datastores
Note: The {**default_kwargs, **kwargs} pattern is repeated throughout the builders. The default kwargs look very similar from one builder to the next, but they are not identical. Unifying them would take considerable extra effort and would probably leave the code more complex than the current duplication, so no deduplication is attempted.
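As a minimal sketch of the pattern mentioned in the note, the snippet below merges a dictionary of defaults with caller-supplied keyword arguments. The helper name `merge_kwargs` and the default values are hypothetical and chosen only for illustration; the keys mirror attribute names documented further down this page.

```python
def merge_kwargs(default_kwargs, **kwargs):
    """Return the defaults, overridden by any caller-supplied keyword arguments."""
    # Later entries win, so values in kwargs replace matching defaults.
    return {**default_kwargs, **kwargs}


# Hypothetical defaults, for illustration only.
example_defaults = {"depth": 3, "exclude_patterns": ["*restart*"]}

merged = merge_kwargs(example_defaults, depth=5, include_patterns=["*.nc"])
print(merged)
```

Because the merge builds a new dictionary, the defaults themselves are left unchanged, which matters when the same defaults are reused across calls.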
Classes
AccessOm2Builder | Intake-ESM datastore builder for ACCESS-OM2 COSIMA datasets
AccessOm3Builder | Intake-ESM datastore builder for ACCESS-OM3 COSIMA datasets
Mom6Builder | Intake-ESM datastore builder for MOM6 COSIMA datasets
AccessEsm15Builder | Intake-ESM datastore builder for ACCESS-ESM1.5 datasets
AccessCm2Builder | Intake-ESM datastore builder for ACCESS-CM2 datasets
AccessEsm16Builder | Intake-ESM datastore builder for ACCESS-ESM1.6 datasets
OnlineMltBuilder | Builder for the Mixed Layer Tracer Budget Diagnostics dataset
AccessCm3Builder | Intake-ESM datastore builder for ACCESS-CM3 datasets
ROMSBuilder | Intake-ESM datastore builder for ROMS datasets
WoaBuilder | Intake-ESM datastore builder for WOA datasets
Cmip6Builder | Intake-ESM datastore builder for CMIP6 datasets
Module Contents
- class access_nri_intake.source.builders.AccessOm2Builder(path, **kwargs)#
Bases: BaseBuilder
Intake-ESM datastore builder for ACCESS-OM2 COSIMA datasets
Initialise an AccessOm2Builder
- Parameters:
- path: str or list of str
Path or list of paths to crawl for assets/files.
- classmethod parser(file)#
Parse info from a file asset
- Parameters:
- file: str
The path to the file
- PATTERNS: list = ['*.nc']#
- paths#
- depth = 0#
- exclude_patterns = None#
- include_patterns = None#
- data_format = 'netcdf'#
- groupby_attrs = None#
- aggregations = None#
- storage_options = None#
- joblib_parallel_kwargs#
- parse()#
Parse metadata from assets.
- save(name, description, directory=None, use_parquet=False)#
Save datastore contents to a file.
- Parameters:
- name: str
The name of the file to save the datastore to.
- description: str
Detailed multi-line description of the collection.
- directory: str, optional
The directory to save the datastore to. If None, use the current directory.
- use_parquet: bool, optional
Whether to save the datastore as a parquet file. Defaults to False, which saves as a CSV file. Parquet is both faster and saves space, but unlike CSV is not human-readable.
- validate_parser()#
Run the parser on a single file and check the schema of the info being parsed
- build()#
Builds a datastore from a list of netCDF files or zarr stores.
- property columns_with_iterables#
Return a set of the columns that have iterables
- classmethod parse_filename_freq(filename, frequencies=FREQUENCIES)#
Parse an ACCESS model filename and return a file id and any time information
- Parameters:
- filename: str
The filename to parse with the extension removed
- frequencies: dict, optional
A dictionary of regex patterns to match against the filename to determine the frequency
- redaction_fill: str, optional
The character to replace time information with. Defaults to “X”
- Returns:
- frequency: str | None
The frequency of the file if available in the filename, otherwise None
- classmethod parse_ncfile(file, time_dim='time')#
Get Intake-ESM datastore entry info from a netcdf file
- Parameters:
- file: str
The path to the netcdf file
- time_dim: str
The name of the time dimension
- Returns:
- output_nc_info: _NCFileInfo
A dataclass containing the information parsed from the file
- Raises:
- EmptyFileError: If the file contains no variables
- property valid_assets: list[str]#
Return the list of valid assets that have been parsed and validated
- get_assets()#
- clean_dataframe()#
Clean the dataframe by excluding invalid assets and removing duplicate entries.
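The members documented above suggest a typical workflow: construct the builder with one or more paths, call build(), then save(). The function below is a minimal sketch of that flow using only the signatures listed on this page. The function name `build_om2_datastore` is hypothetical, and the import is done lazily so the sketch can be loaded without the package installed.

```python
def build_om2_datastore(paths, name, description, directory=None, use_parquet=False):
    """Crawl `paths` for netCDF assets and write an Intake-ESM datastore.

    Hypothetical helper: it only chains the documented constructor,
    build() and save() calls.
    """
    # Lazy import: requires access_nri_intake only when the function is called.
    from access_nri_intake.source.builders import AccessOm2Builder

    builder = AccessOm2Builder(path=paths)  # str or list of str
    builder.build()  # builds the datastore from the crawled files
    builder.save(
        name=name,
        description=description,
        directory=directory,  # None means the current directory
        use_parquet=use_parquet,  # False saves a CSV file
    )
    return builder
```

Leaving use_parquet at its default keeps the saved datastore human-readable as CSV; passing True trades that for a smaller, faster file.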
- class access_nri_intake.source.builders.AccessOm3Builder(path, **kwargs)#
Bases: BaseBuilder
Intake-ESM datastore builder for ACCESS-OM3 COSIMA datasets
Initialise an AccessOm3Builder
- Parameters:
- path: str or list of str
Path or list of paths to crawl for assets/files.
- classmethod parser(file)#
Parse info from a file asset
- Parameters:
- file: str
The path to the file
- PATTERNS: list = ['*.nc']#
- paths#
- depth = 0#
- exclude_patterns = None#
- include_patterns = None#
- data_format = 'netcdf'#
- groupby_attrs = None#
- aggregations = None#
- storage_options = None#
- joblib_parallel_kwargs#
- parse()#
Parse metadata from assets.
- save(name, description, directory=None, use_parquet=False)#
Save datastore contents to a file.
- Parameters:
- name: str
The name of the file to save the datastore to.
- description: str
Detailed multi-line description of the collection.
- directory: str, optional
The directory to save the datastore to. If None, use the current directory.
- use_parquet: bool, optional
Whether to save the datastore as a parquet file. Defaults to False, which saves as a CSV file. Parquet is both faster and saves space, but unlike CSV is not human-readable.
- validate_parser()#
Run the parser on a single file and check the schema of the info being parsed
- build()#
Builds a datastore from a list of netCDF files or zarr stores.
- property columns_with_iterables#
Return a set of the columns that have iterables
- classmethod parse_filename_freq(filename, frequencies=FREQUENCIES)#
Parse an ACCESS model filename and return a file id and any time information
- Parameters:
- filename: str
The filename to parse with the extension removed
- frequencies: dict, optional
A dictionary of regex patterns to match against the filename to determine the frequency
- redaction_fill: str, optional
The character to replace time information with. Defaults to “X”
- Returns:
- frequency: str | None
The frequency of the file if available in the filename, otherwise None
- classmethod parse_ncfile(file, time_dim='time')#
Get Intake-ESM datastore entry info from a netcdf file
- Parameters:
- file: str
The path to the netcdf file
- time_dim: str
The name of the time dimension
- Returns:
- output_nc_info: _NCFileInfo
A dataclass containing the information parsed from the file
- Raises:
- EmptyFileError: If the file contains no variables
- property valid_assets: list[str]#
Return the list of valid assets that have been parsed and validated
- get_assets()#
- clean_dataframe()#
Clean the dataframe by excluding invalid assets and removing duplicate entries.
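The parse_filename_freq entry above describes matching regex patterns against a filename to find its frequency and replacing time information with a fill character. The snippet below is a rough, self-contained illustration of that idea and not the library's implementation: the patterns, the frequency labels, the rule that treats any run of four or more digits as time information, and the name `sketch_parse_filename_freq` are all assumptions made for this example.

```python
import re

# Hypothetical patterns and labels, for illustration only; these are not
# the library's FREQUENCIES.
EXAMPLE_FREQUENCIES = {
    r"daily": "1day",
    r"month": "1mon",
    r"year": "1yr",
}


def sketch_parse_filename_freq(filename, frequencies=EXAMPLE_FREQUENCIES, redaction_fill="X"):
    """Return (file_id, frequency) for a filename with its extension removed."""
    frequency = None
    for pattern, label in frequencies.items():
        if re.search(pattern, filename):
            frequency = label
            break
    # Assumption: any run of four or more digits is time information; mask it.
    file_id = re.sub(r"\d{4,}", lambda m: redaction_fill * len(m.group()), filename)
    return file_id, frequency


print(sketch_parse_filename_freq("ocean_daily_19000101"))
print(sketch_parse_filename_freq("ocean_grid"))
```

Masking the time information means files that differ only by date share one file id, while the frequency falls back to None when no pattern matches.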
- class access_nri_intake.source.builders.Mom6Builder(path, **kwargs)#
Bases: BaseBuilder
Intake-ESM datastore builder for MOM6 COSIMA datasets
Initialise a Mom6Builder
- Parameters:
- path: str or list of str
Path or list of paths to crawl for assets/files.
- classmethod parser(file)#
Parse info from a file asset
- Parameters:
- file: str
The path to the file
- PATTERNS: list = ['*.nc']#
- paths#
- depth = 0#
- exclude_patterns = None#
- include_patterns = None#
- data_format = 'netcdf'#
- groupby_attrs = None#
- aggregations = None#
- storage_options = None#
- joblib_parallel_kwargs#
- parse()#
Parse metadata from assets.
- save(name, description, directory=None, use_parquet=False)#
Save datastore contents to a file.
- Parameters:
- name: str
The name of the file to save the datastore to.
- description: str
Detailed multi-line description of the collection.
- directory: str, optional
The directory to save the datastore to. If None, use the current directory.
- use_parquet: bool, optional
Whether to save the datastore as a parquet file. Defaults to False, which saves as a CSV file. Parquet is both faster and saves space, but unlike CSV is not human-readable.
- validate_parser()#
Run the parser on a single file and check the schema of the info being parsed
- build()#
Builds a datastore from a list of netCDF files or zarr stores.
- property columns_with_iterables#
Return a set of the columns that have iterables
- classmethod parse_filename_freq(filename, frequencies=FREQUENCIES)#
Parse an ACCESS model filename and return a file id and any time information
- Parameters:
- filename: str
The filename to parse with the extension removed
- frequencies: dict, optional
A dictionary of regex patterns to match against the filename to determine the frequency
- redaction_fill: str, optional
The character to replace time information with. Defaults to “X”
- Returns:
- frequency: str | None
The frequency of the file if available in the filename, otherwise None
- classmethod parse_ncfile(file, time_dim='time')#
Get Intake-ESM datastore entry info from a netcdf file
- Parameters:
- file: str
The path to the netcdf file
- time_dim: str
The name of the time dimension
- Returns:
- output_nc_info: _NCFileInfo
A dataclass containing the information parsed from the file
- Raises:
- EmptyFileError: If the file contains no variables
- property valid_assets: list[str]#
Return the list of valid assets that have been parsed and validated
- get_assets()#
- clean_dataframe()#
Clean the dataframe by excluding invalid assets and removing duplicate entries.
- class access_nri_intake.source.builders.AccessEsm15Builder(path, ensemble, **kwargs)#
Bases: BaseBuilder
Intake-ESM datastore builder for ACCESS-ESM1.5 datasets
Initialise an AccessEsm15Builder
- Parameters:
- path: str or list of str
Path or list of paths to crawl for assets/files.
- ensemble: boolean
Whether to treat each path as a separate member of an ensemble to join along a new member dimension
- classmethod parser(file)#
Parse info from a file asset
- Parameters:
- file: str
The path to the file
- PATTERNS: list = ['*.nc']#
- paths#
- depth = 0#
- exclude_patterns = None#
- include_patterns = None#
- data_format = 'netcdf'#
- groupby_attrs = None#
- aggregations = None#
- storage_options = None#
- joblib_parallel_kwargs#
- parse()#
Parse metadata from assets.
- save(name, description, directory=None, use_parquet=False)#
Save datastore contents to a file.
- Parameters:
- name: str
The name of the file to save the datastore to.
- description: str
Detailed multi-line description of the collection.
- directory: str, optional
The directory to save the datastore to. If None, use the current directory.
- use_parquet: bool, optional
Whether to save the datastore as a parquet file. Defaults to False, which saves as a CSV file. Parquet is both faster and saves space, but unlike CSV is not human-readable.
- validate_parser()#
Run the parser on a single file and check the schema of the info being parsed
- build()#
Builds a datastore from a list of netCDF files or zarr stores.
- property columns_with_iterables#
Return a set of the columns that have iterables
- classmethod parse_filename_freq(filename, frequencies=FREQUENCIES)#
Parse an ACCESS model filename and return a file id and any time information
- Parameters:
- filename: str
The filename to parse with the extension removed
- frequencies: dict, optional
A dictionary of regex patterns to match against the filename to determine the frequency
- redaction_fill: str, optional
The character to replace time information with. Defaults to “X”
- Returns:
- frequency: str | None
The frequency of the file if available in the filename, otherwise None
- classmethod parse_ncfile(file, time_dim='time')#
Get Intake-ESM datastore entry info from a netcdf file
- Parameters:
- file: str
The path to the netcdf file
- time_dim: str
The name of the time dimension
- Returns:
- output_nc_info: _NCFileInfo
A dataclass containing the information parsed from the file
- Raises:
- EmptyFileError: If the file contains no variables
- property valid_assets: list[str]#
Return the list of valid assets that have been parsed and validated
- get_assets()#
- clean_dataframe()#
Clean the dataframe by excluding invalid assets and removing duplicate entries.
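AccessEsm15Builder differs from the ocean builders in taking an ensemble flag, which controls whether each path is treated as a separate ensemble member. A minimal sketch of passing several member directories follows; the function name `build_esm15_ensemble` is hypothetical, and only the constructor, build() and save() signatures documented above are used.

```python
def build_esm15_ensemble(member_paths, name, description, directory=None):
    """Build one datastore from several paths, one path per ensemble member.

    Hypothetical helper for illustration.
    """
    # Lazy import: requires access_nri_intake only when the function is called.
    from access_nri_intake.source.builders import AccessEsm15Builder

    # ensemble=True: each path becomes a separate member, to be joined
    # along a new member dimension.
    builder = AccessEsm15Builder(path=list(member_paths), ensemble=True)
    builder.build()
    builder.save(name=name, description=description, directory=directory)
    return builder
```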
- class access_nri_intake.source.builders.AccessCm2Builder(path, ensemble, **kwargs)#
Bases: AccessEsm15Builder
Intake-ESM datastore builder for ACCESS-CM2 datasets
Initialise an AccessEsm15Builder
- Parameters:
- path: str or list of str
Path or list of paths to crawl for assets/files.
- ensemble: boolean
Whether to treat each path as a separate member of an ensemble to join along a new member dimension
- classmethod parser(file)#
Parse info from a file asset
- Parameters:
- file: str
The path to the file
- PATTERNS: list = ['*.nc']#
- paths#
- depth = 0#
- exclude_patterns = None#
- include_patterns = None#
- data_format = 'netcdf'#
- groupby_attrs = None#
- aggregations = None#
- storage_options = None#
- joblib_parallel_kwargs#
- parse()#
Parse metadata from assets.
- save(name, description, directory=None, use_parquet=False)#
Save datastore contents to a file.
- Parameters:
- name: str
The name of the file to save the datastore to.
- description: str
Detailed multi-line description of the collection.
- directory: str, optional
The directory to save the datastore to. If None, use the current directory.
- use_parquet: bool, optional
Whether to save the datastore as a parquet file. Defaults to False, which saves as a CSV file. Parquet is both faster and saves space, but unlike CSV is not human-readable.
- validate_parser()#
Run the parser on a single file and check the schema of the info being parsed
- build()#
Builds a datastore from a list of netCDF files or zarr stores.
- property columns_with_iterables#
Return a set of the columns that have iterables
- classmethod parse_filename_freq(filename, frequencies=FREQUENCIES)#
Parse an ACCESS model filename and return a file id and any time information
- Parameters:
- filename: str
The filename to parse with the extension removed
- frequencies: dict, optional
A dictionary of regex patterns to match against the filename to determine the frequency
- redaction_fill: str, optional
The character to replace time information with. Defaults to “X”
- Returns:
- frequency: str | None
The frequency of the file if available in the filename, otherwise None
- classmethod parse_ncfile(file, time_dim='time')#
Get Intake-ESM datastore entry info from a netcdf file
- Parameters:
- file: str
The path to the netcdf file
- time_dim: str
The name of the time dimension
- Returns:
- output_nc_info: _NCFileInfo
A dataclass containing the information parsed from the file
- Raises:
- EmptyFileError: If the file contains no variables
- property valid_assets: list[str]#
Return the list of valid assets that have been parsed and validated
- get_assets()#
- clean_dataframe()#
Clean the dataframe by excluding invalid assets and removing duplicate entries.
- class access_nri_intake.source.builders.AccessEsm16Builder(path, ensemble, **kwargs)#
Bases: AccessEsm15Builder
Intake-ESM datastore builder for ACCESS-ESM1.6 datasets
Initialise an AccessEsm15Builder
- Parameters:
- path: str or list of str
Path or list of paths to crawl for assets/files.
- ensemble: boolean
Whether to treat each path as a separate member of an ensemble to join along a new member dimension
- PATH_REGEX = '.*/output\\d+/([^/]*)(?:/[^/]*)?/.*\\.nc'#
- REALM_MAPPING#
- classmethod parser(file)#
Get the realm and member/experiment id from the file name
- PATTERNS: list = ['*.nc']#
- paths#
- depth = 0#
- exclude_patterns = None#
- include_patterns = None#
- data_format = 'netcdf'#
- groupby_attrs = None#
- aggregations = None#
- storage_options = None#
- joblib_parallel_kwargs#
- parse()#
Parse metadata from assets.
- save(name, description, directory=None, use_parquet=False)#
Save datastore contents to a file.
- Parameters:
- name: str
The name of the file to save the datastore to.
- description: str
Detailed multi-line description of the collection.
- directory: str, optional
The directory to save the datastore to. If None, use the current directory.
- use_parquet: bool, optional
Whether to save the datastore as a parquet file. Defaults to False, which saves as a CSV file. Parquet is both faster and saves space, but unlike CSV is not human-readable.
- validate_parser()#
Run the parser on a single file and check the schema of the info being parsed
- build()#
Builds a datastore from a list of netCDF files or zarr stores.
- property columns_with_iterables#
Return a set of the columns that have iterables
- classmethod parse_filename_freq(filename, frequencies=FREQUENCIES)#
Parse an ACCESS model filename and return a file id and any time information
- Parameters:
- filename: str
The filename to parse with the extension removed
- frequencies: dict, optional
A dictionary of regex patterns to match against the filename to determine the frequency
- redaction_fill: str, optional
The character to replace time information with. Defaults to “X”
- Returns:
- frequency: str | None
The frequency of the file if available in the filename, otherwise None
- classmethod parse_ncfile(file, time_dim='time')#
Get Intake-ESM datastore entry info from a netcdf file
- Parameters:
- file: str
The path to the netcdf file
- time_dim: str
The name of the time dimension
- Returns:
- output_nc_info: _NCFileInfo
A dataclass containing the information parsed from the file
- Raises:
- EmptyFileError: If the file contains no variables
- property valid_assets: list[str]#
Return the list of valid assets that have been parsed and validated
- get_assets()#
- clean_dataframe()#
Clean the dataframe by excluding invalid assets and removing duplicate entries.
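To see what the PATH_REGEX attribute of AccessEsm16Builder captures, the sketch below applies the same pattern to a few paths with Python's re module. The paths are hypothetical, and using re.match is an assumption; how the builder applies the pattern internally is not shown on this page.

```python
import re

# The documented pattern, written as a raw string so the backslashes are
# not doubled.
PATH_REGEX = r".*/output\d+/([^/]*)(?:/[^/]*)?/.*\.nc"

# Hypothetical paths, for illustration only.
examples = [
    "/scratch/exp/output000/ocean/ocean_month.nc",
    "/scratch/exp/output012/atmosphere/netCDF/fields.nc",
    "/scratch/exp/restart000/ocean/ocean_temp.nc",
]

captured = []
for path in examples:
    match = re.match(PATH_REGEX, path)
    # Group 1 is the directory directly below the outputNNN directory.
    captured.append(match.group(1) if match else None)

print(captured)
```

The optional non-capturing group allows one extra directory level between the captured directory and the file, and a path without an outputNNN directory does not match at all.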
- class access_nri_intake.source.builders.OnlineMltBuilder(path, ensemble, **kwargs)#
Bases: AccessEsm16Builder
Builder for the Mixed Layer Tracer Budget Diagnostics dataset located at /g/data/av17/access-nri/OM2/025deg_jra55_iaf_cycle6_online_mlt, generated by Ryan Holmes (ryan.holmes@bom.gov.au).
The dataset consists of a trimmed-down repeat of an existing experiment with additional diagnostics added.
These files are not added to the datastore:
- output*/o2i.nc: these files have no calendar attribute on the ‘time’ axis
Initialise an AccessEsm15Builder
- Parameters:
- path: str or list of str
Path or list of paths to crawl for assets/files.
- ensemble: boolean
Whether to treat each path as a separate member of an ensemble to join along a new member dimension
- PATH_REGEX = '.*/(?:output\\d+|post_processed_diags|.*)/([^/]*)(?:/[^/]*)?/.*\\.nc'#
- REALM_MAPPING#
- classmethod parser(file)#
Get the realm and member/experiment id from the file name
- PATTERNS: list = ['*.nc']#
- paths#
- depth = 0#
- exclude_patterns = None#
- include_patterns = None#
- data_format = 'netcdf'#
- groupby_attrs = None#
- aggregations = None#
- storage_options = None#
- joblib_parallel_kwargs#
- parse()#
Parse metadata from assets.
- save(name, description, directory=None, use_parquet=False)#
Save datastore contents to a file.
- Parameters:
- name: str
The name of the file to save the datastore to.
- description: str
Detailed multi-line description of the collection.
- directory: str, optional
The directory to save the datastore to. If None, use the current directory.
- use_parquet: bool, optional
Whether to save the datastore as a parquet file. Defaults to False, which saves as a CSV file. Parquet is both faster and saves space, but unlike CSV is not human-readable.
- validate_parser()#
Run the parser on a single file and check the schema of the info being parsed
- build()#
Builds a datastore from a list of netCDF files or zarr stores.
- property columns_with_iterables#
Return a set of the columns that have iterables
- classmethod parse_filename_freq(filename, frequencies=FREQUENCIES)#
Parse an ACCESS model filename and return a file id and any time information
- Parameters:
- filename: str
The filename to parse with the extension removed
- frequencies: dict, optional
A dictionary of regex patterns to match against the filename to determine the frequency
- redaction_fill: str, optional
The character to replace time information with. Defaults to “X”
- Returns:
- frequency: str | None
The frequency of the file if available in the filename, otherwise None
- classmethod parse_ncfile(file, time_dim='time')#
Get Intake-ESM datastore entry info from a netcdf file
- Parameters:
- file: str
The path to the netcdf file
- time_dim: str
The name of the time dimension
- Returns:
- output_nc_info: _NCFileInfo
A dataclass containing the information parsed from the file
- Raises:
- EmptyFileError: If the file contains no variables
- property valid_assets: list[str]#
Return the list of valid assets that have been parsed and validated
- get_assets()#
- clean_dataframe()#
Clean the dataframe by excluding invalid assets and removing duplicate entries.
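OnlineMltBuilder overrides PATH_REGEX so that directories other than outputNNN are accepted. The sketch below compares the two documented patterns on a single hypothetical path; as before, using re.match is an assumption about how the pattern is applied.

```python
import re

# Both patterns as documented on this page, written as raw strings.
ESM16_PATH_REGEX = r".*/output\d+/([^/]*)(?:/[^/]*)?/.*\.nc"
ONLINE_MLT_PATH_REGEX = (
    r".*/(?:output\d+|post_processed_diags|.*)/([^/]*)(?:/[^/]*)?/.*\.nc"
)

# Hypothetical path, for illustration only.
path = "/scratch/exp/post_processed_diags/ocean/budget.nc"

esm16_match = re.match(ESM16_PATH_REGEX, path)
mlt_match = re.match(ONLINE_MLT_PATH_REGEX, path)

print(esm16_match)  # no outputNNN directory, so the parent pattern fails
print(mlt_match.group(1))
```

Note that the final `.*` alternative in the overriding pattern accepts any directory name, so the pattern is considerably more permissive than the one it replaces.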
- class access_nri_intake.source.builders.AccessCm3Builder(path, **kwargs)#
Bases: BaseBuilder
Intake-ESM datastore builder for ACCESS-CM3 datasets
Initialise an AccessCm3Builder
- Parameters:
- path: str or list of str
Path or list of paths to crawl for assets/files.
- classmethod parser(file)#
Parse info from a file asset
- Parameters:
- file: str
The path to the file
- PATTERNS: list = ['*.nc']#
- paths#
- depth = 0#
- exclude_patterns = None#
- include_patterns = None#
- data_format = 'netcdf'#
- groupby_attrs = None#
- aggregations = None#
- storage_options = None#
- joblib_parallel_kwargs#
- parse()#
Parse metadata from assets.
- save(name, description, directory=None, use_parquet=False)#
Save datastore contents to a file.
- Parameters:
- name: str
The name of the file to save the datastore to.
- description: str
Detailed multi-line description of the collection.
- directory: str, optional
The directory to save the datastore to. If None, use the current directory.
- use_parquet: bool, optional
Whether to save the datastore as a parquet file. Defaults to False, which saves as a CSV file. Parquet is both faster and saves space, but unlike CSV is not human-readable.
- validate_parser()#
Run the parser on a single file and check the schema of the info being parsed
- build()#
Builds a datastore from a list of netCDF files or zarr stores.
- property columns_with_iterables#
Return a set of the columns that have iterables
- classmethod parse_filename_freq(filename, frequencies=FREQUENCIES)#
Parse an ACCESS model filename and return a file id and any time information
- Parameters:
- filename: str
The filename to parse with the extension removed
- frequencies: dict, optional
A dictionary of regex patterns to match against the filename to determine the frequency
- redaction_fill: str, optional
The character to replace time information with. Defaults to “X”
- Returns:
- frequency: str | None
The frequency of the file if available in the filename, otherwise None
- classmethod parse_ncfile(file, time_dim='time')#
Get Intake-ESM datastore entry info from a netcdf file
- Parameters:
- file: str
The path to the netcdf file
- time_dim: str
The name of the time dimension
- Returns:
- output_nc_info: _NCFileInfo
A dataclass containing the information parsed from the file
- Raises:
- EmptyFileError: If the file contains no variables
- property valid_assets: list[str]#
Return the list of valid assets that have been parsed and validated
- get_assets()#
- clean_dataframe()#
Clean the dataframe by excluding invalid assets and removing duplicate entries.
- class access_nri_intake.source.builders.ROMSBuilder(path, **kwargs)#
Bases: BaseBuilder
Intake-ESM datastore builder for ROMS datasets
See bkgf/ROMSIceShelf for details on the ROMSIceShelf model.
Initialise a ROMSBuilder
- Parameters:
- path: str or list of str
Path or list of paths to crawl for assets/files.
- classmethod parser(file)#
Parse info from a file asset
- Parameters:
- file: str
The path to the file
- PATTERNS: list = ['*.nc']#
- paths#
- depth = 0#
- exclude_patterns = None#
- include_patterns = None#
- data_format = 'netcdf'#
- groupby_attrs = None#
- aggregations = None#
- storage_options = None#
- joblib_parallel_kwargs#
- parse()#
Parse metadata from assets.
- save(name, description, directory=None, use_parquet=False)#
Save datastore contents to a file.
- Parameters:
- name: str
The name of the file to save the datastore to.
- description: str
Detailed multi-line description of the collection.
- directory: str, optional
The directory to save the datastore to. If None, use the current directory.
- use_parquet: bool, optional
Whether to save the datastore as a parquet file. Defaults to False, which saves as a CSV file. Parquet is both faster and saves space, but unlike CSV is not human-readable.
- validate_parser()#
Run the parser on a single file and check the schema of the info being parsed
- build()#
Builds a datastore from a list of netCDF files or zarr stores.
- property columns_with_iterables#
Return a set of the columns that have iterables
- classmethod parse_filename_freq(filename, frequencies=FREQUENCIES)#
Parse an ACCESS model filename and return a file id and any time information
- Parameters:
- filename: str
The filename to parse with the extension removed
- frequencies: dict, optional
A dictionary of regex patterns to match against the filename to determine the frequency
- redaction_fill: str, optional
The character to replace time information with. Defaults to “X”
- Returns:
- frequency: str | None
The frequency of the file if available in the filename, otherwise None
- classmethod parse_ncfile(file, time_dim='time')#
Get Intake-ESM datastore entry info from a netcdf file
- Parameters:
- file: str
The path to the netcdf file
- time_dim: str
The name of the time dimension
- Returns:
- output_nc_info: _NCFileInfo
A dataclass containing the information parsed from the file
- Raises:
- EmptyFileError: If the file contains no variables
- property valid_assets: list[str]#
Return the list of valid assets that have been parsed and validated
- get_assets()#
- clean_dataframe()#
Clean the dataframe by excluding invalid assets and removing duplicate entries.
- class access_nri_intake.source.builders.WoaBuilder(path, **kwargs)#
Bases: BaseBuilder
Intake-ESM datastore builder for WOA datasets
Initialise a WoaBuilder
- Parameters:
- path: str or list of str
Path or list of paths to crawl for assets/files.
- classmethod parser(file)#
Overwrite the parser method to add a grid id to the output dictionary.
- PATTERNS: list = ['*.nc']#
- paths#
- depth = 0#
- exclude_patterns = None#
- include_patterns = None#
- data_format = 'netcdf'#
- groupby_attrs = None#
- aggregations = None#
- storage_options = None#
- joblib_parallel_kwargs#
- parse()#
Parse metadata from assets.
- save(name, description, directory=None, use_parquet=False)#
Save datastore contents to a file.
- Parameters:
- name: str
The name of the file to save the datastore to.
- description: str
Detailed multi-line description of the collection.
- directory: str, optional
The directory to save the datastore to. If None, use the current directory.
- use_parquet: bool, optional
Whether to save the datastore as a parquet file. Defaults to False, which saves as a CSV file. Parquet is both faster and saves space, but unlike CSV is not human-readable.
- validate_parser()#
Run the parser on a single file and check the schema of the info being parsed
- build()#
Builds a datastore from a list of netCDF files or zarr stores.
- property columns_with_iterables#
Return a set of the columns that have iterables
- classmethod parse_filename_freq(filename, frequencies=FREQUENCIES)#
Parse an ACCESS model filename and return a file id and any time information
- Parameters:
- filename: str
The filename to parse with the extension removed
- frequencies: dict, optional
A dictionary of regex patterns to match against the filename to determine the frequency
- redaction_fill: str, optional
The character to replace time information with. Defaults to “X”
- Returns:
- frequency: str | None
The frequency of the file if available in the filename, otherwise None
- classmethod parse_ncfile(file, time_dim='time')#
Get Intake-ESM datastore entry info from a netcdf file
- Parameters:
- file: str
The path to the netcdf file
- time_dim: str
The name of the time dimension
- Returns:
- output_nc_info: _NCFileInfo
A dataclass containing the information parsed from the file
- Raises:
- EmptyFileError: If the file contains no variables
- property valid_assets: list[str]#
Return the list of valid assets that have been parsed and validated
- get_assets()#
- clean_dataframe()#
Clean the dataframe by excluding invalid assets and removing duplicate entries.
- class access_nri_intake.source.builders.Cmip6Builder(path, ensemble, **kwargs)#
Bases: BaseBuilder
Intake-ESM datastore builder for CMIP6 datasets
Initialise a Cmip6Builder
- Parameters:
- path: str or list of str
Path or list of paths to crawl for assets/files.
- ensemble: bool = True#
- classmethod parser(file)#
No need to do much here - just parse the netCDF file and return the info as a dictionary. The realm is obtained from the file metadata following ACCESS-NRI/access-nri-intake-catalog#478.
- PATTERNS: list = ['*.nc']#
- paths#
- depth = 0#
- exclude_patterns = None#
- include_patterns = None#
- data_format = 'netcdf'#
- groupby_attrs = None#
- aggregations = None#
- storage_options = None#
- joblib_parallel_kwargs#
- parse()#
Parse metadata from assets.
- save(name, description, directory=None, use_parquet=False)#
Save datastore contents to a file.
- Parameters:
- name: str
The name of the file to save the datastore to.
- description: str
Detailed multi-line description of the collection.
- directory: str, optional
The directory to save the datastore to. If None, use the current directory.
- use_parquet: bool, optional
Whether to save the datastore as a parquet file. Defaults to False, which saves as a CSV file. Parquet is both faster and saves space, but unlike CSV is not human-readable.
- validate_parser()#
Run the parser on a single file and check the schema of the info being parsed
- build()#
Builds a datastore from a list of netCDF files or zarr stores.
- property columns_with_iterables#
Return a set of the columns that have iterables
- classmethod parse_filename_freq(filename, frequencies=FREQUENCIES)#
Parse an ACCESS model filename and return a file id and any time information
- Parameters:
- filename: str
The filename to parse with the extension removed
- frequencies: dict, optional
A dictionary of regex patterns to match against the filename to determine the frequency
- redaction_fill: str, optional
The character to replace time information with. Defaults to “X”
- Returns:
- frequency: str | None
The frequency of the file if available in the filename, otherwise None
- classmethod parse_ncfile(file, time_dim='time')#
Get Intake-ESM datastore entry info from a netcdf file
- Parameters:
- file: str
The path to the netcdf file
- time_dim: str
The name of the time dimension
- Returns:
- output_nc_info: _NCFileInfo
A dataclass containing the information parsed from the file
- Raises:
- EmptyFileError: If the file contains no variables
- property valid_assets: list[str]#
Return the list of valid assets that have been parsed and validated
- get_assets()#
- clean_dataframe()#
Clean the dataframe by excluding invalid assets and removing duplicate entries.