access_nri_intake.experiment.utils#

Exceptions#

DataStoreWarning

Base class for warnings about dubious runtime behavior.

DataStoreError

Unspecified run-time error.

MultipleDataStoreError

Unspecified run-time error.

Classes#

DataStoreInvalidCause

Enum to store the causes of invalid datastores.

DatastoreInfo

Dataclass to group json & csv file handles for a datastore, along with its validity and any straightforwardly identifiable issues with the datastore.

Functions#

verify_ds_current(ds_info, experiment_files)

Verify if the datastore is current, testing for missing/extra files, and files that appear to have changed since the datastore was built.

hash_catalog(catalog_dir, datastore_name, builder_instance)

Use yamanifest to hash the files contained in the builder, and then stick that in a .$datastore_name.hash file in the catalog_dir.

find_experiment_files(builder, experiment_dir[, ...])

Find all the relevant files in the experiment directory and return them as a set, using the builder.get_assets() method.

parse_kwarg(kwarg)

Builder kwargs can be passed as --builder-kwargs arg1=val1 arg2=val2 etc.

validate_args(builder, builder_kwargs)

Take a builder and validate the kwargs provided against the builder's signature.

Module Contents#

exception access_nri_intake.experiment.utils.DataStoreWarning#

Bases: RuntimeWarning

Base class for warnings about dubious runtime behavior.

exception access_nri_intake.experiment.utils.DataStoreError#

Bases: RuntimeError

Unspecified run-time error.

exception access_nri_intake.experiment.utils.MultipleDataStoreError#

Bases: DataStoreError

Unspecified run-time error.

class access_nri_intake.experiment.utils.DataStoreInvalidCause#

Bases: str, enum.Enum

Enum to store the causes of invalid datastores.

NO_ISSUE = ''#
UNKNOWN_ISSUE = 'unknown issue'#
MISMATCH_NAME = 'mismatch between json and csv.gz file names'#
JSON_CORRUPTED = 'datastore JSON corrupted'#
PATH_MISMATCH = 'path in JSON does not match csv.gz'#
CATALOG_MISMATCH = 'catalog_filename in JSON does not match csv.gz filename'#
COLUMN_MISMATCH = 'columns specified in JSON do not match csv.gz file'#
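
Because the enum also derives from str, its members behave like ordinary strings. The sketch below uses a cut-down, hypothetical copy of the enum (the class name `Cause` is an assumption made for illustration; the member values are copied from the list above) to show what that implies for comparisons and lookups.

```python
from enum import Enum


class Cause(str, Enum):
    """Cut-down, hypothetical copy of DataStoreInvalidCause for illustration."""

    NO_ISSUE = ""
    UNKNOWN_ISSUE = "unknown issue"
    JSON_CORRUPTED = "datastore JSON corrupted"


# Members are str instances, so they compare equal to plain strings.
matches_plain_string = Cause.JSON_CORRUPTED == "datastore JSON corrupted"

# Looking a member up by its value returns the member itself.
looked_up = Cause("unknown issue")
```

This is presumably why a field typed as a plain str, such as invalid_ds_cause below, can hold one of these members.
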
class access_nri_intake.experiment.utils.DatastoreInfo#

Dataclass to group json & csv file handles for a datastore, along with its validity and any straightforwardly identifiable issues with the datastore.

json_handle: pathlib.Path | str#
csv_handle: pathlib.Path | str#
valid: bool = True#
invalid_ds_cause: str = ''#
match_broken_internal_path(ds_json)#

If our internal reference starts with file:///, then we need to ensure that the rest of this perfectly matches the csv file or the datastore will break when we try to open it.

The internal reference (on Gadi) typically starts with file:///path/filename.csv.gz. What this means is that we might need to be careful if a datastore is moved. What intake_esm does is:

  • Look at ds_json["catalog_file"] and check that this exists, using a fsspec get_mapper.

  • If it doesn't exist, then it prepends the dirname of fsspec.get_mapper().root to the path, which winds up creating a horrendously bundled path, something like '/home/189/ct1163/experiments_274/file:///home/189/ct1163/test_datastore_built_in_homedir.csv.gz'.

We need to be careful, because here the .name attribute of the Path object might still match, even though the handles are invalid.

Parameters:
ds_json : dict

The json object of the datastore.

Returns:
bool

Whether the internal path is broken.
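
The check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the function name `internal_path_is_broken` and its `csv_handle` argument are hypothetical, and the sketch assumes the reference is stored under the `catalog_file` key mentioned above.

```python
from pathlib import Path


def internal_path_is_broken(ds_json: dict, csv_handle: Path) -> bool:
    """Hypothetical helper: True if a file:/// reference does not match csv_handle."""
    catalog_file = ds_json.get("catalog_file", "")
    scheme = "file://"
    if not catalog_file.startswith(scheme + "/"):
        # Assumption of this sketch: only file:/// references are checked.
        return False
    # Drop the scheme, leaving an absolute path such as /path/filename.csv.gz.
    internal_path = Path(catalog_file[len(scheme):])
    # Compare the full paths. Comparing only .name would still match after
    # the datastore has been moved to another directory.
    return internal_path != Path(csv_handle)
```

For a datastore moved from /old/dir to /new/dir, the file names agree but the full paths do not, so the sketch reports the reference as broken.
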

access_nri_intake.experiment.utils.verify_ds_current(ds_info, experiment_files)#

Verify if the datastore is current, testing for missing/extra files, and files that appear to have changed since the datastore was built.

Parameters:
ds_info : DatastoreInfo

The datastore information object.

experiment_files : set[Path]

The set of files found in the experiment directory. These are typically going to be generated by the find_experiment_files function.

Returns:
bool

Whether the datastore is valid and up to date.
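
The three conditions tested (missing files, extra files, and changed files) can be illustrated with plain set operations. The helpers below are hypothetical and do not mirror the real signature, which takes a DatastoreInfo and a set of paths. They assume that a mapping from file path to content hash is available for both the recorded state and the current state.

```python
def diff_files(recorded: dict, current: dict) -> tuple:
    """Hypothetical helper comparing two {path: hash} mappings."""
    # Recorded when the datastore was built, but no longer on disk.
    missing = set(recorded) - set(current)
    # On disk now, but unknown to the datastore.
    extra = set(current) - set(recorded)
    # Present in both, but with a different hash.
    changed = {
        path
        for path in set(recorded) & set(current)
        if recorded[path] != current[path]
    }
    return missing, extra, changed


def is_current(recorded: dict, current: dict) -> bool:
    """True only if no file is missing, extra, or changed."""
    missing, extra, changed = diff_files(recorded, current)
    return not (missing or extra or changed)
```
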

access_nri_intake.experiment.utils.hash_catalog(catalog_dir, datastore_name, builder_instance)#

Use yamanifest to hash the files contained in the builder, and then stick that in a .$datastore_name.hash file in the catalog_dir. This will be used to check if the datastore is current.
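
As a rough illustration of the idea, the sketch below hashes a list of files and writes the result to a hidden .$datastore_name.hash file. It deliberately substitutes the standard-library hashlib for yamanifest, and both the function name `write_hash_file` and the JSON layout of the hash file are assumptions. The format actually written by hash_catalog may differ.

```python
import hashlib
import json
from pathlib import Path


def write_hash_file(catalog_dir: Path, datastore_name: str, files: list) -> Path:
    """Hypothetical stand-in for hash_catalog, using hashlib instead of yamanifest."""
    hashes = {}
    for file in sorted(files):
        # Hash the file contents, so any edit to the file changes the digest.
        hashes[str(file)] = hashlib.sha256(Path(file).read_bytes()).hexdigest()
    # The leading dot makes this a hidden file inside catalog_dir.
    hash_file = Path(catalog_dir) / f".{datastore_name}.hash"
    hash_file.write_text(json.dumps(hashes, indent=2))
    return hash_file
```

Comparing a freshly computed set of hashes against the stored file is then enough to tell whether any input has changed since the datastore was built.
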

access_nri_intake.experiment.utils.find_experiment_files(builder, experiment_dir, builder_kwargs=None)#

Find all the relevant files in the experiment directory and return them as a set, using the builder.get_assets() method.

Parameters:
builder : Builder

The builder object that will be used to build the datastore.

experiment_dir : Path

The directory containing the experiment.

builder_kwargs : dict, optional

Any additional keyword arguments to pass to the builder.

Returns:
set[str]

A set of the full paths of the files in the experiment directory.

access_nri_intake.experiment.utils.parse_kwarg(kwarg)#

Builder kwargs can be passed as --builder-kwargs arg1=val1 arg2=val2 etc. The argparse.parse_args() function will return these as a list of strings, e.g. ['arg1=val1', 'arg2=val2']. This function parses one of these strings into a tuple, which is later converted to a dictionary. Additional type coercion is required to pass on non-string kwargs.

The builders we use only take a path, a list of paths, or an ensemble kwarg. Ensemble is a boolean.
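
A minimal sketch of this kind of parsing is shown below. It is not the library's code: the name `parse_kwarg_sketch`, the accepted boolean spellings, and the error messages are all assumptions. Only the ensemble kwarg is coerced here, and list-valued kwargs are left out.

```python
def parse_kwarg_sketch(kwarg: str) -> tuple:
    """Hypothetical parser turning 'key=value' into a (key, value) tuple."""
    key, sep, value = kwarg.partition("=")
    if not sep or not key:
        raise ValueError(f"Expected 'key=value', got {kwarg!r}")
    if key == "ensemble":
        # ensemble is a boolean, so coerce the string form explicitly.
        lowered = value.strip().lower()
        if lowered in ("true", "1", "yes"):
            return key, True
        if lowered in ("false", "0", "no"):
            return key, False
        raise ValueError(f"Cannot interpret {value!r} as a boolean")
    # Everything else is passed on as a string.
    return key, value


# The list of tuples converts directly into a kwargs dictionary.
raw_args = ["path=/data/exp", "ensemble=True"]
builder_kwargs = dict(parse_kwarg_sketch(arg) for arg in raw_args)
```

Explicit coercion matters because bool("False") is True in Python, so passing the raw string through would silently enable the option.
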

access_nri_intake.experiment.utils.validate_args(builder, builder_kwargs)#

Take a builder and validate the kwargs provided against the builder’s signature.

This is provided to smooth debugging when wrong kwargs are passed from the command line.

Parameters:
builder : Builder

The builder object that will be used to build the datastore.

builder_kwargs : dict[str, Any]

The keyword arguments to pass to the builder.

Returns:
None
Raises:
TypeError

If the builder_kwargs do not match the builder’s signature.
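
One way to perform this kind of validation is with the standard-library inspect module, as sketched below. The names `ExampleBuilder` and `validate_args_sketch` are hypothetical, and the real function may check the signature differently. In particular, this sketch only rejects unexpected keyword names and does not check for missing required arguments.

```python
import inspect


class ExampleBuilder:
    """Hypothetical builder used only to demonstrate the check."""

    def __init__(self, path, ensemble=False):
        self.path = path
        self.ensemble = ensemble


def validate_args_sketch(builder, builder_kwargs: dict) -> None:
    """Raise TypeError if builder_kwargs contains names the builder does not accept."""
    signature = inspect.signature(builder)
    try:
        # bind_partial allows omitted arguments but rejects unknown names.
        signature.bind_partial(**builder_kwargs)
    except TypeError as exc:
        raise TypeError(
            f"Invalid kwargs for {builder.__name__}: {exc}. "
            f"Expected signature: {builder.__name__}{signature}"
        ) from exc
```

Including the expected signature in the message lets a mistyped command-line kwarg be spotted without reading the builder's source.
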