access_nri_intake.experiment.utils#
Exceptions#
DataStoreWarning | Base class for warnings about dubious runtime behavior.
DataStoreError | Unspecified run-time error.
MultipleDataStoreError | Unspecified run-time error.
Classes#
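DataStoreInvalidCause | Enum to store the causes of invalid datastores.
DatastoreInfo | Dataclass to group json & csv file handles for a datastore, along with its validity and any straightforwardly identifiable issues with the datastore.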
Functions#
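verify_ds_current(ds_info, experiment_files) | Verify if the datastore is current, testing for missing/extra files, and files that appear to have changed since the datastore was built.
hash_catalog(catalog_dir, datastore_name, builder_instance) | Use yamanifest to hash the files contained in the builder, and then store that in a .$datastore_name.hash file in the catalog_dir.
find_experiment_files(builder, experiment_dir, builder_kwargs=None) | Find all the relevant files in the experiment directory and return them as a set, using the builder.get_assets() method.
parse_kwarg(kwarg) | Builder kwargs can be passed as --builder-kwargs arg1=val1 arg2=val2 etc.
validate_args(builder, builder_kwargs) | Take a builder and validate the kwargs provided against the builder's signature.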
Module Contents#
- exception access_nri_intake.experiment.utils.DataStoreWarning#
Bases: RuntimeWarning
Base class for warnings about dubious runtime behavior.
Initialize self. See help(type(self)) for accurate signature.
- exception access_nri_intake.experiment.utils.DataStoreError#
Bases: RuntimeError
Unspecified run-time error.
Initialize self. See help(type(self)) for accurate signature.
- exception access_nri_intake.experiment.utils.MultipleDataStoreError#
Bases: DataStoreError
Unspecified run-time error.
Initialize self. See help(type(self)) for accurate signature.
- class access_nri_intake.experiment.utils.DataStoreInvalidCause#
Bases: str, enum.Enum
Enum to store the causes of invalid datastores.
Initialize self. See help(type(self)) for accurate signature.
- NO_ISSUE = ''#
- UNKNOWN_ISSUE = 'unknown issue'#
- MISMATCH_NAME = 'mismatch between json and csv.gz file names'#
- JSON_CORRUPTED = 'datastore JSON corrupted'#
- PATH_MISMATCH = 'path in JSON does not match csv.gz'#
- CATALOG_MISMATCH = 'catalog_filename in JSON does not match csv.gz filename'#
- COLUMN_MISMATCH = 'columns specified in JSON do not match csv.gz file'#
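Because the enum mixes in str, each member behaves like its message string. The following is a minimal sketch of that behaviour; it re-declares three of the members in a local stand-in class rather than importing the package, so it only illustrates the str/Enum pattern.

```python
from enum import Enum


class DataStoreInvalidCause(str, Enum):
    # Local stand-in mirroring three of the members listed above.
    NO_ISSUE = ""
    UNKNOWN_ISSUE = "unknown issue"
    JSON_CORRUPTED = "datastore JSON corrupted"


cause = DataStoreInvalidCause.JSON_CORRUPTED

# The str mixin means a member compares equal to its plain-string value.
is_same_as_text = cause == "datastore JSON corrupted"

# Looking a member up by value returns the same member object.
looked_up = DataStoreInvalidCause("unknown issue")

# NO_ISSUE is the empty string, so it is falsy in a boolean context.
has_issue = bool(DataStoreInvalidCause.NO_ISSUE)
```

One consequence of this design is that a cause can be stored in a plain str field and still be compared against the enum members.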
- class access_nri_intake.experiment.utils.DatastoreInfo#
Dataclass to group json & csv file handles for a datastore, along with its validity and any straightforwardly identifiable issues with the datastore.
- json_handle: pathlib.Path | str#
- csv_handle: pathlib.Path | str#
- valid: bool = True#
- invalid_ds_cause: str = ''#
- match_broken_internal_path(ds_json)#
If the internal reference starts with file:///, the rest of it must exactly match the path of the csv file, or the datastore will break when we try to open it.
The internal reference (on Gadi) typically has the form file:///path/filename.csv.gz, so care is needed if a datastore is moved. What intake_esm does is:
- look at ds_json["catalog_file"] and check that this exists, using an fsspec get_mapper.
- if it doesn't exist, prepend the dirname of fsspec.get_mapper().root to the path, which produces a badly mangled path, something like '/home/189/ct1163/experiments_274/file:///home/189/ct1163/test_datastore_built_in_homedir.csv.gz'.
We need to be careful here, because the .name attribute of the Path object might still match even though the handles are invalid.
- Parameters:
- ds_json : dict
The json object of the datastore.
- Returns:
- bool
Whether the internal path is broken.
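The check described above can be sketched roughly as follows. This is not the method's actual implementation: the function name internal_path_is_broken and the explicit csv_handle argument are assumptions made for illustration, and only the "catalog_file" key and the file:/// prefix come from the description.

```python
from pathlib import PurePosixPath

FILE_PREFIX = "file:///"


def internal_path_is_broken(ds_json: dict, csv_handle: PurePosixPath) -> bool:
    """Hypothetical check: True if a file:/// reference does not point at csv_handle."""
    catalog_file = ds_json.get("catalog_file", "")
    if not catalog_file.startswith(FILE_PREFIX):
        # References without the file:/// prefix are outside the scope of this check.
        return False
    # Drop "file://" but keep the third slash, leaving an absolute path.
    internal_path = PurePosixPath(catalog_file[len(FILE_PREFIX) - 1:])
    # Compare full paths: the file names alone can match after a move.
    return internal_path != PurePosixPath(csv_handle)


moved_json = {"catalog_file": "file:///old/location/ds.csv.gz"}

# Same file name, different directory: the reference is broken.
broken_after_move = internal_path_is_broken(
    moved_json, PurePosixPath("/new/location/ds.csv.gz")
)

# Reference and csv file agree exactly: nothing is broken.
broken_in_place = internal_path_is_broken(
    moved_json, PurePosixPath("/old/location/ds.csv.gz")
)
```

Comparing whole paths rather than .name is the point of the sketch, since both paths in the first call end in ds.csv.gz.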
- access_nri_intake.experiment.utils.verify_ds_current(ds_info, experiment_files)#
Verify if the datastore is current, testing for missing/extra files, and files that appear to have changed since the datastore was built.
- Parameters:
- ds_info : DatastoreInfo
The datastore information object.
- experiment_files : set[Path]
The set of files found in the experiment directory, typically generated by the find_experiment_files function.
- Returns:
- bool
Whether the datastore is valid and up to date.
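The three conditions tested here (missing, extra, and changed files) amount to comparing what was recorded at build time with what is on disk now. A minimal sketch of that comparison, assuming both sides are available as {path: hash} dictionaries; the function name classify_changes and the dictionary layout are hypothetical and not taken from the module.

```python
def classify_changes(recorded: dict, current: dict):
    """Compare two {path: hash} mappings and return (missing, extra, changed)."""
    # Recorded at build time but no longer present.
    missing = sorted(recorded.keys() - current.keys())
    # Present now but unknown when the datastore was built.
    extra = sorted(current.keys() - recorded.keys())
    # Present in both, but with a different hash.
    changed = sorted(
        path
        for path in recorded.keys() & current.keys()
        if recorded[path] != current[path]
    )
    return missing, extra, changed


recorded = {"a.nc": "hash-a", "b.nc": "hash-b", "c.nc": "hash-c"}
current = {"b.nc": "hash-b", "c.nc": "hash-c-modified", "d.nc": "hash-d"}

missing, extra, changed = classify_changes(recorded, current)

# The datastore counts as current only if all three lists are empty.
is_current = not (missing or extra or changed)
```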
- access_nri_intake.experiment.utils.hash_catalog(catalog_dir, datastore_name, builder_instance)#
Use yamanifest to hash the files contained in the builder, and then store the result in a .$datastore_name.hash file in the catalog_dir. This file is later used to check whether the datastore is current.
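To illustrate the idea of a hidden hash file next to the datastore, here is a rough sketch that substitutes hashlib (SHA-256) for yamanifest and writes the digests as JSON. The function name write_hash_file, the JSON layout, and the choice of SHA-256 are all assumptions; only the .$datastore_name.hash naming comes from the description above.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def write_hash_file(catalog_dir: Path, datastore_name: str, files: list) -> Path:
    """Hash each file and store the digests in .<datastore_name>.hash."""
    digests = {
        str(f): hashlib.sha256(Path(f).read_bytes()).hexdigest()
        for f in sorted(files)
    }
    hash_file = catalog_dir / f".{datastore_name}.hash"
    hash_file.write_text(json.dumps(digests, indent=2))
    return hash_file


with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    data_file = tmp_dir / "output.nc"
    data_file.write_bytes(b"example bytes")

    hash_path = write_hash_file(tmp_dir, "my_datastore", [data_file])
    hash_file_name = hash_path.name

    # Read the digests back, as a later currency check would.
    recorded = json.loads(hash_path.read_text())
    recorded_digest = recorded[str(data_file)]
```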
- access_nri_intake.experiment.utils.find_experiment_files(builder, experiment_dir, builder_kwargs=None)#
Find all the relevant files in the experiment directory and return them as a set, using the builder.get_assets() method.
- Parameters:
- builder : Builder
The builder object that will be used to build the datastore.
- experiment_dir : Path
The directory containing the experiment.
- builder_kwargs : dict, optional
Any additional keyword arguments to pass to the builder.
- Returns:
- set[str]
A set of the full paths of the files in the experiment directory.
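The pattern of instantiating a builder and collecting its assets into a set might look something like the sketch below. GlobBuilder is a hypothetical stand-in, not one of the package's builders, and the assumptions that the builder takes a path argument, that get_assets() returns the builder, and that results live in an assets attribute are illustrative only.

```python
import tempfile
from pathlib import Path


class GlobBuilder:
    """Hypothetical stand-in for a builder; not part of access_nri_intake."""

    def __init__(self, path: str, pattern: str = "*.nc"):
        self.path = Path(path)
        self.pattern = pattern
        self.assets = []

    def get_assets(self):
        # Collect every matching file below the experiment directory.
        self.assets = [str(p) for p in self.path.rglob(self.pattern)]
        return self


def find_files_sketch(builder, experiment_dir, builder_kwargs=None):
    """Instantiate the builder and return its assets as a set of path strings."""
    builder_kwargs = builder_kwargs or {}
    instance = builder(path=str(experiment_dir), **builder_kwargs)
    return set(instance.get_assets().assets)


with tempfile.TemporaryDirectory() as tmp:
    experiment_dir = Path(tmp)
    (experiment_dir / "output000").mkdir()
    (experiment_dir / "output000" / "ocean.nc").write_bytes(b"")
    (experiment_dir / "notes.txt").write_text("not a data file")

    found = find_files_sketch(GlobBuilder, experiment_dir)
    found_names = {Path(f).name for f in found}
```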
- access_nri_intake.experiment.utils.parse_kwarg(kwarg)#
Builder kwargs can be passed as --builder-kwargs arg1=val1 arg2=val2 etc. The argparse.parse_args() function will return these as a list of strings, e.g. ['arg1=val1', 'arg2=val2']. This function parses one of these strings into a tuple, which is later converted to a dictionary. Passing non-string kwargs requires some additional type coercion.
The builders we use only take either a path, list of paths, or an ensemble kwarg. Ensemble is a boolean.
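A minimal sketch of this kind of parsing, assuming a simple key=value split and boolean coercion for values such as ensemble. The names parse_kwarg_sketch and coerce_value are hypothetical, and the real function's coercion rules may differ.

```python
def parse_kwarg_sketch(kwarg: str) -> tuple:
    """Split one 'key=value' string into a (key, value) tuple."""
    key, sep, value = kwarg.partition("=")
    if not sep or not key:
        raise ValueError(f"expected key=value, got {kwarg!r}")
    return key, value


def coerce_value(value: str):
    """Turn 'true'/'false' (any case) into a bool; leave other strings alone."""
    lowered = value.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    return value


# What argparse would hand over for: --builder-kwargs arg1=val1 ensemble=True
raw_args = ["arg1=val1", "ensemble=True"]

builder_kwargs = {
    key: coerce_value(value)
    for key, value in map(parse_kwarg_sketch, raw_args)
}
```

Using partition rather than split keeps any further = characters inside the value, which matters for paths or expressions that contain them.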
- access_nri_intake.experiment.utils.validate_args(builder, builder_kwargs)#
Take a builder and validate the kwargs provided against the builder’s signature.
This is provided to smooth debugging when wrong kwargs are passed from the command line.
- Parameters:
- builder : Builder
The builder object that will be used to build the datastore.
- builder_kwargs : dict[str, Any]
The keyword arguments to pass to the builder.
- Returns:
- None
- Raises:
- TypeError
If the builder_kwargs do not match the builder’s signature.
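One common way to check kwargs against a signature is inspect.signature(...).bind(...), which raises TypeError on a mismatch. The sketch below uses that approach with a hypothetical ExampleBuilder class; it is an assumption that the real function works along these lines.

```python
import inspect


class ExampleBuilder:
    """Hypothetical builder used only to provide a signature to check against."""

    def __init__(self, path: str, ensemble: bool = False):
        self.path = path
        self.ensemble = ensemble


def validate_args_sketch(builder, builder_kwargs: dict) -> None:
    """Raise TypeError if builder_kwargs cannot be bound to the builder's signature."""
    signature = inspect.signature(builder)
    # bind() applies the normal call rules without actually calling the builder.
    signature.bind(**builder_kwargs)


# Matching kwargs pass silently.
validate_args_sketch(ExampleBuilder, {"path": "/data/expt", "ensemble": True})

# A misspelled keyword is reported before any building starts.
try:
    validate_args_sketch(ExampleBuilder, {"paht": "/data/expt"})
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```

Binding instead of instantiating means the mistake surfaces without running any of the builder's setup code.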