access_nri_intake.experiment.utils
==================================

.. py:module:: access_nri_intake.experiment.utils


Exceptions
----------

.. autoapisummary::

   access_nri_intake.experiment.utils.DataStoreWarning
   access_nri_intake.experiment.utils.DataStoreError
   access_nri_intake.experiment.utils.MultipleDataStoreError


Classes
-------

.. autoapisummary::

   access_nri_intake.experiment.utils.DataStoreInvalidCause
   access_nri_intake.experiment.utils.DatastoreInfo


Functions
---------

.. autoapisummary::

   access_nri_intake.experiment.utils.verify_ds_current
   access_nri_intake.experiment.utils.hash_catalog
   access_nri_intake.experiment.utils.find_experiment_files
   access_nri_intake.experiment.utils.parse_kwarg
   access_nri_intake.experiment.utils.validate_args


Module Contents
---------------

.. py:exception:: DataStoreWarning

   Bases: :py:obj:`RuntimeWarning`


   Base class for warnings about dubious runtime behavior.


   ..
       !! processed by numpydoc !!


.. py:exception:: DataStoreError

   Bases: :py:obj:`RuntimeError`


   Unspecified run-time error.


   ..
       !! processed by numpydoc !!


.. py:exception:: MultipleDataStoreError

   Bases: :py:obj:`DataStoreError`


   Unspecified run-time error.


   ..
       !! processed by numpydoc !!


.. py:class:: DataStoreInvalidCause

   Bases: :py:obj:`str`, :py:obj:`enum.Enum`


   Enum to store the causes of invalid datastores.


   ..
       !! processed by numpydoc !!


   .. py:attribute:: NO_ISSUE
      :value: ''



   .. py:attribute:: UNKNOWN_ISSUE
      :value: 'unknown issue'



   .. py:attribute:: MISMATCH_NAME
      :value: 'mismatch between json and csv.gz file names'



   .. py:attribute:: JSON_CORRUPTED
      :value: 'datastore JSON corrupted'



   .. py:attribute:: PATH_MISMATCH
      :value: 'path in JSON does not match csv.gz'



   .. py:attribute:: CATALOG_MISMATCH
      :value: 'catalog_filename in JSON does not match csv.gz filename'



   .. py:attribute:: COLUMN_MISMATCH
      :value: 'columns specified in JSON do not match csv.gz file'


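   Because the enum mixes in ``str``, each member behaves like its string value.
   The following is a minimal, stand-alone sketch of that behaviour.
   ``InvalidCauseSketch`` is a hypothetical stand-in that copies two of the members
   listed above; it is not imported from the package.

   .. code-block:: python

      from enum import Enum


      class InvalidCauseSketch(str, Enum):
          """Hypothetical copy of two members, for illustration only."""

          NO_ISSUE = ""
          JSON_CORRUPTED = "datastore JSON corrupted"


      cause = InvalidCauseSketch.JSON_CORRUPTED

      # A str-mixin member compares equal to its plain string value.
      equals_plain_string = cause == "datastore JSON corrupted"

      # A member can be looked up from its value.
      looked_up = InvalidCauseSketch("datastore JSON corrupted")

      # Truthiness follows str rules, so the empty NO_ISSUE value is falsy.
      no_issue_is_falsy = not InvalidCauseSketch.NO_ISSUE

   Under this assumption, an empty cause can be tested with a plain truthiness
   check rather than an explicit comparison.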

.. py:class:: DatastoreInfo

   
   Dataclass to group json & csv file handles for a datastore, along with its
   validity and any straightforwardly identifiable issues with the datastore.
















   ..
       !! processed by numpydoc !!

   .. py:attribute:: json_handle
      :type:  pathlib.Path | str


   .. py:attribute:: csv_handle
      :type:  pathlib.Path | str


   .. py:attribute:: valid
      :type:  bool
      :value: True



   .. py:attribute:: invalid_ds_cause
      :type:  str
      :value: ''



   .. py:method:: match_broken_internal_path(ds_json)

      
      If the internal reference starts with ``file:///``, the rest of it must
      match the path of the csv file *exactly*, or the datastore will break
      when we try to open it.

      On Gadi, the internal reference typically looks like
      ``file:///path/filename.csv.gz``, so care is needed whenever a datastore
      is moved. When opening a datastore, intake_esm:

      - looks at ``ds_json["catalog_file"]`` and checks that it exists, using an
        fsspec ``get_mapper``;
      - if it does not exist, prepends the dirname of
        ``fsspec.get_mapper().root`` to the path, which produces a badly
        mangled path, something like
        ``/home/189/ct1163/experiments_274/file:///home/189/ct1163/test_datastore_built_in_homedir.csv.gz``.

      Note that the ``.name`` attribute of the ``Path`` objects can still match
      even when the handles are invalid, so comparing file names alone is not
      enough.

      :Parameters:

          **ds_json** : dict
              The json object of the datastore.



      :Returns:

          bool
              Whether the internal path is broken.

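      The check described above can be pictured with the following sketch. It
      is not the package's implementation: ``internal_path_is_broken_sketch``
      and the example paths are hypothetical, and POSIX-style paths are
      assumed.

      .. code-block:: python

         from pathlib import Path


         def internal_path_is_broken_sketch(ds_json: dict, csv_handle) -> bool:
             """Return True if a file:/// reference does not point at csv_handle."""
             catalog_file = ds_json["catalog_file"]
             if not catalog_file.startswith("file:///"):
                 # Not an absolute file URI, so this particular check does not apply.
                 return False
             # Drop "file://" but keep the third slash, which is the filesystem root.
             internal_path = Path(catalog_file[len("file://"):])
             # Compare full paths: comparing only .name would miss a moved datastore.
             return internal_path != Path(csv_handle)


         ds_json = {"catalog_file": "file:///data/exp/datastore.csv.gz"}

         # Same location: the internal reference is intact.
         same_place = internal_path_is_broken_sketch(
             ds_json, "/data/exp/datastore.csv.gz"
         )

         # Moved datastore: the file names still match, but the full paths do not.
         moved = internal_path_is_broken_sketch(
             ds_json, "/data/moved/datastore.csv.gz"
         )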










      ..
          !! processed by numpydoc !!


.. py:function:: verify_ds_current(ds_info, experiment_files)

   
   Verify if the datastore is current, testing for missing/extra files, and files
   that appear to have changed since the datastore was built.


   :Parameters:

       **ds_info** : DatastoreInfo
           The datastore information object.

       **experiment_files** : set[Path]
           The set of files found in the experiment directory, typically generated
           by the ``find_experiment_files`` function.



   :Returns:

       bool
           Whether the datastore is valid and up to date.

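   One way to picture the three checks (missing, extra, and changed files) is as
   set operations over per-file digests. The sketch below is not the package's
   implementation; ``datastore_is_current_sketch`` and both digest dictionaries
   are hypothetical.

   .. code-block:: python

      def datastore_is_current_sketch(recorded: dict, current: dict) -> bool:
          """Compare digests recorded at build time with digests computed now.

          Both arguments map a file path to a digest string.
          """
          # Files the datastore knows about that are no longer on disk.
          missing = recorded.keys() - current.keys()
          # Files on disk that the datastore has never seen.
          extra = current.keys() - recorded.keys()
          # Files present in both whose contents appear to have changed.
          changed = {
              path
              for path in recorded.keys() & current.keys()
              if recorded[path] != current[path]
          }
          return not (missing or extra or changed)


      recorded = {"/exp/a.nc": "111", "/exp/b.nc": "222"}

      unchanged = datastore_is_current_sketch(recorded, dict(recorded))
      with_extra_file = datastore_is_current_sketch(
          recorded, {**recorded, "/exp/c.nc": "333"}
      )
      with_changed_file = datastore_is_current_sketch(
          recorded, {"/exp/a.nc": "111", "/exp/b.nc": "999"}
      )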










   ..
       !! processed by numpydoc !!

.. py:function:: hash_catalog(catalog_dir, datastore_name, builder_instance)

   
   Use yamanifest to hash the files contained in the builder, and write the result to a
   ``.$datastore_name.hash`` file in ``catalog_dir``. This file is later used to check
   whether the datastore is current.

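   The idea of a hidden, per-datastore hash file can be sketched as follows. Note
   the assumptions: this sketch swaps in the standard library's ``hashlib`` and a
   JSON layout in place of yamanifest and its manifest format, and
   ``write_hash_file_sketch`` is a hypothetical name.

   .. code-block:: python

      import hashlib
      import json
      from pathlib import Path


      def write_hash_file_sketch(catalog_dir, datastore_name, files):
          """Hash each file and record the digests in a hidden file in catalog_dir."""
          digests = {
              path: hashlib.sha256(Path(path).read_bytes()).hexdigest()
              for path in sorted(str(f) for f in files)
          }
          # The leading dot keeps the hash file hidden next to the datastore.
          hash_file = Path(catalog_dir) / f".{datastore_name}.hash"
          hash_file.write_text(json.dumps(digests, indent=2))
          return hash_file

   Recomputing the digests later and comparing them with the stored ones is then
   enough to detect files that were added, removed, or modified.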















   ..
       !! processed by numpydoc !!

.. py:function:: find_experiment_files(builder, experiment_dir, builder_kwargs = None)

   
   Find all the relevant files in the experiment directory and return them as a set, using
   the builder.get_assets() method.


   :Parameters:

       **builder** : Builder
           The builder object that will be used to build the datastore.

       **experiment_dir** : Path
           The directory containing the experiment.

       **builder_kwargs** : dict, optional
           Any additional keyword arguments to pass to the builder.



   :Returns:

       set[str]
           A set of the full paths of the files in the experiment directory.











   ..
       !! processed by numpydoc !!

.. py:function:: parse_kwarg(kwarg)

   
   Builder kwargs can be passed as ``--builder-kwargs arg1=val1 arg2=val2`` etc.
   ``argparse.ArgumentParser.parse_args()`` returns these as a list of strings,
   e.g. ``['arg1=val1', 'arg2=val2']``. This function parses one of these strings
   into a tuple, which is later converted to a dictionary. Some additional type
   coercion is required to pass on non-string kwargs.

   The builders we use only take a path, a list of paths, or an ``ensemble``
   kwarg. ``ensemble`` is a boolean.

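   A minimal sketch of this kind of parsing is shown below. It is not the
   package's implementation: ``parse_kwarg_sketch`` is a hypothetical name, and
   the rule used to coerce ``ensemble`` to a boolean is an assumption.

   .. code-block:: python

      def parse_kwarg_sketch(kwarg: str) -> tuple:
          """Split one 'key=value' string into a (key, value) tuple."""
          key, separator, value = kwarg.partition("=")
          if not separator or not key:
              raise ValueError(f"expected 'key=value', got {kwarg!r}")
          if key == "ensemble":
              # Assumed coercion: only the text 'true' (any case) means True.
              return key, value.strip().lower() == "true"
          return key, value


      # argparse would hand over a list of strings such as this one.
      raw_kwargs = ["path=/data/exp", "ensemble=True"]

      # The tuples convert directly to a dictionary of builder kwargs.
      builder_kwargs = dict(parse_kwarg_sketch(item) for item in raw_kwargs)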














   ..
       !! processed by numpydoc !!

.. py:function:: validate_args(builder, builder_kwargs)

   
   Take a builder and validate the kwargs provided against the builder's signature.

   This is provided to smooth debugging when wrong kwargs are passed from the command
   line.

   :Parameters:

       **builder** : Builder
           The builder object that will be used to build the datastore.

       **builder_kwargs** : dict[str, Any]
           The keyword arguments to pass to the builder.



   :Returns:

       None
           ..




   :Raises:

       TypeError
           If the builder_kwargs do not match the builder's signature.

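   Validating kwargs against a signature can be sketched with the standard
   library's ``inspect`` module. This is an illustration under stated
   assumptions, not the package's implementation: ``DemoBuilder`` and
   ``validate_args_sketch`` are hypothetical.

   .. code-block:: python

      import inspect


      class DemoBuilder:
          """Hypothetical builder; only its __init__ signature matters here."""

          def __init__(self, path, ensemble=False):
              self.path = path
              self.ensemble = ensemble


      def validate_args_sketch(builder, builder_kwargs):
          """Raise TypeError if builder_kwargs cannot be bound to the signature."""
          signature = inspect.signature(builder)
          try:
              # bind_partial rejects unknown names but tolerates omitted ones.
              signature.bind_partial(**builder_kwargs)
          except TypeError as exc:
              raise TypeError(
                  f"{builder.__name__} does not accept these kwargs: {exc}"
              ) from exc


      # Passes silently: 'path' is a parameter of DemoBuilder.__init__.
      validate_args_sketch(DemoBuilder, {"path": "/data/exp"})

   Failing before the builder is constructed gives a clearer error message when
   a kwarg is misspelled on the command line.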






   ..
       !! processed by numpydoc !!

