Adding sources#
The access_nri_intake.catalog sub-package contains tools to create/extend/trim
intake-dataframe-catalogs of Intake-ESM datastores. The access_nri_intake.catalog.manager.CatalogManager
class can be used to create a new intake-dataframe-catalog or load an existing one. Intake-ESM datastore
sources can be built (using an access_nri_intake.source.builders Builder) or loaded and then added to the
catalog. Translators are specified to translate the metadata in source datastores to be compatible with the
catalog schema.
When access_nri_intake.catalog is first imported, it downloads and parses a specific commit of the
schema at ACCESS-NRI/schema. Two versions of the schema are kept. The raw schema is stored in the variable
access_nri_intake.catalog.EXP_JSONSCHEMA (more on this later). A second version, in which the “required”
field is replaced with access_nri_intake.catalog.CORE_COLUMNS so that this field can be customized, is
stored in the variable access_nri_intake.catalog.CATALOG_JSONSCHEMA. The latter defines what metadata must
be included in the intake-dataframe-catalog, and what types and fields are allowed. Subsequent imports read
the downloaded schema, unless the schema is changed (see Catalog schema), in which case the new schema is
downloaded.
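The relationship between the raw schema and the catalog schema can be sketched as below. This is only an illustration: the schema fragment, the list of core columns and the helper `with_required` are all made up for the example, and the real contents of EXP_JSONSCHEMA and CORE_COLUMNS are defined by the package.

```python
import copy

# Hypothetical stand-in for the raw downloaded schema (not the real schema).
raw_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "model": {"type": "array", "items": {"type": "string"}},
        "description": {"type": "string"},
    },
    "required": ["name", "model", "description"],
}

# Hypothetical stand-in for the customizable list of core columns.
core_columns = ["name", "model"]


def with_required(schema, required):
    """Return a copy of `schema` whose "required" field is `required`."""
    derived = copy.deepcopy(schema)  # leave the raw schema untouched
    derived["required"] = list(required)
    return derived


catalog_schema = with_required(raw_schema, core_columns)
```

Copying before modifying keeps the raw schema available unchanged, which mirrors the text's description of two separate variables.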
Translators#
Translators receive an Intake source to translate from and a list of metadata columns to target (these
are the columns in the intake-dataframe-catalog), and return a dataframe of translated data when their
translate method is called. The returned dataframe has rows containing tuples of unique values
of the translated metadata after grouping by the metadata columns specified in
access_nri_intake.catalog.TRANSLATOR_GROUPBY_COLUMNS.
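One way to picture the grouping step is the sketch below. It uses plain dictionaries rather than a dataframe, and the rows, the column names and the `group_unique` helper are hypothetical; it shows one reading of "tuples of unique values after grouping", not the package's implementation.

```python
from collections import defaultdict

# Hypothetical translated rows; the column names are illustrative only.
rows = [
    {"name": "exp1", "realm": "ocean", "variable": "temp"},
    {"name": "exp1", "realm": "ocean", "variable": "salt"},
    {"name": "exp1", "realm": "ocean", "variable": "temp"},
    {"name": "exp1", "realm": "atmos", "variable": "tas"},
]

# Hypothetical stand-in for TRANSLATOR_GROUPBY_COLUMNS.
groupby_columns = ["name", "realm"]


def group_unique(rows, groupby_columns):
    """Group rows by the key columns and collect the unique values of every
    other column into a tuple, preserving first-seen order."""
    groups = defaultdict(lambda: defaultdict(list))
    for row in rows:
        key = tuple(row[col] for col in groupby_columns)
        for col, value in row.items():
            if col in groupby_columns:
                continue
            if value not in groups[key][col]:
                groups[key][col].append(value)
    result = []
    for key, cols in groups.items():
        out = dict(zip(groupby_columns, key))
        out.update({col: tuple(values) for col, values in cols.items()})
        result.append(out)
    return result


grouped = group_unique(rows, groupby_columns)
```

Each output row corresponds to one combination of the group-by columns, and the duplicate "temp" entry is collapsed.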
When a source is added to the catalog and no translator is specified, the translator defaults to
access_nri_intake.catalog.translators.DefaultTranslator which operates as follows:
- If the input source is an Intake-ESM datastore, the translator will first look for the column in the
  esmcat.df attribute, casting iterable columns to tuples. If the source is not an Intake-ESM datastore,
  this step is skipped.
- If that fails, the translator will then look for the column name as an attribute on the source itself.
- If that fails, the translator will then look for the column name in the metadata attribute of the source.
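The lookup order described above can be sketched with a toy source object. `ToySource`, its `table` attribute (standing in for the esmcat.df dataframe) and the `lookup` function are hypothetical names used only for this illustration; they are not part of Intake or access_nri_intake.

```python
class ToySource:
    """Hypothetical stand-in for an Intake source (not a real Intake class)."""

    def __init__(self, table=None, metadata=None, **attrs):
        self.table = table              # column name -> list of values, or None
        self.metadata = metadata or {}  # free-form metadata dictionary
        for key, value in attrs.items():
            setattr(self, key, value)


def lookup(source, column):
    """Resolve `column` using the three-step fallback order."""
    # Step 1: tabular data, skipped when the source has none.
    table = getattr(source, "table", None)
    if table is not None and column in table:
        # Cast list-like entries to tuples, leave scalars as they are.
        return [tuple(v) if isinstance(v, (list, tuple)) else v
                for v in table[column]]
    # Step 2: an attribute on the source itself.
    if hasattr(source, column):
        return getattr(source, column)
    # Step 3: the source's metadata dictionary.
    if column in source.metadata:
        return source.metadata[column]
    raise KeyError(f"Could not find '{column}' on the source")


src = ToySource(
    table={"variable": [["temp", "salt"], ["tas"]]},
    metadata={"model": "ACCESS-OM2"},
    description="toy source",
)
variables = lookup(src, "variable")       # found in the table (step 1)
description = lookup(src, "description")  # found as an attribute (step 2)
model = lookup(src, "model")              # found in the metadata (step 3)
```

A column that is found nowhere raises an error in this sketch; how the real translator reports a failed lookup is not stated in the text above.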
The access_nri_intake.catalog.translators.DefaultTranslator is appropriate for Intake-ESM
datastore sources built using access_nri_intake.source.builders because the schema used to validate
the datastores is consistent with the schema used to validate the catalog.
When adding a pre-generated Intake-ESM datastore to the catalog, a dedicated Translator may be required.
For example, the CMIP5 and CMIP6 NCI-managed Intake-ESM datastores that are included in the ACCESS-NRI
catalog use dedicated translators which implement specific translations from the CMIP vocabulary used in
the Intake-ESM datastore to the vocabulary used in the catalog schema (e.g. see
access_nri_intake.catalog.translators.Cmip6Translator).
Creating a new Translator#
New Translators should inherit from access_nri_intake.catalog.translators.DefaultTranslator. The
general approach to creating a new translator is to create a specific translator method for each input
column that cannot use the default translator. These methods should return a dataframe object. Take a look
at the existing Translator class implementations for examples.
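The general pattern of overriding only the columns that need special handling can be sketched with toy classes. Everything here is hypothetical: `ToyDefaultTranslator` and `ToyCmipTranslator` are not the package's classes, the `_dispatch` dictionary is an assumed mechanism, the `realm_id` column and the realm mapping are invented, and plain lists stand in for the dataframe objects that real translator methods should return.

```python
class ToyDefaultTranslator:
    """Toy base class: copies each requested column from the source unchanged."""

    def __init__(self, source, columns):
        self.source = source    # here: dict of column name -> list of values
        self.columns = columns  # target columns to produce
        # Map each target column to the method that translates it.
        self._dispatch = {col: self._default for col in columns}

    def _default(self, column):
        return list(self.source[column])

    def translate(self):
        return {col: self._dispatch[col](col) for col in self.columns}


class ToyCmipTranslator(ToyDefaultTranslator):
    """Overrides only the column whose vocabulary differs from the catalog's."""

    # Hypothetical mapping from a source vocabulary to a catalog vocabulary.
    REALMS = {"ocnBgchem": "ocean", "landIce": "land"}

    def __init__(self, source, columns):
        super().__init__(source, columns)
        self._dispatch["realm"] = self._realm  # replace the default for "realm"

    def _realm(self, column):
        # Read the hypothetical "realm_id" source column and map its values.
        return [self.REALMS.get(value, value) for value in self.source["realm_id"]]


source = {"realm_id": ["ocnBgchem", "atmos"], "variable": ["no3", "tas"]}
translated = ToyCmipTranslator(source, ["realm", "variable"]).translate()
```

Only "realm" gets a dedicated method here; "variable" still goes through the default path, which is the division of labour the text recommends.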
API for access_nri_intake.catalog#
This documentation has been auto-generated using sphinx-autoapi
- access_nri_intake.catalog
  - Submodules
    - access_nri_intake.catalog.manager
    - access_nri_intake.catalog.translators
    - access_nri_intake.catalog.utils
  - Attributes
  - Package Contents