carpet_concentrations.input4MIPs.dataset

Input4MIPsDataset and associated metadata

Input4MIPsMetadata

class Input4MIPsMetadata(activity_id, contact, Conventions, dataset_category, frequency, further_info_url, grid_label, institution, institution_id, mip_era, nominal_resolution, realm, source_version, source_id, source, target_mip, title)[source]

Bases: object

Input4MIPs metadata

These are all required fields.

Notes

variable_id is not included here because it should be derived from the data (which is combined with the metadata elsewhere).

Conventions: str

CF conventions adhered to by the dataset

activity_id: str

Activity ID of the dataset

contact: str

Contact for the dataset

dataset_category: str

Dataset category

frequency: str

Time frequency of the dataset

further_info_url: str

URL with further information

grid_label: str

Grid label of the dataset

institution: str

Institution that produced the dataset

institution_id: str

Unique ID of the institution that produced the dataset

mip_era: str

MIP era of the dataset

nominal_resolution: str

Nominal resolution of the dataset

realm: str

Realm of the dataset

source: str

Source of the dataset (human-readable)

source_id: str

Source id of the dataset

source_version: str

Version of the dataset

target_mip: str

Target MIP of the dataset

title: str

Title of the dataset (human-readable)

to_dataset_attributes()[source]

Convert to a format that can be used as dataset attributes

Return type

dict[str, str]
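As a rough illustration of what converting the metadata to dataset attributes could look like, here is a minimal sketch. `RequiredMetadataSketch` is a hypothetical stand-in that carries only three of the required fields, and the one-to-one mapping from fields to attributes is an assumption, not the library's confirmed behaviour.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RequiredMetadataSketch:
    """Hypothetical stand-in holding three of the required fields."""

    activity_id: str
    contact: str
    Conventions: str

    def to_dataset_attributes(self) -> dict[str, str]:
        # Assumption: every field maps one-to-one onto a dataset attribute.
        return asdict(self)


# All values below are made up for illustration.
example_attrs = RequiredMetadataSketch(
    activity_id="input4MIPs",
    contact="someone@example.com",
    Conventions="CF-1.7",
).to_dataset_attributes()
```

The resulting dictionary is the kind of object that could be assigned to an xarray dataset's `attrs`.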

Input4MIPsMetadataOptional

class Input4MIPsMetadataOptional(comment=None, data_specs_version=None, external_variables=None, grid=None, history=None, product=None, references=None, region=None, release_year=None, source_description=None, source_type=None, table_id=None, table_info=None, license=None)[source]

Bases: object

Input4MIPs optional metadata

These are all optional fields.

Notes

This is currently written so that no fields other than these can be provided. We don’t fully understand the input4MIPs rules, so this could easily be the wrong choice; refactoring should be relatively straightforward if needed. It may make sense for these fields to be locked, to avoid clashes with the compulsory metadata.

comment: str | None

Comment on the dataset

data_specs_version: str | None

Data specs version used when creating the dataset

external_variables: str | None

Variables relevant to the dataset that aren’t included in the dataset itself

For example, cell area variables like ‘areacella’

grid: str | None

Human-readable version of the grid on which the dataset applies

history: str | None

File modification history

license: str | None

License information

product: str | None

Product the data represents

references: str | None

References related to the dataset

region: str | None

Region to which the dataset applies

release_year: str | None

Release year of the dataset

source_description: str | None

Description of the dataset’s source

source_type: str | None

Description of the type of the dataset’s source

table_id: str | None

Unclear; possibly the CMOR table used to write the dataset

table_info: str | None

Unclear; possibly information about the CMOR table used to write the dataset

to_dataset_attributes()[source]

Convert to a format that can be used as dataset attributes

Return type

dict[str, str]
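Because every optional field may be None while the return type is dict[str, str], one plausible reading is that unset fields are left out of the result. The sketch below shows that reading only; `OptionalMetadataSketch` is a hypothetical stand-in with three of the optional fields, and dropping None values is an assumption about the behaviour.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class OptionalMetadataSketch:
    """Hypothetical stand-in holding three of the optional fields."""

    comment: Optional[str] = None
    grid: Optional[str] = None
    license: Optional[str] = None

    def to_dataset_attributes(self) -> dict[str, str]:
        # Assumption: fields left as None are omitted from the attributes.
        return {k: v for k, v in asdict(self).items() if v is not None}


# Values are made up for illustration; `grid` is deliberately left unset.
optional_attrs = OptionalMetadataSketch(
    comment="Example comment",
    license="CC BY 4.0",
).to_dataset_attributes()
```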

Input4MIPsDataset

class Input4MIPsDataset(ds, directory_template='{activity_id}/{mip_era}/{target_mip}/{institution_id}/{source_id}/{realm}/{frequency}/{variable_id}/{grid_label}/v{version}', filename_template='{variable_id}_{activity_id}_{dataset_category}_{target_mip}_{source_id}_{grid_label}_{start_date}_{end_date}.nc')[source]

Bases: object

Input4MIPs dataset

Holds input4MIPs data and also helps write them to disk in a way that conforms to input4MIPs standards

directory_template: str

Template used to determine the directory in which to save the data

ds: xarray.core.dataset.Dataset

Dataset

filename_template: str

Template used to determine the filename when saving the data
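To make the two templates concrete, the sketch below fills the default directory_template and filename_template with plain str.format. All field values are invented for illustration, and how the class itself gathers these values (for example from the dataset's attributes) is not shown here.

```python
from pathlib import PurePosixPath

# Default templates, copied from the class signature above.
directory_template = (
    "{activity_id}/{mip_era}/{target_mip}/{institution_id}/{source_id}/"
    "{realm}/{frequency}/{variable_id}/{grid_label}/v{version}"
)
filename_template = (
    "{variable_id}_{activity_id}_{dataset_category}_{target_mip}_"
    "{source_id}_{grid_label}_{start_date}_{end_date}.nc"
)

# Hypothetical values, chosen only to show the resulting layout.
fields = {
    "activity_id": "input4MIPs",
    "mip_era": "CMIP6Plus",
    "target_mip": "CMIP",
    "institution_id": "EX",
    "source_id": "EX-example-0-1-0",
    "realm": "atmos",
    "frequency": "mon",
    "variable_id": "mole_fraction_of_carbon_dioxide_in_air",
    "grid_label": "gm",
    "version": "20230101",
    "dataset_category": "GHGConcentrations",
    "start_date": "200001",
    "end_date": "201012",
}

# str.format ignores keyword arguments that a template does not use.
out_dir = PurePosixPath(directory_template.format(**fields))
out_file = out_dir / filename_template.format(**fields)
```

Passing different templates to the constructor would change the layout without touching the rest of the writing logic.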

classmethod from_metadata_autoadd_bounds_to_dimensions(ds, dimensions, metadata, metadata_optional=None, time_dimension='time', monthly_time_bounds=True, copy=True, **kwargs)[source]

Create instance from metadata and an unbounded dataset

For the given dimensions, bounds are checked and added if needed. The metadata is then used to fill out ds’s metadata before initialising.

Parameters
  • ds (xr.Dataset) – Dataset

  • dimensions (tuple[str, ...]) – Dimensions of the dataset. These are checked for appropriate bounds.

  • metadata (Input4MIPsMetadata) – Metadata (required)

  • metadata_optional (Input4MIPsMetadataOptional | None) – Optional metadata

  • time_dimension (str) – The name of the time dimension. This is provided to give the user full control over how monthly_time_bounds is applied.

  • monthly_time_bounds (bool) – Should the added time bounds cover each month? This is needed for monthly data because, when consecutive months have different numbers of days, the midpoint between consecutive time points does not fall on a month boundary.

  • copy (bool) – Should a copy of the dataset be made? If not, the data is modified in place, which can cause unexpected changes if references are not managed carefully.

  • **kwargs (Any) – Other initialisation arguments for the instance. They are passed directly to the constructor.

Returns

Input4MIPsDataset – Prepared instance

Raises

AssertionError – ds.attrs is already set or there is more than one variable in ds

get_filepath(ds_disk, root_data_dir)[source]

Get filepath

Parameters
Returns

pathlib.Path – Filepath

write(root_data_dir, unlimited_dims=('time',), encoding_kwargs=None)[source]

Write to disk

Parameters
  • root_data_dir (Path) – Root directory in which to write the file

  • unlimited_dims (tuple[str, ...]) – Dimensions which should be unlimited

  • encoding_kwargs (dict[str, Any] | None) – Kwargs to use when encoding to disk. These are passed to xr.Dataset.to_netcdf()

Returns

Path – Where the file was written

format_date

format_date(date, ds_frequency)[source]

Format date for filepath

Parameters
  • date (cftime.datetime | dt.datetime) – Date to format

  • ds_frequency (str) – Frequency of the underlying dataset

Returns

str – Formatted date
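The exact output format is not documented here. As a sketch only, the function below assumes the CMIP-style convention of YYYYMM for monthly data and YYYY for yearly data; `format_date_sketch` and its frequency mapping are hypothetical and may not match the real function.

```python
import datetime as dt


def format_date_sketch(date, ds_frequency: str) -> str:
    """Format a date for use in a filename (assumed convention)."""
    # Only .year and .month are used, so cftime.datetime objects,
    # which expose the same attributes, would work here too.
    if ds_frequency.startswith("mon"):
        return f"{date.year:04d}{date.month:02d}"
    if ds_frequency.startswith("yr"):
        return f"{date.year:04d}"
    raise NotImplementedError(f"Unsupported frequency: {ds_frequency}")


monthly_label = format_date_sketch(dt.datetime(2010, 3, 15), "mon")
yearly_label = format_date_sketch(dt.datetime(850, 7, 1), "yr")
```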

get_version

get_version(creation_date)[source]

Get version string for filepath

Parameters

creation_date (str) – Creation date

Returns

str – Version string
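Since the default directory template ends in v{version}, a date-based version such as 20230601 seems likely. The sketch below assumes the creation date is an ISO-like UTC timestamp and that the version is its YYYYMMDD part; both points are assumptions, and `get_version_sketch` is a hypothetical name.

```python
import datetime as dt


def get_version_sketch(creation_date: str) -> str:
    """Derive a YYYYMMDD version string from a creation timestamp."""
    # Assumption: creation_date looks like "2023-06-01T12:30:00Z".
    parsed = dt.datetime.strptime(creation_date, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.strftime("%Y%m%d")


version = get_version_sketch("2023-06-01T12:30:00Z")
```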

add_time_bounds

add_time_bounds(ds, monthly_time_bounds=False, output_dim='bounds')[source]

Add time bounds to a dataset

This should probably be pushed upstream to cf-xarray at some point.

Parameters
  • ds (xarray.core.dataset.Dataset) – Dataset to which to add time bounds

  • monthly_time_bounds (bool) – Is the data monthly, i.e. should the time bounds run from the start of one month to the start of the next? (This isn’t regular spacing, but it is most often what is desired or required.)

Returns

xarray.core.dataset.Dataset – Dataset with time bounds

Notes

There is no copy here, ds is modified in place (call xarray.Dataset.copy() before passing if you don’t want this).
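To illustrate what monthly time bounds mean, the sketch below computes the bounds of a single month with the standard library only. It does not use xarray and is not the function's implementation; `monthly_bounds` is a hypothetical helper.

```python
import datetime as dt


def monthly_bounds(year: int, month: int) -> tuple[dt.datetime, dt.datetime]:
    """Return (start of the month, start of the next month)."""
    start = dt.datetime(year, month, 1)
    if month == 12:
        # December rolls over into January of the following year.
        end = dt.datetime(year + 1, 1, 1)
    else:
        end = dt.datetime(year, month + 1, 1)
    return start, end


feb_start, feb_end = monthly_bounds(2024, 2)
feb_length_days = (feb_end - feb_start).days  # 2024 is a leap year
dec_start, dec_end = monthly_bounds(2023, 12)
```

Because month lengths vary, the bounds are not evenly spaced, which is why they cannot simply be derived from a fixed offset around each time point.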

verify_disk_ready

verify_disk_ready(ds)[source]

Verify that a dataset is disk ready

Parameters

ds (xarray.core.dataset.Dataset) – Dataset to check

Return type

None

Notes

Very rough; this doesn’t really do anything right now.

generate_tracking_id

generate_tracking_id()[source]

Generate tracking ID

Returns

str – Tracking ID
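CMIP6-style files carry a tracking ID made of a handle prefix followed by a random UUID. The sketch below assumes this function follows the same pattern and uses the CMIP6 prefix hdl:21.14100/; whether input4MIPs files written by this package use exactly that prefix is an assumption, and `generate_tracking_id_sketch` is a hypothetical name.

```python
import uuid

# Assumption: the CMIP6 handle prefix also applies here.
TRACKING_ID_PREFIX = "hdl:21.14100/"


def generate_tracking_id_sketch() -> str:
    """Return a handle prefix followed by a random (version 4) UUID."""
    return TRACKING_ID_PREFIX + str(uuid.uuid4())


tracking_id = generate_tracking_id_sketch()
```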

generate_creation_timestamp

generate_creation_timestamp()[source]

Generate creation timestamp, formatted as needed for input4MIPs files

Returns

str – Creation timestamp
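The exact format is not stated here. CMIP6-style files use a UTC timestamp of the form YYYY-MM-DDTHH:MM:SSZ, so the sketch below assumes that format; `generate_creation_timestamp_sketch` is a hypothetical name and the real function may differ.

```python
import datetime as dt


def generate_creation_timestamp_sketch() -> str:
    """Return the current UTC time as YYYY-MM-DDTHH:MM:SSZ (assumed format)."""
    now = dt.datetime.now(dt.timezone.utc)
    return now.strftime("%Y-%m-%dT%H:%M:%SZ")


timestamp = generate_creation_timestamp_sketch()
```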