carpet_concentrations.input4MIPs.dataset

Input4MIPsDataset and associated metadata

Input4MIPsMetadata

class Input4MIPsMetadata(activity_id, contact, Conventions, dataset_category, frequency, further_info_url, grid_label, institution, institution_id, mip_era, nominal_resolution, realm, source_version, source_id, source, target_mip, title)

Bases: object

Input4MIPs metadata

These are all required fields.

Notes

variable_id is not included here because it should be derived from the data (which is combined with the metadata elsewhere).

Conventions: str: CF conventions adhered to by the dataset

activity_id: str: Activity ID of the dataset

contact: str: Contact for the dataset

dataset_category: str: Datset category

frequency: str: Time frequency of the dataset

further_info_url: str: URL with further information

grid_label: str: Grid label of the dataset

institution: str: Institution that produced the dataset

institution_id: str: Unique ID of the institution that produced the dataset

mip_era: str: MIP era of the dataset

nominal_resolution: str: Nominal resolution of the dataset

realm: str: Realm of the dataset

source: str: Source of the dataset (human-readable)

source_id: str: Source id of the dataset

source_version: str: Version of the dataset

target_mip: str: Target MIP of the dataset

title: str: Title of the dataset (human-readable)

to_dataset_attributes()

Convert to a format that can be used as dataset attributes

Return type: dict[str, str]
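A minimal sketch of how such a conversion could work, assuming the metadata is a dataclass whose fields are all strings. MetadataSketch is a hypothetical stand-in with only three of the required fields, and the use of dataclasses.asdict is an assumption rather than the package's confirmed implementation.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class MetadataSketch:
    """Hypothetical stand-in holding a few of the required metadata fields."""

    activity_id: str
    mip_era: str
    source_id: str

    def to_dataset_attributes(self) -> dict[str, str]:
        # Every field is required and is a string, so the attributes are
        # simply the field names mapped to their values
        return asdict(self)


# Illustrative values only
attrs = MetadataSketch(
    activity_id="input4MIPs", mip_era="CMIP6", source_id="example-source-1-0"
).to_dataset_attributes()
print(attrs)
```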

Input4MIPsMetadataOptional

class Input4MIPsMetadataOptional(comment=None, data_specs_version=None, external_variables=None, grid=None, history=None, product=None, references=None, region=None, release_year=None, source_description=None, source_type=None, table_id=None, table_info=None, license=None)

Bases: object

Input4MIPs optional metadata

These are all optional fields.

Notes

This is currently written such that no fields outside of these can be provided. We don’t fully understand the input4MIPs rules, so this could easily be the wrong choice; refactoring should be relatively straightforward if needed. It would make sense for these fields to be locked to avoid clashes with the compulsory metadata, but we are not sure.

comment: str | None: Comment on the dataset

data_specs_version: str | None: Data specs version used when creating the dataset

external_variables: str | None

Variables relevant to the dataset that aren’t included in the dataset itself

For example, cell area variables like ‘areacella’

grid: str | None: Human-readable version of the grid on which the dataset applies

history: str | None: File modification history

license: str | None: License information

product: str | None: Product the data represents

references: str | None: References related to the dataset

region: str | None: Region to which the dataset applies

release_year: str | None: Release year of the dataset

source_description: str | None: Description of the dataset’s source

source_type: str | None: Description of the type of the dataset’s source

table_id: str | None: No idea, maybe the CMOR table used to write the dataset

table_info: str | None: No idea, maybe info about the CMOR table used to write the dataset

to_dataset_attributes()

Convert to a format that can be used as dataset attributes

Return type: dict[str, str]
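For the optional metadata, one reasonable behaviour (assumed here, not confirmed by the source) is to drop fields that were never set, so that no empty attributes are written. OptionalMetadataSketch is a hypothetical stand-in with three of the optional fields.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass(frozen=True)
class OptionalMetadataSketch:
    """Hypothetical stand-in holding a few of the optional metadata fields."""

    comment: str | None = None
    license: str | None = None
    region: str | None = None

    def to_dataset_attributes(self) -> dict[str, str]:
        # Skip unset (None) fields so they are not written as attributes
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not None
        }


print(OptionalMetadataSketch(license="CC BY 4.0", region="global").to_dataset_attributes())
# {'license': 'CC BY 4.0', 'region': 'global'}
```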

Input4MIPsDataset

class Input4MIPsDataset(ds, directory_template='{activity_id}/{mip_era}/{target_mip}/{institution_id}/{source_id}/{realm}/{frequency}/{variable_id}/{grid_label}/v{version}', filename_template='{variable_id}_{activity_id}_{dataset_category}_{target_mip}_{source_id}_{grid_label}_{start_date}_{end_date}.nc')

Bases: object

Input4MIPs dataset

Holds input4MIPs data and helps write it to disk in a way that conforms to the input4MIPs standards

directory_template: str: Template used to determine the directory in which to save the data

ds: xarray.core.dataset.Dataset: Dataset

filename_template: str: Template used to determine the filename when saving the data

classmethod from_metadata_autoadd_bounds_to_dimensions(ds, dimensions, metadata, metadata_optional=None, time_dimension='time', monthly_time_bounds=True, copy=True, **kwargs)

Create instance from metadata and an unbounded dataset

For the given dimensions, bounds are checked and added if needed. The metadata is then used to fill out ds’s metadata before initialising.

Parameters

ds (xr.Dataset) – Dataset
dimensions (tuple[str, ...]) – Dimensions of the dataset, these are checked for appropriate bounds.
metadata (Input4MIPsMetadata) – Metadata (required)
metadata_optional (Input4MIPsMetadataOptional | None) – Optional metadata
time_dimension (str) – The name of the time dimension. This gives the user full control over which dimension monthly_time_bounds is applied to.
monthly_time_bounds (bool) – Should added time bounds cover each month? This is needed for data on a monthly timestep: when consecutive months have different numbers of days, the midpoint between consecutive timesteps does not fall on the start or end of a month.
copy (bool) – Should a copy of the dataset be made? If not, ds is modified in place, which can cause unexpected changes if references are not managed appropriately.
**kwargs (Any) – Other initialisation arguments for the instance. They are passed directly to the constructor.

Returns

Input4MIPsDataset – Prepared instance

Raises

AssertionError – ds.attrs is already set or there is more than one variable in ds
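To illustrate why monthly_time_bounds exists: if bounds were derived from the midpoint between consecutive mid-month time points, they would not land on month boundaries, because January and February have different lengths. The dates are illustrative, and the sketch uses the standard-library (proleptic Gregorian) calendar rather than the calendar of any particular dataset.

```python
import datetime as dt

# Mid-month time points, as is common for monthly data (illustrative values)
jan_mid = dt.datetime(2001, 1, 16, 12)
feb_mid = dt.datetime(2001, 2, 15)

# A bound derived from the midpoint between consecutive time points
midpoint_bound = jan_mid + (feb_mid - jan_mid) / 2

# The bound monthly data actually needs: the start of February
month_boundary = dt.datetime(2001, 2, 1)

print(midpoint_bound)  # 2001-01-31 06:00:00
print(month_boundary)  # 2001-02-01 00:00:00
```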

get_filepath(ds_disk, root_data_dir)

Get filepath

Parameters

ds_disk (xarray.core.dataset.Dataset) – Disk ready dataset
root_data_dir (pathlib.Path) – Root directory in which to generate the filepath

Returns

pathlib.Path – Filepath

write(root_data_dir, unlimited_dims=('time',), encoding_kwargs=None)

Write to disk

Parameters

root_data_dir (Path) – Root directory in which to write the file
unlimited_dims (tuple[str, ...]) – Dimensions which should be unlimited
encoding_kwargs (dict[str, Any] | None) – Kwargs to use when encoding to disk. These are passed to xr.Dataset.to_netcdf()

Returns

Path – Where the file was written

format_date

format_date(date, ds_frequency)

Format date for filepath

Parameters

date (cftime.datetime | dt.datetime) – Date to format
ds_frequency (str) – Frequency of the underlying dataset

Returns

str – Formatted date
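A sketch of frequency-dependent date formatting, assuming the CMIP6-style convention of writing yearly dates as YYYY, monthly dates as YYYYMM and daily dates as YYYYMMDD. format_date_sketch and the FREQUENCY_FORMATS mapping are hypothetical; which frequencies the package actually supports is not stated here.

```python
import datetime as dt

# Assumed mapping from frequency to date format (CMIP6-style, not confirmed
# for this package)
FREQUENCY_FORMATS = {
    "yr": "%Y",
    "mon": "%Y%m",
    "day": "%Y%m%d",
}


def format_date_sketch(date: dt.datetime, ds_frequency: str) -> str:
    """Hypothetical helper: format date at the precision implied by ds_frequency."""
    try:
        fmt = FREQUENCY_FORMATS[ds_frequency]
    except KeyError:
        raise NotImplementedError(f"No date format for frequency {ds_frequency!r}") from None
    return date.strftime(fmt)


print(format_date_sketch(dt.datetime(1850, 1, 16), "mon"))  # 185001
```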

get_version

get_version(creation_date)

Get version string for filepath

Parameters: creation_date (str) – Creation date
Returns: str – Version string
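A minimal sketch, assuming creation_date is an ISO 8601 timestamp and the version is that date written as YYYYMMDD (which, through the v{version} part of the directory template, would give directory names such as v20230512). get_version_sketch is hypothetical.

```python
import datetime as dt


def get_version_sketch(creation_date: str) -> str:
    """Hypothetical helper: derive a YYYYMMDD version from an ISO 8601 timestamp."""
    # Replace a trailing "Z" so that fromisoformat also accepts it on older Pythons
    timestamp = dt.datetime.fromisoformat(creation_date.replace("Z", "+00:00"))
    return timestamp.strftime("%Y%m%d")


print(get_version_sketch("2023-05-12T13:45:00Z"))  # 20230512
```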

add_time_bounds

add_time_bounds(ds, monthly_time_bounds=False, output_dim='bounds')

Add time bounds to a dataset

This should probably be pushed upstream to cf-xarray at some point.

Parameters

ds (xarray.core.dataset.Dataset) – Dataset to which to add time bounds
monthly_time_bounds (bool) – Is the data monthly, i.e. should the time bounds run from the start of one month to the start of the next? This isn’t regular spacing, but it is most often what is desired or required.

Returns

xarray.core.dataset.Dataset – Dataset with time bounds

Notes

There is no copy here, ds is modified in place (call xarray.Dataset.copy() before passing if you don’t want this).
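Monthly bounds run from the start of each month to the start of the next, so their widths vary. The sketch shows the idea on plain datetime objects rather than on an xarray time coordinate as add_time_bounds does; monthly_bounds is a hypothetical helper.

```python
from __future__ import annotations

import datetime as dt


def monthly_bounds(times: list[dt.datetime]) -> list[tuple[dt.datetime, dt.datetime]]:
    """Hypothetical helper: bound each time point by the start of its month and of the next month."""
    bounds = []
    for t in times:
        lower = dt.datetime(t.year, t.month, 1)
        # December rolls over to January of the following year
        if t.month == 12:
            upper = dt.datetime(t.year + 1, 1, 1)
        else:
            upper = dt.datetime(t.year, t.month + 1, 1)
        bounds.append((lower, upper))
    return bounds


times = [dt.datetime(2000, 1, 16, 12), dt.datetime(2000, 2, 15, 12), dt.datetime(2000, 12, 16, 12)]
for lower, upper in monthly_bounds(times):
    print(lower.date(), upper.date(), (upper - lower).days)
# 2000-01-01 2000-02-01 31
# 2000-02-01 2000-03-01 29
# 2000-12-01 2001-01-01 31
```

Because 2000 is a leap year, the February bounds span 29 days, which is exactly the irregular spacing the parameter description mentions.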

verify_disk_ready

verify_disk_ready(ds)

Verify that a dataset is disk ready

Parameters: ds (xarray.core.dataset.Dataset) – Dataset to check
Return type: None

Notes

Very rough; this currently performs little or no verification.

generate_tracking_id

generate_tracking_id()

Generate tracking ID

Returns: str – Tracking ID
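A sketch assuming the CMIP6 convention of a handle prefix followed by a random UUID. The hdl:21.14100/ prefix is the one used for CMIP6 data; that this package uses the same prefix for input4MIPs files is an assumption, and generate_tracking_id_sketch is hypothetical.

```python
import uuid


def generate_tracking_id_sketch() -> str:
    """Hypothetical helper: CMIP6-style tracking ID (handle prefix + random UUID)."""
    # Assumed prefix; uuid4 gives a random, effectively unique identifier
    return "hdl:21.14100/" + str(uuid.uuid4())


print(generate_tracking_id_sketch())
```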

generate_creation_timestamp

generate_creation_timestamp()

Generate creation timestamp, formatted as needed for input4MIPs files

Returns: str – Creation timestamp
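A sketch assuming the CMIP6-style creation_date format, YYYY-MM-DDTHH:MM:SSZ in UTC. generate_creation_timestamp_sketch is hypothetical.

```python
import datetime as dt


def generate_creation_timestamp_sketch() -> str:
    """Hypothetical helper: current UTC time formatted as YYYY-MM-DDTHH:MM:SSZ."""
    now = dt.datetime.now(dt.timezone.utc)
    return now.strftime("%Y-%m-%dT%H:%M:%SZ")


print(generate_creation_timestamp_sketch())
```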