icoscp_core

A foundational Python library from the ICOS Carbon Portal core products, providing metadata and data access. It is designed to work with multiple data repositories that use the ICOS Carbon Portal core server software stack to host and serve their data. At the moment, three repositories are supported: ICOS, SITES, and ICOS Cities.

Design goals

  • good alignment with the server APIs
  • offer basic functionality, but in a robust way, and without sacrifices in performance
  • avoid unnecessary dependencies (depend only on numpy and the small library dacite), but aim for good integration with pandas
  • provide a solid foundation for future versions of icoscp, an ICOS-specific metadata and data access library developed by the Elaborated Products team
  • extensive use of type annotations and Python data classes, to safeguard against preventable bugs, both in the library itself and in the tools and apps written on top of it; a goal is to satisfy the type checker in strict mode
  • use of autogenerated data classes, produced from the Scala back-end code, representing various metadata entities (e.g. data objects, stations) and their parts
  • simultaneous support of three cross-cutting concerns:
    • multiple repositories (ICOS, SITES, ICOS Cities)
    • multiple ways of authentication
    • data access through the HTTP API (on an arbitrary machine) and through the file system (on a Jupyter notebook with "backdoor" data access); in the latter case, the library is responsible for reporting the data usage event.
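To make the data class goal more concrete, here is a minimal standard-library sketch of what turning JSON-like metadata into typed data classes involves. The library itself presumably relies on dacite (whose from_dict function handles many more cases) for this; the classes PositionSketch and StationSketch and the helper from_dict below are hypothetical and not part of icoscp_core.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Dict, Mapping, Type, TypeVar, get_type_hints

T = TypeVar("T")


# Hypothetical metadata classes for illustration only; they are NOT the
# library's autogenerated classes, which have other names and more fields.
@dataclass(frozen=True)
class PositionSketch:
    lat: float
    lon: float


@dataclass(frozen=True)
class StationSketch:
    uri: str
    name: str
    position: PositionSketch


def from_dict(cls: Type[T], data: Mapping[str, Any]) -> T:
    """Recursively build a dataclass instance from a JSON-like mapping.

    Simplified stand-in for dacite.from_dict: nested dataclasses are
    converted, plain-class fields get an isinstance check, and generic
    types (Optional, List, ...) are passed through unchecked."""
    hints = get_type_hints(cls)
    kwargs: Dict[str, Any] = {}
    for f in fields(cls):
        value = data[f.name]  # a missing field raises KeyError
        tp = hints[f.name]
        if is_dataclass(tp):
            # nested metadata entity: convert the sub-mapping recursively
            value = from_dict(tp, value)
        elif isinstance(tp, type) and not isinstance(value, tp):
            raise TypeError(
                f"{f.name}: expected {tp.__name__}, got {type(value).__name__}"
            )
        kwargs[f.name] = value
    return cls(**kwargs)
```

A wrongly typed value fails at construction time rather than deep inside later code, which is the kind of preventable bug the typed data classes are meant to catch.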

Getting started

The library is available on PyPI and can be installed with pip:

$ pip install icoscp_core

The code examples below are mostly given for ICOS. For the other repositories (SITES or ICOS Cities), use icoscp_core.sites or icoscp_core.cities, respectively, instead of icoscp_core.icos in the import statements.
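If a script should work against any of the three repositories, the module can be chosen at run time. The sketch below uses hypothetical helper names (repo_module_name, load_repo) and relies only on the three module paths listed above and on the standard importlib.import_module.

```python
import importlib
from types import ModuleType

# Repository name -> icoscp_core module, as listed in the text above
REPO_MODULES = {
    "icos": "icoscp_core.icos",
    "sites": "icoscp_core.sites",
    "cities": "icoscp_core.cities",
}


def repo_module_name(repo: str) -> str:
    """Return the module path for a repository name (case-insensitive)."""
    key = repo.strip().lower()
    try:
        return REPO_MODULES[key]
    except KeyError:
        raise ValueError(
            f"unknown repository {repo!r}; expected one of {sorted(REPO_MODULES)}"
        ) from None


def load_repo(repo: str) -> ModuleType:
    """Import the repository module only when it is actually needed."""
    return importlib.import_module(repo_module_name(repo))
```

For example, load_repo("sites").meta should refer to the same object as from icoscp_core.sites import meta, provided icoscp_core is installed.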

Authentication

Metadata access does not require authentication, and is achieved by a simple import:

from icoscp_core.icos import meta

When the library is used on a suitably configured Jupyter notebook service hosted by the ICOS Carbon Portal (https://exploretest.icos-cp.eu/ at the time of writing), authentication is not required for certain kinds of data access (specifically, the methods get_columns_as_arrays and batch_get_columns_as_arrays).

Authentication can be initialized in a number of ways.

Credentials and token cache file (default)

This approach should only be used on machines the developer trusts.

This requires a username/password account with the respective authentication service (links for: ICOS, SITES, ICOS Cities). An obfuscated (not human-readable) password is stored in a file in a default user-specific folder on the local machine. To initialize this file, run the following code interactively (this only needs to be done once per machine):

from icoscp_core.icos import auth

auth.init_config_file()

After the initialization step is done, access to the metadata and data services is achieved by a simple import:

from icoscp_core.icos import meta, data

Alternatively, the developer may choose a specific file for storing the credentials and the token cache. In this scenario, the services need to be initialized as follows:

from icoscp_core.icos import bootstrap
auth, meta, data = bootstrap.fromPasswordFile("<desired path to the file>")

# the next line needs to be run interactively (only once per file)
auth.init_config_file()

Static authentication token (prototyping)

This option is suitable for testing, for use on a public machine, or in general. Its only disadvantage is that tokens have a limited period of validity (100000 seconds, i.e. less than 28 hours), but this is precisely what makes it acceptable to include them directly in Python source code.
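The validity arithmetic is easy to verify: 100000 s is 1 day, 3 hours, 46 minutes and 40 seconds, or about 27.8 hours. The following sketch computes a token's expiry and whether it is still usable with a safety margin. The helper names are hypothetical, and the issue time is assumed to be roughly the moment the token was copied.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Validity period stated in the text above
TOKEN_VALIDITY = timedelta(seconds=100_000)


def token_expiry(issued_at: datetime) -> datetime:
    """Moment at which a token issued at `issued_at` stops being valid."""
    return issued_at + TOKEN_VALIDITY


def token_still_valid(
    issued_at: datetime,
    now: Optional[datetime] = None,
    margin: timedelta = timedelta(minutes=5),
) -> bool:
    """True if more than `margin` of validity is left (times are UTC-aware)."""
    now = now or datetime.now(timezone.utc)
    return token_expiry(issued_at) - now > margin


print(TOKEN_VALIDITY)  # 1 day, 3:46:40
```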

The token can be obtained from the "My Account" page (links for: ICOS, SITES, ICOS Cities), which is accessible after logging in with one of the supported authentication mechanisms (username/password, university sign-in, OAuth sign-in). After that, bootstrapping can be done as follows:

from icoscp_core.icos import bootstrap
cookie_token = 'cpauthToken=WzE2OTY2NzQ5OD...'
meta, data = bootstrap.fromCookieToken(cookie_token)
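When a notebook is shared with others, it may still be preferable to keep even a short-lived token out of the source code. A minimal sketch, assuming the token is placed in an environment variable with the hypothetical name ICOS_COOKIE_TOKEN, and that fromCookieToken expects the full 'cpauthToken=...' string shown above:

```python
import os

COOKIE_NAME = "cpauthToken"  # cookie name as it appears in the example above


def cookie_token_from_env(var: str = "ICOS_COOKIE_TOKEN") -> str:
    """Read the token from an environment variable (hypothetical name) and
    add the 'cpauthToken=' prefix if only the bare token value was stored."""
    raw = os.environ.get(var, "").strip()
    if not raw:
        raise RuntimeError(f"environment variable {var} is not set")
    prefix = COOKIE_NAME + "="
    return raw if raw.startswith(prefix) else prefix + raw
```

Bootstrapping would then read meta, data = bootstrap.fromCookieToken(cookie_token_from_env()).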

Explicit credentials (advanced option)

Users may prefer their own mechanism for providing the credentials to initialize authentication. This should be considered an advanced option. (Please do not put your password as clear text in your Python code!) It can be achieved as follows:

from icoscp_core.icos import bootstrap
meta, data = bootstrap.fromCredentials(username_variable, password_containing_variable)
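One way to fill username_variable and password_containing_variable without writing the password in the code is to read them from environment variables, falling back to an interactive prompt. The variable names ICOS_USER and ICOS_PASSWORD and the helper read_credentials are hypothetical; getpass is part of the Python standard library.

```python
import os
from getpass import getpass
from typing import Tuple


def read_credentials(
    user_var: str = "ICOS_USER", pass_var: str = "ICOS_PASSWORD"
) -> Tuple[str, str]:
    """Environment variables first; otherwise prompt (password not echoed)."""
    username = os.environ.get(user_var) or input("Username: ")
    password = os.environ.get(pass_var) or getpass("Password: ")
    return username, password
```

Usage: username_variable, password_containing_variable = read_credentials(), followed by the fromCredentials call above.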

Metadata access

from icoscp_core.icos import meta, ATMO_STATION
from icoscp_core.metaclient import TimeFilter, SizeFilter, SamplingHeightFilter

# fetches the list of known data types, including metadata associated with them
all_datatypes = meta.list_datatypes()

# data types with structured data access
previewable_datatypes = [dt for dt in all_datatypes if dt.has_data_access]

# fetch lists of stations
icos_stations = meta.list_stations()
atmo_stations = meta.list_stations(ATMO_STATION)
all_known_stations = meta.list_stations(False)

# list data objects; a contrived, complicated example to demonstrate the possibilities
# all the arguments are optional; see help(meta.list_data_objects) for more details
filtered_atc_co2 = meta.list_data_objects(
	datatype = [
		"http://meta.icos-cp.eu/resources/cpmeta/atcCo2L2DataObject",
		"http://meta.icos-cp.eu/resources/cpmeta/atcCo2NrtGrowingDataObject"
	],
	station = "http://meta.icos-cp.eu/resources/stations/AS_GAT",
	filters = [
		TimeFilter("submTime", ">", "2023-07-01T12:00:00Z"),
		TimeFilter("submTime", "<", "2023-07-10T12:00:00Z"),
		SizeFilter(">", 50000),
		SamplingHeightFilter("=", 216)
	],
	include_deprecated = True,
	order_by = "fileName",
	limit = 50
)

# get detailed metadata for a data object
dobj_uri = 'https://meta.icos-cp.eu/objects/BbEO5i3rDLhS_vR-eNNLjp3Q'
dobj_detailed_meta = meta.get_dobj_meta(dobj_uri)

Detailed help on the available metadata access methods can be obtained with a help(meta) call.
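The TimeFilter values in the example above are ISO 8601 UTC timestamps with a trailing Z. A small helper, hypothetical and not part of icoscp_core, can produce them from datetime objects, for instance to filter on a sliding window of recent submissions:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def to_iso_utc(dt: datetime) -> str:
    """Format a datetime like '2023-07-01T12:00:00Z'.
    Naive datetimes are treated as UTC (an assumption of this sketch)."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def recent_window(days: float, now: Optional[datetime] = None) -> Tuple[str, str]:
    """(start, end) timestamps covering the last `days` days up to `now`."""
    end = now or datetime.now(timezone.utc)
    return to_iso_utc(end - timedelta(days=days)), to_iso_utc(end)
```

With start, end = recent_window(7), the filters could be written as TimeFilter("submTime", ">", start) and TimeFilter("submTime", "<", end).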


Data access

To fetch data (after having located interesting data objects in the previous step):

from icoscp_core.icos import data
import pandas as pd

# save the original data object contents to a folder on your machine
filename = data.save_to_folder(dobj_uri, '/myhome/icosdata/')

# get CSV representation of all previewable columns, parse it with pandas
csv_stream = data.get_csv_byte_stream(dobj_uri)
df = pd.read_csv(csv_stream)

# get dataset columns as typed arrays, ready to be imported into pandas
dobj_arrays = data.get_columns_as_arrays(dobj_detailed_meta)
df = pd.DataFrame(dobj_arrays)

# efficiently batch-fetch multiple data objects
multi_dobjs = data.batch_get_columns_as_arrays(filtered_atc_co2)
# lazily pair each data object with a DataFrame built from its columns
multi_df = ((dobj, pd.DataFrame(arrs)) for dobj, arrs in multi_dobjs)

Downloading the original object is possible for all data objects. Structured data access, however, is limited to data objects whose data type has the has_data_access property set to True.
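When a batch covers data objects with slightly different column sets, it can be convenient to stack them into one long table. pandas users would typically reach for pd.concat; the standard-library sketch below shows the same idea with a hypothetical helper, stack_columns. It assumes each batch item is a (data object, column mapping) pair, as in the batch loop above, and that all columns of one object have equal length.

```python
from typing import Any, Dict, Iterable, List, Mapping, Sequence, Tuple


def stack_columns(
    batches: Iterable[Tuple[Any, Mapping[str, Sequence[Any]]]],
    source_col: str = "source",
) -> Dict[str, List[Any]]:
    """Concatenate per-object column mappings into one long-format table.

    Columns missing from some objects are padded with None, and
    `source_col` records which object each row came from."""
    table: Dict[str, List[Any]] = {source_col: []}
    n_rows = 0
    for source, columns in batches:
        if source_col in columns:
            raise ValueError(f"column name {source_col!r} is reserved")
        lengths = {len(col) for col in columns.values()}
        if len(lengths) > 1:
            raise ValueError(f"columns of {source!r} differ in length")
        n = lengths.pop() if lengths else 0
        for name, col in columns.items():
            if name not in table:
                table[name] = [None] * n_rows  # back-fill earlier objects
            table[name].extend(col)
        for name, col in table.items():
            if name != source_col and name not in columns:
                col.extend([None] * n)  # this object lacks the column
        table[source_col].extend([source] * n)
        n_rows += n
    return table
```

The resulting dict of equal-length lists can be passed to pd.DataFrame directly, e.g. pd.DataFrame(stack_columns(data.batch_get_columns_as_arrays(filtered_atc_co2))).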

Advanced metadata access (SPARQL)

For specialized metadata queries that the API does not offer explicitly, it is often possible to design a SPARQL query that provides the required information. Such a query can be run with the sparql_select method of MetadataClient, and its output can be parsed with the as_<rdf_datatype>-named functions in the icoscp_core.sparql module. For example:

from icoscp_core.icos import meta
from icoscp_core.sparql import as_string, as_uri

query = """prefix cpmeta: <http://meta.icos-cp.eu/ontologies/cpmeta/>
	select *
	from <http://meta.icos-cp.eu/documents/>
	where{
		?doc a cpmeta:DocumentObject .
		FILTER NOT EXISTS {[] cpmeta:isNextVersionOf ?doc}
		?doc cpmeta:hasDoi ?doi .
		?doc cpmeta:hasName ?filename .
	}"""
latest_docs_with_dois = [
	{
		"uri": as_uri("doc", row),
		"filename": as_string("filename", row),
		"doi": as_string("doi", row)
	}
	for row in meta.sparql_select(query).bindings
]
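For orientation, SPARQL SELECT results are commonly exchanged in the W3C SPARQL 1.1 Query Results JSON Format, where every row maps variable names to terms that carry a type and a value. The sketch below parses that format directly with the standard json module. It only illustrates what helpers such as as_uri and as_string deal with; whether meta.sparql_select(...).bindings has exactly this shape is an assumption, and the helper names are hypothetical.

```python
import json
from typing import Dict, List, Optional

Term = Dict[str, str]  # e.g. {"type": "uri", "value": "http://..."}


def binding_value(
    var: str, row: Dict[str, Term], expected_type: Optional[str] = None
) -> Optional[str]:
    """Value bound to `var` in one result row, or None if it is unbound
    (as happens with OPTIONAL patterns). If `expected_type` is given
    ("uri", "literal", "bnode"), the term type is checked as well."""
    term = row.get(var)
    if term is None:
        return None
    if expected_type is not None and term["type"] != expected_type:
        raise ValueError(f"{var}: expected {expected_type}, got {term['type']}")
    return term["value"]


def parse_rows(result_json: str) -> List[Dict[str, Optional[str]]]:
    """Turn a SPARQL 1.1 JSON result document into a list of plain dicts."""
    doc = json.loads(result_json)
    variables = doc["head"]["vars"]
    return [
        {var: binding_value(var, row) for var in variables}
        for row in doc["results"]["bindings"]
    ]
```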


Download files

Source distribution: icoscp_core-0.3.0.tar.gz (38.2 kB)

Built distribution: icoscp_core-0.3.0-py3-none-any.whl (41.8 kB), Python 3
