Helper to download and subset sparse data that has been ARCO-ified and is available through STAC and SQLite-formatted data

Project description

arcosparse: A Python library for ARCO sparse datasets subsetting

[!WARNING] This library is still in development. Breaking changes might be introduced from version 0.y.z to 0.y+1.z.
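
Since a minor bump may carry breaking changes, it can help to check whether an upgrade crosses a minor version boundary. The following is a minimal sketch, assuming plain `x.y.z` version strings; `parse_version` and `may_introduce_breaking_changes` are hypothetical helpers and are not part of arcosparse.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a plain 'x.y.z' version string into three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def may_introduce_breaking_changes(installed: str, candidate: str) -> bool:
    """Return True if the upgrade crosses a major or minor version boundary."""
    return parse_version(candidate)[:2] > parse_version(installed)[:2]


print(may_introduce_breaking_changes("0.4.1", "0.4.2"))  # False
print(may_introduce_breaking_changes("0.4.2", "0.5.0"))  # True
```

In a requirements file, a specifier such as `arcosparse>=0.4.2,<0.5.0` expresses the same constraint.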

Usage

Main functions

`arcosparse.subset_and_return_dataframe`

Subset the data based on the input and return a dataframe.

`arcosparse.subset_and_save`

Subset the data based on the input and save the result as a partitioned parquet file. This means that the data is saved in one folder that contains many small parquet files. You can still open all the data at once.

To open the data into a dataframe, use this snippet:

import glob

import pandas as pd

output_path = "some_folder"

# Get all partitioned Parquet files
parquet_files = glob.glob(f"{output_path}/*.parquet")

# Read all files into a single dataframe
df = pd.concat(pd.read_parquet(file) for file in parquet_files)

`arcosparse.get_entities`

A function to get metadata about the entities that are available in the dataset. Since all the information is retrieved from the metadata, the argument is url_metadata, the same one used for the subset. Returns a list of arcosparse.Entity objects. Each object contains the following information about an entity:

entity_id: same as the entity_id column in the result of a subset.
entity_type: same as the entity_type column in the result of a subset.
doi: the DOI of the entity.
institution: the institution associated with the entity.
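
As an illustration of how these fields might be used, the sketch below filters entities by institution. It runs on a hypothetical stand-in dataclass (`EntityStandIn`) with made-up sample values, so it does not need network access. Treating `doi` and `institution` as optional is an assumption. In real use, the list would come from `get_entities(url_metadata)`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EntityStandIn:
    """Hypothetical stand-in mirroring the documented Entity fields."""

    entity_id: str
    entity_type: str
    doi: Optional[str]  # assumption: may be missing
    institution: Optional[str]  # assumption: may be missing


def ids_for_institution(entities: List[EntityStandIn], institution: str) -> List[str]:
    """Collect the entity_id of every entity from the given institution."""
    return [e.entity_id for e in entities if e.institution == institution]


# Made-up sample values, for illustration only.
entities = [
    EntityStandIn("entity-1", "type-a", None, "Institute A"),
    EntityStandIn("entity-2", "type-b", None, "Institute B"),
    EntityStandIn("entity-3", "type-a", None, "Institute A"),
]

print(ids_for_institution(entities, "Institute A"))  # ['entity-1', 'entity-3']
```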

`arcosparse.get_dataset_metadata`

A function to get the metadata about the dataset. Since all the information is retrieved from the metadata, the argument is the url_metadata, the same used for the subset.

Returns an object arcosparse.Dataset. It contains information about the dataset:

dataset_id: the ID of the dataset.
variables: a list of the names of the variables available in the dataset.
assets: a list of the names of the assets available in the dataset.
coordinates: a list of arcosparse.DatasetCoordinate objects. Each object contains the following information:
- coordinate_id: the ID of the coordinate.
- unit: the unit of the coordinate.
- minimum: the minimum value of the coordinate.
- maximum: the maximum value of the coordinate.
- step: the step of the coordinate.
- values: the values of the coordinate.
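
One possible use of the coordinate bounds is to clip a requested range before subsetting. The sketch below uses a hypothetical stand-in dataclass (`CoordinateStandIn`) with the documented field names. The field types, the sample values, and the helper `clip_to_coordinate` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CoordinateStandIn:
    """Hypothetical stand-in mirroring the documented DatasetCoordinate fields."""

    coordinate_id: str
    unit: Optional[str]
    minimum: float  # assumption: numeric bounds
    maximum: float
    step: Optional[float] = None
    values: Optional[List[float]] = None


def clip_to_coordinate(
    coordinate: CoordinateStandIn, requested_min: float, requested_max: float
) -> Optional[Tuple[float, float]]:
    """Clip a requested range to the coordinate bounds, or return None if they do not overlap."""
    low = max(requested_min, coordinate.minimum)
    high = min(requested_max, coordinate.maximum)
    return (low, high) if low <= high else None


# Made-up sample values, for illustration only.
latitude = CoordinateStandIn("latitude", "degrees_north", -90.0, 90.0)

print(clip_to_coordinate(latitude, 40.0, 120.0))  # (40.0, 90.0)
print(clip_to_coordinate(latitude, 95.0, 120.0))  # None
```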

Changelog

0.4.2

0.4.2: Bug fixes

Fix a bug where dates in the metadata like "2025-06-25T07:43:54.514180Z" would not be parsed and raised an error. Now, it uses dateutil.parser to parse the date strings correctly.
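
The release note does not say which parser failed. One known way such a string can be rejected is `datetime.fromisoformat` on Python versions before 3.11, which does not accept the trailing `Z`. The sketch below is a standard-library approximation, not the library's implementation (which uses dateutil.parser); `parse_utc_timestamp` is a hypothetical helper.

```python
from datetime import datetime


def parse_utc_timestamp(text: str) -> datetime:
    """Parse an ISO 8601 string, accepting a trailing 'Z' for UTC."""
    if text.endswith("Z"):
        # datetime.fromisoformat only accepts 'Z' from Python 3.11 onwards,
        # so rewrite it as an explicit UTC offset.
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


parsed = parse_utc_timestamp("2025-06-25T07:43:54.514180Z")
print(parsed.isoformat())  # 2025-06-25T07:43:54.514180+00:00
```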

0.4.1

0.4.1: New features

Added function get_dataset_metadata. It returns an arcosparse.Dataset object.

0.4.0

Breaking Changes

Deleted function get_entities_ids. Use get_entities as a replacement. Example:

# old code
my_entities = get_entities_ids(url_metadata)

# new code
my_entities = [entity.entity_id for entity in get_entities(url_metadata)]

New features

Added function get_entities. It returns a list of Entity objects.

Bug fixes

Fix a bug where arcosparse would modify the dict that users pass in the columns_rename argument. Now, it makes a deep copy of the dict and modifies the copy instead.
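
The pattern behind this fix can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `resolve_columns_rename` and `DEFAULT_COLUMNS` are hypothetical, and the mapping direction (output column name to user-chosen name) is assumed.

```python
import copy
from typing import Dict, Optional

# Hypothetical defaults: each column keeps its own name unless renamed.
DEFAULT_COLUMNS = {"entity_id": "entity_id", "entity_type": "entity_type"}


def resolve_columns_rename(columns_rename: Optional[Dict[str, str]]) -> Dict[str, str]:
    """Merge user renames with the defaults without mutating the caller's dict."""
    resolved = copy.deepcopy(columns_rename) if columns_rename else {}
    for column, default_name in DEFAULT_COLUMNS.items():
        resolved.setdefault(column, default_name)
    return resolved


user_mapping = {"entity_id": "platform_id"}
resolved = resolve_columns_rename(user_mapping)

print(user_mapping)  # {'entity_id': 'platform_id'}
print(resolved)  # {'entity_id': 'platform_id', 'entity_type': 'entity_type'}
```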

0.3.5

Return all the columns even if full of NaNs.

0.3.4

Deleted deprecated get_platforms_names function
Fix an issue where the query on the chunk would not be correct if the requested subset is 0.

0.3.3

Add GPLv3 license

0.3.2

Fixes an issue on Windows where deleting a file is not permitted unless the SQL connection is explicitly closed.
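
A minimal sketch of the close-before-delete pattern, using only the standard library. The table and file here are hypothetical stand-ins. Note that `with sqlite3.connect(...)` on its own only manages transactions and does not close the connection, hence the use of `contextlib.closing`.

```python
import os
import sqlite3
import tempfile
from contextlib import closing

# Create a throwaway SQLite file (hypothetical stand-in for a downloaded chunk).
file_descriptor, database_path = tempfile.mkstemp(suffix=".sqlite")
os.close(file_descriptor)

# closing() guarantees connection.close() runs, even if a query raises.
with closing(sqlite3.connect(database_path)) as connection:
    connection.execute("CREATE TABLE data (value REAL)")
    connection.execute("INSERT INTO data VALUES (1.5)")
    connection.commit()
    rows = connection.execute("SELECT value FROM data").fetchall()

# The file handle is released, so the file can be removed on Windows as well.
os.remove(database_path)

print(rows)  # [(1.5,)]
```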

0.3.1

Reindex when concatenating. Fixes an issue where indexes wouldn't be unique.
Fixes an issue on Windows where datetime.to_timestamp does not support dates before 1970-1-1 (i.e. negative values for timestamps).
Fixes an issue on Windows where a temporary sqlite file cannot be opened while it's already open in the process.
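
For the pre-1970 timestamp item above, the changelog does not describe the fix. One portable approach is plain arithmetic against a timezone-aware epoch, which avoids conversions that go through the platform C library (on Windows, for example, `datetime.fromtimestamp` can raise `OSError` for negative values). `to_unix_seconds` and `from_unix_seconds` are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_seconds(moment: datetime) -> float:
    """Seconds since the Unix epoch for a timezone-aware datetime."""
    return (moment - EPOCH).total_seconds()


def from_unix_seconds(seconds: float) -> datetime:
    """Timezone-aware UTC datetime for a (possibly negative) Unix timestamp."""
    return EPOCH + timedelta(seconds=seconds)


before_epoch = datetime(1969, 12, 31, tzinfo=timezone.utc)

print(to_unix_seconds(before_epoch))  # -86400.0
print(from_unix_seconds(-86400.0).isoformat())  # 1969-12-31T00:00:00+00:00
```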

0.3.0

Change columns output: from "platform_id" to "entity_id" and from "platform_type" to "entity_type".
Document the expected column names in the doc of the functions.
Add columns_rename argument to subset_and_return_dataframe and subset_and_save to be able to choose the names of the columns in the output.

Project details

Release history

0.5.1: Mar 9, 2026
0.5.0: Feb 27, 2026
0.4.2 (this version): Jun 25, 2025
0.4.1: May 27, 2025
0.4.0: Apr 25, 2025
0.3.5: Apr 24, 2025
0.3.4: Apr 22, 2025
0.3.3: Apr 1, 2025
0.3.2: Mar 31, 2025
0.3.1: Mar 31, 2025
0.3.0: Mar 26, 2025
0.2.1: Mar 21, 2025
0.2.0: Mar 17, 2025
0.1.4: Mar 4, 2025
0.1.3: Mar 4, 2025
0.1.2: Feb 28, 2025
0.1.1: Feb 28, 2025

Download files

Source Distribution

arcosparse-0.4.2.tar.gz (24.4 kB view details)

Uploaded Jun 25, 2025 Source

Built Distribution

arcosparse-0.4.2-py3-none-any.whl (26.7 kB view details)

Uploaded Jun 25, 2025 Python 3

File details

Details for the file arcosparse-0.4.2.tar.gz.

File metadata

Download URL: arcosparse-0.4.2.tar.gz
Upload date: Jun 25, 2025
Size: 24.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.0.0 CPython/3.12.3 Linux/6.11.0-1015-azure

File hashes

Hashes for arcosparse-0.4.2.tar.gz
Algorithm	Hash digest
SHA256	`67c356f9db02dee5e4d74d64afcb73b208c0a056f828d746931c85909e548450`
MD5	`b7e6bf0600077273b86d788bcf99c085`
BLAKE2b-256	`220357cb3f2aae3cbc239ab9c95709658ef9c93e7f73fb93e63ce6e6ad7ad209`

File details

Details for the file arcosparse-0.4.2-py3-none-any.whl.

File metadata

Download URL: arcosparse-0.4.2-py3-none-any.whl
Upload date: Jun 25, 2025
Size: 26.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.0.0 CPython/3.12.3 Linux/6.11.0-1015-azure

File hashes

Hashes for arcosparse-0.4.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f2eced580b69493553fbc07ff9d34147a1fb38bcdd530e1a9ac5ad961bd6dd2b`
MD5	`81894d21a1c9417f0eba1cb2b1a6f93b`
BLAKE2b-256	`34ecc764296b694b425853baf444d9e4643949d2e16479f5b192f47fa24ba1fe`
