Helper to download and subset sparse data that has been Arcoified and is available through STAC as SQLite-formatted data
Project description
arcosparse: A Python library for ARCO sparse datasets subsetting
Usage
Main functions
arcosparse.subset_and_return_dataframe
Subset the data based on the input and return a dataframe.
arcosparse.subset_and_save
Subset the data based on the input and save it as a partitioned parquet dataset.
The data is saved in a single folder that contains many small parquet files, but you can still open all of it at once.
To open the data into a dataframe, use this snippet:
import glob
import pandas as pd
output_path = "some_folder"
# Get all partitioned Parquet files
parquet_files = glob.glob(f"{output_path}/*.parquet")
# Read all files into a single dataframe
df = pd.concat(pd.read_parquet(file) for file in parquet_files)
Changelog
0.3.5
- Return all the columns even if full of NaNs.
0.3.4
- Deleted deprecated get_platforms_names function.
- Fix an issue where the query on the chunk would not be correct if the requested subset is 0.
0.3.3
- Add GPLv3 license
0.3.2
- Fixes an issue on Windows where deleting a file is not permitted if we don't explicitly close the SQL connection.
0.3.1
- Reindex when concatenating. Fixes an issue where indexes wouldn't be unique.
- Fixes an issue on Windows where datetime.to_timestamp does not support dates before 1970-1-1 (i.e. negative values for timestamps).
- Fixes an issue on Windows where a temporary sqlite file cannot be opened while it's already open in the process.
0.3.0
- Change columns output: from "platform_id" to "entity_id" and from "platform_type" to "entity_type".
- Document the expected column names in the doc of the functions.
- Add columns_rename argument to subset_and_return_dataframe and subset_and_save to be able to choose the names of the columns in the output.
File details
Details for the file arcosparse-0.3.5.tar.gz.
File metadata
- Download URL: arcosparse-0.3.5.tar.gz
- Upload date:
- Size: 21.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.0.0 CPython/3.12.3 Linux/6.11.0-1012-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e04f093763a72eb045491d749c3a76cea5aa72d7a9bd5cc76c574e2975b5e74b |
| MD5 | dd11616f7eb9b06b3040bcad8f65bf4b |
| BLAKE2b-256 | 4f05d86f6a7bc6bf7d0137138a92cb2ccd10f27983733bb5acdbf1e99afb7dd9 |
File details
Details for the file arcosparse-0.3.5-py3-none-any.whl.
File metadata
- Download URL: arcosparse-0.3.5-py3-none-any.whl
- Upload date:
- Size: 24.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.0.0 CPython/3.12.3 Linux/6.11.0-1012-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 046bfc93c9567739d9ce4aef390f23000934f44c60b15b9af86ea677f96f89a6 |
| MD5 | 8be619aebe99b767d17dc004b8bdae67 |
| BLAKE2b-256 | d52518c365b3aade36babe26b2845cc43c5c755e26cdecf5c5125bcd0fdc8cf6 |