
Helper to download and subset sparse data that has been ARCO-ified and is available through STAC and SQLite-formatted data.

Project description

arcosparse: A Python library for ARCO sparse datasets subsetting

Usage

Main functions

arcosparse.subset_and_return_dataframe

Subset the data based on the input and return a dataframe.

arcosparse.subset_and_save

Subset the data based on the input and save it as a partitioned Parquet dataset: the output is a single folder containing many small Parquet files. All of them can still be loaded at once.

To load the data into a single dataframe, use this snippet:

import glob

import pandas as pd

output_path = "some_folder"

# Get all partitioned Parquet files
parquet_files = glob.glob(f"{output_path}/*.parquet")

# Read all files into a single dataframe with a fresh, unique index
df = pd.concat((pd.read_parquet(file) for file in parquet_files), ignore_index=True)
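The same steps can be wrapped in a slightly more defensive helper. This is a sketch under assumptions: `read_partitioned_parquet` is a hypothetical name, the search is recursive in case files sit in subfolders, and a Parquet engine such as pyarrow is installed. `ignore_index=True` gives the combined frame a fresh index instead of repeating each file's own 0..n-1 index.

```python
import glob
import os

import pandas as pd


def read_partitioned_parquet(output_path: str) -> pd.DataFrame:
    """Load every Parquet file found under output_path into one dataframe."""
    # "**" with recursive=True also matches files directly in output_path.
    pattern = os.path.join(output_path, "**", "*.parquet")
    # Sort so the file order, and therefore the row order, is deterministic.
    parquet_files = sorted(glob.glob(pattern, recursive=True))
    if not parquet_files:
        # pd.concat raises ValueError on an empty sequence; return an empty frame instead.
        return pd.DataFrame()
    # ignore_index=True builds a new 0..n-1 index so index values stay unique.
    return pd.concat(
        (pd.read_parquet(path) for path in parquet_files),
        ignore_index=True,
    )
```

Usage: `df = read_partitioned_parquet("some_folder")`.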

Changelog

0.3.5

  • Return all columns, even those full of NaNs.

0.3.4

  • Delete the deprecated get_platforms_names function.
  • Fix an issue where the query on a chunk was incorrect when the requested subset is 0.

0.3.3

  • Add GPLv3 license

0.3.2

  • Fix an issue on Windows where a file could not be deleted unless the SQL connection was explicitly closed.
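The general pattern behind this kind of fix can be sketched with the standard library. This is not arcosparse's code: the function name `read_sqlite_then_delete` and the toy table `t` are hypothetical. The key point is that `with sqlite3.connect(...)` only wraps a transaction and does not close the connection, so `contextlib.closing` is used to close it before the file is removed.

```python
import contextlib
import os
import sqlite3
import tempfile


def read_sqlite_then_delete(rows):
    """Write rows to a temporary SQLite file, read them back, then delete the file."""
    # mkstemp returns an open OS handle; close it so SQLite can open the file
    # (on Windows, an extra open handle can block later opens and deletes).
    fd, path = tempfile.mkstemp(suffix=".sqlite")
    os.close(fd)
    try:
        # closing() guarantees conn.close() when the block exits.
        with contextlib.closing(sqlite3.connect(path)) as conn:
            with conn:  # commits the transaction on success, rolls back on error
                conn.execute("CREATE TABLE t (x INTEGER)")
                conn.executemany("INSERT INTO t VALUES (?)", [(r,) for r in rows])
            result = [x for (x,) in conn.execute("SELECT x FROM t ORDER BY rowid")]
    finally:
        # The connection is already closed here, so the delete also works on Windows.
        os.remove(path)
    return result, path
```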

0.3.1

  • Reindex when concatenating, which fixes an issue where indexes were not unique.
  • Fixes an issue on Windows where datetime.to_timestamp does not support dates before 1970-01-01 (i.e. negative timestamp values).
  • Fixes an issue on Windows where a temporary sqlite file cannot be opened while it's already open in the process.
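One portable way to get Unix timestamps for dates before 1970 is to subtract a timezone-aware UTC epoch and take `total_seconds()`, which is plain timedelta arithmetic. A sketch under assumptions: `to_unix_seconds` is a hypothetical helper, not part of arcosparse, and naive datetimes are treated as UTC.

```python
from datetime import datetime, timezone

# Aware epoch, so the subtraction below never depends on local time.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_seconds(dt: datetime) -> float:
    """Seconds since 1970-01-01 UTC; negative for earlier dates."""
    if dt.tzinfo is None:
        # Assumption: naive datetimes are interpreted as UTC, not local time.
        dt = dt.replace(tzinfo=timezone.utc)
    # Pure timedelta arithmetic: unlike naive datetime.timestamp(), this does not
    # go through the platform's local-time functions, which on Windows can fail
    # for pre-1970 dates.
    return (dt - EPOCH).total_seconds()
```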

0.3.0

  • Change columns output: from "platform_id" to "entity_id" and from "platform_type" to "entity_type".
  • Document the expected column names in the functions' docstrings.
  • Add columns_rename argument to subset_and_return_dataframe and subset_and_save to be able to choose the names of the columns in the output.
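For code written against the pre-0.3.0 column names, one option is to rename the columns after the fact with pandas. This sketch does not use `columns_rename`, whose exact expected format is not shown here; `LEGACY_NAMES` and `restore_legacy_columns` are hypothetical names.

```python
import pandas as pd

# Hypothetical mapping from the 0.3.0+ names back to the pre-0.3.0 names.
LEGACY_NAMES = {"entity_id": "platform_id", "entity_type": "platform_type"}


def restore_legacy_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with entity_* columns renamed to platform_*."""
    # rename() ignores mapping keys that are not columns of df,
    # so the call is safe even if one of the columns is absent.
    return df.rename(columns=LEGACY_NAMES)
```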


Download files


Source Distribution

arcosparse-0.3.5.tar.gz (21.9 kB)

Uploaded Source

Built Distribution


arcosparse-0.3.5-py3-none-any.whl (24.7 kB)

Uploaded Python 3

File details

Details for the file arcosparse-0.3.5.tar.gz.

File metadata

  • Download URL: arcosparse-0.3.5.tar.gz
  • Upload date:
  • Size: 21.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.0.0 CPython/3.12.3 Linux/6.11.0-1012-azure

File hashes

Hashes for arcosparse-0.3.5.tar.gz
Algorithm    Hash digest
SHA256       e04f093763a72eb045491d749c3a76cea5aa72d7a9bd5cc76c574e2975b5e74b
MD5          dd11616f7eb9b06b3040bcad8f65bf4b
BLAKE2b-256  4f05d86f6a7bc6bf7d0137138a92cb2ccd10f27983733bb5acdbf1e99afb7dd9


File details

Details for the file arcosparse-0.3.5-py3-none-any.whl.

File metadata

  • Download URL: arcosparse-0.3.5-py3-none-any.whl
  • Upload date:
  • Size: 24.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.0.0 CPython/3.12.3 Linux/6.11.0-1012-azure

File hashes

Hashes for arcosparse-0.3.5-py3-none-any.whl
Algorithm    Hash digest
SHA256       046bfc93c9567739d9ce4aef390f23000934f44c60b15b9af86ea677f96f89a6
MD5          8be619aebe99b767d17dc004b8bdae67
BLAKE2b-256  d52518c365b3aade36babe26b2845cc43c5c755e26cdecf5c5125bcd0fdc8cf6
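The published SHA256 digests can be checked locally after downloading. A minimal sketch with `hashlib`, assuming the file has been downloaded to the current directory; `file_sha256` and `EXPECTED_WHEEL_SHA256` are hypothetical names, and the expected value is the wheel's SHA256 digest listed above.

```python
import hashlib

# SHA256 digest published for arcosparse-0.3.5-py3-none-any.whl.
EXPECTED_WHEEL_SHA256 = "046bfc93c9567739d9ce4aef390f23000934f44c60b15b9af86ea677f96f89a6"


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

For an untampered download, `file_sha256("arcosparse-0.3.5-py3-none-any.whl") == EXPECTED_WHEEL_SHA256` should be `True`. pip can also enforce digests at install time through `--hash=sha256:...` entries in a requirements file (hash-checking mode).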

