This repository contains the implementation of a dataset loader for Parquet files.

aircheckdata: AIRCHECK Parquet Dataset Loader

A lightweight Python package and CLI tool for listing and loading AIRCHECK datasets, with built-in support for column selection, progress tracking, and automatic local caching. It gives programmatic access to the same datasets that are available for download via the AIRCHECK website. Before using any dataset, please ensure you have read and agreed to the dataset agreement, the HitGen End User License Agreement (EULA).


✅ Best Practices

  • Use virtual environments to avoid dependency conflicts:

    python -m venv .venv
    source .venv/bin/activate  # On Windows use .venv\Scripts\activate
    
  • Always validate that your code respects data privacy and licensing terms.

  • Avoid storing large datasets in version control. Let aircheckdata handle caching.
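
To confirm from inside Python that a virtual environment is actually active, one option is to compare `sys.prefix` with `sys.base_prefix`. The sketch below is a generic standard-library check and not part of aircheckdata; the helper name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```

Running this before `pip install aircheckdata` is a quick way to avoid installing into the system interpreter by accident.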


📦 Installation

You can install the package from PyPI:

pip install aircheckdata

For development and testing (optional):

pip install -e ".[dev]"

Installation verification (optional)

Verify that the installation was successful by running the unit tests:

pytest tests/
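
When the `tests/` directory is not at hand, a lighter check is to ask Python whether the distribution is installed at all. This sketch relies on the standard-library `importlib.metadata` module (available from Python 3.8); the helper name `installed_version` is hypothetical and not part of aircheckdata.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("aircheckdata")
    print("aircheckdata version:", version or "not installed")
```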

🔧 Usage in a Python Project (Virtual Environment)

aircheckdata can be used directly from your Python environment to:

  • List pre-configured datasets
  • View available columns and metadata
  • Load datasets with optional filtering and progress indicators

Quick Start

List Datasets

from aircheckdata import list_datasets

datasets = list_datasets()
for name, desc in datasets.items():
    print(f"{name}: {desc}")

View Available Columns

from aircheckdata import get_columns

columns = get_columns('HitGen','WDR91')
for col, desc in columns.items():
    print(f"{col}: {desc}")
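
Before requesting a subset of columns, it can help to check the names against what the dataset offers. The sketch below assumes `get_columns` returns a mapping of column name to description, as the loop above suggests. `check_columns` is a hypothetical helper, and the sample mapping is a made-up stand-in rather than the real dataset schema.

```python
from typing import Iterable, List, Mapping


def check_columns(available: Mapping[str, str], requested: Iterable[str]) -> List[str]:
    """Return the requested names as a list, raising if any is not available."""
    requested = list(requested)
    missing = [name for name in requested if name not in available]
    if missing:
        raise KeyError(f"Unknown columns: {', '.join(missing)}")
    return requested


# Illustrative stand-in for get_columns('HitGen', 'WDR91'); descriptions are placeholders.
sample_columns = {"ECFP4": "placeholder", "ECFP6": "placeholder", "LABEL": "placeholder"}
selected = check_columns(sample_columns, ["ECFP6", "LABEL"])
```

The validated list could then be passed as the `columns` argument when loading, so a typo fails fast instead of after a download has started.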

Load dataset

from aircheckdata import load_dataset

# Load selected columns without a progress bar
df = load_dataset('HitGen', 'WDR91', columns=['ECFP6', 'ECFP4', 'LABEL'], show_progress=False)

# Or load the same columns with a progress bar
df = load_dataset('HitGen', 'WDR91', columns=['ECFP6', 'ECFP4', 'LABEL'])

# Or call with no arguments: downloads once, then caches locally (loads the WDR91 target by default)
df = load_dataset()
print(df.head())
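
The download-once behaviour mentioned above follows a common pattern: look for a local copy first and fetch only when it is missing. The sketch below illustrates that general pattern with the standard library only. It is not aircheckdata's actual implementation, and `cached_fetch`, the cache directory, the file name, and the `fetch` callback are all hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable


def cached_fetch(name: str, cache_dir: Path, fetch: Callable[[], bytes]) -> Path:
    """Return a local path for `name`, calling `fetch` only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / name
    if not target.exists():
        # Write to a temporary file first so an interrupted download
        # never leaves a truncated file at the final cache location.
        partial = target.with_name(target.name + ".part")
        partial.write_bytes(fetch())
        partial.replace(target)
    return target


# Demonstration with a fake "download" that records how often it runs.
calls = []


def fake_download() -> bytes:
    calls.append(1)
    return b"parquet-bytes-placeholder"


demo_dir = Path(tempfile.mkdtemp())
first = cached_fetch("WDR91.parquet", demo_dir, fake_download)   # cache miss: fetches
second = cached_fetch("WDR91.parquet", demo_dir, fake_download)  # cache hit: no fetch
```

Writing to a `.part` file and renaming it afterwards is a design choice that keeps a half-written file from being mistaken for a valid cached dataset.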

Advanced Usage

# Load only selected columns
df = load_dataset('HitGen', 'WDR91', columns=['ECFP6', 'ECFP4', 'LABEL'])

# Show progress while loading
df = load_dataset('HitGen', 'WDR91', show_progress=True)

💻 CLI Usage

The aircheckdata CLI gives quick access to datasets from the command line:

aircheckdata --help

Options and Examples

Option                 Description
list                   List all available datasets
load name              Load a specific dataset (defaults to WDR91)
--columns col1,col2    Select columns to load or list columns of a dataset

Examples

# List datasets
aircheckdata list

# Load the default dataset (WDR91)
aircheckdata --load

# Load a dataset with selected columns
aircheckdata --load "WDR91" --columns ECFP6,ECFP4

# View available columns (defaults to WDR91 if no name is given)
aircheckdata columns
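
The comma-separated `--columns` value shown above maps naturally onto a list of column names. As an illustration of how such an option can be parsed (this is not the actual aircheckdata CLI code), here is a small `argparse` sketch; the demo parser and the `split_columns` helper are hypothetical.

```python
import argparse
from typing import List


def split_columns(value: str) -> List[str]:
    """Turn 'ECFP6,ECFP4' into ['ECFP6', 'ECFP4'], ignoring stray spaces."""
    columns = [part.strip() for part in value.split(",") if part.strip()]
    if not columns:
        raise argparse.ArgumentTypeError("expected at least one column name")
    return columns


# Hypothetical demo parser; the real CLI may define its options differently.
parser = argparse.ArgumentParser(prog="columns-demo")
parser.add_argument("--columns", type=split_columns, default=None)

args = parser.parse_args(["--columns", "ECFP6, ECFP4"])
```

Leaving the default as `None` lets the calling code distinguish "no selection, load everything" from an explicit list.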

📜 License and Terms of Use

This package is distributed under the MIT License. However, the datasets it provides access to are subject to the HitGen End User License Agreement (EULA).

⚠️ By using any dataset accessed via aircheckdata, you agree to abide by the HitGen EULA.

Please refer to the full license terms and conditions here: 👉 https://www.aircheck.ai/docs/HitGen.pdf


📚 Pre-configured Datasets

Currently available datasets include:

  • WDR91: A curated Parquet dataset provided by HitGen

🛠 Requirements

  • Python 3.7+
