A lightweight package to access R::archdata archaeological datasets in Python

ArchDataPy

ArchDataPy is a lightweight Python package for accessing archaeological datasets that are distributed in R packages. It can download registered CRAN source packages, extract their .rda data files, and load those files with pyreadr. It also includes a small dataset registry for direct access to selected datasets as pandas DataFrames.

Features

  • Download registered R package archives without needing R installed.
  • List available package sources, including archdata and folio.
  • Load .rda files into Python using pyreadr.
  • Load selected datasets directly from the dataset registry as pandas DataFrames.
  • Use custom sources by passing a CRAN archive URL or local .tar.gz package archive.

Installation

You can install ArchDataPy from PyPI:

pip install archdatapy

For local development, clone the repository and install it in editable mode:

git clone https://github.com/wccarleton/archdatapy.git
cd archdatapy
pip install -e .

Dependencies

This package requires the following Python libraries:

requests
pyreadr
pandas

These dependencies are automatically installed when you install the package.

Usage

1. List registered package sources

The package ships with a registry in package_registry.json. Registered package keys currently include archdata and folio.

from archdatapy import list_available_packages

print(list_available_packages())

2. Download a package and build a manifest

The get_archdata function accepts either:

  • a registry key for a known CRAN package, or
  • a direct package archive URL or local archive path.

It returns a manifest mapping dataset names to .rda file paths, along with package metadata.

from archdatapy import get_archdata

# Download the default registered package, archdata
manifest = get_archdata()
print(manifest.package_name)
print(manifest.source_url)
print(manifest.keys())

To download another registered package, pass its registry key:

from archdatapy import get_archdata

manifest = get_archdata(data_url="folio")
print(manifest.package_name)
print(manifest.keys())
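
To make the idea of a manifest concrete, here is a minimal sketch that scans a local .tar.gz package archive and maps dataset names to the .rda members it finds. It is not ArchDataPy's implementation: list_rda_members is a hypothetical helper, it returns archive member names rather than extracted file paths, and it assumes datasets sit in the package's data/ directory, as is conventional for R source packages.

```python
import tarfile
from pathlib import PurePosixPath


def list_rda_members(archive_path: str) -> dict[str, str]:
    """Map dataset names to the names of .rda members inside a .tar.gz archive.

    Hypothetical helper for illustration; not part of the archdatapy API.
    """
    members: dict[str, str] = {}
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for name in archive.getnames():
            path = PurePosixPath(name)
            # Assumption: datasets live under <package>/data/ in the archive.
            if path.suffix.lower() == ".rda" and path.parent.name == "data":
                members[path.stem] = name
    return members
```

Calling list_rda_members on a downloaded archive gives a quick view of which datasets a custom package would expose before loading any of them.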

3. Load a specific .rda file from the manifest

Use load_archdata with a path from the returned manifest.

from archdatapy import get_archdata, load_archdata

manifest = get_archdata()  # Manifest built as in step 2
dataset_name = 'Acheulean'  # Example key from the manifest
data = load_archdata(manifest[dataset_name])
print(data)

pyreadr.read_r() returns a dictionary-like object keyed by R object name, because a single .rda file can contain one or more R objects.
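
Because the result is dictionary-like, a small helper can pull out the object when a file holds exactly one. The sketch below is illustrative only: single_object is a hypothetical name, and a plain OrderedDict with a placeholder string stands in for a loaded result, whose values would normally be pandas DataFrames.

```python
from collections import OrderedDict
from typing import Any, Mapping


def single_object(result: Mapping[str, Any]) -> Any:
    """Return the only object in a dict-like result, or raise if the count is not one."""
    if len(result) != 1:
        raise ValueError(f"expected exactly one R object, found: {list(result)}")
    return next(iter(result.values()))


# Stand-in for a loaded .rda result; real values would be DataFrames.
example = OrderedDict([("Acheulean", "placeholder for a DataFrame")])
print(single_object(example))
```

For files that contain several objects, iterating over the result's items() and handling each name in turn is the more general pattern.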

4. Load a selected dataset directly

The package also ships with a smaller dataset registry in datasets.json. These entries point directly to individual dataset files and can be loaded with get_dataset.

from archdatapy import get_dataset, list_available_datasets

print(list_available_datasets())
mask_site = get_dataset("MaskSite")
print(mask_site.head())

5. Use your own package source

If you want to use a different CRAN package archive, pass the archive URL or local .tar.gz path directly:

from archdatapy import get_archdata
manifest = get_archdata(data_url='https://cran.r-project.org/src/contrib/yourpackage_1.0.0.tar.gz')
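
Since one argument can carry a registry key, a URL, or a local path, it may help to see how such a value could be told apart. This is a rough sketch under stated assumptions, not how get_archdata resolves its input: classify_source and its return labels are hypothetical, and the registry keys are passed in explicitly.

```python
from urllib.parse import urlparse


def classify_source(source: str, registry_keys: set[str]) -> str:
    """Label a source string as 'registry', 'url', or 'local' (illustrative only)."""
    if source in registry_keys:
        return "registry"
    # Treat anything with an http(s) scheme as a remote archive.
    if urlparse(source).scheme in ("http", "https"):
        return "url"
    # Assumption: local package archives are gzipped tarballs.
    if source.endswith(".tar.gz"):
        return "local"
    raise ValueError(f"unrecognised package source: {source!r}")
```

Checking the registry first means a registered key is never mistaken for a file name.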

Documentation

Full documentation is available on the GitHub Pages site: https://wccarleton.github.io/archdatapy

Contributing

Contributions are welcome. Please feel free to submit issues or pull requests to improve the package.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Roadmap

Completed and planned enhancements for ArchDataPy:

High Priority (Completed ✅)

  • Registry-based package sourcing system
  • Modern packaging with pyproject.toml (PEP 517/518/621)
  • Type hints for better IDE support
  • Automated CI/CD with GitHub Actions
  • .gitignore and MANIFEST.in for clean distribution

Medium Priority

  • Expand package registry with curated archaeology datasets
  • Add structured logging instead of print statements
  • Improve error messages with helpful recovery suggestions
  • Add CONTRIBUTING.md guide for registry contributions
  • Include metadata (DOI, citations) in registry entries

Lower Priority

  • Optional caching layer for load_archdata()
  • Docstring examples and doctests
  • Dependency version compatibility checking
  • GitHub issue/PR templates
  • Support for additional data formats beyond .rda

Contributing to the Registry

To add new package sources to the package registry:

  1. Fork the repository
  2. Edit archdatapy/package_registry.json to add your source
  3. Submit a pull request with a description of the package and datasets

Registry entries should follow this structure:

{
  "package_name": {
    "url": "https://cran.r-project.org/src/contrib/package_1.0.0.tar.gz",
    "description": "Description of the package and datasets",
    "homepage": "https://CRAN.R-project.org/package=package",
    "license": "Package license"
  }
}
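
Before opening a pull request, an entry can be checked against this structure with a few lines of Python. The sketch below is a hypothetical helper, not a tool shipped with ArchDataPy, and it assumes that all four fields shown above are required non-empty strings.

```python
import json

# Assumption: every field in the documented structure is mandatory.
REQUIRED_FIELDS = ("url", "description", "homepage", "license")


def check_registry(text: str) -> list[str]:
    """Return a list of problems found in registry JSON text (empty if none)."""
    problems: list[str] = []
    registry = json.loads(text)
    for name, entry in registry.items():
        if not isinstance(entry, dict):
            problems.append(f"{name}: entry must be a JSON object")
            continue
        for field in REQUIRED_FIELDS:
            value = entry.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"{name}: missing or empty '{field}'")
    return problems
```

Running check_registry on the contents of an edited package_registry.json and confirming that it returns an empty list is a quick sanity check before submitting.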

Acknowledgments

The default registry includes datasets from the R archdata package, a collection of archaeological datasets maintained on CRAN. It provides the datasets used in Quantitative Methods in Archaeology Using R by David L. Carlson.
