A lightweight package to access R::archdata archaeological datasets in Python
ArchDataPy
ArchDataPy is a lightweight Python package for accessing archaeological datasets stored in R package archives. It can download registered CRAN source packages, extract their .rda data files, and load those files with pyreadr. It also includes a small dataset registry for direct access to selected datasets as pandas DataFrames.
Features
- Download registered R package archives without needing R installed.
- List available package sources, including archdata and folio.
- Load .rda files into Python using pyreadr.
- Load selected datasets directly from the dataset registry as pandas DataFrames.
- Use custom sources by passing a CRAN archive URL or local .tar.gz package archive.
Installation
You can install ArchDataPy from PyPI:
pip install archdatapy
For local development, clone the repository and install it in editable mode:
git clone https://github.com/wccarleton/archdatapy.git
cd archdatapy
pip install -e .
Dependencies
This package requires the following Python libraries:
requests
pyreadr
pandas
These dependencies are automatically installed when you install the package.
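If an import error appears after installation, a quick environment check can show which dependency is missing. The following is a minimal sketch that uses only the standard library; the helper name missing_modules is hypothetical and is not part of ArchDataPy.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # The three dependencies listed in this README.
    required = ["requests", "pyreadr", "pandas"]
    missing = missing_modules(required)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies are importable.")
```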
Usage
1. List registered package sources
The package ships with a registry in package_registry.json. Registered package keys currently include archdata and folio.
from archdatapy import list_available_packages
print(list_available_packages())
2. Download a package and build a manifest
The get_archdata function accepts either:
- a registry key for a known CRAN package, or
- a direct package archive URL or local archive path.
It returns a manifest mapping dataset names to .rda file paths, along with package metadata.
from archdatapy import get_archdata
# Download the default registered package, archdata
manifest = get_archdata()
print(manifest.package_name)
print(manifest.source_url)
print(manifest.keys())
To download another registered package, pass its registry key:
from archdatapy import get_archdata
manifest = get_archdata(data_url="folio")
print(manifest.package_name)
print(manifest.keys())
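To make the idea of a manifest concrete, the sketch below scans a source archive for .rda files and maps each dataset name to its path inside the archive. It is an illustration under stated assumptions, not ArchDataPy's implementation: the function names rda_manifest and build_demo_archive are hypothetical, the demo archive holds placeholder bytes rather than real R data, and it assumes the conventional R source layout in which datasets live under the package's data/ directory.

```python
import io
import tarfile
from pathlib import PurePosixPath


def rda_manifest(archive):
    """Map dataset names to .rda member paths inside an open tar archive."""
    manifest = {}
    for member in archive.getmembers():
        path = PurePosixPath(member.name)
        # Assumption: datasets sit in <package>/data/ as in R source packages.
        in_data_dir = path.parent.name == "data"
        if member.isfile() and in_data_dir and path.suffix.lower() == ".rda":
            manifest[path.stem] = member.name
    return manifest


def build_demo_archive():
    """Create a tiny in-memory .tar.gz that mimics a source package layout."""
    buffer = io.BytesIO()
    names = ("demo/DESCRIPTION", "demo/data/Acheulean.rda", "demo/data/MaskSite.rda")
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name in names:
            payload = b"placeholder"  # not real R data
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


with tarfile.open(fileobj=build_demo_archive(), mode="r:gz") as tar:
    demo_manifest = rda_manifest(tar)

print(sorted(demo_manifest))  # ['Acheulean', 'MaskSite']
```

The real manifest object also carries package metadata such as package_name and source_url, which this plain dictionary does not model.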
3. Load a specific .rda file from the manifest
Use load_archdata with a path from the returned manifest.
from archdatapy import load_archdata
# manifest is the object returned by get_archdata() in step 2
dataset_name = 'Acheulean' # Example key from the manifest
data = load_archdata(manifest[dataset_name])
print(data)
pyreadr.read_r() returns a dictionary-like object because a single .rda file can contain one or more R objects.
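Because the result is keyed by R object name, code that expects a single table usually needs to unwrap it. The sketch below shows one way to do that defensively. The helper single_object is hypothetical, and the OrderedDict of lists is only a stand-in for a loaded result so the example runs without any .rda file; with real data the values would be pandas DataFrames.

```python
from collections import OrderedDict


def single_object(result):
    """Return the only object in a dict-like result, or raise if ambiguous."""
    if len(result) != 1:
        names = ", ".join(result) or "none"
        raise ValueError(f"expected exactly one object, found: {names}")
    return next(iter(result.values()))


# Stand-in for a loaded .rda file: R object name -> object.
loaded = OrderedDict([("Acheulean", [["row1"], ["row2"]])])
table = single_object(loaded)
print(table)  # [['row1'], ['row2']]
```

Raising on zero or multiple objects avoids silently picking the wrong table when a file bundles several R objects.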
4. Load a selected dataset directly
The package also ships with a smaller dataset registry in datasets.json. These entries point directly to individual dataset files and can be loaded with get_dataset.
from archdatapy import get_dataset, list_available_datasets
print(list_available_datasets())
mask_site = get_dataset("MaskSite")
print(mask_site.head())
5. Use your own package source
If you want to use a different CRAN package archive, pass the archive URL or local .tar.gz path directly:
from archdatapy import get_archdata
manifest = get_archdata(data_url='https://cran.r-project.org/src/contrib/yourpackage_1.0.0.tar.gz')
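CRAN source archives are named package_version.tar.gz, so it can be useful to check a custom source before passing it along. The following sketch parses that naming convention from a URL or a POSIX-style local path. The function parse_archive_name and the pattern are hypothetical helpers for illustration, not part of ArchDataPy, and the local path in the usage line is an invented example.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse

# R package names start with a letter and contain letters, digits, and dots;
# versions are digit groups separated by dots or hyphens.
ARCHIVE_PATTERN = re.compile(
    r"^(?P<name>[A-Za-z][A-Za-z0-9.]*)_(?P<version>[0-9][0-9.\-]*)\.tar\.gz$"
)


def parse_archive_name(source):
    """Split a CRAN-style archive URL or path into (package, version)."""
    filename = PurePosixPath(urlparse(source).path).name
    match = ARCHIVE_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a CRAN-style source archive: {filename!r}")
    return match["name"], match["version"]


url = "https://cran.r-project.org/src/contrib/yourpackage_1.0.0.tar.gz"
print(parse_archive_name(url))  # ('yourpackage', '1.0.0')
print(parse_archive_name("downloads/somepkg_1.2-3.tar.gz"))  # ('somepkg', '1.2-3')
```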
Documentation
Full documentation is available on the GitHub Pages site: https://wccarleton.github.io/archdatapy
Contributing
Contributions are welcome. Please feel free to submit issues or pull requests to improve the package.
License
This project is licensed under the MIT License. See the LICENSE file for details.
Roadmap
Future enhancements planned for ArchDataPy:
High Priority (Completed ✅)
- Registry-based package sourcing system
- Modern packaging with pyproject.toml (PEP 517/518/621)
- Type hints for better IDE support
- Automated CI/CD with GitHub Actions
- .gitignore and MANIFEST.in for clean distribution
Medium Priority
- Expand package registry with curated archaeology datasets
- Add structured logging instead of print statements
- Improve error messages with helpful recovery suggestions
- Add CONTRIBUTING.md guide for registry contributions
- Include metadata (DOI, citations) in registry entries
Lower Priority
- Optional caching layer for load_archdata()
- Docstring examples and doctests
- Dependency version compatibility checking
- GitHub issue/PR templates
- Support for additional data formats beyond .rda
Contributing to the Registry
To add new package sources to the package registry:
- Fork the repository
- Edit archdatapy/package_registry.json to add your source
- Submit a pull request with a description of the package and datasets
Registry entries should follow this structure:
{
  "package_name": {
    "url": "https://cran.r-project.org/src/contrib/package_1.0.0.tar.gz",
    "description": "Description of the package and datasets",
    "homepage": "https://CRAN.R-project.org/package=package",
    "license": "Package license"
  }
}
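Before opening a pull request, it may help to check a new entry against the structure above. The sketch below is one possible local check and is not a tool shipped with ArchDataPy: the function registry_problems is hypothetical, and treating all four fields as required non-empty strings is an assumption drawn from the example entry.

```python
import json

# Assumption: every field shown in the example entry is required.
REQUIRED_FIELDS = ("url", "description", "homepage", "license")


def registry_problems(registry):
    """Return a list of human-readable problems found in a registry mapping."""
    problems = []
    for key, entry in registry.items():
        if not isinstance(entry, dict):
            problems.append(f"{key}: entry must be a JSON object")
            continue
        for field in REQUIRED_FIELDS:
            value = entry.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"{key}: missing or empty '{field}'")
        url = entry.get("url")
        if isinstance(url, str) and url.strip() and not url.endswith(".tar.gz"):
            problems.append(f"{key}: 'url' should point to a .tar.gz archive")
    return problems


sample = json.loads("""
{
  "package_name": {
    "url": "https://cran.r-project.org/src/contrib/package_1.0.0.tar.gz",
    "description": "Description of the package and datasets",
    "homepage": "https://CRAN.R-project.org/package=package",
    "license": "Package license"
  }
}
""")
print(registry_problems(sample))  # []
```

To check a working copy, load archdatapy/package_registry.json with json.load and pass the resulting dictionary to the same function.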
Acknowledgments
The default registry includes datasets from the R archdata package, a collection of archaeological datasets maintained on CRAN. It provides the datasets used in Quantitative Methods in Archaeology Using R by David L. Carlson.