AADR cross-version GeneticID / MasterID join utility for ancient-DNA / population-genetics workflows.

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

carstenerickson

These details have not been verified by PyPI

Development Status
- 4 - Beta
Environment
- Console
Intended Audience
- Science/Research
License
- OSI Approved :: MIT License
Operating System
- MacOS
- POSIX :: Linux
Programming Language
Topic
- Scientific/Engineering :: Bio-Informatics
Typing
- Typed

Project description

aadr-resolve

AADR cross-version GeneticID / MasterID join utility for ancient-DNA and population-genetics workflows.

aadr-resolve reads AADR (Allen Ancient DNA Resource) .anno files across one or more releases and resolves the cross-version sample-ID join through the Master ID column, the part every ancient-DNA pipeline currently re-implements with custom awk. It automatically handles AADR's progressive de-anonymization (I0001 in v44.3 → Loschbour.AG in v66.0) and the periodic Master ID renames (9-18 per consecutive version pair; ~62 cumulative from v44.3 to v66.0).
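
To make the idea of a Master ID join concrete, here is a minimal sketch in plain Python. It is not the package's implementation: the helper name `join_by_master_id`, the (genetic_id, master_id) tuple layout, and the hand-written rename table are all assumptions made for illustration. The sample IDs are borrowed from the Loschbour example in this README.

```python
from collections import defaultdict


def join_by_master_id(old_rows, new_rows, bridge):
    """Map each old genetic_id to the new genetic_ids sharing its Master ID.

    old_rows / new_rows: iterables of (genetic_id, master_id) pairs.
    bridge: dict of old master_id -> renamed master_id (hypothetical format).
    """
    new_by_master = defaultdict(list)
    for genetic_id, master_id in new_rows:
        new_by_master[master_id].append(genetic_id)

    joined = {}
    for genetic_id, master_id in old_rows:
        # Follow a known rename; otherwise assume the Master ID is unchanged.
        target = bridge.get(master_id, master_id)
        joined[genetic_id] = sorted(new_by_master.get(target, []))
    return joined


old = [("I0001", "I0001")]
new = [("Loschbour.AG", "Loschbour"), ("Loschbour.DG", "Loschbour")]
print(join_by_master_id(old, new, {"I0001": "Loschbour"}))
# {'I0001': ['Loschbour.AG', 'Loschbour.DG']}
```

Without the rename entry the old sample would resolve to nothing, which is why the rename bridge matters for de-anonymized samples.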

The HLD pins behavior and the LLD pins implementation; both live in the companion wiki:

HLD: cs-wiki/projects/aadr-resolve.md
LLD: cs-wiki/projects/aadr-resolve-lld.md
Bench-verify report: cs-wiki/projects/aadr-resolve-bench-verify.md

Install

pip install aadr-resolve

Requires Python 3.11+. Dependencies: pandas 2.x, click 8.x, PyYAML 6.x.

Quickstart

Resolve a single sample across two AADR releases.

aadr-resolve lookup I0001 \
    --anno-files v44.3_1240K_public.anno \
    --anno-files v66.0_1240K_public.anno

Output (stdout):

query: I0001
canonical individual_id: Loschbour    (matched via individual_id)
v44.3 rows: 1
  I0001  Luxembourg_Loschbour  537,182 SNPs
v66.0 rows: 2
  Loschbour.AG  Luxembourg_Mesolithic.AG  155,036 SNPs  pgid=33
  Loschbour.DG  Luxembourg_Mesolithic.DG  620,881 SNPs  pgid=39136
master_id_bridge: v44.3 I0001 → v66.0 Loschbour (via shared GID Loschbour.DG)
status: present_in_2_of_2_versions; multi_row; individual_id_renamed

Recreate a cohort against a newer release.

aadr-resolve cohort patterson_2022_whga.txt \
    --anno-files v44.3_1240K_public.anno \
    --anno-files v66.0_1240K_public.anno \
    --cohort-version v44.3 \
    -o whga_v66_manifest.tsv

The manifest is a TSV with one row per (individual × library), with per-version genetic_id / group_id / snps_hit_1240k columns, ready to feed into downstream relabeling tools like pgen-samplebind.

Structured diff between two releases.

aadr-resolve diff v62.0.anno v66.0.anno --tsv > v62_to_v66_changes.tsv

Emits one row per change event: added, removed, genetic_id_renamed, master_id_renamed, or group_changed (with a per-class label such as convention_restructure_suffix).

Subcommands

Command	Purpose
`lookup`	Resolve a single sample across N versions
`cohort`	Emit a cross-version manifest for a user-supplied cohort
`diff`	Structured diff between two versions
`join`	Wide-format pairwise table across two versions (no cohort list needed)
`schema`	Diagnostic: report the detected schema class

`aadr-resolve lookup`

aadr-resolve lookup INDIVIDUAL_OR_GENETIC_ID
    --anno-files PATH [--anno-files PATH ...]
    [--json]

The argument is treated as an individual_id (IID) by default and falls back to genetic_id (GID) if no IID matches. The Master ID (MID) rename bridge is built automatically from the supplied versions and reported under master_id_bridge in the output.
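
The fallback order described above can be sketched as follows. This is an illustration under assumptions, not the tool's code: the function name `match_query` and the dict-per-row layout are hypothetical.

```python
def match_query(query, rows):
    """Return (matched_via, matching_rows) for a lookup query.

    rows: list of dicts with "individual_id" and "genetic_id" keys
    (a hypothetical in-memory layout).
    """
    # First pass: treat the query as an individual_id.
    by_iid = [r for r in rows if r["individual_id"] == query]
    if by_iid:
        return "individual_id", by_iid
    # Second pass: fall back to genetic_id only when no IID matched.
    by_gid = [r for r in rows if r["genetic_id"] == query]
    if by_gid:
        return "genetic_id", by_gid
    return None, []


rows = [
    {"individual_id": "Loschbour", "genetic_id": "Loschbour.AG"},
    {"individual_id": "Loschbour", "genetic_id": "Loschbour.DG"},
]
print(match_query("Loschbour", rows)[0])     # individual_id
print(match_query("Loschbour.DG", rows)[0])  # genetic_id
```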

`aadr-resolve cohort`

aadr-resolve cohort COHORT_FILE
    --anno-files PATH [--anno-files PATH ...]
    [--cohort-version LABEL]
    -o OUT.tsv [--json]
    [--no-propagate]
    [--collapse-to-individual]
    [--gid-preference AG,DG,SG,HO,TW,BY,AA,EC,WGC,bare]
    [--turnover-warn 0.05] [--turnover-fail 0.30]
    [--cohort-coverage-warn 0.50] [--cohort-coverage-fail 0.25]

COHORT_FILE is a TSV: one column for individual_id, optional second column for cohort_label. --cohort-version is auto-detected from the supplied annos when omitted. Default output is row-per-(individual × library); --collapse-to-individual reduces to one row per individual via the --gid-preference suffix priority.
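
As a rough illustration of collapsing by suffix priority, the sketch below picks one genetic_id per individual using the default --gid-preference order. The helper names are hypothetical, and two details are assumptions rather than documented behavior: that the suffix is the token after the last dot, and that an ID with no recognized suffix counts as "bare".

```python
DEFAULT_PREFERENCE = ("AG", "DG", "SG", "HO", "TW", "BY", "AA", "EC", "WGC", "bare")


def suffix_of(genetic_id, known):
    """Return the token after the last '.', or 'bare' if it is not a known suffix."""
    _, dot, tail = genetic_id.rpartition(".")
    return tail if dot and tail in known else "bare"


def pick_preferred(genetic_ids, preference=DEFAULT_PREFERENCE):
    """Choose the genetic_id whose suffix ranks earliest in the preference list."""
    known = set(preference) - {"bare"}
    rank = {suffix: i for i, suffix in enumerate(preference)}
    # Ties are broken alphabetically so the result is deterministic.
    return min(
        genetic_ids,
        key=lambda g: (rank.get(suffix_of(g, known), len(preference)), g),
    )


print(pick_preferred(["Loschbour.DG", "Loschbour.AG"]))  # Loschbour.AG
print(pick_preferred(["I0001", "I0001.SG"]))             # I0001.SG
```

Passing a different tuple, for example `("DG", "AG")`, mirrors what a custom --gid-preference value would express.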

`aadr-resolve diff`

aadr-resolve diff V_OLD.anno V_NEW.anno
    [--json | --tsv]
    [-o OUT]
    [--include-class CLASS [--include-class CLASS ...]]
    [--all-events]
    [--turnover-warn 0.05] [--turnover-fail 0.30]
    [--substantive-regroup-fail INT]

JSON output is summary-first. Per-class counts are always included; per-event arrays are emitted only for substantive_regroup (always), for any class named via --include-class, or for all classes when --all-events is set. --tsv switches to a streamed one-row-per-event format.
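
The summary-first rule can be sketched in a few lines. This is only an illustration: the function name `summarize` and the JSON keys "counts" and "events" are assumptions, and the real output layout is whatever the HLD specifies.

```python
import json
from collections import Counter

ALWAYS_LISTED = {"substantive_regroup"}


def summarize(events, include_classes=(), all_events=False):
    """Build a summary-first report from (event_class, payload) pairs."""
    events = list(events)
    counts = Counter(cls for cls, _ in events)
    # Counts cover every class; event arrays cover only the selected classes.
    if all_events:
        listed = set(counts)
    else:
        listed = ALWAYS_LISTED | set(include_classes)
    per_class = {
        cls: [payload for c, payload in events if c == cls]
        for cls in sorted(listed)
    }
    return {"counts": dict(counts), "events": per_class}


events = [
    ("added", "S1"),
    ("added", "S2"),
    ("removed", "S3"),
    ("substantive_regroup", "S4"),
]
print(json.dumps(summarize(events, include_classes=["removed"]), indent=2))
```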

`aadr-resolve join`

aadr-resolve join V_OLD.anno V_NEW.anno
    -o OUT.tsv [--json]
    [--collapse-to-individual]
    [--gid-preference AG,DG,SG,HO,TW,BY,AA,EC,WGC,bare]

Wide-format pairwise table over the full v_old ∪ v_new canonical individual_id set. Same output schema as cohort; useful when you don't have a pre-existing cohort list.

`aadr-resolve schema`

aadr-resolve schema PATH [--json]

Diagnostic: detects which schema class (A–E) the .anno belongs to and reports the column layout. Useful for debugging "why does this .anno not load?"

Shared options

These apply to all subcommands:

Option	Default	Notes
`--schema-override CLASS`	auto	Force schema class A/B/C/D/E (e.g., renamed `.anno`)
`--version-label LABEL`	auto	Force version label (when filename pattern doesn't match)
`--mid-bridge FILE`	none	Manual master_id-rename TSV layered on auto-detected bridge
`--on-mid-collision {error,warn}`	error	Cross-lab MID collision policy
`--quiet`	false	Suppress the "Wrote N rows" progress line

Library API

The same functionality is available in-process:

from aadr_resolve import (
    AnnoFrame,
    resolve_master_ids,
    resolve_genetic_ids,
)

# Resolve v44.3 Master IDs to v66.0 GeneticIDs
result = resolve_master_ids(
    ["I0001", "Bichon", "Mota"],
    src_version="v44.3",
    dst_version="v66.0",
    anno_paths={
        "v44.3": "v44.3_1240K_public.anno",
        "v66.0": "v66.0_1240K_public.anno",
    },
)
# result = {"I0001": "Loschbour.AG", "Bichon": "Bichon.SG", "Mota": None}

resolve_genetic_ids handles the GID → GID case:

result = resolve_genetic_ids(
    ["I0001"],
    src_version="v44.3",
    dst_version="v66.0",
    anno_paths={...},
)
# result = {"I0001": ["Loschbour.AG", "Loschbour.DG"]}  # multi-row IID

Direct AnnoFrame access for lower-level work:

from aadr_resolve import AnnoFrame

af = AnnoFrame.from_path("v66.0_1240K_public.anno", version_label="v66.0")
af.schema_class       # SchemaClass.E
af.individual_id      # pd.Series of canonical IIDs
af.genetic_id         # pd.Series
af.persistent_genetic_id  # pd.Series of Int64 nullable (E only; all-NaN elsewhere)
af.date_calbp         # pd.Series of Int64 nullable
af.coverage           # pd.Series of Float64 nullable
af.path               # original Path, useful for re-creating anno_paths dicts

Exception hierarchy

All errors derive from aadr_resolve.AadrResolveError. Sibling tools can catch a specific error with except aadr_resolve.<Class>:

Class	Maps to exit	Trigger
`ValidationError`	1	Turnover gate, coverage gate, substantive-regroup gate
`IOFailure`	2	File not found, lock held, malformed TSV
`InvariantViolation`	3	Schema YAML malformed (rare)
`SchemaDetectionError`	3	Header signature unknown
`MissingNativeFieldError`	3	Canonical field requested for a class that lacks it
`CollisionDetected`	3	Cross-lab MID collision under `error` policy
`UsageError`	4	Bad CLI args; cohort file has no matching version

Exit codes

Exit codes are stable across versions, so CI workflows can branch on them:

0 — success
1 — soft-validation failure (any of the gates)
2 — I/O failure
3 — invariant violation (schema, MID collision)
4 — usage error (bad CLI args)
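
For a CI wrapper written in Python, the exit codes above could be handled along these lines. This is a sketch: the helper names `describe_exit` and `run_and_describe` are hypothetical, and only the code-to-meaning table comes from this README.

```python
import subprocess

EXIT_MEANINGS = {
    0: "success",
    1: "soft-validation failure",
    2: "I/O failure",
    3: "invariant violation",
    4: "usage error",
}


def describe_exit(code):
    """Translate an exit code into the category listed in the README."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_and_describe(argv):
    """Run a command and return (exit_code, meaning) without raising on failure."""
    completed = subprocess.run(argv, check=False)
    return completed.returncode, describe_exit(completed.returncode)


print(describe_exit(1))  # soft-validation failure
```

A caller might use `run_and_describe(["aadr-resolve", "diff", "v62.0.anno", "v66.0.anno", "--tsv"])` and treat code 1 as a warning while failing the job on 2 through 4.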

Troubleshooting

"unknown .anno schema signature" — your .anno header doesn't match any of the 5 known classes. Either the file is from a newer AADR release (file an issue with the bench-verify diff), or the file has been edited. Workarounds:

--schema-override A|B|C|D|E forces a class without signature check.
--version-label vN.N forces a version label when the filename doesn't match a known pattern.

"cross-lab MID collision" — the GID-stability check found a Master ID that maps to two different individuals in different versions. This indicates either a real data error in AADR or a cross-lab naming collision (rare). Workarounds:

--on-mid-collision warn continues with a stderr warning and marks affected rows with library_chain_ambiguous status.
--mid-bridge FILE lets you specify the correct mapping manually.

"sample turnover gate (fail)" — removal rate exceeded the --turnover-fail threshold (default 30%). Indicates either a major AADR cleanup (the v62→v66 bump removed ~17%) or that the wrong files are being compared. Override with --turnover-fail 1.0 to disable.

"cohort coverage gate (fail)" — fewer than 25% of cohort entries resolved in the supplied versions. Usually means the cohort file uses IDs from a version not in the supplied set. Check --cohort-version.

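The turnover and coverage gates both compare a rate against a warn threshold and a fail threshold. The sketch below shows one way to express that. It is an assumption-laden illustration, not the tool's logic: the function name is hypothetical, comparisons are taken to be strict ("exceeded", "fewer than"), and how the tool computes each rate's denominator is not stated here.

```python
def classify_rate(rate, warn, fail, higher_is_worse=True):
    """Return 'ok', 'warn', or 'fail' for a rate checked against two thresholds."""
    if not higher_is_worse:
        # Flip the sign so that "worse" always means "larger".
        rate, warn, fail = -rate, -warn, -fail
    if rate > fail:
        return "fail"
    if rate > warn:
        return "warn"
    return "ok"


# Turnover: a higher removal rate is worse (defaults 0.05 / 0.30).
print(classify_rate(0.17, warn=0.05, fail=0.30))  # warn

# Coverage: a lower resolved fraction is worse (defaults 0.50 / 0.25).
print(classify_rate(0.20, warn=0.50, fail=0.25, higher_is_worse=False))  # fail
```

Under the strict-comparison assumption, a fail threshold of 1.0 can never be exceeded by a rate, which matches the advice to pass --turnover-fail 1.0 to disable the gate.
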
Pandas ParserError on a v52 / v54 .anno — these versions contain embedded quote characters in some full_date cells. aadr-resolve reads with csv.QUOTE_NONE to side-step pandas's default quote-handling; upgrade if you're on an older version.
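
To see why quote handling matters, the standard-library csv module can reproduce the effect on a single made-up record. This is a sketch with invented sample data, not an excerpt from a real .anno file; pandas.read_csv accepts the same constant through its quoting parameter.

```python
import csv
import io

# One tab-separated record whose middle cell contains a stray double quote.
raw = 'I0001\t"3500 calBCE\tGroupA\n'


def parse(text, quoting):
    """Parse tab-separated text with the given csv quoting mode."""
    handle = io.StringIO(text, newline="")
    return list(csv.reader(handle, delimiter="\t", quoting=quoting))


# QUOTE_NONE treats the quote as an ordinary character, so columns stay aligned.
literal = parse(raw, csv.QUOTE_NONE)
print(literal)  # [['I0001', '"3500 calBCE', 'GroupA']]

# The default mode opens a quoted field at the stray quote and swallows the tab.
default = parse(raw, csv.QUOTE_MINIMAL)
print(len(literal[0]), len(default[0]))
```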

Composition with the broader ecosystem

aadr-resolve cohort patterson_2022.txt \
    --anno-files v44.3.anno --anno-files v66.0.anno \
    -o cohort_manifest.tsv
pgen-samplebind merge \
    --relabel-from cohort_manifest.tsv \
    --output merged_v66.pgen \
    v44.3.pgen v66.0.pgen

The manifest's column layout is documented in HLD §Output: cohort.

Development

git clone https://github.com/carstenerickson/aadr-resolve
cd aadr-resolve
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

# Default suite (fast; ~10s)
pytest -ra

# Slow tests (synth perf benchmark)
pytest -m slow -ra

# External tests (real AADR files; requires AADR_CACHE env var)
AADR_CACHE=/path/to/cache pytest -m external -ra

# Standalone perf benchmark with per-phase timings
AADR_CACHE=/path/to/cache python -m benchmarks.perf_bench

# Lint + format + types
ruff check src/ tests/
ruff format --check src/ tests/
mypy src/

CI runs the default suite across Python 3.11/3.12/3.13 × Ubuntu+macOS; see .github/workflows/ci.yml.

License

MIT.

Release history

0.2.0

May 12, 2026

This version

0.1.0

May 12, 2026

Download files

Source Distribution

aadr_resolve-0.1.0.tar.gz (57.8 kB)

Uploaded May 12, 2026 Source

Built Distribution

aadr_resolve-0.1.0-py3-none-any.whl (71.1 kB)

Uploaded May 12, 2026 Python 3

File details

Details for the file aadr_resolve-0.1.0.tar.gz.

File metadata

Download URL: aadr_resolve-0.1.0.tar.gz
Upload date: May 12, 2026
Size: 57.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for aadr_resolve-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`3ccc0ae308de8da5b0f0817de51f59ebf666d1eec7164b93a414dcc78350eeaf`
MD5	`2acfc85154123ec5c32ce2f1fef20e98`
BLAKE2b-256	`b5cabcf10335fda3b11801c517b447dfa0480129c69d3eb37389a486fa13fe9f`
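
A downloaded file can be checked against the SHA256 digest above with the standard library. This is a minimal sketch; the function names are hypothetical, and the local file path is whatever you saved the download as.

```python
import hashlib

EXPECTED_SHA256 = "3ccc0ae308de8da5b0f0817de51f59ebf666d1eec7164b93a414dcc78350eeaf"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True when the file's digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()
```

For example, `verify("aadr_resolve-0.1.0.tar.gz")` should return True for an intact copy of the source distribution.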

Provenance

The following attestation bundles were made for aadr_resolve-0.1.0.tar.gz:

Publisher: publish.yml on carstenerickson/aadr-resolve

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: aadr_resolve-0.1.0.tar.gz
- Subject digest: 3ccc0ae308de8da5b0f0817de51f59ebf666d1eec7164b93a414dcc78350eeaf
- Sigstore transparency entry: 1519369504
- Sigstore integration time: May 12, 2026
Source repository:
- Permalink: carstenerickson/aadr-resolve@3f085b0529c001a940934a73421b6a4a10410808
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/carstenerickson
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@3f085b0529c001a940934a73421b6a4a10410808
- Trigger Event: release

File details

Details for the file aadr_resolve-0.1.0-py3-none-any.whl.

File metadata

Download URL: aadr_resolve-0.1.0-py3-none-any.whl
Upload date: May 12, 2026
Size: 71.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for aadr_resolve-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e0983d5dc02a74079844306c0f4a7c66ace955660da4fd4f0a476803e0d5e2cf`
MD5	`f6da2e23c551b568517e60353429ba46`
BLAKE2b-256	`914dbcc7cdbce593a92e114adaae3b9c0b11b00d9bb4e656ec2f9b8458a3068d`

Provenance

The following attestation bundles were made for aadr_resolve-0.1.0-py3-none-any.whl:

Publisher: publish.yml on carstenerickson/aadr-resolve

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: aadr_resolve-0.1.0-py3-none-any.whl
- Subject digest: e0983d5dc02a74079844306c0f4a7c66ace955660da4fd4f0a476803e0d5e2cf
- Sigstore transparency entry: 1519369564
- Sigstore integration time: May 12, 2026
Source repository:
- Permalink: carstenerickson/aadr-resolve@3f085b0529c001a940934a73421b6a4a10410808
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/carstenerickson
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@3f085b0529c001a940934a73421b6a4a10410808
- Trigger Event: release
