Gedcom-X tools RC1

gedcomtools

A comprehensive Python toolkit for parsing, converting, validating, and analyzing genealogical data using the GEDCOM 5.x, GEDCOM 7, and GEDCOM X data models.

gedcomtools provides:

  • GEDCOM 5.x parser and high-level facade
  • GEDCOM 7 parser, 18-phase validator, serializer, and interactive CLI
  • GEDCOM X structured object model
  • GEDCOM 5.x → GEDCOM X conversion
  • CLI tooling (gxcli, g7cli, validate7)
  • Advanced logging via loggingkit
  • Graph export utilities

Designed for historical records processing, genealogy research, and archival data pipelines.


Features

  • ✅ GEDCOM 5.x parser (gedcom5)
  • ✅ GEDCOM 7 parser, validator, and serializer (gedcom7)
  • 🔧 GEDCOM X object model (gedcomx) (in progress)
  • 🔧 Converter (GEDCOM 5.x → GEDCOM X) (in progress)
  • ✅ CLI tools (gxcli, g7cli, validate7)
  • ✅ Structured logging (loggingkit)
  • ✅ Sub-loggers (conversion, parser, io, etc.)
  • ✅ Runtime log inspection
  • ✅ Extensible schema system
  • ✅ Source, person, family, relationship modeling
  • ✅ Place and event normalization
  • ✅ Metadata and attribution handling
  • 🔧 Graph database export (ArangoDB) (in progress)

Project Structure

gedcomtools/
├── gedcom5/                # GEDCOM 5.x parsing layer
│   ├── gedcom5.py          # High-level facade (Gedcom5)
│   ├── parser.py           # Low-level parser engine (Gedcom5x)
│   ├── elements.py         # Typed element/record classes
│   ├── helpers.py          # Element query helpers
│   ├── tags.py             # GEDCOM 5.x tag constants
│   └── source.py           # Source record helpers
├── gedcom7/                # GEDCOM 7 parsing, validation, and serialization
│   ├── gedcom7.py          # Parser + Gedcom7 class
│   ├── structure.py        # In-memory tree node (GedcomStructure)
│   ├── validator.py        # 18-phase structural/semantic validator
│   ├── writer.py           # GEDCOM 7 serializer
│   ├── models.py           # High-level detail dataclasses
│   ├── specification.py    # Tag rules, cardinality, enumerations
│   ├── g7interop.py        # Tag ↔ URI mapping
│   ├── exceptions.py       # Exception hierarchy
│   ├── g7cli.py            # Interactive browser/editor shell
│   └── validate7.py        # validate7 CLI entry point
├── gedcomx/                # GEDCOM X object model and conversion (in progress)
├── graph.py                # Graph export (persons, relationships)
├── loggingkit.py           # Structured logging framework
└── utils/                  # Shared utilities

Installation

pip install gedcomtools

Or from source:

git clone https://github.com/cartwrightdj/gedcomtools.git
cd gedcomtools
pip install -e .

Quick Start

Parse GEDCOM 5.x

from gedcomtools.gedcom5 import Gedcom5

g = Gedcom5("family.ged")

for person in g.individual_details():
    print(person.full_name, person.birth_year, person.death_year)

for family in g.family_details():
    print(family.husband_xref, family.wife_xref, family.marriage_year)
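
For context on what a GEDCOM parser consumes: GEDCOM files are line-based, and each line carries a level number, an optional cross-reference identifier such as @I1@, a tag, and an optional value. The sketch below tokenizes such lines with the standard library only. It is an illustration of the line format, not the gedcomtools parser; the names GedcomLine and parse_line are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# level, optional @XREF@, tag, optional value
LINE_RE = re.compile(r"^(\d+) (?:(@[^@]+@) )?(\w+)(?: (.*))?$")


class GedcomLine(NamedTuple):
    """Hypothetical container for the four parts of one GEDCOM line."""
    level: int
    xref: Optional[str]
    tag: str
    value: Optional[str]


def parse_line(line: str) -> GedcomLine:
    """Split one GEDCOM line into level, xref, tag, and value."""
    match = LINE_RE.match(line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not a GEDCOM line: {line!r}")
    level, xref, tag, value = match.groups()
    return GedcomLine(int(level), xref, tag, value)


sample = [
    "0 @I1@ INDI",
    "1 NAME John /Smith/",
    "1 BIRT",
    "2 DATE 1 JAN 1900",
]
for raw in sample:
    print(parse_line(raw))
```

The level numbers define the nesting: the DATE line at level 2 belongs to the BIRT line at level 1, which belongs to the INDI record at level 0.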

Parse and validate GEDCOM 7

from gedcomtools.gedcom7 import Gedcom7

g = Gedcom7("family.ged")

issues = g.validate()
for issue in issues:
    print(f"[{issue.severity}] {issue.code}: {issue.message}")

# Write back out
g.write("family_out.ged")
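
A validation run can return many issues, so it often helps to group them before reporting. The following is a minimal sketch that runs without gedcomtools installed: Issue is a hypothetical stand-in that mirrors the severity, code, and message attributes used above, and the severity strings and issue codes are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Issue:
    """Hypothetical stand-in for the objects returned by validate()."""
    severity: str
    code: str
    message: str


def group_by_severity(issues):
    """Return a dict mapping each severity to its issues, in input order."""
    grouped = defaultdict(list)
    for issue in issues:
        grouped[issue.severity].append(issue)
    return dict(grouped)


# Example data only; real severities and codes come from the validator.
issues = [
    Issue("error", "E001", "example error"),
    Issue("warning", "W001", "example warning"),
    Issue("error", "E002", "another example error"),
]
grouped = group_by_severity(issues)
for severity, items in grouped.items():
    print(severity, len(items))
```

With a real Gedcom7 object, the same function could presumably be called on the result of g.validate(), provided the severity attribute is hashable.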

Convert GEDCOM 5.x → GEDCOM X (in progress)

The GEDCOM X object model and converter are under active development. The core object model (persons, families, relationships, sources, events, places, names, facts) is implemented. Conversion and CLI tooling are still being worked on.

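from gedcomtools.gedcom5 import Gedcom5
from gedcomtools.gedcomx import GedcomX, GedcomConverter

# Assumption: the converter takes the parsed GEDCOM 5.x object; the API is still in flux.
ged = Gedcom5("family.ged")
converter = GedcomConverter()
gx = converter.Gedcom5x_GedcomX(ged)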

CLI Tools

validate7: GEDCOM 7 validator

validate7 family.ged
validate7 --lenient family.ged   # suppress undeclared extension tag errors

Exit code  Meaning
0          Clean (warnings may still be printed)
1          One or more validation errors
2          Not a GEDCOM 7 file
3          File not found or cannot be read
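
The exit codes make validate7 easy to script. Below is a minimal sketch of a wrapper that invokes the CLI and translates the exit code into a message. The helper names run_validate7 and describe_exit_code are hypothetical, and run_validate7 assumes the validate7 command is on PATH.

```python
import subprocess

# Exit codes as documented for validate7.
EXIT_MEANINGS = {
    0: "clean (warnings may still be printed)",
    1: "one or more validation errors",
    2: "not a GEDCOM 7 file",
    3: "file not found or cannot be read",
}


def describe_exit_code(code: int) -> str:
    """Translate a validate7 exit code into a readable message."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_validate7(path: str, lenient: bool = False) -> int:
    """Run the validate7 CLI on path and return its exit code."""
    cmd = ["validate7"]
    if lenient:
        cmd.append("--lenient")
    cmd.append(path)
    return subprocess.run(cmd).returncode


print(describe_exit_code(2))
```

In a CI job, something like describe_exit_code(run_validate7("family.ged")) would give a one-line summary, and the raw exit code can be passed straight to sys.exit.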

g7cli: interactive GEDCOM 7 browser/editor

g7cli family.ged

Commands: load, reload, write, validate, info, ls, cd, pwd, show, find, set, add, rm, help, quit.

gxcli: GEDCOM X CLI

gxcli convert input.ged output.json

Logging

The project uses loggingkit for structured logging.

from gedcomtools.loggingkit import setup_logging, LoggerSpec

mgr = setup_logging("gedcomtools")
mgr.get_sublogger(LoggerSpec(name="conversion"))
mgr.get_sublogger(LoggerSpec(name="parser"))

Library modules use:

from gedcomtools.loggingkit import get_log

log = get_log("conversion")
log.info("Starting conversion")
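
The split between application setup and library usage shown above follows a pattern that can also be expressed with the standard library alone. The sketch below illustrates that general pattern with Python's logging module; it does not show how loggingkit is implemented, and the "demo" logger namespace and get_library_logger helper are hypothetical.

```python
import logging


def get_library_logger(name: str) -> logging.Logger:
    """Return a child logger under the shared 'demo' root without configuring output."""
    return logging.getLogger(f"demo.{name}")


# Library side: a NullHandler means importing the library never emits
# or configures anything on its own.
root = logging.getLogger("demo")
root.addHandler(logging.NullHandler())

# Application side: opt in to output, then tune one sub-logger independently.
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))
root.addHandler(handler)
root.setLevel(logging.INFO)
logging.getLogger("demo.parser").setLevel(logging.WARNING)

get_library_logger("conversion").info("Starting conversion")  # emitted
get_library_logger("parser").info("Reading line 1")           # suppressed
```

Because child loggers inherit the level of their parent unless one is set explicitly, the conversion logger follows the root at INFO while the parser logger is quieter.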

Design Goals

  • Centralized logging control
  • Library-safe imports (no logging side effects)
  • Extensible schema support
  • Accurate GEDCOM X modeling
  • Robust error reporting
  • CLI + API parity
  • Clear separation of concerns

Roadmap

  • GEDCOM 5.x → GEDCOM 7 converter
  • GEDCOM X → GEDCOM 7 converter
  • JSON-LD export
  • RAG pipeline integration
  • Full test suite

License

MIT License


Author

David J. Cartwright


Build genealogy tooling like infrastructure: structured, observable, extensible.
