FastGedcom

A lightweight tool to easily parse, browse and edit gedcom files.

Install FastGedcom using pip from the PyPI repository:

pip install fastgedcom

To install the Ansel codec as well, use the following command. It enables the Ansel text encoding, which is often used for gedcom files.

pip install fastgedcom[ansel]

Highlights of FastGedcom

  • FastGedcom is easy to write.
  • FastGedcom has type annotations.
  • FastGedcom has documentation.
  • FastGedcom has code examples.
  • FastGedcom has unit tests.
  • FastGedcom has fewer methods than the alternatives, which makes it easy to learn.
  • FastGedcom is concise thanks to (optional) operator overloads.
  • FastGedcom has a linear syntax: if/else and try/except blocks are needed less often.
  • Last but not least, FastGedcom is fast (see the benchmarks).

Comparison:

Gedcom file:

0 HEAD
1 FILE my-file.ged
0 @I1@ INDI
1 NAME John Doe
1 BIRT
2 DATE 1 Jan 1970
1 DEAT
2 DATE 2 Feb 2081
0 TRLR

FastGedcom:

from fastgedcom.parser import strict_parse
document = strict_parse("my-file.ged")
person = document["@I1@"]
# use ">" to get a sub-line
death = person > "DEAT"
# use ">=" to get a sub-line value
date = death >= "DATE"
print(date)
# Prints "" if the field is missing

python-gedcom:

from gedcom.parser import Parser
document = Parser()
document.parse_file("my-file.ged")
records = document.get_element_dictionary()
person = records["@I1@"]
death_data = person.get_death_data()
# data is (date, place, sources)
date = death_data[0]
print(date)

Features

Multi-encoding support

It supports a broad set of encodings for gedcom files, such as UTF-8 (with and without BOM), UTF-16 (also named UNICODE), ANSI, and ANSEL.
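To give an idea of what handling several encodings involves, here is a minimal standard-library sketch that picks a codec from a byte order mark (BOM). It is not FastGedcom's implementation: the helper names `sniff_bom_encoding` and `read_gedcom_text` are hypothetical, and the sketch only covers the UTF-8 and UTF-16 cases, falling back to an encoding you choose when no BOM is present.

```python
import codecs
from typing import Optional


def sniff_bom_encoding(raw: bytes) -> Optional[str]:
    """Return a Python codec name if the data starts with a known BOM, else None.

    Hypothetical helper for illustration; not part of FastGedcom.
    """
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this codec strips the BOM while decoding
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # this codec reads the BOM to pick the byte order
    return None


def read_gedcom_text(raw: bytes, fallback: str = "utf-8") -> str:
    """Decode raw gedcom bytes, using the BOM when there is one."""
    encoding = sniff_bom_encoding(raw) or fallback
    return raw.decode(encoding)


utf8_text = read_gedcom_text(codecs.BOM_UTF8 + b"0 HEAD")
utf16_text = read_gedcom_text("0 HEAD".encode("utf-16"))
```

Files without a BOM (plain UTF-8, ANSI, ANSEL) cannot be told apart this way, which is why the fallback is left to the caller in this sketch.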

Kept close to gedcom, with free choice of formatting

There is a lot of genealogy software out there, and each program has its own tags and formats for writing information. With the FastGedcom approach, you can easily adapt your code to your gedcom files: you choose how to parse and format the values. You can also use non-standard fields, for example the "_AKA" field (standing for Also Known As).

from fastgedcom.parser import strict_parse
from fastgedcom.helpers import extract_name_parts

document = strict_parse("gedcom_file.ged")

person = document["@I1@"]
name = person >= "NAME"
print(name)  # Unformatted string such as "John /Doe/"

given_name, surname = extract_name_parts(name)
print(f"{given_name.capitalize()} {surname.upper()}")  # Would be "John DOE"

# Parentheses are required: without them, Python chains the two comparisons
alias = (person > "NAME") >= "_AKA"
print(f"a.k.a: {alias}")  # Could be "Johnny" or ""
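Since the choice of formatting is left to you, you may want your own name parser. The following is a minimal sketch, assuming the common gedcom convention of writing the surname between slashes. The function `split_gedcom_name` is hypothetical and is not claimed to behave like `extract_name_parts`.

```python
def split_gedcom_name(raw_name: str) -> tuple[str, str]:
    """Split a gedcom name such as 'John /Doe/' into (given name, surname).

    Hypothetical helper for illustration. The surname is '' when the
    name contains no slash.
    """
    before, slash, rest = raw_name.partition("/")
    if not slash:
        return raw_name.strip(), ""
    surname, _, after = rest.partition("/")
    # Join what comes before and after the surname, collapsing extra spaces
    given = " ".join((before + " " + after).split())
    return given, surname.strip()


given, surname = split_gedcom_name("John /Doe/")
formatted = f"{given.capitalize()} {surname.upper()}"
```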

The Option paradigm replaces the if blocks:

If a field is missing, you will get a FakeLine containing an empty string. This massively reduces boilerplate code, and you can still tell a TrueLine from a FakeLine with a simple boolean check.

indi = document["@I13@"]

# You can access the date of death, whether the person is deceased or not.
date = (indi > "DEAT") >= "DATE"

# The date of death or an empty string
print("Death date:", date)
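To illustrate the idea behind this paradigm, here is a toy sketch of how overloading ">" and ">=" lets lookups chain without raising. The classes `ToyLine` and `ToyMissing` are hypothetical stand-ins written for this example; they are not FastGedcom's TrueLine and FakeLine, whose implementation may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyLine:
    """Toy stand-in for an existing gedcom line (hypothetical class)."""
    tag: str
    payload: str = ""
    sub_lines: list[ToyLine] = field(default_factory=list)

    def __bool__(self) -> bool:
        return True  # an existing line is truthy

    def __gt__(self, tag: str) -> ToyLine | ToyMissing:
        # Return the first sub-line with this tag, or the missing placeholder
        for line in self.sub_lines:
            if line.tag == tag:
                return line
        return MISSING

    def __ge__(self, tag: str) -> str:
        # Return the value of the sub-line, "" if it does not exist
        return (self > tag).payload


class ToyMissing:
    """Toy stand-in for a missing line (hypothetical class)."""
    payload = ""

    def __bool__(self) -> bool:
        return False  # a missing line is falsy

    def __gt__(self, tag: str) -> ToyMissing:
        return self  # looking inside a missing line stays missing

    def __ge__(self, tag: str) -> str:
        return ""


MISSING = ToyMissing()

person = ToyLine("@I1@", "INDI", [
    ToyLine("NAME", "John /Doe/"),
    ToyLine("BIRT", "", [ToyLine("DATE", "1 Jan 1970")]),
])

birth_date = (person > "BIRT") >= "DATE"  # the BIRT line exists
death_date = (person > "DEAT") >= "DATE"  # no DEAT line: no exception raised
```

The parentheses matter: Python reads `a > b >= c` as `(a > b) and (b >= c)`, so an unparenthesized chain compares the two tag strings instead of performing a second lookup.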

Another example:

for record in document:
    line = record > "_UID"
    if line:  # Check if field _UID exists to avoid ValueError in list.remove()
        record.sub_lines.remove(line)

# Get the Document as a gedcom string to write it into a file
gedcom_without_uids = document.get_source()

with open("./gedcom_without_uids.ged", "w", encoding="utf-8-sig") as f:
    f.write(gedcom_without_uids)

Type hints for salvation!

Autocompletion and type checking make development so much easier.

from fastgedcom.base import Record, FakeLine
from fastgedcom.family_link import FamilyLink

# For fast and easy family lookups
families = FamilyLink(document)


def ancestral_generation_count(indi: Record | FakeLine) -> int:
    """Return the number of generations registered as ancestors of the given person."""
    if not indi:
        return 1
    father, mother = families.get_parents(indi.tag)
    return 1 + max(
        ancestral_generation_count(father),
        ancestral_generation_count(mother),
    )


root = document["@I1@"]
number_generations_above_root = ancestral_generation_count(root)
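The same recursion can be tried without any gedcom file. The sketch below runs on a plain dictionary; the `PARENTS` data and the `generations_above` function are hypothetical and use no FastGedcom API. Note that this sketch counts zero for a person with no recorded parents, a convention that may differ from the function above.

```python
from typing import Optional

# Hypothetical toy data: child id -> (father id, mother id), None when unknown
PARENTS: dict[str, tuple[Optional[str], Optional[str]]] = {
    "@I1@": ("@I2@", "@I3@"),
    "@I2@": ("@I4@", None),
}


def generations_above(person_id: str) -> int:
    """Count the generations of recorded ancestors above a person."""
    father, mother = PARENTS.get(person_id, (None, None))
    known = [parent for parent in (father, mother) if parent is not None]
    if not known:
        return 0  # no recorded parent: nothing above this person
    return 1 + max(generations_above(parent) for parent in known)


depth_of_root = generations_above("@I1@")
```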

Why is it called FastGedcom?

FastGedcom aims to keep your code close to your gedcom files, so there is little to learn about what the library does. The data you have is the data you get: the content of the gedcom file is unchanged and there is no abstraction layer. Hence, the library is quicker to learn than the alternatives. Data processing is optional, so you can do only what suits your needs. FastGedcom is more of a starting point for your data processing than a feature-rich library.

The name FastGedcom doesn't just come from its ease of use. According to the project's benchmarks, its parsing is the fastest among Python gedcom libraries. Getting the relatives of a person is fast too, since the FamilyLink class is built for this purpose.

Documentation and examples

Want to see more of FastGedcom? The project provides more examples.

The documentation of FastGedcom is available on ReadTheDocs.

Feedback

Comments and contributions are welcome and greatly appreciated!

If you like this project, consider putting a star on GitHub. Thank you!

For any feedback or questions, please feel free to contact me by email at gatien.bouyer.dev@gmail.com or via GitHub issues.
