Skip to main content

A Python module for parsing, analyzing, and manipulating GEDCOM files.

Project description

Python GEDCOM Parser

PyPI GitHub release Build Status

A Python module for parsing, analyzing, and manipulating GEDCOM files.

GEDCOM files contain ancestry data. The parser is currently supporting the GEDCOM 5.5 format which is detailed here.

For the latest changes please have a look at the CHANGELOG.md file.

The current development process can be tracked in the develop branch.

Installation

The module can be installed via pip.

Run pip<version> install python-gedcom to install or pip<version> install python-gedcom --upgrade to upgrade to the newest version uploaded to the PyPI repository.

If you want to use the latest pre-release of the python-gedcom package, simply append the --pre option to pip: pip<version> install python-gedcom --pre

Example usage

For more examples: Please have a look at the test files found in the tests/ directory.

When successfully installed you may import the gedcom package and use it like so:

from gedcom.element.individual import IndividualElement
from gedcom.parser import Parser

# Path to your `.ged` file
file_path = ''

# Initialize the parser
gedcom_parser = Parser()

# Parse your file
gedcom_parser.parse_file(file_path)

root_child_elements = gedcom_parser.get_root_child_elements()

# Iterate through all root child elements
for element in root_child_elements:

    # Is the `element` an actual `IndividualElement`? (Allows usage of extra functions such as `surname_match` and `get_name`.)
    if isinstance(element, IndividualElement):

        # Get all individuals whose surname matches "Doe"
        if element.surname_match('Doe'):

            # Unpack the name tuple
            (first, last) = element.get_name()

            # Print the first and last name of the found individual
            print(first + " " + last)

Strict parsing

Large sites like Ancestry and MyHeritage (among others) don't always produce perfectly formatted GEDCOM files. If you encounter errors in parsing, you might consider disabling strict parsing which is enabled by default:

from gedcom.parser import Parser

file_path = '' # Path to your `.ged` file

gedcom_parser = Parser()
gedcom_parser.parse_file(file_path, False) # Disable strict parsing

Disabling strict parsing will allow the parser to gracefully handle the following quirks:

  • Multi-line fields that don't use CONC or CONT
  • Handle the last line not ending in a CRLF (\r\n)

Reference

Note: At a later state the documentation may be outsourced into individual, automatically generated wiki pages. (Makes things a little bit easier.)

Parser class

The Parser class represents the actual parser. Use this class to parse a GEDCOM file.

Note: May be imported via from gedcom.parser import Parser.

Method Parameters Returns Description
invalidate_cache Empties the element list and dictionary to cause get_element_list() and get_element_dictionary() to return updated data
get_element_list list of Element Returns a list containing all elements from within the GEDCOM file
get_element_dictionary dict of Element Returns a dictionary containing all elements, identified by a pointer, from within the GEDCOM file
get_root_element RootElement Returns a virtual root element containing all logical records as children
get_root_child_elements list of Element Returns a list of logical records in the GEDCOM file
parse_file str file_path, bool strict Opens and parses a file, from the given file path, as GEDCOM 5.5 formatted data
get_marriages IndividualElement individual tuple: (str date, str place) Returns a list of marriages of an individual formatted as a tuple (str date, str place)
get_marriage_years IndividualElement individual list of int Returns a list of marriage years (as integers) for an individual
marriage_year_match IndividualElement individual, int year bool Checks if one of the marriage years of an individual matches the supplied year. Year is an integer.
marriage_range_match IndividualElement individual, int from_year, int to_year bool Check if one of the marriage years of an individual is in a given range. Years are integers.
get_families IndividualElement individual, str family_type = gedcom.tags.GEDCOM_TAG_FAMILY_SPOUSE list of FamilyElement Return family elements listed for an individual
get_ancestors IndividualElement individual, str ancestor_type = "ALL" list of Element Return elements corresponding to ancestors of an individual
get_parents IndividualElement individual, str parent_type = "ALL" list of IndividualElement Return elements corresponding to parents of an individual
find_path_to_ancestor IndividualElement descendant, IndividualElement ancestor, path = None object Return path from descendant to ancestor
get_family_members FamilyElement family, str members_type = FAMILY_MEMBERS_TYPE_ALL list of IndividualElement Return array of family members: individual, spouse, and children
print_gedcom Write GEDCOM data to stdout
save_gedcom IO open_file Save GEDCOM data to a file

Element class

An element represents a line from within the parsed GEDCOM file.

May be imported via from gedcom.element.element import Element.

Method Parameters Returns Description
get_level int Returns the level of this element from within the GEDCOM file
get_pointer str Returns the pointer of this element from within the GEDCOM file
get_tag str Returns the tag of this element from within the GEDCOM file
get_value str Returns the tag of this element from within the GEDCOM file
set_value str value str Sets the value of this element
get_multi_line_value str Returns the value of this element including concatenations or continuations
set_multi_line_value str value str Sets the value of this element, adding concatenation and continuation lines when necessary
get_child_elements list of Element Returns the direct child elements of this element
new_child_element str tag, str pointer = "", str value = "" Element Creates and returns a new child element of this element
add_child_element Element child Element Adds a child element to this element
get_parent_element Element Returns the parent element of this element
set_parent_element Element parent Adds a parent element to this element. There's usually no need to call this method manually, add_child_element() calls it automatically.
get_individual str DEPRECATED: As of version v1.0.0 use to_gedcom_string() method instead.
to_gedcom_string bool recursive = False str Formats this element and optionally all of its sub-elements into a GEDCOM conform string

Casting an Element to a string will internally call the to_gedcom_string() method.

FamilyElement class (derived from Element)

May be imported via from gedcom.element.family import FamilyElement.

Method Parameters Returns Description
is_family bool Checks if this element is an actual family

FileElement class (derived from Element)

May be imported via from gedcom.element.file import FileElement.

Method Parameters Returns Description
is_file bool Checks if this element is an actual file

IndividualElement class (derived from Element)

Represents a person from within the parsed GEDCOM file.

May be imported via from gedcom.element.individual import IndividualElement.

Method Parameters Returns Description
is_individual bool Checks if this element is an actual individual
is_deceased bool Checks if this individual is deceased
is_child bool Checks if this element is a child of a family
is_private bool Checks if this individual is marked private
get_name tuple: (str given_name, str surname) Returns an individual's names as a tuple: (str given_name, str surname)
surname_match str surname_to_match bool Matches a string with the surname of an individual
given_name_match str given_name_to_match bool Matches a string with the given names of an individual
get_gender str Returns the gender of a person in string format
get_birth_data tuple: (str date, str place, list sources) Returns the birth data of a person formatted as a tuple: (str date, str place, list sources)
get_birth_year int Returns the birth year of a person in integer format
get_death_data tuple: (str date, str place, list sources) Returns the death data of a person formatted as a tuple: (str date, str place, list sources)
get_death_year int Returns the death year of a person in integer format
get_burial_data tuple: (str date, str place, list sources) Returns the burial data of a person formatted as a tuple: (str date, str´ place, list` sources)
get_census_data list of tuple: (str date, str place, list sources) Returns a list of censuses of an individual formatted as tuples: (str date, str´ place, list` sources)
get_last_change_date str Returns the date of when the person data was last changed formatted as a string
get_occupation str Returns the occupation of a person
birth_year_match int year bool Returns True if the given year matches the birth year of this person
birth_range_match int from_year, int to_year bool Checks if the birth year of an individual lies within the given range
death_year_match int year bool Returns True if the given year matches the death year of this person
death_range_match int from_year, int to_year bool Returns True if the given year matches the death year of this person
criteria_match str criteria bool Checks if this individual matches all of the given criteria. Full format for criteria: surname=[name]:given_name=[given_name]:birth[year]:birth_range=[from_year-to_year]

ObjectElement class (derived from Element)

May be imported via from gedcom.element.object import ObjectElement.

Method Parameters Returns Description
is_object bool Checks if this element is an actual object

RootElement class (derived from Element)

Virtual GEDCOM root element containing all logical records as children.

Local development

I suggest using pyenv for local development.

Running tests

  1. Run pip<version> install --no-cache-dir -r requirements.txt to install dependencies
  2. Run tests with tox (tox in your console)
    • For Python 2.7 run tox -e py27 (you need to have Python 2.7 installed)
    • For Python 3.4 run tox -e py34 (you need to have Python 3.4 installed)
    • For Python 3.5 run tox -e py35 (you need to have Python 3.5 installed)
    • For Python 3.6 run tox -e py36 (you need to have Python 3.6 installed)

Uploading a new package to PyPI

  1. Run pip<version> install --no-cache-dir -r requirements.txt to install dependencies
  2. Run python<version> setup.py sdist bdist_wheel to generate distribution archives
  3. Run twine upload --repository-url https://test.pypi.org/legacy/ dist/* to upload the archives to the Test Python Package Index repository

When the package is ready to be published to the real Python Package Index the repository-url is https://upload.pypi.org/legacy/.

Changelog

Please have a look at the CHANGELOG.md file.

History

This module was originally based on a GEDCOM parser written by Daniel Zappala at Brigham Young University (Copyright (C) 2005) which was licensed under the GPL v2 and then continued by Mad Price Ball in 2012.

The project was taken over by Nicklas Reincke in 2018. Together with Damon Brodie a lot of changes were made and the parser was optimized.

License

Licensed under the GNU General Public License v2

Python GEDCOM Parser
Copyright (C) 2018 Damon Brodie (damon.brodie at gmail.com)
Copyright (C) 2018-2019 Nicklas Reincke (contact at reynke.com)
Copyright (C) 2016 Andreas Oberritter
Copyright (C) 2012 Madeleine Price Ball
Copyright (C) 2005 Daniel Zappala (zappala at cs.byu.edu)
Copyright (C) 2005 Brigham Young University

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

python-gedcom-1.0.0.tar.gz (32.5 kB view hashes)

Uploaded Source

Built Distribution

python_gedcom-1.0.0-py2.py3-none-any.whl (35.5 kB view hashes)

Uploaded Python 2 Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page