tools for genetic genealogy and the analysis of consumer DNA test results
lineage
lineage provides a framework for analyzing genotype (raw data) files from direct-to-consumer (DTC) DNA testing companies, primarily for the purposes of genetic genealogy.
Capabilities
Find shared DNA and genes between individuals
Compute centiMorgans (cMs) of shared DNA using a variety of genetic maps (e.g., HapMap Phase II, 1000 Genomes Project)
Plot shared DNA between individuals
Find discordant SNPs between child and parent(s)
Read, write, merge, and remap SNPs for an individual via the snps package
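As a rough illustration of how a genetic map turns a shared segment into centiMorgans, the sketch below interpolates cumulative map positions at the segment's endpoints and takes the difference. The map values and the `segment_cm` helper are made up for this example; this is not lineage's implementation, only the general idea under those assumptions.

```python
import numpy as np

# Hypothetical genetic map for one chromosome (toy values, not real map data):
# base-pair positions and the cumulative genetic distance (cM) at each position.
map_pos = np.array([1_000_000, 2_000_000, 3_000_000, 5_000_000])
map_cm = np.array([0.0, 1.5, 2.0, 6.0])


def segment_cm(start, end):
    """Approximate the length in cM of the segment [start, end] (in base pairs)."""
    # Linearly interpolate the cumulative cM at both endpoints.
    start_cm, end_cm = np.interp([start, end], map_pos, map_cm)
    return float(end_cm - start_cm)


print(segment_cm(1_500_000, 4_000_000))
```

Because recombination rates vary along a chromosome, two segments of equal length in base pairs can differ in centiMorgans, which is why the choice of genetic map matters.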
Supported Genotype Files
lineage supports all genotype files supported by snps.
Installation
lineage is available on the Python Package Index. Install lineage (and its required Python dependencies) via pip:
$ pip install lineage
Also see the installation documentation.
Dependencies
lineage requires Python 3.8+ and the following Python packages:
Examples
Initialize the lineage Framework
Import Lineage and instantiate a Lineage object:
>>> from lineage import Lineage
>>> l = Lineage()
Download Example Data
First, let’s set up logging to get some helpful output:
>>> import logging, sys
>>> logger = logging.getLogger()
>>> logger.setLevel(logging.INFO)
>>> logger.addHandler(logging.StreamHandler(sys.stdout))
Now we’re ready to download some example data from openSNP:
>>> paths = l.download_example_datasets()
Downloading resources/662.23andme.340.txt.gz
Downloading resources/662.ftdna-illumina.341.csv.gz
Downloading resources/663.23andme.305.txt.gz
Downloading resources/4583.ftdna-illumina.3482.csv.gz
Downloading resources/4584.ftdna-illumina.3483.csv.gz
We’ll call these datasets User662, User663, User4583, and User4584.
Load Raw Data
Create an Individual in the context of the lineage framework to interact with the User662 dataset:
>>> user662 = l.create_individual('User662', ['resources/662.23andme.340.txt.gz', 'resources/662.ftdna-illumina.341.csv.gz'])
Loading SNPs('662.23andme.340.txt.gz')
Merging SNPs('662.ftdna-illumina.341.csv.gz')
SNPs('662.ftdna-illumina.341.csv.gz') has Build 36; remapping to Build 37
Downloading resources/NCBI36_GRCh37.tar.gz
27 SNP positions were discrepant; keeping original positions
151 SNP genotypes were discrepant; marking those as null
Here we created user662 with the name User662. In the process, we merged two raw data files for this individual. Specifically:
662.23andme.340.txt.gz was loaded.
Then, 662.ftdna-illumina.341.csv.gz was merged. It was detected as Build 36, so it was automatically remapped to Build 37 (downloading the remapping data in the process) to match the build of the SNPs already loaded. After this merge, 27 SNP positions and 151 SNP genotypes were found to be discrepant; the original positions were kept and the discrepant genotypes were set to null.
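To make the merge behavior concrete, here is a minimal sketch with two toy SNP tables already on the same build. The column names, rsids, and the `normalize` helper are assumptions for illustration only; the actual merge logic lives in the snps package and handles more cases than this.

```python
import pandas as pd

# Toy data only: two hypothetical SNP tables for one individual, indexed by rsid.
first = pd.DataFrame(
    {"chrom": ["1", "1", "2"], "pos": [100, 200, 300], "genotype": ["AA", "AG", "CC"]},
    index=pd.Index(["rs1", "rs2", "rs3"], name="rsid"),
)
second = pd.DataFrame(
    {"chrom": ["1", "2", "3"], "pos": [200, 305, 400], "genotype": ["GA", "CT", "GG"]},
    index=pd.Index(["rs2", "rs3", "rs4"], name="rsid"),
)


def normalize(genotype):
    """Sort alleles so that 'GA' and 'AG' compare as equal."""
    return "".join(sorted(genotype))


# Only SNPs present in both files can be discrepant.
common = first.index.intersection(second.index)
pos_mask = (first.loc[common, "pos"] != second.loc[common, "pos"]).to_numpy()
geno_mask = (
    first.loc[common, "genotype"].map(normalize)
    != second.loc[common, "genotype"].map(normalize)
).to_numpy()
discrepant_positions = common[pos_mask]
discrepant_genotypes = common[geno_mask]

# Keep the original rows (and positions), append SNPs found only in the second
# file, and mark conflicting genotypes as null.
merged = pd.concat([first, second.loc[second.index.difference(first.index)]])
merged.loc[discrepant_genotypes, "genotype"] = None

print(list(discrepant_positions), list(discrepant_genotypes))
print(merged)
```

In this toy case rs3 is discrepant in both position and genotype, so it keeps the first file's position and ends up with a null genotype, while rs2 is treated as concordant because allele order is ignored.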
user662 is represented by an Individual object, which inherits from snps.SNPs. Therefore, all of the properties and methods available to a SNPs object are available here; for example:
>>> len(user662.discrepant_merge_genotypes)
151
>>> user662.build
37
>>> user662.build_detected
True
>>> user662.assembly
'GRCh37'
>>> user662.count
1006960
As such, SNPs can be saved, remapped, merged, etc. See the snps package for further examples.
Compare Individuals
Let’s create another Individual for the User663 dataset:
>>> user663 = l.create_individual('User663', 'resources/663.23andme.305.txt.gz')
Loading SNPs('663.23andme.305.txt.gz')
Now we can perform some analysis between the User662 and User663 datasets.
Find Discordant SNPs
First, let’s find discordant SNPs (i.e., SNP data that is not consistent with Mendelian inheritance):
>>> discordant_snps = l.find_discordant_snps(user662, user663, save_output=True)
Saving output/discordant_snps_User662_User663_GRCh37.csv
All output files are saved to the output directory (a parameter to Lineage).
This method also returns a pandas.DataFrame, which can be inspected interactively at the prompt; the same data is available in the CSV file.
>>> len(discordant_snps.loc[discordant_snps['chrom'] != 'MT'])
37
Not counting mtDNA SNPs, there are 37 discordant SNPs between these two datasets.
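For intuition, the following sketch shows one simple way to flag a discordant SNP between a child and a single parent: the two genotypes share no allele. The toy table, its column names, and the `shares_allele` helper are hypothetical; lineage's own method covers more cases (for example, two parents), so treat this as an illustration of the concept rather than the library's logic.

```python
import pandas as pd


def shares_allele(child, parent):
    """Return True if the child and parent genotypes have at least one allele in common."""
    return bool(set(child) & set(parent))


# Toy data only: hypothetical genotypes for a child and one parent.
snps = pd.DataFrame(
    {
        "chrom": ["1", "2", "MT", "X"],
        "child": ["AA", "CT", "G", "TT"],
        "parent": ["GG", "CC", "A", "CT"],
    },
    index=pd.Index(["rs1", "rs2", "rs3", "rs4"], name="rsid"),
)

# A SNP is flagged as discordant when no allele is shared.
is_discordant = [
    not shares_allele(child, parent)
    for child, parent in zip(snps["child"], snps["parent"])
]
discordant = snps[is_discordant]

# Same filter as in the example above: drop mitochondrial SNPs before counting.
non_mt = discordant.loc[discordant["chrom"] != "MT"]
print(len(discordant), len(non_mt))
```

Here rs1 and rs3 are flagged, and the `chrom != 'MT'` filter leaves only rs1 in the count.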
Documentation
Documentation is available here.
Acknowledgements
Thanks to Whit Athey, Ryan Dale, Binh Bui, Jeff Gill, Gopal Vashishtha, CS50, and openSNP.
lineage incorporates code and concepts generated with the assistance of OpenAI’s ChatGPT. ✨