Skip to main content

Optimization Package to phase haplotypes and infer local ancestry

Project description

Loter Python package

Loter can be used for haplotype phasing and local ancestry inference.

Loter is available under the MIT license. Copyright 2017 - Inria, UGA, CNRS.

If you encounter any problem or if you have questions regarding Loter, please open an issue on Github, or you can contact us at loter.dev@inria.fr.


Requirements

The package requires BLAS/LAPACK libraries, OpenMP (optional but recommended for parallel computing), a C++ compiler (tested with g++) and Python 3 (for the Python package). In addition, a version for R is under development.

:information_source: When using Python precompiled package for Linux 64bits, only Python 3 is required (other required libraries are included in the precompiled package).


Installation

Python package

From PyPI

Run:

pip install loter

:warning: Precompiled package is only available for Linux 64bits system. On other systems (Linux 32bits, MacOS, Windows), the package will need to be compiled at installation. You must prepare your system by installing all requirements (c.f. previous section) including OpenMP (mandatory for installation with pip, see below if you do not have OpenMP).

From precompiled package (only for Linux 64bits)

Go to the release page on Github and download the loter-*.whl file corresponding to your Python version, then install Loter by running:

pip install loter-*.whl

Note: replace loter-*.whl by the full name of the file you downloaded.

From sources

To get Loter sources:

git clone https://github.com/bcm-uga/Loter.git

To install the loter Python package:

# go to package source dir
cd Loter/python-package/
# install
pip install -e .
# or
python setup.py install

The following Python packages will be installed during the process as dependencies: numpy, pandas, scikit-learn, scipy. If not, you may have to install them before installing loter, for instance with the command pip install numpy pandas scikit-learn scipy.

To install loter locally and avoid messing with your system, you can do:

python setup.py install --user

or you can use a Python virtual environment (recommended) or a specific Python distribution like Anaconda.

If you do not have OpenMP on your system (especially for MacOS users), you can do:

python setup.py install --no_openmp
# or
python setup.py install --user --no_openmp

R package

A version of Loter will be soon available for R.

:warning: The development of the version for R is currently paused.


Using Loter Python package

Here are some details about how to run Loter for local ancestry inference (LAI) [1] and haplotype phasing [2].

Local Ancestry Inference (LAI)

Local ancestry inference with Loter (see [1] for details) in Python is explained in the following Jupyter notebook tutorial: Local Ancestry Example and corresponding markdown transcription.

To access it, you can do:

# install jupyter (if not available)
pip install jupyter
# go to package source dir
cd Loter/python-package/
# run jupyter
jupyter notebook

Note: The tutorial requires the following additional Python packages: matplotlib and scikit-allel (you can run pip install matplotlib scikit-allel to get them).

In addition, here is a small example of local ancestry inference with Loter:

import os
import numpy as np

# admixed haplotypes
H_adm = np.load(os.path.expanduser("FILE1")) # replace FILE1 by your data file name
# ref 1 haplotypes
H_ref1 = np.load(os.path.expanduser("FILE2")) # replace FILE2 by your data file name
# ref 2 haplotypes
H_ref2 = np.load(os.path.expanduser("FILE3")) # replace FILE3 by your data file name

# Loter local ancestry inference module
import loter.locanc.local_ancestry as lc

## Loter with bagging and phase correction module
res_loter = lc.loter_smooth(l_H=[H_ref1, H_ref2], h_adm=H_adm, num_threads=8) ## set the number of threads
## Loter with bagging only
res_loter = lc.loter_local_ancestry(l_H=[H_ref1, H_ref2], h_adm=H_adm, num_threads=8) ## set the number of threads

Note: More details are given in the notebook, especially how to load data from VCF files if your data are not available as Numpy arrays.

Simulations of admixed individuals: informations about data simulation are available here.

Comand line tool

With the Python package installation comes a command line interface loter_cli for local ancestry inference that allows you to directly call Loter from the command line without writing your own Python script.

It requires that your haplotype input data are stored as saved Numpy arrays, in csv text files (experimental) or in VCF files. In any case, your input haplotype matrices should be organised as follows: with haplotypes (samples) in rows and SNPs in columns. Ancestries of admixed haplotypes inferred by Loter will be stored in the same way.

# help
loter_cli -h
# examples run in Loter project root directory
cd Loter
# Loter with bagging
loter_cli -r data/H_ceu.npy data/H_yri.npy -a data/H_mex.npy -f npy -o tmp.npy -n 8 -v
# Loter with bagging and phase correction
loter_cli -r data/H_ceu.npy data/H_yri.npy -a data/H_mex.npy -f npy -o tmp.npy -n 8 -pc -v

Important: When using text format (csv) for input data, missing values should be encoded as 255 or NA.


Phasing

Two methods to run the package to phase genotypes into haplotypes (see [2] for details):

import os
import numpy as np

# Directly run the C++ function

import haplophase.wrapper_cpp as hap

G = np.load(os.path.expanduser("FILE")) # replace FILE by your data file name
G_res = np.copy(G)
H = hap.wrapper_all(G=G_res, k=k, nb_iter=20, nb_run=10, w=100, penalty=2.0)

# You get the imputed genotype matrix in G_res and H the haplotypes.
# You can create your own pipeline or select one already existing

import haplophase.pipeline as pipeline
G = np.load(os.path.expanduser("FILE"))

l_res = pipeline.pipelines["classic_pipeline"].run(np.copy(G), nbrun=10, nb_iter=20, nb_run=10, w=100, penalty=2.0)
# You get a list of results that you can combine.

G_res = combine.combiner_G["G vote"](l_res)
H_res = combine.combiner_H["H_mean"](l_res)

References

[1] Dias-Alves, T., Mairal, J., Blum, M.G.B., 2018. Loter: A Software Package to Infer Local Ancestry for a Wide Range of Species. Mol Biol Evol 35, 2318–2326. https://doi.org/10.1093/molbev/msy126

[2] Dias Alves, T., 2017. Modélisation du déséquilibre de liaison en génomique des populations par méthodes l’optimisation. PhD manuscript. Grenoble Alpes University. http://www.theses.fr/2017GREAS052

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

loter-1.0.1.tar.gz (24.6 kB view details)

Uploaded Source

File details

Details for the file loter-1.0.1.tar.gz.

File metadata

  • Download URL: loter-1.0.1.tar.gz
  • Upload date:
  • Size: 24.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.10

File hashes

Hashes for loter-1.0.1.tar.gz
Algorithm Hash digest
SHA256 5c6dc61b3195342a9ef46454b2d1319c0d2339a215768b15746de7aa1585aea7
MD5 73fb0631d3a67a963fb3924c12d49578
BLAKE2b-256 32fce17a168e6580142fa37df0b80e603820b32477d2b1170dc1dc335e3adab6

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page