RFMix-reader
RFMix-reader is a Python package designed to efficiently read and process output files generated by RFMix, a popular tool for estimating local ancestry in admixed populations. The package employs a lazy loading approach, which minimizes memory consumption by reading only the loci that are accessed by the user, rather than loading the entire dataset into memory at once. Additionally, we leverage GPU acceleration to improve computational speed.
Install
rfmix-reader can be installed using pip:
pip install rfmix-reader
GPU Acceleration:
rfmix-reader leverages GPU acceleration for improved performance. To use this functionality, you will need to install the following libraries for your specific CUDA version:
- RAPIDS: Refer to the official installation guide.
- PyTorch: Refer to the official installation instructions.
Additional Notes:
- We have not tested installation with Docker or Conda environments. Compatibility may vary.
- If you do not have a GPU, you can still use the basic functionality of rfmix-reader. This is still much faster than processing the files with standard scripting.
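Before relying on the GPU path, it can help to confirm which optional libraries are importable in the current environment. The following is a minimal sketch, not part of rfmix-reader: the helper name `gpu_stack_status` is hypothetical, and using `cudf` as the module that indicates a RAPIDS installation is an assumption.

```python
import importlib.util


def gpu_stack_status(modules=("torch", "cudf")):
    """Report which optional GPU libraries are importable.

    The module names are assumptions: `torch` is PyTorch and `cudf`
    stands in for the RAPIDS suite.
    """
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    status = gpu_stack_status()
    for name, found in status.items():
        print(f"{name}: {'installed' if found else 'missing'}")

    # Only query CUDA when PyTorch is actually present.
    if status["torch"]:
        import torch

        print("CUDA device visible:", torch.cuda.is_available())
```

If either library is reported as missing, the basic CPU functionality described above should still be usable.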
Key Features
Lazy Loading
- Reads data on-the-fly as requested, reducing memory footprint.
- Ideal for working with large RFMix output files that may not fit entirely in memory.
Efficient Data Access
- Provides convenient access to specific loci or regions of interest.
- Allows for selective loading of data, enabling faster processing times.
Seamless Integration
- Designed to work seamlessly with existing Python data analysis workflows.
- Facilitates downstream analysis and manipulation of RFMix output data.
Whether you're working with large-scale genomic datasets or have limited computational resources, RFMix-reader offers an efficient and memory-conscious solution for reading and processing RFMix output files. Its lazy loading approach ensures optimal resource utilization, making it a valuable tool for researchers and bioinformaticians working with admixed population data.
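To make the idea of lazy loading concrete, the sketch below stores a small matrix as a flat binary file and then reads back a single row by seeking to its byte offset. It illustrates the general technique only and is not how rfmix-reader is implemented; the function names, the float32 layout, and the toy values are all assumptions chosen for the example.

```python
import os
import struct
import tempfile

FLOAT_SIZE = 4  # bytes per float32 value


def write_matrix(path, rows):
    """Write equal-length rows as a flat little-endian float32 file."""
    with open(path, "wb") as handle:
        for row in rows:
            handle.write(struct.pack(f"<{len(row)}f", *row))


def read_locus(path, index, n_cols):
    """Read one row (locus) by seeking to its byte offset.

    Only n_cols * 4 bytes are read, however large the file is.
    """
    with open(path, "rb") as handle:
        handle.seek(index * n_cols * FLOAT_SIZE)
        raw = handle.read(n_cols * FLOAT_SIZE)
    return list(struct.unpack(f"<{n_cols}f", raw))


# Toy data: 4 loci x 3 samples (made-up values)
toy = [[0.0, 0.5, 1.0], [1.0, 1.0, 0.0], [0.25, 0.75, 0.5], [0.0, 0.0, 1.0]]
path = os.path.join(tempfile.mkdtemp(), "toy.bin")
write_matrix(path, toy)

# Only the third locus is read from disk.
print(read_locus(path, 2, n_cols=3))  # [0.25, 0.75, 0.5]
```

The fixed-width binary layout is what makes the offset arithmetic possible; a tab-separated text file would have to be scanned line by line to reach the same row.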
Simulation data
Simulation data for testing two- and three-population admixture is available on Synapse: syn61691659.
Usage
This works similarly to pandas-plink:
Two population admixture example
This is a two-part process.
Generate binary files
To reduce computational time and memory, we leverage binary files. As these are not generated by RFMix, we provide a function to create them before running the main function.
from rfmix_reader import create_binaries
# Generate binary files
file_path = "examples/two_popuations/out/"
binary_dir = "./binary_files"
create_binaries(file_path, binary_dir=binary_dir)
You can also do this on the fly.
from rfmix_reader import read_rfmix
file_path = "examples/two_popuations/out/"
binary_dir = "./binary_files"
loci, rf_q, admix = read_rfmix(file_path, binary_dir=binary_dir,
                               generate_binary=True)
We do not have this turned on by default, as it is the rate-limiting step. It can take upwards of 20 to 25 minutes to run, depending on the size of the *fb.tsv files.
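Because the conversion time depends on the size of the *fb.tsv files, a quick size check can give a rough idea of what to expect before starting. This is a small sketch using only the standard library; the helper name `fb_file_sizes` is hypothetical and no timing estimate is derived from the sizes.

```python
from pathlib import Path


def fb_file_sizes(out_dir):
    """Return {file name: size in MB} for every *fb.tsv file in out_dir."""
    return {
        path.name: path.stat().st_size / 1e6
        for path in sorted(Path(out_dir).glob("*fb.tsv"))
    }


if __name__ == "__main__":
    sizes = fb_file_sizes("examples/two_popuations/out/")
    for name, size_mb in sizes.items():
        print(f"{name}: {size_mb:.1f} MB")
    print(f"total: {sum(sizes.values()):.1f} MB")
```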
Main function
Once binary files are generated, you can use the main function to process the RFMix results. With a GPU, this takes less than 5 minutes.
from rfmix_reader import read_rfmix
file_path = "examples/two_popuations/out/"
loci, rf_q, admix = read_rfmix(file_path)
Note: ./binary_files is the default for binary_dir, so this is an optional parameter.
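Since binary generation is the slow step, one way to combine the two stages is to request generation only when the binary directory is missing or empty. The wrapper below is a sketch under that assumption: `binaries_missing` and `load_rfmix` are hypothetical helpers, and treating any non-empty directory as "already generated" is a simplification that does not inspect the package's actual file naming. It uses only the `read_rfmix` arguments shown above.

```python
from pathlib import Path


def binaries_missing(binary_dir="./binary_files"):
    """Return True when binary_dir is absent or contains no files.

    Assumption: a non-empty directory means the binaries were already
    generated. The package's own file naming is not inspected here.
    """
    directory = Path(binary_dir)
    return not directory.is_dir() or not any(directory.iterdir())


def load_rfmix(file_path, binary_dir="./binary_files"):
    """Call read_rfmix, generating binaries only when none are found."""
    from rfmix_reader import read_rfmix  # imported lazily; requires rfmix-reader

    return read_rfmix(
        file_path,
        binary_dir=binary_dir,
        generate_binary=binaries_missing(binary_dir),
    )
```

With this in place, `loci, rf_q, admix = load_rfmix("examples/two_popuations/out/")` would pay the conversion cost on the first call only. If the RFMix output changes, the old binary directory would need to be cleared by hand.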
Three population admixture example
RFMix-reader is adaptable for as many population admixtures as needed.
from rfmix_reader import read_rfmix
file_path = "examples/three_popuations/out/"
binary_dir = "./binary_files"
loci, rf_q, admix = read_rfmix(file_path, binary_dir=binary_dir,
                               generate_binary=True)
Authors
Citation
If you use this software in your work, please cite it.
Benjamin, K. J. M. (2024). RFMix-reader (Version v0.1.15) [Computer software]. https://github.com/heart-gen/rfmix_reader
Kynon JM Benjamin. "RFMix-reader: Accelerated reading and processing for local ancestry studies." bioRxiv. 2024. DOI: 10.1101/2024.07.13.603370.
Funding
This work was supported by grants from the National Institutes of Health, National Institute on Minority Health and Health Disparities (NIMHD) K99MD016964.