RFMix-reader

RFMix-reader is a Python package designed to efficiently read and process output files generated by RFMix, a popular tool for estimating local ancestry in admixed populations. The package employs a lazy loading approach, which minimizes memory consumption by reading only the loci that are accessed by the user, rather than loading the entire dataset into memory at once. Additionally, we leverage GPU acceleration to improve computational speed.

Install

rfmix-reader can be installed using pip:

pip install rfmix-reader

GPU Acceleration: rfmix-reader leverages GPU acceleration for improved performance. To use this functionality, you will need to install the following libraries for your specific CUDA version:

  • RAPIDS: Refer to the official RAPIDS installation guide.
  • PyTorch: Refer to the official PyTorch installation instructions.

Additional Notes:

  • We have not tested installation with Docker or Conda environments. Compatibility may vary.
  • If you do not have a GPU, you can still use the basic functionality of rfmix-reader. This is still much faster than processing the files with standard scripting.
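Before running on large files, it can help to confirm which of the optional GPU libraries are visible to Python. The snippet below is a minimal user-side sketch, not part of rfmix-reader: the helper name `optional_gpu_stack` is hypothetical, and it assumes that RAPIDS is exposed through the `cudf` module and PyTorch through `torch`.

```python
import importlib.util


def optional_gpu_stack():
    """Report which optional GPU libraries are importable (hypothetical helper)."""
    report = {}
    for name in ("torch", "cudf"):
        try:
            # find_spec returns None when a top-level module is not installed
            report[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            report[name] = False

    cuda_ok = False
    if report["torch"]:
        try:
            import torch

            # True only when PyTorch can see a usable CUDA device
            cuda_ok = bool(torch.cuda.is_available())
        except Exception:
            cuda_ok = False
    report["cuda_available"] = cuda_ok
    return report


status = optional_gpu_stack()
print(status)
```

If every entry is False, the CPU-only functionality described above is the path to expect.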

Key Features

Lazy Loading

  • Reads data on-the-fly as requested, reducing memory footprint.
  • Ideal for working with large RFMix output files that may not fit entirely in memory.
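The idea behind lazy loading can be illustrated with a small standard-library sketch. This is not rfmix-reader's implementation: the `LazyTSV` class and the toy file are assumptions made for illustration. The file is scanned once to record where each row starts, and a row is parsed only when it is requested.

```python
import os
import tempfile


class LazyTSV:
    """Index row offsets once, then read only the rows that are requested."""

    def __init__(self, path):
        self.path = path
        self.offsets = []
        with open(path, "rb") as handle:
            handle.readline()  # skip the header line
            while True:
                pos = handle.tell()
                if not handle.readline():
                    break
                self.offsets.append(pos)

    def __len__(self):
        return len(self.offsets)

    def __getitem__(self, i):
        # Seek directly to the stored offset and parse a single row
        with open(self.path, "rb") as handle:
            handle.seek(self.offsets[i])
            return handle.readline().decode().rstrip("\r\n").split("\t")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.tsv")
    # Toy data, made up for this example
    with open(path, "w") as out:
        out.write("chrom\tpos\tvalue\n")
        for k in range(1000):
            out.write(f"chr1\t{100 * k}\t{k % 2}\n")

    table = LazyTSV(path)
    n_rows = len(table)
    row = table[500]  # only this row is parsed

print(n_rows, row)
```

Only the list of offsets stays in memory, so the footprint grows with the number of rows rather than with the size of the file.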

Efficient Data Access

  • Provides convenient access to specific loci or regions of interest.
  • Allows for selective loading of data, enabling faster processing times.
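Selecting a region of interest usually reduces to finding a contiguous index range in a sorted list of positions. The sketch below assumes loci are sorted by physical position within a chromosome. The function name `region_slice` and the coordinates are hypothetical and not taken from rfmix-reader.

```python
from bisect import bisect_left, bisect_right


def region_slice(positions, start, end):
    """Return the index range [lo, hi) of sorted positions inside [start, end]."""
    lo = bisect_left(positions, start)
    hi = bisect_right(positions, end)
    return lo, hi


positions = [100, 250, 400, 900, 1500, 2200]  # toy coordinates, already sorted
lo, hi = region_slice(positions, 300, 1500)
selected = positions[lo:hi]
print(lo, hi, selected)
```

The resulting index range can then be used to pull only the matching rows from a lazily loaded matrix, so data outside the region is never read.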

Seamless Integration

  • Designed to work seamlessly with existing Python data analysis workflows.
  • Facilitates downstream analysis and manipulation of RFMix output data.

Whether you're working with large-scale genomic datasets or have limited computational resources, RFMix-reader offers an efficient and memory-conscious solution for reading and processing RFMix output files. Its lazy loading approach ensures optimal resource utilization, making it a valuable tool for researchers and bioinformaticians working with admixed population data.

Simulation data

Simulation data for testing two- and three-population admixture are available on Synapse: syn61691659.

Usage

This works similarly to pandas-plink:

Two population admixture example

This is a two-part process.

Generate binary files

To reduce computational time and memory, we leverage binary files. As these are not generated by RFMix, we provide a function to create them before running the main function.

from rfmix_reader import create_binaries

# Generate binary files
file_path = "examples/two_popuations/out/"
binary_dir = "./binary_files"
create_binaries(file_path, binary_dir=binary_dir)

You can also do this on the fly.

from rfmix_reader import read_rfmix

file_path = "examples/two_popuations/out/"
binary_dir = "./binary_files"
loci, rf_q, admix = read_rfmix(file_path, binary_dir=binary_dir,
                               generate_binary=True)

We do not have this turned on by default, as it is the rate-limiting step. It can take upwards of 20 to 25 minutes to run, depending on the size of the *fb.tsv files.
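Why a one-time conversion pays off can be shown with a generic sketch. The on-disk format written by create_binaries is not described here, so the float32 layout, the `N_COLS` value, and the helper names below are assumptions for illustration only. Text has to be parsed row by row, whereas fixed-width binary rows can be reached with a single seek.

```python
import os
import struct
import tempfile

N_COLS = 4  # toy number of values per locus (an assumption)
ROW = struct.Struct(f"<{N_COLS}f")  # one row of little-endian float32 values


def text_to_binary(tsv_path, bin_path):
    """One-time, slow step: parse every text row and pack it as float32."""
    with open(tsv_path) as src, open(bin_path, "wb") as dst:
        for line in src:
            dst.write(ROW.pack(*map(float, line.split("\t"))))


def read_row(bin_path, i):
    """Fast step: jump straight to row i without parsing earlier rows."""
    with open(bin_path, "rb") as handle:
        handle.seek(i * ROW.size)
        return ROW.unpack(handle.read(ROW.size))


with tempfile.TemporaryDirectory() as tmp:
    tsv_path = os.path.join(tmp, "toy.tsv")
    bin_path = os.path.join(tmp, "toy.bin")
    # Toy values, chosen to be exactly representable as float32
    with open(tsv_path, "w") as out:
        for k in range(100):
            out.write(f"{float(k)}\t0.25\t0.5\t0.75\n")

    text_to_binary(tsv_path, bin_path)
    size = os.path.getsize(bin_path)
    row = read_row(bin_path, 42)

print(size, row)
```

The conversion cost is paid once. Later runs can reuse the files in binary_dir and skip the parsing step.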

Main function

Once binary files are generated, you can use the main function to process the RFMix results. With a GPU, this takes less than 5 minutes.

from rfmix_reader import read_rfmix

file_path = "examples/two_popuations/out/"
loci, rf_q, admix = read_rfmix(file_path)

Note: ./binary_files is the default for binary_dir, so this is an optional parameter.

Three population admixture example

RFMix-reader adapts to as many admixed populations as needed. The call is the same as in the two-population example.

from rfmix_reader import read_rfmix

file_path = "examples/three_popuations/out/"
binary_dir = "./binary_files"
loci, rf_q, admix = read_rfmix(file_path, binary_dir=binary_dir,
                               generate_binary=True)

Authors

Citation

If you use this software in your work, please cite it.

Benjamin, K. J. M. (2024). RFMix-reader (Version v0.1.15) [Computer software]. https://github.com/heart-gen/rfmix_reader

Kynon JM Benjamin. "RFMix-reader: Accelerated reading and processing for local ancestry studies." bioRxiv. 2024. DOI: 10.1101/2024.07.13.603370.

Funding

This work was supported by grants from the National Institutes of Health, National Institute on Minority Health and Health Disparities (NIMHD) K99MD016964.
