The Transferase Python API

pyxfr

Transferase enables access to massive volumes of remotely stored sequencing-based whole genome DNA methylation profiles, with the aim of delivering results at speeds expected from locally stored data. Transferase includes a collection of data formats and algorithms for fast computation of methylation levels through arbitrary genomic intervals, supporting flexible queries for the most useful summary statistics. The client apps and APIs of transferase are designed to facilitate exploratory data analysis and hypothesis testing. The public transferase server interfaces with MethBase2, a database that includes over 13,000 high-quality WGBS methylomes from mammalian species (as of 04/2025). Since version 0.6.0, transferase clients include command line apps for Linux and macOS, along with a Python package and an R package.

The pyxfr Python package is an API for transferase. It supports the same queries from within Python as the transferase command line app. Almost all other utilities for manipulating transferase data are also available through pyxfr.

Although documentation is still sparse for pyxfr, each class and function in pyxfr has built-in documentation:

import pyxfr
help(pyxfr)

Requirements

Linux: Python >= 3.12
macOS: Python >= 3.12 and macOS >= 13 (at least Ventura)

Usage examples

First we import the pyxfr module so we can set our preferred log level for the session or in our Python scripts. Setting it to "debug" lets us see everything. It will be a lot of information, most of it useful only for debugging.

import pyxfr
from pyxfr import LogLevel
pyxfr.set_log_level(LogLevel.debug)

Next we want to set up transferase for the user (i.e., you) on the host system (e.g., your laptop). The following will do a default setup, and might take up to a minute:

from pyxfr import MConfig
config = MConfig()
config.install(["hg38"])

This will create files in ~/.config/transferase, which are safe to delete at any time because you can just run the same command again. The setup is done in two steps because you might want to change something in the config before doing the installation. Typing the name of the variable config will show a dump of its values. For now, leave them at their defaults; they only need to be changed if you are using local data.

You can select other genomes in the install step (e.g., mm39, rn7, bosTau9, etc.). If a genome doesn't exist or is not on the server, you should see a RuntimeError exception in Python, indicating a problem downloading. The server can't tell the difference between an invalid genome assembly name, a misspelled one, and a real one that simply isn't on the server. You can find the list of available genomes by checking out MethBase2 through the UCSC Genome Browser, or using a command I will show below.
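Because the error does not say which name was at fault, one option is to install genomes one at a time and record which ones fail. The helper below is only a sketch and is not part of pyxfr: install_each is a hypothetical name, and it assumes nothing more than that the callable passed in behaves like config.install, accepting a list of genome names and raising RuntimeError when a download fails. The stand-in fake_install, with its made-up set of available genomes, exists only so the sketch runs without a network connection.

```python
def install_each(install, genomes):
    """Call install([name]) once per genome; return (installed, failed)."""
    installed, failed = [], []
    for name in genomes:
        try:
            install([name])
        except RuntimeError as err:
            # Keep the message so the failing name can be reported later.
            failed.append((name, str(err)))
        else:
            installed.append(name)
    return installed, failed


def fake_install(genomes):
    # Hypothetical stand-in for config.install; the set below is made up.
    available = {"hg38", "mm39"}
    for name in genomes:
        if name not in available:
            raise RuntimeError(f"problem downloading {name}")


installed, failed = install_each(fake_install, ["hg38", "hg83", "mm39"])
print("installed:", installed)
print("failed:", failed)
```

With a real configuration object, the equivalent call would presumably be install_each(config.install, ["hg38", "mm39"]).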

Once the setup has completed, we can get a client object:

from pyxfr import MClient
client = MClient()

The client object is what makes the queries. Our query will be based on a set of genomic intervals, which you would get from a BED format file. However, before working with the genomic intervals we need to first load a genome index, which guarantees that we are working with the exact reference genome that the transferase server expects.

from pyxfr import GenomeIndex
genome_index = GenomeIndex.read(client.get_index_dir(), "hg38")

We will now read genomic intervals. If you have a BED format file for hg38, for example one with around 100k intervals, you can use it. Otherwise you can find intervals.bed.gz in the docs directory of the repo (likely alongside this file), gunzip it and put it in your working directory.

from pyxfr import GenomicInterval
intervals = GenomicInterval.read(genome_index, "intervals.bed")
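If you are supplying a BED file of your own, it can help to check it before handing it to GenomicInterval.read. The following is a minimal sketch in plain Python, not a pyxfr function: check_bed is a hypothetical helper that relies only on the standard BED convention of chromosome, 0-based start and exclusive end in the first three columns. It makes no claim about which additional columns or header lines pyxfr itself accepts, and rejecting empty intervals is a choice made for this sketch.

```python
import io


def check_bed(lines):
    """Yield (chrom, start, end) for each BED line, validating coordinates."""
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        # Skip blank lines and the header lines some BED files carry.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) < 3:
            raise ValueError(f"line {lineno}: expected at least 3 fields")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # BED starts are 0-based and ends are exclusive, so end must exceed start.
        if start < 0 or end <= start:
            raise ValueError(f"line {lineno}: invalid interval {start}-{end}")
        yield chrom, start, end


# Two made-up intervals standing in for the contents of a BED file.
example = io.StringIO("chr1\t10468\t10470\nchr1\t10470\t10483\n")
parsed = list(check_bed(example))
print(parsed)
```

For a file on disk, the same generator can be fed an open file handle, for example check_bed(open("intervals.bed")).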

At this point we can do a query:

levels = client.get_levels(["ERX9474770","ERX9474769"], intervals)

The levels object is an instance of class MLevels, a matrix in which rows correspond to intervals and columns correspond to methylomes in the query.

The following loop will allow us to see the first 10 methylation levels that were retrieved for the second (ERX9474769) of the two methylomes in our query:

print("\n".join([str(levels.at(i, 1)) for i in range(10)]))

It should look like this:

(4, 1)
(1, 471)
(0, 0)
(45, 29)
(0, 0)
(334, 346)
(62, 1581)
(51, 1755)
(74, 664)
(199, 1753)

If you are doing multiple queries that involve the same set of genomic intervals, it's more efficient to make an MQuery object out of them. This is done internally by the query above, but that work can be skipped if you already have the MQuery object. Here is an example, which also uses the _covered version of the query:

query = genome_index.make_query(intervals)
levels = client.get_levels_covered(["ERX9474770","ERX9474769"], query)

If you want to see the methylation levels alongside the original genomic intervals, including printing the chromosome names for each genomic interval, you can do it as follows:

for col in range(levels.n_methylomes):
    for row in range(10):
        print(intervals[row].to_string(genome_index), levels.at(row, col))
    print()

The at function for class MLevels returns a tuple of (n_meth,n_unmeth), or, if the _covered version of the query was used, a tuple of (n_meth,n_unmeth,n_covered). Each call creates the Python tuple object that it returns. If you want the corresponding values without creating the tuple, you can get them with the get_n_meth(i,j), get_n_unmeth(i,j) and get_n_covered(i,j) functions, where i is the row/interval, and j is the column/methylome. Here is an example:

for i in range(10):  # 10 for brevity
    for j in range(levels.n_methylomes):
        print(levels.get_n_covered(i, j), end=' ')
    print()

The get_n_covered(i,j) function raises an exception if the information about covered sites was not requested in the query. There is also a get_wmean(i,j) function that can be applied similarly, which gives the weighted mean methylation level for the corresponding interval. This is likely the most useful summary statistic in all of methylome analysis.
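For reference, the weighted mean methylation level of an interval is conventionally the number of methylated observations divided by the total number of observations over the CpG sites in that interval. The sketch below applies that definition to the first few (n_meth, n_unmeth) pairs printed earlier. It assumes get_wmean follows the same convention, which the text above does not spell out, and weighted_mean is a hypothetical helper rather than a pyxfr function.

```python
# First five (n_meth, n_unmeth) pairs from the example output shown earlier.
pairs = [(4, 1), (1, 471), (0, 0), (45, 29), (0, 0)]


def weighted_mean(n_meth, n_unmeth):
    """Return n_meth / (n_meth + n_unmeth), or None when there are no reads."""
    total = n_meth + n_unmeth
    return n_meth / total if total > 0 else None


wmeans = [weighted_mean(m, u) for m, u in pairs]
for (m, u), w in zip(pairs, wmeans):
    print(m, u, "NA" if w is None else f"{w:.3f}")
```

Intervals with no reads have no defined level, which is why the sketch returns None for them instead of a number.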

You can convert an MLevels object into a numpy array, which is already familiar to many Python users:

a = levels.view_nparray()
methylome_names = ["ERX9474770","ERX9474769"]
assert a.shape == (len(methylome_names), len(intervals), 2)

The intervals in that command came from an earlier command above. The full set of weighted mean methylation levels can be obtained as a numpy array (matrix) using the all_wmeans(min_reads) function, applied to the entire levels object. The min_reads parameter gives the number of reads below which the fraction is not considered interpretable. It is needed because a fraction alone does not reveal whether there were any reads at all for a given interval. Entries without enough reads are assigned a value of -1.0. Here is an example (the output is a numpy array):

min_reads = 2
means = levels.all_wmeans(min_reads)
print(means[0:10])
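When summarizing such values, the -1.0 entries should be excluded rather than averaged in, since they mark missing information and are not real methylation levels. The sketch below shows the idea on a plain Python list. The values in toy_means are made up for illustration and are not output from all_wmeans.

```python
# Made-up weighted means for one methylome; -1.0 marks "not enough reads".
toy_means = [0.8, 0.25, -1.0, 0.5, -1.0]

# Keep only interpretable entries before computing any summary.
usable = [m for m in toy_means if m >= 0.0]
n_missing = len(toy_means) - len(usable)

average = sum(usable) / len(usable) if usable else None
print("usable:", len(usable), "missing:", n_missing, "average:", average)
```

On a numpy array the same filtering is usually written with a boolean mask, for example column[column >= 0] for a single column of values.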

As we have seen, the queries named with _covered return, along with methylation levels, the number of CpG sites covered by reads in each query interval. It's also helpful to know how many total CpG sites are in the interval. This is a property of the reference genome, not a particular methylome. If you have a set of intervals and a GenomeIndex, you can get the total number of CpG sites in each interval like this:

n_cpgs_intervals = genome_index.get_n_cpgs(intervals)
print("\n".join([str(i) for i in n_cpgs_intervals[0:10]]))
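One use for these totals is to express coverage as a fraction of the CpG sites in each interval. The sketch below uses made-up numbers in place of the values that get_n_cpgs and get_n_covered would provide, and covered_fraction is a hypothetical helper, not part of pyxfr.

```python
# Made-up values for four intervals, standing in for real query results.
n_cpgs = [12, 30, 0, 8]     # total CpG sites per interval (reference genome)
n_covered = [6, 30, 0, 2]   # CpG sites covered by reads in one methylome


def covered_fraction(covered, total):
    """Fraction of CpG sites covered, or None for intervals without CpGs."""
    return covered / total if total > 0 else None


fractions = [covered_fraction(c, t) for c, t in zip(n_covered, n_cpgs)]
print(fractions)
```

Intervals that contain no CpG sites are reported as None here, so they are not mistaken for intervals with zero coverage.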

Release history

0.6.4: Jul 18, 2025
0.6.3: Jun 28, 2025
0.6.2: May 11, 2025
0.6.1: Apr 8, 2025
0.6.0: Apr 3, 2025 (this version)

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

pyxfr-0.6.0-cp313-none-manylinux_2_17_x86_64.whl (2.3 MB)

Uploaded Apr 3, 2025; CPython 3.13; manylinux: glibc 2.17+ x86-64

pyxfr-0.6.0-cp313-none-macosx_13_0_x86_64.whl (2.4 MB)

Uploaded Apr 3, 2025; CPython 3.13; macOS 13.0+ x86-64

pyxfr-0.6.0-cp313-none-macosx_13_0_arm64.whl (2.3 MB)

Uploaded Apr 3, 2025; CPython 3.13; macOS 13.0+ ARM64

pyxfr-0.6.0-cp312-none-manylinux_2_17_x86_64.whl (2.3 MB)

Uploaded Apr 3, 2025; CPython 3.12; manylinux: glibc 2.17+ x86-64

pyxfr-0.6.0-cp312-none-macosx_13_0_x86_64.whl (2.4 MB)

Uploaded Apr 3, 2025; CPython 3.12; macOS 13.0+ x86-64

pyxfr-0.6.0-cp312-none-macosx_13_0_arm64.whl (2.3 MB)

Uploaded Apr 3, 2025; CPython 3.12; macOS 13.0+ ARM64

File details

Details for the file pyxfr-0.6.0-cp313-none-manylinux_2_17_x86_64.whl.

File metadata

Download URL: pyxfr-0.6.0-cp313-none-manylinux_2_17_x86_64.whl
Upload date: Apr 3, 2025
Size: 2.3 MB
Tags: CPython 3.13, manylinux: glibc 2.17+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for pyxfr-0.6.0-cp313-none-manylinux_2_17_x86_64.whl
Algorithm	Hash digest
SHA256	`9cad2e0201c8d63b0f4badd223822542c81f00bbbb50a985c7e034a9a7b6fe85`
MD5	`c612bec0cd601620269e7f9c99e31b94`
BLAKE2b-256	`033c5ed963fc0995e49ea715b4c031197e21d4474b0e16cbc9d40b84c778a697`

File details

Details for the file pyxfr-0.6.0-cp313-none-macosx_13_0_x86_64.whl.

File metadata

Download URL: pyxfr-0.6.0-cp313-none-macosx_13_0_x86_64.whl
Upload date: Apr 3, 2025
Size: 2.4 MB
Tags: CPython 3.13, macOS 13.0+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for pyxfr-0.6.0-cp313-none-macosx_13_0_x86_64.whl
Algorithm	Hash digest
SHA256	`6fa77056389231de3d8e5067e2806b6be00abd40f9731a3d888d25d2c97d1404`
MD5	`a8c26f4b4cb1222c6844e74ecb9d643c`
BLAKE2b-256	`033c731451eabcecdac7d78c53aeb668ab85b47086066634f12b4406234c860d`

File details

Details for the file pyxfr-0.6.0-cp313-none-macosx_13_0_arm64.whl.

File metadata

Download URL: pyxfr-0.6.0-cp313-none-macosx_13_0_arm64.whl
Upload date: Apr 3, 2025
Size: 2.3 MB
Tags: CPython 3.13, macOS 13.0+ ARM64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for pyxfr-0.6.0-cp313-none-macosx_13_0_arm64.whl
Algorithm	Hash digest
SHA256	`020b731125af8309f2f9cf605c6764cd03f6a04ddd5d1d59a18f44d9306531e9`
MD5	`ad5d86f66bfc3a195d31edbba2215f6c`
BLAKE2b-256	`c4a797927ce0664b198fac126d69c0d8789560911fe1028f90be878208ca7ee0`

File details

Details for the file pyxfr-0.6.0-cp312-none-manylinux_2_17_x86_64.whl.

File metadata

Download URL: pyxfr-0.6.0-cp312-none-manylinux_2_17_x86_64.whl
Upload date: Apr 3, 2025
Size: 2.3 MB
Tags: CPython 3.12, manylinux: glibc 2.17+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for pyxfr-0.6.0-cp312-none-manylinux_2_17_x86_64.whl
Algorithm	Hash digest
SHA256	`353894e37a7741d5138e5bfeaba358abdf79b35a4dd0cd1c73bfe17611a082be`
MD5	`64dbe6227b11fd70004716171618cd5d`
BLAKE2b-256	`47688b163e605d3778a4c88147a89a59e28ad09f993721c6217e4468f8b9058c`

File details

Details for the file pyxfr-0.6.0-cp312-none-macosx_13_0_x86_64.whl.

File metadata

Download URL: pyxfr-0.6.0-cp312-none-macosx_13_0_x86_64.whl
Upload date: Apr 3, 2025
Size: 2.4 MB
Tags: CPython 3.12, macOS 13.0+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for pyxfr-0.6.0-cp312-none-macosx_13_0_x86_64.whl
Algorithm	Hash digest
SHA256	`3ee2dc5d8a664d3ff330a12eb85d610dd37bcb1b4ce241f91024b006e7b2be3b`
MD5	`8d0e2c83b31e185ccf541358144d7056`
BLAKE2b-256	`50f2cc127d06fed5c8594431e4bb7bb5d7ef0e94e1e1225fc47cfe9360f4bbe1`

File details

Details for the file pyxfr-0.6.0-cp312-none-macosx_13_0_arm64.whl.

File metadata

Download URL: pyxfr-0.6.0-cp312-none-macosx_13_0_arm64.whl
Upload date: Apr 3, 2025
Size: 2.3 MB
Tags: CPython 3.12, macOS 13.0+ ARM64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for pyxfr-0.6.0-cp312-none-macosx_13_0_arm64.whl
Algorithm	Hash digest
SHA256	`6b7884a53e11a5d3b57523da4a1df8ba60f30b5d26cbbeedd5a98b711e41d8c7`
MD5	`121d7ecfdff8f1cb6fb46b61653cb26e`
BLAKE2b-256	`e556bf57b7cb51cb80472864405f7ce1721e68e88661d93e871607d9e7b0e412`
