Methods for filtering for high-scoring genomic intervals
Project description
region-selection
Usage
Importing the module and creating a Selection instance
>>> from region_selection import Selection
>>> s = Selection()
Specify properties
>>> s.method = "pq"
>>> s.input_fn = "/Users/areynolds/Developer/Github/region_selection/tests/windows.fixed.25k.bed"
>>> s.bin_size = 200
>>> s.exclusion_span = 24800
The method can be one of pq, wis, or maxmean, which select the priority-queue, weighted interval scheduling, or max-mean window sweep method, respectively.
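For readers unfamiliar with weighted interval scheduling, the general idea can be sketched for the simple case of equal-width bins laid out along one chromosome. This is only an illustration of the classic dynamic-programming technique, not the package's own code: the function name `wis_select`, the `min_gap` parameter (the number of bins that must be skipped between two chosen bins), and the integer scores are all assumptions made for the example.

```python
def wis_select(scores, min_gap):
    """Choose bin indices that maximize the total score, subject to at
    least `min_gap` unchosen bins between any two chosen bins.
    Hypothetical helper for illustration only."""
    n = len(scores)
    step = min_gap + 1  # minimum index difference between chosen bins

    # best[i] holds the best total score using only the first i bins.
    best = [0] * (n + 1)
    for i in range(1, n + 1):
        skip = best[i - 1]
        take = scores[i - 1] + best[max(0, i - step)]
        best[i] = max(skip, take)

    # Walk backwards through the table to recover the chosen bins.
    chosen = []
    i = n
    while i > 0:
        take = scores[i - 1] + best[max(0, i - step)]
        if take > best[i - 1]:
            chosen.append(i - 1)
            i = max(0, i - step)
        else:
            i -= 1
    return chosen[::-1]


print(wis_select([1, 9, 8, 2, 7, 3], min_gap=1))  # [1, 4]
```

Unlike a greedy approach, the table lookup considers every combination of compatible bins, so the returned set has the highest achievable total score under the spacing rule assumed here.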
The input_fn property is the path to an input file. It is optional unless you use the read() method.
The bin_size and exclusion_span properties are integers. They represent the size of each element and the distance required between elements (excluding the bin itself).
The default values are 200 and 24800, respectively. This means bins are 200 nt wide, and we require at least 25000 nt (200 + 24800) of distance between any filtered bins.
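The arithmetic behind the defaults can be worked through in a few lines. This is a sketch only: the variable names are illustrative, and the assumption that the exclusion span is counted in whole bins is inferred from the "Exclusion bins" line that run() logs, not from the package source.

```python
bin_size = 200          # width of each bin, in nt (default)
exclusion_span = 24800  # required gap, excluding the bin itself (default)

# Number of whole bins covered by the exclusion span
# (assumed to be an integer division).
exclusion_bins = exclusion_span // bin_size

# Minimum distance from the start of one selected bin to the next.
min_spacing = bin_size + exclusion_span

print(exclusion_bins)  # 124
print(min_spacing)     # 25000
```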
Input data
You can read in data from a four-column, tab-delimited text file:
>>> in_df = s.read(s, s.method, s.input_fn)
[region_selection] Reading input file into dataframe...
[region_selection] Read dataframe
Otherwise, you must provide a Pandas dataframe containing four columns, labeled Chromosome, Start, End, and Score, respectively.
In the above snippet, the input dataframe is called in_df.
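If you are not reading from a file, a dataframe with the required column labels could be built directly, along the following lines. The coordinates and scores here are made-up placeholder values that only show the expected column layout; they are not taken from any real dataset.

```python
import pandas as pd

# Placeholder rows; only the four column labels matter here.
in_df = pd.DataFrame(
    {
        "Chromosome": ["chr1", "chr1", "chr1"],
        "Start": [0, 200, 400],
        "End": [200, 400, 600],
        "Score": [0.12, 0.41, 0.07],
    }
)

print(list(in_df.columns))  # ['Chromosome', 'Start', 'End', 'Score']
```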
Running the selection method
Use run() to run the specified method on the input dataframe in_df (or whatever its name is):
>>> out_df = s.run(s, s.method, in_df)
[region_selection] Bin size (nt): 200
[region_selection] Exclusion span (nt): 24800
[region_selection] Exclusion bins: 124
[region_selection] Method: Priority-Queue (PQ)
[region_selection] Constructing heap
[region_selection] Constructing qualifying bin list from heap
[region_selection] Returning sorted bin list
[region_selection] Method (runtime in sec): 140.50703937999998
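The log lines above suggest a heap-based greedy selection: build a heap, pop the best-scoring bins, keep those that qualify, and return them in sorted order. A minimal sketch of that general idea, not the package's actual implementation, might look like the following. The function name `greedy_select`, its parameters, and the rule that a bin qualifies only when no already-kept bin lies within `exclusion_bins` positions of it are all assumptions made for illustration.

```python
import heapq


def greedy_select(scores, exclusion_bins):
    """Greedily keep high-scoring bin indices that are more than
    `exclusion_bins` positions apart. Hypothetical helper."""
    # heapq is a min-heap, so negate scores to pop the highest first.
    heap = [(-score, idx) for idx, score in enumerate(scores)]
    heapq.heapify(heap)

    blocked = [False] * len(scores)
    kept = []
    while heap:
        _, idx = heapq.heappop(heap)
        if blocked[idx]:
            continue
        kept.append(idx)
        # Block this bin and its neighbours within the exclusion distance.
        lo = max(0, idx - exclusion_bins)
        hi = min(len(scores), idx + exclusion_bins + 1)
        for j in range(lo, hi):
            blocked[j] = True

    # Return the kept bins in positional order.
    return sorted(kept)


print(greedy_select([0.1, 0.9, 0.8, 0.2, 0.7, 0.3], exclusion_bins=1))  # [1, 4]
```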
The result is stored as a Pandas dataframe. Here it is called out_df and you can call all the usual Pandas properties on this:
>>> print(out_df.head())
    Chromosome   Start     End  Score
47        chr1    9400   34400   0.41
172       chr1   34400   59400   0.41
304       chr1   60800   85800   0.41
429       chr1   85800  110800   0.41
554       chr1  110800  135800   0.41
Or use write() to write to standard output:
>>> s.write(out_df)
...
Or write the dataframe out with a Pandas method such as to_csv().
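As one possible way of doing this, the sketch below writes a dataframe as headerless, tab-delimited text with the standard Pandas to_csv() method. The two rows are copied from the example output above; writing to an in-memory buffer rather than a file, and dropping the header and index, are choices made for this illustration.

```python
import io

import pandas as pd

# Two rows copied from the example output, for illustration.
out_df = pd.DataFrame(
    {
        "Chromosome": ["chr1", "chr1"],
        "Start": [9400, 34400],
        "End": [34400, 59400],
        "Score": [0.41, 0.41],
    }
)

# Write tab-delimited text without the header row or the index column.
buffer = io.StringIO()
out_df.to_csv(buffer, sep="\t", header=False, index=False)
text = buffer.getvalue()
print(text)
```

Passing a file path instead of the buffer would write the same content to disk.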
File details
Details for the file region_selection_apr-0.1.0.tar.gz.
File metadata
- Download URL: region_selection_apr-0.1.0.tar.gz
- Upload date:
- Size: 8.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.8.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e524774e921f40e423e72a6d9f104d91a008d756fbb32a34fc55c61cb007b376 |
| MD5 | b3e90ca1081d342d2d13322465f40636 |
| BLAKE2b-256 | 9d04d1996e6ed8967be61e8137c127c8ec5e6e7ca5b33e2c2ee47a50f6feb823 |
File details
Details for the file region_selection_apr-0.1.0-py3-none-any.whl.
File metadata
- Download URL: region_selection_apr-0.1.0-py3-none-any.whl
- Upload date:
- Size: 9.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.8.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bd6677776e223b89a99aa367c0fc71aae2ca9e3b587cc8bf08f5b039022c0bad |
| MD5 | 0b839702f6ea6d62c70ca580542f78a9 |
| BLAKE2b-256 | f8588b1c38670d08c87389b98a9a748e10d9d09f8ead28a3476772316dbf49a7 |