region-selection

Methods for filtering for high-scoring genomic intervals

Usage

Importing the module and creating a Selection instance

>>> from region_selection import Selection
>>> s = Selection()

Specify properties

>>> s.method = "pq"
>>> s.input_fn = "/Users/areynolds/Developer/Github/region_selection/tests/windows.fixed.25k.bed"
>>> s.bin_size = 200
>>> s.exclusion_span = 24800

The method property can be one of pq, wis, or maxmean, which select the priority-queue, weighted interval scheduling, or max-mean window sweep method, respectively.
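
To give a feel for what a priority-queue selection does, here is a minimal, generic sketch of a greedy heap-based filter over the bin scores of a single chromosome. It is not the package's implementation: the function name select_bins_pq, the exclusion_bins parameter, and the rule of blocking every bin within exclusion_bins positions of a selected bin are all assumptions made for illustration.

```python
import heapq


def select_bins_pq(scores, exclusion_bins):
    """Greedily pick the highest-scoring bins, skipping any bin that lies
    within `exclusion_bins` positions of a bin that was already picked.

    Hypothetical helper for illustration; not part of region_selection.
    """
    # heapq is a min-heap, so negate the scores to pop the largest first.
    heap = [(-score, idx) for idx, score in enumerate(scores)]
    heapq.heapify(heap)

    blocked = [False] * len(scores)
    selected = []
    while heap:
        _, idx = heapq.heappop(heap)
        if blocked[idx]:
            continue
        selected.append(idx)
        # Block the chosen bin and its neighbours on both sides.
        lo = max(0, idx - exclusion_bins)
        hi = min(len(scores), idx + exclusion_bins + 1)
        for j in range(lo, hi):
            blocked[j] = True

    # Return the picks in genomic (index) order rather than score order.
    return sorted(selected)


scores = [0.1, 0.9, 0.8, 0.2, 0.7, 0.3]
picked = select_bins_pq(scores, exclusion_bins=2)
print(picked)  # [1, 4]
```

A greedy pass like this is simple and fast, but it does not guarantee the largest possible total score. That trade-off is one reason a package might also offer a weighted interval scheduling method.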

The input_fn property is the path to an input file on the file system. It is optional unless you use the read() method.

The bin_size and exclusion_span properties are integers. They represent the size of each element (bin) and the distance required between selected elements, excluding the bin itself.

The default values are 200 and 24800, respectively. This means bins are 200 nt wide and any two selected bins must be separated by at least 24800 nt, which amounts to 25000 nt once the 200 nt bin itself is counted.
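
As a quick check of the arithmetic, the following sketch converts the default exclusion span into a number of bins and into a start-to-start spacing. The variable names exclusion_bins and min_start_spacing are illustrative and are not part of the package's API.

```python
bin_size = 200          # width of each bin, in nt
exclusion_span = 24800  # required gap between selected bins, in nt

# Number of whole bins covered by the exclusion span.
exclusion_bins = exclusion_span // bin_size

# Spacing between selected bins when the bin's own width is counted.
min_start_spacing = exclusion_span + bin_size

print(exclusion_bins)     # 124
print(min_start_spacing)  # 25000
```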

Input data

You can read in data from a four-column, tab-delimited text file:

>>> in_df = s.read(s, s.method, s.input_fn)
[region_selection] Reading input file into dataframe...
[region_selection] Read dataframe

Otherwise, you must provide a Pandas dataframe containing four columns labeled Chromosome, Start, End, and Score, in that order.

In the above snippet, the input dataframe is called in_df.
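
If you prefer to build the dataframe yourself, something like the following sketch should produce the expected four columns. It uses pandas.read_csv on an in-memory string so that it runs without any file on disk. The sample rows are made up, and it assumes the tab-delimited input has no header line.

```python
import io

import pandas as pd

# Made-up example rows: three adjacent 200 nt bins on chr1.
bed_text = (
    "chr1\t0\t200\t0.12\n"
    "chr1\t200\t400\t0.41\n"
    "chr1\t400\t600\t0.07\n"
)

# Assumption: the file has no header row, so column names are supplied here.
in_df = pd.read_csv(
    io.StringIO(bed_text),
    sep="\t",
    header=None,
    names=["Chromosome", "Start", "End", "Score"],
)

print(in_df)
```

To read from disk instead, pass a file path in place of the io.StringIO object.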

Running the selection method

Use run() to run the specified method on the input dataframe in_df (or whatever its name is):

>>> out_df = s.run(s, s.method, in_df)
[region_selection] Bin size (nt): 200
[region_selection] Exclusion span (nt): 24800
[region_selection] Exclusion bins: 124
[region_selection] Method: Priority-Queue (PQ)
[region_selection] Constructing heap
[region_selection] Constructing qualifying bin list from heap
[region_selection] Returning sorted bin list
[region_selection] Method (runtime in sec): 140.50703937999998

The result is returned as a Pandas dataframe. Here it is called out_df, and you can use the usual Pandas methods and properties on it:

>>> print(out_df.head())
    Chromosome   Start     End  Score
47        chr1    9400   34400   0.41
172       chr1   34400   59400   0.41
304       chr1   60800   85800   0.41
429       chr1   85800  110800   0.41
554       chr1  110800  135800   0.41

Or use the write() method to write the result to standard output:

>>> s.write(out_df)
...

Or write it out with a Pandas method such as to_csv().
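
For example, a result dataframe could be written as headerless, tab-delimited text with DataFrame.to_csv. This sketch builds a small stand-in for out_df from the first two rows shown earlier and writes to an in-memory buffer. The choice to drop the header and index is an assumption about the output you want, not something the package requires.

```python
import io

import pandas as pd

# Stand-in for the dataframe returned by run(), using two rows from above.
out_df = pd.DataFrame(
    {
        "Chromosome": ["chr1", "chr1"],
        "Start": [9400, 34400],
        "End": [34400, 59400],
        "Score": [0.41, 0.41],
    }
)

# Write tab-delimited text without the header row or the index column.
buffer = io.StringIO()
out_df.to_csv(buffer, sep="\t", header=False, index=False)

lines = buffer.getvalue().splitlines()
print(lines[0])
```

Passing a file path instead of the buffer writes the same text to disk.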
