Efficient subroutines for computing summary statistics for the SAM FLAG field

pyflagstats

Given a stream of k-bit words, we seek to sum the bit values at indexes 0, 1, 2, ..., k-1 across multiple words by computing k distinct sums. If the k-bit words are one-hot encoded, then the sums correspond to their frequencies.

This multiple-sum problem is a generalization of the population-count problem where we count the total number of set bits in independent machine words. We refer to this new problem as the positional population-count problem.
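The definition above can be sketched as a plain NumPy reference. This is only an illustration of the problem statement for k = 16, not the SIMD algorithm the package uses, and the function name `pospopcnt_reference` is hypothetical rather than part of pyflagstats.

```python
import numpy as np

def pospopcnt_reference(words):
    """Return 16 sums: entry i counts the words that have bit i set."""
    words = np.asarray(words, dtype=np.uint16)
    counts = np.zeros(16, dtype=np.uint64)
    for i in range(16):
        # Isolate bit i of every word, then count how many are set.
        counts[i] = np.count_nonzero((words >> np.uint16(i)) & np.uint16(1))
    return counts

# Four small words: 0b0001, 0b0011, 0b0100, 0b0101
words = np.array([0b0001, 0b0011, 0b0100, 0b0101], dtype=np.uint16)
counts = pospopcnt_reference(words)

# One-hot encoded words: the sums are the frequencies of each symbol.
one_hot = np.array([0b001, 0b010, 0b010, 0b100], dtype=np.uint16)
frequencies = pospopcnt_reference(one_hot)
```

Summing all 16 entries of `counts` gives the ordinary population count of the whole stream, which is the sense in which the positional variant generalizes it.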

Using SIMD (Single Instruction, Multiple Data) instructions from recent Intel processors, we describe algorithms for computing the 16-bit positional population count using about one eighth (0.125) of a CPU cycle per 16-bit word. Measured in CPU cycles, our best approach is about 140-fold faster than competitive code that uses only non-SIMD instructions.

This package contains native Python bindings for applying the efficient positional population count operator to computing summary statistics for the SAM FLAG field.
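For readers unfamiliar with the FLAG field, the following sketch counts how often each bit defined by the SAM specification is set, using plain NumPy. It is an illustration only: the names `SAM_FLAG_BITS` and `count_flag_bits` are hypothetical, and the result is not the output format of `pyflagstats`.

```python
import numpy as np

# Bit masks and meanings as defined in the SAM specification.
# The short labels are chosen here for illustration.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}

def count_flag_bits(flags):
    """Count, per SAM FLAG bit, how many records have that bit set."""
    flags = np.asarray(flags, dtype=np.uint16)
    return {
        name: int(np.count_nonzero(flags & np.uint16(mask)))
        for mask, name in SAM_FLAG_BITS.items()
    }

# 99 and 147 are a typical properly paired read and its mate,
# 4 is an unmapped read, 1040 is a reverse-strand duplicate.
flags = np.array([99, 147, 4, 1040], dtype=np.uint16)
stats = count_flag_bits(flags)
```

Each entry is one of the k sums of the positional population count, restricted to the 12 bits that the specification assigns a meaning.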

Installation

Install from the root of the source tree with

pip3 install .

or build the extension in place, without installing it, with

python3 setup.py build_ext --inplace

Uninstall with

pip3 uninstall pyflagstats

Example

import numpy as np
import pyflagstats as fs

# Compute summary statistics for 100 million random FLAG fields.
# Completes in around 1 second.
fs.flagstats(np.random.randint(0, 8192, 100000000, dtype="uint16"))

returns (for example)

{'passed': array([ 624787,  312748, 2500089,  312384,  312314,  312678,  312045,
        311845, 2499502, 4999279, 2497500, 1248979,  389744,  156194,
        156029,       0], dtype=uint32), 'failed': array([ 625143,  312906, 2498840,  312818,  312129,  312802,  311869,
        312105, 2501477, 5000721, 2499178, 1249105,  390962,  155828,
        156018,       0], dtype=uint32)}

Download files

Source Distribution

pyflagstats-0.1.4.tar.gz (148.7 kB view details)

Uploaded Source

File details

Details for the file pyflagstats-0.1.4.tar.gz.

File metadata

  • Download URL: pyflagstats-0.1.4.tar.gz
  • Upload date:
  • Size: 148.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.6.8

File hashes

Hashes for pyflagstats-0.1.4.tar.gz
Algorithm Hash digest
SHA256 4866ec1f495b4ae4695367a55b39bc2fc2154e568cbb6484ec86590e8846dee3
MD5 2dd92c703d7f25407a498e49617cf955
BLAKE2b-256 42924ef2e4356cf1f7468b3ac4d0d1770a9a690bda08b6ddf0c90266e0bf92e0
