An implementation of Karger's Algorithm to find similar Genesets

roux-algo-geneweaver-kargers

WARNING: This package is under active development. The API may change at any time. Documentation may be out of date.

Installation

Set Up

It's usually a good idea to install this package in a Python virtual environment.

python3 -m venv $NAME_OF_VENV
source $NAME_OF_VENV/bin/activate
pip install roux-algo-geneweaver-kargers

With Python Poetry

poetry new $NEW_PROJECT_NAME
cd $NEW_PROJECT_NAME
poetry add roux-algo-geneweaver-kargers

Usage

Two data formats are currently supported: "Node and Edges List" and "Adjacency List". Data in the "Adjacency List" format must be converted to the "Node and Edges List" format before it can be used with Karger's. Utility methods are included to make this easy.

Load Data

from roux.algo.geneweaver.kargers import *
nodes, edges = karger_io.load_nodes_edges('nodes_edges_file.json')

Load Data from Adjacency List Source

from roux.algo.geneweaver.kargers import *
graph = karger_io.load_graph('graph_file.json')
nodes, edges = karger_tf.adj_graph_to_edge_list(graph)
edges = karger_tf.deduplicate_edge_list(edges)
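
To make the two formats concrete, here is a minimal standalone sketch of the conversion in plain Python. It does not use or reproduce the package's own code: the function names `adjacency_to_edge_list` and `dedupe_edges` are hypothetical, and the adjacency graph is assumed to be a dict mapping each node to a list of its neighbours. The package's file schema and its handling of self-loops may differ.

```python
def adjacency_to_edge_list(graph):
    """Flatten an assumed {node: [neighbours]} mapping into (nodes, edges)."""
    nodes = set(graph)
    edges = []
    for node, neighbours in graph.items():
        for other in neighbours:
            nodes.add(other)  # a neighbour may not have its own key
            edges.append((node, other))
    return sorted(nodes), edges


def dedupe_edges(edges):
    """Keep one copy of each undirected edge and drop self-loops (an assumption)."""
    seen = set()
    unique = []
    for a, b in edges:
        if a == b:
            continue
        key = (a, b) if a <= b else (b, a)  # normalise the edge direction
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# A triangle: every edge appears twice in the adjacency form.
example_graph = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
nodes, edges = adjacency_to_edge_list(example_graph)
edges = dedupe_edges(edges)
print(nodes)  # [1, 2, 3]
print(edges)  # [(1, 2), (1, 3), (2, 3)]
```

An adjacency list stores each undirected edge once per endpoint, which is why a deduplication pass follows the conversion.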

Run Karger's on Data

from roux.algo.geneweaver.kargers import *
nodes, edges = karger_io.load_nodes_edges('nodes_edges_file.json')
k_inst = KargerMinCut(nodes, edges)
min_cut, best_cuts, result = k_inst.min_cut()

...

super_nodes = karger_tf.union_find_to_geneset_list(
   result.roots(),
   result.non_roots()
)
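
For readers unfamiliar with the algorithm, the following is a generic sketch of Karger's randomized contraction using a union-find structure. It is not the implementation behind `KargerMinCut`: every name in it (`find`, `karger_once`, `karger_min_cut`, `groups`) is hypothetical, the input graph is assumed to be connected, and the number of trials is an arbitrary choice. It uses a common formulation in which edges are visited in random order and any edge whose endpoints are already merged is skipped.

```python
import random


def find(parent, x):
    """Return the root of x, halving the path along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def karger_once(nodes, edges, rng):
    """Run one contraction pass; return (cut_size, parent map)."""
    parent = {n: n for n in nodes}
    remaining = len(nodes)
    order = list(edges)
    rng.shuffle(order)  # random edge order stands in for random edge picks
    for a, b in order:
        if remaining <= 2:
            break
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:  # skip edges inside an already merged super node
            parent[ra] = rb
            remaining -= 1
    cut = sum(1 for a, b in edges if find(parent, a) != find(parent, b))
    return cut, parent


def karger_min_cut(nodes, edges, trials=50, seed=0):
    """Repeat the contraction and keep the smallest cut seen."""
    rng = random.Random(seed)
    best_cut, best_parent = None, None
    for _ in range(trials):
        cut, parent = karger_once(nodes, edges, rng)
        if best_cut is None or cut < best_cut:
            best_cut, best_parent = cut, parent
    return best_cut, best_parent


def groups(parent):
    """Collect nodes by root, giving one list per super node."""
    out = {}
    for n in parent:
        out.setdefault(find(parent, n), []).append(n)
    return sorted(sorted(g) for g in out.values())


# Two triangles joined by a single bridge edge (3, 4).
demo_nodes = [1, 2, 3, 4, 5, 6]
demo_edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
best_cut, best_parent = karger_min_cut(demo_nodes, demo_edges)
print(best_cut, groups(best_parent))
```

In the demo graph the bridge is the only cut of size one, so a run that finds it leaves the two triangles as the two super nodes. A single pass can miss the minimum cut, which is why the contraction is repeated and the best result kept.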

Developer Setup

Base Tools

  1. Install Git
  2. Install Python 3.9
    1. Note: install method varies by operating system
  3. Install poetry

Set up

  1. Clone this repository
  2. Move to cloned directory (e.g. cd cs5800-final-project)
  3. Run poetry install
  4. If you need to connect to the database, create a .env configuration file

You should now be able to use the python package:

  1. Run poetry shell
  2. Run python3

from roux.algo.geneweaver.kargers import kragers_poc as kpm
from roux.algo.geneweaver.kargers.utils import build_graph as bg
from roux.algo.geneweaver.kargers.db.session import SessionLocal

db = SessionLocal()
graph = bg.build_graph(db, set(range(349100, 349110)))
result = kpm.kragers_poc_1(graph)

Create the Tier 2 Dataset

from roux.algo.geneweaver.kargers.utils import build_graph as bg
from roux.algo.geneweaver.kargers.db.session import SessionLocal

db = SessionLocal()
# The dataset is slightly smaller than 19000 nodes
graph = bg.get_adjacency_exclusive_new(db, 2, 19000)

To save the dataset

from roux.algo.geneweaver.kargers.utils import load_save_graph as ls

# Build the graph above
graph = ...

# Save the graph to a JSON file
ls.save_graph('filename.json', graph)
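
The on-disk layout written by `save_graph` is not documented here. As an illustration only, the sketch below shows one way an adjacency graph could be round-tripped through JSON. It assumes the graph is a dict mapping integer IDs to sets of integer IDs, and the helpers `save_adjacency` and `load_adjacency` are hypothetical names, not part of the package.

```python
import json
import os
import tempfile


def save_adjacency(path, graph):
    """Write an assumed {int: set[int]} graph as JSON."""
    # JSON has no set type and only string keys, so convert both.
    serialisable = {str(node): sorted(neigh) for node, neigh in graph.items()}
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(serialisable, fh)


def load_adjacency(path):
    """Read the file back and restore integer keys and set values."""
    with open(path, encoding="utf-8") as fh:
        raw = json.load(fh)
    return {int(node): set(neigh) for node, neigh in raw.items()}


graph = {349100: {349101, 349102}, 349101: {349100}, 349102: {349100}}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "graph.json")
    save_adjacency(path, graph)
    restored = load_adjacency(path)
print(restored == graph)  # True
```

The explicit conversion matters because `json.dump` raises a `TypeError` on sets and turns integer keys into strings.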

Get all Tier 2 Genesets

from roux.algo.geneweaver.kargers.utils import build_graph as bg
from roux.algo.geneweaver.kargers.db.session import SessionLocal

db = SessionLocal()
tier_2_genesets = bg.get_all_genesets_by_tier(db, 2)

Download files

Source Distribution

roux-algo-geneweaver-kargers-0.0.3.tar.gz (18.7 kB)

Uploaded Source

Built Distribution

File details

Details for the file roux-algo-geneweaver-kargers-0.0.3.tar.gz.

File hashes

Hashes for roux-algo-geneweaver-kargers-0.0.3.tar.gz
Algorithm     Hash digest
SHA256        52e67a97c8f2ae4cf438dc6c09d33f95e62d051fa9d29c86191b001a6c8f3dd8
MD5           99edb868fcdb45c3e5ae90c36bf8a2db
BLAKE2b-256   b2140e629d5fe25464cbd0f5046fdbfe8aa6c7a6ecef629f8042587bac84a533

File details

Details for the file roux_algo_geneweaver_kargers-0.0.3-py3-none-any.whl.

File hashes

Hashes for roux_algo_geneweaver_kargers-0.0.3-py3-none-any.whl
Algorithm     Hash digest
SHA256        1558626dbefb356195dfb7f4098ce3c674a964b8503fd7ea1ca1a0555c9f6386
MD5           ec4262451a4a440ad09be6208f80514d
BLAKE2b-256   d9f5c06a2f98c39093d673c6e3f44a4e9dc08ce22b5add4f2fb5b0ee8759bec7
