# CIIProMol

### Outline

1. From a set of SMILES:
    * obtain the CIDs
    * get the bioassays for each compound

2. Concatenate the bioassays into a bioprofile:
    * eliminate bioassays based on the number of actives
    * remove in vivo assays
    * remove highly correlated assays

3. Find in vitro/in vivo correlations:
    * remove assays based on different statistics
    * use those statistics to find biological nearest neighbors
    * find the assays that minimize the difference in activity between the target compound and its nearest neighbors
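The filtering in step 2 can be sketched with pandas. This is a minimal illustration, not CIIProMol's implementation. It assumes the bioprofile is a DataFrame with one row per compound and one column per assay (AID), encoded as 1 = active, -1 = inactive, 0 = untested. The function name `filter_bioprofile` and the thresholds `min_actives` and `max_corr` are hypothetical. Removing in vivo assays requires assay metadata and is not shown.

```python
import numpy as np
import pandas as pd


def filter_bioprofile(profile: pd.DataFrame, min_actives: int = 3,
                      max_corr: float = 0.95) -> pd.DataFrame:
    """Filter a compound-by-assay bioprofile (rows = compounds, columns = AIDs).

    Assumed (hypothetical) encoding: 1 = active, -1 = inactive, 0 = untested.
    """
    # Keep only assays with at least `min_actives` active compounds.
    n_actives = (profile == 1).sum(axis=0)
    profile = profile.loc[:, n_actives >= min_actives]

    # Absolute Pearson correlation between every pair of assay columns.
    corr = profile.corr().abs()
    # Mask everything except the strict upper triangle so each pair is seen once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    # Drop a column if it is highly correlated with any earlier column.
    to_drop = [col for col in upper.columns if (upper[col] > max_corr).any()]
    return profile.drop(columns=to_drop)
```

Keeping the first assay of each correlated pair is an arbitrary choice. A real pipeline might instead keep the assay with more tested compounds.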

A python class for extending [CIIPro]( functionality.
CIIProMol aims to extend the [Python RDKit API].
CIIProMol requires several packages (e.g., RDKit, pandas, NumPy).
These dependencies are listed in the file `explicit-spec-file.txt` and
can be installed directly into a new [conda]( environment.

Use the following command to install the required Python packages into a
new conda environment:

```
conda create --name ciipromol --file explicit-spec-file.txt
```
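Once the environment is active, a quick check that the core dependencies are importable can save debugging later. The sketch below uses only the standard library. It does not verify versions. The package list is an assumption based on the dependencies named above, and the helper name `missing_packages` is hypothetical.

```python
import importlib.util

# Top-level import names assumed from the dependencies listed above.
REQUIRED = ["rdkit", "pandas", "numpy"]


def missing_packages(names=REQUIRED):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```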

CIIProMol is broken down into three modules:
1) ``
2) ``
3) ``

### ciipromol

Contains `class CIIProMol`. A `CIIProMol` object can be instantiated
with an RDKit `Mol` object.

```python
from ciipromol import *
from rdkit import Chem

# Aspirin
mol = Chem.MolFromSmiles('CC(=O)OC1=CC=CC=C1C(=O)O')
cpm = CIIProMol(mol, activity=20.0)
```

If `cids=None` (the default), all PubChem Compound IDs (CIDs) associated with
that structure are retrieved. The biological assays (BioAssays) associated with
those CIDs can then be retrieved as well.
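CIIProMol's retrieval code is not shown in this README. To illustrate the kind of request involved, the sketch below builds PubChem PUG REST requests for a structure's CIDs and for one CID's assay summary. The endpoint paths follow the public PUG REST URL scheme. The helper names (`cids_request`, `assay_summary_request`, `fetch_json`, `get_cids`) are hypothetical. The SMILES goes in a POST body because SMILES strings can contain characters such as `/` and `#` that are awkward in a URL path.

```python
import json
import urllib.request
from urllib.parse import urlencode

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cids_request(smiles: str) -> urllib.request.Request:
    """Build a POST request asking PubChem for the CIDs matching a SMILES."""
    # Send the SMILES as form data instead of embedding it in the URL path.
    body = urlencode({"smiles": smiles}).encode("ascii")
    return urllib.request.Request(f"{PUG_REST}/compound/smiles/cids/JSON", data=body)


def assay_summary_request(cid: int) -> urllib.request.Request:
    """Build a GET request for the bioassay summary table of one CID."""
    return urllib.request.Request(f"{PUG_REST}/compound/cid/{cid}/assaysummary/JSON")


def fetch_json(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def get_cids(smiles: str) -> list:
    """Return the list of CIDs PubChem reports for a SMILES string."""
    return fetch_json(cids_request(smiles))["IdentifierList"]["CID"]
```

Only the request builders work offline. `get_cids` and `fetch_json` contact PubChem and are subject to its usage limits.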


If `attribute=True` (the default), the BioAssays can be accessed as a pandas
DataFrame through the attribute `cpm.BioAssays`.
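A table like this can be built from PubChem's assay-summary response. The sketch assumes the PUG REST JSON layout `{"Table": {"Columns": {"Column": [...]}, "Row": [{"Cell": [...]}, ...]}}` with `AID` and `Activity Outcome` columns. It then collapses replicate results into one code per AID (1 = active, -1 = inactive, 0 = anything else). The outcome coding, the "any active wins" rule, and the function names are assumptions for illustration. CIIProMol's own `BioAssays` columns may differ.

```python
import pandas as pd

# Hypothetical outcome coding; outcomes not listed here map to 0.
OUTCOME_CODES = {"Active": 1, "Inactive": -1}


def assay_summary_to_frame(payload: dict) -> pd.DataFrame:
    """Turn an assumed PUG REST assay-summary JSON table into a DataFrame."""
    table = payload["Table"]
    columns = table["Columns"]["Column"]
    rows = [row["Cell"] for row in table.get("Row", [])]
    return pd.DataFrame(rows, columns=columns)


def outcomes_by_aid(frame: pd.DataFrame) -> pd.Series:
    """Collapse replicate results to one outcome code per AID."""
    codes = frame["Activity Outcome"].map(OUTCOME_CODES).fillna(0).astype(int)
    # max() means a single active replicate makes the whole assay active.
    return codes.groupby(frame["AID"]).max()
```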

### ciiprofiler

Contains `class CIIProfiler`. A `CIIProfiler` object can be instantiated by passing
a list of `CIIProMol` objects.

```python
ciiprofiler = CIIProfiler([cpm1, cpm2, cpm3])
```

The function `MakeBioProfiler()` concatenates the attributes `cpm1.BioAssays`,
`cpm2.BioAssays`, etc. into a single matrix (the bioprofile).
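Conceptually, this step aligns each compound's per-assay outcomes on the assay ID and stacks them into one compound-by-assay matrix. The minimal pandas sketch below is not `MakeBioProfiler()` itself. It assumes that each `BioAssays` has already been reduced to a Series of outcome codes indexed by AID, and that assays a compound was never tested in should be filled with 0. The function name `make_bioprofile` is hypothetical.

```python
import pandas as pd


def make_bioprofile(outcomes: dict) -> pd.DataFrame:
    """Stack per-compound outcome Series (indexed by AID) into a bioprofile.

    Rows are compounds (the dict keys), columns are AIDs.
    """
    # concat uses an outer join on the AID index, so every assay seen for any
    # compound becomes a column after the transpose.
    profile = pd.concat(outcomes, axis=1).T
    # Untested compound/assay pairs come out as NaN; code them as 0.
    return profile.fillna(0).astype(int).sort_index(axis=1)
```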

