
The package was renamed from Biocarta (v0.2.27) to Biocartograph because of an unintentional name clash.


Biocartograph

Creating Cartographic Representations of Biological Data

Installation

pip install biocartograph

If you have installed the Nix package manager, you can also build a Nix environment for running the code. Enter it from a terminal by issuing:

nix-shell versioned_R_and_Python.nix

Example code

We generally work with short (compact) format data frames: one describing the analytes (often abbreviated "adf"):

NAME NGT_mm12_10591 ... DM2_mm81_10199
215538_at 16.826041 ... 31.764484
...
LDLR 19.261185 ... 30.004612

and one journal describing the sample metadata (often abbreviated "jdf"):

NGT_mm12_10591 ... DM2_mm81_10199
Disease Prostate Cancer ... Gastric Cancer
Cell-line 143B ... 22Rv1
Tissue Prostate ... Gastric Urinary tract
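To make the expected layout concrete, here is a small sketch that builds toy versions of both frames with pandas and round-trips them through tab-delimited text. The analyte names GENE_B and GENE_C, the sample names S1 to S3, and all values are made up for illustration; the only structural point is that both frames share the same sample names as columns.

```python
import io

import pandas as pd

# Hypothetical toy data: 4 analytes (rows) measured in 3 samples (columns).
samples = ['S1', 'S2', 'S3']
adf = pd.DataFrame(
    [[16.8, 20.1, 31.8],
     [19.3, 19.3, 19.3],
     [5.0, 7.5, 6.2],
     [12.0, 9.9, 30.0]],
    index=pd.Index(['215538_at', 'GENE_B', 'GENE_C', 'LDLR'], name='NAME'),
    columns=samples,
)

# The journal uses the same sample names as columns and holds one row per label.
jdf = pd.DataFrame(
    [['Prostate Cancer', 'Gastric Cancer', 'Prostate Cancer'],
     ['Prostate', 'Gastric', 'Prostate']],
    index=['Disease', 'Tissue'],
    columns=samples,
)

# Round-trip both frames through tab-delimited text, as if read from disc.
adf_read = pd.read_csv(io.StringIO(adf.to_csv(sep='\t')), sep='\t', index_col=0)
jdf_read = pd.read_csv(io.StringIO(jdf.to_csv(sep='\t')), sep='\t', index_col=0)
print(adf_read.shape, list(jdf_read.index))  # (4, 3) ['Disease', 'Tissue']
```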

If these are stored as tab-delimited text files, it is straightforward to read them in from disc:

if __name__ == '__main__' :
    import numpy as np
    import pandas as pd
    from biocartograph.quantification import full_mapping
    #
    adf = pd.read_csv('analytes.tsv',sep='\t',index_col=0)
    #
    # WE DO NOT WANT TO KEEP POTENTIALLY BAD ENTRIES:
    # DROP ANALYTES (ROWS) AND SAMPLES (COLUMNS) WITH ZERO STANDARD DEVIATION,
    # FOR WHICH 1.0/std BECOMES INFINITE
    adf = adf.iloc[ np.inf != np.abs( 1.0/np.std(adf.values,1) ) ,
                    np.inf != np.abs( 1.0/np.std(adf.values,0) ) ].copy()
    #
    # READING IN SAMPLE INFORMATION
    # THIS IS NEEDED FOR THE ALIGNED PCA TO WORK
    jdf = pd.read_csv('journal.tsv',sep='\t',index_col=0)
    # KEEP ONLY THE SAMPLES THAT SURVIVED THE FILTER, IN THE SAME ORDER
    jdf = jdf.loc[:,adf.columns.values]
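As an aside, the `1.0/np.std(...)` filter above keeps exactly those rows and columns whose standard deviation is non-zero, but it emits a divide-by-zero warning along the way. The sketch below shows an equivalent, warning-free form on a toy frame; the helper name `drop_constant_entries` is hypothetical and not part of Biocartograph.

```python
import numpy as np
import pandas as pd


def drop_constant_entries(adf: pd.DataFrame) -> pd.DataFrame:
    """Keep only analytes (rows) and samples (columns) with non-zero spread.

    A zero standard deviation is what makes 1.0/std infinite, so these masks
    select the same entries as the inf-based filter without any warning.
    Both masks are computed on the unfiltered values.
    """
    row_ok = np.std(adf.values, axis=1) != 0
    col_ok = np.std(adf.values, axis=0) != 0
    return adf.iloc[row_ok, col_ok].copy()


adf = pd.DataFrame([[1.0, 2.0, 3.0],
                    [4.0, 4.0, 4.0],   # constant analyte, gets dropped
                    [0.5, 9.0, 2.0]],
                   index=['a1', 'a2', 'a3'], columns=['s1', 's2', 's3'])
print(drop_constant_entries(adf).index.tolist())  # ['a1', 'a3']
```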

Next, we specify how to conduct the calculation:

    alignment_label  = 'Disease'    # ASSUMED CHOICE: THE JOURNAL ROW USED TO ALIGN THE PCA
    consensus_labels = ['Tissue']
    results = full_mapping ( adf , jdf ,
            bVerbose                    = True ,
            alignment_label             = alignment_label ,
            umap_n_neighbors            = 20 ,
            umap_local_connectivity     = 20. ,
            bUseUmap                    = False ,
            consensus_labels            = consensus_labels ,
            distance_type               = 'coexpression' ,
            hierarchy_cmd               = 'ward' ,
            directory                   = '../results' ,
            # THE HIERARCHY IS CUT AT EACH OF THESE NUMBERS OF CLUSTERS
            n_clusters                  = sorted([ 10 , 20 , 30 , 40 , 60 , 70 , 80 , 90 , 100 ,
                                                   120 , 140 , 160 , 180 , 200 ,
                                                   250 , 300 , 350 , 400 , 450 , 500 ,
                                                   600 , 700 , 800 , 900 , 1000 ]) )
    #
    map_analytes        = results[0]
    map_samples         = results[1]
    hierarchy_analytes  = results[2]
    hierarchy_samples   = results[3]
    header_str = results[0].index.name

In this example, we did not calculate any projection properties relating to the Cell-line label. We also chose to output specific cuts through the hierarchical clustering solution, corresponding to different numbers of clusters. Multivariate projected PCA files are generated for all the consensus and alignment labels. Plotting the map_analytes PCA projections yields:

(Figure: Cancer Disease mPCA example)
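Cutting one hierarchy at several cluster counts can be illustrated with SciPy: build a single Ward linkage, then cut it once per requested number of clusters. This is a generic sketch on random toy data, not Biocartograph's internal code; it uses plain Euclidean Ward linkage rather than the `distance_type='coexpression'` option from the call above.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
# Hypothetical toy data: 30 analytes, each described by 5 sample values.
X = rng.normal(size=(30, 5))

# One Ward hierarchy, computed once.
Z = linkage(X, method='ward')

# Cut the same tree at each requested cluster count; 'maxclust' forms
# at most n flat clusters. Sorting mirrors the sorted(...) in the call above.
cuts = {n: fcluster(Z, t=n, criterion='maxclust') for n in sorted([10, 5, 20])}

for n, labels in cuts.items():
    print(n, len(np.unique(labels)))
```

Reusing one linkage matrix keeps the cuts nested: each coarser solution is a merge of clusters from the finer ones.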

You can also run an alternative algorithm, in which the UMAP coordinates are used directly for clustering, by setting bUseUmap=True. To view those results, download the gist zip and open the HTML index:

chromium index.html

Other generated solutions

The clustering visualisations were created using Biocartograph and hvplot:

Which groupings correspond to the biomarker variance that describes them? Here are some visualisations of that:

Cell-line, Diseases, Tissues, Single cells, Brain tissues, Blood immune cells

We can also build more elaborate visualisation applications with the information that Biocartograph calculates.

Enrichment results

If we have GMT files describing which groups our analytes may belong to, then we can calculate enrichment properties for the gene groupings (clusters). One resource for such group definitions is the Reactome database. If the pathway definitions are hierarchical, you can also supply the parent-child list and calculate treemap enrichments for all your clusters.

(Figure: example of a Biocartograph treemap cluster)
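Cluster enrichment against a gene-set file is commonly scored with a one-sided hypergeometric (over-representation) test. The sketch below shows that generic technique with only the standard library; it is not necessarily what Biocartograph computes internally, and the set name R-HSA-TOY and the gene names are made up. Each GMT line holds a set name, a description, and then the member identifiers, all tab-separated.

```python
from math import comb


def parse_gmt_line(line: str) -> tuple[str, set[str]]:
    """Split 'name<TAB>description<TAB>member1<TAB>member2...' into (name, members)."""
    fields = line.rstrip('\n').split('\t')
    return fields[0], set(fields[2:])


def hypergeometric_pvalue(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) when n of N analytes are drawn and K of the N belong to the group."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


# Hypothetical toy example: a 10-gene universe and a 4-gene pathway.
universe = {f'G{i}' for i in range(10)}
name, members = parse_gmt_line('R-HSA-TOY\ttoy pathway\tG0\tG1\tG2\tG3\n')
cluster = {'G0', 'G1', 'G2'}

k = len(cluster & members)  # overlap between cluster and pathway
p = hypergeometric_pvalue(k, N=len(universe), K=len(members & universe), n=len(cluster))
print(name, k, p)  # p equals 4/120
```

A cluster would then be reported as enriched for the pathway when p falls below the chosen significance level (0.1 in the call below).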

The Biocartograph code for calculating these cluster enrichments might look something like this:

    import pandas as pd
    from biocartograph.special import generate_atlas_files
    import biocartograph.enrichment as bEnriched

    # header_str AS DEFINED IN THE MAPPING EXAMPLE ABOVE
    df_ = pd.read_csv( header_str + 'resdf_f.tsv' , index_col=0 , sep='\t' )
    df_.loc[:,'cids.max'] = [ str(v) for v in df_.loc[:,'cids.max'].values ] # OPTIMAL SOLUTION, AS STRING LABELS
    enr_dict = bEnriched.calculate_for_cluster_groups ( df_ , label = 'cids.max' ,
                    gmtfile = '../data/Reactome/reactome_v71.gmt' , pcfile = '../data/Reactome/NewestReactomeNodeRelations.txt' ,
                    group_identifier = 'R-HSA' , significance_level = 0.1 )
    # WRITE ONE TREEMAP ENRICHMENT TABLE PER CLUSTER
    for cluster_id , enrichment_df in enr_dict.items() :
        enrichment_df.to_csv( header_str + 'treemap_c' + str(cluster_id) + '.tsv' , sep='\t' )
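The pcfile supplies the pathway hierarchy that the treemap enrichments rely on. Assuming it lists one tab-separated parent and child identifier pair per line (the exact column layout of NewestReactomeNodeRelations.txt is an assumption here), a minimal sketch for turning it into a tree and collecting every ancestor of a pathway could look like this. The identifiers are hypothetical.

```python
from collections import defaultdict


def read_parent_child(lines):
    """Build a child -> set-of-parents map from 'parent<TAB>child' lines (assumed format)."""
    parents = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parent, child = line.split('\t')[:2]
        parents[child].add(parent)
    return parents


def ancestors(node, parents):
    """Return all nodes reachable by repeatedly following child -> parent links."""
    seen, stack = set(), [node]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


# Hypothetical relations: A is the parent of B and D, and B is the parent of C.
relations = ['R-HSA-A\tR-HSA-B\n', 'R-HSA-B\tR-HSA-C\n', 'R-HSA-A\tR-HSA-D\n']
tree = read_parent_child(relations)
print(sorted(ancestors('R-HSA-C', tree)))  # ['R-HSA-A', 'R-HSA-B']
```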

You can also produce your own GMT file and pcfile from the clustering solution labels:

    from biocartograph.special import generate_atlas_files , reformat_results_and_print_gmtfile_pcfile
    cl_gmtname , cl_pcname = reformat_results_and_print_gmtfile_pcfile ( header_str , hierarchy_id = 'cids.max', hierarchy_level_label = 'HCLN' )
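To illustrate what the GMT part of such output contains, here is a hypothetical helper, `clusters_to_gmt`, that is not part of Biocartograph: it writes one GMT line per cluster label, with the analytes of that cluster as members. It does not produce a pcfile, and the `cluster_` prefix and the `n=...` description field are arbitrary choices.

```python
import pandas as pd


def clusters_to_gmt(labels: pd.Series, prefix: str = 'cluster') -> str:
    """Turn an analyte -> cluster-label Series into GMT text, one line per cluster."""
    lines = []
    # Iterating a groupby yields (label, sub-Series) pairs, sorted by label.
    for cid, group in labels.groupby(labels):
        members = [str(m) for m in group.index]
        lines.append('\t'.join([f'{prefix}_{cid}', f'n={len(members)}', *members]))
    return '\n'.join(lines) + '\n'


# Hypothetical cluster assignments, stored as strings like 'cids.max' above.
labels = pd.Series(['1', '2', '1', '3'], index=['LDLR', 'GENE_B', 'GENE_C', 'GENE_D'])
print(clusters_to_gmt(labels), end='')
```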

For group factor enrichments, simply use the bEnriched.from_multivariate_group_factors method instead. This produces results that can be visualised as in these examples:

(Figures: Biocartograph GFA Reactome enrichment; cluster label GFA enrichments)

Creating a nested file structure

The biocartograph package contains a function that packages your generated results into a more easily parsed directory structure. It can be called via:

generate_atlas_files ( header_str )

This will produce cluster annotation information taken from the enrichment files as well as the sample labels used.

Cell-line, Diseases, Tissues, Single cells, Brain tissues, Blood immune cells

Upcoming

Hopefully, an even more helpful wiki will be provided in the future.


Download files


Source Distribution

biocartograph-0.10.1.tar.gz (33.6 kB view details)

Uploaded Source

Built Distribution

biocartograph-0.10.1-py3-none-any.whl (37.5 kB view details)

Uploaded Python 3

File details

Details for the file biocartograph-0.10.1.tar.gz.

File metadata

  • Download URL: biocartograph-0.10.1.tar.gz
  • Size: 33.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.16

File hashes

Hashes for biocartograph-0.10.1.tar.gz
Algorithm Hash digest
SHA256 638621dc8a8a79525421ef1ffe7f68c87fe342d9c28dcd2947cd40c1a1a6d7e8
MD5 88d73fe13cccc7f616c13805c2b10173
BLAKE2b-256 1aa4b5040467a7ce208179155c0c6b5936e75510cb20185a564133180b9cbbce


File details

Details for the file biocartograph-0.10.1-py3-none-any.whl.

File hashes

Hashes for biocartograph-0.10.1-py3-none-any.whl
Algorithm Hash digest
SHA256 a59b539dcefa2addf6a10685f97da68b597b5066ca79fd6e2faa1bc4d757582c
MD5 b41552b08451e439fffa5fcb40746af4
BLAKE2b-256 650c68c0528275a5b5737c7a12159eafcf04492e62855d46e6c28f8e3355020a

