Download-Annotate-TCGA: Facilitates the download of data and annotation with metadata from TCGA

Sci-dat: Download Annotate TCGA

A package developed to enable the download and annotation of TCGA data from https://portal.gdc.cancer.gov/

Docs

https://arianemora.github.io/scidat/

Install

pip install scidat

Use

API

The API combines the functions in Download and Annotate. It gives less control over individual settings, such as specific directories, but makes the common steps easier to run.

See the example notebook for how to obtain the following from the TCGA site:

    1. manifest_file
    2. gdc_client
    3. clinical_file
    4. sample_file

api = API(manifest_file, gdc_client, clinical_file, sample_file, requires_lst=None, clin_cols=None,
          max_cnt=100, sciutil=None, split_manifest_dir='.', download_dir='.', meta_dir='.', sep='_')

Step 1. Download manifest data

# Downloads every file listed in the manifest file, using the default parameters
api.download_data_from_manifest()
# This also unzips the files and copies them all into one directory

Step 2. Annotation

# Builds the annotation information
api.build_annotation()

Step 3. Download mutation data

# Downloads all the mutation data for all the cases in the clinical_file
api.download_mutation_data()

Step 4. Generate RNAseq dataframe

# Generates the RNA dataframe from the downloaded folder
api.build_rna_df()

Step 5. Get cases that have any mutations or specific mutations

# Returns a list of cases that have mutations (either in any gene if gene_list = None or in specific genes)
list_of_cases = api.get_cases_with_mutations(gene_list=None, id_type='symbol')

# Get genes with a small deletion
filter_col = 'ssm.consequence.0.transcript.gene.symbol'
genes = api.get_mutation_values_on_filter(filter_col, ['Small deletion'], 'ssm.mutation_subtype')

# Get cases with a specific genomic change (ssm.genomic_dna_change)
filter_col = 'case_id'
cases = api.get_mutation_values_on_filter(filter_col, ['chr13:g.45340134A>G'], 'ssm.genomic_dna_change')
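
To make the argument order in the calls above concrete, here is a minimal plain-Python sketch of the filtering they appear to perform: keep the rows whose third-argument column holds one of the listed values, then return the distinct values of filter_col. The helper values_on_filter and the rows are hypothetical and made up for illustration; this is not scidat's implementation and the rows are not real TCGA records.

```python
def values_on_filter(rows, filter_col, keep_values, match_col):
    """Return distinct `filter_col` values of rows whose `match_col` is in `keep_values`.

    Hypothetical helper for illustration only; not part of scidat.
    """
    keep = set(keep_values)
    found = []
    for row in rows:
        value = row.get(filter_col)
        if value is None or row.get(match_col) not in keep:
            continue
        if value not in found:  # keep first-seen order, drop duplicates
            found.append(value)
    return found


GENE_COL = 'ssm.consequence.0.transcript.gene.symbol'
SUBTYPE_COL = 'ssm.mutation_subtype'

# Made-up rows (placeholder case IDs and gene names)
rows = [
    {'case_id': 'case-a', GENE_COL: 'GENE1', SUBTYPE_COL: 'Small deletion'},
    {'case_id': 'case-b', GENE_COL: 'GENE2', SUBTYPE_COL: 'Single base substitution'},
    {'case_id': 'case-c', GENE_COL: 'GENE1', SUBTYPE_COL: 'Small deletion'},
]

# Genes that carry a small deletion
genes = values_on_filter(rows, GENE_COL, ['Small deletion'], SUBTYPE_COL)
# Cases that carry a small deletion
cases = values_on_filter(rows, 'case_id', ['Small deletion'], SUBTYPE_COL)
```

Changing only filter_col switches the question from "which genes" to "which cases" while the filter itself stays the same.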

Step 6. Get cases with specific metadata information

Metadata list:

submitter_id
project_id
age_at_index
gender
race
vital_status
tumor_stage
normal_samples
tumor_samples
case_files
tumor_stage_num

Example meta_dict: {'gender': ['female'], 'tumor_stage_num': [1, 2]}

The method can be "any", where a case has to satisfy at least one of the conditions in the meta_dict, or "all", where a case has to satisfy every condition.

# Returns cases that have the chosen metadata information e.g. gender, race, tumor_stage_num
cases_list = api.get_cases_with_meta({'gender': ['female'], 'tumor_stage_num': [1, 2]}, method="all")
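
The difference between the two methods can be sketched in plain Python. The function cases_with_meta and the case records below are hypothetical stand-ins made up for illustration; they only mirror the "any" versus "all" semantics described above and do not reflect how scidat stores its metadata.

```python
def cases_with_meta(cases, meta, method="all"):
    """Return the IDs of cases that match `meta` under the chosen method.

    Hypothetical helper for illustration only; not part of scidat.
    """
    if method not in ("any", "all"):
        raise ValueError("method must be 'any' or 'all'")
    combine = all if method == "all" else any
    return [
        case_id
        for case_id, info in cases.items()
        if combine(info.get(key) in allowed for key, allowed in meta.items())
    ]


# Made-up case records (placeholder IDs)
cases = {
    'case-a': {'gender': 'female', 'tumor_stage_num': 1},
    'case-b': {'gender': 'male', 'tumor_stage_num': 2},
    'case-c': {'gender': 'female', 'tumor_stage_num': 4},
    'case-d': {'gender': 'male', 'tumor_stage_num': 3},
}
meta = {'gender': ['female'], 'tumor_stage_num': [1, 2]}

all_matches = cases_with_meta(cases, meta, method="all")  # female AND stage 1 or 2
any_matches = cases_with_meta(cases, meta, method="any")  # female OR stage 1 or 2
```

With this toy data, "all" keeps only case-a, while "any" also keeps case-b (stage 2) and case-c (female).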

Step 7. Get genes with mutations

# Returns a list of genes with mutations for specific cases
list_of_genes = api.get_genes_with_mutations(case_ids=None, id_type='symbol')

Step 8. Get values from the dataframe

# Returns the values, columns, dataframe of a subset of the RNAseq dataframe
values, columns, dataframe = get_values_from_df(df: pd.DataFrame, gene_id_column: str, case_ids=None, gene_ids=None,
                           column_name_includes=None, column_name_method="all")
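
The column_name_includes and column_name_method parameters suggest that columns are selected by what their names contain. The sketch below shows one way such a selection could work, assuming plain substring matching; the helper select_columns and the column names are hypothetical, and the real column naming scheme and matching rule may differ.

```python
def select_columns(columns, column_name_includes=None, column_name_method="all"):
    """Pick the column names that contain the requested labels.

    Hypothetical helper for illustration only; assumes substring matching.
    """
    if not column_name_includes:
        return list(columns)  # no filter requested: keep every column
    combine = all if column_name_method == "all" else any
    return [
        name for name in columns
        if combine(label in name for label in column_name_includes)
    ]


# Made-up column names joined with the '_' separator
columns = [
    'gene_id',
    'PROJ_Tumor_female_case-a',
    'PROJ_Normal_female_case-a',
    'PROJ_Tumor_male_case-b',
]

tumor_female = select_columns(columns, ['Tumor', 'female'], "all")
normal_or_case_b = select_columns(columns, ['Normal', 'case-b'], "any")
```

One caveat of plain substring matching is that 'male' also matches 'female', so a label-aware comparison (for example, splitting each name on the separator first) would be safer if labels can overlap like this.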

Download

# Downloads data using a manifest file
download = Download(manifest_file, split_manifest_dir, download_dir, gdc_client, max_cnt=100)
download.download()
# Downloads data from API to complement data from manifest file
# example datatype = mutation (this is the only one implemented for now)
download.download_data_using_api(case_ids: list, data_type: str)
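
The split_manifest_dir and max_cnt arguments suggest that a large manifest is broken into smaller manifests before download. The sketch below illustrates that idea under the assumption that max_cnt is the maximum number of entries per sub-manifest and that the first manifest line is a header to repeat in each part; the function split_manifest_lines is hypothetical, and scidat's actual behavior may differ.

```python
def split_manifest_lines(lines, max_cnt=100):
    """Split manifest lines into chunks of at most `max_cnt` entries.

    The first line is treated as the header and repeated in every chunk.
    Hypothetical helper for illustration only; not part of scidat.
    """
    if max_cnt < 1:
        raise ValueError("max_cnt must be at least 1")
    header, entries = lines[0], lines[1:]
    return [
        [header] + entries[start:start + max_cnt]
        for start in range(0, len(entries), max_cnt)
    ]


# Made-up manifest content: a tab-separated header plus five placeholder entries
manifest = ["id\tfilename\tmd5\tsize\tstate"] + [
    f"file-{i}\tfile-{i}.txt\tmd5-{i}\t0\tplaceholder" for i in range(5)
]

chunks = split_manifest_lines(manifest, max_cnt=2)
```

Each chunk could then be written to its own file in split_manifest_dir and passed to the download client separately.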

Annotate

Generate annotation using clinical information from TCGA

annotator = Annotate(output_dir: str, clinical_file: str, sample_file: str, manifest_file: str, file_types: list,
                 sep='_', clin_cols=None)
# Generate the annotate dataframe
annotator.build_annotation()

# Save the dataframe to a csv file
annotator.save_annotation(output_directory: str, filename: str)

# Save the clinical information to a csv file
annotator.save_annotated_clinical_df(output_directory: str, filename: str)

Download mutation data for the cases of interest

Note: the mutation data must first be downloaded with download_data_using_api, described in the Download section.

annotator.build_mutation_df(mutation_dir)

# Get that dataframe
mutation_df = annotator.get_mutation_df()

# Save the mutation dataframe to a csv
annotator.save_mutation_df(output_directory: str, filename: str)
