cthreepo

A python script to interconvert seq-ids in gff3, gtf, bed and other files.


Quick start for the impatient

  1. Install using conda:
conda install -c bioconda cthreepo
  2. Execute as follows:
## convert seq-ids in <input.gff3> from refseq format (NC_000001.11)
## to UCSC format (chr1) using the Human GRCh38 mapping dictionary
cthreepo -i <input.gff3> -if rs -it uc -f gff3 -m h38 -o <output.gff3>

Introduction

NCBI RefSeq, UCSC and Ensembl use different identifiers for chromosomes in annotation and other files such as GFF3 and GTF. Combining files downloaded from different sources in a single pipeline can therefore cause seq-id mismatch errors. This script converts seq-ids from one style to another so that the files become compatible with each other.
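To make the idea concrete, here is a minimal Python sketch of what converting a seq-id means for a single GFF3 record. It is an illustration only, not cthreepo's actual code: the `REFSEQ_TO_UCSC` dictionary and the `convert_line` helper are hypothetical names, and the mapping is hand-written rather than read from a mapfile.

```python
# Hypothetical, hand-written mapping for illustration only.
REFSEQ_TO_UCSC = {
    "NC_000001.11": "chr1",
    "NC_000002.12": "chr2",
}


def convert_line(line, mapping):
    """Rewrite the seq-id (first tab-separated column) of an annotation line."""
    if line.startswith("#") or not line.strip():
        # This sketch leaves comments, directives and blank lines untouched.
        return line
    seqid, sep, rest = line.partition("\t")
    # Unknown seq-ids are passed through unchanged.
    return mapping.get(seqid, seqid) + sep + rest


# Illustrative GFF3 record using a RefSeq-style seq-id.
gff3_line = "NC_000001.11\tBestRefSeq\tgene\t11874\t14409\t.\t+\t.\tID=gene-DDX11L1\n"
converted = convert_line(gff3_line, REFSEQ_TO_UCSC)
print(converted, end="")
```

Only the first column changes; coordinates and attributes are left as they are.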

Installation and Usage

Python 3 is required for this script to work. With that requirement satisfied, you can install cthreepo using any one of the methods below:

Install using conda

conda install -c bioconda cthreepo 

Install using pip

pip install cthreepo

Install from this repository

First, download/clone the repository. Then run:

python3 setup.py install

Usage

## help
cthreepo --help 

## usage
## convert seq-ids in <input.gff3> from refseq format (NC_000001.11)
## to UCSC format (chr1) using the Human GRCh38 mapping dictionary
cthreepo \
    --infile <input.gff3> \
    --id_from rs \
    --id_to uc \
    --format gff3 \
    --mapfile h38 \
    --outfile <output.gff3>
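
Before choosing values for --id_from and --id_to, it can help to check which seq-id style a file actually uses. The sketch below is a small, stand-alone Python helper for that purpose; it is not part of cthreepo, and `count_seqids` is a hypothetical name. It assumes a tab-separated format whose first column is the seq-id (as in GFF3, GTF and BED).

```python
import io
from collections import Counter


def count_seqids(handle):
    """Count records per seq-id (first column) in a tab-separated file."""
    counts = Counter()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        counts[line.rstrip("\n").split("\t", 1)[0]] += 1
    return counts


# A tiny in-memory GFF3 sample; with a real file, pass open("input.gff3").
sample = io.StringIO(
    "##gff-version 3\n"
    "NC_000001.11\tRefSeq\tgene\t100\t200\t.\t+\t.\tID=g1\n"
    "NC_000001.11\tRefSeq\tmRNA\t100\t200\t.\t+\t.\tID=t1;Parent=g1\n"
    "NC_000002.12\tRefSeq\tgene\t300\t400\t.\t-\t.\tID=g2\n"
)
seqid_counts = count_seqids(sample)
for seqid, n in seqid_counts.most_common():
    print(seqid, n)
```

Running the same helper on the converted output is a quick way to confirm that no seq-ids were left in the old style.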

File formats supported

  1. GFF3 (default)
  2. GTF
  3. BedGraph
  4. BED
  5. SAM
  6. VCF
  7. WIG
  8. TSV
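
These formats do not all keep the seq-id in the same place. In GFF3, GTF, BED, BedGraph and VCF records it is the first column; in SAM alignment records it is the third column (RNAME); in WIG it appears as chrom=<id> on fixedStep/variableStep declaration lines. The Python sketch below illustrates that difference. It is an assumption-laden illustration, not cthreepo's implementation: `SEQID_COLUMN`, `convert_column` and `convert_wig_declaration` are hypothetical names, the dictionary keys are labels chosen for this example rather than documented --format values, and header lines (such as SAM @SQ or VCF ##contig) are not handled.

```python
import re

# Zero-based column holding the seq-id in each record type (labels are
# illustrative, not cthreepo option values).
SEQID_COLUMN = {"gff3": 0, "gtf": 0, "bed": 0, "bedgraph": 0, "vcf": 0, "sam": 2}

# Hypothetical mapping for illustration only.
MAPPING = {"NC_000001.11": "chr1"}


def convert_column(line, mapping, column):
    """Replace the seq-id in the given column of a tab-separated record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) > column:
        fields[column] = mapping.get(fields[column], fields[column])
    return "\t".join(fields) + "\n"


def convert_wig_declaration(line, mapping):
    """Replace chrom=<id> on a WIG fixedStep/variableStep declaration line."""
    if not line.startswith(("fixedStep", "variableStep")):
        return line  # data lines carry no seq-id
    return re.sub(
        r"chrom=(\S+)",
        lambda m: "chrom=" + mapping.get(m.group(1), m.group(1)),
        line,
    )


sam_line = "read1\t0\tNC_000001.11\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tFFFFFFFFFF\n"
sam_out = convert_column(sam_line, MAPPING, SEQID_COLUMN["sam"])

wig_line = "fixedStep chrom=NC_000001.11 start=1 step=1\n"
wig_out = convert_wig_declaration(wig_line, MAPPING)

print(sam_out, end="")
print(wig_out, end="")
```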

Mapping files

cthreepo needs a mapfile that it uses to figure out how seq-ids map from one style to the other.

  • Use the built-in shortcuts -- h38, h37, m38 and m37 for GRCh38/hg38, GRCh37/hg19, GRCm38/mm10 and MGSCv37/mm9 respectively. I try to keep these files up-to-date, but if they don't work as expected, I suggest using the latest file by following one of the two options described below.
  • Provide NCBI assembly accession using the -a parameter. A complete, legal accession.version such as GCF_000001405.39 should be provided.
  • Provide an NCBI assembly report file. For a given assembly it can be downloaded from the NCBI Assembly website. If the 'Download' button is used, this file is called 'Assembly structure report'. On the NCBI Genomes FTP site, these files have the suffix assembly_report.txt.
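
An assembly report is a tab-separated text file whose last comment line names the columns, including RefSeq-Accn and UCSC-style-name. The Python sketch below shows how such a file could be turned into a seq-id lookup table. It is a minimal illustration under those assumptions, not cthreepo's own parser; `build_mapping` is a hypothetical helper, and the sample is a trimmed excerpt in the style of a GRCh38 report.

```python
import io


def build_mapping(handle, id_from="RefSeq-Accn", id_to="UCSC-style-name"):
    """Build {id_from: id_to} from an NCBI assembly report."""
    header = None
    mapping = {}
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith("#"):
            # The last comment line before the data holds the column names.
            header = line.lstrip("# ").split("\t")
            continue
        if not line or header is None:
            continue
        row = dict(zip(header, line.split("\t")))
        src, dst = row.get(id_from, "na"), row.get(id_to, "na")
        # Assembly reports use "na" where an identifier is not available.
        if src != "na" and dst != "na":
            mapping[src] = dst
    return mapping


# Trimmed excerpt for illustration; a real report has many more rows.
report = io.StringIO(
    "# Assembly name:  GRCh38.p13\n"
    "# Sequence-Name\tSequence-Role\tAssigned-Molecule\t"
    "Assigned-Molecule-Location/Type\tGenBank-Accn\tRelationship\t"
    "RefSeq-Accn\tAssembly-Unit\tSequence-Length\tUCSC-style-name\n"
    "1\tassembled-molecule\t1\tChromosome\tCM000663.2\t=\t"
    "NC_000001.11\tPrimary Assembly\t248956422\tchr1\n"
    "MT\tassembled-molecule\tMT\tMitochondrion\tJ01415.2\t=\t"
    "NC_012920.1\tnon-nuclear\t16569\tchrM\n"
)
mapping = build_mapping(report)
print(mapping)
```

Swapping the `id_from` and `id_to` column names gives the reverse mapping from the same report.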
