UPFP, a package to parse UniProt FASTA files.

uniprot_fasta_parser

UniProt FASTA parser written in pure Python.

Development setup

Create a venv:

python -m venv venv

Activate it:

source venv/bin/activate

Install dependencies:

pip install -r requirements.txt

Install the package in editable mode:

pip install -e .

Install Jupyter and register a kernel to use as a playground:

pip install jupyter
ipython kernel install --user --name=uniprot_fasta_parser

Tutorial on converting FASTA sequences into CSV format

Download the latest Swiss-Prot FASTA release from UniProt:

wget ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz
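
Before converting, it can be useful to check that the download is intact. The sketch below is not part of upfp; it uses only the Python standard library, and the function names `count_records` and `first_headers` are made up for illustration. It reads the gzipped file directly, so nothing has to be decompressed to disk.

```python
import gzip
from itertools import islice


def count_records(fasta_gz_path):
    """Count FASTA records by counting header lines (those starting with '>')."""
    with gzip.open(fasta_gz_path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))


def first_headers(fasta_gz_path, limit=3):
    """Return up to `limit` header lines, without the leading '>'."""
    with gzip.open(fasta_gz_path, "rt") as handle:
        headers = (line[1:].rstrip() for line in handle if line.startswith(">"))
        return list(islice(headers, limit))


# Example calls, assuming the file from the wget command is in the
# current directory:
#   count_records("uniprot_sprot.fasta.gz")
#   first_headers("uniprot_sprot.fasta.gz")
```

A truncated download usually makes `gzip` raise an error while reading, so a clean run of `count_records` is a reasonable sanity check.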

Use the upfp-fasta-to-csv script, which is installed together with upfp. Its help message lists the available options:

upfp-fasta-to-csv -h
usage: upfp-fasta-to-csv [-h] [-g] [-c CHUNK_SIZE] fasta_filepath csv_filepath

positional arguments:
  fasta_filepath        path to the FASTA file.
  csv_filepath          path where to store the CSV file.

optional arguments:
  -h, --help            show this help message and exit
  -g, --gzipped         flag to indicate whether the FASTA is gzipped.
                        Defaults to False.
  -c CHUNK_SIZE, --chunk_size CHUNK_SIZE
                        size of the chunks used when writing the CSV file.
                        Defaults to 10000.

Pass the downloaded gzipped FASTA file as input, together with the -g flag, to convert it to CSV:

upfp-fasta-to-csv uniprot_sprot.fasta.gz /path/to/file.csv -g
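
To make the conversion concrete, here is a minimal standard-library sketch of what a FASTA-to-CSV step can look like. It is not upfp's implementation: the function names and the CSV column names (`database`, `accession`, `entry_name`, `description`, `sequence`) are assumptions chosen for illustration, and the columns upfp actually writes may differ. The header split follows the UniProt convention `>db|accession|entry_name description`, and rows are written in chunks to mirror the `--chunk_size` option.

```python
import csv
import gzip
from itertools import islice


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open text handle."""
    header, parts = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def split_header(header):
    """Split 'db|accession|entry_name description' into a 4-tuple."""
    identifier, _, description = header.partition(" ")
    fields = identifier.split("|")
    if len(fields) != 3:
        # Not a UniProt-style identifier: keep it whole as the accession.
        return "", identifier, "", description
    database, accession, entry_name = fields
    return database, accession, entry_name, description


def fasta_to_csv(fasta_path, csv_path, gzipped=False, chunk_size=10000):
    """Write one CSV row per FASTA record, chunk_size rows at a time."""
    opener = gzip.open if gzipped else open
    with opener(fasta_path, "rt") as fasta, open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        # Hypothetical column names, not necessarily those used by upfp.
        writer.writerow(["database", "accession", "entry_name", "description", "sequence"])
        rows = (split_header(h) + (s,) for h, s in read_fasta(fasta))
        while True:
            chunk = list(islice(rows, chunk_size))
            if not chunk:
                break
            writer.writerows(chunk)
```

Writing in chunks keeps memory use bounded by `chunk_size` records rather than by the size of the whole file, which matters for a release as large as Swiss-Prot.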

Convert CSV back to FASTA

To recreate a FASTA file from a CSV produced by upfp, use the script upfp-csv-to-fasta:

upfp-csv-to-fasta -h
usage: upfp-csv-to-fasta [-h] [-g] [-c CHUNK_SIZE] csv_filepath fasta_filepath

positional arguments:
  csv_filepath          path to the CSV file or SMI file.
  fasta_filepath        path where to store the FASTA file

optional arguments:
  -h, --help            show this help message and exit
  -g, --gzipped         flag to indicate whether the FASTA should be gzipped.
                        Defaults to False.
  -c CHUNK_SIZE, --chunk_size CHUNK_SIZE
                        size of the chunks used when writing the FASTA file.
                        Defaults to 10000.
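
The reverse direction can be sketched in the same spirit. The code below is an illustration, not upfp's implementation: it assumes a CSV with a header row and the hypothetical columns `database`, `accession`, `entry_name`, `description` and `sequence`, and the function names are made up. Sequences are wrapped at 60 characters per line, the width used in UniProt FASTA files.

```python
import csv
import gzip


def format_record(row, width=60):
    """Render one CSV row (a dict) as a FASTA record ending in a newline."""
    header = ">{}|{}|{} {}".format(
        row["database"], row["accession"], row["entry_name"], row["description"]
    ).rstrip()
    sequence = row["sequence"]
    # Wrap the sequence into fixed-width lines.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + lines) + "\n"


def csv_to_fasta(csv_path, fasta_path, gzipped=False):
    """Stream rows from the CSV and write them as FASTA records."""
    opener = gzip.open if gzipped else open
    with open(csv_path, newline="") as source, opener(fasta_path, "wt") as out:
        for row in csv.DictReader(source):
            out.write(format_record(row))
```

Because rows are streamed one at a time, this sketch does not need an explicit chunk size; a chunked writer like the one behind `--chunk_size` trades a little memory for fewer write calls.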

Source Distribution

upfp-0.0.5.tar.gz (6.8 kB, source)

Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/42.0.2 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.6

File hashes

Algorithm    Hash digest
SHA256       af91c282429e558128e7d65bb18572105820a4fbc57fd4b964a954a2e3315f84
MD5          30120282bf82111cbd59a86481426cb5
BLAKE2b-256  2c4bc49b417141afb0ea99b289738762558ae3842496435cd44325d66e155d3e
