Installation

pip install ncbi_asm_summary

or

git clone https://github.com/evoquant/ncbi_asm_summary.git
cd ncbi_asm_summary
pip install .

Usage

Stream from remote NCBI server, in terminal

Stream the GenBank assembly summary file from NCBI. You can limit both the columns and the number of rows.

gbsummary \
        --db genbank \
        --nrows 2 \
        --columns assembly_accession bioproject biosample
2025-06-25 09:53:05,522 - INFO - First 2 rows, assembly and FTP columns, from genbank...
2025-06-25 09:53:05,522 - INFO - Streaming download from https://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/assembly_summary_genbank.txt
GCA_000001215.4 PRJNA13812      SAMN02803731
GCA_000001405.29        PRJNA31257      na

Stream the RefSeq assembly summary file from NCBI. You can limit both the columns and the number of rows.

gbsummary \
        --db refseq \
        --nrows 2 \
        --columns assembly_accession bioproject biosample
2025-06-25 09:54:47,206 - INFO - First 2 rows, assembly and FTP columns, from refseq...
2025-06-25 09:54:47,206 - INFO - Streaming download from https://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/assembly_summary_refseq.txt
GCF_000001215.4 PRJNA164        SAMN02803731
GCF_000001405.40        PRJNA168        na

These commands can be used in a pipeline, for example to save selected columns to a file. Omit the --nrows option to download the full file. Add the --header option to include the header row (column names) in the output.

gbsummary \
        --db genbank \
        --columns assembly_accession bioproject biosample \
        > genbank_summary.txt
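
Once the columns are saved, the file can be read back with the standard library. The following is a minimal sketch, not part of ncbi_asm_summary: it assumes the output is tab-separated, was written without --header, and that the column order matches the --columns list used above. The helper name read_rows is hypothetical.

```python
import csv
from typing import Dict, Iterable, Iterator, List

# Assumption: same names and order as the --columns list passed to gbsummary.
COLUMNS: List[str] = ["assembly_accession", "bioproject", "biosample"]


def read_rows(lines: Iterable[str], columns: List[str] = COLUMNS) -> Iterator[Dict[str, str]]:
    """Yield one dict per row, assuming tab-separated values and no header row."""
    # QUOTE_NONE keeps any quote characters in the data as literal text.
    for fields in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not fields:  # skip blank lines
            continue
        yield dict(zip(columns, fields))


# The two rows shown in the GenBank example output above.
sample = [
    "GCA_000001215.4\tPRJNA13812\tSAMN02803731\n",
    "GCA_000001405.29\tPRJNA31257\tna\n",
]
rows = list(read_rows(sample))
print(rows[0]["bioproject"])
```

To read the saved file instead of the sample rows, pass an open file handle, for example `with open("genbank_summary.txt", newline="") as fh: rows = list(read_rows(fh))`.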

Use as a Python library

Stream from remote NCBI server

from ncbi_asm_summary.reader import AssemblySummaryStream

f = AssemblySummaryStream(db="refseq")

# Only print the first result for the example
for i in f.stream():
    print(i)
    break 
AssemblySummary(assembly_accession='GCF_000001215.4', bioproject='PRJNA164', biosample='SAMN02803731', wgs_master='na', refseq_category='reference genome', taxid='7227', species_taxid='7227', organism_name='Drosophila melanogaster', infraspecific_name='na', isolate='na', version_status='latest', assembly_level='Chromosome', release_type='Major', genome_rep='Full', seq_rel_date='2014-08-01', asm_name='Release 6 plus ISO1 MT', asm_submitter='The FlyBase Consortium/Berkeley Drosophila Genome Project/Celera Genomics', gbrs_paired_asm='GCA_000001215.4', paired_asm_comp='identical', ftp_path='https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/215/GCF_000001215.4_Release_6_plus_ISO1_MT', excluded_from_refseq='na', relation_to_type_material='na', asm_not_live_date='na', assembly_type='haploid', group='invertebrate', genome_size='143706478', genome_size_ungapped='142553500', gc_percent='42.000000', replicon_count='7', scaffold_count='1869', contig_count='1869', annotation_provider='FlyBase', annotation_name='FlyBase Release 6.54', annotation_date='2023-12-26', total_gene_count='17872', protein_coding_gene_count='13962', non_coding_gene_count='3543', pubmed_id='10731132;12537568;12537572;12537573;12537574;16110336;17569856;17569867;25589440;26109356;26109357')
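
Because stream() is iterated one record at a time, filtering can be done lazily without holding the whole table in memory. The sketch below assumes each record exposes its fields as attributes (as the AssemblySummary(...) output above suggests); the helper by_assembly_level and the stand-in records are hypothetical and only there so the example runs offline.

```python
from itertools import islice
from types import SimpleNamespace
from typing import Iterable, Iterator


def by_assembly_level(records: Iterable, level: str = "Chromosome") -> Iterator:
    """Lazily keep records whose assembly_level equals `level`."""
    return (r for r in records if r.assembly_level == level)


# Stand-in records (the second accession is a made-up placeholder).
# With the library installed, pass AssemblySummaryStream(db="refseq").stream() instead.
demo = [
    SimpleNamespace(assembly_accession="GCF_000001215.4", assembly_level="Chromosome"),
    SimpleNamespace(assembly_accession="GCF_EXAMPLE.1", assembly_level="Contig"),
]

# islice stops after the first 5 matches, so a real stream is not read to the end.
first_matches = list(islice(by_assembly_level(demo), 5))
print([r.assembly_accession for r in first_matches])
```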

Stream from local copy

from ncbi_asm_summary.reader import AssemblySummaryStream

path = "/home/chase/Downloads/assembly_summary_genbank_20250619_1057.txt.gz"

f = AssemblySummaryStream(file_path=path)

# Only print the first result for the example
for i in f.stream():
    print(i)
    break  
AssemblySummary(assembly_accession='GCA_000001215.4', bioproject='PRJNA13812', biosample='SAMN02803731', wgs_master='na', refseq_category='reference genome', taxid='7227', species_taxid='7227', organism_name='Drosophila melanogaster', infraspecific_name='na', isolate='na', version_status='latest', assembly_level='Chromosome', release_type='Major', genome_rep='Full', seq_rel_date='2014-08-01', asm_name='Release 6 plus ISO1 MT', asm_submitter='The FlyBase Consortium/Berkeley Drosophila Genome Project/Celera Genomics', gbrs_paired_asm='GCF_000001215.4', paired_asm_comp='identical', ftp_path='https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/215/GCA_000001215.4_Release_6_plus_ISO1_MT', excluded_from_refseq='na', relation_to_type_material='na', asm_not_live_date='na', assembly_type='haploid', group='invertebrate', genome_size='143706478', genome_size_ungapped='142553500', gc_percent='42.000000', replicon_count='7', scaffold_count='1869', contig_count='1869', annotation_provider='FlyBase', annotation_name='FlyBase Release 6.54', annotation_date='2023-12-13', total_gene_count='17872', protein_coding_gene_count='13962', non_coding_gene_count='3543', pubmed_id='10731132;12537568;12537572;12537573;12537574;16110336;17569856;17569867;25589440;26109356;26109357')
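
A local copy is convenient for summary statistics, since the file can be scanned repeatedly without network traffic. The following sketch tallies records by one field using collections.Counter. It assumes attribute access on the records, as in the output above; the helper tally_by and the stand-in records (including the group values other than 'invertebrate') are illustrative only.

```python
from collections import Counter
from types import SimpleNamespace
from typing import Iterable


def tally_by(records: Iterable, field: str = "group") -> Counter:
    """Count how many records share each value of `field`."""
    return Counter(getattr(r, field) for r in records)


# Stand-in records so the sketch runs without the library or a local file.
# With a local copy, pass AssemblySummaryStream(file_path=path).stream() instead.
demo = [
    SimpleNamespace(group="invertebrate"),
    SimpleNamespace(group="bacteria"),
    SimpleNamespace(group="bacteria"),
]

counts = tally_by(demo)
for value, n in counts.most_common():
    print(value, n)
```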

Table Columns

assembly_accession
bioproject
biosample
wgs_master
refseq_category
taxid
species_taxid
organism_name
infraspecific_name
isolate
version_status
assembly_level
release_type
genome_rep
seq_rel_date
asm_name
asm_submitter
gbrs_paired_asm
paired_asm_comp
ftp_path
excluded_from_refseq
relation_to_type_material
asm_not_live_date
assembly_type
group
genome_size
genome_size_ungapped
gc_percent
replicon_count
scaffold_count
contig_count
annotation_provider
annotation_name
annotation_date
total_gene_count
protein_coding_gene_count
non_coding_gene_count
pubmed_id
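
In the example output above, every column value is a string, missing values appear as 'na', and pubmed_id holds several IDs separated by semicolons. The helpers below are a small sketch for converting such values to Python types. They are not part of ncbi_asm_summary, their names are hypothetical, and treating an empty string as missing is an extra assumption.

```python
from typing import List, Optional

# 'na' is the placeholder seen in the example output; "" is an added assumption.
MISSING = ("na", "")


def to_int(value: str) -> Optional[int]:
    """Convert a count column such as genome_size; None if missing."""
    return None if value in MISSING else int(value)


def to_float(value: str) -> Optional[float]:
    """Convert a numeric column such as gc_percent; None if missing."""
    return None if value in MISSING else float(value)


def split_ids(value: str) -> List[str]:
    """Split a semicolon-separated column such as pubmed_id; [] if missing."""
    return [] if value in MISSING else value.split(";")


# Values taken from the Drosophila melanogaster record shown earlier.
genome_size = to_int("143706478")
gc_percent = to_float("42.000000")
first_ids = split_ids("10731132;12537568;12537572")
print(genome_size, gc_percent, first_ids)
```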
