

Project description

Installation

pip install ncbi_asm_summary

or

git clone https://github.com/evoquant/ncbi_asm_summary.git
cd ncbi_asm_summary
pip install .

Usage

Stream from remote NCBI server, in terminal

Stream the GenBank assembly summary file from NCBI, optionally limiting the columns and the number of rows.

gbsummary \
        --db genbank \
        --nrows 2 \
        --columns assembly_accession bioproject biosample
2025-06-25 09:53:05,522 - INFO - First 2 rows, assembly and FTP columns, from genbank...
2025-06-25 09:53:05,522 - INFO - Streaming download from https://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/assembly_summary_genbank.txt
GCA_000001215.4 PRJNA13812      SAMN02803731
GCA_000001405.29        PRJNA31257      na
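Streaming with column selection comes down to reading the file line by line and keeping only the requested fields. The sketch below is not the package's implementation. It assumes the summary file is tab-separated, that lines beginning with `##` are free-text comments, and that the header row begins with a single `#`. `stream_columns` is a hypothetical helper name.

```python
from typing import Iterable, Iterator, List, Optional


def stream_columns(
    lines: Iterable[bytes],
    columns: List[str],
    nrows: Optional[int] = None,
) -> Iterator[List[str]]:
    """Yield the requested columns from a tab-separated assembly summary stream."""
    if nrows is not None and nrows <= 0:
        return
    indices: Optional[List[int]] = None
    emitted = 0
    for raw in lines:
        line = raw.decode("utf-8").rstrip("\r\n")
        if line.startswith("##"):
            continue  # free-text comment line above the header
        if line.startswith("#"):
            # Header row, e.g. "#assembly_accession\tbioproject\t..." (assumed format).
            header = line.lstrip("#").strip().split("\t")
            # list.index raises ValueError for an unknown column name.
            indices = [header.index(name) for name in columns]
            continue
        if indices is None:
            raise ValueError("data row found before the header row")
        fields = line.split("\t")
        yield [fields[i] for i in indices]
        emitted += 1
        if nrows is not None and emitted >= nrows:
            return  # stop without reading the rest of the stream
```

Any iterable of byte lines works, including the response object returned by `urllib.request.urlopen("https://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/assembly_summary_genbank.txt")`. Because the generator returns as soon as `nrows` rows have been yielded, it stops reading from the response at that point.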

Stream the RefSeq assembly summary file from NCBI, optionally limiting the columns and the number of rows.

gbsummary \
        --db refseq \
        --nrows 2 \
        --columns assembly_accession bioproject biosample
2025-06-25 09:54:47,206 - INFO - First 2 rows, assembly and FTP columns, from refseq...
2025-06-25 09:54:47,206 - INFO - Streaming download from https://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/assembly_summary_refseq.txt
GCF_000001215.4 PRJNA164        SAMN02803731
GCF_000001405.40        PRJNA168        na

These commands can be used in a pipeline, for example to save selected columns to a file. Omit the --nrows option to stream the full file, and add the --header option to include the header row (column names) in the output.

gbsummary \
        --db genbank \
        --columns assembly_accession bioproject biosample \
        > genbank_summary.txt
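The file written above can be loaded back into Python with the standard `csv` module. A minimal sketch, assuming the output is tab-separated (as the sample output suggests) and has no header row because `--header` was not given. `parse_rows` and `COLUMNS` are hypothetical names, and `COLUMNS` must list the columns in the same order as `--columns`.

```python
import csv
from typing import Dict, Iterable, List, Sequence

# Must match the order passed to --columns when the file was written.
COLUMNS = ("assembly_accession", "bioproject", "biosample")


def parse_rows(lines: Iterable[str], columns: Sequence[str] = COLUMNS) -> List[Dict[str, str]]:
    """Turn header-less, tab-separated lines into one dict per assembly."""
    # QUOTE_NONE keeps any quote characters in free-text fields as literal text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    # csv.reader yields [] for blank lines; skip them.
    return [dict(zip(columns, row)) for row in reader if row]
```

Usage: `with open("genbank_summary.txt", newline="") as handle: rows = parse_rows(handle)`.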

Use as a Python library

Stream from remote NCBI server

from ncbi_asm_summary.reader import AssemblySummaryStream

f = AssemblySummaryStream(db="refseq")

# Only print the first result for the example
for i in f.stream():
    print(i)
    break 
AssemblySummary(assembly_accession='GCF_000001215.4', bioproject='PRJNA164', biosample='SAMN02803731', wgs_master='na', refseq_category='reference genome', taxid='7227', species_taxid='7227', organism_name='Drosophila melanogaster', infraspecific_name='na', isolate='na', version_status='latest', assembly_level='Chromosome', release_type='Major', genome_rep='Full', seq_rel_date='2014-08-01', asm_name='Release 6 plus ISO1 MT', asm_submitter='The FlyBase Consortium/Berkeley Drosophila Genome Project/Celera Genomics', gbrs_paired_asm='GCA_000001215.4', paired_asm_comp='identical', ftp_path='https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/215/GCF_000001215.4_Release_6_plus_ISO1_MT', excluded_from_refseq='na', relation_to_type_material='na', asm_not_live_date='na', assembly_type='haploid', group='invertebrate', genome_size='143706478', genome_size_ungapped='142553500', gc_percent='42.000000', replicon_count='7', scaffold_count='1869', contig_count='1869', annotation_provider='FlyBase', annotation_name='FlyBase Release 6.54', annotation_date='2023-12-26', total_gene_count='17872', protein_coding_gene_count='13962', non_coding_gene_count='3543', pubmed_id='10731132;12537568;12537572;12537573;12537574;16110336;17569856;17569867;25589440;26109356;26109357')
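Because `stream()` yields records one at a time, `itertools.islice` is a tidier way than `break` to take the first few, and filters can be chained onto the stream before anything is collected. A sketch, assuming each record exposes the columns as string attributes, as the `AssemblySummary(...)` repr above suggests. `reference_genomes` and `take` are hypothetical helpers, not part of the package.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List, Tuple


def reference_genomes(records: Iterable[Any]) -> Iterator[Tuple[str, str]]:
    """Yield (accession, organism) for records flagged as reference genomes."""
    for record in records:
        # Fields are plain strings, as the repr above shows.
        if record.refseq_category == "reference genome":
            yield record.assembly_accession, record.organism_name


def take(items: Iterable[Any], n: int) -> List[Any]:
    """Return at most n items without consuming the rest of the iterator."""
    return list(islice(items, n))
```

For example, `take(reference_genomes(AssemblySummaryStream(db="refseq").stream()), 5)` would return up to five `(accession, organism)` pairs and stop reading once it has them.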

Stream from local copy

from ncbi_asm_summary.reader import AssemblySummaryStream

path = "/home/chase/Downloads/assembly_summary_genbank_20250619_1057.txt.gz"

f = AssemblySummaryStream(file_path=path)

# Only print the first result for the example
for i in f.stream():
    print(i)
    break  
AssemblySummary(assembly_accession='GCA_000001215.4', bioproject='PRJNA13812', biosample='SAMN02803731', wgs_master='na', refseq_category='reference genome', taxid='7227', species_taxid='7227', organism_name='Drosophila melanogaster', infraspecific_name='na', isolate='na', version_status='latest', assembly_level='Chromosome', release_type='Major', genome_rep='Full', seq_rel_date='2014-08-01', asm_name='Release 6 plus ISO1 MT', asm_submitter='The FlyBase Consortium/Berkeley Drosophila Genome Project/Celera Genomics', gbrs_paired_asm='GCF_000001215.4', paired_asm_comp='identical', ftp_path='https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/215/GCA_000001215.4_Release_6_plus_ISO1_MT', excluded_from_refseq='na', relation_to_type_material='na', asm_not_live_date='na', assembly_type='haploid', group='invertebrate', genome_size='143706478', genome_size_ungapped='142553500', gc_percent='42.000000', replicon_count='7', scaffold_count='1869', contig_count='1869', annotation_provider='FlyBase', annotation_name='FlyBase Release 6.54', annotation_date='2023-12-13', total_gene_count='17872', protein_coding_gene_count='13962', non_coding_gene_count='3543', pubmed_id='10731132;12537568;12537572;12537573;12537574;16110336;17569856;17569867;25589440;26109356;26109357')
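The local example reads a gzip-compressed copy. If you need to read such a copy outside the package, for instance one that may or may not be compressed, one approach is to check the two-byte gzip magic number (`1f 8b`) before decoding. This is a sketch only, not necessarily how `AssemblySummaryStream` handles it. `text_lines` is a hypothetical name, and the input must be a seekable binary stream such as a file opened with `open(path, "rb")`.

```python
import gzip
import io
from typing import BinaryIO, Iterator

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def text_lines(raw: BinaryIO, encoding: str = "utf-8") -> Iterator[str]:
    """Yield decoded lines from a binary stream, gunzipping it if needed."""
    magic = raw.read(2)
    raw.seek(0)  # rewind so the decoder sees the whole stream
    source = gzip.GzipFile(fileobj=raw) if magic == GZIP_MAGIC else raw
    # Closing the wrapper also closes `source` (and `raw` when uncompressed).
    with io.TextIOWrapper(source, encoding=encoding) as text:
        for line in text:
            yield line.rstrip("\n")
```

Usage: `with open(path, "rb") as handle: for line in text_lines(handle): ...`, which works the same for the `.txt.gz` file above and an uncompressed `.txt` copy.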

Table Columns

assembly_accession
bioproject
biosample
wgs_master
refseq_category
taxid
species_taxid
organism_name
infraspecific_name
isolate
version_status
assembly_level
release_type
genome_rep
seq_rel_date
asm_name
asm_submitter
gbrs_paired_asm
paired_asm_comp
ftp_path
excluded_from_refseq
relation_to_type_material
asm_not_live_date
assembly_type
group
genome_size
genome_size_ungapped
gc_percent
replicon_count
scaffold_count
contig_count
annotation_provider
annotation_name
annotation_date
total_gene_count
protein_coding_gene_count
non_coding_gene_count
pubmed_id
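Every value in the records above is a string, including counts such as `genome_size='143706478'` and `gc_percent='42.000000'`, and missing values appear as `na`. A minimal conversion sketch, assuming `na` (or an empty field) is the only missing-value marker and that the columns in `INT_COLUMNS` always hold whole numbers when present. `convert`, `convert_row`, `INT_COLUMNS`, and `FLOAT_COLUMNS` are hypothetical names.

```python
from typing import Dict, Optional, Union

# Numeric columns, judging by the sample records (assumption).
INT_COLUMNS = frozenset({
    "taxid", "species_taxid", "genome_size", "genome_size_ungapped",
    "replicon_count", "scaffold_count", "contig_count",
    "total_gene_count", "protein_coding_gene_count", "non_coding_gene_count",
})
FLOAT_COLUMNS = frozenset({"gc_percent"})
MISSING = "na"  # placeholder for absent values in the sample records

Value = Optional[Union[str, int, float]]


def convert(column: str, raw: str) -> Value:
    """Convert one raw field to int, float, None, or leave it as a string."""
    if raw in (MISSING, ""):
        return None
    if column in INT_COLUMNS:
        return int(raw)
    if column in FLOAT_COLUMNS:
        return float(raw)
    return raw


def convert_row(row: Dict[str, str]) -> Dict[str, Value]:
    """Apply convert() to every column of a name -> raw-string mapping."""
    return {column: convert(column, raw) for column, raw in row.items()}
```

Mapping `na` to `None` in text columns as well is a design choice; remove that branch for text columns if you prefer to keep the literal string.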



Download files


Source Distribution

ncbi_asm_summary-0.2.3.tar.gz (8.7 kB)


Built Distribution


ncbi_asm_summary-0.2.3-py2.py3-none-any.whl (8.3 kB)


File details

Details for the file ncbi_asm_summary-0.2.3.tar.gz.

File metadata

  • Download URL: ncbi_asm_summary-0.2.3.tar.gz
  • Upload date:
  • Size: 8.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.11

File hashes

Hashes for ncbi_asm_summary-0.2.3.tar.gz
Algorithm Hash digest
SHA256 5b4e895f9f59b553bf83f7cb0cd584628a775bc38d24ae37d76fb475d2fdbfd3
MD5 91429cedcdba8478f4292df8e709ad48
BLAKE2b-256 32353d95aee118567fd01d267dc613071bda1765b516b0a54020b70b55cd0606
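To check a downloaded file against the SHA256 digest listed above, hash it in chunks so the file is never read into memory all at once. A sketch using the standard `hashlib` module. `sha256_hex`, `file_chunks`, and `verify` are hypothetical helper names.

```python
import hashlib
from typing import Iterable, Iterator


def sha256_hex(chunks: Iterable[bytes]) -> str:
    """Return the hex SHA256 digest of a sequence of byte chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def file_chunks(path: str, size: int = 1 << 16) -> Iterator[bytes]:
    """Read a file in fixed-size binary chunks (64 KiB by default)."""
    with open(path, "rb") as handle:
        while chunk := handle.read(size):
            yield chunk


def verify(path: str, expected: str) -> bool:
    """True if the file's SHA256 digest matches the expected hex string."""
    return sha256_hex(file_chunks(path)) == expected.strip().lower()
```

For example, `verify("ncbi_asm_summary-0.2.3.tar.gz", "5b4e895f9f59b553bf83f7cb0cd584628a775bc38d24ae37d76fb475d2fdbfd3")` should return `True` for an intact download of the source distribution.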


File details

Details for the file ncbi_asm_summary-0.2.3-py2.py3-none-any.whl.


File hashes

Hashes for ncbi_asm_summary-0.2.3-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 02ec72651ac1ec6ad5383fdb7532ecbddf7b3b4cd73f9dc5998c92ce0d1a5e06
MD5 29b202151695a268b61b80495668afc2
BLAKE2b-256 fddbf96fc530c97547f6ef3106f62503d012685eed0cf60b89e6f0cef884cb9c

