Skip to main content

No project description provided

Project description

Installation

git clone https://github.com/evoquant/ncbi_asm_summary.git
cd ncbi_asm_summary
pip install .

Usage

Stream from remote NCBI server, in terminal

Stream the GenGank assembly summary file from NCBI, can limit the columns the number of rows.

gbsummary \
        --db genbank \
        --nrows 2 \
        --columns assembly_accession bioproject biosample
2025-06-25 09:53:05,522 - INFO - First 2 rows, assembly and FTP columns, from genbank...
2025-06-25 09:53:05,522 - INFO - Streaming download from https://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/assembly_summary_genbank.txt
GCA_000001215.4 PRJNA13812      SAMN02803731
GCA_000001405.29        PRJNA31257      na

Stream the RefSeq assembly summary file from NCBI, can limit the columns the number of rows.

gbsummary \
        --db refseq \
        --nrows 2 \
        --columns assembly_accession bioproject biosample
2025-06-25 09:54:47,206 - INFO - First 2 rows, assembly and FTP columns, from refseq...
2025-06-25 09:54:47,206 - INFO - Streaming download from https://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/assembly_summary_refseq.txt
GCF_000001215.4 PRJNA164        SAMN02803731
GCF_000001405.40        PRJNA168        na

These can be used in a pipeline, for example to download certain columns and save them to a file. Leave out the --nrows option to download the full file. Include the --header option to include the header row (column names) in the output.

gbsummary \
        --db genbank \
        --columns assembly_accession bioproject biosample \
        > genbank_summary.txt

Use as a Python library

Stream from remote NCBI server

from ncbi_asm_summary.reader import AssemblySummaryStream

f = AssemblySummaryStream(db="refseq")

# Only print the first result for the example
for i in f.stream():
    print(i)
    break 
AssemblySummary(assembly_accession='GCF_000001215.4', bioproject='PRJNA164', biosample='SAMN02803731', wgs_master='na', refseq_category='reference genome', taxid='7227', species_taxid='7227', organism_name='Drosophila melanogaster', infraspecific_name='na', isolate='na', version_status='latest', assembly_level='Chromosome', release_type='Major', genome_rep='Full', seq_rel_date='2014-08-01', asm_name='Release 6 plus ISO1 MT', asm_submitter='The FlyBase Consortium/Berkeley Drosophila Genome Project/Celera Genomics', gbrs_paired_asm='GCA_000001215.4', paired_asm_comp='identical', ftp_path='https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/215/GCF_000001215.4_Release_6_plus_ISO1_MT', excluded_from_refseq='na', relation_to_type_material='na', asm_not_live_date='na', assembly_type='haploid', group='invertebrate', genome_size='143706478', genome_size_ungapped='142553500', gc_percent='42.000000', replicon_count='7', scaffold_count='1869', contig_count='1869', annotation_provider='FlyBase', annotation_name='FlyBase Release 6.54', annotation_date='2023-12-26', total_gene_count='17872', protein_coding_gene_count='13962', non_coding_gene_count='3543', pubmed_id='10731132;12537568;12537572;12537573;12537574;16110336;17569856;17569867;25589440;26109356;26109357')

Stream from local copy

from ncbi_asm_summary.reader import AssemblySummaryStream

path = "/home/chase/Downloads/assembly_summary_genbank_20250619_1057.txt.gz"

f = AssemblySummaryStream(file_path=path)

# Only print the first result for the example
for i in f.stream():
    print(i)
    break  
AssemblySummary(assembly_accession='GCA_000001215.4', bioproject='PRJNA13812', biosample='SAMN02803731', wgs_master='na', refseq_category='reference genome', taxid='7227', species_taxid='7227', organism_name='Drosophila melanogaster', infraspecific_name='na', isolate='na', version_status='latest', assembly_level='Chromosome', release_type='Major', genome_rep='Full', seq_rel_date='2014-08-01', asm_name='Release 6 plus ISO1 MT', asm_submitter='The FlyBase Consortium/Berkeley Drosophila Genome Project/Celera Genomics', gbrs_paired_asm='GCF_000001215.4', paired_asm_comp='identical', ftp_path='https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/215/GCA_000001215.4_Release_6_plus_ISO1_MT', excluded_from_refseq='na', relation_to_type_material='na', asm_not_live_date='na', assembly_type='haploid', group='invertebrate', genome_size='143706478', genome_size_ungapped='142553500', gc_percent='42.000000', replicon_count='7', scaffold_count='1869', contig_count='1869', annotation_provider='FlyBase', annotation_name='FlyBase Release 6.54', annotation_date='2023-12-13', total_gene_count='17872', protein_coding_gene_count='13962', non_coding_gene_count='3543', pubmed_id='10731132;12537568;12537572;12537573;12537574;16110336;17569856;17569867;25589440;26109356;26109357')

Table Columns

assembly_accession
bioproject
biosample
wgs_master
refseq_category
taxid
species_taxid
organism_name
infraspecific_name
isolate
version_status
assembly_level
release_type
genome_rep
seq_rel_date
asm_name
asm_submitter
gbrs_paired_asm
paired_asm_comp
ftp_path
excluded_from_refseq
relation_to_type_material
asm_not_live_date
assembly_type
group
genome_size
genome_size_ungapped
gc_percent
replicon_count
scaffold_count
contig_count
annotation_provider
annotation_name
annotation_date
total_gene_count
protein_coding_gene_count
non_coding_gene_count
pubmed_id

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ncbi_asm_summary-0.2.2.tar.gz (8.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

ncbi_asm_summary-0.2.2-py2.py3-none-any.whl (8.4 kB view details)

Uploaded Python 2Python 3

File details

Details for the file ncbi_asm_summary-0.2.2.tar.gz.

File metadata

  • Download URL: ncbi_asm_summary-0.2.2.tar.gz
  • Upload date:
  • Size: 8.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.11

File hashes

Hashes for ncbi_asm_summary-0.2.2.tar.gz
Algorithm Hash digest
SHA256 f5edebbc3c2c098bdd32e4670b3ac4fed7c08a12124989f5bb2fcc95bc90eb2b
MD5 c602a844febd170889605ad5f6c75e2c
BLAKE2b-256 49aa4bc29bded5346bc801439b5a42de4d9f097e070430e52467815304b241c2

See more details on using hashes here.

File details

Details for the file ncbi_asm_summary-0.2.2-py2.py3-none-any.whl.

File metadata

File hashes

Hashes for ncbi_asm_summary-0.2.2-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 2b6cdf68ced6d718fe32128daf2e2dc49693b159d1cfeb17227b162bf9de9a9c
MD5 5885deb769c8e8bd5eff9076943f0c8f
BLAKE2b-256 aebfa2dd061319c9c150336047f18b36ded849348729f37e7829b600f73bf2e9

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page