Installation
pip install ncbi_asm_summary
or
git clone https://github.com/evoquant/ncbi_asm_summary.git
cd ncbi_asm_summary
pip install .
Usage
Stream from remote NCBI server, in terminal
Stream the GenBank assembly summary file from NCBI, optionally limiting the columns and the number of rows.
gbsummary \
--db genbank \
--nrows 2 \
--columns assembly_accession bioproject biosample
2025-06-25 09:53:05,522 - INFO - First 2 rows, assembly and FTP columns, from genbank...
2025-06-25 09:53:05,522 - INFO - Streaming download from https://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/assembly_summary_genbank.txt
GCA_000001215.4 PRJNA13812 SAMN02803731
GCA_000001405.29 PRJNA31257 na
Stream the RefSeq assembly summary file from NCBI, optionally limiting the columns and the number of rows.
gbsummary \
--db refseq \
--nrows 2 \
--columns assembly_accession bioproject biosample
2025-06-25 09:54:47,206 - INFO - First 2 rows, assembly and FTP columns, from refseq...
2025-06-25 09:54:47,206 - INFO - Streaming download from https://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/assembly_summary_refseq.txt
GCF_000001215.4 PRJNA164 SAMN02803731
GCF_000001405.40 PRJNA168 na
These commands can be used in a pipeline, for example to download selected columns and save them to a file. Omit the --nrows option to download the full file. Add the --header option to write the header row (column names) as the first line of the output.
gbsummary \
--db genbank \
--columns assembly_accession bioproject biosample \
> genbank_summary.txt
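The saved file can then be read back in Python. The following is a minimal sketch, not part of ncbi_asm_summary: `read_summary` is a hypothetical helper, and it assumes the output is tab-separated and was written with --header so that the first line holds the column names. Whether that header line carries a leading `#` is not shown here, so the sketch strips one if present.

```python
import csv
import io


def read_summary(handle):
    """Yield one dict per data row of a tab-separated summary file.

    Assumptions (not confirmed by the package docs): fields are separated
    by tabs and the first line is the header written by --header.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader, None)
    if header is None:
        return  # empty file, nothing to yield
    # Tolerate a leading '#' on the header line, if there is one.
    header[0] = header[0].lstrip("#").strip()
    for fields in reader:
        yield dict(zip(header, fields))


# In-memory stand-in for open("genbank_summary.txt"), built from the
# two rows shown in the GenBank example above.
sample = io.StringIO(
    "assembly_accession\tbioproject\tbiosample\n"
    "GCA_000001215.4\tPRJNA13812\tSAMN02803731\n"
    "GCA_000001405.29\tPRJNA31257\tna\n"
)
rows = list(read_summary(sample))
print(rows[0]["bioproject"])
```

With a real file, replace `sample` with `open("genbank_summary.txt", newline="")`. If your output uses a different delimiter, change the `delimiter` argument accordingly.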
Use as a Python library
Stream from remote NCBI server
from ncbi_asm_summary.reader import AssemblySummaryStream
f = AssemblySummaryStream(db="refseq")
# Only print the first result for the example
for i in f.stream():
    print(i)
    break
AssemblySummary(assembly_accession='GCF_000001215.4', bioproject='PRJNA164', biosample='SAMN02803731', wgs_master='na', refseq_category='reference genome', taxid='7227', species_taxid='7227', organism_name='Drosophila melanogaster', infraspecific_name='na', isolate='na', version_status='latest', assembly_level='Chromosome', release_type='Major', genome_rep='Full', seq_rel_date='2014-08-01', asm_name='Release 6 plus ISO1 MT', asm_submitter='The FlyBase Consortium/Berkeley Drosophila Genome Project/Celera Genomics', gbrs_paired_asm='GCA_000001215.4', paired_asm_comp='identical', ftp_path='https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/215/GCF_000001215.4_Release_6_plus_ISO1_MT', excluded_from_refseq='na', relation_to_type_material='na', asm_not_live_date='na', assembly_type='haploid', group='invertebrate', genome_size='143706478', genome_size_ungapped='142553500', gc_percent='42.000000', replicon_count='7', scaffold_count='1869', contig_count='1869', annotation_provider='FlyBase', annotation_name='FlyBase Release 6.54', annotation_date='2023-12-26', total_gene_count='17872', protein_coding_gene_count='13962', non_coding_gene_count='3543', pubmed_id='10731132;12537568;12537572;12537573;12537574;16110336;17569856;17569867;25589440;26109356;26109357')
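Because the stream yields one record at a time, it can be filtered lazily without loading the whole file. The sketch below assumes that each record exposes its fields as attributes, as the printed `AssemblySummary(...)` suggests. `latest_chromosome_level` and the `Rec` stand-in type are hypothetical names used for illustration, and two of the demo rows are made up so the example runs offline.

```python
from collections import namedtuple
from itertools import islice


def latest_chromosome_level(records):
    """Lazily keep records that are the latest version at chromosome level."""
    for rec in records:
        if rec.version_status == "latest" and rec.assembly_level == "Chromosome":
            yield rec


# Stand-in record type carrying only the attributes used by the filter.
# With the library, pass AssemblySummaryStream(db="refseq").stream() instead.
Rec = namedtuple("Rec", "assembly_accession version_status assembly_level")
demo = [
    Rec("GCF_000001215.4", "latest", "Chromosome"),  # from the output above
    Rec("GCF_EXAMPLE_1", "latest", "Scaffold"),      # made-up row
    Rec("GCF_EXAMPLE_2", "replaced", "Chromosome"),  # made-up row
]

# islice caps how many matches are collected, so a full stream stops early.
hits = list(islice(latest_chromosome_level(demo), 10))
print([r.assembly_accession for r in hits])
```

Writing the filter as a generator keeps memory use flat, which matters when iterating over the full summary file.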
Stream from local copy
from ncbi_asm_summary.reader import AssemblySummaryStream
path = "/home/chase/Downloads/assembly_summary_genbank_20250619_1057.txt.gz"
f = AssemblySummaryStream(file_path=path)
# Only print the first result for the example
for i in f.stream():
    print(i)
    break
AssemblySummary(assembly_accession='GCA_000001215.4', bioproject='PRJNA13812', biosample='SAMN02803731', wgs_master='na', refseq_category='reference genome', taxid='7227', species_taxid='7227', organism_name='Drosophila melanogaster', infraspecific_name='na', isolate='na', version_status='latest', assembly_level='Chromosome', release_type='Major', genome_rep='Full', seq_rel_date='2014-08-01', asm_name='Release 6 plus ISO1 MT', asm_submitter='The FlyBase Consortium/Berkeley Drosophila Genome Project/Celera Genomics', gbrs_paired_asm='GCF_000001215.4', paired_asm_comp='identical', ftp_path='https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/215/GCA_000001215.4_Release_6_plus_ISO1_MT', excluded_from_refseq='na', relation_to_type_material='na', asm_not_live_date='na', assembly_type='haploid', group='invertebrate', genome_size='143706478', genome_size_ungapped='142553500', gc_percent='42.000000', replicon_count='7', scaffold_count='1869', contig_count='1869', annotation_provider='FlyBase', annotation_name='FlyBase Release 6.54', annotation_date='2023-12-13', total_gene_count='17872', protein_coding_gene_count='13962', non_coding_gene_count='3543', pubmed_id='10731132;12537568;12537572;12537573;12537574;16110336;17569856;17569867;25589440;26109356;26109357')
Table Columns
assembly_accession
bioproject
biosample
wgs_master
refseq_category
taxid
species_taxid
organism_name
infraspecific_name
isolate
version_status
assembly_level
release_type
genome_rep
seq_rel_date
asm_name
asm_submitter
gbrs_paired_asm
paired_asm_comp
ftp_path
excluded_from_refseq
relation_to_type_material
asm_not_live_date
assembly_type
group
genome_size
genome_size_ungapped
gc_percent
replicon_count
scaffold_count
contig_count
annotation_provider
annotation_name
annotation_date
total_gene_count
protein_coding_gene_count
non_coding_gene_count
pubmed_id
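In the example output above every value is a string, and missing values appear as 'na'. For numeric work it can help to convert the count and size columns first. The following is a sketch under those assumptions only: `coerce` and the field groupings are hypothetical, not part of ncbi_asm_summary, and it takes a plain dict keyed by the column names listed above.

```python
INT_FIELDS = (
    "genome_size", "genome_size_ungapped", "replicon_count",
    "scaffold_count", "contig_count", "total_gene_count",
    "protein_coding_gene_count", "non_coding_gene_count",
)
FLOAT_FIELDS = ("gc_percent",)
MISSING = ("na", "")  # assumption: these strings mark a missing value


def coerce(record):
    """Return a copy of a column-name -> string dict with typed values."""
    out = {k: (None if v in MISSING else v) for k, v in record.items()}
    for name in INT_FIELDS:
        if out.get(name) is not None:
            out[name] = int(out[name])
    for name in FLOAT_FIELDS:
        if out.get(name) is not None:
            out[name] = float(out[name])
    # pubmed_id holds several IDs joined by ';' in the example output.
    if out.get("pubmed_id") is not None:
        out["pubmed_id"] = out["pubmed_id"].split(";")
    return out


# A few fields copied from the Drosophila melanogaster record shown earlier.
raw = {
    "assembly_accession": "GCF_000001215.4",
    "isolate": "na",
    "genome_size": "143706478",
    "gc_percent": "42.000000",
    "contig_count": "1869",
    "pubmed_id": "10731132;12537568;12537572;12537573;12537574;16110336;"
                 "17569856;17569867;25589440;26109356;26109357",
}
typed = coerce(raw)
print(typed["genome_size"], typed["gc_percent"], typed["isolate"])
```

Identifier-like columns such as taxid are left as strings on purpose, since they are labels rather than quantities.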