FlyBase sync/query helper for agents.
FlyBase local sync/query
Use FlyBase bulk files for agent workloads. Live API: helper only.
Why
- https://api.flybase.org/api/v1.0/ exists.
- some endpoints return useful JSON now, eg domain/FBgn0001250, sequence/id/FBgn0001250.
- some plausible endpoints return empty body today.
- bulk bucket + release files: better for repeatable agent queries.
Current surfaces checked
- release bucket: https://s3ftp.flybase.org/releases/current/
- precomputed files: https://s3ftp.flybase.org/releases/current/precomputed_files/
- Postgres dump: https://s3ftp.flybase.org/releases/current/psql/FB2026_01.sql.gz
- API root: https://api.flybase.org/api/v1.0/
- batch download: https://flybase.org/batchdownload
Layout
- src/flybase_cli/: package code
- tests/: stdlib unittest
- flybase_cli.py: thin repo-root shim
- pyproject.toml: package metadata / console entrypoint
CLI
python3 flybase_cli.py presets
python3 flybase_cli.py sync gene-core
python3 flybase_cli.py sync gene-core --release FB2026_01
python3 flybase_cli.py sync gene-knowledge --release FB2026_01
python3 flybase_cli.py full-sync --release FB2026_01
python3 flybase_cli.py full-sync \
--release FB2026_01 \
--include 'best_gene_summary|entity_publication'
python3 flybase_cli.py sync-incremental \
gene-knowledge \
--from-release FB2025_06 \
--release FB2026_01
python3 flybase_cli.py release-diff \
--preset gene-knowledge \
--from-release FB2025_06 \
--to-release FB2026_01
python3 flybase_cli.py genomes --release FB2026_01
python3 flybase_cli.py sync-genome \
--release FB2026_01 \
--genome dmel_r6.67 \
--section fasta \
--asset mirna
python3 flybase_cli.py genome-presets
python3 flybase_cli.py sync-genome \
--release FB2026_01 \
--genome dmel_r6.67 \
--preset mirna-fasta
PYTHONPATH=src python3 -m flybase_cli sync gene-expression
python3 flybase_cli.py manifest \
--url https://s3ftp.flybase.org/genomes/Drosophila_melanogaster/dmel_r6.67_FB2026_01/fasta/ \
--include 'miRNA'
python3 flybase_cli.py sync-url \
--url https://s3ftp.flybase.org/genomes/Drosophila_melanogaster/dmel_r6.67_FB2026_01/fasta/ \
--include 'miRNA'
python3 flybase_cli.py ingest \
data/flybase/precomputed_files/genes/best_gene_summary_fb_2026_01.tsv.gz \
data/flybase/precomputed_files/genes/fbgn_fbtr_fbpp_fb_2026_01.tsv.gz \
data/flybase/precomputed_files/genes/fbgn_annotation_ID_fb_2026_01.tsv.gz
python3 flybase_cli.py tables --columns
python3 flybase_cli.py describe --sample-values 2
python3 flybase_cli.py schema-export --sample-values 1
python3 flybase_cli.py query-plan --sample-values 1 --limit 5
python3 flybase_cli.py query-run --template-name gene-summary-by-fbgn --param fbgn_id=FBgn0002121
python3 flybase_cli.py fts-build
python3 flybase_cli.py search 'memory formation'
python3 flybase_cli.py pg-load --release FB2026_01
python3 flybase_cli.py sql \
"select * from fb_best_gene_summary_fb_2026_01 limit 5"
python3 flybase_cli.py sql \
"select s.fbgn_id, s.gene_symbol, a.annotation_id, p.flybase_fbtr, p.flybase_fbpp \
from fb_best_gene_summary_fb_2026_01 s \
join fb_fbgn_annotation_id_fb_2026_01 a on a.primary_fbgn = s.fbgn_id \
left join fb_fbgn_fbtr_fbpp_fb_2026_01 p on p.flybase_fbgn = s.fbgn_id \
limit 5"
python3 flybase_cli.py api domain/FBgn0001250
Sync presets
- gene-core: summaries + FBgn/FBtr/FBpp + annotation IDs + SO annotations
- gene-expression: curated/high-throughput/scRNA expression slices
- references: publication/link tables
- gene-knowledge: core gene facts + representative publications + orthology tables
- orthology: ortholog, paralog, and disease-association tables
- interactions: gene- and allele-level interaction tables
Full sync
- full-sync crawls an entire release prefix, default precomputed_files/
- default behavior: download only files the current loaders can ingest into SQLite
- use --all-files if you want non-ingestable release artifacts too
- use --include / --exclude to stage a narrower smoke or partial warehouse
- default manifest path: data/flybase/manifests/<release>/full-sync.json
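The --include / --exclude narrowing can be pictured as a pattern filter over crawled file names. A minimal sketch, assuming the patterns are regular expressions matched anywhere in the name (the `a|b` alternation in the CLI example suggests this, but the real matching rule is not confirmed here). `select_files` is a hypothetical helper, not part of flybase_cli, and the directory prefixes in the sample list are illustrative:

```python
import re
from typing import Iterable, List, Optional


def select_files(names: Iterable[str],
                 include: Optional[str] = None,
                 exclude: Optional[str] = None) -> List[str]:
    """Keep names that match `include` (if given) and do not match `exclude`."""
    inc = re.compile(include) if include else None
    exc = re.compile(exclude) if exclude else None
    kept = []
    for name in names:
        if inc and not inc.search(name):
            continue  # an include pattern was given and this name misses it
        if exc and exc.search(name):
            continue  # explicitly excluded
        kept.append(name)
    return kept


# Illustrative crawl result; real release listings are much longer.
crawled = [
    "genes/best_gene_summary_fb_2026_01.tsv.gz",
    "references/entity_publication_fb_2026_01.tsv.gz",
    "genes/fbgn_annotation_ID_fb_2026_01.tsv.gz",
]
selected = select_files(crawled, include="best_gene_summary|entity_publication")
```

Applying include before exclude keeps the two flags composable: include picks the slice, exclude trims it.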
Discovery
- genomes --release FB2026_01 lists genome builds linked from that FlyBase release
- sync-url turns a crawlable FlyBase directory URL into a one-step local sync
- sync-genome resolves a release/build pair into the right genome-section URL automatically
- genome-presets lists reusable genome asset sync recipes
Genome sync
- sections: fasta, gff, gtf, dna, chado-xml
- asset shortcuts include mirna, transcript, translation, gene, chromosome, cds, ncrna, gff, gtf
- presets include mirna-fasta, transcript-fasta, translation-fasta, gene-fasta, chromosome-fasta, ncrna-fasta, gff-all, gtf-all
- use --include / --exclude for narrower file selection on top of the asset preset
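How a release/build pair could map to a genome-section URL, as a rough sketch. The layout is inferred from the single FASTA URL shown in the CLI examples; that other sections follow the same pattern, and the species-directory mapping, are assumptions. `genome_section_url` is a hypothetical name, not the package's own function:

```python
GENOME_ROOT = "https://s3ftp.flybase.org/genomes"

# Assumption: only the D. melanogaster prefix is mapped here; other builds
# would need their own species directory names.
SPECIES_DIRS = {"dmel": "Drosophila_melanogaster"}

SECTIONS = {"fasta", "gff", "gtf", "dna", "chado-xml"}


def genome_section_url(release: str, genome: str, section: str) -> str:
    """Build the directory URL for one section of one genome build."""
    if section not in SECTIONS:
        raise ValueError(f"unknown section: {section!r}")
    prefix = genome.split("_", 1)[0]  # "dmel_r6.67" -> "dmel"
    species = SPECIES_DIRS[prefix]    # KeyError for unmapped species
    return f"{GENOME_ROOT}/{species}/{genome}_{release}/{section}/"


url = genome_section_url("FB2026_01", "dmel_r6.67", "fasta")
```

The result for FB2026_01 + dmel_r6.67 + fasta is the same directory passed to manifest --url and sync-url in the CLI section.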
Ingest formats
- delimited: tsv, csv, gzipped variants
- sequence: fasta, fa, fna, faa, gzipped variants
- annotation: gff, gff3, gtf, gzipped variants
- JSON: json, json.gz
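For the delimited case, a minimal reader sketch. It skips the ## metadata lines that many FlyBase files start with and treats the first remaining row as the header unless told otherwise (the --no-header situation). This is not the package's loader; `iter_rows`, `read_tsv_gz`, and the `col_N` fallback names are hypothetical, and the sample ID is a placeholder:

```python
import csv
import gzip
from typing import Dict, Iterable, Iterator


def iter_rows(lines: Iterable[str], has_header: bool = True) -> Iterator[Dict[str, str]]:
    """Yield one dict per data row, ignoring blank and '##' metadata lines."""
    data = (ln for ln in lines if ln.strip() and not ln.startswith("##"))
    # QUOTE_NONE: treat quote characters in TSV fields as plain text.
    reader = csv.reader(data, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = None
    for row in reader:
        if header is None:
            if has_header:
                header = row
                continue
            # No header row: synthesize positional column names.
            header = [f"col_{i}" for i in range(1, len(row) + 1)]
        yield dict(zip(header, row))


def read_tsv_gz(path: str, has_header: bool = True) -> Iterator[Dict[str, str]]:
    """Same as iter_rows, reading from a gzipped TSV on disk."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as fh:
        yield from iter_rows(fh, has_header)


sample = [
    "## example metadata line\n",
    "fbgn_id\tgene_symbol\n",
    "FBgn0000001\texample\n",  # placeholder values, not real FlyBase data
]
rows = list(iter_rows(sample))
```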
JSON ingest
- top-level scalar JSON fields become queryable SQLite columns
- one nested dict level is flattened, eg gene.symbol -> gene_symbol
- repeated top-level lists become child tables, eg symbolSynonyms -> <table>_symbolsynonyms
- repeated lists nested inside child dict rows become descendant tables, eg genomeLocations[].exons[] -> <table>_genomelocations_exons
- full source record remains in payload_json
Example:
python3 flybase_cli.py sql \
"select record_id, symbol, gene_geneId from fb_ncrna_genes_fb_2026_01 limit 5"
python3 flybase_cli.py sql \
"select parent_record_id, ordinal, value \
from fb_ncrna_genes_fb_2026_01_symbolsynonyms \
limit 5"
python3 flybase_cli.py sql \
"select parent_record_id, parent_ordinal, ordinal, startPosition, endPosition \
from fb_ncrna_genes_fb_2026_01_genomelocations_exons \
limit 5"
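The flattening rules can be sketched in a few lines of Python. This is an illustration of the rules as described, not the package's code: `flatten_record` and `_child_rows` are hypothetical names, ordinals are assumed to be 0-based, and the sample record holds placeholder values.

```python
import json
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def _child_rows(tables: Dict[str, List[Row]], name: str, record_id: str,
                items: List[Any], parent_ordinal: Optional[int] = None) -> None:
    """Emit one row per list item; lists inside dict items become descendant tables."""
    rows = tables.setdefault(name, [])
    for ordinal, item in enumerate(items):
        row: Row = {"parent_record_id": record_id, "ordinal": ordinal}
        if parent_ordinal is not None:
            row["parent_ordinal"] = parent_ordinal
        if isinstance(item, dict):
            for key, value in item.items():
                if isinstance(value, list):
                    _child_rows(tables, f"{name}_{key.lower()}", record_id,
                                value, parent_ordinal=ordinal)
                elif not isinstance(value, dict):
                    row[key] = value
        else:
            row["value"] = item  # scalar list items land in a `value` column
        rows.append(row)


def flatten_record(table: str, record_id: str, record: Dict[str, Any]) -> Dict[str, List[Row]]:
    """Split one JSON record into a parent row plus child/descendant rows."""
    tables: Dict[str, List[Row]] = {table: []}
    parent: Row = {"record_id": record_id,
                   "payload_json": json.dumps(record, sort_keys=True)}
    for key, value in record.items():
        if isinstance(value, dict):
            # One nested dict level is flattened: gene.geneId -> gene_geneId.
            for sub_key, sub_value in value.items():
                if not isinstance(sub_value, (dict, list)):
                    parent[f"{key}_{sub_key}"] = sub_value
        elif isinstance(value, list):
            _child_rows(tables, f"{table}_{key.lower()}", record_id, value)
        else:
            parent[key] = value
    tables[table].append(parent)
    return tables


record = {
    "symbol": "example-gene",
    "gene": {"geneId": "FB:FBgn0000001"},
    "symbolSynonyms": ["ex1", "ex2"],
    "genomeLocations": [{"assembly": "R6",
                         "exons": [{"startPosition": 10, "endPosition": 20}]}],
}
tables = flatten_record("t", "rec-1", record)
```

Keeping payload_json on the parent row means anything the flattening drops (deeper dict nesting, for instance) is still recoverable from the source record.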
Search
- fts-build creates a local SQLite FTS5 index from ingested tables
- search queries that index without calling the live FlyBase API
- record ids prefer stable FlyBase-like columns such as fbgn_id, primary_fbgn, flybase_fbtr
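What a local FTS5 index over an ingested table might look like, using only the stdlib sqlite3 module. The table, index, and column names (`gene_summary`, `search_index`, `body`) and the rows are made up for illustration; the schema fts-build actually creates is not shown in this README. It also assumes the local SQLite build has FTS5 enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for an ingested table; values are placeholders.
conn.execute("CREATE TABLE gene_summary (fbgn_id TEXT, gene_symbol TEXT, summary TEXT)")
conn.executemany(
    "INSERT INTO gene_summary VALUES (?, ?, ?)",
    [
        ("FBgn0000001", "ex1", "required for long-term memory formation"),
        ("FBgn0000002", "ex2", "involved in wing development"),
    ],
)

# One FTS5 row per source record; ids and table names are stored but not indexed.
conn.execute(
    "CREATE VIRTUAL TABLE search_index "
    "USING fts5(record_id UNINDEXED, source_table UNINDEXED, body)"
)
conn.execute(
    "INSERT INTO search_index (record_id, source_table, body) "
    "SELECT fbgn_id, 'gene_summary', gene_symbol || ' ' || summary FROM gene_summary"
)


def search(db: sqlite3.Connection, query: str, limit: int = 5):
    """Return (record_id, source_table) pairs, best match first."""
    return db.execute(
        "SELECT record_id, source_table FROM search_index "
        "WHERE search_index MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()


hits = search(conn, "memory formation")
```

Using a stable ID column such as fbgn_id as the record id lets a hit be joined straight back to other ingested tables.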
Metadata
- describe summarizes ingested tables with row counts, source paths, semantic tags, columns, and representative non-empty values
- schema-export writes the same metadata to a deterministic JSON artifact beside the SQLite DB, eg FB2026_01.schema.json
- schema-export also includes inferred relationships for nested child tables and common FlyBase ID joins
- schema-export also emits semantic_summary for table/entity tag coverage
- schema-export also emits ready-to-run query_templates
- query-plan prints starter SQL without the larger schema payload
- query-plan now includes named biological templates such as gene-summary-by-fbgn, transcript-protein-links, publications-for-gene, and coordinate lookups when matching tables exist
- query-run selects one template and executes it with parameter values
- useful first step before writing ad hoc SQL or building agent query plans
Example:
python3 flybase_cli.py schema-export \
--db data/flybase/FB2026_01.sqlite \
--sample-values 1
python3 flybase_cli.py query-plan \
--db data/flybase/FB2026_01.sqlite \
--sample-values 1 \
--limit 5
python3 flybase_cli.py query-run \
--db data/flybase/FB2026_01.sqlite \
--template-name gene-summary-by-fbgn \
--param fbgn_id=FBgn0002121
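The template-plus-parameters idea behind query-run, sketched with stdlib sqlite3. The SQL text, the `gene_summary` table, and the result envelope keys (`template`, `params`, `row_count`, `records`) are assumptions for illustration; the real templates come from schema-export and the real output shape may differ.

```python
import sqlite3
from typing import Any, Dict, List

# Hypothetical template body; only the template name comes from this README.
TEMPLATES = {
    "gene-summary-by-fbgn": (
        "SELECT fbgn_id, gene_symbol, summary "
        "FROM gene_summary WHERE fbgn_id = :fbgn_id"
    ),
}


def parse_params(pairs: List[str]) -> Dict[str, str]:
    """Turn ['fbgn_id=FBgn0002121'] into {'fbgn_id': 'FBgn0002121'}."""
    return dict(pair.split("=", 1) for pair in pairs)


def run_template(db: sqlite3.Connection, name: str, params: Dict[str, str]) -> Dict[str, Any]:
    """Execute one named template with bound parameters; return record-oriented output."""
    db.row_factory = sqlite3.Row
    records = [dict(row) for row in db.execute(TEMPLATES[name], params)]
    return {"template": name, "params": params,
            "row_count": len(records), "records": records}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene_summary (fbgn_id TEXT, gene_symbol TEXT, summary TEXT)")
conn.execute("INSERT INTO gene_summary VALUES ('FBgn0002121', 'placeholder', 'placeholder summary')")

result = run_template(conn, "gene-summary-by-fbgn", parse_params(["fbgn_id=FBgn0002121"]))
```

Binding values as named parameters rather than formatting them into the SQL string keeps templates safe to run with agent-supplied input.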
Notes
- nested JSON child tables keep lineage columns like parent_record_id, parent_ordinal, ordinal.
- many FlyBase files start with ## metadata lines; loader skips those.
- sync writes a preset manifest under data/flybase/manifests/<release>/.
- full-sync is the broadest offline path for release bulk data without going through the full Postgres dump.
- sync --release FB2026_01 defaults to data/flybase/FB2026_01.sqlite to avoid cross-release mixing.
- sync-incremental uses stable manifest keys so release-renamed files still land in updated instead of noisy add/remove pairs.
- release-diff compares releases either by raw prefix or by curated multi-prefix preset.
- manifest --url lets you crawl non-releases/ FlyBase directories such as genome FASTA/GFF trees.
- sync-url is the shortest path for genome assets once you know the directory URL.
- sync-genome is the shortest path when you know the FlyBase release + genome build label.
- sync-genome --preset ... is the preferred path for common genome asset pulls.
- some FlyBase .gff.gz assets are tar-wrapped gzip archives; loader handles that transparently.
- sql and query-run shape results as record-oriented JSON with summary metadata for agent chaining.
- pg-load stages the full Postgres import script for releases/<release>/psql/<release>.sql.gz.
- pg-load --execute runs the staged script when createdb and psql are installed locally.
- SQLite keeps setup minimal; switch to DuckDB/Postgres if you want bigger joins/faster scans.
- if you only need a few IDs, FlyBase Batch Download may be simpler than syncing files.
- use --no-header for files whose first non-comment row is data, not column names.
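The stable-manifest-key note is easiest to see with a small diff. A sketch, assuming the stable key is the file path with its `_fb_YYYY_NN` release suffix removed and that manifests map paths to checksums; both are guesses at the idea, not the package's actual key scheme or manifest format. `stable_key` and `diff_manifests` are hypothetical names and the checksums are placeholders:

```python
import re
from typing import Dict, List

# Matches a release suffix such as "_fb_2026_01" right before the extension.
_RELEASE_SUFFIX = re.compile(r"_fb_\d{4}_\d{2}(?=\.)")


def stable_key(path: str) -> str:
    """Drop the release suffix so the same file keeps one key across releases."""
    return _RELEASE_SUFFIX.sub("", path)


def diff_manifests(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two {path: checksum} manifests by stable key."""
    old_by_key = {stable_key(p): c for p, c in old.items()}
    new_by_key = {stable_key(p): c for p, c in new.items()}
    shared = old_by_key.keys() & new_by_key.keys()
    return {
        "added": sorted(new_by_key.keys() - old_by_key.keys()),
        "removed": sorted(old_by_key.keys() - new_by_key.keys()),
        "updated": sorted(k for k in shared if old_by_key[k] != new_by_key[k]),
        "unchanged": sorted(k for k in shared if old_by_key[k] == new_by_key[k]),
    }


old = {"genes/best_gene_summary_fb_2025_06.tsv.gz": "aaa"}
new = {
    "genes/best_gene_summary_fb_2026_01.tsv.gz": "bbb",
    "genes/new_table_fb_2026_01.tsv.gz": "ccc",  # illustrative file name
}
diff = diff_manifests(old, new)
```

Without the key normalization, the renamed best_gene_summary file would show up once under removed and once under added instead of a single updated entry.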
Tests
python3 -m unittest discover -s tests