
FlyBase sync/query helper for agents.


FlyBase local sync/query

Use FlyBase bulk release files for agent workloads; treat the live API as a helper only.

Why

  • https://api.flybase.org/api/v1.0/ exists.
  • some endpoints already return useful JSON, e.g. domain/FBgn0001250 and sequence/id/FBgn0001250.
  • other plausible-looking endpoints currently return an empty body.
  • bulk bucket + release files: better for repeatable agent queries.
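A minimal sketch of the "helper only" stance toward the live API, assuming endpoints return UTF-8 JSON: an empty body is treated as "no data" rather than as an error. `fetch_api` and `parse_api_body` are hypothetical names, not the package's `api` command.

```python
import json
import urllib.request
from typing import Any, Optional

API_ROOT = "https://api.flybase.org/api/v1.0/"


def parse_api_body(body: bytes) -> Optional[Any]:
    """Decode a JSON body; return None when the endpoint sent nothing."""
    text = body.decode("utf-8").strip()
    if not text:
        # Empty body: treat as "no data" and fall back to bulk release files.
        return None
    return json.loads(text)


def fetch_api(path: str, timeout: float = 30.0) -> Optional[Any]:
    """Fetch one endpoint relative to API_ROOT, e.g. 'domain/FBgn0001250'."""
    url = API_ROOT + path.lstrip("/")
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_api_body(resp.read())
```

Usage (needs network): fetch_api("domain/FBgn0001250"). A None result is a hint to look in the release files instead.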

Current surfaces checked

  • release bucket: https://s3ftp.flybase.org/releases/current/
  • precomputed files: https://s3ftp.flybase.org/releases/current/precomputed_files/
  • Postgres dump: https://s3ftp.flybase.org/releases/current/psql/FB2026_01.sql.gz
  • API root: https://api.flybase.org/api/v1.0/
  • batch download: https://flybase.org/batchdownload
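A sketch of how the release surfaces above relate to a release label, assuming releases/<label>/ mirrors the releases/current/ layout (the Postgres path matches the releases/<release>/psql/<release>.sql.gz pattern noted further down). `release_urls` and `file_suffix` are hypothetical helpers.

```python
import re
from typing import Dict

BUCKET = "https://s3ftp.flybase.org/releases/"
RELEASE_RE = re.compile(r"^FB(\d{4})_(\d{2})$")


def _parse(release: str) -> re.Match:
    """Validate a label like FB2026_01 and return the regex match."""
    m = RELEASE_RE.match(release)
    if m is None:
        raise ValueError(f"expected a label like FB2026_01, got {release!r}")
    return m


def release_urls(release: str) -> Dict[str, str]:
    """Build the main per-release URLs (assumes the layout of releases/current/)."""
    _parse(release)
    base = f"{BUCKET}{release}/"
    return {
        "release": base,
        "precomputed_files": base + "precomputed_files/",
        "psql_dump": f"{base}psql/{release}.sql.gz",
    }


def file_suffix(release: str) -> str:
    """FB2026_01 -> fb_2026_01, the suffix seen in precomputed file names."""
    m = _parse(release)
    return f"fb_{m.group(1)}_{m.group(2)}"
```

For example, "best_gene_summary_" + file_suffix("FB2026_01") + ".tsv.gz" reproduces the file name used in the ingest examples.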

Layout

  • src/flybase_cli/: package code
  • tests/: stdlib unittest
  • flybase_cli.py: thin repo-root shim
  • pyproject.toml: package metadata / console entrypoint

Install

PyPI with pipx:

pipx install flybase

PyPI with plain pip:

python3 -m pip install flybase

Homebrew:

brew tap gumadeiras/tap
brew install flybase

From source:

python3 -m pip install -e .

Release

Current release: v0.1.5.

Tag pushes like vX.Y.Z run the release workflow: build artifacts, create a GitHub release, publish to PyPI, and update gumadeiras/homebrew-tap.

Release prerequisites:

  • PyPI trusted publishing configured for this repo.
  • HOMEBREW_TAP_TOKEN repository secret can write to gumadeiras/homebrew-tap.

CLI

Commands run without arguments print help instead of failing:

flybase
flybase sync
flybase query-run

Simple path:

flybase sources

flybase update --all --release FB2026_01

flybase fts-build --db data/flybase/FB2026_01.sqlite

flybase find 'memory formation' --db data/flybase/FB2026_01.sqlite

flybase examples

flybase examples --topic gene-lookup

flybase examples --topic local-files

flybase examples --topic sql-and-templates

Sync curated release slices:

flybase presets

flybase sync gene-core

flybase sync gene-core --release FB2026_01

flybase sync gene-expression --release FB2026_01

flybase sync gene-knowledge --release FB2026_01

flybase sync references --release FB2026_01

Build a broader local release database:

flybase full-sync --release FB2026_01

flybase full-sync \
  --release FB2026_01 \
  --include 'best_gene_summary|entity_publication'

flybase sync-incremental \
  gene-knowledge \
  --from-release FB2025_06 \
  --release FB2026_01

flybase release-diff \
  --preset gene-knowledge \
  --from-release FB2025_06 \
  --to-release FB2026_01
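sync-incremental and release-diff match files across releases by stable manifest keys. A minimal sketch of that idea, assuming the key is the path with its _fb_YYYY_NN release token removed and that each manifest maps path to a fingerprint such as size or ETag (both assumptions; `stable_key` and `diff_manifests` are hypothetical):

```python
import re
from typing import Dict, List

# Assumption: release tokens look like "_fb_2026_01" right before the extension.
RELEASE_TOKEN = re.compile(r"_fb_\d{4}_\d{2}(?=\.)")


def stable_key(path: str) -> str:
    """Drop the release token so the same file lines up across releases."""
    return RELEASE_TOKEN.sub("", path)


def diff_manifests(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """old/new map file path -> fingerprint; compare them by stable key."""
    old_k = {stable_key(p): (p, f) for p, f in old.items()}
    new_k = {stable_key(p): (p, f) for p, f in new.items()}
    shared = old_k.keys() & new_k.keys()
    return {
        "added": sorted(new_k.keys() - old_k.keys()),
        "removed": sorted(old_k.keys() - new_k.keys()),
        # A renamed path or a changed fingerprint counts as an update,
        # not as a remove/add pair.
        "updated": sorted(k for k in shared if old_k[k] != new_k[k]),
        "unchanged": sorted(k for k in shared if old_k[k] == new_k[k]),
    }
```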

Genome assets:

flybase genomes --release FB2026_01

flybase genome-presets

flybase sync-genome \
  --release FB2026_01 \
  --genome dmel_r6.67 \
  --preset mirna-fasta

flybase sync-genome \
  --release FB2026_01 \
  --genome dmel_r6.67 \
  --section gff \
  --asset gff

Arbitrary FlyBase directories:

flybase manifest \
  --url https://s3ftp.flybase.org/genomes/Drosophila_melanogaster/dmel_r6.67_FB2026_01/fasta/ \
  --include 'miRNA'

flybase sync-url \
  --url https://s3ftp.flybase.org/genomes/Drosophila_melanogaster/dmel_r6.67_FB2026_01/fasta/ \
  --include 'miRNA'

Manifest, download, and local ingest:

flybase manifest \
  --release FB2026_01 \
  --prefix precomputed_files/genes/ \
  --include 'best_gene_summary|fbgn_annotation_ID'

flybase download \
  --manifest data/flybase/manifest.json \
  --include 'best_gene_summary'

flybase ingest \
  data/flybase/precomputed_files/genes/best_gene_summary_fb_2026_01.tsv.gz \
  --db data/flybase/FB2026_01.sqlite

flybase tables --db data/flybase/FB2026_01.sqlite --columns

flybase describe --db data/flybase/FB2026_01.sqlite --sample-values 2

Ingest local files and query SQLite via the repo-root shim (python3 flybase_cli.py):

python3 flybase_cli.py ingest \
  data/flybase/precomputed_files/genes/best_gene_summary_fb_2026_01.tsv.gz \
  data/flybase/precomputed_files/genes/fbgn_fbtr_fbpp_fb_2026_01.tsv.gz \
  data/flybase/precomputed_files/genes/fbgn_annotation_ID_fb_2026_01.tsv.gz

python3 flybase_cli.py tables --columns

python3 flybase_cli.py describe --sample-values 2
python3 flybase_cli.py schema-export --sample-values 1
python3 flybase_cli.py query-plan --sample-values 1 --limit 5
python3 flybase_cli.py query-run --template-name gene-summary-by-fbgn --param fbgn_id=FBgn0002121

python3 flybase_cli.py fts-build

python3 flybase_cli.py search 'memory formation'

python3 flybase_cli.py find 'Or59b' --json

python3 flybase_cli.py sql \
  "select fbgn_id, gene_symbol from fb_best_gene_summary_fb_2026_01 limit 5"

python3 flybase_cli.py pg-load --release FB2026_01

python3 flybase_cli.py sql \
  "select * from fb_best_gene_summary_fb_2026_01 limit 5"

python3 flybase_cli.py sql \
  "select s.fbgn_id, s.gene_symbol, a.annotation_id, p.flybase_fbtr, p.flybase_fbpp \
   from fb_best_gene_summary_fb_2026_01 s \
   join fb_fbgn_annotation_id_fb_2026_01 a on a.primary_fbgn = s.fbgn_id \
   left join fb_fbgn_fbtr_fbpp_fb_2026_01 p on p.flybase_fbgn = s.fbgn_id \
   limit 5"

python3 flybase_cli.py api domain/FBgn0001250
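The three-table join above can also be run from Python's standard sqlite3 module, e.g. inside an agent tool. A minimal sketch; `gene_links` is a hypothetical helper, the column aliases are added only to pin result keys, and the table names assume the FB2026_01 files were ingested as shown:

```python
import sqlite3
from typing import Any, Dict, List

GENE_LINKS_SQL = """
select s.fbgn_id       as fbgn_id,
       s.gene_symbol   as gene_symbol,
       a.annotation_id as annotation_id,
       p.flybase_fbtr  as flybase_fbtr,
       p.flybase_fbpp  as flybase_fbpp
from fb_best_gene_summary_fb_2026_01 s
join fb_fbgn_annotation_id_fb_2026_01 a on a.primary_fbgn = s.fbgn_id
left join fb_fbgn_fbtr_fbpp_fb_2026_01 p on p.flybase_fbgn = s.fbgn_id
limit ?
"""


def gene_links(conn: sqlite3.Connection, limit: int = 5) -> List[Dict[str, Any]]:
    """Run the gene/annotation/transcript join and return one dict per row."""
    cur = conn.execute(GENE_LINKS_SQL, (limit,))
    columns = [d[0] for d in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]
```

Usage: gene_links(sqlite3.connect("data/flybase/FB2026_01.sqlite")). The left join keeps genes with no FBtr/FBpp row; those columns come back as None.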

Sync presets

  • gene-core: summaries + FBgn/FBtr/FBpp + annotation IDs + SO annotations
  • gene-expression: curated/high-throughput/scRNA expression slices
  • references: publication/link tables
  • gene-knowledge: core gene facts + representative publications + orthology tables
  • orthology: ortholog, paralog, and disease-association tables
  • interactions: gene- and allele-level interaction tables

Full sync

  • full-sync crawls an entire release prefix (default: precomputed_files/)
  • default behavior: download only files the current loaders can ingest into SQLite
  • use --all-files if you want non-ingestable release artifacts too
  • use --include / --exclude to stage a narrower smoke-test or partial warehouse
  • default manifest path: data/flybase/manifests/<release>/full-sync.json
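Patterns such as 'best_gene_summary|entity_publication' read like regular-expression alternations. A sketch of include/exclude selection, assuming Python regexes matched anywhere in the path and exclude taking priority (the README does not state either; `select_files` is hypothetical):

```python
import re
from typing import Iterable, List, Optional


def select_files(
    names: Iterable[str],
    include: Optional[str] = None,
    exclude: Optional[str] = None,
) -> List[str]:
    """Keep names matching `include` (if given) and not matching `exclude`."""
    inc = re.compile(include) if include else None
    exc = re.compile(exclude) if exclude else None
    kept = []
    for name in names:
        if inc is not None and inc.search(name) is None:
            continue  # not selected by --include
        if exc is not None and exc.search(name) is not None:
            continue  # dropped by --exclude, even if it also matched --include
        kept.append(name)
    return kept
```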

Discovery

  • genomes --release FB2026_01 lists genome builds linked from that FlyBase release
  • sync-url turns a crawlable FlyBase directory URL into a one-step local sync
  • sync-genome resolves a release/build pair into the right genome-section URL automatically
  • genome-presets lists reusable genome asset sync recipes
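The genome URL used in the examples suggests a layout of genomes/<species>/<build>_<release>/<section>/. A sketch of that mapping, assuming the pattern holds beyond the one dmel example; the real sync-genome resolves links from the release itself, and `genome_section_url` is a hypothetical name:

```python
GENOME_ROOT = "https://s3ftp.flybase.org/genomes/"
SECTIONS = ("fasta", "gff", "gtf", "dna", "chado-xml")


def genome_section_url(
    release: str,
    genome: str,
    section: str,
    species: str = "Drosophila_melanogaster",
) -> str:
    """Guess the directory URL for one section of a genome build in a release."""
    if section not in SECTIONS:
        raise ValueError(f"unknown section {section!r}; expected one of {SECTIONS}")
    return f"{GENOME_ROOT}{species}/{genome}_{release}/{section}/"
```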

Genome sync

  • sections: fasta, gff, gtf, dna, chado-xml
  • asset shortcuts include mirna, transcript, translation, gene, chromosome, cds, ncrna, gff, gtf
  • presets include mirna-fasta, transcript-fasta, translation-fasta, gene-fasta, chromosome-fasta, ncrna-fasta, gff-all, gtf-all
  • use --include/--exclude for narrower file selection on top of the asset preset

Ingest formats

  • delimited: tsv, csv, gzipped variants
  • sequence: fasta, fa, fna, faa, gzipped variants
  • annotation: gff, gff3, gtf, gzipped variants
  • JSON: json, json.gz
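A sketch of extension-based format detection covering the list above, including gzipped variants. It keys on the last non-.gz suffix, so names like dmel-all-miRNA-r6.67.fasta.gz still resolve; `detect_format` and its category labels are hypothetical.

```python
from pathlib import PurePath
from typing import Tuple

FORMATS = {
    "tsv": "delimited", "csv": "delimited",
    "fasta": "sequence", "fa": "sequence", "fna": "sequence", "faa": "sequence",
    "gff": "annotation", "gff3": "annotation", "gtf": "annotation",
    "json": "json",
}


def detect_format(path: str) -> Tuple[str, bool]:
    """Return (format family, is_gzipped) from the file name alone."""
    suffixes = [s.lstrip(".").lower() for s in PurePath(path).suffixes]
    gzipped = bool(suffixes) and suffixes[-1] == "gz"
    if gzipped:
        suffixes = suffixes[:-1]
    if not suffixes or suffixes[-1] not in FORMATS:
        raise ValueError(f"unsupported file type: {path}")
    return FORMATS[suffixes[-1]], gzipped
```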

JSON ingest

  • top-level scalar JSON fields become queryable SQLite columns
  • one nested dict level is flattened, e.g. gene.symbol -> gene_symbol
  • repeated top-level lists become child tables, e.g. symbolSynonyms -> <table>_symbolsynonyms
  • repeated lists nested inside child dict rows become descendant tables, e.g. genomeLocations[].exons[] -> <table>_genomelocations_exons
  • full source record remains in payload_json
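The rules above, restated as a minimal sketch that splits one JSON record into per-table rows with the lineage columns used in the SQL examples (parent_record_id, parent_ordinal, ordinal). `flatten_record` is hypothetical, not the loader's code; the caller passes record_id because the README does not say how it is chosen.

```python
import json
from typing import Any, Dict, List

Rows = Dict[str, List[Dict[str, Any]]]


def _columns(obj: Dict[str, Any]) -> Dict[str, Any]:
    """Scalars become columns; one nested dict level becomes <key>_<subkey>."""
    cols: Dict[str, Any] = {}
    for key, value in obj.items():
        if isinstance(value, dict):
            for sub_key, sub_value in value.items():
                if not isinstance(sub_value, (dict, list)):
                    cols[f"{key}_{sub_key}"] = sub_value
        elif not isinstance(value, list):
            cols[key] = value
    return cols  # deeper nesting survives only in payload_json


def flatten_record(table: str, record_id: str, record: Dict[str, Any]) -> Rows:
    """Split one JSON record into parent, child, and descendant table rows."""
    parent = {"record_id": record_id, **_columns(record)}
    parent["payload_json"] = json.dumps(record, sort_keys=True)  # full source record
    out: Rows = {table: [parent]}
    for key, items in record.items():
        if not isinstance(items, list):
            continue
        child_table = f"{table}_{key.lower()}"
        for ordinal, item in enumerate(items):
            row: Dict[str, Any] = {"parent_record_id": record_id, "ordinal": ordinal}
            if not isinstance(item, dict):
                row["value"] = item  # list of scalars, e.g. symbolSynonyms
                out.setdefault(child_table, []).append(row)
                continue
            row.update(_columns(item))
            out.setdefault(child_table, []).append(row)
            # Lists inside a child row become descendant tables.
            for sub_key, sub_items in item.items():
                if not isinstance(sub_items, list):
                    continue
                desc_table = f"{child_table}_{sub_key.lower()}"
                for sub_ordinal, sub_item in enumerate(sub_items):
                    desc: Dict[str, Any] = {
                        "parent_record_id": record_id,
                        "parent_ordinal": ordinal,
                        "ordinal": sub_ordinal,
                    }
                    if isinstance(sub_item, dict):
                        desc.update(_columns(sub_item))
                    else:
                        desc["value"] = sub_item
                    out.setdefault(desc_table, []).append(desc)
    return out
```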

Example:

python3 flybase_cli.py sql \
  "select record_id, symbol, gene_geneId from fb_ncrna_genes_fb_2026_01 limit 5"

python3 flybase_cli.py sql \
  "select parent_record_id, ordinal, value \
   from fb_ncrna_genes_fb_2026_01_symbolsynonyms \
   limit 5"

python3 flybase_cli.py sql \
  "select parent_record_id, parent_ordinal, ordinal, startPosition, endPosition \
   from fb_ncrna_genes_fb_2026_01_genomelocations_exons \
   limit 5"

Search

  • fts-build creates a local SQLite FTS5 index from ingested tables
  • search queries that index without calling the live FlyBase API
  • record ids prefer stable FlyBase-like columns such as fbgn_id, primary_fbgn, flybase_fbtr
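A minimal sketch of a local SQLite FTS5 index over ingested tables, using the record-id preference above. It needs an SQLite build with FTS5 (most Python builds include it); the index name `search_index` and the helpers are hypothetical.

```python
import sqlite3
from typing import Dict, List, Optional, Tuple

# Preference order for the id stored with each indexed row.
ID_COLUMNS = ("fbgn_id", "primary_fbgn", "flybase_fbtr")


def pick_record_id(row: Dict[str, object]) -> Optional[str]:
    for col in ID_COLUMNS:
        if row.get(col):
            return str(row[col])
    return None


def build_index(conn: sqlite3.Connection, tables: List[str]) -> None:
    """Index every row of the given tables into one FTS5 table."""
    conn.execute(
        "create virtual table if not exists search_index "
        "using fts5(record_id UNINDEXED, source_table UNINDEXED, body)"
    )
    for table in tables:
        cur = conn.execute(f'select * from "{table}"')
        columns = [d[0] for d in cur.description]
        for values in cur.fetchall():
            row = dict(zip(columns, values))
            body = " ".join(str(v) for v in values if v is not None)
            conn.execute(
                "insert into search_index (record_id, source_table, body) values (?, ?, ?)",
                (pick_record_id(row), table, body),
            )
    conn.commit()


def search(conn: sqlite3.Connection, query: str, limit: int = 10) -> List[Tuple[str, str]]:
    """Full-text query, no network; 'memory formation' means memory AND formation."""
    return conn.execute(
        "select record_id, source_table from search_index "
        "where search_index match ? order by rank limit ?",
        (query, limit),
    ).fetchall()
```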

Metadata

  • describe summarizes ingested tables with row counts, source paths, semantic tags, columns, and representative non-empty values
  • schema-export writes the same metadata to a deterministic JSON artifact beside the SQLite DB, e.g. FB2026_01.schema.json
  • schema-export also includes inferred relationships for nested child tables and common FlyBase ID joins
  • schema-export also emits semantic_summary for table/entity tag coverage
  • schema-export also emits ready-to-run query_templates
  • query-plan prints starter SQL without the larger schema payload
  • query-plan includes named biological templates such as gene-summary-by-fbgn, transcript-protein-links, publications-for-gene, and coordinate lookups when matching tables exist
  • query-run selects one template and executes it with parameter values
  • useful first step before writing ad hoc SQL or building agent query plans
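A sketch of the query-run idea: look up a named template, bind --param key=value pairs as SQL parameters, and return record-oriented JSON with a summary. The template name comes from this README, but its SQL and the JSON envelope field names are assumptions, not the package's actual output.

```python
import json
import sqlite3
from typing import Dict, List

# Hypothetical registry: the name is from the README, the SQL is assumed.
TEMPLATES: Dict[str, str] = {
    "gene-summary-by-fbgn": (
        "select * from fb_best_gene_summary_fb_2026_01 where fbgn_id = :fbgn_id"
    ),
}


def parse_params(pairs: List[str]) -> Dict[str, str]:
    """Turn ['fbgn_id=FBgn0002121'] into {'fbgn_id': 'FBgn0002121'}."""
    params = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        params[key] = value
    return params


def run_template(conn: sqlite3.Connection, name: str, params: Dict[str, str]) -> str:
    """Execute a named template with bound parameters; return record-oriented JSON."""
    sql = TEMPLATES[name]  # KeyError for unknown template names
    cur = conn.execute(sql, params)  # named :placeholders, no string formatting
    columns = [d[0] for d in cur.description]
    records = [dict(zip(columns, row)) for row in cur.fetchall()]
    summary = {"template": name, "params": params, "row_count": len(records), "columns": columns}
    return json.dumps({"summary": summary, "records": records}, indent=2)
```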

Example:

python3 flybase_cli.py schema-export \
  --db data/flybase/FB2026_01.sqlite \
  --sample-values 1

python3 flybase_cli.py query-plan \
  --db data/flybase/FB2026_01.sqlite \
  --sample-values 1 \
  --limit 5

python3 flybase_cli.py query-run \
  --db data/flybase/FB2026_01.sqlite \
  --template-name gene-summary-by-fbgn \
  --param fbgn_id=FBgn0002121

Notes

  • nested JSON child tables keep lineage columns like parent_record_id, parent_ordinal, ordinal.
  • many FlyBase files start with ## metadata lines; loader skips those.
  • sync writes a preset manifest under data/flybase/manifests/<release>/.
  • full-sync is the broadest offline path for release bulk data without going through the full Postgres dump.
  • sync --release FB2026_01 defaults to data/flybase/FB2026_01.sqlite to avoid cross-release mixing.
  • sync-incremental matches files by stable manifest keys, so files renamed between releases are reported as updated rather than as noisy add/remove pairs.
  • release-diff compares releases either by raw prefix or by curated multi-prefix preset.
  • manifest --url lets you crawl FlyBase directories outside releases/, such as genome FASTA/GFF trees.
  • sync-url is the shortest path for genome assets once you know the directory URL.
  • sync-genome is the shortest path when you know the FlyBase release + genome build label.
  • sync-genome --preset ... is the preferred path for common genome asset pulls.
  • some FlyBase .gff.gz assets are tar-wrapped gzip archives; loader handles that transparently.
  • sql and query-run shape results as record-oriented JSON with summary metadata for agent chaining.
  • pg-load stages the full Postgres import script for releases/<release>/psql/<release>.sql.gz.
  • pg-load --execute runs the staged script when createdb and psql are installed locally.
  • long-running sync, crawl, download, ingest, index, and Postgres load commands print progress to stderr.
  • SQLite keeps setup minimal; switch to DuckDB/Postgres if you want bigger joins/faster scans.
  • if you only need a few IDs, FlyBase Batch Download may be simpler than syncing files.
  • use --no-header for files whose first non-comment row is data, not column names.
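Three notes above describe reader behavior: skipping ## metadata lines, --no-header, and tar-wrapped .gz assets. A minimal stdlib sketch with that behavior; `iter_rows` and `_read_lines` are hypothetical, UTF-8 is assumed, only the first regular file inside a tar wrapper is read, and stripping a leading '#' from the header row is an extra assumption.

```python
import csv
import gzip
import io
import tarfile
from typing import Dict, Iterator


def _read_lines(path: str) -> Iterator[str]:
    """Yield text lines from a plain, gzipped, or tar-wrapped gzip file."""
    if path.endswith(".gz") and tarfile.is_tarfile(path):
        # Tar archive inside gzip: read the first regular file member.
        with tarfile.open(path, mode="r:gz") as tar:
            member = next(m for m in tar.getmembers() if m.isfile())
            with tar.extractfile(member) as raw:
                yield from io.TextIOWrapper(raw, encoding="utf-8")
    elif path.endswith(".gz"):
        with gzip.open(path, mode="rt", encoding="utf-8") as fh:
            yield from fh
    else:
        with open(path, encoding="utf-8") as fh:
            yield from fh


def iter_rows(path: str, has_header: bool = True, delimiter: str = "\t") -> Iterator[Dict[str, str]]:
    """Yield dict rows, skipping '##' metadata lines and blank lines."""
    lines = (ln for ln in _read_lines(path) if ln.strip() and not ln.startswith("##"))
    # Assume TSV fields are unquoted; keep normal quoting for other delimiters.
    quoting = csv.QUOTE_NONE if delimiter == "\t" else csv.QUOTE_MINIMAL
    header = None
    for fields in csv.reader(lines, delimiter=delimiter, quoting=quoting):
        if header is None:
            if has_header:
                # Assumption: the header row itself may start with a single '#'.
                header = [fields[0].lstrip("#")] + fields[1:]
                continue
            # Like --no-header: synthesize names and keep this row as data.
            header = [f"col_{i}" for i in range(len(fields))]
        yield dict(zip(header, fields))
```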

Tests

python3 -m unittest discover -s tests



Download files


Source Distribution

flybase-0.1.5.tar.gz (43.5 kB)

Uploaded Source

Built Distribution


flybase-0.1.5-py3-none-any.whl (38.2 kB)

Uploaded Python 3

File details

Details for the file flybase-0.1.5.tar.gz.

File metadata

  • Download URL: flybase-0.1.5.tar.gz
  • Size: 43.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for flybase-0.1.5.tar.gz

  • SHA256: 45c16b184390ab37022fa8f028ed51c8c4d9f4b8a754fefa3d7853ec699b2ca0
  • MD5: 31919502b20cafae8010065c665ed93c
  • BLAKE2b-256: 1f0f2b1ec6da4648aa5a6291026b81f48ec14809b26efdf2a86a73bdb60bafc3


Provenance

The following attestation bundles were made for flybase-0.1.5.tar.gz:

Publisher: release.yml on gumadeiras/flybase-cli

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file flybase-0.1.5-py3-none-any.whl.

File metadata

  • Download URL: flybase-0.1.5-py3-none-any.whl
  • Size: 38.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for flybase-0.1.5-py3-none-any.whl

  • SHA256: beecc846c7393342e6198e3ae514f2b6796c8f996a0b28666e6f0340ed519b88
  • MD5: 8d02b067d467497149d22e707117e667
  • BLAKE2b-256: 1d2f78eb2722756c029c736e393766f37792fbb6cd6b496b0ca9035f05000272


Provenance

The following attestation bundles were made for flybase-0.1.5-py3-none-any.whl:

Publisher: release.yml on gumadeiras/flybase-cli

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
