Australian Government Interactive Functions Thesaurus (AGIFT) as a Neo4j knowledge graph with embeddings and dual edge types

AGIFT Graph

Australian Government Interactive Functions Thesaurus (AGIFT) as a knowledge graph with embeddings and dual edge types.

What it does

Fetches the full AGIFT vocabulary from the TemaTres API, builds a graph with structural hierarchy edges, generates embeddings (free local or Isaacus API), then creates semantic similarity edges between related terms.

TemaTres API ──► Graph       ──► Embeddings     ──► Semantic Edges
  (AGIFT)        (PARENT_OF)     (384/512/768d)     (SIMILAR_TO)
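
The final stage can be pictured as a pairwise comparison of embedding vectors. The following is a minimal sketch of that idea, not the package's own link.py: the function names, the toy 3-d vectors, and the 0.65 threshold (taken from the CLI example) are assumptions for illustration.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_pairs(embeddings, threshold=0.65):
    """Return (term_a, term_b, score) for every pair scoring above the threshold."""
    edges = []
    for (a, vec_a), (b, vec_b) in combinations(sorted(embeddings.items()), 2):
        score = cosine(vec_a, vec_b)
        if score > threshold:
            edges.append((a, b, score))
    return edges


# Toy 3-d vectors standing in for real 384-d embeddings (hypothetical values).
toy = {
    "Health care": [0.9, 0.1, 0.0],
    "Hospitals": [0.8, 0.2, 0.1],
    "Taxation": [0.0, 0.1, 0.9],
}
edges = similar_pairs(toy)
```

Only the "Health care" / "Hospitals" pair clears the threshold here. The all-pairs loop is quadratic in the number of terms, so a real implementation would likely use a vectorised or indexed comparison.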

Graph model

Two edge types with different weights for query-time flexibility:

Edge        Type        Weight  Description
PARENT_OF   structural  1.0     AGIFT hierarchy (L1 → L2 → L3)
SIMILAR_TO  semantic    0.5     Cosine similarity above threshold

Nodes carry DCAT-AP theme mappings for interoperability with European open data standards.
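
One way to read "query-time flexibility" is that a traversal can treat the two edge types differently. The sketch below works on an in-memory toy graph and assumes a traversal cost of 1 / weight, so structural edges are cheaper to follow than semantic ones. The term names and the cost formula are illustrative assumptions, not the package's query logic.

```python
import heapq

# (source, target, edge_type, weight); weights follow the table above.
EDGES = [
    ("Health care", "Hospitals", "PARENT_OF", 1.0),
    ("Health care", "Community health", "PARENT_OF", 1.0),
    ("Hospitals", "Aged care", "SIMILAR_TO", 0.5),
]


def cheapest_costs(start, edges, allowed_types=("PARENT_OF", "SIMILAR_TO")):
    """Dijkstra over undirected edges, with cost = 1 / weight (an assumption)."""
    adjacency = {}
    for src, dst, edge_type, weight in edges:
        if edge_type not in allowed_types:
            continue
        cost = 1.0 / weight
        adjacency.setdefault(src, []).append((dst, cost))
        adjacency.setdefault(dst, []).append((src, cost))

    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, step in adjacency.get(node, []):
            new_cost = cost + step
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return best


all_edges = cheapest_costs("Health care", EDGES)
structural_only = cheapest_costs("Health care", EDGES, allowed_types=("PARENT_OF",))
```

With both edge types enabled, "Aged care" is reachable at a cost of 3.0 (1.0 for the structural hop plus 2.0 for the semantic hop). Restricting the traversal to PARENT_OF drops it from the result.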

Quick start

Path A — Docker (zero config)

pip install agift-graph[all]
docker compose up -d          # starts Neo4j on localhost:7687
agift                         # fetches AGIFT, builds graph, embeds

Path B — existing Neo4j

pip install agift-graph[all]
export NEO4J_URI=bolt://my-server:7687
export NEO4J_USER=neo4j
export NEO4J_PASSWORD=mypassword
agift

The CLI reads env vars with sensible defaults (bolt://localhost:7687, neo4j/changeme) that match the included docker-compose.yml.
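
The fallback behaviour described above can be sketched as follows. This is an illustration of the pattern, not the package's code; the DEFAULTS dict and neo4j_settings function are hypothetical names, while the variable names and default values come from this README.

```python
import os

# Defaults documented in this README; they match the bundled docker-compose.yml.
DEFAULTS = {
    "NEO4J_URI": "bolt://localhost:7687",
    "NEO4J_USER": "neo4j",
    "NEO4J_PASSWORD": "changeme",
}


def neo4j_settings(environ=None):
    """Resolve connection settings, falling back to the documented defaults."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name, default) for name, default in DEFAULTS.items()}


# Only the URI is overridden; user and password fall back to the defaults.
settings = neo4j_settings({"NEO4J_URI": "bolt://my-server:7687"})
```

The resolved values correspond to the neo4j_uri, neo4j_user, and neo4j_password arguments shown under Programmatic usage.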

Path C — CogDB (embedded, no server)

CogDB is a persistent embedded graph database written in pure Python. No server process, no Docker — data is stored to local files.

pip install agift-graph[cogdb]
agift --backend cogdb                # stores graph in ./agift_cogdb_data/
agift --backend cogdb --cogdb-dir /path/to/data

The pipeline runs the same stages as with Neo4j: fetch, embed, and semantic edges. With CogDB, terms, edges, and embeddings are stored as triples with JSON property blobs. To control the storage location, set the COGDB_DATA_DIR environment variable or pass --cogdb-dir.
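
To make "triples with JSON property blobs" concrete, here is one possible encoding, written in plain Python without CogDB itself. The has_props predicate, the term IDs, and the property keys are hypothetical; the package's actual on-disk layout may differ.

```python
import json


def term_to_triples(term_id, label, parent_id=None, embedding=None):
    """Encode one term as (subject, predicate, object) string triples."""
    props = {"label": label}
    if embedding is not None:
        props["embedding"] = embedding
    # Properties travel as a single JSON string in the object position.
    triples = [(term_id, "has_props", json.dumps(props, sort_keys=True))]
    if parent_id is not None:
        triples.append((parent_id, "PARENT_OF", term_id))
    return triples


def props_from_triples(triples, term_id):
    """Decode the JSON property blob for a term, or None if it has none."""
    for subject, predicate, obj in triples:
        if subject == term_id and predicate == "has_props":
            return json.loads(obj)
    return None


# "t42" and "t7" are placeholder IDs, not real AGIFT term IDs.
triples = term_to_triples("t42", "Hospitals", parent_id="t7", embedding=[0.1, 0.2])
```

Keeping properties in a JSON blob lets a triple store carry arbitrary attributes such as embeddings, at the cost of not being able to query inside the blob directly.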

Install extras

Install                              What you get                                       Size
pip install agift-graph              Neo4j driver + fetch + graph build                 Lightweight
pip install agift-graph[cogdb]       + CogDB embedded graph backend                     Small
pip install agift-graph[embeddings]  + sentence-transformers + torch                    ~2 GB
pip install agift-graph[isaacus]     + Isaacus API client                               Small
pip install agift-graph[all]         Everything (Neo4j + CogDB + embeddings + Isaacus)  ~2 GB

Embedding providers

Provider                       Cost  Dimensions  Setup
local (sentence-transformers)  Free  384, 768    Nothing (runs on CPU)
isaacus (kanon-2-embedder)     Paid  256–1792    Set API key in dashboard

The local provider uses all-MiniLM-L6-v2 (384d) or all-mpnet-base-v2 (768d). Models are downloaded on first run and cached.
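
The dimension-to-model pairing can be sketched like this. It is a rough illustration rather than the package's embed.py: LOCAL_MODELS, model_for_dimension, and embed_local are hypothetical names, and only the two model names come from this README. The sentence-transformers import is deferred so the mapping can be used without the ~2 GB extras installed.

```python
LOCAL_MODELS = {
    384: "all-MiniLM-L6-v2",
    768: "all-mpnet-base-v2",
}


def model_for_dimension(dimension):
    """Return the local model name for a supported embedding dimension."""
    try:
        return LOCAL_MODELS[dimension]
    except KeyError:
        supported = sorted(LOCAL_MODELS)
        raise ValueError(
            f"no local model for {dimension} dimensions; choose from {supported}"
        ) from None


def embed_local(texts, dimension=384):
    """Embed texts with sentence-transformers (downloads the model on first use)."""
    # Imported lazily: requires the embeddings extra to be installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_for_dimension(dimension))
    return model.encode(list(texts))
```

Calling embed_local(["Health care", "Taxation"]) would return one 384-dimensional vector per input string, after the model has been downloaded and cached.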

Programmatic usage

from agift import run_pipeline

# Run the full pipeline with Neo4j (default)
run_pipeline(provider="local", dimension=384)

# Explicit Neo4j connection
run_pipeline(
    neo4j_uri="bolt://localhost:7687",
    neo4j_user="neo4j",
    neo4j_password="changeme",
    skip_embed=True,
)

# Use CogDB instead of Neo4j
run_pipeline(backend_type="cogdb", provider="local", dimension=384)

Configuration

Variable         Default                Description
NEO4J_URI        bolt://localhost:7687  Neo4j connection URI
NEO4J_USER       neo4j                  Neo4j username
NEO4J_PASSWORD   changeme               Neo4j password
COGDB_DATA_DIR   agift_cogdb_data       CogDB storage directory
ISAACUS_API_KEY  (empty)                Isaacus API key (optional)

All other settings (dimension, provider, similarity threshold, semantic edge weight) are configured via the dashboard UI and stored in the graph backend. For CLI runs, provider, dimension, and threshold can also be passed as flags (see CLI usage).

Full Docker stack

The included docker-compose.yml runs Neo4j and the AGIFT container (dashboard + cron worker):

docker compose up -d --build

Then open the dashboard at http://localhost:5050 and click "Full Pipeline" or "Graph Only".

Service        Port  Description
Neo4j Browser  7474  Graph database UI
Neo4j Bolt     7687  Database protocol
AGIFT          5050  Dashboard, pipeline runner, cron worker

The AGIFT_MODE env var controls container behaviour:

Mode                 Description
dashboard (default)  Gunicorn web server + optional cron
worker               Cron only, no web server
cli                  Run pipeline once and exit

CLI usage

# Full pipeline (fetch + graph + embed + semantic edges)
agift

# Use CogDB instead of Neo4j
agift --backend cogdb
agift --backend cogdb --cogdb-dir /path/to/data

# Graph only (no embeddings)
agift --skip-embed --skip-semantic

# Local embeddings, 384 dimensions
agift --provider local --dimension 384

# Force re-embed all terms
agift --force-embed

# Custom similarity threshold for semantic edges
agift --threshold 0.65

# Faster run: skip alt label fetching
agift --skip-alt

# Dry run (fetch from API, no writes)
agift --dry-run

Project structure

agift/
├── __init__.py              # Public API exports
├── backend.py               # GraphBackend abstract interface
├── neo4j_backend.py         # Neo4j backend implementation
├── cogdb_backend.py         # CogDB backend implementation
├── cli.py                   # CLI entry point + run_pipeline()
├── common.py                # Constants, backend factory, summary
├── fetch.py                 # TemaTres API fetching (concurrent)
├── graph.py                 # Schema setup + node/edge upsert
├── embed.py                 # Embedding providers (local + Isaacus)
└── link.py                  # Cosine similarity + semantic edges
Dockerfile                   # Unified image (dashboard + worker + CLI)
entrypoint.sh                # Container mode dispatch
docker-compose.yml           # Full stack (Neo4j + AGIFT container)
dashboard/
├── app.py                   # Flask dashboard + in-process pipeline
└── templates/index.html
import_agift.py              # Backward-compatible entry point
pyproject.toml
CHANGELOG.md                 # Version history (release notes source)
release.sh                   # Version bump + tag helper
LICENSE                      # Apache 2.0

Data source

AGIFT is maintained by the National Archives of Australia and published via TemaTres at https://vocabularyserver.com/agift/
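
For readers who want to inspect the source data directly, the sketch below builds request URLs for the vocabulary server. It assumes the instance exposes the usual TemaTres services.php web-service endpoint with task, arg, and output parameters; the tematres_url helper is a hypothetical name and the term ID is a placeholder. It only constructs URLs and makes no network calls.

```python
from urllib.parse import urlencode

BASE_URL = "https://vocabularyserver.com/agift/"


def tematres_url(task, arg=None, output="json", base_url=BASE_URL):
    """Build a TemaTres web-service URL (endpoint layout is an assumption)."""
    params = {"task": task}
    if arg is not None:
        params["arg"] = str(arg)
    params["output"] = output
    return f"{base_url}services.php?{urlencode(params)}"


# Top-level terms of the vocabulary.
top_terms_url = tematres_url("fetchTopTerms")

# Narrower terms under a given term; 123 is a placeholder ID.
children_url = tematres_url("fetchDown", arg=123)
```

The resulting URLs can be fetched with any HTTP client. Walking fetchTopTerms and then fetchDown for each term is one way the hierarchy could be traversed.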

License

Apache 2.0 — see LICENSE.

Publishing

This project uses a tag-based release workflow with a single CI definition:

  • ci.yml is the single source of truth for lint, test, and Docker checks
  • publish.yml calls ci.yml as a reusable workflow — no duplicated jobs

Pipeline sequence on tag push:

  1. CI runs (lint + test + Docker tests)
  2. Build PyPI package
  3. Publish to PyPI
  4. Build and push Docker image to Docker Hub (deepcivic/agift)
  5. Create GitHub Release with changelog entry

To release:

  1. Update CHANGELOG.md with a new version entry
  2. Run ./release.sh 0.2.0 (updates pyproject.toml, commits, tags)
  3. Push: git push origin main --tags

Secrets required in GitHub repo settings:

  • DOCKERHUB_USERNAME and DOCKERHUB_TOKEN (repo-level secrets)
  • PyPI trusted publishing is configured via OIDC (no secret needed)

Changelog

See CHANGELOG.md for version history.
