
Australian Government Interactive Functions Thesaurus (AGIFT) as a Neo4j knowledge graph with embeddings and dual edge types


AGIFT Graph

Australian Government Interactive Functions Thesaurus (AGIFT) as a knowledge graph with embeddings and dual edge types.

What it does

Fetches the full AGIFT vocabulary from the TemaTres API, builds a graph with structural hierarchy edges, generates term embeddings (with a free local model or the paid Isaacus API), then creates semantic similarity edges between related terms.

TemaTres API ──► Graph ─────► Embeddings ─────► Semantic Edges
  (AGIFT)        (PARENT_OF)  (384/512/768d)    (SIMILAR_TO)
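The graph-building stage can be pictured as a breadth-first walk down the AGIFT hierarchy that records one PARENT_OF edge per parent/child pair. This is a minimal sketch, not the package's fetch.py or graph.py: `build_hierarchy_edges` and the `fetch_children` callable are hypothetical, and the term labels are an invented toy hierarchy standing in for TemaTres responses.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical edge record: (parent, child, edge_type, weight, child_level)
Edge = Tuple[str, str, str, float, int]


def build_hierarchy_edges(
    top_terms: Iterable[str],
    fetch_children: Callable[[str], List[str]],
    structural_weight: float = 1.0,
) -> List[Edge]:
    """Walk breadth-first from the L1 terms, emitting one PARENT_OF edge per parent/child pair."""
    edges: List[Edge] = []
    queue = deque((term, 1) for term in top_terms)  # (term, level); top terms are level 1
    expanded: Set[str] = set()
    while queue:
        term, level = queue.popleft()
        if term in expanded:
            continue  # a term reachable from two parents is expanded only once
        expanded.add(term)
        for child in fetch_children(term):
            edges.append((term, child, "PARENT_OF", structural_weight, level + 1))
            queue.append((child, level + 1))
    return edges


# Invented toy hierarchy; real terms come from the TemaTres API.
toy_children: Dict[str, List[str]] = {
    "Health care": ["Disease prevention", "Health promotion"],
    "Disease prevention": ["Immunisation"],
}
edges = build_hierarchy_edges(["Health care"], lambda t: toy_children.get(t, []))
for edge in edges:
    print(edge)
```

Edges are recorded for every parent/child pair before the child is de-duplicated, so a term with two parents still gets both PARENT_OF edges but its own children are fetched only once.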

Graph model

The graph has two edge types with different weights, so queries can trade off hierarchy against semantic similarity:

Edge        Type        Weight  Description
PARENT_OF   structural  1.0     AGIFT hierarchy (L1 → L2 → L3)
SIMILAR_TO  semantic    0.5     Cosine similarity above threshold
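A minimal sketch of how SIMILAR_TO edges could be derived from embeddings: compute cosine similarity for every unordered pair of terms and keep pairs scoring above the threshold, each with the 0.5 semantic weight from the table. The function names are hypothetical, the all-pairs loop is quadratic, and link.py may do this differently (for example with vectorised matrix operations); 0.65 is the value from the CLI example, not necessarily the default.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_to_edges(
    embeddings: Dict[str, Sequence[float]],
    threshold: float,
    weight: float = 0.5,
) -> List[Tuple[str, str, str, float, float]]:
    """Compare every unordered pair of terms; keep pairs whose similarity is above the threshold."""
    edges = []
    for (a, vec_a), (b, vec_b) in combinations(sorted(embeddings.items()), 2):
        score = cosine(vec_a, vec_b)
        if score > threshold:
            edges.append((a, b, "SIMILAR_TO", weight, round(score, 4)))
    return edges


vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
print(similar_to_edges(vectors, threshold=0.65))
```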

Nodes carry DCAT-AP theme mappings for interoperability with European open data standards.

Quick start

Path A — Docker (zero config)

pip install "agift-graph[all]"
docker compose up -d          # starts Neo4j on localhost:7687
agift                         # fetches AGIFT, builds graph, embeds

Path B — existing Neo4j

pip install "agift-graph[all]"
export NEO4J_URI=bolt://my-server:7687
export NEO4J_USER=neo4j
export NEO4J_PASSWORD=mypassword
agift

The CLI reads env vars with sensible defaults (bolt://localhost:7687, neo4j/changeme) that match the included docker-compose.yml.

Path C — CogDB (embedded, no server)

CogDB is a persistent embedded graph database written in pure Python. No server process, no Docker — data is stored to local files.

pip install "agift-graph[cogdb]"
agift --backend cogdb                # stores graph in ./agift_cogdb_data/
agift --backend cogdb --cogdb-dir /path/to/data

The pipeline runs the same fetch, embed, and semantic edge stages as with Neo4j. In CogDB, terms, edges, and embeddings are stored as triples with JSON property blobs. To control the storage location, set the COGDB_DATA_DIR environment variable or pass --cogdb-dir.
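The README says only that CogDB holds terms, edges and embeddings as triples with JSON property blobs. One way to picture that layout is a minimal sketch with a plain list of (subject, predicate, object) tuples standing in for CogDB. The predicate name `props` and both helper functions are hypothetical.

```python
import json
from typing import Any, Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


def term_to_triples(term_id: str, props: Dict[str, Any], children: List[str]) -> List[Triple]:
    """Encode one term as a JSON property-blob triple plus one triple per PARENT_OF edge."""
    # "props" is a hypothetical predicate; sort_keys keeps the blob stable across runs.
    triples: List[Triple] = [(term_id, "props", json.dumps(props, sort_keys=True))]
    triples.extend((term_id, "PARENT_OF", child) for child in children)
    return triples


def props_of(triples: List[Triple], term_id: str) -> Optional[Dict[str, Any]]:
    """Decode the JSON blob stored for term_id, or return None if there is none."""
    for subject, predicate, obj in triples:
        if subject == term_id and predicate == "props":
            return json.loads(obj)
    return None


props = {"label": "Disease prevention", "level": 2, "embedding": [0.12, -0.5, 0.33]}
triples = term_to_triples("Disease prevention", props, ["Immunisation"])
print(triples)
```

With CogDB installed, each tuple could then be written through its `Graph.put(subject, predicate, object)` method; check the CogDB documentation for the exact signature.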

Install extras

Install                                What you get                                       Size
pip install agift-graph                Neo4j driver + fetch + graph build                 Lightweight
pip install "agift-graph[cogdb]"       + CogDB embedded graph backend                     Small
pip install "agift-graph[embeddings]"  + sentence-transformers + torch                    ~2 GB
pip install "agift-graph[isaacus]"     + Isaacus API client                               Small
pip install "agift-graph[all]"         Everything (Neo4j + CogDB + embeddings + Isaacus)  ~2 GB

Embedding providers

Provider                       Cost  Dimensions  Setup
local (sentence-transformers)  Free  384, 768    Install the [embeddings] extra; runs on CPU
isaacus (kanon-2-embedder)     Paid  256–1792    Set the API key in the dashboard or via ISAACUS_API_KEY

The local provider uses all-MiniLM-L6-v2 (384d) or all-mpnet-base-v2 (768d). Models are downloaded on first run and cached.
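Because each local model has a fixed output size, a requested dimension maps to exactly one model. A minimal sketch of that lookup; the model names come from this README, but `local_model_for` is a hypothetical helper, not the package's embed.py. Other sizes (such as 512) are rejected here because only the Isaacus provider lists them.

```python
# Dimension-to-model mapping taken from this README; the helper itself is hypothetical.
LOCAL_MODELS = {
    384: "all-MiniLM-L6-v2",
    768: "all-mpnet-base-v2",
}


def local_model_for(dimension: int) -> str:
    """Return the sentence-transformers model name for a supported local dimension."""
    try:
        return LOCAL_MODELS[dimension]
    except KeyError:
        supported = ", ".join(str(d) for d in sorted(LOCAL_MODELS))
        raise ValueError(
            f"local provider supports dimensions {supported}; got {dimension}"
        ) from None


print(local_model_for(384))
# With the [embeddings] extra installed, the model could then be loaded with:
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer(f"sentence-transformers/{local_model_for(384)}")
#   vectors = model.encode(["Disease prevention"])  # one 384-dimensional row
```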

Programmatic usage

from agift import run_pipeline

# Run the full pipeline with Neo4j (default)
run_pipeline(provider="local", dimension=384)

# Explicit Neo4j connection
run_pipeline(
    neo4j_uri="bolt://localhost:7687",
    neo4j_user="neo4j",
    neo4j_password="changeme",
    skip_embed=True,
)

# Use CogDB instead of Neo4j
run_pipeline(backend_type="cogdb", provider="local", dimension=384)

Configuration

Variable         Default                Description
NEO4J_URI        bolt://localhost:7687  Neo4j connection URI
NEO4J_USER       neo4j                  Neo4j username
NEO4J_PASSWORD   changeme               Neo4j password
COGDB_DATA_DIR   agift_cogdb_data       CogDB storage directory
ISAACUS_API_KEY  (empty)                Isaacus API key (optional)

All other settings (dimension, provider, similarity threshold, semantic edge weight) are configured via the dashboard UI and stored in the graph backend.
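The environment variables in the Configuration table fall back to fixed defaults when unset. A minimal sketch of that resolution using only `os.environ`; `resolve_env` is a hypothetical helper that mirrors the documented behaviour rather than the package's own code.

```python
import os
from typing import Dict, Mapping, Optional

# Defaults as documented in the Configuration table.
ENV_DEFAULTS: Dict[str, str] = {
    "NEO4J_URI": "bolt://localhost:7687",
    "NEO4J_USER": "neo4j",
    "NEO4J_PASSWORD": "changeme",
    "COGDB_DATA_DIR": "agift_cogdb_data",
    "ISAACUS_API_KEY": "",
}


def resolve_env(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return each documented variable, falling back to its default when unset."""
    env = os.environ if environ is None else environ
    return {name: env.get(name, default) for name, default in ENV_DEFAULTS.items()}


settings = resolve_env({"NEO4J_URI": "bolt://my-server:7687"})
print(settings["NEO4J_URI"], settings["NEO4J_USER"])
```

The Neo4j values line up with the `neo4j_uri`, `neo4j_user` and `neo4j_password` keyword arguments shown under Programmatic usage.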

Full Docker stack

The included docker-compose.yml runs Neo4j, the dashboard, and a cron worker:

docker compose up -d --build

Then open the dashboard at http://localhost:5050 and click "Full Pipeline" or "Graph Only".

Service        Port  Description
Neo4j Browser  7474  Graph database UI
Neo4j Bolt     7687  Database protocol
Dashboard      5050  Config, run controls, logs
Worker         -     Cron-scheduled pipeline runs

CLI usage

# Full pipeline (fetch + graph + embed + semantic edges)
agift

# Use CogDB instead of Neo4j
agift --backend cogdb
agift --backend cogdb --cogdb-dir /path/to/data

# Graph only (no embeddings)
agift --skip-embed --skip-semantic

# Local embeddings, 384 dimensions
agift --provider local --dimension 384

# Force re-embed all terms
agift --force-embed

# Custom similarity threshold for semantic edges
agift --threshold 0.65

# Faster run: skip alt label fetching
agift --skip-alt

# Dry run (fetch from API, no writes)
agift --dry-run
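To see how the flags in these examples combine, here is a minimal `argparse` sketch covering only the flags shown above. It is not the real cli.py: `planned_stages` is hypothetical and assumes the two skip flags are independent, as the "Graph only" example passing both suggests. `--dry-run` and `--force-embed` are parsed but not modelled.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering only the flags shown in the CLI examples."""
    parser = argparse.ArgumentParser(prog="agift")
    parser.add_argument("--backend", choices=["neo4j", "cogdb"], default="neo4j")
    parser.add_argument("--cogdb-dir")
    parser.add_argument("--provider", choices=["local", "isaacus"])
    parser.add_argument("--dimension", type=int)
    parser.add_argument("--threshold", type=float)
    for flag in ("--skip-embed", "--skip-semantic", "--force-embed", "--skip-alt", "--dry-run"):
        parser.add_argument(flag, action="store_true")
    return parser


def planned_stages(args: argparse.Namespace) -> List[str]:
    """Which pipeline stages a set of flags would run (assumed mapping)."""
    stages = ["fetch", "graph"]
    if not args.skip_embed:
        stages.append("embed")
    if not args.skip_semantic:
        stages.append("semantic")
    return stages


args = build_parser().parse_args(["--skip-embed", "--skip-semantic"])
print(planned_stages(args))
```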

Project structure

agift/
├── __init__.py              # Public API exports
├── backend.py               # GraphBackend abstract interface
├── neo4j_backend.py         # Neo4j backend implementation
├── cogdb_backend.py         # CogDB backend implementation
├── cli.py                   # CLI entry point + run_pipeline()
├── common.py                # Constants, backend factory, summary
├── fetch.py                 # TemaTres API fetching
├── graph.py                 # Schema setup + node/edge upsert
├── embed.py                 # Embedding providers (local + Isaacus)
└── link.py                  # Cosine similarity + semantic edges
docker-compose.yml           # Full stack (Neo4j + dashboard + worker)
dashboard/
├── app.py                   # Flask dashboard + run controls
└── templates/index.html
worker/
├── Dockerfile
└── entrypoint.sh            # Cron scheduler + manual trigger
import_agift.py              # Backward-compatible entry point
pyproject.toml
CHANGELOG.md                 # Version history (release notes source)
release.sh                   # Version bump + tag helper
LICENSE                      # Apache 2.0
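backend.py defines a GraphBackend abstract interface so the same pipeline can write to Neo4j or CogDB. A minimal sketch of that pattern: it reuses the name GraphBackend, but its three methods are invented for illustration and are not taken from backend.py. It is paired with a dict-backed implementation that could serve in tests without either database.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Tuple


class GraphBackend(ABC):
    """Illustrative interface; method names are hypothetical, not taken from backend.py."""

    @abstractmethod
    def upsert_term(self, term_id: str, props: Dict[str, Any]) -> None:
        """Create the term or merge new properties into it."""

    @abstractmethod
    def upsert_edge(self, src: str, dst: str, edge_type: str, weight: float) -> None:
        """Create or update one typed, weighted edge."""

    @abstractmethod
    def edges(self, edge_type: str) -> List[Tuple[str, str, float]]:
        """List (src, dst, weight) for every edge of the given type."""


class InMemoryBackend(GraphBackend):
    """Dict-backed backend for tests without Neo4j or CogDB."""

    def __init__(self) -> None:
        self.terms: Dict[str, Dict[str, Any]] = {}
        self._edges: Dict[Tuple[str, str, str], float] = {}

    def upsert_term(self, term_id: str, props: Dict[str, Any]) -> None:
        # Merge rather than overwrite, so re-running the pipeline is idempotent.
        self.terms.setdefault(term_id, {}).update(props)

    def upsert_edge(self, src: str, dst: str, edge_type: str, weight: float) -> None:
        # Keyed by (src, dst, type): re-inserting updates the weight instead of duplicating.
        self._edges[(src, dst, edge_type)] = weight

    def edges(self, edge_type: str) -> List[Tuple[str, str, float]]:
        return [(s, d, w) for (s, d, t), w in self._edges.items() if t == edge_type]


backend = InMemoryBackend()
backend.upsert_term("Health care", {"level": 1})
backend.upsert_term("Health care", {"label": "Health care"})
backend.upsert_edge("Health care", "Disease prevention", "PARENT_OF", 1.0)
backend.upsert_edge("Health care", "Disease prevention", "PARENT_OF", 1.0)
print(backend.terms, backend.edges("PARENT_OF"))
```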

Data source

AGIFT is maintained by the National Archives of Australia and published via TemaTres at https://vocabularyserver.com/agift/
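TemaTres servers expose their vocabularies through a web-services endpoint that takes a `task` parameter. A minimal URL-building sketch follows. Both the endpoint path (`services.php`) and the parameter names (`task`, `arg`, `output`) are assumptions based on TemaTres conventions as we understand them, and so are task names such as `fetchTopTerms` and `fetchDown`. Confirm all of them against the server's documentation before relying on this. `service_url` is a hypothetical helper, not part of fetch.py.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint; verify against the TemaTres server's documentation.
SERVICES_URL = "https://vocabularyserver.com/agift/services.php"


def service_url(task: str, arg: Optional[str] = None, output: str = "json") -> str:
    """Build a TemaTres web-service query URL (task/arg/output parameters assumed)."""
    params = {"task": task, "output": output}
    if arg is not None:
        params["arg"] = arg
    return f"{SERVICES_URL}?{urlencode(params)}"


print(service_url("fetchTopTerms"))
# Fetching could then look like:
#   import json, urllib.request
#   with urllib.request.urlopen(service_url("fetchTopTerms"), timeout=30) as resp:
#       data = json.load(resp)
```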

License

Apache 2.0 — see LICENSE.

Publishing

This project uses a tag-based release workflow:

  1. Update CHANGELOG.md with a new version entry
  2. Run ./release.sh 0.2.0 (updates the version in pyproject.toml, then commits and tags)
  3. Push: git push origin main --tags
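The version-replacement part of step 2 amounts to rewriting one line of pyproject.toml. release.sh itself is a shell script, so the following is only a Python sketch of that one step. `bump_version` is hypothetical, and the sketch assumes the first line starting with `version = "..."` is the [project] version.

```python
import re

# First line of the form: version = "..."; assumed to be the [project] version.
VERSION_LINE = re.compile(r'^(version\s*=\s*")[^"]*(")', re.MULTILINE)


def bump_version(pyproject_text: str, new_version: str) -> str:
    """Replace the version string, refusing anything that is not MAJOR.MINOR.PATCH."""
    if not re.fullmatch(r"\d+\.\d+\.\d+", new_version):
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {new_version!r}")
    updated, count = VERSION_LINE.subn(rf"\g<1>{new_version}\g<2>", pyproject_text, count=1)
    if count != 1:
        raise ValueError('no version = "..." line found')
    return updated


sample = '[project]\nname = "agift-graph"\nversion = "0.1.1"\n'
print(bump_version(sample, "0.2.0"))
```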

The GitHub Actions workflow will:

  • Build and publish to PyPI
  • Build and push Docker images to Docker Hub (deepcivic/agift-dashboard, deepcivic/agift-worker)
  • Create a GitHub Release with the changelog entry

Secrets required in GitHub repo settings:

  • DOCKERHUB_USERNAME and DOCKERHUB_TOKEN (repo-level secrets)
  • PyPI trusted publishing is configured via OIDC (no secret needed)

Changelog

See CHANGELOG.md for version history.


Download files


Source Distribution

agift_graph-0.1.1.tar.gz (28.0 kB)

Built Distribution


agift_graph-0.1.1-py3-none-any.whl (25.3 kB)

File details

Details for the file agift_graph-0.1.1.tar.gz.

File metadata

  • Download URL: agift_graph-0.1.1.tar.gz
  • Size: 28.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for agift_graph-0.1.1.tar.gz
Algorithm Hash digest
SHA256 c87904202f8ed491a3db46d77b74f66ac5b33f7ac3eaa0b9a0f3cb8a4e44d9bc
MD5 33fffccf3c13df932048b68d0c65381c
BLAKE2b-256 892ed7d0b249a44da86413a6540af74380801edb08f4705fcc92b026419102e1
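To check a downloaded file against the SHA256 digest above, here is a minimal standard-library sketch. The helper names are ours, and the file is read in chunks so large downloads need not fit in memory.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Iterator


def sha256_chunks(chunks: Iterable[bytes]) -> str:
    """Hex SHA-256 digest of a stream of byte chunks."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def read_chunks(path: Path, size: int = 1 << 16) -> Iterator[bytes]:
    """Yield a file's contents in fixed-size chunks."""
    with path.open("rb") as f:
        while chunk := f.read(size):
            yield chunk


def verify(path: Path, expected_hex: str) -> bool:
    return sha256_chunks(read_chunks(path)) == expected_hex.lower()


# SHA256 published above for the source distribution.
SDIST_SHA256 = "c87904202f8ed491a3db46d77b74f66ac5b33f7ac3eaa0b9a0f3cb8a4e44d9bc"
# Example, assuming the sdist was downloaded into the current directory:
#   print(verify(Path("agift_graph-0.1.1.tar.gz"), SDIST_SHA256))
```

pip can enforce the same check at install time. Pin `agift-graph==0.1.1 --hash=sha256:<digest>` in a requirements file and install with `--require-hashes`; pip then expects hashes for every dependency as well.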


Provenance

The following attestation bundles were made for agift_graph-0.1.1.tar.gz:

Publisher: publish.yml on DeepCivic/AGIFT-graph-builder

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file agift_graph-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: agift_graph-0.1.1-py3-none-any.whl
  • Size: 25.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for agift_graph-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 6061fed9ce5aaaa4b298ee8d3ac8a74e912685f69c5eab2823b734c284ea9162
MD5 6e19c4bd5c0b23d662b4f44fe6221377
BLAKE2b-256 9a104c1a8fb5bdd78822c8c2f7903013fb0bd0a916b712aba219d07d6434d0d2


Provenance

The following attestation bundles were made for agift_graph-0.1.1-py3-none-any.whl:

Publisher: publish.yml on DeepCivic/AGIFT-graph-builder

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
