Australian Government Interactive Functions Thesaurus (AGIFT) as a Neo4j knowledge graph with embeddings and dual edge types
AGIFT Graph
What it does
Fetches the full AGIFT vocabulary from the TemaTres API, builds a graph with structural hierarchy edges, generates embeddings (free local or Isaacus API), then creates semantic similarity edges between related terms.
TemaTres API ──► Graph ──► Embeddings ──► Semantic Edges
(AGIFT) (PARENT_OF) (384/512/768d) (SIMILAR_TO)
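The last stage of the pipeline can be sketched in a few lines of plain Python. This is a minimal illustration of thresholded cosine similarity, not the project's actual `link.py`: the function names `cosine_similarity` and `semantic_edges`, the toy 3-dimensional vectors, and the term labels are all assumptions made for the example, and the 0.65 threshold is borrowed from the CLI example rather than being a documented default.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def semantic_edges(embeddings, threshold=0.65):
    """Return (term_a, term_b, score) for every pair scoring above threshold."""
    edges = []
    # Sorting the items makes the pair order deterministic.
    for (term_a, vec_a), (term_b, vec_b) in combinations(sorted(embeddings.items()), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score > threshold:
            edges.append((term_a, term_b, round(score, 3)))
    return edges


# Toy vectors (hypothetical); real embeddings have hundreds of dimensions.
toy = {
    "Health care": [0.9, 0.1, 0.0],
    "Hospital services": [0.8, 0.2, 0.1],
    "Road transport": [0.0, 0.1, 0.9],
}
edges = semantic_edges(toy, threshold=0.65)
print(edges)
```

Comparing every pair is quadratic in the number of terms, which is acceptable for a vocabulary of this size but would need batching or an index for much larger sets.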
Graph model
Two edge types with different weights for query-time flexibility:
| Edge | Type | Weight | Description |
|---|---|---|---|
| PARENT_OF | structural | 1.0 | AGIFT hierarchy (L1 → L2 → L3) |
| SIMILAR_TO | semantic | 0.5 | Cosine similarity above threshold |
Nodes carry DCAT-AP theme mappings for interoperability with European open data standards.
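To show what query-time flexibility could mean in practice, here is a small in-memory sketch that ranks a term's neighbours by edge weight and optionally restricts the edge types considered. It does not query Neo4j or CogDB; the `EDGES` list, the `neighbours` helper, and the term names are hypothetical, and only the edge type names and weights come from the table above.

```python
from collections import defaultdict

# (source, target, edge_type, weight); weights follow the graph model table.
# The terms themselves are illustrative, not taken from AGIFT.
EDGES = [
    ("Health care", "Hospital services", "PARENT_OF", 1.0),
    ("Health care", "Community health", "PARENT_OF", 1.0),
    ("Hospital services", "Aged care", "SIMILAR_TO", 0.5),
]


def neighbours(term, edge_types=("PARENT_OF", "SIMILAR_TO"), min_weight=0.0):
    """Rank terms directly connected to `term`, filtered by edge type and weight.

    Edges are followed in both directions, so a child also sees its parent.
    """
    scores = defaultdict(float)
    for source, target, edge_type, weight in EDGES:
        if edge_type not in edge_types or weight < min_weight:
            continue
        if source == term:
            scores[target] = max(scores[target], weight)
        elif target == term:
            scores[source] = max(scores[source], weight)
    # Highest weight first, then alphabetical for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


print(neighbours("Hospital services"))
print(neighbours("Hospital services", edge_types=("PARENT_OF",)))
```

Because structural edges carry the higher weight, a hierarchy-only query and a blended query can share the same ranking code and differ only in the filter arguments.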
Quick start
Path A — Docker (zero config)
pip install agift-graph[all]
docker compose up -d # starts Neo4j on localhost:7687
agift # fetches AGIFT, builds graph, embeds
Path B — existing Neo4j
pip install agift-graph[all]
export NEO4J_URI=bolt://my-server:7687
export NEO4J_USER=neo4j
export NEO4J_PASSWORD=mypassword
agift
The CLI reads env vars with sensible defaults (bolt://localhost:7687, neo4j/changeme) that match the included docker-compose.yml.
Path C — CogDB (embedded, no server)
CogDB is a persistent embedded graph database written in pure Python. No server process, no Docker — data is stored to local files.
pip install agift-graph[cogdb]
agift --backend cogdb # stores graph in ./agift_cogdb_data/
agift --backend cogdb --cogdb-dir /path/to/data
The pipeline runs the same stages as with Neo4j: fetch, embed, and semantic edges. Terms, edges, and embeddings are stored in CogDB as triples with JSON property blobs. To control the storage location, set the COGDB_DATA_DIR environment variable or pass --cogdb-dir.
Install extras
| Install | What you get | Size |
|---|---|---|
| pip install agift-graph | Neo4j driver + fetch + graph build | Lightweight |
| pip install agift-graph[cogdb] | + CogDB embedded graph backend | Small |
| pip install agift-graph[embeddings] | + sentence-transformers + torch | ~2 GB |
| pip install agift-graph[isaacus] | + Isaacus API client | Small |
| pip install agift-graph[all] | Everything (Neo4j + CogDB + embeddings + Isaacus) | ~2 GB |
Embedding providers
| Provider | Cost | Dimensions | Setup |
|---|---|---|---|
| local (sentence-transformers) | Free | 384, 768 | Nothing — runs on CPU |
| isaacus (kanon-2-embedder) | Paid | 256–1792 | Set API key in dashboard |
The local provider uses all-MiniLM-L6-v2 (384d) or all-mpnet-base-v2 (768d). Models are downloaded on first run and cached.
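As a rough illustration of how the dimension setting maps to a model, the sketch below pairs each supported dimension with the model named above and then embeds a list of strings with sentence-transformers. The helper names `pick_local_model` and `embed_terms` are assumptions for this example and are not part of the agift API; only `SentenceTransformer` and its `encode` method come from the sentence-transformers library.

```python
# Dimension -> model name, as listed for the local provider above.
LOCAL_MODELS = {
    384: "all-MiniLM-L6-v2",
    768: "all-mpnet-base-v2",
}


def pick_local_model(dimension):
    """Return the sentence-transformers model name for a supported dimension."""
    if dimension not in LOCAL_MODELS:
        supported = ", ".join(str(d) for d in sorted(LOCAL_MODELS))
        raise ValueError(
            f"local provider supports {supported} dimensions, got {dimension}"
        )
    return LOCAL_MODELS[dimension]


def embed_terms(terms, dimension=384):
    """Embed a list of strings; needs the agift-graph[embeddings] extra."""
    # Imported lazily so pick_local_model() works without torch installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(pick_local_model(dimension))
    # encode() returns an array with one row per input string.
    return model.encode(terms)


print(pick_local_model(384))
```

Calling `embed_terms(["Health care", "Road transport"])` downloads the model on first use, which matches the caching behaviour described above.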
Programmatic usage
from agift import run_pipeline
# Run the full pipeline with Neo4j (default)
run_pipeline(provider="local", dimension=384)
# Explicit Neo4j connection
run_pipeline(
neo4j_uri="bolt://localhost:7687",
neo4j_user="neo4j",
neo4j_password="changeme",
skip_embed=True,
)
# Use CogDB instead of Neo4j
run_pipeline(backend_type="cogdb", provider="local", dimension=384)
Configuration
| Variable | Default | Description |
|---|---|---|
| NEO4J_URI | bolt://localhost:7687 | Neo4j connection URI |
| NEO4J_USER | neo4j | Neo4j username |
| NEO4J_PASSWORD | changeme | Neo4j password |
| COGDB_DATA_DIR | agift_cogdb_data | CogDB storage directory |
| ISAACUS_API_KEY | (empty) | Isaacus API key (optional) |
All other settings (dimension, provider, similarity threshold, semantic edge weight) are configured via the dashboard UI and stored in the graph backend.
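The environment variables in the table can be resolved with a few lines of standard-library Python. This is a sketch of the general pattern, not the code agift itself uses: `DEFAULTS` and `load_settings` are hypothetical names, while the variable names and default values are copied from the table.

```python
import os

# Variable names and defaults copied from the configuration table.
DEFAULTS = {
    "NEO4J_URI": "bolt://localhost:7687",
    "NEO4J_USER": "neo4j",
    "NEO4J_PASSWORD": "changeme",
    "COGDB_DATA_DIR": "agift_cogdb_data",
    "ISAACUS_API_KEY": "",
}


def load_settings(environ=None):
    """Return every setting, preferring the environment over the default."""
    env = os.environ if environ is None else environ
    return {name: env.get(name, default) for name, default in DEFAULTS.items()}


# Passing a dict instead of os.environ keeps the example deterministic.
settings = load_settings({"NEO4J_URI": "bolt://my-server:7687"})
print(settings["NEO4J_URI"], settings["NEO4J_USER"])
```

Accepting the environment as a parameter makes the function easy to exercise in tests without touching the real process environment.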
Full Docker stack
The included docker-compose.yml runs Neo4j and the AGIFT container (dashboard + cron worker):
docker compose up -d --build
Then open the dashboard at http://localhost:5050 and click "Full Pipeline" or "Graph Only".
| Service | Port | Description |
|---|---|---|
| Neo4j Browser | 7474 | Graph database UI |
| Neo4j Bolt | 7687 | Database protocol |
| AGIFT | 5050 | Dashboard, pipeline runner, cron worker |
The AGIFT_MODE env var controls container behaviour:
| Mode | Description |
|---|---|
| dashboard (default) | Gunicorn web server + optional cron |
| worker | Cron only, no web server |
| cli | Run pipeline once and exit |
CLI usage
# Full pipeline (fetch + graph + embed + semantic edges)
agift
# Use CogDB instead of Neo4j
agift --backend cogdb
agift --backend cogdb --cogdb-dir /path/to/data
# Graph only (no embeddings)
agift --skip-embed --skip-semantic
# Local embeddings, 384 dimensions
agift --provider local --dimension 384
# Force re-embed all terms
agift --force-embed
# Custom similarity threshold for semantic edges
agift --threshold 0.65
# Faster run: skip alt label fetching
agift --skip-alt
# Dry run (fetch from API, no writes)
agift --dry-run
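When the CLI is driven from a scheduler or another script, it can help to assemble the argument list in code. The wrapper below is a hypothetical convenience, not something shipped with agift: `build_agift_command` and `run_agift` are invented names, and the only flags used are the ones shown in the CLI examples above.

```python
import subprocess


def build_agift_command(backend=None, cogdb_dir=None, provider=None,
                        dimension=None, threshold=None, skip_embed=False,
                        skip_semantic=False, skip_alt=False,
                        force_embed=False, dry_run=False):
    """Translate keyword options into an argv list for the agift CLI."""
    cmd = ["agift"]
    # Flags that take a value are only added when a value was given.
    valued = (
        ("--backend", backend),
        ("--cogdb-dir", cogdb_dir),
        ("--provider", provider),
        ("--dimension", dimension),
        ("--threshold", threshold),
    )
    for flag, value in valued:
        if value is not None:
            cmd += [flag, str(value)]
    # Boolean switches are added only when enabled.
    switches = (
        ("--skip-embed", skip_embed),
        ("--skip-semantic", skip_semantic),
        ("--skip-alt", skip_alt),
        ("--force-embed", force_embed),
        ("--dry-run", dry_run),
    )
    cmd += [flag for flag, enabled in switches if enabled]
    return cmd


def run_agift(**options):
    """Run the CLI and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_agift_command(**options), check=True)


print(build_agift_command(backend="cogdb", threshold=0.65, skip_alt=True))
```

Passing a list to `subprocess.run` avoids shell quoting issues with paths given to `--cogdb-dir`.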
Project structure
agift/
├── __init__.py # Public API exports
├── backend.py # GraphBackend abstract interface
├── neo4j_backend.py # Neo4j backend implementation
├── cogdb_backend.py # CogDB backend implementation
├── cli.py # CLI entry point + run_pipeline()
├── common.py # Constants, backend factory, summary
├── fetch.py # TemaTres API fetching (concurrent)
├── graph.py # Schema setup + node/edge upsert
├── embed.py # Embedding providers (local + Isaacus)
├── link.py # Cosine similarity + semantic edges
Dockerfile # Unified image (dashboard + worker + CLI)
entrypoint.sh # Container mode dispatch
docker-compose.yml # Full stack (Neo4j + AGIFT container)
dashboard/
├── app.py # Flask dashboard + in-process pipeline
├── templates/index.html
import_agift.py # Backward-compatible entry point
pyproject.toml
CHANGELOG.md # Version history (release notes source)
release.sh # Version bump + tag helper
LICENSE # Apache 2.0
Data source
AGIFT is maintained by the National Archives of Australia and published via TemaTres at https://vocabularyserver.com/agift/
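For readers who want to look at the raw vocabulary, the sketch below builds request URLs for the TemaTres web services and loads a response as JSON. Several details are assumptions rather than facts from this project: the `services.php` path, the `task`, `arg`, and `output` parameters, and the `fetchTopTerms` and `fetchDown` task names follow the usual TemaTres conventions and may differ from what `fetch.py` actually calls. The helper names `service_url` and `call_service` are hypothetical, and the response is returned unparsed because its structure is not described here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://vocabularyserver.com/agift/"  # from the data source above
SERVICES_PATH = "services.php"  # assumption: conventional TemaTres endpoint


def service_url(task, arg=None, output="json"):
    """Build a TemaTres web service URL for the given task."""
    params = {"task": task, "output": output}
    if arg is not None:
        params["arg"] = arg
    return f"{BASE_URL}{SERVICES_PATH}?{urlencode(params)}"


def call_service(task, arg=None, timeout=30):
    """Fetch one service response and decode it as JSON (needs network access)."""
    with urlopen(service_url(task, arg), timeout=timeout) as response:
        return json.load(response)


# Assumed task names: top-level terms, then the children of term 12.
print(service_url("fetchTopTerms"))
print(service_url("fetchDown", arg=12))
```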
License
Apache 2.0 — see LICENSE.
Publishing
This project uses a tag-based release workflow with a single CI definition:
- ci.yml is the single source of truth for lint, test, and Docker checks
- publish.yml calls ci.yml as a reusable workflow, so no jobs are duplicated
Pipeline sequence on tag push:
- CI runs (lint + test + Docker tests)
- Build PyPI package
- Publish to PyPI
- Build and push Docker image to Docker Hub (deepcivic/agift)
- Create GitHub Release with changelog entry
To release:
- Update CHANGELOG.md with a new version entry
- Run ./release.sh 0.2.0 (updates pyproject.toml, commits, tags)
- Push: git push origin main --tags
Secrets required in GitHub repo settings:
- DOCKERHUB_USERNAME and DOCKERHUB_TOKEN (repo-level secrets)
- PyPI trusted publishing is configured via OIDC (no secret needed)
Changelog
See CHANGELOG.md for version history.