A simple client for submitting, downloading, and deleting data on the DBpedia Databus
Databus Python Client
Command-line and Python client for downloading and deploying datasets on DBpedia Databus.
Table of Contents
- Quickstart
- DBpedia
- CLI Usage
- Module Usage
- Development & Contributing
Quickstart
The client supports two main workflows: downloading datasets from the Databus and deploying datasets to the Databus. Below you can choose how to run it (Python or Docker), then follow the sections on DBpedia downloads, CLI usage, or module usage.
You can use either Python or Docker. Both methods support all client features. The Docker image is available at dbpedia/databus-python-client.
Python
Requirements: Python 3.11+ and pip
Before using the client, install it via pip:
python3 -m pip install databusclient
Note: the PyPI release has been updated and this repository tracks version 0.15. If you previously installed databusclient via pip and observe different CLI behavior, upgrade to the latest release:
python3 -m pip install --upgrade databusclient==0.15
You can then use the client in the command line:
databusclient --help
databusclient deploy --help
databusclient delete --help
databusclient download --help
Docker
Requirements: Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client --help
docker run --rm -v $(pwd):/data dbpedia/databus-python-client deploy --help
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download --help
DBpedia
Commands to download the DBpedia Knowledge Graphs generated by Live Fusion. DBpedia Live Fusion publishes two kinds of KGs:
- Open Core Knowledge Graphs under the CC-BY-SA license: open with copyleft/share-alike, no registration needed.
- Industry Knowledge Graphs under the BUSL 1.1 license: unrestricted for research and experimentation, commercial license required for productive use, free registration needed.
Registration (Access Token)
To download BUSL 1.1 licensed datasets, you need to register and get an access token.
- If you do not have a DBpedia Account yet (Forum/Databus), please register at https://account.dbpedia.org
- Log in at https://account.dbpedia.org and create your token.
- Save the token to a file, e.g. vault-token.dat.
DBpedia Knowledge Graphs
Download Live Fusion KG Dump (BUSL 1.1, registration needed)
High-frequency, conflict-resolved knowledge graph that merges Live Wikipedia and Wikidata signals into a single, queryable dump for enterprise consumption. More information
# Python
databusclient download https://databus.dbpedia.org/dbpedia-enterprise/live-fusion-kg-dump --vault-token vault-token.dat
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia-enterprise/live-fusion-kg-dump --vault-token vault-token.dat
Download Enriched Knowledge Graphs (BUSL 1.1, registration needed)
DBpedia Wikipedia Extraction Enriched
DBpedia-based enrichment of structured Wikipedia extractions (currently EN DBpedia only). More information
# Python
databusclient download https://databus.dbpedia.org/dbpedia-enterprise/dbpedia-wikipedia-kg-enriched-dump --vault-token vault-token.dat
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia-enterprise/dbpedia-wikipedia-kg-enriched-dump --vault-token vault-token.dat
Download DBpedia Wikipedia Knowledge Graphs (CC-BY-SA, no registration needed)
Original extraction of structured Wikipedia data before enrichment. More information
# Python
databusclient download https://databus.dbpedia.org/dbpedia/dbpedia-wikipedia-kg-dump
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia/dbpedia-wikipedia-kg-dump
Download DBpedia Wikidata Knowledge Graphs (CC-BY-SA, no registration needed)
Original extraction of structured Wikidata data before enrichment. More information
# Python
databusclient download https://databus.dbpedia.org/dbpedia/dbpedia-wikidata-kg-dump
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia/dbpedia-wikidata-kg-dump
CLI Usage
To get started with the command-line interface (CLI) of the databus-python-client, you can use either the Python installation or the Docker image. The examples below show both methods.
Help and further general information:
# Python
databusclient --help
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client --help
# Output:
Usage: databusclient [OPTIONS] COMMAND [ARGS]...
Databus Client CLI
Options:
--help Show this message and exit.
Commands:
deploy Flexible deploy to Databus command supporting three modes:
download Download datasets from databus, optionally using vault access...
Download
With the download command, you can download datasets or parts thereof from the Databus. The download command expects one or more Databus URIs or a SPARQL query as arguments. The URIs can point to files, versions, artifacts, groups, or collections. If a SPARQL query is provided, the query must return download URLs from the Databus which will be downloaded.
# Python
databusclient download $DOWNLOADTARGET
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download $DOWNLOADTARGET
- $DOWNLOADTARGET: Can be any Databus URI (pointing to a file, version, artifact, group, or collection) or a SPARQL query; several targets can be given at once.
- --localdir: If no --localdir is provided, the current working directory is used as base directory. The downloaded files will be stored in the working directory in a folder structure according to the Databus layout, i.e. ./$ACCOUNT/$GROUP/$ARTIFACT/$VERSION/.
- --vault-token: If the dataset/files to be downloaded require vault authentication, you need to provide a vault token with --vault-token /path/to/vault-token.dat. See Registration (Access Token) for details on how to get a vault token. Note: Vault tokens are only required for certain protected Databus hosts (for example: data.dbpedia.io, data.dev.dbpedia.link). The client detects those hosts and will fail early with a clear message if a token is required but not provided. Do not pass --vault-token for public downloads.
- --databus-key: If the Databus is protected and needs API key authentication, you can provide the API key with --databus-key YOUR_API_KEY.
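To illustrate the local folder layout described above, the target path of a downloaded file can be derived from its Databus URI roughly like this. This is a sketch for illustration, not the client's actual implementation:

```python
from urllib.parse import urlparse
from pathlib import Path

def local_path(databus_file_uri: str, localdir: str = ".") -> Path:
    """Map a Databus file URI to the ./$ACCOUNT/$GROUP/$ARTIFACT/$VERSION/ layout."""
    # The last five path segments of a file URI are account/group/artifact/version/filename
    parts = urlparse(databus_file_uri).path.strip("/").split("/")
    account, group, artifact, version, filename = parts[-5:]
    return Path(localdir) / account / group / artifact / version / filename

print(local_path(
    "https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01/mappingbased-literals_lang=az.ttl.bz2"
))
```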
Help and further information on download command:
# Python
databusclient download --help
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download --help
# Output:
Usage: databusclient download [OPTIONS] DATABUSURIS...
Download datasets from databus, optionally using vault access if vault
options are provided.
Options:
--localdir TEXT Local databus folder (if not given, databus folder
structure is created in current working directory)
--databus TEXT Databus URL (if not given, inferred from databusuri,
e.g. https://databus.dbpedia.org/sparql)
--vault-token TEXT Path to Vault refresh token file
--databus-key TEXT Databus API key to download from protected databus
--all-versions When downloading artifacts, download all versions
instead of only the latest
--authurl TEXT Keycloak token endpoint URL [default:
https://auth.dbpedia.org/realms/dbpedia/protocol/openid-
connect/token]
--clientid TEXT Client ID for token exchange [default: vault-token-
exchange]
--help Show this message and exit.
Examples of using the download command
Download File: download of a single file
# Python
databusclient download https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01/mappingbased-literals_lang=az.ttl.bz2
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01/mappingbased-literals_lang=az.ttl.bz2
Download Version: download of all files of a specific version
# Python
databusclient download https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01
Download Artifact: download of all files with the latest version of an artifact
# Python
databusclient download https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals
Download Group: download of all files with the latest version of all artifacts of a group
# Python
databusclient download https://databus.dbpedia.org/dbpedia/mappings
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia/mappings
Download Collection: download of all files within a collection
# Python
databusclient download https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-12
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-12
Download Query: download of all files returned by a query (SPARQL endpoint must be provided with --databus)
# Python
databusclient download 'PREFIX dcat: <http://www.w3.org/ns/dcat#> SELECT ?x WHERE { ?sub dcat:downloadURL ?x . } LIMIT 10' --databus https://databus.dbpedia.org/sparql
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download 'PREFIX dcat: <http://www.w3.org/ns/dcat#> SELECT ?x WHERE { ?sub dcat:downloadURL ?x . } LIMIT 10' --databus https://databus.dbpedia.org/sparql
Deploy
With the deploy command, you can deploy datasets to the Databus. The deploy command supports three modes:
- Classic dataset deployment via list of distributions
- Metadata-based deployment via metadata JSON file
- Upload & deploy via Nextcloud/WebDAV
# Python
databusclient deploy [OPTIONS] [DISTRIBUTIONS]...
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client deploy [OPTIONS] [DISTRIBUTIONS]...
Help and further information on deploy command:
# Python
databusclient deploy --help
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client deploy --help
# Output:
Usage: databusclient deploy [OPTIONS] [DISTRIBUTIONS]...
Flexible deploy to Databus command supporting three modes:
- Classic deploy (distributions as arguments)
- Metadata-based deploy (--metadata <file>)
- Upload & deploy via Nextcloud (--webdav-url, --remote, --path)
Options:
--version-id TEXT Target databus version/dataset identifier of the form <h
ttps://databus.dbpedia.org/$ACCOUNT/$GROUP/$ARTIFACT/$VE
RSION> [required]
--title TEXT Dataset title [required]
--abstract TEXT Dataset abstract max 200 chars [required]
--description TEXT Dataset description [required]
--license TEXT License (see dalicc.net) [required]
--apikey TEXT API key [required]
--metadata PATH Path to metadata JSON file (for metadata mode)
--webdav-url TEXT WebDAV URL (e.g.,
https://cloud.example.com/remote.php/webdav)
--remote TEXT rclone remote name (e.g., 'nextcloud')
--path TEXT Remote path on Nextcloud (e.g., 'datasets/mydataset')
--help Show this message and exit.
Mode 1: Classic Deploy (Distributions)
# Python
databusclient deploy \
--version-id https://databus.dbpedia.org/user1/group1/artifact1/2022-05-18 \
--title "Client Testing" \
--abstract "Testing the client...." \
--description "Testing the client...." \
--license http://dalicc.net/licenselibrary/AdaptivePublicLicense10 \
--apikey MYSTERIOUS \
'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger'
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client deploy \
--version-id https://databus.dbpedia.org/user1/group1/artifact1/2022-05-18 \
--title "Client Testing" \
--abstract "Testing the client...." \
--description "Testing the client...." \
--license http://dalicc.net/licenselibrary/AdaptivePublicLicense10 \
--apikey MYSTERIOUS \
  'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger'
A few more notes for CLI usage:
- The content variants can be left out ONLY IF there is just one distribution.
- For complete inference, just use the plain URL: https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml
- If other parameters are used, leave the unused ones empty, like https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml||yml|7a751b6dd5eb8d73d97793c3c564c71ab7b565fa4ba619e4a8fd05a6f80ff653:367116
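Judging from the examples above, the distribution argument appears to be pipe-separated segments (URL, content variants, file format, sha256:length). A tiny helper to assemble such a string, purely illustrative; the segment layout is inferred from the README examples, so verify it against the client before relying on it:

```python
def distribution_arg(url, cvs="", file_format="", checksum="", size=None):
    """Join the pipe-separated distribution argument shown in the examples above.

    Segment layout (inferred): url | content variants | file format | sha256:length.
    Trailing empty segments are dropped, matching the minimal 'url|type=swagger' form.
    """
    parts = [url, cvs, file_format]
    if checksum and size is not None:
        parts.append(f"{checksum}:{size}")
    return "|".join(parts).rstrip("|")
```

For example, distribution_arg("https://example.org/f.yml", cvs="type=swagger") yields the minimal form, while passing file_format, checksum, and size reproduces the fully specified form with an empty content-variant segment.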
Mode 2: Deploy with Metadata File
Use a JSON metadata file to define all distributions. The metadata.json should list all distributions and their metadata. All files referenced there will be registered on the Databus.
# Python
databusclient deploy \
--metadata ./metadata.json \
--version-id https://databus.dbpedia.org/user1/group1/artifact1/1.0 \
--title "Metadata Deploy Example" \
--abstract "This is a short abstract of the dataset." \
--description "This dataset was uploaded using metadata.json." \
--license https://dalicc.net/licenselibrary/Apache-2.0 \
--apikey "API-KEY"
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client deploy \
--metadata ./metadata.json \
--version-id https://databus.dbpedia.org/user1/group1/artifact1/1.0 \
--title "Metadata Deploy Example" \
--abstract "This is a short abstract of the dataset." \
--description "This dataset was uploaded using metadata.json." \
--license https://dalicc.net/licenselibrary/Apache-2.0 \
--apikey "API-KEY"
Example metadata.json metadata file structure (file_format and compression are optional):
[
{
"checksum": "0929436d44bba110fc7578c138ed770ae9f548e195d19c2f00d813cca24b9f39",
"size": 12345,
"url": "https://cloud.example.com/remote.php/webdav/datasets/mydataset/example.ttl",
"file_format": "ttl"
},
{
"checksum": "2238acdd7cf6bc8d9c9963a9f6014051c754bf8a04aacc5cb10448e2da72c537",
"size": 54321,
"url": "https://cloud.example.com/remote.php/webdav/datasets/mydataset/example.csv.gz",
"file_format": "csv",
"compression": "gz"
}
]
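The checksum and size fields can be computed locally before writing metadata.json. A minimal sketch, assuming the remote URL is a placeholder you replace with your actual WebDAV location:

```python
import hashlib
import json
from pathlib import Path

def metadata_entry(local_file: str, remote_url: str) -> dict:
    """Build one metadata.json entry: sha256 checksum, byte size, and remote URL."""
    data = Path(local_file).read_bytes()
    return {
        "checksum": hashlib.sha256(data).hexdigest(),
        "size": len(data),
        "url": remote_url,
    }

# Toy file so the sketch is self-contained; in practice, point at your dataset files
Path("example.ttl").write_text("<s> <p> <o> .\n")

entries = [metadata_entry(
    "example.ttl",
    "https://cloud.example.com/remote.php/webdav/datasets/mydataset/example.ttl",
)]
Path("metadata.json").write_text(json.dumps(entries, indent=2))
```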
Mode 3: Upload & Deploy via Nextcloud
Upload local files or folders to a WebDAV/Nextcloud instance and automatically deploy to DBpedia Databus. Rclone is required.
# Python
databusclient deploy \
--webdav-url https://cloud.example.com/remote.php/webdav \
--remote nextcloud \
--path datasets/mydataset \
--version-id https://databus.dbpedia.org/user1/group1/artifact1/1.0 \
--title "Test Dataset" \
--abstract "Short abstract of dataset" \
--description "This dataset was uploaded for testing the Nextcloud → Databus pipeline." \
--license https://dalicc.net/licenselibrary/Apache-2.0 \
--apikey "API-KEY" \
./localfile1.ttl \
./data_folder
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client deploy \
--webdav-url https://cloud.example.com/remote.php/webdav \
--remote nextcloud \
--path datasets/mydataset \
--version-id https://databus.dbpedia.org/user1/group1/artifact1/1.0 \
--title "Test Dataset" \
--abstract "Short abstract of dataset" \
--description "This dataset was uploaded for testing the Nextcloud → Databus pipeline." \
--license https://dalicc.net/licenselibrary/Apache-2.0 \
--apikey "API-KEY" \
./localfile1.ttl \
./data_folder
Delete
With the delete command you can delete collections, groups, artifacts, and versions from the Databus. Deleting files is not supported via API.
Note: Deleting datasets will recursively delete all data associated with the dataset below the specified level. Please use this command with caution. As a security measure, the delete command will prompt you for confirmation before proceeding with any deletion.
# Python
databusclient delete [OPTIONS] DATABUSURIS...
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete [OPTIONS] DATABUSURIS...
Help and further information on delete command:
# Python
databusclient delete --help
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete --help
# Output:
Usage: databusclient delete [OPTIONS] DATABUSURIS...
Delete a dataset from the databus.
Delete a group, artifact, or version identified by the given databus URI.
Will recursively delete all data associated with the dataset.
Options:
--databus-key TEXT Databus API key to access protected databus [required]
--dry-run Perform a dry run without actual deletion
--force Force deletion without confirmation prompt
--help Show this message and exit.
To authenticate the delete request, you need to provide an API key with --databus-key YOUR_API_KEY.
If you want to perform a dry run without actual deletion, use the --dry-run option. This will show you what would be deleted without making any changes.
As a security measure, the delete command will prompt you for confirmation before proceeding with the deletion. If you want to skip this prompt, you can use the --force option.
Examples of using the delete command
Delete Version: delete a specific version
# Python
databusclient delete https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01 --databus-key YOUR_API_KEY
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01 --databus-key YOUR_API_KEY
Delete Artifact: delete an artifact and all its versions
# Python
databusclient delete https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals --databus-key YOUR_API_KEY
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals --databus-key YOUR_API_KEY
Delete Group: delete a group and all its artifacts and versions
# Python
databusclient delete https://databus.dbpedia.org/dbpedia/mappings --databus-key YOUR_API_KEY
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete https://databus.dbpedia.org/dbpedia/mappings --databus-key YOUR_API_KEY
Delete Collection: delete collection
# Python
databusclient delete https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-12 --databus-key YOUR_API_KEY
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-12 --databus-key YOUR_API_KEY
Module Usage
Deploy
Step 1: Create lists of distributions for the dataset
from databusclient import create_distribution
# create a list
distributions = []
# minimal requirements
# compression and filetype will be inferred from the path
# this will trigger the download of the file to evaluate the shasum and content length
distributions.append(
create_distribution(url="https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml", cvs={"type": "swagger"})
)
# full parameters
# will just place parameters correctly, nothing will be downloaded or inferred
distributions.append(
create_distribution(
url="https://example.org/some/random/file.csv.bz2",
cvs={"type": "example", "realfile": "false"},
file_format="csv",
compression="bz2",
sha256_length_tuple=("7a751b6dd5eb8d73d97793c3c564c71ab7b565fa4ba619e4a8fd05a6f80ff653", 367116)
)
)
A few notes:
- The dict for content variants can be empty ONLY IF there is just one distribution
- There can be no compression if there is no file format
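If you want to avoid the implicit download-and-hash step for remote files you already have locally, the sha256_length_tuple can be computed yourself. A sketch:

```python
import hashlib
from pathlib import Path

def sha256_length_tuple(path: str) -> tuple[str, int]:
    """Return (sha256 hex digest, byte length), the shape expected by create_distribution."""
    h = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so large dump files do not need to fit in memory
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
            size += len(chunk)
    return h.hexdigest(), size
```

The resulting tuple can be passed directly as the sha256_length_tuple argument shown above.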
Step 2: Create dataset
from databusclient import create_dataset
# minimal way
dataset = create_dataset(
version_id="https://dev.databus.dbpedia.org/denis/group1/artifact1/2022-05-18",
title="Client Testing",
abstract="Testing the client....",
description="Testing the client....",
license_url="http://dalicc.net/licenselibrary/AdaptivePublicLicense10",
distributions=distributions,
)
# with group metadata
dataset = create_dataset(
version_id="https://dev.databus.dbpedia.org/denis/group1/artifact1/2022-05-18",
title="Client Testing",
abstract="Testing the client....",
description="Testing the client....",
license_url="http://dalicc.net/licenselibrary/AdaptivePublicLicense10",
distributions=distributions,
group_title="Title of group1",
group_abstract="Abstract of group1",
group_description="Description of group1"
)
NOTE: Group metadata is applied only if all group parameters are set.
Step 3: Deploy to Databus
from databusclient import deploy
# to deploy something you just need the dataset from the previous step and an API key
# API key can be found (or generated) at https://$$DATABUS_BASE$$/$$USER$$#settings
deploy(dataset, "mysterious API key")
Development & Contributing
Install development dependencies yourself or via Poetry:
poetry install --with dev
Linting
The used linter is Ruff. Ruff is configured in pyproject.toml and is enforced in CI (.github/workflows/ruff.yml).
For development, you can run linting locally with ruff check . and optionally auto-format with ruff format ..
To ensure compatibility with the dependencies configured in pyproject.toml, run Ruff via Poetry:
# To check for linting issues:
poetry run ruff check .
# To auto-format code:
poetry run ruff format .
Testing
When developing new features please make sure to add appropriate tests and ensure that all tests pass. Tests are under tests/ and use pytest as test framework.
When fixing bugs or refactoring existing code, please make sure to add tests that cover the affected functionality. The current test coverage is very low, so any additional tests are highly appreciated.
To run tests locally, use:
pytest tests/
Or, to ensure compatibility with the dependencies configured in pyproject.toml, run pytest via Poetry:
poetry run pytest tests/
File details
Details for the file databusclient-0.15.tar.gz.
File metadata
- Download URL: databusclient-0.15.tar.gz
- Upload date:
- Size: 28.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3cbe63202c0fe3853d53495119b136522f6a5b7701d12242fd86882935b4b4c3 |
| MD5 | ea5216cccb178e3e4bff4912897cd6d3 |
| BLAKE2b-256 | 94ed4ed9d970a4065c33e60a40e4b0083090c55e3ebf61086696ebe774ea786a |
File details
Details for the file databusclient-0.15-py3-none-any.whl.
File metadata
- Download URL: databusclient-0.15-py3-none-any.whl
- Upload date:
- Size: 28.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 18ef68cc0a10dcba98f9303b013984cc3fcadf409c7e1a4cbf8bc3767c2265a8 |
| MD5 | 7085166d52307eabb713c44d732cfde3 |
| BLAKE2b-256 | 1447b0f48c52162b4ce80a2ad4cae8b04f987f1f7c838a46fc7b7fcf89981367 |