Package for fetching and loading datasets for ArangoDB deployments.

Project description

ArangoDB Datasets

Package for loading pre-configured graph datasets into an ArangoDB instance.

Installation

pip install arango-datasets

Usage

from arango import ArangoClient
from arango_datasets import Datasets

# Connect to database
db = ArangoClient(hosts=...).db(username=..., password=..., verify=True)

# Connect to datasets
datasets = Datasets(db)

# List datasets
print(datasets.list_datasets())

# List more information about a particular dataset
print(datasets.dataset_info("FLIGHTS"))

# Load a dataset
datasets.load("FLIGHTS")
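The calls above can be combined into a small guard that checks a name before loading it. This is a minimal sketch, not part of the package: `load_if_available` is a hypothetical helper, and it assumes `list_datasets()` returns a collection of dataset names that supports `in`.

```python
from typing import Any


def load_if_available(datasets: Any, name: str) -> None:
    """Load `name` only if the Datasets object reports it as available.

    Hypothetical helper. Assumes `datasets.list_datasets()` returns an
    iterable of dataset names such as "FLIGHTS".
    """
    available = datasets.list_datasets()
    if name not in available:
        raise ValueError(f"Unknown dataset {name!r}; available: {sorted(available)}")
    datasets.load(name)
```

With the `datasets` object from the usage example, `load_if_available(datasets, "FLIGHTS")` would load the dataset, while a misspelled name raises a `ValueError` before any data is fetched.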

Notable Datasets

Synthea P100

Synthea is an open-source synthetic patient dataset that simulates health records for a diverse set of fictional individuals. It includes demographic, clinical, and social data such as diagnoses, medications, procedures, and encounters over a patient’s lifetime. The data is generated using realistic patterns derived from real-world healthcare statistics, enabling its use in research, development, and testing of health IT systems while preserving patient privacy.

Source: https://synthea.mitre.org/

Size: 145514 nodes, 311701 edges

print(datasets.dataset_info("SYNTHEA_P100"))

datasets.load("SYNTHEA_P100")
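After a load, the node and edge counts listed under Size can be compared with what is actually in the database. The sketch below is an illustration rather than a feature of this package: `count_nodes_and_edges` is a hypothetical helper built on python-arango's `db.collections()` and `collection(name).count()`, and it assumes each collection entry is a dict with `name`, `type`, and `system` keys. It counts every non-system collection, so it only matches the listed size when the database holds nothing but the one dataset.

```python
from typing import Any, Dict


def count_nodes_and_edges(db: Any) -> Dict[str, int]:
    """Sum document counts over vertex and edge collections.

    Hypothetical helper. Assumes `db` is a python-arango database handle
    and that `db.collections()` yields dicts with "name", "type"
    ("document" or "edge"), and "system" keys.
    """
    totals = {"nodes": 0, "edges": 0}
    for info in db.collections():
        if info["system"]:
            continue  # skip ArangoDB's internal collections
        count = db.collection(info["name"]).count()
        key = "edges" if info["type"] == "edge" else "nodes"
        totals[key] += count
    return totals
```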

Common Vulnerabilities and Exposures

This dataset contains information on Common Vulnerabilities and Exposures (CVE), providing details on known security vulnerabilities in software and hardware. It includes fields such as CVE ID, descriptions, severity scores (CVSS), affected products, and references. The dataset is useful for cybersecurity research, threat analysis, and vulnerability management, helping organizations track and mitigate security risks.

Source: https://www.kaggle.com/datasets/andrewkronser/cve-common-vulnerabilities-and-exposures

Size: 145506 nodes, 316967 edges

print(datasets.dataset_info("CVE"))

datasets.load("CVE")

Flights

The Flights dataset contains flight-related data, including information on routes, airports, and airlines. It is structured as a graph dataset, where airports act as nodes and flights between them as edges. This dataset is useful for demonstrating graph queries, shortest path analysis, and network connectivity.

Source: https://github.com/arangodb/example-datasets/tree/master/Data%20Loader

Size: 3375 nodes, 286463 edges

print(datasets.dataset_info("FLIGHTS"))

datasets.load("FLIGHTS")
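As an illustration of the shortest path analysis mentioned above, the following sketch runs an AQL `SHORTEST_PATH` query through python-arango's `db.aql.execute`. The helper name `shortest_route`, the edge collection name `flights`, and document IDs such as `airports/JFK` are assumptions made for the example; `datasets.dataset_info("FLIGHTS")` should show the names the dataset really uses.

```python
from typing import Any, List

# AQL shortest path between two vertices, following edges in their stored
# direction. The edge collection is passed as a collection bind parameter.
SHORTEST_PATH_QUERY = """
FOR vertex IN OUTBOUND SHORTEST_PATH @origin TO @destination @@edges
    RETURN vertex._key
"""


def shortest_route(
    db: Any,
    origin: str,
    destination: str,
    edge_collection: str = "flights",  # assumed name, not confirmed by the README
) -> List[str]:
    """Return the keys of the vertices on one shortest path (hypothetical helper)."""
    cursor = db.aql.execute(
        SHORTEST_PATH_QUERY,
        bind_vars={
            "origin": origin,
            "destination": destination,
            "@edges": edge_collection,
        },
    )
    return list(cursor)
```

A call such as `shortest_route(db, "airports/JFK", "airports/LAX")` would return the airport keys along the route if those documents exist. The path is measured in hops, not distance or flight time.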

GDELT Open Intelligence

The GDELT Project (Global Database of Events, Language, and Tone) is an open dataset that monitors global news media in real time. It captures and analyzes events, themes, emotions, and relationships across countries, organizations, and people. Covering millions of articles from various sources, GDELT provides insights into geopolitical trends, conflicts, and societal changes. The dataset is widely used in research, journalism, and AI applications for event tracking and sentiment analysis.

Source: https://www.gdeltproject.org/

Size: 80047 nodes, 321819 edges

print(datasets.dataset_info("OPEN_INTELLIGENCE"))

datasets.load("OPEN_INTELLIGENCE")

Download files

Source Distribution

arango_datasets-1.2.3.tar.gz (12.7 kB)

Uploaded Source

Built Distribution

arango_datasets-1.2.3-py3-none-any.whl (6.1 kB)

Uploaded Python 3

File details

Details for the file arango_datasets-1.2.3.tar.gz.

File metadata

  • Download URL: arango_datasets-1.2.3.tar.gz
  • Size: 12.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.16

File hashes

Hashes for arango_datasets-1.2.3.tar.gz
Algorithm Hash digest
SHA256 d71de3a29e9d52bd4745f0fcddf1b76f1caacdef1616983542076617db436963
MD5 435a4dc59be2870c10547374af3f7e2d
BLAKE2b-256 e4b60eb89eacbb3e28fc49b0e2b6eef41772a9c409cfeadf1c4c53226f97554e
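A downloaded file can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. This is a minimal sketch: the helper names `sha256_of` and `matches_expected` are illustrative, and the path passed in is wherever the archive was saved locally.

```python
import hashlib

# SHA256 digest listed above for arango_datasets-1.2.3.tar.gz.
EXPECTED_SHA256 = "d71de3a29e9d52bd4745f0fcddf1b76f1caacdef1616983542076617db436963"


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare a file's SHA256 digest with the expected hex string."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_expected("arango_datasets-1.2.3.tar.gz")` returns `True` only if the local file's digest equals the published one.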

File details

Details for the file arango_datasets-1.2.3-py3-none-any.whl.

File hashes

Hashes for arango_datasets-1.2.3-py3-none-any.whl
Algorithm Hash digest
SHA256 3922e4dc9e865665dfef5816f582b534b40cd6b30341087e20191ecf7297b083
MD5 22eae80ea6a8488824c1ff3ca3ef1b8e
BLAKE2b-256 11eeb2647a517efee1a2fedf4764e2a97eefe9719399aff9d7ef20d34ebd6b36
