ArangoImport

A high-performance tool for importing Neo4j JSONL graph data exports into ArangoDB.

Features

  • Import Neo4j database exports into ArangoDB
  • Efficient parallel processing of large JSONL files
  • Support for both local and Docker ArangoDB instances
  • Dynamic memory management and batch sizing
  • Connection pooling for optimal performance
  • Progress tracking and detailed logging
  • Available as both CLI tool and Python package

Installation

pip install arangoimport

Quick Start

  1. Export your Neo4j database to JSONL with the APOC export procedure:

    CALL apoc.export.json.all("path/to/export.jsonl", {useTypes: true})
    
  2. Import into ArangoDB using either method:

    A. Command Line Interface (CLI)

    After installation, the arangoimport command is available in your terminal:

    # Show help and available options
    arangoimport --help
    
    # Import data with default settings (will prompt for password)
    arangoimport import-data /path/to/neo4j_export.jsonl
    
    # Import with custom settings
    arangoimport import-data /path/to/neo4j_export.jsonl \
        --db-name my_graph \
        --host arangodb.example.com \
        --port 8530 \
        --username graph_user
    

    B. Python API

    from arangoimport.connection import ArangoConfig
    from arangoimport.importer import parallel_load_data
    
    # Configure database connection
    db_config = ArangoConfig(
        host="localhost",
        port=8529,
        username="root",
        password="your_password",  # Or use ARANGO_PASSWORD env var
        db_name="db_name"
    )
    
    # Import the data
    nodes, edges = parallel_load_data(
        "path/to/neo4j_export.jsonl",
        dict(db_config),
        num_processes=None  # None means use (CPU count - 1)
    )
    
    print(f"Successfully imported {nodes:,} nodes and {edges:,} edges!")
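
Before running an import, it can help to check that the export file parses and contains roughly the number of records you expect. The following is a minimal sketch, not part of arangoimport: it assumes each line of the export is a JSON object whose "type" field is "node" or "relationship" (the layout APOC's JSON export is expected to produce), and `count_record_types` is a hypothetical helper name. The sample records are hand-written for illustration.

```python
import json
import os
import tempfile
from collections import Counter


def count_record_types(path):
    """Return a Counter of the "type" field across all lines of a JSONL file."""
    counts = Counter()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            counts[record.get("type", "unknown")] += 1
    return counts


# Tiny hand-written sample in the assumed APOC layout (not real data).
sample_lines = [
    '{"type": "node", "id": "0", "labels": ["Person"], "properties": {"name": "Ada"}}',
    '{"type": "node", "id": "1", "labels": ["Person"], "properties": {"name": "Bob"}}',
    '{"type": "relationship", "id": "0", "label": "KNOWS", '
    '"start": {"id": "0"}, "end": {"id": "1"}}',
]

with tempfile.NamedTemporaryFile(
    "w", suffix=".jsonl", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("\n".join(sample_lines) + "\n")

counts = count_record_types(tmp.name)
os.remove(tmp.name)
print(f"{counts['node']:,} nodes, {counts['relationship']:,} relationships")
```

Comparing these counts with the totals returned by the import is a quick way to spot truncated or partially written export files.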
    

Environment Variables

  • ARANGO_PASSWORD: Database password (avoid hardcoding in scripts)
  • ARANGO_USER: Username (default: root)
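
If a script builds its connection settings explicitly, the credentials can be read from these variables rather than written into the source. The sketch below is an illustration under assumptions: `connection_settings_from_env` is a hypothetical helper, and the dictionary keys simply mirror the keyword arguments shown for ArangoConfig in the Quick Start. Host, port, and database name are placeholders.

```python
import os


def connection_settings_from_env(environ=os.environ):
    """Build connection settings, taking credentials from the environment."""
    password = environ.get("ARANGO_PASSWORD")
    if not password:
        raise RuntimeError("Set ARANGO_PASSWORD before running the import")
    return {
        "host": "localhost",      # placeholder
        "port": 8529,             # placeholder
        "username": environ.get("ARANGO_USER", "root"),
        "password": password,
        "db_name": "db_name",     # placeholder
    }


# Demonstrate with an explicit mapping instead of the real environment.
settings = connection_settings_from_env({"ARANGO_PASSWORD": "example-secret"})
print(settings["username"])  # falls back to "root" because ARANGO_USER is unset
```

Failing early when the password is missing avoids starting a long import that would only stop at the first authentication error.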

CLI Options

General Options

  • --file <string>: The file to import ("-" for stdin)
  • --type <string>: Input format (auto/csv/json/jsonl/tsv, default: auto)
  • --collection <string>: Target collection name
  • --create-collection <boolean>: Create collection if missing (default: false)
  • --create-collection-type <string>: Collection type if created (document/edge, default: document)
  • --create-database <boolean>: Create database if missing (default: false)
  • --threads <uint32>: Number of parallel import threads (default: 32)
  • --batch-size <uint64>: Data batch size in bytes (default: 8MB)
  • --progress <boolean>: Show progress (default: true)

Server Connection

  • --server.database <string>: Target database (default: "_system")
  • --server.endpoint <string>: Server endpoint (default: "http+tcp://127.0.0.1:8529")
  • --server.username <string>: Username (default: "root")
  • --server.password <string>: Password (prompted if not provided)
  • --server.authentication <boolean>: Require authentication (default: true)

Performance Options

  • --auto-rate-limit <boolean>: Auto-adjust loading rate (default: false)
  • --compress-transfer <boolean>: Compress data transfer (default: false)
  • --max-errors <uint64>: Maximum errors before stopping (default: 20)
  • --skip-validation <boolean>: Skip schema validation (default: false)

For a complete list of options, run:

arangoimport --help

Docker Support

When using Docker, make sure your ArangoDB container is running and its port is published to the host:

docker run -p 8529:8529 -e ARANGO_ROOT_PASSWORD=yourpassword arangodb:latest

Then run the import with either the CLI or the Python API, pointing it at the published port (8529 in the command above).
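
A freshly started container may take a few seconds before it accepts connections. The sketch below, which is not part of arangoimport, polls the published port with a plain TCP connection before an import is started. `wait_for_port` is a hypothetical helper, and a successful TCP connection only shows that the port is open, not that the database has finished initializing.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection succeeds, or False after the timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError on refusal or timeout.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example call (commented out so the sketch has no side effects when run):
# if not wait_for_port("localhost", 8529):
#     raise SystemExit("ArangoDB did not become reachable on port 8529")
```

Calling this helper before the CLI or `parallel_load_data` keeps a start-up delay from turning into a connection error at the beginning of the import.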

Performance Tuning

By default, the importer adapts its settings to:

  • Available system memory
  • CPU cores (uses CPU count - 1 by default)
  • Network conditions

You can fine-tune performance with:

  • --threads: Control parallel threads
  • --batch-size: Adjust batch size
  • --auto-rate-limit: Enable automatic rate limiting
  • --compress-transfer: Enable data compression
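
To make the two main knobs concrete, here is a minimal sketch of a "CPU count - 1" worker default and of splitting records into fixed-size batches. It illustrates the general technique only and is not arangoimport's implementation. Both function names are hypothetical, and the batching counts records for simplicity, whereas --batch-size is specified in bytes.

```python
import os
from itertools import islice


def default_process_count():
    """CPU count minus one, but never fewer than one process."""
    return max(1, (os.cpu_count() or 1) - 1)


def batched(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


batches = list(batched(range(7), 3))
print(default_process_count(), batches)
```

Larger batches mean fewer round trips to the server but more memory held per worker, so the two settings are usually tuned together.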

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
