
Document DB import export

A simple utility package to import JSON files into Amazon DocumentDB and export data from DocumentDB collections.

https://github.com/msankhala/docdb-import-export

Sponsor

Crohn's & Colitis Foundation - https://www.crohnscolitisfoundation.org

Roadmap

    • [x] Provide an importer script to import data from JSON files into DocumentDB.
    • [x] Provide a simple Python API to extend the functionality of the package.
    • [ ] Provide an exporter script to export data from DocumentDB collections to JSON files.
    • [ ] Provide an exporter script to export all data from DocumentDB collections to JSON files in a given directory.

Setup

  1. Create an EC2 instance in the same VPC as your DocumentDB cluster.

    See: Connecting to an Amazon DocumentDB Cluster from Outside an Amazon VPC
  2. Run an SSH tunnel through the EC2 instance to your DocumentDB cluster.

    ssh -i <path/to/ec2-private-key.pem> -L 27017:<DOCUMENT-DB-SERVER-HOSTNAME>:27017 ec2-user@<EC2-INSTANCE-DNS-ENDPOINT> -N
    

    Keep this command running in a separate terminal window.

  3. Create a .env file with the following variables and set their values (a sketch of the resulting connection follows this list).

    DOCDB_HOST="YOUR_DOCUMENT_DB_HOSTNAME"
    DOCDB_PORT=YOUR_DOCUMENT_DB_PORT
    DOCDB_USERNAME="YOUR_DOCUMENT_DB_USERNAME"
    DOCDB_PASSWORD="YOUR_DOCUMENT_DB_PASSWORD"
    DOCDB_REPLICA_SET="rs0"
    DOCDB_READ_PREFERENCE="secondaryPreferred"
    DOCDB_RETRY_WRITES="false"
    DOCDB_DBNAME="dbname"
    DOCDB_IS_TLS_CONNECTION="false"
    DOCDB_TLS_CA_FILE_PATH="aws/aws-documentdb-ca-global-bundle.pem"
    DOCDB_TLS_ALLOW_INVALID_HOSTNAMES="false"
    DOCDB_DIRECT_CONNECTION="false"
    COLLECTION_NAME=recipe
    USER_COLLECTION_NAME=user
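
For reference, the sketch below shows one way these variables could be assembled into a pymongo connection. It is illustrative only and an assumption about how the package's DocDbClient uses them, not its actual implementation:

    import os
    from dotenv import load_dotenv
    from pymongo import MongoClient

    load_dotenv()

    use_tls = os.getenv("DOCDB_IS_TLS_CONNECTION", "false") == "true"

    client_kwargs = dict(
        host=os.environ["DOCDB_HOST"],
        port=int(os.getenv("DOCDB_PORT", "27017")),
        username=os.environ["DOCDB_USERNAME"],
        password=os.environ["DOCDB_PASSWORD"],
        replicaSet=os.getenv("DOCDB_REPLICA_SET", "rs0"),
        readPreference=os.getenv("DOCDB_READ_PREFERENCE", "secondaryPreferred"),
        retryWrites=os.getenv("DOCDB_RETRY_WRITES", "false") == "true",
        directConnection=os.getenv("DOCDB_DIRECT_CONNECTION", "false") == "true",
    )
    if use_tls:
        # pymongo treats any tls* option as implying TLS, so only pass these
        # when TLS is actually enabled.
        client_kwargs.update(
            tls=True,
            tlsCAFile=os.getenv("DOCDB_TLS_CA_FILE_PATH"),
            tlsAllowInvalidHostnames=os.getenv("DOCDB_TLS_ALLOW_INVALID_HOSTNAMES", "false") == "true",
        )

    client = MongoClient(**client_kwargs)
    db = client[os.environ["DOCDB_DBNAME"]]
    print(db.list_collection_names())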
    

Usage

  1. Import data from a JSON file into DocumentDB

    python -m docdb_import_export import \
    --env-file=/path/to/.env \
    --fromjson=../my-data-folder/my.json \
    --db=test \
    --collection=temp \
    --drop
    
  2. Import data from a JSON file into DocumentDB using a custom importer class

    python -m docdb_import_export import \
    --env-file=/path/to/.env \
    --fromjson=../my-data-folder/my.json \
    --db=test \
    --collection=temp \
    --import-class=some-dir/MyCustomImporter.py \
    --drop
    

    The importer class filename and class name must match, and the importer class must be a subclass of DocDbDefaultJsonImporter and implement all of its abstract methods.

  3. Import data from a directory of JSON files into DocumentDB

    python -m docdb_import_export import \
    --env-file=/path/to/.env \
    --fromjsondir=../my-data-folder/ \
    --db=test \
    --collection=temp \
    --drop
    
  4. Import data from a directory of JSON files into DocumentDB using a custom importer class

    python -m docdb_import_export import \
    --env-file=/path/to/.env \
    --fromjsondir=../my-data-folder/ \
    --db=test \
    --collection=temp \
    --import-class=some-dir/MyCustomImporter.py \
    --drop
    

    The same requirements apply: the importer class filename and class name must match, and the class must subclass DocDbDefaultJsonImporter and implement all of its abstract methods.

Providing your own custom importer class

Create a custom importer class that extends the DocDbDefaultJsonImporter class and implements all of its abstract methods.

src/some-path/MyCustomImporter.py

import json
from dotenv import load_dotenv
from docdb_import_export.docdb_client import DocDbClient
from docdb_import_export.docdb_json_importer import DocDbDefaultJsonImporter

load_dotenv()

class MyCustomImporter(DocDbDefaultJsonImporter):

  def __init__(self, source_json_file_path, db_name, collection_name, drop_collection, update):
    super().__init__(source_json_file_path, db_name, collection_name, drop_collection, update)

  def import_json(self):
    # Only call delete_collection if you want to support the --drop option.
    self.delete_collection()

    # Read the JSON data from the file (assumed here to be a JSON array of objects).
    with open(self.source_json_file_path) as f:
      json_list = json.load(f)

    # Transform each item before inserting.
    items = [self.transform_item(item) for item in json_list]

    # Insert the items into DocumentDB.
    self.docdb[self.db][self.collection].insert_many(items)
    print("Successfully imported json file: " + self.source_json_file_path)

  # Transform each item before it is inserted, e.g. to add, rename, or
  # remove fields.
  def transform_item(self, item):
    item["_id"] = item["id"]
    del item["id"]
    # Add more transformations here if you want to.
    return item

Example usage:

python -m docdb_import_export import \
  --env-file=src/docdb_import_export/.env \
  --fromjson=../recipe-finder-data/ccf.json \
  --db=test \
  --collection=recipe \
  --import-class=docdb-migration/RecipeImporter.py \
  --drop
This will import the provided json file to the "test" database and "recipe" collection using the custom import class "docdb-migration/RecipeImporter.py". Are you sure you want to continue? [y/N]: y
Importing json file: ../recipe-finder-data/ccf.json
Successfully imported json file: ../recipe-finder-data/ccf.json
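
If you prefer calling the importer from your own script instead of the CLI, you can instantiate it directly. The snippet below is a sketch based on the constructor shown above; the argument values and the import path for MyCustomImporter are assumptions:

    # Assumes MyCustomImporter.py is on the Python import path.
    from MyCustomImporter import MyCustomImporter

    importer = MyCustomImporter(
        source_json_file_path="../my-data-folder/my.json",
        db_name="test",
        collection_name="temp",
        drop_collection=True,   # presumably corresponds to the --drop CLI option
        update=False,
    )
    importer.import_json()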

Download files

Download the file for your platform.

Source Distribution

docdb_import_export-0.6.2.tar.gz (7.4 kB)

Uploaded Source

Built Distribution


docdb_import_export-0.6.2-py3-none-any.whl (9.0 kB)

Uploaded Python 3

File details

Details for the file docdb_import_export-0.6.2.tar.gz.

File metadata

  • Download URL: docdb_import_export-0.6.2.tar.gz
  • Upload date:
  • Size: 7.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.6

File hashes

Hashes for docdb_import_export-0.6.2.tar.gz

  • SHA256: 3c4fca4aa9cefc39a631c05760b95c15d250ebd0a2d61d8ebce0447f55b19ca0
  • MD5: dc6650541aba7908a249c9b9e0c48478
  • BLAKE2b-256: 35945e9626c34d4a20c0e41347a199fd60f14194a436aac989840ebaabc7e2fe
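
To verify a downloaded archive against the published SHA256 digest, a quick check with Python's standard hashlib module works:

    import hashlib

    EXPECTED_SHA256 = "3c4fca4aa9cefc39a631c05760b95c15d250ebd0a2d61d8ebce0447f55b19ca0"

    with open("docdb_import_export-0.6.2.tar.gz", "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()

    print("OK" if digest == EXPECTED_SHA256 else "MISMATCH")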


File details

Details for the file docdb_import_export-0.6.2-py3-none-any.whl.

File metadata

File hashes

Hashes for docdb_import_export-0.6.2-py3-none-any.whl

  • SHA256: faba3f769e6a01c09c470268b756cfea6c4a56bf7315ab65c85f9d6e8a3c9d50
  • MD5: ae11299871480b7852e6d711f79f4b03
  • BLAKE2b-256: 451cd0b40f6751eaff1e4df1b293906ce4f8592ecec876c2561425975771be59

