
Document DB import export

A simple utility package to import JSON files into DocumentDB and export data from DocumentDB collections.

https://github.com/msankhala/docdb-import-export

Sponsor

Crohn's & Colitis Foundation - https://www.crohnscolitisfoundation.org

Roadmap

    • [x] Provide an importer script to import data from JSON files to DocumentDB.
    • [x] Provide a simple Python API to extend the functionality of the package.
    • [ ] Provide an exporter script to export data from DocumentDB collections to JSON files.
    • [ ] Provide an exporter script to export all data from DocumentDB collections to JSON files in a given directory.

Setup

  1. Create an EC2 instance in the same VPC as your DocumentDB cluster.

    See: Connecting to an Amazon DocumentDB Cluster from Outside an Amazon VPC
  2. Run an SSH tunnel to the EC2 instance:

    ssh -i <path/to/ec2-private-key.pem> -L 27017:<DOCUMENT-DB-SERVER-HOSTNAME>:27017 ec2-user@<EC2-INSTANCE-DNS-ENDPOINT> -N
    

    Keep this command running in a separate terminal window while you import or export data.

  3. Create a .env file with the following variables and set their values (a sketch of how these values might map to a client connection follows this list).

    DOCDB_HOST="YOUR_DOCUMENT_DB_HOSTNAME"
    DOCDB_PORT=YOUR_DOCUMENT_DB_PORT
    DOCDB_USERNAME="YOUR_DOCUMENT_DB_USERNAME"
    DOCDB_PASSWORD="YOUR_DOCUMENT_DB_PASSWORD"
    DOCDB_REPLICA_SET="rs0"
    DOCDB_READ_PREFERENCE="secondaryPreferred"
    DOCDB_RETRY_WRITES="false"
    DOCDB_DBNAME="dbname"
    DOCDB_IS_TLS_CONNECTION="false"
    DOCDB_TLS_CA_FILE_PATH="aws/aws-documentdb-ca-global-bundle.pem"
    DOCDB_TLS_ALLOW_INVALID_HOSTNAMES="false"
    DOCDB_DIRECT_CONNECTION="false"
    COLLECTION_NAME=recipe
    USER_COLLECTION_NAME=user
    
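The CLI reads these variables through the --env-file option. If you want to sanity-check the values before running an import, the following is a minimal sketch, assuming pymongo and python-dotenv are installed, of how the same values could be turned into a client connection. It is an illustration only, not the package's internal implementation; with the SSH tunnel from step 2 running, DOCDB_HOST is typically localhost.

import os
from dotenv import load_dotenv
from pymongo import MongoClient

# Load the same file you will later pass to the CLI via --env-file.
load_dotenv("/path/to/.env")

# Only pass the TLS options when TLS is enabled.
tls_kwargs = {}
if os.environ["DOCDB_IS_TLS_CONNECTION"] == "true":
  tls_kwargs = {
    "tls": True,
    "tlsCAFile": os.environ["DOCDB_TLS_CA_FILE_PATH"],
    "tlsAllowInvalidHostnames": os.environ["DOCDB_TLS_ALLOW_INVALID_HOSTNAMES"] == "true",
  }

client = MongoClient(
  host=os.environ["DOCDB_HOST"],  # typically "localhost" when tunnelling
  port=int(os.environ["DOCDB_PORT"]),
  username=os.environ["DOCDB_USERNAME"],
  password=os.environ["DOCDB_PASSWORD"],
  replicaSet=os.environ["DOCDB_REPLICA_SET"],
  readPreference=os.environ["DOCDB_READ_PREFERENCE"],
  retryWrites=os.environ["DOCDB_RETRY_WRITES"] == "true",
  directConnection=os.environ["DOCDB_DIRECT_CONNECTION"] == "true",
  **tls_kwargs,
)

# A successful round trip confirms the tunnel and credentials are working.
print(client[os.environ["DOCDB_DBNAME"]].list_collection_names())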

Usage

  1. Import data from a JSON file to DocumentDB

    python -m docdb_import_export import \
    --env-file=/path/to/.env \
    --fromjson=../my-data-folder/my.json \
    --db=test \
    --collection=temp \
    --drop
    
  2. Import data from a JSON file to DocumentDB using a custom importer class

    python -m docdb_import_export import \
    --env-file=/path/to/.env \
    --fromjson=../my-data-folder/my.json \
    --db=test \
    --collection=temp \
    --import-class=some-dir/MyCustomImporter.py \
    --drop
    

    The importer class filename and class name must match. The importer class must be a subclass of DocDbDefaultJsonImporter and implement all of its abstract methods.

  3. Import data from a directory of JSON files to DocumentDB

    python -m docdb_import_export import \
    --env-file=/path/to/.env \
    --fromjsondir=../my-data-folder/ \
    --db=test \
    --collection=temp \
    --drop
    
  4. Import data from a directory of JSON files to DocumentDB using a custom importer class

    python -m docdb_import_export import \
    --env-file=/path/to/.env \
    --fromjsondir=../my-data-folder/ \
    --db=test \
    --collection=temp \
    --import-class=some-dir/MyCustomImporter.py \
    --drop
    

    As in step 2, the importer class filename and class name must match, and the class must subclass DocDbDefaultJsonImporter and implement all of its abstract methods.

Providing your own custom importer class

Create a custom importer class that extends the DocDbDefaultJsonImporter class and implements all of its abstract methods.

src/some-path/MyCustomImporter.py

import json
from dotenv import load_dotenv
from docdb_import_export.docdb_client import DocDbClient
from docdb_import_export.docdb_json_importer import DocDbDefaultJsonImporter

load_dotenv()

class MyCustomImporter(DocDbDefaultJsonImporter):

  def __init__(self, source_json_file_path, db_name, collection_name, drop_collection, update):
    super().__init__(source_json_file_path, db_name, collection_name, drop_collection, update)

  def import_json(self):
    # Delete any existing collection; keep this call only if you want to support the --drop option.
    self.delete_collection()

    # Read the json data from the file.
    with open(self.source_json_file_path) as f:
      json_list = json.load(f)

    # The file is expected to contain a JSON array of documents.
    items = []
    for item in json_list:
      # Transform each document before inserting it.
      items.append(self.transform_item(item))
    # Insert the items into DocumentDB.
    self.docdb[self.db][self.collection].insert_many(items)
    print("Successfully imported json file: " + self.source_json_file_path)

  # This method allows you to transform the json data so that you can add or
  # remove the fields from the json data.
  def transform_item(self, item):
    item["_id"] = item["id"]
    del item["id"]
    # Add more transformations here if you want to.
    return item

Example usage:

python -m docdb_import_export import \
--env-file=src/docdb_import_export/.env \
--fromjson=../recipe-finder-data/ccf.json \
--db=test \
--collection=recipe \
--import-class=docdb-migration/RecipeImporter.py \
--drop
This will import the provided json file to the "test" database and "recipe" collection using the custom import class "docdb-migration/RecipeImporter.py". Are you sure you want to continue? [y/N]: y
Importing json file: ../recipe-finder-data/ccf.json
Successfully imported json file: ../recipe-finder-data/ccf.json
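The CLI is the primary entry point, but the same import can also be driven directly from Python. Below is a minimal sketch, assuming the MyCustomImporter module defined above is importable from your script; the constructor arguments mirror the class definition, and the paths and names are placeholders.

from MyCustomImporter import MyCustomImporter

# The constructor arguments mirror MyCustomImporter.__init__ above;
# drop_collection and update are passed through to DocDbDefaultJsonImporter.
importer = MyCustomImporter(
  source_json_file_path="../recipe-finder-data/ccf.json",
  db_name="test",
  collection_name="recipe",
  drop_collection=True,  # analogous to the --drop CLI flag
  update=False,
)
importer.import_json()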
