A simple utility package to import JSON files into DocumentDB and export data from DocumentDB collections.
Document DB import export
https://github.com/msankhala/docdb-import-export
Sponsor
Crohn's & Colitis Foundation - https://www.crohnscolitisfoundation.org
Roadmap
- [x] Provide an importer script to import data from JSON files into DocumentDB.
- [x] Provide a simple Python API to extend the functionality of the package.
- [ ] Provide an exporter script to export data from DocumentDB collections to JSON files.
- [ ] Provide an exporter script to export all data from DocumentDB collections to JSON files in a given directory.
Setup
- Create an EC2 instance in the same VPC as your DocumentDB cluster.
- Open an SSH tunnel to DocumentDB through the EC2 instance:

      ssh -i <path/to/ec2-private-key.pem> -L 27017:<DOCUMENT-DB-SERVER-HOSTNAME>:27017 ec2-user@EC2-INSTANCE-DNS-ENDPOINT -N

  Keep this command running in a separate terminal window.
- Create a .env file with the following variables and set their values:

      DOCDB_HOST="YOUR_DOCUMENT_DB_HOSTNAME"
      DOCDB_PORT=YOUR_DOCUMENT_DB_PORT
      DOCDB_USERNAME="YOUR_DOCUMENT_DB_USERNAME"
      DOCDB_PASSWORD="YOUR_DOCUMENT_DB_PASSWORD"
      DOCDB_REPLICA_SET="rs0"
      DOCDB_READ_PREFERENCE="secondaryPreferred"
      DOCDB_RETRY_WRITES="false"
      DOCDB_DBNAME="dbname"
      DOCDB_IS_TLS_CONNECTION="false"
      DOCDB_TLS_CA_FILE_PATH="aws/aws-documentdb-ca-global-bundle.pem"
      DOCDB_TLS_ALLOW_INVALID_HOSTNAMES="false"
      DOCDB_DIRECT_CONNECTION="false"
      COLLECTION_NAME=recipe
      USER_COLLECTION_NAME=user
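To make the role of these variables more concrete, here is a minimal sketch of how they could be combined into a MongoDB-style connection URI. This is not the package's actual connection code, which is not shown here. The function name `build_docdb_uri` is hypothetical, and the mapping from `DOCDB_*` names to the standard URI options (`replicaSet`, `readPreference`, `retryWrites`, `tls`, `tlsCAFile`, `tlsAllowInvalidHostnames`, `directConnection`) is an assumption based on the variable names.

```python
from urllib.parse import quote_plus, urlencode


def build_docdb_uri(env):
    """Build a MongoDB-style URI from DOCDB_* settings (illustrative only)."""
    # Credentials must be percent-encoded before they go into a URI.
    user = quote_plus(env["DOCDB_USERNAME"])
    password = quote_plus(env["DOCDB_PASSWORD"])
    host = env["DOCDB_HOST"]
    port = env.get("DOCDB_PORT", "27017")

    # Assumed mapping from the .env names to standard URI option names.
    options = {
        "replicaSet": env.get("DOCDB_REPLICA_SET", "rs0"),
        "readPreference": env.get("DOCDB_READ_PREFERENCE", "secondaryPreferred"),
        "retryWrites": env.get("DOCDB_RETRY_WRITES", "false"),
        "tls": env.get("DOCDB_IS_TLS_CONNECTION", "false"),
        "directConnection": env.get("DOCDB_DIRECT_CONNECTION", "false"),
    }
    # The CA bundle and hostname check only matter when TLS is enabled.
    if options["tls"] == "true":
        options["tlsCAFile"] = env["DOCDB_TLS_CA_FILE_PATH"]
        options["tlsAllowInvalidHostnames"] = env.get(
            "DOCDB_TLS_ALLOW_INVALID_HOSTNAMES", "false"
        )

    return f"mongodb://{user}:{password}@{host}:{port}/?{urlencode(options)}"


# Placeholder values only; with the SSH tunnel running, the host would
# typically be localhost because port 27017 is forwarded locally.
example_env = {
    "DOCDB_HOST": "localhost",
    "DOCDB_PORT": "27017",
    "DOCDB_USERNAME": "user",
    "DOCDB_PASSWORD": "p@ss",
}
print(build_docdb_uri(example_env))
```

Passing a plain mapping rather than reading `os.environ` directly keeps the sketch easy to try with placeholder values.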
Usage
- Import data from a JSON file into DocumentDB:

      python -m docdb_import_export import \
        --env-file=/path/to/.env \
        --fromjson=../my-data-folder/my.json \
        --db=test \
        --collection=temp \
        --drop

- Import data from a JSON file into DocumentDB using a custom importer class:

      python -m docdb_import_export import \
        --env-file=/path/to/.env \
        --fromjson=../my-data-folder/my.json \
        --db=test \
        --collection=temp \
        --import-class=some-dir/MyCustomImporter.py \
        --drop

  The importer class file name and class name should be the same, and the importer class should be a subclass of DocDbDefaultJsonImporter that implements all abstract methods.
- Import data from a directory into DocumentDB:

      python -m docdb_import_export import \
        --env-file=/path/to/.env \
        --fromjsondir=../my-data-folder/ \
        --db=test \
        --collection=temp \
        --drop

- Import data from a directory into DocumentDB using a custom importer class:

      python -m docdb_import_export import \
        --env-file=/path/to/.env \
        --fromjsondir=../my-data-folder/ \
        --db=test \
        --collection=temp \
        --import-class=some-dir/MyCustomImporter.py \
        --drop

  The same file name, class name, and subclassing requirements apply to the importer class.
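The rule that the file name and class name match suggests the class is looked up by the file's base name. The sketch below shows one way such a lookup could work using only the standard library. It is an assumption about the mechanism, not the package's actual loader, and the function name `load_class_from_file` is hypothetical.

```python
import importlib.util
from pathlib import Path


def load_class_from_file(file_path, base_class=None):
    """Load the class whose name equals the file name without ".py"."""
    path = Path(file_path)
    class_name = path.stem  # "MyCustomImporter.py" -> "MyCustomImporter"

    # Import the file as a standalone module, without touching sys.path.
    spec = importlib.util.spec_from_file_location(class_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load a module from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    cls = getattr(module, class_name, None)
    if not isinstance(cls, type):
        raise ImportError(f"{path} does not define a class named {class_name}")

    # Optionally enforce the subclass requirement described above.
    if base_class is not None and not issubclass(cls, base_class):
        raise TypeError(f"{class_name} must subclass {base_class.__name__}")
    return cls
```

With this approach, a file named `MyCustomImporter.py` that defines a class under a different name would fail with an `ImportError`, which is one plausible reason for the naming rule.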
Providing your own custom importer class
Create a custom importer class that extends the DocDbDefaultJsonImporter class and implements all abstract methods.

src/some-path/MyCustomImporter.py

    import json

    from dotenv import load_dotenv

    from docdb_import_export.docdb_client import DocDbClient
    from docdb_import_export.docdb_json_importer import DocDbDefaultJsonImporter

    load_dotenv()


    class MyCustomImporter(DocDbDefaultJsonImporter):

        def __init__(self, source_json_file_path, db_name, collection_name, drop_collection, update):
            super().__init__(source_json_file_path, db_name, collection_name, drop_collection, update)

        def import_json(self):
            # Only add this call if you want to support the --drop option.
            self.delete_collection()

            # Read the JSON data from the file.
            with open(self.source_json_file_path) as f:
                json_list = json.load(f)

            # Iterating a dict yields its keys, so take the values when the file
            # holds an object keyed by ID; a JSON array is iterated directly.
            records = json_list.values() if isinstance(json_list, dict) else json_list

            items = []
            for record in records:
                # Call the transform_item method to transform the JSON data.
                items.append(self.transform_item(record))

            # Insert the items into DocumentDB.
            self.docdb[self.db][self.collection].insert_many(items)
            print("Successfully imported json file: " + self.source_json_file_path)

        # This method lets you transform each item, for example to add or
        # remove fields before it is inserted.
        def transform_item(self, item):
            item["_id"] = item["id"]
            del item["id"]
            # Add more transformations here if you want to.
            return item
Example usage:

    python -m docdb_import_export import \
      --env-file=src/docdb_import_export/.env \
      --fromjson=../recipe-finder-data/ccf.json \
      --db=test \
      --collection=recipe \
      --import-class=docdb-migration/RecipeImporter.py \
      --drop

The command asks for confirmation and then reports progress:

    This will import the provided json file to the "test" database and "recipe" collection using the custom import class "docdb-migration/RecipeImporter.py". Are you sure you want to continue? [y/N]: y
    Importing json file: ../recipe-finder-data/ccf.json
    Successfully imported json file: ../recipe-finder-data/ccf.json
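The transform_item hook shown earlier renames `id` to `_id`, the field that MongoDB-compatible databases use as the document's primary key. The standalone sketch below illustrates that step without a database connection. It deliberately differs from the class method in one way: it works on a copy instead of mutating the input. The sample record is made up for illustration.

```python
import copy


def transform_item(item):
    """Return a copy of item with its "id" field renamed to "_id"."""
    transformed = copy.deepcopy(item)
    # pop() removes "id" and returns its value in one step.
    transformed["_id"] = transformed.pop("id")
    return transformed


# Hypothetical sample record; only the "id" field comes from the example above.
sample = {"id": "recipe-1", "title": "Example"}
result = transform_item(sample)
print(result)
```

Trying the transformation on a few representative records like this can catch a missing `id` field (a `KeyError` here) before a full import is started.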