Document DB import export
A simple utility package to import JSON files into Amazon DocumentDB and export data from DocumentDB collections.
https://github.com/msankhala/docdb-import-export
Sponsor
Crohn's & Colitis Foundation - https://www.crohnscolitisfoundation.org
Roadmap
- [x] Provide an importer script to import data from JSON files into DocumentDB.
- [x] Provide a simple Python API to extend the functionality of the package.
- [ ] Provide an exporter script to export data from DocumentDB collections to JSON files.
- [ ] Provide an exporter script to export all data from DocumentDB collections to JSON files in a given directory.
Setup
- Create an EC2 instance in the same VPC as your DocumentDB cluster.
- Open an SSH tunnel through the EC2 instance:

  ```
  ssh -i <path/to/ec2-private-key.pem> -L 27017:<DOCUMENT-DB-SERVER-HOSTNAME>:27017 ec2-user@EC2-INSTANCE-DNS-ENDPOINT -N
  ```

  Keep this command running in a separate terminal window.
- Create a `.env` file with the following variables and set their values:

  ```
  DOCDB_HOST="YOUR_DOCUMENT_DB_HOSTNAME"
  DOCDB_PORT=YOUR_DOCUMENT_DB_PORT
  DOCDB_USERNAME="YOUR_DOCUMENT_DB_USERNAME"
  DOCDB_PASSWORD="YOUR_DOCUMENT_DB_PASSWORD"
  DOCDB_REPLICA_SET="rs0"
  DOCDB_READ_PREFERENCE="secondaryPreferred"
  DOCDB_RETRY_WRITES="false"
  DOCDB_DBNAME="dbname"
  DOCDB_IS_TLS_CONNECTION="false"
  DOCDB_TLS_CA_FILE_PATH="aws/aws-documentdb-ca-global-bundle.pem"
  DOCDB_TLS_ALLOW_INVALID_HOSTNAMES="false"
  DOCDB_DIRECT_CONNECTION="false"
  COLLECTION_NAME=recipe
  USER_COLLECTION_NAME=user
  ```
Usage
- Import data from a JSON file into DocumentDB:

  ```
  python -m docdb_import_export import \
    --env-file=/path/to/.env \
    --fromjson=../my-data-folder/my.json \
    --db=test \
    --collection=temp \
    --drop
  ```

- Import data from a JSON file into DocumentDB using a custom importer class:

  ```
  python -m docdb_import_export import \
    --env-file=/path/to/.env \
    --fromjson=../my-data-folder/my.json \
    --db=test \
    --collection=temp \
    --import-class=some-dir/MyCustomImporter.py \
    --drop
  ```

  The importer file name should match the class name, and the class should be a subclass of `DocDbDefaultJsonImporter` that implements all of its abstract methods.

- Import data from a directory into DocumentDB:

  ```
  python -m docdb_import_export import \
    --env-file=/path/to/.env \
    --fromjsondir=../my-data-folder/ \
    --db=test \
    --collection=temp \
    --drop
  ```

- Import data from a directory into DocumentDB using a custom importer class:

  ```
  python -m docdb_import_export import \
    --env-file=/path/to/.env \
    --fromjsondir=../my-data-folder/ \
    --db=test \
    --collection=temp \
    --import-class=some-dir/MyCustomImporter.py \
    --drop
  ```

  The same requirements apply: the importer file name should match the class name, and the class should subclass `DocDbDefaultJsonImporter` and implement all of its abstract methods.
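To drive these commands from a Python script (for example, to import several files in a loop), the argument list can be assembled programmatically. A minimal sketch: `build_import_command` and `run_import` are hypothetical helpers, not part of the package, and they use only the flags documented above. The "exactly one of `fromjson`/`fromjsondir`" check is this sketch's own assumption, and `run_import` assumes the CLI reads the answer to its `[y/N]` confirmation prompt from standard input.

```python
import subprocess
import sys
from typing import List, Optional


def build_import_command(env_file: str, db: str, collection: str,
                         fromjson: Optional[str] = None,
                         fromjsondir: Optional[str] = None,
                         import_class: Optional[str] = None,
                         drop: bool = False) -> List[str]:
    """Build argv for `python -m docdb_import_export import` (hypothetical helper)."""
    # Assumption: a single run imports either one file or one directory.
    if (fromjson is None) == (fromjsondir is None):
        raise ValueError("pass exactly one of fromjson or fromjsondir")
    cmd = [sys.executable, "-m", "docdb_import_export", "import", f"--env-file={env_file}"]
    cmd.append(f"--fromjson={fromjson}" if fromjson is not None else f"--fromjsondir={fromjsondir}")
    cmd += [f"--db={db}", f"--collection={collection}"]
    if import_class is not None:
        cmd.append(f"--import-class={import_class}")
    if drop:
        cmd.append("--drop")
    return cmd


def run_import(cmd: List[str], confirm: bool = True) -> int:
    """Run the command, answering the confirmation prompt via stdin (assumption)."""
    answer = "y\n" if confirm else "n\n"
    return subprocess.run(cmd, input=answer, text=True).returncode
```

Building the list separately from running it keeps the flag logic easy to check without a database.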
Providing your own custom importer class
Create a custom importer class that extends `DocDbDefaultJsonImporter` and implements all of its abstract methods.

`src/some-path/MyCustomImporter.py`

```python
import json

from dotenv import load_dotenv
from docdb_import_export.docdb_json_importer import DocDbDefaultJsonImporter

load_dotenv()


class MyCustomImporter(DocDbDefaultJsonImporter):

    def __init__(self, source_json_file_path, db_name, collection_name, drop_collection, update):
        super().__init__(source_json_file_path, db_name, collection_name, drop_collection, update)

    def import_json(self):
        # Only add this call if you want to support the --drop option.
        self.delete_collection()

        # Read the JSON data from the file.
        with open(self.source_json_file_path) as f:
            json_data = json.load(f)

        # Accept either a top-level JSON array or a top-level object keyed by ID.
        records = json_data.values() if isinstance(json_data, dict) else json_data

        items = []
        for record in records:
            # Call transform_item to add, remove, or rename fields.
            items.append(self.transform_item(record))

        # Insert the items into DocumentDB; insert_many raises an error on an empty list.
        if items:
            self.docdb[self.db][self.collection].insert_many(items)
        print("Successfully imported json file: " + self.source_json_file_path)

    # This method lets you transform each JSON record, for example to add or
    # remove fields before it is inserted.
    def transform_item(self, item):
        item["_id"] = item["id"]
        del item["id"]
        # Add more transformations here if needed.
        return item
```
Example usage:

```
python -m docdb_import_export import \
  --env-file=src/docdb_import_export/.env \
  --fromjson=../recipe-finder-data/ccf.json \
  --db=test \
  --collection=recipe \
  --import-class=docdb-migration/RecipeImporter.py \
  --drop
```

The command asks for confirmation and then reports progress:

```
This will import the provided json file to the "test" database and "recipe" collection using the custom import class "docdb-migration/RecipeImporter.py". Are you sure you want to continue? [y/N]: y
Importing json file: ../recipe-finder-data/ccf.json
Successfully imported json file: ../recipe-finder-data/ccf.json
```
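The record-handling logic of a custom importer can be checked without a DocumentDB connection. A minimal standalone sketch, assuming each record is a JSON object with an `id` field: `iter_records` and `rename_id` are hypothetical helpers that mirror the `transform_item` step of the example class (slightly more tolerant, since records without `id` pass through unchanged), and the sample records are made up for illustration.

```python
import json
from typing import Any, Dict, Iterable, List, Union


def iter_records(data: Union[List[Any], Dict[str, Any]]) -> Iterable[Dict[str, Any]]:
    """Yield records from a top-level JSON array or from a top-level object's values."""
    return data.values() if isinstance(data, dict) else data


def rename_id(item: Dict[str, Any]) -> Dict[str, Any]:
    """Move "id" to MongoDB's "_id" key, leaving records without "id" unchanged."""
    item = dict(item)  # copy so the caller's record is not mutated
    if "id" in item:
        item["_id"] = item.pop("id")
    return item


# Made-up sample input, shaped like a top-level JSON array.
raw = '[{"id": 1, "title": "Oat porridge"}, {"id": 2, "title": "Rice bowl"}]'
docs = [rename_id(record) for record in iter_records(json.loads(raw))]
print(docs)
```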
Download files
File details
Details for the file docdb_import_export-0.6.2.tar.gz.
File metadata
- Download URL: docdb_import_export-0.6.2.tar.gz
- Upload date:
- Size: 7.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3c4fca4aa9cefc39a631c05760b95c15d250ebd0a2d61d8ebce0447f55b19ca0 |
| MD5 | dc6650541aba7908a249c9b9e0c48478 |
| BLAKE2b-256 | 35945e9626c34d4a20c0e41347a199fd60f14194a436aac989840ebaabc7e2fe |
File details
Details for the file docdb_import_export-0.6.2-py3-none-any.whl.
File metadata
- Download URL: docdb_import_export-0.6.2-py3-none-any.whl
- Upload date:
- Size: 9.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | faba3f769e6a01c09c470268b756cfea6c4a56bf7315ab65c85f9d6e8a3c9d50 |
| MD5 | ae11299871480b7852e6d711f79f4b03 |
| BLAKE2b-256 | 451cd0b40f6751eaff1e4df1b293906ce4f8592ecec876c2561425975771be59 |
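The published SHA256 digests can be used to check a downloaded file before installing it. A minimal sketch using the standard library's `hashlib`: `sha256_of` and `matches_published_hash` are hypothetical helpers, and the example assumes the 0.6.2 wheel was downloaded to the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_sha256: str) -> bool:
    """Compare against a published hex digest, ignoring case and surrounding spaces."""
    return sha256_of(path) == expected_sha256.strip().lower()


if __name__ == "__main__":
    wheel = Path("docdb_import_export-0.6.2-py3-none-any.whl")
    expected = "faba3f769e6a01c09c470268b756cfea6c4a56bf7315ab65c85f9d6e8a3c9d50"
    if wheel.exists():
        print("OK" if matches_published_hash(wheel, expected) else "MISMATCH")
```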