Singer.io tap for extracting data from MongoDB - Datazip compatible
dz-tap-mongodb
This is a Singer tap that produces JSON-formatted data following the Singer spec from a MongoDB source.
Set up local dev environment:
make setup
Activate virtual environment
. venv/bin/activate
Set up Config file
Create a JSON file called config.json with the following contents:
{
  "password": "<password>",
  "user": "<username>",
  "host": "<host ip address>",
  "auth_database": "<database name to authenticate on>",
  "database": "<database name to sync from>"
}
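The config above can be checked before it is handed to the tap. The following is a minimal sketch, assuming the five keys shown in the example are the required set; `missing_required_keys` and `load_config` are hypothetical helpers for illustration and are not part of the tap.

```python
import json

# Keys taken from the example config above; treated here as the required set.
REQUIRED_KEYS = ("password", "user", "host", "auth_database", "database")


def missing_required_keys(config: dict) -> list:
    """Return the required keys that are absent or empty in config."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


def load_config(path: str) -> dict:
    """Load a JSON config file and fail early if required keys are missing."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    missing = missing_required_keys(config)
    if missing:
        raise ValueError("config is missing: " + ", ".join(missing))
    return config
```

Calling `load_config("config.json")` returns the parsed dictionary, or raises `ValueError` naming the keys that still need to be filled in.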
The following parameters are optional for your config file:
Name | Type | Default value | Description |
---|---|---|---|
srv | Boolean | false | Uses the mongodb+srv protocol to connect. Disables the usage of the port argument if set to true. |
port | Integer | false | Connection port. Required if a non-srv connection is being used. |
replica_set | String | null | Name of the replica set. |
ssl | Boolean | false | Can be set to true to connect using SSL. |
verify_mode | Boolean | true | Default SSL verify mode. |
include_schemas_in_destination_stream_name | Boolean | false | Forces the stream names to take the form <database_name>-<collection_name> instead of <collection_name>. |
update_buffer_size | Integer | 1 | [LOG_BASED] The size of the buffer that holds detected update operations in memory. The buffer is flushed once this size is reached. |
await_time_ms | Integer | 1000 | [LOG_BASED] The maximum amount of time in milliseconds the LOG_BASED method waits for new data changes before exiting. |
full_load_on_empty_state | Boolean | false | [LOG_BASED] A flag which forces a full load when no previous token is found in state. |
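The attributes in the config.json example above (password, user, host, auth_database and database) are required by the tap to connect to your MongoDB instance; the parameters in the table are optional. A sample configuration file is provided with the project.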
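The srv and port options change the shape of the connection string: a mongodb+srv connection takes no port, while a plain mongodb connection needs one. The sketch below illustrates that rule using the standard MongoDB URI formats. It is an assumption about how such a config could be turned into a URI, not the tap's actual connection code, and `build_connection_uri` is a hypothetical name.

```python
from urllib.parse import quote_plus


def build_connection_uri(config: dict) -> str:
    """Build a MongoDB connection URI from a tap-style config dict (illustrative)."""
    # Credentials must be percent-encoded so characters like '@' or ':' are safe.
    user = quote_plus(config["user"])
    password = quote_plus(config["password"])
    host = config["host"]
    auth_db = config["auth_database"]

    if config.get("srv", False):
        # mongodb+srv resolves hosts and ports through DNS, so no port is given.
        return f"mongodb+srv://{user}:{password}@{host}/{auth_db}"

    if "port" not in config:
        raise ValueError("port is required for a non-srv connection")
    return f"mongodb://{user}:{password}@{host}:{int(config['port'])}/{auth_db}"
```

Options such as replica_set and ssl are left out of this sketch; how the tap passes them to the driver is not shown in this document.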
Run in discovery mode
Run the following command and redirect the output into the catalog file:
dz-tap-mongodb --config ~/config.json --discover > ~/catalog.json
Your catalog file should now look like this:
{
  "streams": [
    {
      "table_name": "<table name>",
      "tap_stream_id": "<tap_stream_id>",
      "metadata": [
        {
          "breadcrumb": [],
          "metadata": {
            "row-count": <int>,
            "is-view": <bool>,
            "database-name": "<database name>",
            "table-key-properties": [
              "_id"
            ],
            "valid-replication-keys": [
              "_id"
            ]
          }
        }
      ],
      "stream": "<stream name>",
      "schema": {
        "type": "object"
      }
    }
  ]
}
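A discovered catalog can contain many streams, so it helps to list them before editing. The following is a small sketch that assumes the catalog layout shown above, where the stream-level metadata entry is the one with an empty breadcrumb; `list_streams` is a hypothetical helper.

```python
def list_streams(catalog: dict) -> list:
    """Return (tap_stream_id, database name, row count) for each stream."""
    rows = []
    for stream in catalog.get("streams", []):
        # The stream-level metadata entry is the one with an empty breadcrumb.
        top = next(
            (
                entry.get("metadata", {})
                for entry in stream.get("metadata", [])
                if entry.get("breadcrumb") == []
            ),
            {},
        )
        rows.append(
            (stream["tap_stream_id"], top.get("database-name"), top.get("row-count"))
        )
    return rows
```

To use it, parse the file with `json.load` and pass the resulting dictionary to `list_streams`.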
Edit Catalog file
Using valid JSON, edit the catalog.json file.
To select a stream, add the following to the stream's metadata:
"selected": true,
"replication-method": "<replication method>",
<replication method> must be FULL_TABLE, INCREMENTAL or LOG_BASED. If it is INCREMENTAL, make sure to also add a "replication-key".
For example, after editing the example stream to select it and set its replication method, catalog.json should look like this:
{
  "streams": [
    {
      "table_name": "<table name>",
      "tap_stream_id": "<tap_stream_id>",
      "metadata": [
        {
          "breadcrumb": [],
          "metadata": {
            "row-count": <int>,
            "is-view": <bool>,
            "database-name": "<database name>",
            "table-key-properties": [
              "_id"
            ],
            "valid-replication-keys": [
              "_id"
            ],
            "selected": true,
            "replication-method": "<replication method>"
          }
        }
      ],
      "stream": "<stream name>",
      "schema": {
        "type": "object"
      }
    }
  ]
}
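The same edit can be scripted instead of made by hand. The sketch below applies the selection rules described above (one of three replication methods, and a replication key when the method is INCREMENTAL) to a parsed catalog. `select_stream` is a hypothetical helper, and it assumes the stream-level metadata entry is the one with an empty breadcrumb, as in the example.

```python
import copy

VALID_METHODS = ("FULL_TABLE", "INCREMENTAL", "LOG_BASED")


def select_stream(catalog, tap_stream_id, method, replication_key=None):
    """Return a copy of catalog with one stream selected for replication."""
    if method not in VALID_METHODS:
        raise ValueError(f"replication method must be one of {VALID_METHODS}")
    if method == "INCREMENTAL" and not replication_key:
        raise ValueError("INCREMENTAL replication needs a replication-key")

    updated = copy.deepcopy(catalog)  # leave the caller's catalog untouched
    for stream in updated.get("streams", []):
        if stream.get("tap_stream_id") != tap_stream_id:
            continue
        for entry in stream.get("metadata", []):
            if entry.get("breadcrumb") == []:
                entry["metadata"]["selected"] = True
                entry["metadata"]["replication-method"] = method
                if replication_key:
                    entry["metadata"]["replication-key"] = replication_key
                return updated
    raise KeyError(f"no stream-level metadata found for {tap_stream_id}")
```

The returned dictionary can be written back to catalog.json with `json.dump`.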
Run in sync mode:
dz-tap-mongodb --config ~/config.json --catalog ~/catalog.json
The tap writes bookmarks to stdout. These can be captured and passed back to the tap on the next sync through the optional --state state.json parameter.
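One way to capture the bookmarks is to scan the tap's output for STATE messages and keep the last one. The sketch below relies on the Singer spec, where a state message is a JSON line with "type": "STATE" and the bookmark data under "value". `last_state` is a hypothetical helper, not something shipped with the tap.

```python
import json


def last_state(lines):
    """Return the value of the last Singer STATE message in lines, or None."""
    state = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON message
        if isinstance(message, dict) and message.get("type") == "STATE":
            state = message.get("value")
    return state
```

Writing the returned value to state.json with `json.dump` gives a file that can be passed to the next run with --state state.json.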
Logging configuration
The tap uses a predefined logging config if none is provided. To use your own, set the environment variable LOGGING_CONFIG_FILE to the path of your logging config file.
A sample config is available here.
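When the tap is launched from a script, the variable can be set on the child process only, without changing the parent environment. This is a minimal sketch; `tap_environment` is a hypothetical helper, and converting the path to an absolute one is a precaution assumed here, not a documented requirement of the tap.

```python
import os


def tap_environment(logging_config_path, base_env=None):
    """Return a copy of the environment with LOGGING_CONFIG_FILE set."""
    # Copy so that neither os.environ nor the caller's mapping is modified.
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: an absolute path avoids surprises if the working directory changes.
    env["LOGGING_CONFIG_FILE"] = os.path.abspath(logging_config_path)
    return env
```

The result can be passed as the `env` argument of `subprocess.run` when invoking dz-tap-mongodb with the --config and --catalog arguments shown earlier.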
Copyright © 2020 TransferWise