Skip to main content

Singer.io tap for extracting data from MongoDB - Pipelinewise compatible

Project description

pipelinewise-tap-mongodb

This is a Singer tap that produces JSON-formatted data following the Singer spec from a MongoDB source.

Set up local dev environment:

make setup

Activate virtual environment

. venv/bin/activate

Set up Config file

Create json file called config.json, with the following contents:

{
  "password": "<password>",
  "user": "<username>",
  "host": "<host ip address>",
  "port": "<port>",
  "auth_database": "<database name to authenticate on>",
  "database": "<database name to sync from>"
}

The following parameters are optional for your config file:

Name Type Description
replica_set string name of replica set
ssl Boolean can be set to true to connect using ssl, default false
verify_mode Boolean Default SSL verify mode, default true
include_schemas_in_destination_stream_name Boolean forces the stream names to take the form <database_name>-<collection_name> instead of <collection_name>, default false

All of the above attributes are required by the tap to connect to your mongo instance. here is a sample configuration file.

Run in discovery mode

Run the following command and redirect the output into the catalog file

tap-mongodb --config ~/config.json --discover > ~/catalog.json

Your catalog file should now look like this:

{
  "streams": [
    {
      "table_name": "<table name>",
      "tap_stream_id": "<tap_stream_id>",
      "metadata": [
        {
          "breadcrumb": [],
          "metadata": {
            "row-count":<int>,
            "is-view": <bool>,
            "database-name": "<database name>",
            "table-key-properties": [
              "_id"
            ],
            "valid-replication-keys": [
              "_id"
            ]
          }
        }
      ],
      "stream": "<stream name>",
      "schema": {
        "type": "object"
      }
    }
  ]
}

Edit Catalog file

Using valid json, edit the config.json file

To select a stream, enter the following to the stream's metadata:

"selected": true,
"replication-method": "<replication method>",

<replication-method> must be either FULL_TABLE, INCREMENTAL or LOG_BASED, if it's INCREMENTAL, make sure to add a "replication-key".

For example, if you were to edit the example stream to select the stream as well as add a projection, config.json should look this:

{
  "streams": [
    {
      "table_name": "<table name>",
      "tap_stream_id": "<tap_stream_id>",
      "metadata": [
        {
          "breadcrumb": [],
          "metadata": {
            "row-count": <int>,
            "is-view": <bool>,
            "database-name": "<database name>",
            "table-key-properties": [
              "_id"
            ],
            "valid-replication-keys": [
              "_id"
            ],
            "selected": true,
            "replication-method": "<replication method>"
          }
        }
      ],
      "stream": "<stream name>",
      "schema": {
        "type": "object"
      }
    }
  ]
}

Run in sync mode:

tap-mongodb --config ~/config.json --catalog ~/catalog.json

The tap will write bookmarks to stdout which can be captured and passed as an optional --state state.json parameter to the tap for the next sync.

Logging configuration

The tap uses a predefined logging config if none is provided, however, you can set your own config by setting the environment variable LOGGING_CONFIG_FILE as the path to the logging config. A sample config is available here.


Copyright © 2020 TransferWise

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pipelinewise-tap-mongodb-1.0.1.tar.gz (15.1 kB view hashes)

Uploaded Source

Built Distribution

pipelinewise_tap_mongodb-1.0.1-py3-none-any.whl (30.6 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page