Singer.io tap for extracting data from MongoDB - Datazip compatible

as-mongodb

This is a Singer tap that produces JSON-formatted data following the Singer spec from a MongoDB source.

Set up local dev environment:

make setup

Activate the virtual environment:

. venv/bin/activate

Set up the config file

Create a JSON file called config.json with the following contents:

{
  "password": "<password>",
  "user": "<username>",
  "host": "<host ip address>",
  "auth_database": "<database name to authenticate on>",
  "database": "<database name to sync from>"
}

The following parameters are optional for your config file:

| Name | Type | Default value | Description |
|---|---|---|---|
| srv | Boolean | false | Uses the mongodb+srv protocol to connect. Disables the port argument if set to true. |
| port | Integer | false | Connection port. Required if a non-srv connection is being used. |
| replica_set | String | null | Name of the replica set. |
| ssl | Boolean | false | Set to true to connect using SSL. |
| verify_mode | Boolean | true | Default SSL verify mode. |
| include_schemas_in_destination_stream_name | Boolean | false | Forces stream names to take the form <database_name>-<collection_name> instead of <collection_name>. |
| update_buffer_size | Integer | 1 | [LOG_BASED] The size of the in-memory buffer that holds detected update operations. The buffer is flushed once this size is reached. |
| await_time_ms | Integer | 1000 | [LOG_BASED] The maximum time, in milliseconds, that the LOG_BASED method waits for new data changes before exiting. |
| full_load_on_empty_state | Boolean | false | [LOG_BASED] Forces a full load when no previous token is found in the state. |

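The attributes in the config.json example above (password, user, host, auth_database and database) are required by the tap to connect to your MongoDB instance; the parameters in the table are optional. Here is a sample configuration file.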
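The rules above can be checked before running the tap. The following is a minimal sketch, not part of the tap itself: the function name check_config is hypothetical, and the only rules it encodes are the ones stated in this README (five required keys, and port required for non-srv connections).

```python
# Required keys, taken from the config.json example in this README.
REQUIRED_KEYS = ("password", "user", "host", "auth_database", "database")


def check_config(config: dict) -> list:
    """Return a list of problems found in the config; empty means none found.

    Hypothetical helper: the tap may perform its own, different validation.
    """
    problems = [
        f"missing required key: {key}"
        for key in REQUIRED_KEYS
        if key not in config
    ]
    # Per the options table, port is required for non-srv connections
    # and is not used when srv is true.
    if not config.get("srv", False) and "port" not in config:
        problems.append("port is required when srv is not enabled")
    return problems


if __name__ == "__main__":
    example = {
        "password": "<password>",
        "user": "<username>",
        "host": "<host ip address>",
        "auth_database": "<database name to authenticate on>",
        "database": "<database name to sync from>",
    }
    for problem in check_config(example):
        print(problem)
```

In practice the dictionary would come from json.load on your config.json file rather than from an inline literal.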

Run in discovery mode

Run the following command and redirect the output into the catalog file

as-mongodb --config ~/config.json --discover > ~/catalog.json

Your catalog file should now look like this:

{
  "streams": [
    {
      "table_name": "<table name>",
      "tap_stream_id": "<tap_stream_id>",
      "metadata": [
        {
          "breadcrumb": [],
          "metadata": {
            "row-count": <int>,
            "is-view": <bool>,
            "database-name": "<database name>",
            "table-key-properties": [
              "_id"
            ],
            "valid-replication-keys": [
              "_id"
            ]
          }
        }
      ],
      "stream": "<stream name>",
      "schema": {
        "type": "object"
      }
    }
  ]
}
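To get a quick overview of what discovery found, the catalog can be read with a few lines of Python. This is a sketch that assumes only the catalog layout shown above; the function name summarize_streams and the sample values are hypothetical.

```python
def summarize_streams(catalog: dict) -> list:
    """Return (tap_stream_id, row_count) pairs for each discovered stream."""
    summary = []
    for stream in catalog.get("streams", []):
        row_count = None
        for entry in stream.get("metadata", []):
            # Stream-level metadata is the entry with an empty breadcrumb.
            if entry.get("breadcrumb") == []:
                row_count = entry.get("metadata", {}).get("row-count")
        summary.append((stream.get("tap_stream_id"), row_count))
    return summary


if __name__ == "__main__":
    # Illustrative catalog with made-up values, shaped like the example above.
    sample = {
        "streams": [
            {
                "tap_stream_id": "mydb-users",
                "metadata": [
                    {"breadcrumb": [], "metadata": {"row-count": 42}}
                ],
            }
        ]
    }
    for stream_id, rows in summarize_streams(sample):
        print(stream_id, rows)
```

To run it against real output, load ~/catalog.json with json.load and pass the result to summarize_streams.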

Edit Catalog file

Using valid JSON, edit the catalog.json file.

To select a stream, add the following to the stream's metadata:

"selected": true,
"replication-method": "<replication method>",

<replication method> must be one of FULL_TABLE, INCREMENTAL or LOG_BASED. If it is INCREMENTAL, make sure to also add a "replication-key".

For example, if you edit the example stream to select it and set its replication method, catalog.json should look like this:

{
  "streams": [
    {
      "table_name": "<table name>",
      "tap_stream_id": "<tap_stream_id>",
      "metadata": [
        {
          "breadcrumb": [],
          "metadata": {
            "row-count": <int>,
            "is-view": <bool>,
            "database-name": "<database name>",
            "table-key-properties": [
              "_id"
            ],
            "valid-replication-keys": [
              "_id"
            ],
            "selected": true,
            "replication-method": "<replication method>"
          }
        }
      ],
      "stream": "<stream name>",
      "schema": {
        "type": "object"
      }
    }
  ]
}
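The same edit can be scripted instead of done by hand. The sketch below assumes the catalog layout shown above; select_stream is a hypothetical helper, not a function provided by the tap, and it only applies the rules stated in this README.

```python
VALID_METHODS = {"FULL_TABLE", "INCREMENTAL", "LOG_BASED"}


def select_stream(catalog: dict, tap_stream_id: str, method: str,
                  replication_key: str = None) -> dict:
    """Mark one stream as selected and set its replication method in place."""
    if method not in VALID_METHODS:
        raise ValueError(f"unknown replication method: {method}")
    if method == "INCREMENTAL" and not replication_key:
        raise ValueError("INCREMENTAL requires a replication-key")

    for stream in catalog["streams"]:
        if stream["tap_stream_id"] != tap_stream_id:
            continue
        for entry in stream["metadata"]:
            # Only touch the stream-level entry (empty breadcrumb).
            if entry["breadcrumb"] == []:
                entry["metadata"]["selected"] = True
                entry["metadata"]["replication-method"] = method
                if replication_key:
                    entry["metadata"]["replication-key"] = replication_key
                return catalog
    raise KeyError(f"stream not found: {tap_stream_id}")
```

A typical use would be to load catalog.json with json.load, call select_stream for each stream you want to sync, and write the result back with json.dump.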

Run in sync mode:

as-mongodb --config ~/config.json --catalog ~/catalog.json

The tap will write bookmarks to stdout which can be captured and passed as an optional --state state.json parameter to the tap for the next sync.
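One way to capture the bookmarks is to keep the value of the last STATE message in the tap's output. The sketch below relies on the Singer spec, where a state message is a JSON line of the form {"type": "STATE", "value": ...}; the function name last_state is hypothetical, and the bookmark contents in the example are made up rather than taken from this tap.

```python
import json


def last_state(lines):
    """Return the value of the last Singer STATE message, or None if absent."""
    state = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            # Skip anything that is not a JSON message.
            continue
        if isinstance(message, dict) and message.get("type") == "STATE":
            state = message.get("value")
    return state


if __name__ == "__main__":
    # Illustrative output lines; real bookmark contents depend on the tap.
    output = [
        '{"type": "RECORD", "stream": "users", "record": {"_id": "1"}}',
        '{"type": "STATE", "value": {"bookmarks": {"users": {"example": 1}}}}',
    ]
    print(json.dumps(last_state(output)))
```

Writing the returned value to state.json with json.dump gives a file that can be passed with --state on the next run.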

Logging configuration

The tap uses a predefined logging config if none is provided. To use your own, set the environment variable LOGGING_CONFIG_FILE to the path of your logging config file. A sample config is available here.


Copyright © 2020 TransferWise
