A small wrapper for connecting MongoDB collections to Prodigy

Project description

mondigy

Mondigy is a small library for using a MongoDB database as a data loader for Prodigy annotation applications.

Motivation

Prodigy natively supports loading text data from files and dataset objects, but annotating data stored in a MongoDB database requires a custom data loader. With Mondigy, you write a small config file containing your database settings, and the loader streams the matching documents from MongoDB to Prodigy.

Features

  • Annotate text data from MongoDB
  • Pipe data directly from your MongoDB database to your Prodigy application

Code Example

Let's define a db connection and start annotating data from our MongoDB database!

Step 1. Create a config file. For this example, we'll call it my_db_config.json. This config gets the first 1000 entries that are in_stock from the products collection of our database, in order of decreasing date_added.

my_db_config.json
{
  "host": "my.database.com",
  "user": "mongo_user",
  "password": "mongo_pass",
  "database": "my_db",
  "auth_source": "admin",
  "collection": "products",
  "text_field": "description",
  "other_fields": ["product_name", "product_id"],
  "sort": ["date_added", -1],
  "query": {"in_stock": true},
  "limit": 1000
}
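To make the config fields concrete, here is a minimal sketch of how a file like this could be translated into MongoDB query arguments, and how each returned document could be shaped into a Prodigy task. This is not Mondigy's actual implementation: the function names build_find_args and doc_to_task are hypothetical, and placing other_fields under the task's "meta" key is an assumption. The only part taken from Prodigy itself is that a text task is a JSON object with a "text" key.

```python
def build_find_args(config):
    """Turn a Mondigy-style config dict into keyword arguments for a find query.

    Hypothetical helper: the real loader may map these fields differently.
    """
    fields = [config["text_field"]] + list(config.get("other_fields", []))
    args = {
        "filter": config.get("query", {}),
        # Fetch only the fields that end up in the annotation task.
        "projection": {name: 1 for name in fields},
        # A limit of 0 means "no limit" in MongoDB.
        "limit": config.get("limit", 0),
    }
    if "sort" in config:
        sort_field, sort_direction = config["sort"]
        args["sort"] = [(sort_field, sort_direction)]
    return args


def doc_to_task(doc, config):
    """Shape one MongoDB document into a Prodigy-style task dict."""
    meta = {name: doc.get(name) for name in config.get("other_fields", [])}
    return {"text": doc[config["text_field"]], "meta": meta}


# The query-related part of my_db_config.json, as a Python dict.
config = {
    "collection": "products",
    "text_field": "description",
    "other_fields": ["product_name", "product_id"],
    "sort": ["date_added", -1],
    "query": {"in_stock": True},
    "limit": 1000,
}

find_args = build_find_args(config)

# A made-up document, standing in for one result from the products collection.
example_doc = {"description": "A red mug", "product_name": "Mug", "product_id": 7}
example_task = doc_to_task(example_doc, config)
```

With pymongo, the resulting dictionary could be passed along as collection.find(**find_args), since filter, projection, sort, and limit are all keyword arguments of find.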

Step 2. Start your Prodigy server and pipe your MongoDB collection into it. Pass the path of your config file to the mongo-loader recipe, and the path of the Mondigy loader via -F.

prodigy mongo-loader my_db_config.json -F mondigy/loader.py | prodigy ner.manual ner_test en_core_web_sm - --label FEATURE,KEYWORD
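The trailing - in the second command tells Prodigy to read its input from standard input, so the loader side of the pipe has to print one JSON task per line. A minimal sketch of that output step is shown here, assuming tasks are plain dicts with a "text" key; write_jsonl is a hypothetical helper and not part of Mondigy or Prodigy.

```python
import io
import json
import sys


def write_jsonl(tasks, stream=sys.stdout):
    """Write each task as one JSON object per line and return the count."""
    count = 0
    for task in tasks:
        stream.write(json.dumps(task) + "\n")
        count += 1
    return count


# Made-up tasks for illustration; a real loader would read them from MongoDB.
tasks = [
    {"text": "A red mug", "meta": {"product_id": 7}},
    {"text": "A blue plate", "meta": {"product_id": 8}},
]

# Write to an in-memory buffer here; a loader would write to sys.stdout.
buffer = io.StringIO()
written = write_jsonl(tasks, buffer)
lines = buffer.getvalue().splitlines()
```

Accepting any iterable lets the function consume a database cursor or generator lazily, so documents can be emitted as they arrive instead of being held in memory.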

Step 3. Annotate!

Installation & Setup

To install Mondigy, simply clone this repo via git clone https://github.com/jdagdelen/mondigy.git.

Mondigy will set up the collections it requires in your MongoDB database. They are named with a _p.<collection> convention. Don't delete these collections or manually edit any of the documents in them.

License

MIT © John Dagdelen


