A small wrapper for connecting MongoDB collections to Prodigy

mondigy

Mondigy is a small library for using a MongoDB database as a data loader for Prodigy annotation applications.

Motivation

Prodigy natively supports loading text data from files and dataset objects, but it has no built-in support for annotating data stored in a MongoDB database.

With mondigy you can annotate data from a MongoDB collection and store your annotations in a MongoDB database.

Features

  • Annotate text data from MongoDB
  • Pipe data directly from your MongoDB database to your Prodigy application

Installation & Setup

Mondigy can be installed via pip install mondigy or by cloning this repo and running python setup.py install in the project root.

Mondigy will set up the collections it requires in your MongoDB database. They are named with a _p.<collection_name> convention. Don't delete these collections or manually edit any of the documents in them.
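Scripts that list or clean up collections in the same database may want to skip the ones mondigy manages. The sketch below shows one way to do that, based only on the _p. prefix described above. The helper names are hypothetical and are not part of mondigy.

```python
# Prefix taken from the "_p.<collection_name>" convention described above.
INTERNAL_PREFIX = "_p."


def internal_name(collection_name):
    """Return the name an internal collection would have under the convention."""
    return INTERNAL_PREFIX + collection_name


def is_internal(name):
    """True if a collection name follows the "_p." convention."""
    return name.startswith(INTERNAL_PREFIX)


def user_collections(names):
    """Filter out internal collections so a script never edits or drops them."""
    return [name for name in names if not is_internal(name)]


# Hypothetical list of collection names, as a database listing might return.
names = ["products", "_p.products", "reviews"]
print(user_collections(names))
```

With pymongo, the list of names would typically come from the database object's list_collection_names() method.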

To set up mondigy, just enter your MongoDB connection info into your prodigy.json config file, which is found in your PRODIGY_HOME directory. The source database and annotations database (where your completed annotations are stored by Prodigy) can be configured independently or the same database can be specified for both if you want everything in the same db. See /example_config/prodigy.json for an example config file.
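To make the connection settings concrete, here is a minimal sketch of how one settings block (host, user, password, database, auth_source) could be turned into a standard MongoDB connection URI. This is an illustration, not mondigy's actual implementation; the function name build_mongo_uri is hypothetical, and the key names are the ones used in the example config in this README.

```python
from urllib.parse import quote_plus, urlencode


def build_mongo_uri(settings):
    """Build a standard MongoDB connection URI from one settings block."""
    # Percent-encode credentials so characters such as '@' or ':' stay valid.
    user = quote_plus(settings["user"])
    password = quote_plus(settings["password"])
    uri = f"mongodb://{user}:{password}@{settings['host']}/{settings['database']}"
    # authSource names the database that stores the user's credentials.
    if settings.get("auth_source"):
        uri += "?" + urlencode({"authSource": settings["auth_source"]})
    return uri


# Placeholder values copied from the example config.
annotations_db = {
    "host": "my.database.com",
    "user": "mongo_user",
    "password": "mongo_pass",
    "database": "my_db",
    "auth_source": "admin",
}
print(build_mongo_uri(annotations_db))
```

Because the source and annotations blocks share the same connection keys, the same helper would work for either one.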

Code Example

Let's define a db connection and start annotating data from our MongoDB database!

Step 1. Add configuration parameters to prodigy.json in your PRODIGY_HOME directory. For this example, we'll limit our annotations to at most 1000 entries from the products collection of our database that are marked in_stock. We'll also include the product name and product id in the data returned to Prodigy so we can show that information in a custom view.

my_db_config.json
{
  ...
  "db": "mondigy.db",
  "db_settings": {
    "mongodb": {
      "source_db": {
        "host": "my.database.com",
        "user": "mongo_user",
        "password": "mongo_pass",
        "database": "my_db",
        "auth_source": "admin",
        "collection": "products",
        "text_field": "description",
        "other_fields": ["product_name", "product_id"],
        "query": {"in_stock": true},
        "limit": 1000
      },
      "annotations_db": {
        "host": "my.database.com",
        "user": "mongo_user",
        "password": "mongo_pass",
        "database": "my_db",
        "auth_source": "admin"
      }
    }
  },
  ...
}
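The following sketch illustrates how the text_field, other_fields, and limit settings could shape the tasks Prodigy receives. It is an assumption-laden illustration rather than mondigy's own code: the function names are hypothetical, the documents are plain dicts standing in for a query result, and placing the extra fields under "meta" is a guess at a reasonable layout.

```python
from itertools import islice


def to_task(doc, text_field, other_fields=()):
    """Turn one MongoDB document (a dict) into a Prodigy-style task dict."""
    # Prodigy text tasks carry the text to annotate under the "text" key.
    task = {"text": doc[text_field]}
    # Assumption: extra fields travel under "meta"; missing fields are skipped.
    task["meta"] = {name: doc[name] for name in other_fields if name in doc}
    return task


def stream_tasks(docs, settings):
    """Yield at most settings["limit"] tasks from an iterable of documents."""
    limit = settings.get("limit")  # None means no cap
    for doc in islice(docs, limit):
        yield to_task(doc, settings["text_field"], settings.get("other_fields", ()))


settings = {
    "text_field": "description",
    "other_fields": ["product_name", "product_id"],
    "limit": 2,
}
# Stand-in documents; with pymongo these would come from something like
# collection.find({"in_stock": True}).
docs = [
    {"description": "A red mug", "product_name": "Mug", "product_id": 1},
    {"description": "A blue plate", "product_name": "Plate", "product_id": 2},
    {"description": "A green bowl", "product_name": "Bowl", "product_id": 3},
]
tasks = list(stream_tasks(docs, settings))
print(tasks)
```

Yielding tasks from a generator keeps memory use flat, which matters when the source collection is much larger than the limit.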

Step 2. Start your Prodigy server and let mondigy point your MongoDB collection at it by supplying the paths of your config file and the Mondigy loader.

prodigy ner.manual my_ner_task en_core_web_sm - --label FEATURE,KEYWORD
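The lone - in the command above sits where Prodigy recipes expect a data source. Assuming it is the usual placeholder for reading newline-delimited JSON from standard input (this README does not say so explicitly), the sketch below shows the shape of data such a stream would carry. The helper write_jsonl and the sample tasks are hypothetical.

```python
import json
import sys


def write_jsonl(tasks, out=sys.stdout):
    """Write each task dict as one JSON object per line (JSONL)."""
    for task in tasks:
        out.write(json.dumps(task) + "\n")


# Hypothetical tasks in the shape a text annotation recipe expects.
tasks = [
    {"text": "A red mug", "meta": {"product_id": 1}},
    {"text": "A blue plate", "meta": {"product_id": 2}},
]
write_jsonl(tasks)
```

One JSON object per line lets the consumer start working on the first task before the producer has finished writing the rest.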

Step 3. Annotate!

License

MIT © John Dagdelen
