data annotation platform for siamese models



Quesadiya is a data annotation project management platform. You manage a project through the command-line interface (CLI) and annotate data in a web GUI to generate a triplet data set for developing Siamese models.

Quickstart

Installation

You can install quesadiya by running

$ pip install quesadiya

Check the installation by running

$ quesadiya

Installation from Source

  1. git clone this repo.

  2. cd quesadiya.

  3. Run pip install . (the trailing dot refers to the current directory).

  4. Check the installation by running quesadiya in your terminal.

Project Management

Quesadiya provides a command-line interface (CLI) to manage data annotation projects.

Create Project

You can create a data annotation project by

$ quesadiya create <project_name> <admin_name> <datapath> [OPTIONS]

For example,

$ quesadiya create queso me data/sample_triplets.jsonl
Loading input data: 5 row [00:00, 1495.40 row/s]
Admin password:
Repeat for confirmation:
Inserting data. This may take a while...
Finish creating a new project 'queso'

Caution: <datapath> must be a JSON Lines file in which each row follows the format below:

{
  "anchor_sample_id": "string (max 100 char)",
  "anchor_sample_text": "list of text", // each element is a paragraph
  "anchor_sample_title": "text (nullable)",
  "candidate_group_id": "string (max 100 char)",
  "candidates": [
    {
      "candidate_sample_id": "string (max 100 char)",
      "candidate_sample_text": "list of text", // each element is a paragraph
      "candidate_sample_title": "text (nullable)"
    }
  ]
}
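Before creating a project, it can help to check each row against the format above. The following is a minimal sketch, not part of Quesadiya: the names validate_row and validate_line are hypothetical, and it assumes that each element of candidates is an object holding the candidate_* fields and that an empty candidate list is an error.

```python
import json

MAX_ID_LEN = 100  # the "max 100 char" limit stated in the format above


def _check_id(obj, key, problems):
    """Record a problem if obj[key] is not a string within the length limit."""
    value = obj.get(key)
    if not isinstance(value, str) or len(value) > MAX_ID_LEN:
        problems.append(f"{key}: expected a string of at most {MAX_ID_LEN} characters")


def validate_row(row):
    """Return a list of problems found in one input row (empty if none)."""
    problems = []
    _check_id(row, "anchor_sample_id", problems)
    _check_id(row, "candidate_group_id", problems)
    if not isinstance(row.get("anchor_sample_text"), list):
        problems.append("anchor_sample_text: expected a list of paragraphs")
    candidates = row.get("candidates")
    # Assumption: a row without candidates cannot be annotated.
    if not isinstance(candidates, list) or not candidates:
        problems.append("candidates: expected a non-empty list")
        return problems
    for cand in candidates:
        if not isinstance(cand, dict):
            problems.append("candidates: each element should be an object")
            continue
        _check_id(cand, "candidate_sample_id", problems)
        if not isinstance(cand.get("candidate_sample_text"), list):
            problems.append("candidate_sample_text: expected a list of paragraphs")
    return problems


def validate_line(line):
    """Parse one line of a JSON Lines file and validate it."""
    return validate_row(json.loads(line))
```

Calling validate_line on every line of the input file and printing any non-empty result gives a quick check before running quesadiya create.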

anchor is the sample you want to compare with the positive sample and the negative sample. candidates is a list of candidates for the positive and negative samples. The sample a collaborator selects is recorded as the positive sample, and Quesadiya chooses a negative sample from the remaining candidates.
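The selection rule described above can be illustrated with a short sketch. This is not Quesadiya's implementation: the function make_triplet is hypothetical, and picking the negative uniformly at random from the remaining candidates is an assumption, since the rule Quesadiya actually uses is not stated here.

```python
import random


def make_triplet(anchor_id, candidate_ids, selected_id, rng=random):
    """Build one triplet from a collaborator's selection.

    The selected candidate becomes the positive sample. The negative is
    drawn from the remaining candidates (uniform choice is an assumption).
    """
    if selected_id not in candidate_ids:
        raise ValueError("selected_id must be one of candidate_ids")
    rest = [cid for cid in candidate_ids if cid != selected_id]
    if not rest:
        raise ValueError("at least one other candidate is needed for a negative")
    return {
        "anchor_sample_id": anchor_id,
        "positive_sample_id": selected_id,
        "negative_sample_id": rng.choice(rest),
    }
```

Passing a seeded random.Random instance as rng makes the choice reproducible.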

Tip: You can add collaborators from a JSON Lines file when you create a project by

$ quesadiya create queso me data/triplets.jsonl -a data/sample_collaborators1.jsonl

You can view sample data here.

Note that <collaborator_path> (the argument to -a) must be a JSON Lines file in which each row follows the format below:

{
  "name": "string (max 150 char)",
  "password": "string (max 128 char)",
  "contact": "string (max 254 char)"
}
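A collaborator file in this format can be generated with a few lines of Python. The sketch below is illustrative only: collaborator_line is a hypothetical helper, and the length limits are copied from the format above.

```python
import json

# Length limits taken from the collaborator format above.
LIMITS = {"name": 150, "password": 128, "contact": 254}


def collaborator_line(name, password, contact):
    """Return one JSON Lines row for a collaborator, enforcing the limits."""
    record = {"name": name, "password": password, "contact": contact}
    for key, limit in LIMITS.items():
        if not isinstance(record[key], str) or len(record[key]) > limit:
            raise ValueError(f"{key} must be a string of at most {limit} characters")
    return json.dumps(record)
```

Writing one such line per collaborator, each terminated by a newline, produces a file that can be passed to the -a option.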

See Command Line Interface Guide for more details.

Run Project

You can annotate a data set by running quesadiya:

$ quesadiya run [OPTION]

You can set the port on which the Quesadiya server runs with the -p option. For example,

$ quesadiya run -p 4000

Quesadiya’s default port number is 1133.

Once the server is running, open your browser and go to http://localhost:1133/ (or the port you specified with -p).

Then select a project and enter the admin name and password.

This takes you to the admin page. On the admin page, you can do the following:

  • view discarded samples

  • view progress of each collaborator

  • edit collaborators

Tip: The admin user cannot annotate data. If you are the admin and would like to annotate samples, create a collaborator account for yourself and log in with that account.

See Admin Guide for more details.

Data Annotation

Data annotation in Quesadiya is simple. The anchor text is shown on the left-hand side of the screen and the candidates are on the right. Collaborators can either select a positive sample among the candidates or discard a sample if it is corrupted. The admin can view discarded samples and push a sample back to the project from the admin page.

Export Data

You can export a snapshot of the annotated data set by running

$ quesadiya export <project_name> <output_path>

The output path must be a JSON Lines file. Each row follows the format below:

{
  "anchor_sample_id": "text",
  "positive_sample_id": "text",
  "negative_sample_id": "text"
}

Note that this operation requires admin privileges.

The operation above generates a triplet data set containing sample IDs only. If you'd like to include the text of each sample, add the -i option. For example,

$ quesadiya export queso data.jsonl -i

This generates a JSON Lines file in which each row follows the format below:

{
    "anchor_sample_id": "text",
    "positive_sample_id": "text",
    "negative_sample_id": "text",
    "anchor_sample_text": "list of text", // each element is a paragraph
    "positive_sample_text": "list of text",
    "negative_sample_text": "list of text"
}
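An exported file with text can be loaded for model training with the standard json module. The following is a minimal sketch, assuming the file was exported with -i; read_triplets and to_text_triplets are hypothetical helper names, and joining paragraphs with a newline is an illustrative choice.

```python
import json


def read_triplets(lines):
    """Yield one dict per non-empty JSON Lines row."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def to_text_triplets(rows, sep="\n"):
    """Yield (anchor, positive, negative) strings, joining each paragraph list."""
    for row in rows:
        yield (
            sep.join(row["anchor_sample_text"]),
            sep.join(row["positive_sample_text"]),
            sep.join(row["negative_sample_text"]),
        )
```

Because read_triplets accepts any iterable of lines, an open file object can be passed directly, for example the data.jsonl file from the export command above.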

Security

A disclaimer: Quesadiya and its contributors take no responsibility for protecting your data.

That said, we hash all passwords using Argon2.

If you'd like to prevent other users in your environment from accessing your data, we encourage you to restrict the permissions on the project folder. You can see the path to the Quesadiya root by running

$ quesadiya path

This command shows the absolute path to the Quesadiya project folder. Go to that directory, and you'll find your project folder.
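On a POSIX system, one way to restrict access is to remove all group and other permissions from the project folder. The sketch below is a suggestion rather than part of Quesadiya: restrict_to_owner is a hypothetical helper, and the folder argument is the path you obtained from quesadiya path.

```python
import os
import stat


def restrict_to_owner(folder):
    """Make folder and everything under it accessible to the owner only."""
    for root, _dirs, files in os.walk(folder):
        # Directories: read, write, and traverse for the owner only (0o700).
        os.chmod(root, stat.S_IRWXU)
        for name in files:
            # Files: read and write for the owner only (0o600).
            os.chmod(os.path.join(root, name), stat.S_IRUSR | stat.S_IWUSR)
```

This has the same effect as running chmod by hand on each file and directory. It relies on POSIX permission bits, so it does not restrict access in the same way on Windows.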


