
library.qai.utilities

v2! A REST server, plus helper functions and classes, for interacting with the rest of the Qordoba platform.

See GitHub history for 1.x docs.

Changes in 2.4.0

  • We now enforce that if QRest.batching == True, then QRest.workers = 1
  • If you don't specify QRest.workers when instantiating, we first check the configs for a "WORKER_COUNT" value before falling back to workers=cpu_count @yakivy
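The resolution order above can be sketched as follows. This is an illustrative sketch of the described behavior, not QAI's actual implementation; the function name resolve_workers is hypothetical.

```python
from multiprocessing import cpu_count


def resolve_workers(config, workers=None, batching=False):
    """Pick a worker count following the 2.4.0 rules (illustrative sketch)."""
    if batching:
        return 1                      # batching forces a single worker
    if workers is not None:
        return workers                # an explicit argument wins
    # otherwise: config value first, then fall back to the CPU count
    return config.get("WORKER_COUNT", cpu_count())
```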

Upgrading to v2.3.5

Dependent services must include nltk==3.4.5 in their requirements.txt (works around a setup.py bug).

The Dockerfile of the dependent service must include the following line (to ensure that printing the payload does not cause encoding exceptions):

ENV PYTHONIOENCODING=utf-8

Dependent services can pass extra flags to QRest:

  • debug (default: False) - if True, don't suppress exceptions (don't use in prod)
  • batching (default: False) - if True, pass an array of segments to the qallback instead of making consecutive qallback calls
  • verbose (default: False) - if True, print full output
  • ignore_html (default: True) - if batching=False, ignore sentences containing HTML tags
  • sentence_token_limit (default: 1024) - if batching=False, ignore sentences longer than this many words/tokens
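The two input filters can be sketched like this. This is a hypothetical illustration of what the flags mean; should_process and the regex are not part of the qai API.

```python
import re

# crude HTML-tag detector for illustration purposes
HTML_TAG = re.compile(r"<[^>]+>")


def should_process(sentence, ignore_html=True, sentence_token_limit=1024):
    """Return True if a sentence passes the (non-batching) input filters."""
    if ignore_html and HTML_TAG.search(sentence):
        return False                  # skip sentences containing HTML tags
    if len(sentence.split()) > sentence_token_limit:
        return False                  # skip overly long sentences
    return True
```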

Upgrading to v2

Upgrading to v2 requires a few changes. Some notable ones:

  • get_config will break. Sorry, you'll have to deal with that yourself
  • QConnect is gone, and QRest is imported more explicitly
  • There is no more qai Docker image. You are free to use whatever base image you want. May I recommend qsam/spacy_alpine.
  • qai is now a pip dependency, so it must be in your requirements.txt

However! There's help. Follow these steps:

cd qai_v1_service
vactivate  # or however you go into a virtualenv
pip uninstall -y qai
# uninstalls old qai
pip install qai
# installs qai from PyPi
python -m qai.upgrade .
# shows you how it would change your files to make the project ready for v2
# n to reject, y to accept

Now all that remains is to check whether you use get_configs, and if so: pass get_configs an absolute path (v2) instead of a relative path split into a list (v1).

Note: I went a bit fast, and long story short, versions 2.0.x and 2.1.x are not salvageable. Just use 2.2.0+.

Things to know

See The Changelog for details.

Required "conventions"

All projects must have a config.json, and that config must specify SUPPORTED_LANGUAGES, which is either a string or list of strings, of the form "en" or ["en", "de", "zh"] (the prefix of the ISO code). QAI will not let your service start unless it thinks you have a valid SUPPORTED_LANGUAGES field. By default, QAI will look for this in conf/config.json. This is overridable. Here is the minimal config:

{
  "SUPPORTED_LANGUAGES": "en"
}
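The startup check described above can be sketched like this. This is an illustrative guess at the validation logic, not QAI's actual implementation; has_valid_supported_languages is a hypothetical name.

```python
def has_valid_supported_languages(config):
    """True if SUPPORTED_LANGUAGES is an ISO-prefix string or a list of them."""
    langs = config.get("SUPPORTED_LANGUAGES")
    if isinstance(langs, str):
        langs = [langs]               # a bare string like "en" is allowed
    if not isinstance(langs, list) or not langs:
        return False
    # each entry should be a short ISO-style prefix like "en" or "zh"
    return all(isinstance(l, str) and 2 <= len(l) <= 3 for l in langs)
```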

You can specify the service name in the config file with

{
  "SUPPORTED_LANGUAGES": "en",
  "SERVICE_NAME": "hey look at me service"
}

To change the config path to, for example, ./my_config_dir/a_sub_dir/my_wacky_config.json:

QRest(analyzer,
         category='service name, e.g. formality',
         white_lister=white_lister,
         config_path=['my_config_dir', 'a_sub_dir', 'my_wacky_config.json'])

By default, QAI sends a no-issues response whenever a call to the dependent library fails. To turn on debug mode:

QRest(analyzer,
         category='service name, e.g. formality',
         white_lister=white_lister,
         debug=True)

Print out input segments:

QRest(analyzer,
         category='service name, e.g. formality',
         white_lister=white_lister,
         verbose=True)
  • verbose (default: False) - print full output

To process batches instead of looping over segments (sent by a mediator):

QRest(analyzer,
         category='service name, e.g. formality',
         white_lister=white_lister,
         batching=True)

Important: QAI does not define a batch size; if batching is enabled, it just passes along the entire mediator input. To change the mediator's batch size, look for segmentDelegator.read.batchSize in config/application.conf of the dependent service.
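For reference, that setting might look roughly like this in the dependent service's config/application.conf. Only the key path segmentDelegator.read.batchSize comes from the text above; the surrounding structure and the value are assumptions for illustration.

```
segmentDelegator {
  read {
    batchSize = 50  # number of segments the mediator sends per batch (example value)
  }
}
```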

To customize input filters:

QRest(analyzer,
         category='service name, e.g. formality',
         white_lister=white_lister,
         batching=False,
         sentence_token_limit=1024,
         ignore_html=True)
  • ignore_html (default: True) - if batching=False, ignore sentences containing HTML tags
  • sentence_token_limit (default: 1024) - if batching=False, ignore sentences longer than this many words/tokens

Usage

You can explicitly create a REST connection like this:

from app import Analyzer, whitelist

from qai.qconnect.qrest import QRest


SERVICE_NAME = 'service_name'
host = '0.0.0.0'
port = 5000


if __name__ == '__main__':
    analyzer = Analyzer()
    rest_connection = QRest(analyzer,
                            category=SERVICE_NAME,
                            white_lister=whitelist,
                            host=host,
                            port=port)
    # create a blocking connection:
    rest_connection.connect()

The above will create as many workers as you have cores. This is great, unless you are using AutoML. There is a known bug where AutoML crashes if you use more than one worker.

So if you're using AutoML, the above would look like:

from app import Analyzer, whitelist

from qai.qconnect.qrest import QRest


SERVICE_NAME = 'service_name'
host = '0.0.0.0'
port = 5000
workers = 1


if __name__ == '__main__':
    analyzer = Analyzer()
    rest_connection = QRest(analyzer,
                            category=SERVICE_NAME,
                            white_lister=whitelist,
                            host=host,
                            port=port,
                            workers=workers)
    # create a blocking connection:
    rest_connection.connect()

There is also a helper class for turning spaCy Spans into issues the rest of the platform can process:

from spacy.tokens import Span
from app.factor import SpacyFactor


SOV = SpacyFactor(
    "subject_object_verb_spacing",
    "Keep the subject, verb, and object of a sentence close together to help the reader understand the sentence."
)

Span.set_extension("score", default=0)
Span.set_extension("suggestions", default=[])

doc = nlp("Holders of the Class A and Class B-1 certificates will be entitled to receive on each Payment Date, to the extent monies are available therefor (but not more than the Class A Certificate Balance or Class B-1 Certificate Balance then outstanding), a distribution.")  # `nlp` is your loaded spaCy pipeline
score = analyze(doc)  # `analyze` and `get_suggestions` are your own functions
if score is not None:
    span = Span(doc, 0, len(doc))  # or whichever TOKENS are the issue (don't have to worry about character indexes)
    span._.score = score
    span._.suggestions = get_suggestions(doc)
    issues = SOV(span)

Installation

pip install qai

Testing

See Confluence for docs on input format expectations.

scripts/test_qai.sh has some helpful testing functions.

Development

The source of truth is the VERSION file, which is read by setup.py and the Jenkinsfile. When you run python setup.py sdist/bdist, this creates qai/version.py, which is read in qai/__init__.py. This was done because Python's module system makes it frustrating to know the absolute path of a file at runtime; generating version.py avoids that, which is a big bonus in Python.

CI/CD

Jenkins will push to PyPI when you build the master or v2 branch. It may also automatically build the v2 branch on git push; testing that now.

To get Jenkins to build this, we had to throw it in Docker... so the Jenkinsfile calls the Dockerfile which calls the release script... It's a house of cards, but seems to work.

License

This software is not licensed. If you do not work at Qordoba, you are not legally allowed to use it. Also, it's just helper functions that really won't help you. If something in it does look interesting, and you would like access, open an issue.
