# library.qai.utilities

A REST server and helper functions and classes for interacting with the rest of the Qordoba platform.

See GitHub history for older docs.
## Changes in 3.0.0

Removed code for processing older pipeline formats. We now only process the new style, which has `chain`.
## Changes in 2.4.0

- We now enforce: if `QRest.batching == True`, then `QRest.workers = 1`.
- If you don't specify `QRest.workers` on instantiation, then before setting `workers=cpu_count` we first check the configs and use the `"WORKER_COUNT"` value if present.

@yakivy
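The two rules above can be sketched as follows (illustrative only; `resolve_workers` is a hypothetical helper, not part of QAI's actual source):

```python
from multiprocessing import cpu_count

def resolve_workers(batching, workers=None, configs=None):
    """Hypothetical sketch of QRest's 2.4.0 worker selection."""
    configs = configs or {}
    if batching:
        return 1  # batching forces a single worker
    if workers is not None:
        return workers  # an explicitly passed value wins
    # otherwise check the configs before falling back to cpu_count
    return configs.get("WORKER_COUNT", cpu_count())

print(resolve_workers(batching=True, workers=8))                     # → 1
print(resolve_workers(batching=False, configs={"WORKER_COUNT": 4}))  # → 4
```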
## Upgrading to v2.3.5

Dependent services must include `nltk==3.4.5` in `requirements.txt` (works around a `setup.py` bug).

The Dockerfile of the dependent service must include the following line (it ensures that printing the payload does not cause encoding exceptions):

```
ENV PYTHONIOENCODING=utf-8
```
Dependent services can pass extra flags to `QRest`:

- `debug` (default: `False`) - if `True`, exceptions are not suppressed (don't use in prod)
- `batching` (default: `False`) - if `True`, pass an array of segments to the qallback instead of making consecutive qallback calls
- `verbose` (default: `False`) - if `True`, print full output
- `ignore_html` (default: `True`) - if `batching=False`, ignore sentences with HTML tags
- `sentence_token_limit` (default: `1024`) - if `batching=False`, ignore sentences longer than 1024 words/tokens
## Upgrading to v2

Upgrading to v2 does require a few changes. Some notable ones:

- `get_config` will break. Sorry, you have to deal with that yourself.
- `QConnect` is gone, and `QRest` is imported more explicitly.
- There is no more `qai` Docker image. You are free to use whatever base image you want. May I recommend `qsam/spacy_alpine`.
- `qai` is now a pip dependency, so it must be in your `requirements.txt`.
However! There's help. Follow these steps:

```shell
cd qai_v1_service
vactivate             # or however you go into a virtualenv
pip uninstall -y qai  # uninstalls old qai
pip install qai       # installs qai from PyPI
python -m qai.upgrade .
# shows you how it would change your files to make the project ready for v2
# n to reject, y to accept
```
Now all that remains is checking whether you use `get_configs`, and if so: pass `get_configs` an absolute path (v2) instead of a relative path split into a list (v1).
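Concretely, the call changes shape like this (a sketch; the `conf/config.json` path is made up for illustration):

```python
import os

# v1 passed a relative path split into a list:
#     get_configs(['conf', 'config.json'])
# v2 expects a single absolute path instead:
config_path = os.path.abspath(os.path.join('conf', 'config.json'))
#     get_configs(config_path)
print(config_path)
```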
Note: I went a bit fast, and long story short, versions 2.0.x and 2.1.x are not salvageable. Just use 2.2.0+.
## Things to know

See the Changelog for details.
## Required "conventions"

All projects must have a `config.json`, and that config must specify `SUPPORTED_LANGUAGES`, which is either a string or a list of strings, of the form `"en"` or `["en", "de", "zh"]` (the prefix of the ISO code). QAI will not let your service start unless it thinks you have a valid `SUPPORTED_LANGUAGES` field. By default, QAI will look for this in `conf/config.json`. This is overridable. Here is the minimal config:

```json
{
    "SUPPORTED_LANGUAGES": "en"
}
```
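Since `SUPPORTED_LANGUAGES` may be either a string or a list of strings, a service reading the field itself would normalize it roughly like this (illustrative only; this is not QAI's actual validation code):

```python
import json

config = json.loads('{"SUPPORTED_LANGUAGES": "en"}')

langs = config["SUPPORTED_LANGUAGES"]
if isinstance(langs, str):
    langs = [langs]  # normalize "en" to ["en"]

assert all(isinstance(lang, str) for lang in langs)
print(langs)  # → ['en']
```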
You can specify the service name in the config file with:

```json
{
    "SUPPORTED_LANGUAGES": "en",
    "SERVICE_NAME": "hey look at me service"
}
```
To change the config path to, for example, `./my_config_dir/a_sub_dir/my_wacky_config.json`:

```python
QRest(analyzer,
      category='service name, e.g. formality',
      white_lister=white_lister,
      config_path=['my_config_dir', 'a_sub_dir', 'my_wacky_config.json'])
```
By default, QAI sends a no-issues response whenever a call to the dependent library fails. To turn on `debug` mode:

```python
QRest(analyzer,
      category='service name, e.g. formality',
      white_lister=white_lister,
      debug=True)
```
To print out input segments:

```python
QRest(analyzer,
      category='service name, e.g. formality',
      white_lister=white_lister,
      verbose=True)
```

- `verbose` (default: `False`) - print full output
To process batches instead of looping over segments (sent by a mediator):

```python
QRest(analyzer,
      category='service name, e.g. formality',
      white_lister=white_lister,
      batching=True)
```

Important: QAI does not define a batch size; if batching is enabled, it simply passes on the entire mediator input. To change the mediator batch size, look for `segmentDelegator.read.batchSize` in `config/application.conf` of the dependent service.
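The mediator's `application.conf` is in HOCON format, so that setting would typically look something like the following (the nesting follows HOCON convention, and the value `50` is purely illustrative, not taken from a real config):

```
segmentDelegator {
  read {
    batchSize = 50
  }
}
```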
To customize input filters:

```python
QRest(analyzer,
      category='service name, e.g. formality',
      white_lister=white_lister,
      batching=False,
      sentence_token_limit=1024,
      ignore_html=True)
```

- `ignore_html` (default: `True`) - if `batching=False`, ignore sentences with HTML tags
- `sentence_token_limit` (default: `1024`) - if `batching=False`, ignore sentences longer than 1024 words/tokens
## Usage

You can explicitly create a REST connection like this:

```python
from app import Analyzer, whitelist

from qai.qconnect.qrest import QRest

SERVICE_NAME = 'service_name'
host = '0.0.0.0'
port = 5000

if __name__ == '__main__':
    analyzer = Analyzer()
    rest_connection = QRest(analyzer,
                            category=SERVICE_NAME,
                            white_lister=whitelist,
                            host=host,
                            port=port)
    # create a blocking connection:
    rest_connection.connect()
```
The above will create as many workers as you have cores. This is great, unless you are using AutoML. There is a known bug where AutoML crashes if you are using more than one worker.
So if you're using AutoML, the above would look like:

```python
from app import Analyzer, whitelist

from qai.qconnect.qrest import QRest

SERVICE_NAME = 'service_name'
host = '0.0.0.0'
port = 5000
workers = 1  # AutoML crashes with more than one worker

if __name__ == '__main__':
    analyzer = Analyzer()
    rest_connection = QRest(analyzer,
                            category=SERVICE_NAME,
                            white_lister=whitelist,
                            host=host,
                            port=port,
                            workers=workers)
    # create a blocking connection:
    rest_connection.connect()
```
There is also a helper class for turning spaCy `Span`s into issues the rest of the platform can process:

```python
from spacy.tokens import Span

from app.factor import SpacyFactor

# nlp, analyze, and get_suggestions come from your own service code
SOV = SpacyFactor(
    "subject_object_verb_spacing",
    "Keep the subject, verb, and object of a sentence close together to help the reader understand the sentence."
)

Span.set_extension("score", default=0)
Span.set_extension("suggestions", default=[])

doc = nlp("Holders of the Class A and Class B-1 certificates will be entitled to receive on each Payment Date, to the extent monies are available therefor (but not more than the Class A Certificate Balance or Class B-1 Certificate Balance then outstanding), a distribution.")
score = analyze(doc)
if score is not None:
    span = Span(doc, 0, len(doc))  # or whichever TOKENS are the issue (no need to worry about character indexes)
    span._.score = score
    span._.suggestions = get_suggestions(doc)
    issues = SOV(span)
```
## Installation

```shell
pip install qai
```
## Testing

See Confluence for docs on input format expectations.

`scripts/test_qai.sh` has some helpful testing functions.
## Development

The source of truth for the version is the `VERSION` file, which is read by `setup.py` and the `Jenkinsfile`. When you run `python setup.py sdist`/`bdist`, this creates `qai/version.py`, which is read in `qai/__init__.py`. This was done because of frustrations with Python's module system: it means no file's absolute path has to be known at runtime, which is a big bonus in Python.
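The pattern reads roughly like this (a self-contained sketch using a temporary directory; the real `setup.py` operates on the repo itself, and the version string is made up):

```python
import os
import tempfile

# Recreate the repo layout in a temp dir for illustration.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "qai"))
with open(os.path.join(root, "VERSION"), "w") as f:
    f.write("3.0.0\n")

# At build time, setup.py reads VERSION...
with open(os.path.join(root, "VERSION")) as f:
    version = f.read().strip()

# ...and writes qai/version.py, so qai/__init__.py can simply do
# `from qai.version import __version__` without knowing any absolute path.
with open(os.path.join(root, "qai", "version.py"), "w") as f:
    f.write(f'__version__ = "{version}"\n')

print(version)  # → 3.0.0
```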
## CI/CD

Jenkins will push to PyPI when you build the `master` or `v2` branch. It also might automatically build the `v2` branch on git push; we are testing that now.

To get Jenkins to build this, we had to throw it in Docker... so the Jenkinsfile calls the Dockerfile, which calls the release script... It's a house of cards, but it seems to work.
## License

This software is not licensed. If you do not work at Qordoba, you are not legally allowed to use it. Also, it's just helper functions that really won't help you. If something in it does look interesting, and you would like access, open an issue.