
Official Crowlingo SDK, giving access to all NLP and NLU services that analyze texts regardless of the language.

Project description

PyCrowlingo: Python SDK for Crowlingo APIs

This is the official Python client for Crowlingo. It gives access to all NLP and NLU services that analyze texts regardless of the language.

Installation

You can use pip to install the library:

$ pip install PyCrowlingo

Alternatively, you can just clone the repository and run the setup.py script:

$ python setup.py install

Usage

First of all, you will need to instantiate a Crowlingo client. You can do it with your API token:

from PyCrowlingo import Client
client = Client('<TOKEN>')

Or using your account credentials:

from PyCrowlingo import Client
client = Client(username='<EMAIL>', password='<PASSWORD>')

QuickStart

You can call all the endpoints available on Crowlingo. All of them are detailed with examples in the documentation.

text = "Est-il recommandé d'utiliser MongoDb pour indexer mes documents ?"
res = client.languages.detect(text)
print(res)
# => Detect(sentences=[Sentence(start=0, end=65, languages_confidence=[ConfidenceLang(name='French', code='fr', confidence=98.0)], text="Est-il recommandé d'utiliser MongoDb pour indexer mes documents ?")], languages_confidence=[ConfidenceLang(name='French', code='fr', confidence=98.0)])

The response is a Pydantic object, so you can access the values through the response's attributes:

print(client.languages.detect(text).languages_confidence)
# => [ConfidenceLang(name='French', code='fr', confidence=98.0)]
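
Since the response is a standard Pydantic model, you can also convert it to plain Python data structures with Pydantic's own methods (this relies on Pydantic itself, not on anything specific to PyCrowlingo):

res = client.languages.detect(text)
print(res.dict())   # plain dict mirroring the attributes shown above
print(res.json())   # JSON string, handy for logging or storage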

Pipeline

If you need to analyze texts through different services, it can be cumbersome to call the API for every processing step. Gain some speed and productivity by using a Pipeline, which lets you create a processing workflow for your data. To do so, use the ApiModels classes instead of the client functions.

from PyCrowlingo import Pipeline
from PyCrowlingo.ApiModels import *
text = "On 26 April 1986, Chernobyl suffered the world’s worst nuclear disaster. An experiment designed to test the safety of the power plant went wrong and caused a fire which spewed radiation for 10 days. Clouds carrying radioactive particles drifted for thousands of miles, releasing toxic rain all over Europe. Those living close to Chernobyl - about 116,000 people - were immediately evacuated. A 30 km exclusion zone was imposed around the damaged reactor. This was later expanded to cover more affected areas."
pipeline = Pipeline(client, text=text)
# Pass the client and the common variables as keyword arguments
pipeline.add(Concepts.Extract, precision=0.9).add(Entities.Extract, visualize=True).add(Entities.Duckling)
# Add each step with pipeline.add(EndpointModel, ...) and the step's individual arguments as keywords
res = pipeline.call()
# Execute the pipeline
print(res)
# => responses={'[POST] /entities/duckling': {'duckling': [{'body': 'On 26 April 1986', 'start': 0, 'value': {'values': [{'value': '1986-04-26T00:00:00.000-08:00', 'grain': 'day', 'type': 'value'}], 'value': '1986-04-26T00:00:00.000-08:00', 'grain': 'day', 'type': 'value'}, 'end': 16, 'dim': 'time', 'latent': False}, {'body': '10 days', 'start': 190, 'value': {'value': 10, 'day': 10, 'type': 'value', 'unit': 'day', 'normalized': {'value': 864000, 'unit': 'second'}}, 'end': 197, 'dim': 'duration', 'latent': False}, {'body': 'thousands', 'start': 249, 'value': {'value': 1000, 'type': 'value'}, 'end': 258, 'dim': 'number', 'latent': False}, {'body': '116,000', 'start': 347, 'value': {'value': 116000, 'type': 'value'}, 'end': 354, 'dim': 'number', 'latent': False}, {'body': 'immediately', 'start': 369, 'value': {'values': [{'value': '2020-05-25T15:57:30.724-07:00', 'grain': 'second', 'type': 'value'}], 'value': '2020-05-25T15:57:30.724-07:00', 'grain': 'second', 'type': 'value'}, 'end': 380, 'dim': 'time', 'latent': False}, {'body': '30 km', 'start': 394, 'value': {'value': 30, 'type': 'value', 'unit': 'kilometre'}, 'end': 399, 'dim': 'distance', 'latent': False}]}, '[POST] /entities/extract': {'entities': [{'start': 3, 'end': 16, 'ent_type': 'DATE', 'text': '26 April 1986'}, {'start': 18, 'end': 27, 'ent_type': 'GPE', 'text': 'Chernobyl'}, {'start': 190, 'end': 197, 'ent_type': 'DATE', 'text': '10 days'}, {'start': 249, 'end': 267, 'ent_type': 'QUANTITY', 'text': 'thousands of miles'}, {'start': 299, 'end': 305, 'ent_type': 'LOC', 'text': 'Europe'}, {'start': 329, 'end': 338, 'ent_type': 'GPE', 'text': 'Chernobyl'}, {'start': 341, 'end': 354, 'ent_type': 'CARDINAL', 'text': 'about 116,000'}, {'start': 394, 'end': 399, 'ent_type': 'QUANTITY', 'text': '30 km'}], 'visualization': '<div class="entities" style="line-height: 2.5; direction: ltr">On \n<mark class="entity" style="background: #bfe1d9; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">\n    26 April 1986\n    <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; text-transform: uppercase; vertical-align: middle; margin-left: 0.5rem">DATE</span>\n</mark>\n, \n<mark class="entity" style="background: #feca74; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">\n    Chernobyl\n    <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; text-transform: uppercase; vertical-align: middle; margin-left: 0.5rem">GPE</span>\n</mark>\n suffered the world’s worst nuclear disaster. An experiment designed to test the safety of the power plant went wrong and caused a fire which spewed radiation for \n<mark class="entity" style="background: #bfe1d9; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">\n    10 days\n    <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; text-transform: uppercase; vertical-align: middle; margin-left: 0.5rem">DATE</span>\n</mark>\n. 
Clouds carrying radioactive particles drifted for \n<mark class="entity" style="background: #e4e7d2; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">\n    thousands of miles\n    <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; text-transform: uppercase; vertical-align: middle; margin-left: 0.5rem">QUANTITY</span>\n</mark>\n, releasing toxic rain all over \n<mark class="entity" style="background: #ff9561; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">\n    Europe\n    <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; text-transform: uppercase; vertical-align: middle; margin-left: 0.5rem">LOC</span>\n</mark>\n. Those living close to \n<mark class="entity" style="background: #feca74; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">\n    Chernobyl\n    <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; text-transform: uppercase; vertical-align: middle; margin-left: 0.5rem">GPE</span>\n</mark>\n - \n<mark class="entity" style="background: #e4e7d2; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">\n    about 116,000\n    <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; text-transform: uppercase; vertical-align: middle; margin-left: 0.5rem">CARDINAL</span>\n</mark>\n people - were immediately evacuated. A \n<mark class="entity" style="background: #e4e7d2; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">\n    30 km\n    <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; text-transform: uppercase; vertical-align: middle; margin-left: 0.5rem">QUANTITY</span>\n</mark>\n exclusion zone was imposed around the damaged reactor. 
This was later expanded to cover more affected areas.</div>'}, '[POST] /concepts/extract': {'concepts': [{'id': 'Q129677', 'weight': 0.19254024269693001, 'labels': [{'text': 'Chernobyl', 'mentions': [{'start': 18, 'end': 27}, {'start': 329, 'end': 338}]}]}, {'id': 'Q11448', 'weight': 0.13384788867053848, 'labels': [{'text': 'radioactive', 'mentions': [{'start': 215, 'end': 226}]}, {'text': 'radiation', 'mentions': [{'start': 176, 'end': 185}]}]}, {'id': 'Q46', 'weight': 0.11258210752213413, 'labels': [{'text': 'Europe', 'mentions': [{'start': 299, 'end': 305}]}]}, {'id': 'Q274160', 'weight': 0.07002172766602058, 'labels': [{'text': 'toxic', 'mentions': [{'start': 279, 'end': 284}]}]}, {'id': 'Q7925', 'weight': 0.06886892370214791, 'labels': [{'text': 'rain', 'mentions': [{'start': 285, 'end': 289}]}]}, {'id': 'Q101965', 'weight': 0.06562043143894636, 'labels': [{'text': 'experiment', 'mentions': [{'start': 76, 'end': 86}]}]}, {'id': 'Q3196', 'weight': 0.06482017292518794, 'labels': [{'text': 'fire', 'mentions': [{'start': 158, 'end': 162}]}]}, {'id': 'Q356936', 'weight': 0.06390318225879862, 'labels': [{'text': 'exclusion zone', 'mentions': [{'start': 400, 'end': 414}]}]}, {'id': 'Q486', 'weight': 0.06317545950269358, 'labels': [{'text': 'nuclear disaster', 'mentions': [{'start': 55, 'end': 71}]}, {'text': 'disaster', 'mentions': []}]}, {'id': 'Q11369', 'weight': 0.057931103203040506, 'labels': [{'text': 'particles', 'mentions': [{'start': 227, 'end': 236}]}]}, {'id': 'Q8074', 'weight': 0.05530684102502764, 'labels': [{'text': 'Clouds', 'mentions': [{'start': 199, 'end': 205}]}]}, {'id': 'Q11573', 'weight': 0.05138191938853427, 'labels': [{'text': 'km', 'mentions': [{'start': 397, 'end': 399}]}]}]}}
print(res.responses[Entities.Extract.eid()])
# => {'entities': [{'start': 3, 'end': 16, 'ent_type': 'DATE', 'text': '26 April 1986'}, {'start': 18, 'end': 27, 'ent_type': 'GPE', 'text': 'Chernobyl'}, {'start': 190, 'end': 197, 'ent_type': 'DATE', 'text': '10 days'}, {'start': 249, 'end': 267, 'ent_type': 'QUANTITY', 'text': 'thousands of miles'}, {'start': 299, 'end': 305, 'ent_type': 'LOC', 'text': 'Europe'}, {'start': 329, 'end': 338, 'ent_type': 'GPE', 'text': 'Chernobyl'}, {'start': 341, 'end': 354, 'ent_type': 'CARDINAL', 'text': 'about 116,000'}, {'start': 394, 'end': 399, 'ent_type': 'QUANTITY', 'text': '30 km'}], 'visualization': '<div class="entities" style="line-height: 2.5; direction: ltr">On \n<mark class="entity" style="background: #bfe1d9; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">\n    26 April 1986\n    <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; text-transform: uppercase; vertical-align: middle; margin-left: 0.5rem">DATE</span>\n</mark>\n, \n<mark class="entity" style="background: #feca74; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">\n    Chernobyl\n    <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; text-transform: uppercase; vertical-align: middle; margin-left: 0.5rem">GPE</span>\n</mark>\n suffered the world’s worst nuclear disaster. An experiment designed to test the safety of the power plant went wrong and caused a fire which spewed radiation for \n<mark class="entity" style="background: #bfe1d9; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">\n    10 days\n    <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; text-transform: uppercase; vertical-align: middle; margin-left: 0.5rem">DATE</span>\n</mark>\n. Clouds carrying radioactive particles drifted for \n<mark class="entity" style="background: #e4e7d2; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">\n    thousands of miles\n    <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; text-transform: uppercase; vertical-align: middle; margin-left: 0.5rem">QUANTITY</span>\n</mark>\n, releasing toxic rain all over \n<mark class="entity" style="background: #ff9561; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">\n    Europe\n    <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; text-transform: uppercase; vertical-align: middle; margin-left: 0.5rem">LOC</span>\n</mark>\n. Those living close to \n<mark class="entity" style="background: #feca74; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">\n    Chernobyl\n    <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; text-transform: uppercase; vertical-align: middle; margin-left: 0.5rem">GPE</span>\n</mark>\n - \n<mark class="entity" style="background: #e4e7d2; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">\n    about 116,000\n    <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; text-transform: uppercase; vertical-align: middle; margin-left: 0.5rem">CARDINAL</span>\n</mark>\n people - were immediately evacuated. 
A \n<mark class="entity" style="background: #e4e7d2; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">\n    30 km\n    <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; text-transform: uppercase; vertical-align: middle; margin-left: 0.5rem">QUANTITY</span>\n</mark>\n exclusion zone was imposed around the damaged reactor. This was later expanded to cover more affected areas.</div>'}

# EndpointModel.eid() returns the endpoint id used as the key in the responses dictionary
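
The results of the other steps can be retrieved the same way, using each endpoint model's eid() as the key:

print(res.responses[Concepts.Extract.eid()])
print(res.responses[Entities.Duckling.eid()])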

Bulk Request

Most of the time, you will need to apply this processing to a whole dataset. Again, you will gain speed by using a bulk request, which performs many operations at the same time. Here is an example:

from PyCrowlingo import Bulk, Pipeline
from PyCrowlingo.ApiModels import *
text = "Est-il recommandé d'utiliser MongoDb pour indexer mes documents ?"
pipelines = [Pipeline().add(Languages.Detect, text=text)] * 300
res = Bulk(client, pipelines).call()
assert len(res.responses) == 300 # True

You can also do it in an iterative way:

from PyCrowlingo import Bulk, Pipeline
from PyCrowlingo.ApiModels import *
text = "Est-il recommandé d'utiliser MongoDb pour indexer mes documents ?"
bulk = Bulk(client)
for i in range(300):
    bulk.add(Pipeline().add(Languages.Detect, text=text))
res = bulk.call()
assert len(res.responses) == 300 # True

Using a bulk automatically splits the API calls into batches (you can control their size with the batch_size argument), so you don't have to worry about managing the query size.
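
As a sketch, assuming batch_size is accepted by the Bulk constructor (the value 50 here is arbitrary):

bulk = Bulk(client, pipelines, batch_size=50)  # batch_size placement is an assumption; 50 is a placeholder value
res = bulk.call()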

Errors

Sometimes an endpoint call may fail with an error. Every error is identifiable by its ID and can be handled in a Pythonic way:

from PyCrowlingo.Errors import ModelNotFound, CrowlingoException

model_id = "AskUbuntu"
try:
    client.classifier.clear_model(model_id)
except ModelNotFound:
    client.classifier.create_model(model_id)
except CrowlingoException as e:
    print(e)

Here is the list of available exceptions:

Class | Error ID | Status code | Description
TrainingError | TRAINING_ERROR | 400 | An error happened during the training.
TokenNotFound | TOKEN_NOT_FOUND | 401 | Token not found. Insert your token in the query parameter with api_key=[YOUR_TOKEN] or in the headers with x-api-key:[YOUR_TOKEN].
BadCredentials | BAD_CREDENTIALS | 401 | Could not validate credentials. There might be an error in your token or email/password, or your account may have been disabled. Please contact us if you do not understand the reason.
TestModelForbidden | TEST_MODEL_FORBIDDEN | 403 | You do not have access to the test version of this model. Ask the model's owner for access or use the prod version of this model.
BadModelsPerms | BAD_MODELS_PERMS | 403 | You do not have the permissions to perform this action on this model. Ask the owner of this model to grant you more rights.
BadModelCategory | BAD_MODEL_CATEGORY | 404 | This model cannot be used for this kind of request. Create a new model or use another endpoint.
ModelNotDeployed | MODEL_NOT_DEPLOYED | 404 | This model is not deployed. Use the test model or deploy it first.
CollaboratorNotFound | COLLABORATOR_NOT_FOUND | 404 | This collaborator was not found. Maybe they already deleted the model, or you did not add them as a collaborator on this model.
ModelNotFound | MODEL_NOT_FOUND | 404 | We cannot find a model with this id. You have to create a model before using it.
DocumentNotFound | DOCUMENT_NOT_FOUND | 404 | We cannot find a document with this id. You have to create a document before using it.
DuplicateModelId | DUPLICATE_MODEL_ID | 409 | You already have a model with this id. Delete the model first if you want to overwrite it, or use the update endpoint to create a new version of this model.
ContentLengthRequired | CONTENT_LENGTH_REQUIRED | 411 | You need to provide a Content-Length header for POST and PATCH requests.
RequestEntityTooLarge | REQUEST_ENTITY_TOO_LARGE | 413 | The payload of your request is too large. Try splitting your request into smaller payloads.
BadParametersQuery | BAD_PARAMETERS_QUERY | 422 | The parameters of the query do not match the documentation. The query cannot be processed.
ModelNotTrained | MODEL_NOT_TRAINED | 423 | This model is not trained yet. Wait until it is trained, or run the training before performing this action.
MinuteLimitReached | MINUTE_LIMIT_REACHED | 429 | Minute limit reached. Wait for the number of seconds indicated by the x-minute-reset header or change your subscription plan.
PeriodLimitReached | PERIOD_LIMIT_REACHED | 429 | Period limit reached. Wait for the number of seconds indicated by the x-period-reset header or change your subscription plan.
ModelsLimitReached | MODELS_LIMIT_REACHED | 429 | You have reached the maximum number of custom models. To create a new one, delete one of your custom models first or change your subscription plan.
InternalError | INTERNAL_ERROR | 500 | Internal error. We have been notified and will fix the problem as soon as possible. Try again later and do not hesitate to contact us if you need help.
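
As an illustration, rate-limit errors such as MinuteLimitReached can be retried after a pause. This is only a sketch: the fixed delay is a placeholder, and how to read the exact x-minute-reset value depends on how the exception exposes the response headers.

import time

from PyCrowlingo.Errors import MinuteLimitReached

def detect_with_retry(client, text, retries=3, delay=60):
    # Naive retry loop: sleep for a fixed delay whenever the per-minute quota is hit.
    for attempt in range(retries):
        try:
            return client.languages.detect(text)
        except MinuteLimitReached:
            if attempt == retries - 1:
                raise
            time.sleep(delay)  # placeholder; ideally use the x-minute-reset header value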

Upload Data

If you want to build custom models, you will have to upload your dataset. You can do it automatically from a CSV file by using the function classifier.upload_csv.

client.classifier.upload_csv(model_id, "data.csv", fieldnames=["text", "class_id"], delimiter="\t")

It will split the dataset into several parts to avoid exceeding the payload size limit. If your dataset has a more specific format, you can upload it with the functions listed in the API documentation.
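
For reference, with the fieldnames and delimiter used above, the CSV is expected to contain one text and one class id per line, separated by tabs. Here is a minimal way to produce such a file with the standard library (the example rows are made up):

import csv

rows = [
    ("How do I install MongoDB?", "install"),  # hypothetical sample data
    ("My payment was refused", "billing"),
]
with open("data.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerows(rows)  # columns match fieldnames=["text", "class_id"]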

Wait for asynchronous actions

Some Crowlingo operations may take a long time, so they are asynchronous: the API returns a response before the end of the process. For each of them, a companion function watches the progress and waits until the task is done. Here are the functions to wait for each task:

Async Function | Wait Function
client.model.train | client.model.wait_training
client.model.deploy | client.model.wait_deploying
client.search_engine.create_documents | client.search_engine.wait_indexing

For example, use these lines to train a model and wait until it is deployed:

client.model.train(model_id)
client.model.wait_training(model_id)
client.model.deploy(model_id)
client.model.wait_deploying(model_id)
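
If you train and deploy models regularly, you might wrap these four calls in a small helper; this sketch only reuses the functions shown above:

def train_and_deploy(client, model_id):
    # Train, wait for training to finish, deploy, then wait for deployment to finish.
    client.model.train(model_id)
    client.model.wait_training(model_id)
    client.model.deploy(model_id)
    client.model.wait_deploying(model_id)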

Rasa

Crowlingo services can be very useful for turning an existing chatbot into a polyglot one. The easiest way to do it is through Rasa. PyCrowlingo provides components to integrate easily with Rasa.

Installation

To install the Rasa dependencies, simply run the following command:

pip install PyCrowlingo[rasa]

Follow the Rasa quick start guide to build your chatbot.

Usage

Open the file config.yml and modify the pipeline to integrate the Crowlingo NLU components.

Here is an example configuration for a chatbot created with the Rasa quick start guide:

language: en
pipeline:
  - name: PyCrowlingo.Rasa.EntitiesExtractor
    token: "<TOKEN>"
  - name: PyCrowlingo.Rasa.IntentClassifier
    token: "<TOKEN>"
    model_id: "intent_rasa"

Train the model:

rasa train

And now, enjoy your multilingual chatbot:

rasa shell
>>> Your input -> Bonjour !
<<< Hey! How are you ?
>>> Your input -> Va bene :)
<<< Great! Carry on!
>>> Your input -> Bist du ein Roboter oder ein Mensch?
<<< I am a bot powered by Rasa   
