Python client for expert.ai Natural Language API
expert.ai Natural Language API for Python
The Python client for the expert.ai Natural Language APIs adds natural language understanding capabilities to your Python apps. The client can use either the cloud-based Natural Language API or a local instance of Edge NL API.
Installation (development)
You can use pip to install the library:
$ pip install expertai-nlapi
Installation (contributor)
Clone the repository and run the following commands:
$ cd nlapi-python
$ pip install -r requirements-dev.txt
As good practice, work in an isolated Python environment: create a virtual environment with the virtualenv package before building the package. You can create and activate your environment with these commands:
$ virtualenv expertai
$ source expertai/bin/activate
Usage
The Python client code expects expert.ai developer account credentials to be available as environment variables:
- Linux:
export EAI_USERNAME=YOUR_USER
export EAI_PASSWORD=YOUR_PASSWORD
- Windows:
SET EAI_USERNAME=YOUR_USER
SET EAI_PASSWORD=YOUR_PASSWORD
You can also define them inside your code:
import os
os.environ["EAI_USERNAME"] = 'your@account.email'
os.environ["EAI_PASSWORD"] = 'yourpwd'
If you don't have an account, sign up on the developer portal.
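Because the client reads its credentials from the environment, it can help to fail early when a variable is missing. The following is a minimal sketch, not part of the library: the helper name `missing_credentials` is hypothetical, and only the variable names `EAI_USERNAME` and `EAI_PASSWORD` come from the instructions above.

```python
import os

# Variable names taken from the instructions above.
REQUIRED_VARS = ("EAI_USERNAME", "EAI_PASSWORD")


def missing_credentials(env=None):
    """Return the required variable names that are unset or empty.

    `env` defaults to os.environ; pass a plain dict to try the helper out.
    This is a hypothetical convenience, not part of expertai-nlapi.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_credentials()
if missing:
    print("Missing credentials:", ", ".join(missing))
else:
    print("Credentials found; the client can be instantiated.")
```

Running the check before creating the client gives a clearer message than an authentication failure later on.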
Next, instantiate the client:
- To use Natural Language API:
from expertai.nlapi.cloud.client import ExpertAiClient
client = ExpertAiClient()
- To use the local Edge NL API:
from expertai.nlapi.edge.client import ExpertAiClient
client = ExpertAiClient()
Then, set the text and the language:
text = 'Facebook is looking at buying an American startup for $6 million based in Springfield, IL .'
language = 'en'
If you use Edge NL API, the language must match that of your service. If you are doing document analysis, check the availability of specific capabilities for the language here.
If you are using the Cloud NL API out-of-the-box classification, check the availability of the taxonomy for the language here.
Sample analysis
To perform the deep linguistic analysis of the text:
- Natural Language API:
document = client.specific_resource_analysis(
    body={"document": {"text": text}},
    params={'language': language, 'resource': 'disambiguation'})
- Edge NL API:
document = client.deep_linguistic_analysis(text)
Tokenization & Lemmatization
Lemmatization looks beyond word reduction, and considers a language's full vocabulary to apply a morphological analysis to words. The lemma of 'was' is 'be' and the lemma of 'mice' is 'mouse'. Further, the lemma of 'meeting' might be 'meet' or 'meeting' depending on its use in a sentence.
print(f'{"TOKEN":{20}} {"LEMMA":{8}}')
for token in document.tokens:
    print(f'{text[token.start:token.end]:{20}} {token.lemma:{8}}')
TOKEN                LEMMA
Facebook             Facebook Inc.
is                   is
looking at           look at
buying               buy
an                   an
American             American
startup              startup
for                  for
$6 million           6,000,000 dollar
based                base
in                   in
Springfield, IL      Springfield
.                    .
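The loop above relies on each token carrying character offsets into the original text. The sketch below illustrates that slicing without calling the API: `Token` and `make_token` are hypothetical stand-ins that model only the `start`, `end` and `lemma` attributes used in this README, and the three sample rows are copied from the output above.

```python
from collections import namedtuple

# Stand-in for the token objects returned by the analysis; only the
# attributes used in this README (start, end, lemma) are modelled.
Token = namedtuple("Token", ["start", "end", "lemma"])

text = ('Facebook is looking at buying an American startup for $6 million '
        'based in Springfield, IL .')


def make_token(surface, lemma):
    """Build a stand-in token whose offsets point at `surface` in `text`."""
    start = text.index(surface)
    return Token(start, start + len(surface), lemma)


tokens = [
    make_token("Facebook", "Facebook Inc."),
    make_token("looking at", "look at"),
    make_token("$6 million", "6,000,000 dollar"),
]

# Slice the original text with each token's offsets, as the loop above does.
rows = [(text[t.start:t.end], t.lemma) for t in tokens]
for surface, lemma in rows:
    print(f"{surface:20} {lemma}")
```

Note that a single token can span several words ("looking at") and that its lemma can differ substantially from the surface text ("$6 million").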
Part of Speech
Analysis determines the part-of-speech of tokens. PoS labels are from the Universal Dependencies framework.
print(f'{"TOKEN":{18}} {"PoS":{4}}')
for token in document.tokens:
    print(f'{text[token.start:token.end]:{18}} {token.pos:{4}}')
TOKEN              PoS
Facebook           PROPN
is                 AUX
looking at         VERB
buying             VERB
an                 DET
American           ADJ
startup            NOUN
for                ADP
$6 million         NOUN
based              VERB
in                 ADP
Springfield, IL    PROPN
.                  PUNCT
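Once the tags are available, they are easy to aggregate. As a rough sketch, the pairs below are copied from the sample output above rather than fetched from the API; with a real response you would presumably read `token.pos` while iterating over `document.tokens`. The choice of which tags count as content words is an assumption made for illustration.

```python
from collections import Counter

# (token text, PoS tag) pairs copied from the sample output above.
tagged = [
    ("Facebook", "PROPN"), ("is", "AUX"), ("looking at", "VERB"),
    ("buying", "VERB"), ("an", "DET"), ("American", "ADJ"),
    ("startup", "NOUN"), ("for", "ADP"), ("$6 million", "NOUN"),
    ("based", "VERB"), ("in", "ADP"), ("Springfield, IL", "PROPN"),
    (".", "PUNCT"),
]

# How often each tag occurs in the sentence.
pos_counts = Counter(tag for _, tag in tagged)

# Keep only tokens whose tag suggests a content word (illustrative choice).
CONTENT_TAGS = {"NOUN", "PROPN", "VERB", "ADJ"}
content_words = [tok for tok, tag in tagged if tag in CONTENT_TAGS]

print(pos_counts.most_common(3))
print(content_words)
```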
Dependency parsing information
The analysis returns the dependency parsing information assigned to each token, using the Universal Dependencies framework as well.
print(f'{"TOKEN":{18}} {"Dependency label":{8}}')
for token in document.tokens:
    print(f'{text[token.start:token.end]:{18}} {token.dependency.label:{4}}')
TOKEN              Dependency label
Facebook           nsubj
is                 aux
looking at         root
buying             advcl
an                 det
American           amod
startup            obj
for                case
$6 million         obl
based              acl
in                 case
Springfield, IL    obl
.                  punct
Named Entities
Going a step beyond linguistic analysis, named entities add another layer of context. Named entities are recognized by the entities analysis.
- Natural Language API:
document = client.specific_resource_analysis(
body={"document": {"text": text}},
params={'language': language, 'resource': 'entities'})
- Edge NL API:
# Edge API
document = client.named_entity_recognition(text)
Printing results:
print(f'{"ENTITY":{40}} {"TYPE":{10}}')
for entity in document.entities:
    print(f'{entity.lemma:{40}} {entity.type_:{10}}')
ENTITY                                   TYPE
6,000,000 dollar                         MON
Springfield                              GEO
Facebook Inc.                            COM
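The type codes are compact, so a small lookup table can make printed results easier to read. This is only a sketch: the entities are stand-in objects built from the sample output above, and the descriptions in `TYPE_NAMES` are this example's reading of the three codes, to be checked against the official entity type list.

```python
from types import SimpleNamespace

# Stand-in entities with the two attributes used in this README.
entities = [
    SimpleNamespace(lemma="6,000,000 dollar", type_="MON"),
    SimpleNamespace(lemma="Springfield", type_="GEO"),
    SimpleNamespace(lemma="Facebook Inc.", type_="COM"),
]

# Assumed human-readable names for the codes seen above (not exhaustive).
TYPE_NAMES = {"MON": "money", "GEO": "geographic area", "COM": "company"}


def describe(entity):
    """Format an entity, falling back to the raw code for unknown types."""
    return f"{entity.lemma} ({TYPE_NAMES.get(entity.type_, entity.type_)})"


descriptions = [describe(entity) for entity in entities]
for line in descriptions:
    print(line)
```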
In addition to the entity type, the API provides some metadata from Linked Open Data sources such as WikiData and GeoNames.
For example, you can get the open data connected with the entity Springfield, IL:
print(document.entities[1].lemma)
Springfield
for entry in document.knowledge:
    if entry.syncon == document.entities[1].syncon:
        for prop in entry.properties:
            print(f'{prop.type_:{12}} {prop.value:{30}}')
Coordinate   Lat:39.47.58N/39.799446;Long:89.39.18W/-89.654999
DBpediaId    dbpedia.org/page/Springfield
GeoNamesId   4250542
WikiDataId   Q28515
Springfield has been recognized as Q28515 on Wikidata, which is the Q-id for Springfield, IL (i.e. not for Springfield in Vermont or in California).
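When several entities need their open data, scanning `document.knowledge` once and indexing it by syncon avoids repeating the nested loop. The following is a sketch under stated assumptions: the objects are stand-ins exposing only the attributes used above (`syncon`, `properties`, `type_`, `value`), the syncon ID `1001` is made up, and `properties_by_syncon` is a hypothetical helper.

```python
from types import SimpleNamespace


def properties_by_syncon(knowledge):
    """Map each syncon ID to a {property type: value} dictionary."""
    index = {}
    for entry in knowledge:
        props = index.setdefault(entry.syncon, {})
        for prop in entry.properties:
            props[prop.type_] = prop.value
    return index


# Stand-in data; 1001 is a made-up syncon ID, the values come from the output above.
knowledge = [
    SimpleNamespace(syncon=1001, properties=[
        SimpleNamespace(type_="GeoNamesId", value="4250542"),
        SimpleNamespace(type_="WikiDataId", value="Q28515"),
    ]),
]
entity = SimpleNamespace(lemma="Springfield", syncon=1001)

index = properties_by_syncon(knowledge)
wikidata_id = index.get(entity.syncon, {}).get("WikiDataId")
print(entity.lemma, wikidata_id)
```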
Key Elements
Key elements are obtained with the relevants analysis. They are identified from the document as main sentences, main concepts (called "syncons"), main lemmas and relevant topics. Let's focus on the main lemmas of the document; each lemma comes with a relevance score.
- Natural Language API:
document = client.specific_resource_analysis(
body={"document": {"text": text}},
params={'language': language, 'resource': 'relevants'})
- Edge NL API:
document = client.keyphrase_extraction(text)
print(f'{"LEMMA":{20}} {"SCORE":{5}}')
for mainlemma in document.main_lemmas:
    print(f'{mainlemma.value:{20}} {mainlemma.score:{5}}')
LEMMA                SCORE
Facebook Inc.         43.5
startup               40.4
Springfield             15
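Relevance scores make it straightforward to keep only the most important lemmas. A minimal sketch, assuming stand-in objects with the `value` and `score` attributes used above; `top_lemmas` and its `min_score` and `limit` parameters are hypothetical and not part of the client.

```python
from types import SimpleNamespace

# Stand-in main lemmas copied from the sample output above.
main_lemmas = [
    SimpleNamespace(value="Facebook Inc.", score=43.5),
    SimpleNamespace(value="startup", score=40.4),
    SimpleNamespace(value="Springfield", score=15),
]


def top_lemmas(lemmas, min_score=0.0, limit=None):
    """Return lemmas scoring at least `min_score`, highest first."""
    kept = [item for item in lemmas if item.score >= min_score]
    kept.sort(key=lambda item: item.score, reverse=True)
    return kept[:limit]  # limit=None keeps every remaining lemma


for item in top_lemmas(main_lemmas, min_score=20):
    print(item.value, item.score)
```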
Sentiment
Sentiment is obtained with the sentiment analysis, which determines how positive or negative the tone of the text is.
text='Today is a good day. I love to go to mountain.'
- Natural Language API:
document = client.specific_resource_analysis(
body={"document": {"text": text}},
params={'language': language, 'resource': 'sentiment'})
- Edge NL API:
document = client.sentiment(text)
Printing results:
print("sentiment:", document.sentiment.overall)
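If you need a label rather than a number, the overall score printed above can be bucketed by sign. This is an illustrative sketch only: `sentiment_label` and its `neutral_band` parameter are hypothetical, and treating scores close to zero as neutral is an assumption of this example, not a rule stated by the API.

```python
def sentiment_label(overall, neutral_band=0.0):
    """Turn an overall sentiment score into a coarse label.

    Scores within +/- neutral_band of zero are reported as "neutral".
    """
    if overall > neutral_band:
        return "positive"
    if overall < -neutral_band:
        return "negative"
    return "neutral"


for score in (12.5, -3, 0):
    print(score, sentiment_label(score))
```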
Relations
Relations are obtained with the relations analysis, which labels concepts expressed in the text with their semantic role.
text='John sent a letter to Mary.'
- Natural Language API:
# cloud API
document = client.specific_resource_analysis(
body={"document": {"text": text}},
params={'language': language, 'resource': 'relations'})
- Edge NL API:
# Edge API
document = client.relations(text)
Printing results:
for rel in document.relations:
    print("Verb:", rel.verb.lemma)
    for r in rel.related:
        print("Relation:", r.relation, "Lemma:", r.lemma)
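For further processing, the nested structure walked by the loop above can be flattened into (verb, role, lemma) triples. The sketch below uses stand-in objects with the same attribute names (`verb.lemma`, `related`, `relation`, `lemma`); the role strings "subject", "object" and "recipient" are placeholders chosen for this example and are not the labels the API returns.

```python
from types import SimpleNamespace


def relation_triples(relations):
    """Flatten relations into (verb lemma, role, related lemma) tuples."""
    return [
        (rel.verb.lemma, item.relation, item.lemma)
        for rel in relations
        for item in rel.related
    ]


# Stand-in data for 'John sent a letter to Mary.' with placeholder role labels.
relations = [
    SimpleNamespace(
        verb=SimpleNamespace(lemma="send"),
        related=[
            SimpleNamespace(relation="subject", lemma="John"),
            SimpleNamespace(relation="object", lemma="letter"),
            SimpleNamespace(relation="recipient", lemma="Mary"),
        ],
    ),
]

triples = relation_triples(relations)
for triple in triples:
    print(triple)
```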
Classification
Let's see how to classify documents according to the IPTC Media Topics Taxonomy provided by the Natural Language API. We're going to use a longer text that carries more information, and then use Matplotlib to show a bar chart of the categorization results.
text = """Strategic acquisitions have been important to the growth of Facebook (FB).
Mark Zuckerberg founded the company in 2004, and since then it has acquired scores of companies,
ranging from tiny two-person start-ups to well-established businesses such as WhatsApp. For 2019,
Facebook reported 2.5 billion monthly active users (MAU) and $70.69 billion in revenue."""
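import matplotlib.pyplot as plt
# Jupyter notebook magic; remove the next line when running a plain Python script
%matplotlib inline
plt.style.use('ggplot')
taxonomy = 'iptc'
document = client.classification(body={"document": {"text": text}}, params={'taxonomy': taxonomy, 'language': language})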
categories = []
scores = []
print(f'{"CATEGORY":{27}} {"IPTC ID":{10}} {"FREQUENCY":{8}}')
for category in document.categories:
    categories.append(category.label)
    scores.append(category.frequency)
    print(f'{category.label:{27}} {category.id_:{10}}{category.frequency:{8}}')
CATEGORY                    IPTC ID    FREQUENCY
Earnings                    20000178     29.63
Social networking           20000769     21.95
plt.bar(categories, scores, color='#17a2b8')
plt.xlabel("Categories")
plt.ylabel("Frequency")
plt.title("Media Topics Classification")
plt.show()
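Outside a notebook, or where Matplotlib is not installed, a plain-text chart can convey the same comparison. The following sketch is an alternative to the plot above, not a replacement for it: `text_bar_chart` is a hypothetical helper, and the labels and frequencies are copied from the sample output.

```python
def text_bar_chart(labels, values, width=40):
    """Return one text line per category, scaled so the largest value fills `width`."""
    top = max(values, default=0)
    lines = []
    for label, value in zip(labels, values):
        bar_length = round(width * value / top) if top > 0 else 0
        lines.append(f"{label:27} {'#' * bar_length} {value}")
    return lines


# Values copied from the classification output above.
categories = ["Earnings", "Social networking"]
scores = [29.63, 21.95]

chart_lines = text_bar_chart(categories, scores)
for line in chart_lines:
    print(line)
```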
Standard Edge NL API packages don't provide document classification, but you can create your own document classification service using expert.ai Studio.
To request classification from the Edge NL API, use:
document = client.classification(text)
The structure of the results is the same as for the Natural Language API.
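Throughout this README each analysis is called one way on the cloud client and another way on the Edge client. A thin wrapper can hide that difference. This is a sketch, not part of the library: `analyze` and the two fake clients are hypothetical, while the resource names and Edge method names in `EDGE_METHODS` are the ones used in the examples above. The fake clients only echo their arguments so the example runs without network access.

```python
# Cloud resource name -> Edge client method name, both taken from this README.
EDGE_METHODS = {
    "disambiguation": "deep_linguistic_analysis",
    "entities": "named_entity_recognition",
    "relevants": "keyphrase_extraction",
    "sentiment": "sentiment",
    "relations": "relations",
}


def analyze(client, text, resource, language="en", edge=False):
    """Run one analysis on either kind of client (hypothetical helper)."""
    if resource not in EDGE_METHODS:
        raise ValueError(f"unknown resource: {resource!r}")
    if edge:
        # Edge clients expose one method per analysis and take the text directly.
        return getattr(client, EDGE_METHODS[resource])(text)
    return client.specific_resource_analysis(
        body={"document": {"text": text}},
        params={"language": language, "resource": resource},
    )


class FakeCloudClient:
    """Stand-in that echoes its arguments instead of calling the API."""

    def specific_resource_analysis(self, body, params):
        return ("cloud", params["resource"], body["document"]["text"])


class FakeEdgeClient:
    """Stand-in Edge client implementing only the sentiment method."""

    def sentiment(self, text):
        return ("edge", "sentiment", text)


cloud_result = analyze(FakeCloudClient(), "Today is a good day.", "sentiment")
edge_result = analyze(FakeEdgeClient(), "Today is a good day.", "sentiment", edge=True)
print(cloud_result)
print(edge_result)
```

With a real client you would pass the `ExpertAiClient` instance created earlier in place of the fakes.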
Good job! You're an expert in the expert.ai community! :clap: :tada:
Check out other language SDKs available on our GitHub page.
Capabilities
These are all the analysis and classification capabilities of the Natural Language APIs.
- Natural Language API
  - Document analysis
  - Document classification
- Edge NL API
  - Document analysis
  - Document classification
  - Information extraction