expert.ai Natural Language API for Python
Python client for the expert.ai Natural Language API. Leverage Natural Language understanding from your Python apps.
Installation (development)
You can use pip to install the library:
$ pip install expertai-nlapi
Installation (contributor)
Clone the repository and run the following commands:
$ cd nlapi-python
$ pip install -r requirements-dev.txt
As a good practice, it's recommended to work in an isolated Python environment, creating a virtual environment with the virtualenv package before installing the library. You can create and activate your environment with the commands:
$ virtualenv expertai
$ source expertai/bin/activate
Usage
Before making requests to the API, you need to create an instance of the ExpertAiClient. You have to set your API credentials as environment variables:
export EAI_USERNAME=YOUR_USER
export EAI_PASSWORD=YOUR_PASSWORD
or define them as part of your code:
import os
os.environ["EAI_USERNAME"] = 'your@account.email'
os.environ["EAI_PASSWORD"] = 'yourpwd'
Currently, the API supports five languages: English, French, Spanish, Italian and German. You have to define the text you want to process and the language model to use for the analysis.
from expertai.nlapi.cloud.client import ExpertAiClient
client = ExpertAiClient()
text = 'Facebook is looking at buying an American startup for $6 million based in Springfield, IL .'
language = 'en'
Quick run
Let's start with the first API call, just sending the text.
document = client.specific_resource_analysis(
    body={"document": {"text": text}},
    params={'language': language, 'resource': 'disambiguation'})
A disambiguation analysis returns all the information that the Natural Language engine comprehended from the text. Let's look at the API response in detail.
Tokenization & Lemmatization
Lemmatization looks beyond word reduction and considers a language's full vocabulary to apply a morphological analysis to words. The lemma of 'was' is 'be' and the lemma of 'mice' is 'mouse'. Further, the lemma of 'meeting' might be 'meet' or 'meeting' depending on its use in a sentence.
print (f'{"TOKEN":{20}} {"LEMMA":{8}}')
for token in document.tokens:
print (f'{text[token.start:token.end]:{20}} {token.lemma:{8}}')
TOKEN                LEMMA
Facebook             Facebook Inc.
is                   is
looking at           look at
buying               buy
an                   an
American             American
startup              startup
for                  for
$6 million           6,000,000 dollar
based                base
in                   in
Springfield, IL      Springfield
.                    .
Part of Speech
Let's also look at the part-of-speech information assigned to each token; PoS values are from the Universal Dependencies framework.
print (f'{"TOKEN":{18}} {"PoS":{4}}')
for token in document.tokens:
print (f'{text[token.start:token.end]:{18}} {token.pos.key:{4}} ' )
TOKEN              PoS
Facebook           PROPN
is                 AUX
looking at         VERB
buying             VERB
an                 DET
American           ADJ
startup            NOUN
for                ADP
$6 million         NOUN
based              VERB
in                 ADP
Springfield, IL    PROPN
.                  PUNCT
Dependency Parsing information
The analysis returns the dependency parsing information assigned to each token, using the Universal Dependencies framework as well.
print (f'{"TOKEN":{18}} {"Dependency label":{8}}')
for token in document.tokens:
print (f'{text[token.start:token.end]:{18}} {token.dependency.label:{4}} ' )
TOKEN              Dependency label
Facebook           nsubj
is                 aux
looking at         root
buying             advcl
an                 det
American           amod
startup            obj
for                case
$6 million         obl
based              acl
in                 case
Springfield, IL    obl
.                  punct
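Each dependency label links a token to its head, so you can also print the head-dependent pairs to make the tree easier to read. This is just a sketch: it assumes, per the API reference, that each token's dependency object also exposes the index of its head token as head (check this against your client version):

# Sketch: print each token together with its head token.
# token.dependency.head is assumed to be the index of the head
# token within document.tokens.
for token in document.tokens:
    head = document.tokens[token.dependency.head]
    print(f'{text[token.start:token.end]:{18}} --{token.dependency.label}--> {text[head.start:head.end]}')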
Named Entities
Going a step beyond tokens, named entities add another layer of context. Named entities are obtained with the entities analysis.
document = client.specific_resource_analysis(
    body={"document": {"text": text}},
    params={'language': language, 'resource': 'entities'})
print (f'{"ENTITY":{40}} {"TYPE":{10}})
for entity in document.entities:
print (f'{entity.lemma:{40}} {entity.type_{10}}')
ENTITY                                   TYPE
6,000,000 dollar                         MON
Springfield                              GEO
Facebook Inc.                            COM
In addition to the entity type, the API provides some metadata from Linked Open Data sources such as Wikidata and GeoNames. For example, you can get the open data connected with the entity Springfield, IL:
print(document.entities[1].lemma)
Springfield
for entry in document.knowledge:
    if entry.syncon == document.entities[1].syncon:
        for prop in entry.properties:
            print(f'{prop.type_:{12}} {prop.value:{30}}')
Coordinate   Lat:39.47.58N/39.799446;Long:89.39.18W/-89.654999
DBpediaId    dbpedia.org/page/Springfield
GeoNamesId   4250542
WikiDataId   Q28515
Springfield has been recognized as Q28515 on Wikidata, which is the Q-id for Springfield, IL (i.e., not for the Springfield in Vermont or the one in California).
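As a quick check, you can turn those properties into browsable links. Here's a minimal sketch using the standard Wikidata URL scheme; the WikiDataId property name is the one shown in the output above:

# Sketch: build a Wikidata URL from the WikiDataId property value.
for entry in document.knowledge:
    if entry.syncon == document.entities[1].syncon:
        for prop in entry.properties:
            if prop.type_ == 'WikiDataId':
                print(f'https://www.wikidata.org/wiki/{prop.value}')
# https://www.wikidata.org/wiki/Q28515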
Key Elements
Key elements are obtained with the relevants analysis. They are identified from the document as main sentences, main keywords, main lemmas and relevant topics; let's focus on the main lemmas of the document. Each lemma is provided with a relevance score.
document = client.specific_resource_analysis(
    body={"document": {"text": text}},
    params={'language': language, 'resource': 'relevants'})
print (f'{"LEMMA":{20}} {"SCORE":{5}} ')
for mainlemma in document.main_lemmas:
print (f'{mainlemma.value:{20}} {mainlemma.score:{5}}')
LEMMA                SCORE
Facebook Inc.         43.5
startup               40.4
Springfield             15
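The relevants analysis also returns the other key elements. As a sketch (assuming, per the API reference, that the response exposes them as document.main_sentences with the same value/score pattern as main_lemmas), you can list the main sentences the same way:

# Sketch: list the main sentences with their relevance scores.
# document.main_sentences is assumed to mirror main_lemmas.
for sentence in document.main_sentences:
    print(f'{sentence.score:{6}} {sentence.value}')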
Classification
Let's see how to classify documents according to the IPTC Media Topics taxonomy. We're going to use a text that has more textual information, and then we'll use matplotlib to show a bar chart with the categorization results.
text = """Strategic acquisitions have been important to the growth of Facebook (FB).
Mark Zuckerberg founded the company in 2004, and since then it has acquired scores of companies,
ranging from tiny two-person start-ups to well-established businesses such as WhatsApp. For 2019,
Facebook reported 2.5 billion monthly active users (MAU) and $70.69 billion in revenue."""
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')
taxonomy='iptc'
document = client.classification(
    body={"document": {"text": text}},
    params={'taxonomy': taxonomy, 'language': language})
categories = []
scores = []
print (f'{"CATEGORY":{27}} {"IPTC ID":{10}} {"FREQUENCY":{8}}')
for category in document.categories:
categories.append(category.label)
scores.append(category.frequency)
print (f'{category.label:{27}} {category.id_:{10}}{category.frequency:{8}}')
CATEGORY                    IPTC ID    FREQUENCY
Earnings                    20000178      29.63
Social networking           20000769      21.95
plt.bar(categories, scores, color='#17a2b8')
plt.xlabel("Categories")
plt.ylabel("Frequency")
plt.title("Media Topics Classification")
plt.show()
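If you're running outside a notebook (where %matplotlib inline doesn't apply), you can also save the chart to a file; plt.savefig is the standard matplotlib call for this, and the filename below is just an example:

# Save the chart to a file; call this before plt.show().
plt.savefig('media_topics_classification.png', dpi=150, bbox_inches='tight')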
Good job! You're an expert in the expert.ai community! :clap: :tada:
Check out other language SDKs available on our GitHub page.
Capabilities
These are all the analysis and classification capabilities of the API:
- Document Analysis
- Partial analyses
- Document Classification