
expert.ai Natural Language API for Python

Python client for the expert.ai Natural Language API. Leverage Natural Language understanding from your Python apps.

Installation (development)

You can use pip to install the library:

$ pip install expertai-nlapi

Installation (contributor)

Clone the repository and run the following commands:

$ cd nlapi-python
$ pip install -r requirements-dev.txt

As a good practice, it's recommended to work in an isolated Python environment, creating a virtual environment with the virtualenv package before building the package. You can create and activate your environment with the commands:

$ virtualenv expertai
$ source expertai/bin/activate

Usage

Before making requests to the API, you need to create an instance of the ExpertAiClient. You have to set your API credentials as environment variables:

export EAI_USERNAME=YOUR_USER
export EAI_PASSWORD=YOUR_PASSWORD

or define them as part of your code:

import os
os.environ["EAI_USERNAME"] = 'your@account.email'
os.environ["EAI_PASSWORD"] = 'yourpwd'

Currently, the API supports five languages: English, French, Spanish, Italian and German. You have to define the text you want to process and the language model to use for the analysis:

from expertai.nlapi.cloud.client import ExpertAiClient
client = ExpertAiClient()
text = 'Facebook is looking at buying an American startup for $6 million based in Springfield, IL.'
language = 'en'

Quick run

Let's start with a first API call, just sending the text.

document = client.specific_resource_analysis(
    body={"document": {"text": text}},
    params={'language': language, 'resource': 'disambiguation'})

A disambiguation analysis returns all the information that the Natural Language engine comprehended from the text. Let's look at the details of the API response.
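
For a quick overview first, here's a minimal sketch that counts the layers returned by the analysis; it assumes the document model exposes sentences and phrases collections alongside the tokens used below:

# Overview of the layers returned by the disambiguation analysis
# (sentences and phrases are assumed from the data model).
print(f'Sentences: {len(document.sentences)}')
print(f'Phrases:   {len(document.phrases)}')
print(f'Tokens:    {len(document.tokens)}')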

Tokenization & Lemmatization

Lemmatization looks beyond word reduction and considers a language's full vocabulary to apply a morphological analysis to words. The lemma of 'was' is 'be' and the lemma of 'mice' is 'mouse'. Further, the lemma of 'meeting' might be 'meet' or 'meeting' depending on its use in a sentence.

print (f'{"TOKEN":{20}} {"LEMMA":{8}}')

for token in document.tokens:
    print (f'{text[token.start:token.end]:{20}} {token.lemma:{8}}')
TOKEN                LEMMA   
Facebook             Facebook Inc.
is                   is      
looking at           look at 
buying               buy     
an                   an      
American             American
startup              startup 
for                  for     
$6 million           6,000,000 dollar
based                base    
in                   in      
Springfield, IL      Springfield
.                    .       
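
As a small variation on the loop above, this sketch prints only the tokens whose lemma differs from their surface form, reusing the same token attributes:

# Show only tokens whose lemma differs from their surface form.
for token in document.tokens:
    surface = text[token.start:token.end]
    if surface.lower() != token.lemma.lower():
        print(f'{surface:{20}} -> {token.lemma}')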

Part of Speech

Let's also look at the part-of-speech information assigned to each token; PoS values are from the Universal Dependencies framework.

print (f'{"TOKEN":{18}} {"PoS":{4}}')

for token in document.tokens:
    print (f'{text[token.start:token.end]:{18}} {token.pos.key:{4}}  ' )
TOKEN              PoS   
Facebook           PROPN  
is                 AUX    
looking at         VERB   
buying             VERB   
an                 DET    
American           ADJ    
startup            NOUN   
for                ADP    
$6 million         NOUN   
based              VERB   
in                 ADP    
Springfield, IL    PROPN  
.                  PUNCT   
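
For an aggregate view, the PoS tags can be tallied with the standard library's Counter; this is just an illustrative sketch reusing token.pos.key from above:

from collections import Counter

# Count how often each PoS tag occurs in the document.
pos_counts = Counter(token.pos.key for token in document.tokens)
print(pos_counts.most_common())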

Dependency Parsing information

The analysis returns the dependency parsing information assigned to each token, using the Universal Dependencies framework as well.

print (f'{"TOKEN":{18}} {"Dependency label":{8}}')

for token in document.tokens:
    print (f'{text[token.start:token.end]:{18}} {token.dependency.label:{4}} ' )
TOKEN              Dependency label
Facebook           nsubj 
is                 aux  
looking at         root 
buying             advcl 
an                 det  
American           amod 
startup            obj  
for                case 
$6 million         obl  
based              acl  
in                 case 
Springfield, IL    obl  
.                  punct 
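
You can also walk the parse tree. The sketch below assumes token.dependency exposes an id for the token itself and a head index pointing at the governing token within document.tokens; the guard covers both a self-pointing and a negative head convention for the root:

# Print each token together with its syntactic head.
for token in document.tokens:
    dep = token.dependency
    if dep.head == dep.id or dep.head < 0:   # root of the parse tree
        head_text = 'ROOT'
    else:
        head = document.tokens[dep.head]
        head_text = text[head.start:head.end]
    print(f'{text[token.start:token.end]:{18}} --{dep.label}--> {head_text}')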

Named Entities

Going a step beyond tokens, named entities add another layer of context. Named entities are obtained with the entities analysis.

document = client.specific_resource_analysis(
    body={"document": {"text": text}}, 
    params={'language': language, 'resource': 'entities'})


print (f'{"ENTITY":{40}} {"TYPE":{10}})

for entity in document.entities:
    print (f'{entity.lemma:{40}} {entity.type_{10}}')
ENTITY               TYPE
6,000,000 dollar     MON        
Springfield          GEO        
Facebook Inc.        COM

In addition to the entity type, the API provides some metadata from Linked Open Data sources such as Wikidata and GeoNames. For example, you can get the open data connected with the entity Springfield, IL:

print(document.entities[1].lemma)
for entry in document.knowledge:
    if entry.syncon == document.entities[1].syncon:
        for prop in entry.properties:
            print(f'{prop.type_:{12}} {prop.value:{30}}')
Springfield
Coordinate   Lat:39.47.58N/39.799446;Long:89.39.18W/-89.654999
DBpediaId    dbpedia.org/page/Springfield
GeoNamesId   4250542
WikiDataId   Q28515

Springfield has been recognized as Q28515 on Wikidata, which is the Q-id for Springfield, IL (i.e. not for the Springfield in Vermont or in California).
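
Those identifiers make it easy to link out to the open data sources; for example, this sketch builds the Wikidata URL from the WikiDataId property shown above:

# Build a Wikidata URL from the entity's WikiDataId property.
for entry in document.knowledge:
    if entry.syncon == document.entities[1].syncon:
        for prop in entry.properties:
            if prop.type_ == 'WikiDataId':
                print(f'https://www.wikidata.org/wiki/{prop.value}')
https://www.wikidata.org/wiki/Q28515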

Key Elements

Key elements are obtained with the relevants analysis; they are identified from the document as main sentences, main keywords, main lemmas and relevant topics. Let's focus on the main lemmas of the document; each lemma is provided with a relevance score.

document = client.specific_resource_analysis(
    body={"document": {"text": text}}, 
    params={'language': language, 'resource': 'relevants'})


print (f'{"LEMMA":{20}} {"SCORE":{5}} ')

for mainlemma in document.main_lemmas:
    print (f'{mainlemma.value:{20}} {mainlemma.score:{5}}')
LEMMA                SCORE 
Facebook Inc.         43.5
startup               40.4
Springfield             15
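
The relevants analysis also returns the main sentences of the document; assuming they expose value and score like the main lemmas do, they can be listed the same way:

# List the main sentences with their relevance scores.
for sentence in document.main_sentences:
    print(f'{sentence.score:{5}} {sentence.value}')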

Classification

Let's see how to classify documents according to the IPTC Media Topics taxonomy. We're going to use a text that has more textual information, and then use matplotlib to show a bar chart with the categorization results:

text = """Strategic acquisitions have been important to the growth of Facebook (FB). 
Mark Zuckerberg founded the company in 2004, and since then it has acquired scores of companies, 
ranging from tiny two-person start-ups to well-established businesses such as WhatsApp. For 2019, 
Facebook reported 2.5 billion monthly active users (MAU) and $70.69 billion in revenue."""
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')

taxonomy='iptc'

document = client.classification(
    body={"document": {"text": text}},
    params={'taxonomy': taxonomy, 'language': language})

categories = []
scores = []

print (f'{"CATEGORY":{27}} {"IPTC ID":{10}} {"FREQUENCY":{8}}')
for category in document.categories:
    categories.append(category.label)
    scores.append(category.frequency)
    print (f'{category.label:{27}} {category.id_:{10}}{category.frequency:{8}}')
CATEGORY                    IPTC ID    FREQUENCY
Earnings                    20000178     29.63
Social networking           20000769     21.95
plt.bar(categories, scores, color='#17a2b8')
plt.xlabel("Categories")
plt.ylabel("Frequency")
plt.title("Media Topics Classification")

plt.show()

[Bar chart: Media Topics Classification]

Good job! You're an expert in the expert.ai community! :clap: :tada:

Check out the other language SDKs available on our GitHub page.

Capabilities

These are all the analysis and classification capabilities of the API.

Document Analysis

Document Classification
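
As a recap, here is a minimal sketch of the two client entry points used throughout this README, one per capability:

from expertai.nlapi.cloud.client import ExpertAiClient

client = ExpertAiClient()
text = "Strategic acquisitions have been important to the growth of Facebook."

# Document Analysis: specific_resource_analysis with a resource name
# ('disambiguation', 'entities' or 'relevants').
analysis = client.specific_resource_analysis(
    body={"document": {"text": text}},
    params={'language': 'en', 'resource': 'entities'})

# Document Classification: classification with a taxonomy name ('iptc').
classification = client.classification(
    body={"document": {"text": text}},
    params={'taxonomy': 'iptc', 'language': 'en'})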
