expert.ai Natural Language API for Python
Python client for the expert.ai Natural Language API. Leverage Natural Language understanding from your Python apps.
Installation (development)
You can use pip to install the library:
$ pip install expertai-nlapi
Installation (contributor)
Clone the repository and run the following script:
$ cd nlapi-python
$ pip install -r requirements-dev.txt
As a good practice, it's recommended to work in an isolated Python environment, creating a virtual environment with the virtualenv package before building the package. You can create an isolated environment with these commands:
$ virtualenv expertai
$ source expertai/bin/activate
Usage
Before making requests to the API, you need to create an instance of the ExpertAiClient. Set your API credentials as environment variables:
export EAI_USERNAME=YOUR_USER
export EAI_PASSWORD=YOUR_PASSWORD
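If you prefer to set the credentials from Python (for example in a notebook), a minimal sketch is to export the same variables with os.environ before the client is created; the values below are placeholders for your own credentials:

import os

# Placeholder credentials: the client reads EAI_USERNAME and EAI_PASSWORD
# from the environment, so set them before instantiating ExpertAiClient.
os.environ["EAI_USERNAME"] = "YOUR_USER"
os.environ["EAI_PASSWORD"] = "YOUR_PASSWORD"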
Currently the API supports five languages: English, French, Spanish, Italian and German. You have to define the text you want to process and the language to use for the analysis.
from expertai.client import ExpertAiClient
client = ExpertAiClient()
text = 'Facebook is looking at buying an American startup for $6 million based in Springfield, IL .'
language= 'en'
Quick run
Let's start with the first API call, just sending the text, to see what the engine does. This is how it looks:
document = client.specific_resource_analysis(
    body={"document": {"text": text}},
    params={'language': language, 'resource': 'disambiguation'})
We request a disambiguation analysis that returns all the information that the Natural Language engine comprehended from the text. Let's look at the details.
Tokenization & Lemmatization
Lemmatization looks beyond word reduction, and considers a language's full vocabulary to apply a morphological analysis to words. The lemma of 'was' is 'be' and the lemma of 'mice' is 'mouse'. Further, the lemma of 'meeting' might be 'meet' or 'meeting' depending on its use in a sentence.
print (f'{"TOKEN":{20}} {"LEMMA":{8}}')
for token in document.tokens:
print (f'{text[token.start:token.end]:{20}} {token.lemma:{8}}')
TOKEN LEMMA
Facebook Facebook Inc.
is is
looking at look at
buying buy
an an
American American
startup startup
for for
$6 million 6,000,000 dollar
based base
in in
Springfield, IL Springfield
. .
Part of Speech
Let's also look at the part-of-speech information assigned to each token:
print (f'{"TOKEN":{18}} {"Type":{4}}')
for token in document.tokens:
print (f'{text[token.start:token.end]:{18}} {token.type_:{4}} ' )
TOKEN Type
Facebook NPR.COM
is AUX
looking at VER
buying VER
an ART
American ADJ
startup NOU
for PRE
$6 million NOU.MON
based VER
in PRE
Springfield, IL NPR.GEO
. PNT
Dependency Parsing information
We also look at the dependency parsing information assigned to each token:
print (f'{"TOKEN":{18}} {"Dependency label":{8}}')
for token in document.tokens:
print (f'{text[token.start:token.end]:{18}} {token.dependency.label:{4}} ' )
TOKEN Dependency label
Facebook nsubj
is aux
looking at root
buying advcl
an det
American amod
startup obj
for case
$6 million obl
based acl
in case
Springfield, IL obl
. punct
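Each dependency arc also points to a head token. As a sketch, assuming the dependency object exposes a head token index in addition to the label (as in the API's dependency data model), you can print the head of every arc:

# Sketch: print each token together with the token its dependency arc points to.
# Assumption: token.dependency exposes a 'head' index into document.tokens.
for token in document.tokens:
    head = document.tokens[token.dependency.head]
    print(f'{text[token.start:token.end]:{18}} {token.dependency.label:{8}} '
          f'-> {text[head.start:head.end]}')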
Named Entities
Going a step beyond tokens, named entities add another layer of context. Named entities are accessible through the entities object.
document = client.specific_resource_analysis(
    body={"document": {"text": text}},
    params={'language': language, 'resource': 'entities'})
print (f'{"ENTITY":{20}} {"TYPE":{10}} {"TYPE_EXPLAINED":{10}}')
for entity in document.entities:
print (f'{entity.lemma:{20}} {entity.type_.key:{10}} {entity.type_.description:{10}}')
ENTITY TYPE TYPE_EXPLAINED
6,000,000 dollar MON Money
Springfield GEO Administrative geographic areas
Facebook Inc. COM Businesses / companies
Then you can get the open data connected with the entity Springfield, IL:
print(document.entities[1].lemma)
Springfield
for entry in document.knowledge:
    if entry.syncon == document.entities[1].syncon:
        for prop in entry.properties:
            print(f'{prop.type_:{12}} {prop.value:{30}}')
Coordinate Lat:39.47.58N/39.799446;Long:89.39.18W/-89.654999
DBpediaId dbpedia.org/page/Springfield
GeoNamesId 4250542
WikiDataId Q28515
Springfield has been recognized as Q28515 on Wikidata, which is the Q-id for Springfield, IL (i.e. not for the Springfield in Vermont or in California).
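Since the knowledge properties expose the Wikidata identifier, you can, for instance, build a link to the corresponding Wikidata page. A small sketch based on the properties printed above:

# Sketch: build a Wikidata URL from the WikiDataId property shown above.
for entry in document.knowledge:
    if entry.syncon == document.entities[1].syncon:
        for prop in entry.properties:
            if prop.type_ == 'WikiDataId':
                print(f'https://www.wikidata.org/wiki/{prop.value}')  # Q28515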
Key Elements
Key elements are identified from the document as main sentences, main keywords, main lemmas and relevant topics; let's focus on the main lemmas of the document.
document = client.specific_resource_analysis(
    body={"document": {"text": text}},
    params={'language': language, 'resource': 'relevants'})
print (f'{"LEMMA":{20}} {"SCORE":{5}} ')
for mainlemma in document.main_lemmas:
print (f'{mainlemma.value:{20}} {mainlemma.score:{5}}')
LEMMA SCORE
Facebook Inc. 43.5
startup 40.4
Springfield 15
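The same 'relevants' response also carries the other key elements. As a sketch, assuming the main sentences expose value and score attributes like the main lemmas do, you can print them as well:

# Sketch: main sentences from the same 'relevants' analysis.
# Assumption: each item exposes 'value' and 'score' like main lemmas.
print(f'{"SENTENCE":{60}} {"SCORE":{5}}')
for sentence in document.main_sentences:
    print(f'{sentence.value:{60}} {sentence.score:{5}}')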
Classification
Let's see how to classify documents according to the IPTC Media Topics taxonomy; we're going to use a text with more content, and then we'll use matplotlib to show the classification result.
text = """Strategic acquisitions have been important to the growth of Facebook (FB).
Mark Zuckerberg founded the company in 2004, and since then it has acquired scores of companies,
ranging from tiny two-person start-ups to well-established businesses such as WhatsApp. For 2019,
Facebook reported 2.5 billion monthly active users (MAU) and $70.69 billion in revenue."""
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')
document = client.iptc_media_topics_classification(body={"document": {"text": text}}, params={'language': language})
categories = []
scores = []

print(f'{"CATEGORY":{27}} {"IPTC ID":{10}} {"FREQUENCY":{8}}')
for category in document.categories:
    categories.append(category.label)
    scores.append(category.frequency)
    print(f'{category.label:{27}} {category.id_:{10}}{category.frequency:{8}}')
CATEGORY IPTC ID FREQUENCY
Earnings 20000178 29.63
Social networking 20000769 21.95
plt.bar(categories, scores, color='#17a2b8')
plt.xlabel("Categories")
plt.ylabel("Frequency")
plt.title("Media Topics Classification")
plt.show()
Good job! You're now an expert of the expert.ai community!
Check out the other language SDKs available on our GitHub page.
Available endpoints
These are all the endpoints of the API. For more information about each endpoint, check out the API documentation.
Document Analysis
Document Classification
Hashes for expertai_nlapi-1.2.1-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2096daa02851c4af3ccbb3cc76108074625adf1ba6a2d4a71096e2cf4f14af29 |
| MD5 | ce158d174282cdcbae9948023550301e |
| BLAKE2b-256 | 40799af440c0dcc9455abec5e74fd471029fc44dbce94d78b5e70983e8119edb |