Pandas extension with NLP functionalities

Pandas NLP

Pandas NLP is a pandas extension that adds NLP functionality for string columns: language detection, string embeddings, closest-concept matching and sentence extraction.

Setup

Requirements

  • Python >= 3.8

Installation

Execute:

pip install -U pandas-nlp

To install the default spaCy English model:

python -m spacy download en_core_web_md

Key features

Language detection

import pandas as pd
import pandas_nlp

pandas_nlp.register()

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "text": [
        "I like cats",
        "Me gustan los gatos",
        "M'agraden els gats",
        "J'aime les chats",
        "Ich mag Katzen",
    ],
})
df.text.nlp.language()

Output

0    en
1    es
2    ca
3    fr
4    de
Name: text_language, dtype: object

To also get the detection confidence, pass confidence=True:

df.text.nlp.language(confidence=True).apply(pd.Series)

Output

  language  confidence
0       en    0.897090
1       es    0.982045
2       ca    0.999806
3       fr    0.999713
4       de    0.997995
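
A common follow-up is to keep only the rows whose detection confidence clears some threshold. The sketch below does this with plain pandas. It does not call pandas_nlp: it rebuilds the result shown above by hand, assuming each element is a dict-like with `language` and `confidence` keys (which is what the `.apply(pd.Series)` output suggests). The 0.95 threshold is an arbitrary value chosen for illustration.

```python
import pandas as pd

# Hand-built stand-in for df.text.nlp.language(confidence=True).
# Assumption: each element is a dict with "language" and "confidence" keys.
detected = pd.Series(
    [
        {"language": "en", "confidence": 0.897090},
        {"language": "es", "confidence": 0.982045},
        {"language": "ca", "confidence": 0.999806},
        {"language": "fr", "confidence": 0.999713},
        {"language": "de", "confidence": 0.997995},
    ],
    name="text_language",
)

# Expand the dicts into two columns, keeping the original row index.
expanded = pd.DataFrame(detected.tolist(), index=detected.index)

# Hypothetical cut-off; tune it for your own data.
CONFIDENCE_THRESHOLD = 0.95
reliable = expanded[expanded["confidence"] >= CONFIDENCE_THRESHOLD]

print(reliable)
```

Building the frame with `pd.DataFrame(detected.tolist(), ...)` gives the same columns as `.apply(pd.Series)` and tends to be faster on long Series.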

String embedding

import pandas as pd
import pandas_nlp

pandas_nlp.register()

df = pd.DataFrame(
    {"id": [1, 2, 3], "text": ["cat", "dog", "violin"]}
)
df.text.nlp.embedding()

Output

0    [3.7032, 4.1982, -5.0002, -11.322, 0.031702, -...
1    [1.233, 4.2963, -7.9738, -10.121, 1.8207, 1.40...
2    [-1.4708, -0.73871, 0.49911, -2.1762, 0.56754,...
Name: text_embedding, dtype: object
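
For numerical work it is often handy to stack the per-row embeddings into one 2-D array. The sketch below assumes each element of the result is a list (or array) of floats of equal length. The 3-dimensional vectors are made-up toy values, not real model output.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df.text.nlp.embedding(); values are invented for illustration.
embeddings = pd.Series(
    [
        [0.1, 0.2, 0.3],
        [0.2, 0.1, 0.4],
        [0.9, -0.5, 0.0],
    ],
    name="text_embedding",
)

# One row per text, one column per embedding dimension.
matrix = np.array(embeddings.tolist(), dtype=float)

print(matrix.shape)
```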

Closest concept

import pandas as pd
import pandas_nlp

pandas_nlp.register()

themed = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "text": [
        "My computer is broken",
        "I went to a piano concert",
        "Chocolate is my favourite",
        "Mozart played the piano"
    ]
})

themed.text.nlp.closest(["music", "informatics", "food"])

Output

0    informatics
1          music
2           food
3          music
Name: text_closest, dtype: object
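
This README does not say how closest() ranks the candidate concepts. One common way to do this kind of matching is cosine similarity between embedding vectors, sketched below with NumPy. The function names `cosine_similarity` and `closest_concept` and the 3-dimensional toy vectors are hypothetical and are not part of pandas_nlp.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def closest_concept(text_vec, concept_vecs):
    """Return the concept name whose vector is most similar to text_vec."""
    return max(
        concept_vecs,
        key=lambda name: cosine_similarity(text_vec, concept_vecs[name]),
    )


# Invented toy vectors; real embeddings would come from a language model.
concept_vecs = {
    "music": np.array([1.0, 0.1, 0.0]),
    "informatics": np.array([0.0, 1.0, 0.1]),
    "food": np.array([0.1, 0.0, 1.0]),
}
text_vec = np.array([0.9, 0.2, 0.1])

print(closest_concept(text_vec, concept_vecs))
```

Cosine similarity ignores vector length, so a long text and a short text pointing in the same direction score the same against a concept.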

Sentence extraction

import pandas as pd
import pandas_nlp

pandas_nlp.register()

df = pd.DataFrame(
    {"id": [0, 1], "text": ["Hello, how are you?", "Code. Sleep. Eat"]}
)
df.text.nlp.sentences()

Output

0    [Hello, how are you?]
1     [Code., Sleep., Eat]
Name: text_sentences, dtype: object
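
To get one sentence per row, the list-valued result can be flattened with pandas' own `Series.explode()`. The sketch below rebuilds the output shown above by hand, assuming each element is a list of strings, and does not call pandas_nlp.

```python
import pandas as pd

# Hand-built stand-in for df.text.nlp.sentences().
sentences = pd.Series(
    [["Hello, how are you?"], ["Code.", "Sleep.", "Eat"]],
    name="text_sentences",
)

# explode() emits one row per list item and repeats the original index,
# so each sentence can still be traced back to its source row.
per_sentence = sentences.explode()

print(per_sentence)
```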
