Pandas extension with NLP functionalities
Pandas NLP
An extension for pandas that adds NLP functionality to string columns.
Setup
Requirements
- python >= 3.8
Installation
Execute:
pip install -U pandas-nlp
To install the default spaCy English model:
python -m spacy download en_core_web_md
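spaCy models are installed as ordinary Python packages, so one way to confirm that the model is available before using the accessor is to check whether it can be imported. This is a minimal sketch; `model_installed` is a hypothetical helper and not part of pandas-nlp.

```python
import importlib.util


def model_installed(name: str = "en_core_web_md") -> bool:
    """Return True if a package called `name` can be imported."""
    # find_spec returns None when no top-level package with that name exists.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if model_installed():
        print("en_core_web_md is available")
    else:
        print("en_core_web_md is missing; install it before calling the accessor")
```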
Key features
Language detection
import pandas as pd
import pandas_nlp
pandas_nlp.register()
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "text": [
        "I like cats",
        "Me gustan los gatos",
        "M'agraden els gats",
        "J'aime les chats",
        "Ich mag Katzen",
    ],
})
df.text.nlp.language()
Output
0 en
1 es
2 ca
3 fr
4 de
Name: text_language, dtype: object
With confidence scores:
df.text.nlp.language(confidence=True).apply(pd.Series)
Output
  language  confidence
0       en    0.897090
1       es    0.982045
2       ca    0.999806
3       fr    0.999713
4       de    0.997995
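The confidence score can be used to drop uncertain detections. The sketch below is not part of pandas-nlp. It assumes that each element returned by `language(confidence=True)` is a dict with `language` and `confidence` keys, which is what the expanded output above suggests. It uses a hand-built stand-in Series with the values shown above, so it runs without the spaCy model. The 0.95 threshold is an arbitrary choice for illustration.

```python
import pandas as pd

# Stand-in for df.text.nlp.language(confidence=True).
# Assumption: one dict per row with "language" and "confidence" keys.
detected = pd.Series(
    [
        {"language": "en", "confidence": 0.897090},
        {"language": "es", "confidence": 0.982045},
        {"language": "ca", "confidence": 0.999806},
        {"language": "fr", "confidence": 0.999713},
        {"language": "de", "confidence": 0.997995},
    ],
    name="text_language",
)

# Expand the dicts into columns, keeping the original index.
expanded = pd.DataFrame(detected.tolist(), index=detected.index)

# Keep only rows detected with high confidence (threshold is illustrative).
THRESHOLD = 0.95
confident = expanded[expanded["confidence"] >= THRESHOLD]
print(confident)
```

With these values, only the English row falls below the threshold and is filtered out.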
String embedding
import pandas as pd
import pandas_nlp
pandas_nlp.register()
df = pd.DataFrame(
{"id": [1, 2, 3], "text": ["cat", "dog", "violin"]}
)
df.text.nlp.embedding()
Output
0 [3.7032, 4.1982, -5.0002, -11.322, 0.031702, -...
1 [1.233, 4.2963, -7.9738, -10.121, 1.8207, 1.40...
2 [-1.4708, -0.73871, 0.49911, -2.1762, 0.56754,...
Name: text_embedding, dtype: object
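Embedding vectors are commonly compared with cosine similarity. A minimal sketch using only the standard library follows. The three-dimensional vectors are made-up toy values, not real output of `embedding()`, and `cosine_similarity` is a hypothetical helper rather than a pandas-nlp function.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors for illustration only (real embeddings have many more dimensions).
cat = [1.0, 0.9, 0.1]
dog = [0.9, 1.0, 0.2]
violin = [0.1, 0.2, 1.0]

print(cosine_similarity(cat, dog))
print(cosine_similarity(cat, violin))
```

The same function could be applied to any two rows of the `text_embedding` Series, assuming each row holds a sequence of floats.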
Closest concept
import pandas as pd
import pandas_nlp
pandas_nlp.register()
themed = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "text": [
        "My computer is broken",
        "I went to a piano concert",
        "Chocolate is my favourite",
        "Mozart played the piano",
    ],
})
themed.text.nlp.closest(["music", "informatics", "food"])
Output
0 informatics
1 music
2 food
3 music
Name: text_closest, dtype: object
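The resulting Series shares the DataFrame's index, so it can be attached as a column and summarised with regular pandas operations. The sketch below uses a hand-built stand-in for the output shown above, so it runs without the spaCy model. The column name `theme` is an arbitrary choice.

```python
import pandas as pd

themed = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "text": [
        "My computer is broken",
        "I went to a piano concert",
        "Chocolate is my favourite",
        "Mozart played the piano",
    ],
})

# Stand-in for themed.text.nlp.closest(["music", "informatics", "food"]).
closest = pd.Series(
    ["informatics", "music", "food", "music"],
    index=themed.index,
    name="text_closest",
)

# Attach the label to each row, then count rows per concept.
labelled = themed.assign(theme=closest)
counts = labelled["theme"].value_counts()
print(labelled)
print(counts)
```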
Sentence extraction
import pandas as pd
import pandas_nlp
pandas_nlp.register()
df = pd.DataFrame(
{"id": [0, 1], "text": ["Hello, how are you?", "Code. Sleep. Eat"]}
)
df.text.nlp.sentences()
Output
0 [Hello, how are you?]
1 [Code., Sleep., Eat]
Name: text_sentences, dtype: object
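Because each row holds a list of sentences, `Series.explode()` can turn the result into one row per sentence while keeping the original row index. The sketch below uses a hand-built stand-in for the output shown above and assumes the list elements are plain strings.

```python
import pandas as pd

# Stand-in for df.text.nlp.sentences().
# Assumption: each element is a list of strings.
sentences = pd.Series(
    [["Hello, how are you?"], ["Code.", "Sleep.", "Eat"]],
    name="text_sentences",
)

# One row per sentence; the index repeats for rows that had several sentences.
exploded = sentences.explode()
print(exploded)

# Number of sentences found in each original row.
sentence_counts = sentences.apply(len)
print(sentence_counts)
```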