A Python package for exploratory analysis of text data

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Arabica

A Python package for exploratory analysis of text data

Text data is often recorded as a time series and can vary considerably over time. Examples of time-series text data include tweets, product reviews, and newspaper headlines. Arabica provides functions that simplify the exploratory analysis of such datasets.

Arabica provides the following method:

arabica_freq: calculates unigram, bigram, and trigram frequencies over a period (year, month)

It can apply all or a selected combination of the following cleaning operations:

- Remove digits from the text
- Remove punctuation from the text
- Remove a standard list of stopwords

Arabica uses clean-text for punctuation cleaning and the NLTK stopwords corpus for stopword removal.
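To make the three cleaning operations concrete, here is a minimal sketch that uses only the Python standard library. It is not Arabica's implementation: the helper name `basic_clean`, the tiny `DEMO_STOPWORDS` set, and the choice to replace punctuation with spaces are all assumptions made for illustration. The flag names simply mirror the `numbers` and `punct` options so the sketch is easy to compare with the package.

```python
import re
import string

# Hypothetical stand-in for a stopword list; the package itself relies on
# the NLTK stopwords corpus, which is not used in this sketch.
DEMO_STOPWORDS = {"the", "was", "and", "to", "be", "for", "so", "you"}

# Translation table that maps every ASCII punctuation character to a space
_PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def basic_clean(text, numbers=False, punct=False, stopwords=None):
    """Apply the selected cleaning steps and return the cleaned string."""
    if numbers:
        text = re.sub(r"\d+", "", text)       # remove digits
    if punct:
        text = text.translate(_PUNCT_TABLE)   # remove punctuation
    tokens = text.split()                     # also collapses extra whitespace
    if stopwords:
        # compare case-insensitively, but keep the original casing of kept words
        tokens = [t for t in tokens if t.lower() not in stopwords]
    return " ".join(tokens)
```

Each step is optional, so any combination can be applied, for example `basic_clean(review, numbers=True, punct=True)` to strip digits and punctuation while keeping stopwords.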

Installation

Arabica requires Python 3, NLTK, and clean-text. To install it with pip, run:

pip install arabica

Usage

Import the library:

from arabica import arabica_freq

Choose a method:

Arabica returns a dataframe with unigram, bigram, and trigram frequencies aggregated over a period. Its parameters set the stopword language, the aggregation period, and the cleaning operations to apply:

def arabica_freq(text: str,             # Text column
                 time: str,             # Time column
                 stopwords: str,        # Language of the stopword list, e.g. 'english'
                 punct: bool = False,   # Remove all punctuation
                 max_words: int = '',   # Maximum number of unigrams, bigrams, and trigrams displayed
                 time_freq: str = '',   # Aggregation period: 'Y' (year) or 'M' (month)
                 numbers: bool = False  # Remove all digits
)
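As background for the frequencies that `arabica_freq` reports, the following sketch shows one common way to count unigrams, bigrams, and trigrams in a piece of text. It uses only the standard library, and the helpers `ngrams` and `ngram_counts` are hypothetical names chosen for this illustration; they are not part of Arabica's API, and the package may tokenize text differently.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, max_n=3):
    """Count n-grams for n = 1..max_n using simple whitespace tokenization."""
    tokens = text.lower().split()
    return {n: Counter(ngrams(tokens, n)) for n in range(1, max_n + 1)}


counts = ngram_counts("really really really much")
# counts[1] holds unigram counts, counts[2] bigrams, counts[3] trigrams
```

Whitespace splitting is the simplest possible tokenizer; punctuation should be removed beforehand if it is not meant to be counted as part of a word.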

Example

import pandas as pd
from arabica import arabica_freq

# Three short reviews, each with the date on which it was recorded
data = pd.DataFrame({
    'text': ['The ordering process was very easy & straight forward. They have great customer service and sorted any issues out very quickly.',
             'So far seems to be the wrong product for me :-/',
             'Excellent, service, thank you really, really, really much!!!'],
    'time': ['2013-08-8', '2013-09-8', '2014-10-8'],
})

# Monthly aggregation, 2 n-grams displayed, English stopwords, digits and punctuation removed
arabica_freq(text=data['text'], time=data['time'], time_freq='M',
             max_words=2, stopwords='english', numbers=True, punct=True)
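To illustrate what aggregating by period and limiting the number of displayed terms can look like, here is a rough standard-library sketch. It is an assumption-laden illustration rather than a description of how Arabica works internally: `period_key` and `top_words_by_period` are hypothetical helpers, the parameter names `time_freq` and `max_words` are borrowed from the signature only for readability, and only single words are counted.

```python
from collections import Counter, defaultdict
from datetime import datetime


def period_key(date_string, time_freq):
    """Map a 'YYYY-MM-DD' date to a yearly ('Y') or monthly ('M') label."""
    parsed = datetime.strptime(date_string, "%Y-%m-%d")
    if time_freq == "Y":
        return parsed.strftime("%Y")
    if time_freq == "M":
        return parsed.strftime("%Y-%m")
    raise ValueError("time_freq must be 'Y' or 'M'")


def top_words_by_period(texts, dates, time_freq="M", max_words=2):
    """Count words per period and keep the max_words most frequent ones."""
    counts = defaultdict(Counter)
    for text, date_string in zip(texts, dates):
        counts[period_key(date_string, time_freq)].update(text.lower().split())
    # sort by period label so the result reads chronologically
    return {period: counter.most_common(max_words)
            for period, counter in sorted(counts.items())}
```

Calling `top_words_by_period(texts, dates, time_freq='Y')` groups the same records by year instead of by month, which is the kind of switch the `time_freq` option describes.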

Tutorial

For more coding examples, see the tutorial.

License

MIT

For questions, issues, bug reports, and suggestions, please see the project links.

Release history

Version	Release date
1.8.2	Nov 23, 2024
1.8.1	Jul 27, 2024
1.8.0	Jul 27, 2024
1.7.9	Jul 26, 2024
1.7.8	Jul 26, 2024
1.7.7	Dec 15, 2023
1.7.6	Oct 23, 2023
1.7.4	Oct 3, 2023
1.7.2	Sep 10, 2023
1.7.1	Aug 20, 2023
1.7.0	Aug 16, 2023
1.6.9	Aug 2, 2023
1.6.8	Jul 5, 2023
1.6.7	Jun 29, 2023
1.6.6	Jun 29, 2023
1.6.5	Jun 28, 2023
1.6.4	Jun 24, 2023
1.6.3	Jun 22, 2023
1.6.2	Jun 17, 2023
1.6.1	Jun 17, 2023
1.6.0	Jun 15, 2023
1.5.2	May 20, 2023
1.5.0	May 18, 2023
1.4.9	Apr 29, 2023
1.4.8	Apr 29, 2023
1.4.7	Apr 21, 2023
1.4.6	Apr 17, 2023
1.4.5	Apr 17, 2023
1.4.4	Apr 17, 2023
1.4.3	Apr 16, 2023
1.4.2	Apr 16, 2023
1.4.1	Mar 21, 2023
1.4.0	Mar 20, 2023
1.3.9	Mar 19, 2023
1.3.8	Mar 14, 2023
1.3.6	Mar 10, 2023
1.3.5	Mar 4, 2023
1.2.2	Feb 17, 2023
1.2.1	Jan 20, 2023
1.2.0	Jan 20, 2023
1.1.9	Jan 3, 2023
1.1.8	Jan 2, 2023
1.1.7	Dec 26, 2022
1.1.6	Dec 24, 2022
1.1.5	Dec 22, 2022
1.1.4	Dec 20, 2022
1.1.3	Dec 19, 2022
1.1.2	Dec 19, 2022
1.1.1	Dec 16, 2022
1.0.5	Dec 10, 2022
1.0.4	Nov 28, 2022
1.0.3	Nov 12, 2022
1.0.2	Oct 20, 2022
1.0.1	Oct 18, 2022
1.0.0	Oct 17, 2022
0.0.5	Sep 11, 2022
0.0.4	Sep 9, 2022 (this version)
0.0.3	Sep 8, 2022
0.0.2	Sep 8, 2022
0.0.1	Sep 8, 2022

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

arabica-0.0.4.tar.gz (5.7 kB)

Uploaded Sep 9, 2022 Source

Built Distribution

arabica-0.0.4-py3-none-any.whl (6.5 kB)

Uploaded Sep 9, 2022 Python 3

File details

Details for the file arabica-0.0.4.tar.gz.

File metadata

Download URL: arabica-0.0.4.tar.gz
Upload date: Sep 9, 2022
Size: 5.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.8.8

File hashes

Hashes for arabica-0.0.4.tar.gz
Algorithm	Hash digest
SHA256	`a3213d29416b71e6d226d1245c499b017ad42c5e67021e4c2fe957d6ec63b0b0`
MD5	`0fb344c353cb28db4983c73673b1a998`
BLAKE2b-256	`a473b09119336a81258cc8e60e7348db7586b4c0925f8d24c859786f29956f2f`

File details

Details for the file arabica-0.0.4-py3-none-any.whl.

File metadata

Download URL: arabica-0.0.4-py3-none-any.whl
Upload date: Sep 9, 2022
Size: 6.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.8.8

File hashes

Hashes for arabica-0.0.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`78bc3812e325ddd75eb1c91288f50752817b465c34e225b985f2d8a1676ed5ac`
MD5	`eb02ef96de54776885ce4c28d12aa041`
BLAKE2b-256	`af7db49e9108021d261b79e84b53401658a668ad0fcf8a0bdc5efc5fac10aca2`
