Computational Linguistics
compling
Computational Linguistics with Python
compling is a Python module that provides Natural Language Processing and Computational Linguistics functionality for working with human language data. It combines data and text mining features from well-known libraries (e.g. spacy, nltk, sklearn) into a pipeline for analyzing corpora of JSON documents.
Documentation
See documentation here.
Installation
You can install compling with:
$ pip install compling
compling requires:
- Python (>= 3.6)
- numpy
- spacy
- nltk
- gensim
- tqdm
- unicodedata2
- unidecode
- configparser
- vaderSentiment
- wordcloud
You also need to download:
- a spacy language model
  See here for the available models. Choose one based on the language of your corpus documents. By default, compling expects the sm models. You can still download larger models, but remember to edit the config.ini file so that it works properly.
  Example
  Let's assume the language of your documents is English. You could download the small English spacy model:
  $ python -m spacy download en_core_web_sm
- some nltk functionalities:
  - stopwords
    $ python -m nltk.downloader stopwords
  - punkt
    $ python -m nltk.downloader punkt
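Because spacy models are installed as regular Python packages, one quick way to confirm that the download step worked is to look the package up by name. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of compling. It does not check the nltk data (stopwords, punkt), which are downloaded resources rather than packages.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package with this name can be found."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # "en_core_web_sm" is the small English model from the example above.
    for name in ("spacy", "nltk", "en_core_web_sm"):
        status = "found" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```

If a name is reported as missing, rerun the corresponding install or download command.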
config.ini
The functionalities offered by compling may require a wide variety of parameters. To make them easier to use, default values are provided for some parameters:
- some can be changed when calling a function, since many functions accept optional parameters;
- others are stored in the config.ini file. This file configures the processing of your corpora and holds the values of some special parameters (e.g. the language of the documents in your corpus).
You can see a preview below:
[Corpus]
;The language of documents in your corpus.
language = english
;Documents in your corpus store their text in this key.
text_key = text
;Documents in your corpus store their date values as string in this format.
;For a complete list of formatting directives, see: https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior.
date_format = %d/%m/%Y
;The size of the spacy model to use for text processing.
spacy_model_size = md
[Document_record]
;Document records metadata:
;If lower==1, a lowercase version will be stored for each document.
lower = 0
;If lemma==1, a version with tokens replaced by their lemma will be stored for each document.
lemma = 0
;If stem==1, a version with tokens replaced by their stem will be stored for each document.
stem = 0
;If negations==1, a version where negated tokens are preceded by the 'NOT_' prefix will be stored for each document.
negations = 1
;If named_entities==1, the occurring named entities will be stored in a list for each document.
named_entities = 1
; ...
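To see how these values can be consumed, here is a minimal sketch that reads the preview with the standard library `configparser` module. It is not compling's own loading code; the variable names and the sample date string are assumptions made for illustration. Note that `interpolation=None` is needed because the `date_format` value contains `%` characters.

```python
import configparser
from datetime import datetime

# The preview above, without comments (keys and values unchanged).
CONFIG_PREVIEW = """
[Corpus]
language = english
text_key = text
date_format = %d/%m/%Y
spacy_model_size = md

[Document_record]
lower = 0
lemma = 0
stem = 0
negations = 1
named_entities = 1
"""

# interpolation=None keeps '%d/%m/%Y' literal; with the default interpolation,
# accessing that value would raise an error because of the bare '%' signs.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(CONFIG_PREVIEW)

corpus = parser["Corpus"]
record = parser["Document_record"]

language = corpus["language"]
date_format = corpus["date_format"]
store_negations = record.getboolean("negations")  # "1" is read as True
store_lemma = record.getboolean("lemma")          # "0" is read as False

# Parse a date string written in the configured format (sample value).
parsed = datetime.strptime("25/12/2020", date_format)
```

The 0/1 flags map naturally onto booleans through `getboolean`, and the configured `date_format` can be passed directly to `datetime.strptime`.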
ConfigManager
compling provides the ConfigManager class to make it easier to edit the config.ini file and to help you handle the processing of your corpora.
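The methods of ConfigManager are not listed here, so the sketch below does not use them. As a stand-in, it shows the kind of edit involved (switching `spacy_model_size` to `sm`) using only the standard library `configparser`. The helper `set_option` and the temporary file are hypothetical and only serve the illustration.

```python
import configparser
import os
import tempfile


def set_option(path: str, section: str, option: str, value: str) -> None:
    """Rewrite one option of an INI file in place."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)
    parser.set(section, option, value)
    with open(path, "w") as handle:
        parser.write(handle)


# Demonstrate on a throwaway file rather than on the real config.ini.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.ini")
    with open(path, "w") as handle:
        handle.write("[Corpus]\nlanguage = english\nspacy_model_size = md\n")

    set_option(path, "Corpus", "spacy_model_size", "sm")

    # Read the file back to confirm the change.
    check = configparser.ConfigParser(interpolation=None)
    check.read(path)
    new_size = check["Corpus"]["spacy_model_size"]
```

One caveat of this stand-in: `configparser` does not preserve comments when it writes a file back, so the explanatory `;` lines would be lost.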
Example of usage
You can see a short example of usage at https://github.com/FrancescoPeriti/compling.
See the documentation for more details.
File details
Details for the file compling-0.0.38.tar.gz.
File metadata
- Download URL: compling-0.0.38.tar.gz
- Upload date:
- Size: 5.9 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/50.3.2 requests-toolbelt/0.9.1 tqdm/4.50.2 CPython/3.7.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8900eb0fbd69e45d1317d50e9acef65d43e968a7c5d9f41650040fb81ebc0150
MD5 | 3a3ba8175cd079c4373ba149ddbbdc71
BLAKE2b-256 | 63ff750edd434f00d1fb52fc759f1c101cd45b30333261e4171060b661ffed8b
File details
Details for the file compling-0.0.38-py3-none-any.whl.
File metadata
- Download URL: compling-0.0.38-py3-none-any.whl
- Upload date:
- Size: 6.3 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/50.3.2 requests-toolbelt/0.9.1 tqdm/4.50.2 CPython/3.7.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | f16d1f14fd063c8d8bec306d429076aca1c66f0797451dbc31ef2e9c2c36c64d
MD5 | 50ddfb4a6542b61478771d59b01fe19f
BLAKE2b-256 | ef3269aaeeae26275ea0829e9d2fcb84e17c622935fc863388de98fe3be1bfe7