Computational Linguistics

compling

Computational Linguistics with Python

compling is a Python module that provides Natural Language Processing and Computational Linguistics functionality for working with human language data. It combines data and text mining features from well-known libraries (e.g. spacy, nltk, sklearn) into a pipeline for analyzing corpora of JSON documents.

Documentation

See documentation here.

Installation

You can install compling with:

$ pip install compling

compling requires:

  • Python (>= 3.6)
  • numpy
  • spacy
  • nltk
  • gensim
  • tqdm
  • unicodedata2
  • unidecode
  • configparser
  • vaderSentiment
  • wordcloud
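As a quick sanity check after installation, the following minimal sketch reports which of the listed dependencies cannot be found in the current environment. It uses only the standard library and is not part of compling; the import names in REQUIRED_MODULES are assumed to match the package names listed above, and missing_modules is a hypothetical helper.

```python
from importlib.util import find_spec

# Import names assumed to match the requirement list above.
REQUIRED_MODULES = [
    "numpy", "spacy", "nltk", "gensim", "tqdm",
    "unicodedata2", "unidecode", "configparser",
    "vaderSentiment", "wordcloud",
]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All listed modules were found.")
```

Any name it reports can then be installed with pip before running compling.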

You also need to download:

  • a spacy language model
    See the spacy documentation for the available models and choose one that matches the language of your corpus documents. By default, compling expects you to download the sm (small) models. You can download larger models instead, but remember to edit the config.ini file accordingly so that compling can work properly.

    Example
    Let's assume the language of your documents is English. You could download the small English spacy model:

    $ python -m spacy download en_core_web_sm
    
  • some nltk resources:

    • stopwords
      $ python -m nltk.downloader stopwords
      
    • punkt
      $ python -m nltk.downloader punkt
      

config.ini

compling's functions take a wide range of parameters. To make them easier to use, default values are provided for some of them:

  • some can be overridden in the function call, since many functions accept optional parameters;
  • others are stored in the config.ini file, which configures the processing of your corpora. It holds the values of a few special parameters (e.g. the language of the documents in your corpus).

You can see a preview below:

[Corpus]
;The language of documents in your corpus.
language = english

;Documents in your corpus store their text in this key.
text_key = text

;Documents in your corpus store their date values as string in this format.
;For a complete list of formatting directives, see: https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior.
date_format = %d/%m/%Y

;The size of the spacy model to use for text processing.
spacy_model_size = md

[Document_record]
;Document records metadata:

;If lower==1, a lowercase version will be stored for each document.
lower = 0

;If lemma==1, a version with tokens replaced by their lemma will be stored for each document.
lemma = 0

;If stem==1, a version with tokens replaced by their stem will be stored for each document.
stem = 0

;If negations==1, a version where negated tokens are preceded by the 'NOT_' prefix will be stored for each document.
negations = 1

;If named_entities==1, the occurring named entities will be stored in a list for each document.
named_entities = 1
; ...

ConfigManager

compling provides the ConfigManager class to make it easier to edit the config.ini file and to help you manage the processing of your corpora.
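The ConfigManager API is not described here, so the following is only a minimal sketch of how a file shaped like the config.ini preview could be read with Python's standard configparser module. It is not compling's own implementation. CONFIG_TEXT is an excerpt of the preview, and the sample date string is hypothetical.

```python
import configparser
from datetime import datetime

# Excerpt of the config.ini preview (assumed to be representative).
CONFIG_TEXT = """
[Corpus]
;The language of documents in your corpus.
language = english
text_key = text
date_format = %d/%m/%Y
spacy_model_size = md

[Document_record]
lower = 0
lemma = 0
negations = 1
named_entities = 1
"""

# interpolation=None keeps '%' literal. With the default interpolation,
# accessing a value such as '%d/%m/%Y' raises an interpolation error.
config = configparser.ConfigParser(interpolation=None)
config.read_string(CONFIG_TEXT)

language = config["Corpus"]["language"]
date_format = config["Corpus"]["date_format"]

# The 0/1 flags map onto booleans with getboolean().
store_negations = config["Document_record"].getboolean("negations")
store_lemma = config["Document_record"].getboolean("lemma")

# Parse a hypothetical document date using the configured format.
parsed_date = datetime.strptime("25/12/2020", date_format)
print(language, store_negations, store_lemma, parsed_date.date())
```

Disabling interpolation matters for this file in particular, because the date_format value contains percent signs that the default parser would try to expand.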

Example of usage

You can see a short example of usage at https://github.com/FrancescoPeriti/compling.

See the documentation for more details.
