compling

Computational Linguistics with Python

compling is a Python module that provides Natural Language Processing and Computational Linguistics functionality for working with human language data. It combines data and text mining features from well-known libraries (e.g. spacy, nltk, sklearn) into a pipeline for analyzing corpora of JSON documents.

Documentation

See documentation here.

Installation

You can install compling with:

$ pip install compling

compling requires:

  • Python (>= 3.6)
  • numpy
  • spacy
  • nltk
  • gensim
  • tqdm
  • unicodedata2
  • unidecode
  • configparser
  • vaderSentiment
  • wordcloud
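
To check which of the requirements listed above are already importable in your environment, a minimal sketch using only the standard library could look like this. The helper name `missing_modules` is hypothetical and not part of compling; the module names are assumed to match the package names in the list.

```python
from importlib.util import find_spec

# Import names assumed to match the requirement list above.
REQUIRED_MODULES = [
    "numpy", "spacy", "nltk", "gensim", "tqdm",
    "unicodedata2", "unidecode", "configparser",
    "vaderSentiment", "wordcloud",
]


def missing_modules(names):
    """Return the names that cannot be located by the import system."""
    return [name for name in names if find_spec(name) is None]


# Prints the requirements that still need to be installed (may be empty).
print(missing_modules(REQUIRED_MODULES))
```

The check only locates top-level modules; it does not verify versions or the extra downloads described below.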

You also need to download:

  • a spacy language model
    See the spacy documentation for the available models. Choose one based on the language of the documents in your corpus. By default, compling expects you to download the sm models. You can download larger models instead, but remember to edit the config.ini file accordingly so that compling works properly.

    Example
    Let's assume the language of your documents is English. You could download the small English spacy model:

    $ python -m spacy download en_core_web_sm
    
  • some nltk resources:

    • stopwords
      $ python -m nltk.downloader stopwords
      
    • punkt
      $ python -m nltk.downloader punkt
      

config.ini

The functions offered by compling can take a large number of parameters. To make them easier to use, default values are provided for some parameters:

  • some can be overridden in the function call, since many functions accept optional parameters;
  • others are stored in the config.ini file. This file configures the processing of your corpora and holds the values of some special parameters (e.g. the language of the documents in your corpus).

You can see a preview below:

[Corpus]
;The language of documents in your corpus.
language = english

;Documents in your corpus store their text in this key.
text_key = text

;Documents in your corpus store their date values as string in this format.
;For a complete list of formatting directives, see: https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior.
date_format = %d/%m/%Y

;The size of the spacy model to use for text processing.
spacy_model_size = md

[Document_record]
;Document records metadata:

;If lower==1, a lowercase version will be stored for each document.
lower = 0

;If lemma==1, a version with tokens replaced by their lemma will be stored for each document.
lemma = 0

;If stem==1, a version with tokens replaced by their stem will be stored for each document.
stem = 0

;If negations==1, a version where negated tokens are preceded by the 'NOT_' prefix will be stored for each document.
negations = 1

;If named_entities==1, the occurring named entities will be stored in a list for each document.
named_entities = 1
; ...
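
To illustrate how these values relate to a corpus document, here is a minimal sketch that parses an abridged copy of the preview with Python's standard `configparser` module. It does not use compling itself, and the sample document (including its `date` key) is hypothetical. Note the `interpolation=None` argument: without it, `configparser` would try to interpret the `%` characters in `date_format`.

```python
import configparser
from datetime import datetime

# Abridged copy of the preview above (comments omitted).
CONFIG_TEXT = """
[Corpus]
language = english
text_key = text
date_format = %d/%m/%Y
spacy_model_size = md

[Document_record]
lower = 0
lemma = 0
stem = 0
negations = 1
named_entities = 1
"""

# interpolation=None keeps the '%' directives in date_format literal.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(CONFIG_TEXT)

corpus = parser["Corpus"]
text_key = corpus["text_key"]
date_format = corpus["date_format"]

# Hypothetical JSON document; the "date" key is an assumption for this sketch.
document = {"text": "This is not a bad example.", "date": "25/12/2020"}

text = document[text_key]
date = datetime.strptime(document["date"], date_format)

# Names of the [Document_record] flags that are switched on (value 1).
record = parser["Document_record"]
enabled = [name for name in record if record.getboolean(name)]

print(text, date.date(), enabled)
```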

ConfigManager

compling provides the ConfigManager class to make it easier to edit the config.ini file and to help you manage the processing of your corpora.
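
The ConfigManager API is not reproduced here; see the documentation for it. For comparison only, the sketch below shows how a single option in an INI file can be changed with Python's standard `configparser` module. The helper `set_option` is hypothetical and not part of compling, and the example works on a temporary file rather than on the packaged config.ini.

```python
import configparser
import tempfile
from pathlib import Path


def set_option(path, section, option, value):
    """Set one option in an INI file and write the file back.

    Note: configparser does not preserve comments when rewriting a file.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path, encoding="utf-8")
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, option, str(value))
    with open(path, "w", encoding="utf-8") as handle:
        parser.write(handle)


with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "config.ini"
    config_path.write_text(
        "[Corpus]\nlanguage = english\nspacy_model_size = sm\n",
        encoding="utf-8",
    )

    # Switch from the small to the medium spacy model size.
    set_option(config_path, "Corpus", "spacy_model_size", "md")

    check = configparser.ConfigParser(interpolation=None)
    check.read(config_path, encoding="utf-8")
    new_size = check["Corpus"]["spacy_model_size"]
    language = check["Corpus"]["language"]

print(new_size)
```

Because rewriting with `configparser` drops the explanatory comments, a dedicated helper such as ConfigManager is likely the more convenient route for the real file.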

Example of usage

You can see a short example of usage at https://github.com/FrancescoPeriti/compling.

See the documentation for more details.

Download files

Source Distribution

compling-0.0.38.tar.gz (5.9 MB)

Built Distribution

compling-0.0.38-py3-none-any.whl (6.3 MB, Python 3)
