A library for preprocessing.

Project description

A library for processing text data

cophi is a Python library for handling, modeling, and processing text corpora. With its high-level API, you can easily pipe a collection of text files through a processing pipeline:

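import cophi

# Read every .txt file below the directory, lowercase it and tokenize it with the given pattern
corpus, metadata = cophi.corpus(directory="british-fiction-corpus",
                                filepath_pattern="**/*.txt",
                                encoding="utf-8",
                                lowercase=True,
                                token_pattern=r"\p{L}+\p{P}?\p{L}+")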
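To make the steps of that call concrete, here is a minimal standard-library sketch of what such a pipeline does: collect the files matching a glob pattern, lowercase them, and extract tokens. It is not cophi's implementation, and the names `tokenize` and `read_corpus` are hypothetical. The `\p{L}` and `\p{P}` classes in the original pattern are not supported by Python's built-in `re` module (they need the third-party `regex` module), so the sketch approximates them: `[^\W\d_]` for letters and `[^\w\s]` for punctuation, which also admits symbols. Note that the pattern requires at least two letters per token, so one-letter words such as "I" or "a" are dropped.

```python
import re
from pathlib import Path

# Standard-library approximation of the pattern \p{L}+\p{P}?\p{L}+ (re has no \p{...}):
# [^\W\d_] matches a Unicode letter, [^\w\s] matches punctuation but also symbols.
TOKEN_PATTERN = re.compile(r"[^\W\d_]+[^\w\s]?[^\W\d_]+")


def tokenize(text, lowercase=True):
    """Return runs of letters (two or more in total), optionally joined by one punctuation mark."""
    if lowercase:
        text = text.lower()
    return TOKEN_PATTERN.findall(text)


def read_corpus(directory, filepath_pattern="**/*.txt", encoding="utf-8", lowercase=True):
    """Yield (path, tokens) for every file below `directory` matching the glob pattern."""
    for path in sorted(Path(directory).glob(filepath_pattern)):
        yield path, tokenize(path.read_text(encoding=encoding), lowercase=lowercase)
```

For example, `tokenize("Don't panic, I said.")` returns `["don't", "panic", "said"]`: the apostrophe is kept inside "don't", while the single-letter "i" does not match.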

You can also plug the DARIAH-DKPro-Wrapper into this pipeline to lemmatize the text or to keep only certain word types.

Check out the introductory Jupyter notebook.

Getting started

To install the latest stable version:

$ pip install cophi

To install the latest development version:

$ pip install --upgrade git+https://github.com/cophi-wue/cophi-toolbox.git@testing

Available complexity measures

cophi also provides plenty of complexity measures for quantifying the lexical richness of (literary) texts.

Measures that use sample size and vocabulary size:

Type-Token Ratio (TTR)
Guiraud’s R
Herdan’s C
Dugast’s k
Maas’ a²
Dugast’s U
Tuldava’s LN
Brunet’s W
Carroll’s CTTR
Summer’s S
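All measures in this first group are closed-form functions of the sample size N (number of tokens) and the vocabulary size V (number of types). The sketch below uses the commonly cited textbook definitions with natural logarithms; cophi's own implementation may use a different log base or constants (for example the exponent in Brunet's W), so treat the function `size_based_measures` as a hypothetical illustration rather than the library's API.

```python
import math


def size_based_measures(n_tokens, n_types):
    """Lexical richness measures that depend only on N (tokens) and V (types).

    Natural logarithms are an assumption; requires N >= 3 and 2 <= V < N so that
    no logarithm or denominator below becomes zero or undefined.
    """
    N, V = n_tokens, n_types
    if not (N >= 3 and 2 <= V < N):
        raise ValueError("expected N >= 3 and 2 <= V < N")
    logN, logV = math.log(N), math.log(V)
    return {
        "ttr": V / N,                                   # Type-Token Ratio
        "guiraud_r": V / math.sqrt(N),                  # Guiraud's R
        "herdan_c": logV / logN,                        # Herdan's C
        "dugast_k": logV / math.log(logN),              # Dugast's k
        "maas_a2": (logN - logV) / logN ** 2,           # Maas' a²
        "dugast_u": logN ** 2 / (logN - logV),          # Dugast's U
        "tuldava_ln": (1 - V ** 2) / (V ** 2 * logN),   # Tuldava's LN
        "brunet_w": N ** (V ** -0.172),                 # Brunet's W, assumed a = 0.172
        "carroll_cttr": V / math.sqrt(2 * N),           # Carroll's CTTR
        "summer_s": math.log(logV) / math.log(logN),    # Summer's S
    }
```

Written this way, it is easy to see that Maas' a² and Dugast's U are reciprocals of each other, so they always rank texts in opposite order.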

Measures that use part of the frequency spectrum:

Honoré’s H
Sichel’s S
Michéa’s M
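These measures rely on the frequency spectrum: V_m, the number of types that occur exactly m times in the text, in particular the hapax legomena (V_1) and dis legomena (V_2). The sketch below builds the spectrum with `collections.Counter` and applies the commonly cited definitions; the function names are hypothetical and the natural logarithm in Honoré's H is an assumption, so cophi's values may differ by a constant factor.

```python
import math
from collections import Counter


def frequency_spectrum(tokens):
    """Map each frequency m to V_m, the number of types occurring exactly m times."""
    return Counter(Counter(tokens).values())


def partial_spectrum_measures(tokens):
    N = len(tokens)
    spectrum = frequency_spectrum(tokens)
    V = sum(spectrum.values())                        # vocabulary size
    V1, V2 = spectrum.get(1, 0), spectrum.get(2, 0)   # hapax and dis legomena
    return {
        # Honoré's H; undefined when every type is a hapax (V1 == V)
        "honore_h": 100 * math.log(N) / (1 - V1 / V),
        # Sichel's S: share of types that occur exactly twice
        "sichel_s": V2 / V,
        # Michéa's M is the reciprocal of Sichel's S; undefined when V2 == 0
        "michea_m": V / V2,
    }
```

For the six tokens `a a b b c d`, the spectrum is {2: 2, 1: 2}, so Sichel's S is 0.5 and Michéa's M is 2.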

Measures that use the whole frequency spectrum:

Entropy S
Yule’s K
Simpson’s D
Herdan’s V_m
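The measures in this group sum over every frequency class m of the spectrum, each weighted by V_m and the relative frequency m/N. A minimal sketch under the commonly cited definitions follows (entropy in nats, Yule's K scaled by 10^4); the function name is hypothetical and cophi's implementation may use another log base or scaling.

```python
import math
from collections import Counter


def whole_spectrum_measures(tokens):
    """Measures over the whole frequency spectrum; requires at least two tokens."""
    N = len(tokens)
    if N < 2:
        raise ValueError("need at least two tokens")
    spectrum = Counter(Counter(tokens).values())  # m -> V_m
    V = sum(spectrum.values())
    # Shannon entropy sum(-p log p) over types, grouped by frequency class m
    entropy = sum(Vm * (m / N) * (math.log(N) - math.log(m)) for m, Vm in spectrum.items())
    # Yule's K: 10^4 * (sum m^2 V_m - N) / N^2
    yule_k = 10_000 * (sum(m * m * Vm for m, Vm in spectrum.items()) - N) / N ** 2
    # Simpson's D: probability that two tokens drawn without replacement share a type
    simpson_d = sum(Vm * (m / N) * ((m - 1) / (N - 1)) for m, Vm in spectrum.items())
    # Herdan's V_m; the radicand is >= 0 mathematically, max() absorbs rounding error
    radicand = sum(Vm * (m / N) ** 2 for m, Vm in spectrum.items()) - 1 / V
    herdan_vm = math.sqrt(max(0.0, radicand))
    return {"entropy": entropy, "yule_k": yule_k, "simpson_d": simpson_d, "herdan_vm": herdan_vm}
```

A text in which every token is a different type (e.g. `a b c`) gets Yule's K = 0 and Simpson's D = 0, since no type is ever repeated.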

Parameters of probabilistic models:

Orlov’s Z
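Unlike the closed-form measures above, Orlov's Z is a parameter of a probabilistic model of vocabulary growth. In its commonly cited form, Z is the value that satisfies the equation below, where N is the number of tokens, V the number of types, and p* the relative frequency of the most frequent type. Because Z cannot be isolated algebraically, it has to be found numerically (for example with Newton's method); how cophi solves it is not described here.

```latex
V = \frac{Z}{\log(p^{*} Z)} \cdot \frac{N}{N - Z} \cdot \log\frac{N}{Z}
```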

Release history

1.3.2 (this version): Apr 25, 2019
1.3.1: Apr 25, 2019
1.3.0: Apr 25, 2019
1.2.3: Apr 14, 2019
1.2.2: Apr 14, 2019
1.2.1: Apr 14, 2019
1.1.1: Jan 24, 2019
1.1.0: Dec 23, 2018
1.0.10: Dec 23, 2018
1.0.9: Dec 23, 2018
1.0.8: Dec 23, 2018
1.0.7: Nov 24, 2018
1.0.6: Sep 30, 2018
1.0.5: Sep 4, 2018
1.0.4: Sep 4, 2018
1.0.3: Sep 4, 2018
1.0.2: Sep 4, 2018

Download files

Source Distribution

cophi-1.3.2.tar.gz (14.5 kB)

Uploaded Apr 25, 2019 Source

Built Distribution

cophi-1.3.2-py3-none-any.whl (17.1 kB)

Uploaded Apr 25, 2019 Python 3

File details

Details for the file cophi-1.3.2.tar.gz.

File metadata

Download URL: cophi-1.3.2.tar.gz
Upload date: Apr 25, 2019
Size: 14.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.26.0 CPython/3.7.2

File hashes

Hashes for cophi-1.3.2.tar.gz
Algorithm	Hash digest
SHA256	`ffefc3997105dbd93dd8403c0bd7a452f5516d97d2119648dd135f765dce7e33`
MD5	`67c4b2a3af54300000e60b2108b298b7`
BLAKE2b-256	`20df520517d7092c8a579c8edab8c919f291d0b3a204e8116557e5977afd9b79`

File details

Details for the file cophi-1.3.2-py3-none-any.whl.

File metadata

Download URL: cophi-1.3.2-py3-none-any.whl
Upload date: Apr 25, 2019
Size: 17.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.26.0 CPython/3.7.2

File hashes

Hashes for cophi-1.3.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`bafa4a504b700fd098d6a801c3aa4c4fa8e670f38c66f23a04953f04fc252272`
MD5	`7d2fd24fa1fb0d57e0cbcd16c75a35f6`
BLAKE2b-256	`97e7fb9fd78982253a9950e5ca2618a5755a58b1b5b28380bc34381a4bf3aa46`
