A Markov chainer library

carkov

This is a library for creating and walking simple Markov chains. It is meant for things like text generators (such as ebooks bots and word generators) and thus is not 'mathematically correct'. It has some tools for text analysis, and more are planned (stubs exist to illustrate some plans; see TODO.md).

Command line interface

This library includes a command line interface for analyzing text and then walking the resulting chain to generate text from the analysis.

To analyze a corpus of text files:

carkov analyze mychain.chain textfile1.txt textfile2.txt ... textfileN.txt

To walk a chain and generate text from it:

carkov chain mychain.chain -c 10

Two analysis modes are currently supported, english and word, selected by passing the -m argument to the analyze command. english mode analyzes the input word by word: the input is segmented into (English-style) sentences, each of which is analyzed as a separate chain of words. word mode segments the input into tokens, each of which is analyzed separately as a series of characters.
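The difference between the two modes can be sketched in a few lines of plain Python. This is only an illustration of the segmentation idea, not carkov's actual code: the function names segment_english and segment_word are hypothetical, and the sentence-splitting rule (split after ., ! or ?) is an assumption.

```python
import re


def segment_english(text):
    """Split text into sentences, then split each sentence into words."""
    # Assumed rule: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [sentence.split() for sentence in sentences if sentence]


def segment_word(text):
    """Split text into whitespace-separated tokens, then each token into characters."""
    return [list(token) for token in text.split()]


# Each inner list is one sequence that would be analyzed as its own chain.
english_sequences = segment_english("The cat sat. The dog ran!")
word_sequences = segment_word("cat dog")

print(english_sequences)
print(word_sequences)
```

In the first case the chained items are words; in the second they are single characters, which is what makes word mode suitable for generating new words.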

Analysis also accepts a window size, so that each state in the chain is a fixed-length series of items (for example, the word foo with a window of 2 would analyze to (_, _) -> f, (_, f) -> o, (f, o) -> o, etc.). The wider the window, the more closely the output resembles (or exactly reproduces) the input, since fewer options follow any given state. The window is set with the -w argument to the analyze command.
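The windowed analysis of foo can be reproduced with a short sketch. This is not carkov's implementation; the analyze function, the START placeholder, and the nested-dictionary layout are all assumptions made for illustration.

```python
from collections import defaultdict

START = "_"  # hypothetical placeholder for positions before the first item


def analyze(sequence, window=2):
    """Count which item follows each window-sized tuple of preceding items."""
    chain = defaultdict(lambda: defaultdict(int))
    state = (START,) * window
    for item in sequence:
        chain[state][item] += 1
        # Slide the window: drop the oldest item, append the newest.
        state = state[1:] + (item,)
    return chain


chain = analyze("foo", window=2)
for state, followers in chain.items():
    print(state, "->", dict(followers))
```

With a window of 2, every state here has exactly one follower, which shows why wide windows on a small corpus tend to reproduce the input.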

About Library

The library itself exposes objects and interfaces that do the same work as the command line above. Generating documentation and examples is still a todo item on this project, but looking at the contents of main.py should be instructive. The library is written to be largely agnostic about the items that are chained, so hypothetically any sequence of things could work. Some framework would have to be written to display non-textual data, but it should be possible if that were desired.
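To illustrate the point about non-textual items, here is a minimal sketch of a chain built over (pitch, duration) tuples instead of words. It does not use carkov's API; build_chain and walk are hypothetical helpers, and using None as the start and end marker is an assumption of this sketch.

```python
import random
from collections import defaultdict


def build_chain(sequences):
    """First-order chain: map each item to the list of items observed after it."""
    chain = defaultdict(list)
    for sequence in sequences:
        previous = None  # None marks the start of a sequence
        for item in sequence:
            chain[previous].append(item)
            previous = item
        chain[previous].append(None)  # None also marks the end of a sequence
    return chain


def walk(chain, rng, max_steps=50):
    """Follow randomly chosen transitions from the start until an end marker."""
    output = []
    state = None
    for _ in range(max_steps):
        following = rng.choice(chain[state])
        if following is None:
            break
        output.append(following)
        state = following
    return output


# The chained items are tuples, not text; any hashable item would work.
melodies = [
    [("C", 1), ("E", 1), ("G", 2)],
    [("C", 1), ("G", 2)],
]
chain = build_chain(melodies)
result = walk(chain, random.Random(0))
print(result)
```

Only the final print depends on the items being displayable, which is the part that would need extra framework code for other kinds of data.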

The library also provides a few mechanisms for serializing a ready-to-use chain for reuse in other projects. The command line uses the binary serialization mechanism (based on msgpack) to save chains from the analysis step for reuse in the chain step. There is also a mechanism that produces a Python source file that can be embedded in a target project, so that a Python project can use the chain without having to include an extra data file. Note that this is, of course, extremely inefficient for large chains.
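The idea of embedding a chain as Python source can be sketched with the standard library alone. This is an assumption-laden illustration rather than carkov's own serializer: chain_to_source and the CHAIN variable name are hypothetical, and the chain is assumed to consist only of plain dicts, tuples, strings and integers.

```python
import pprint


def chain_to_source(chain, name="CHAIN"):
    """Render a chain made of plain literals as a Python assignment statement."""
    return f"{name} = {pprint.pformat(chain)}\n"


chain = {
    ("_", "_"): {"f": 1},
    ("_", "f"): {"o": 1},
    ("f", "o"): {"o": 1},
}
source = chain_to_source(chain)
print(source)

# A target project would save this text as a .py file and import it.
# Executing it in a fresh namespace simulates that import here.
namespace = {}
exec(source, namespace)
restored = namespace["CHAIN"]
```

Because the whole chain is written out as a literal, the generated file grows with every state and transition, which is consistent with the inefficiency warning above.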

Download files

Source Distribution

carkov-0.1.2.tar.gz (12.7 kB)

Uploaded Source

Built Distribution

carkov-0.1.2-py3-none-any.whl (15.1 kB)

Uploaded Python 3

File details

Details for the file carkov-0.1.2.tar.gz.

File metadata

  • Download URL: carkov-0.1.2.tar.gz
  • Upload date:
  • Size: 12.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.23.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.6

File hashes

Hashes for carkov-0.1.2.tar.gz
Algorithm Hash digest
SHA256 abf70e1377f995d703c908419016801854882f934fe3d578635d97fb4ca69524
MD5 6d333ada4cc0b6cba5f565b04baabc7e
BLAKE2b-256 fc8b01976eee32f5045530ba7464f8d60f985ff30d2983055aacee3f66da6a7f

File details

Details for the file carkov-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: carkov-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 15.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.23.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.6

File hashes

Hashes for carkov-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 7bef3778e5b68350ecb250f5eebc8657c293d7df5f1f1cf7e1f6f9e65be37fa4
MD5 8f023ad7a78b9ac30dd3ffba63dbdb07
BLAKE2b-256 d33c9291d7a0f1dea26c6a800cd615ffac456e2aff29b235d6c14e5109dec142
