Skip to main content

Ingest corpuses of text and output a sentence generated from markov chains

Project description

https://badge.fury.io/py/helga-markovify.png https://travis-ci.org/narfman0/helga-markovify.png?branch=master

Ingest corpuses of text and output a sentence generated from markov chains. Helga will now listen to your IRC channel and ingest dialog along the way, learning to speak your lingo. You may jump start this with ‘logs’ ingestion or any other ingestion technique.

Installation

After installing and configuring helga, use:

pip install helga-markovify

Add markovify to your settings and restart helga. To use twitter timelines, you must also add the following to settings (with your credentials):

TWITTER_CONSUMER_KEY = 'asad'
TWITTER_CONSUMER_SECRET = 'sdfs'
TWITTER_ACCESS_TOKEN = 'fghf'
TWITTER_ACCESS_SECRET = 'ghjg'

Usage

Note: Please use punctuation in your text. This is a tough sticking point in practice, but it is important to be able to differentiate sentences.

Command syntax:

ingest <topic> <learning_type> <learning_type_source>
generate <topic>
drop <topic>

Arguments

topic: like tagging, so helga can respond in different ways

learning_type: how helga is going to ingest. Can be text, a url to raw data, a url to dpaste, or a twitter account.

learning_type_source: the corresponding data e.g. plaintext if learning_type is “text”, a url if “url”, twitter screen name if “twitter”, helga_log_reader arguments for “logs” (suggest you go old and use current channel)

The ingest command teaches the bot about the topic from the referenced corpus. You may teach the bot from any number of sources, it can be twitter or text. Mix and matching is fine.

The generate command generates a sentence from the corpus.

The drop command drops a particular topic from storage. If a corpus becomes corrupt for whatever reason, a user may drop it and re-ingest data to populate it again.

Settings

MARKOVIFY_ADD_PUNCTUATION: If we should add periods after lines. You always want good punctuation for good generated sentences, this is a pretty safe “True” by default.

MARKOVIFY_CHANNEL_LISTEN: Ingest current channel chatter. A bit expensive, and possibly not great privacy-wise, but that’s where the lol-train arrives.

MARKOVIFY_CHANNEL_GENERATE: Regex helga listens to to generate response for default channel chatter

MARKOVIFY_TOPIC_DEFAULT: Default ingestion topic for channel data

TWITTER_CONSUMER_KEY TWITTER_CONSUMER_SECRET TWITTER_ACCESS_TOKEN TWITTER_ACCESS_SECRET: if using twitter, you’ll want these from your configured twitter apps.

Examples

The following are different ways you may usage helga-markovify. Most are different ways to ingest/learn data.

URL

!markovify ingest zen url https://hg.python.org/peps/raw-file/tip/pep-0020.txt
!markovify generate zen
helga> Sparse is better than ugly.

Text

!markovify ingest hitler text "Mein Kampf is the best Kampf."
!markovify ingest hitler text "Don't be stupid, be a smarty. Come and join the nazi party."
!markovify ingest hitler text "Make America hate again."
!markovify ingest hitler text "Kampf America is hate nazi smarty. Hate party again filler sentence. America is the best at being terrible."
!markovify generate hitler
helga> Mein Kampf is the best at being terrible.

dpaste

!markovify ingest zen dpaste http://dpaste.com/1JF2P4S
!markovify generate zen
helga> If the implementation is hard to explain, it may be a good idea.

Twitter

!markovify ingest narf twitter narfman0
!markovify generate narf
helga> You won't believe this one weird trick to get the target populace hooked.
!markovify generate narf
helga> FOSS: it only takes one highly incentivized dealer to get 4057$ a month doing nothing!

Channel logs

!markovify ingest channel logs --channel #bots --start_date 1999-01-01
!markovify generate channel
helga> dropbox serving it does ASAP

helga, thoughts?
helga> it could be a crackhead, who wants to haskell bees

Drop corpus

If you have somehow screwed up or broken a corpus, you may drop it completely:

!markovify drop zen

TODO

  • Travis

  • Talk about specific topics

  • Keep history aka conversations

  • Weighted round-robin type conversation

License

Copyright (c) 2016 Jon Robison

See included LICENSE for licensing information

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

helga-markovify-1.1.1.tar.gz (55.9 kB view details)

Uploaded Source

Built Distribution

helga_markovify-1.1.1-py2-none-any.whl (7.7 kB view details)

Uploaded Python 2

File details

Details for the file helga-markovify-1.1.1.tar.gz.

File metadata

File hashes

Hashes for helga-markovify-1.1.1.tar.gz
Algorithm Hash digest
SHA256 2e8ee88e00cfc1e5feefc078e5f8e19ec6edf5a2bcf4b64a9210a51b160e088a
MD5 23a72d30e16a34d08695b35959bb8bf1
BLAKE2b-256 0ab5779a6a26e50258a38bb969d70194e2f9792ffb1e2d03612e27f73960e7ec

See more details on using hashes here.

File details

Details for the file helga_markovify-1.1.1-py2-none-any.whl.

File metadata

File hashes

Hashes for helga_markovify-1.1.1-py2-none-any.whl
Algorithm Hash digest
SHA256 040ad6351266a0ae7dd6e030d9cc4eab275c407e18ac64bc490c7ae696675899
MD5 1a72856099b6c373552ded1c3f29b67a
BLAKE2b-256 9d51d49d94fc67fd54c69a3a704cd43c7db62731c5b036a6ece336c4e8e4daad

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page