

Project description



Github | PyPi

Index

Introduction
Installation
Usage
Flags
How does this work?
FAQs

Introduction

Markify is an open source command line application written in Python which scrapes data from your social media accounts (currently Reddit, Discord, and Twitter) and generates new sentences based on them using Markov chains.



Installation

There are several ways to install markify on your device:


1) Install the pip package (recommended)

python -m pip install markify

2) Install it via pip and git

python -m pip install git+https://github.com/msr8/markify.git

3) Clone the repo and install the package

git clone https://github.com/msr8/markify
cd markify
python setup.py install

4) Clone the repo and run markify without installing to PATH

git clone https://github.com/msr8/markify
cd markify
python -m pip install -r requirements.txt
cd src
python markify.py



Usage

To use markify, you can simply run markify on the command line, but you need to set up a config file first. On Windows, the default location for the config file is %LOCALAPPDATA%\markify\config.json; on Linux/macOS it is ~/.config/markify/config.json. Alternatively, you can provide the path to the config file using the -c/--config flag. If you run the program and the config file doesn't exist, an empty template is created. An ideal config file should look like this:

{
    "reddit": {
        "username"     : "..."
    },
    "discord": {
        "token"        : "..."
    },
    "twitter": {
        "username"     : "..."
    }
}

where username under the reddit section is your Reddit username, token under discord is your Discord token, and username under twitter is your Twitter username. If any of these are not given, the program skips the collection process for that platform.
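
For illustration, a config loader with this skip-if-missing behaviour might look like the sketch below. This is an assumption about the internals rather than markify's actual code, and load_config is a hypothetical helper:

import json
from pathlib import Path

def load_config(path):
    """Read config.json and return only the sections that are filled in."""
    config = json.loads(Path(path).read_text())
    enabled = {}
    for platform, required_key in [("reddit", "username"),
                                   ("discord", "token"),
                                   ("twitter", "username")]:
        section = config.get(platform, {})
        if section.get(required_key):
            enabled[platform] = section
        else:
            # Missing or empty credentials: skip collection for this platform
            print(f"Skipping {platform}: no {required_key} given")
    return enabled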



Flags

You can view the available flags by running markify --help. It should show the following text:

  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        The path to config file. By default, its {LOCALAPPDATA}/markify/config.json on
                        windows, and ~/.config/markify/config.json on other operating systems
  -d DATA, --data DATA  The path to the json data file. If given, the program will not scrape any data and
                        will just compile the model and generate sentences
  -n NUMBER, --number NUMBER
                        Number of sentences to generate. Default is 50
  -v, --version         Print out the version number

More explanation is given below:


-c --config

This is the path to the config file (config.json). By default, it's {LOCALAPPDATA}/markify/config.json on Windows, and ~/.config/markify/config.json on other operating systems. For example:

markify -c /Users/tyrell/Documents/config.json

-d --data

This is the path to the data file containing all the scraped content. If it is given, the program doesn't scrape any data and just compiles a model based on the data present in the file. By default, a new data file is generated in the DATA folder inside the config folder and is named x.json, where x is the current epoch time in seconds. For example:

markify -d /Users/tyrell/.config/markify/DATA/1658433988.json

-n --number

This is the number of sentences to generate after compiling the model. Default is 50. For example:

markify -n 20

-v --version

Print out the version of markify you're using via this flag. For example:

markify -v
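
As a rough sketch of how such a command line interface can be declared with argparse (illustrative only, not taken from the markify source; build_parser is a hypothetical name):

import argparse

def build_parser():
    parser = argparse.ArgumentParser(prog="markify")
    parser.add_argument("-c", "--config",
                        help="Path to the config file")
    parser.add_argument("-d", "--data",
                        help="Path to a json data file; skips scraping if given")
    parser.add_argument("-n", "--number", type=int, default=50,
                        help="Number of sentences to generate")
    parser.add_argument("-v", "--version", action="store_true",
                        help="Print out the version number")
    return parser

args = build_parser().parse_args()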



How does this work?

This program has four main parts: scraping Reddit comments, scraping Discord messages, scraping tweets, and generating sentences using Markov chains. Each part is explained below.


Scraping reddit comments

The program uses Pushshift's API to scrape your comments. Since Pushshift can only return 100 comments at a time, the program takes the timestamp of the oldest comment returned and then sends another request to the API for comments before that timestamp. This loop continues until either all your comments have been scraped or 10000 comments have been scraped. I chose Pushshift's API since it's faster, yields more results, and doesn't need a client ID or secret.
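
A minimal sketch of this timestamp-based pagination, using the requests library against Pushshift's comment-search endpoint (the exact parameters markify uses are an assumption, and scrape_reddit_comments is a hypothetical helper):

import requests

def scrape_reddit_comments(username, limit=10000):
    """Page through Pushshift's comment search, 100 comments at a time."""
    url = "https://api.pushshift.io/reddit/search/comment/"
    comments, before = [], None
    while len(comments) < limit:
        params = {"author": username, "size": 100}
        if before is not None:
            params["before"] = before  # only fetch comments older than the last batch
        batch = requests.get(url, params=params).json().get("data", [])
        if not batch:
            break  # nothing left: all comments have been scraped
        comments.extend(c["body"] for c in batch)
        before = min(c["created_utc"] for c in batch)  # timestamp of the oldest comment
    return comments[:limit]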


Scraping discord messages

To scrape Discord messages, the program first checks whether the token is valid by fetching basic information (username, discriminator, and account ID) from the /users/@me endpoint. It then gets all the DM channels you have participated in through the /users/@me/channels endpoint, extracts the channel IDs from the response, and fetches the most recent 100 messages in each channel using the /channels/{channel_id}/messages endpoint. Finally, it goes through the response and adds to the data file every message that is a text message, was sent by you, and isn't empty.
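
Sketched with the requests library, that flow might look like the code below (illustrative only; the API version, header format, and helper name are assumptions rather than markify's actual implementation):

import requests

API = "https://discord.com/api/v9"

def scrape_discord_messages(token):
    headers = {"Authorization": token}
    # 1) Validate the token by fetching basic account info
    me = requests.get(f"{API}/users/@me", headers=headers)
    me.raise_for_status()
    my_id = me.json()["id"]
    # 2) List the DM channels the account has participated in
    channels = requests.get(f"{API}/users/@me/channels", headers=headers).json()
    messages = []
    for channel in channels:
        # 3) Fetch the 100 most recent messages in each channel
        resp = requests.get(f"{API}/channels/{channel['id']}/messages",
                            headers=headers, params={"limit": 100}).json()
        # 4) Keep non-empty text messages that were sent by you
        messages.extend(m["content"] for m in resp
                        if m.get("content") and m["author"]["id"] == my_id)
    return messages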


Scraping tweets

The program uses the snscrape module to scrape your tweets. It keeps scraping until either all of your tweets have been scraped or 10000 tweets have been scraped.
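
In outline, scraping with snscrape could look like this (a sketch; attribute names such as tweet.content vary between snscrape versions, and scrape_tweets is a hypothetical helper):

import snscrape.modules.twitter as sntwitter

def scrape_tweets(username, limit=10000):
    """Collect the text of a user's tweets, stopping at the limit."""
    tweets = []
    for i, tweet in enumerate(sntwitter.TwitterUserScraper(username).get_items()):
        if i >= limit:
            break  # stop after 10000 tweets by default
        tweets.append(tweet.content)
    return tweets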


Generating sentences using markov chains

The program extracts all the useful text from the data file and builds a Markov chain model from it using the markovify module. It then generates new sentences (50 by default) and prints them out.
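
Using markovify's documented API, this last step might look roughly like the following (a sketch, not markify's exact code; generate_sentences is a hypothetical name):

import markovify

def generate_sentences(texts, count=50):
    """Build a Markov model from the scraped lines and print new sentences."""
    model = markovify.NewlineText("\n".join(texts))
    for _ in range(count):
        sentence = model.make_sentence(tries=100)  # may return None on a tiny corpus
        if sentence:
            print(sentence)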



FAQs


Q) How do I get my Discord token?

Recently (as of July 2022), Discord reworked its token system, and the format of the new tokens is a bit different. You can obtain your Discord token using this guide.


Q) The program is throwing an error and is telling me to install "averaged_perceptron_tagger" or something. What to do?

Running the command given below should work

python3 -c "import nltk; nltk.download('averaged_perceptron_tagger')"

You can visit this page for more information


Q) The installation is stuck at building lxml. What to do?

Sadly, all you can do is wait. It is a known issue with lxml



Download files

Download the file for your platform.

Source Distribution

markify-1.0.4.tar.gz (26.7 kB)

Uploaded Source

Built Distribution


markify-1.0.4-py3-none-any.whl (24.7 kB)

Uploaded Python 3

File details

Details for the file markify-1.0.4.tar.gz.

File metadata

  • Download URL: markify-1.0.4.tar.gz
  • Upload date:
  • Size: 26.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.9

File hashes

Hashes for markify-1.0.4.tar.gz
Algorithm Hash digest
SHA256 d06a55681ce3095ecf4147d3869f08e69c2c5002821fc452516f437b01339c49
MD5 3ea9e3577616feb6949537be58c09c70
BLAKE2b-256 d43252a07d50961be96044e672426192f176a65c442796e8e5e57caa05ecfd25


File details

Details for the file markify-1.0.4-py3-none-any.whl.

File metadata

  • Download URL: markify-1.0.4-py3-none-any.whl
  • Upload date:
  • Size: 24.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.9

File hashes

Hashes for markify-1.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 84ddf53498c6b54196aff400e952884ad4bf1e7ebd64b3e8f358eab26889cd56
MD5 72c0575d4115dbf27fc8a14c13070569
BLAKE2b-256 223594ebce0f8b217c884f2cf704db3d36f9eba522823df715d3bed4bf462a03

