Analyze characters in fictional texts.

Buskin

Buskin is a Python package for analyzing various attributes of characters in fictional texts. It was developed as part of a project for the terrific Computational Humanities course at UC Berkeley. Buskin's pipeline uses state-of-the-art text-processing techniques to obtain emotions, characters, character arcs, patient-agent-predicatives, part-of-speech tags, etc.

We created this package to understand character arcs in various novels, but we hope it will reduce the effort needed to get started analyzing fictional text for any purpose. We hope that Buskin makes it easier to peel open any novel and the characters within, in all their idiosyncrasies. Over time, we intend to add more features to the package in pursuit of that goal. This is very much a work in progress, and we appreciate any feedback or contribution to the project!

“Plot is no more than footprints left in the snow after your characters have run by on their way to incredible destinations.” ― Ray Bradbury, Zen in the Art of Writing

Contributors : nuwandavek, Dmacracy

Usage

Buskin needs PyTorch (>=1.4), which can be installed by following the official PyTorch installation instructions. Once that's done, Buskin can be installed with pip:

pip install buskin

Buskin requires spaCy, torch and Hugging Face's transformers, among other dependencies, so installation might take a while.

Many examples can be found in the example_notebooks directory.

Functions

parse_book

parse_book(book_path, batch_size=None, threads=None, max_chunk_size=None, pipeline=None, model=None, tokenizer=None)

Description : Parse a fictional text

Parameters :

book_path : str : Path to the .txt file of the book
batch_size : int, optional : Batch size of sentences for emotion classification (default = 8)
threads : int, optional : Number of threads used to process chunks (default = 5). More threads mean faster processing, but they can exhaust memory because neural coreference is memory-heavy.
max_chunk_size : int, optional : Maximum size of the chunks that the text is divided into (default = 10k). Larger chunks give better coreference resolution, but memory is a constraint.
pipeline : spaCy pipeline, optional : Used to process the text tokens to obtain the POS tags, etc. If not provided, a default pipeline is initialized.
model : HuggingFace BertForSequenceClassification model, optional : Model used to obtain emotion for sentences. If not provided, a default model is initialized.
tokenizer : HuggingFace BertTokenizer, optional : Tokenizer used for the emotion model. If not provided, a default tokenizer is initialized.

Returns:

Book : An instance of the Book class
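
The trade-off between max_chunk_size and threads can be illustrated with a minimal sketch. This is not Buskin's internal code: split_into_chunks and process_chunks are hypothetical helpers, the chunk size is assumed to be measured in characters, and paragraphs are assumed to be separated by blank lines.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(text, max_chunk_size=10_000):
    """Greedily pack paragraphs into chunks of about max_chunk_size characters.

    Hypothetical helper: Buskin's real chunking strategy may differ.
    """
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if len(candidate) <= max_chunk_size or not current:
            # Either the paragraph still fits, or it is oversized on its own
            # and has to start a chunk anyway.
            current = candidate
        else:
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


def process_chunks(chunks, worker, threads=5):
    """Apply worker to every chunk in a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(worker, chunks))


if __name__ == "__main__":
    sample = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
    pieces = split_into_chunks(sample, max_chunk_size=40)
    # len stands in for an expensive per-chunk step such as coreference.
    print(process_chunks(pieces, len, threads=2))
```

Every chunk being processed is held in memory at once, which is why raising either threads or max_chunk_size increases memory use.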

load_default_models

load_default_models()

Description : Explicitly initialize the pipeline, model and tokenizer. Use this when parsing a batch of books to avoid re-initializing them for each book.

Returns :

nlp : spaCy pipeline : Used to process the text tokens to obtain the POS tags, etc.
model : HuggingFace BertForSequenceClassification model : Model used to obtain emotion for sentences.
tokenizer : HuggingFace BertTokenizer : Tokenizer used for the emotion model.
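
A minimal sketch of the batch pattern described above: load the models once, then pass them to every parse_book call. The helper parse_books is hypothetical, and the two functions are passed in as arguments because the import path is not stated here. The keyword names pipeline, model and tokenizer follow the parse_book signature documented above.

```python
def parse_books(book_paths, parse_book, load_default_models):
    """Parse several books while initializing the heavy models only once.

    parse_book and load_default_models are expected to behave like the
    Buskin functions of the same name (an assumption of this sketch).
    """
    # Documented return order: nlp, model, tokenizer.
    nlp, model, tokenizer = load_default_models()
    return [
        parse_book(path, pipeline=nlp, model=model, tokenizer=tokenizer)
        for path in book_paths
    ]


if __name__ == "__main__":
    # Stand-ins so the sketch runs without Buskin installed.
    def fake_load():
        return "nlp", "model", "tokenizer"

    def fake_parse(path, pipeline=None, model=None, tokenizer=None):
        return {"book_path": path, "pipeline": pipeline}

    print(parse_books(["emma.txt", "persuasion.txt"], fake_parse, fake_load))
```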

Classes

Book

Book(book_path=None, sentences=None, characters=None)

Attribute	Type	Description
book_path	`str`	Path to the book text file
sentences	List of `Sentence`	List of all sentences in the fictional text
characters	List of `Character`	List of all characters in the fictional text

Sentence

Sentence(sentence_id=None, cluster_id=None, global_token_start=None, text=None, token_tags=None, emotion_tags=None)

Attribute	Type	Description
sentence_id	`int`	ID of the sentence
cluster_id	`int`	ID of the sentence cluster used for coreference resolution
global_token_start	`int`	Global ID of the first token in the sentence
text	`str`	Text in the sentence
token_tags	List of `TokenTags`	List of all tags for each token in the sentence
emotion_tags	List of `Emotion`	List of all emotions predicted for the sentence

Character

Character(rank=None, name=None, mentions=None, agents=None, patients=None, predicatives=None)

Attribute	Type	Description
rank	`int`	Rank of the character (1 = most mentioned character)
name	`str`	Name of the character
mentions	List of `Occurrence`	List of all occurrences of the character mentions
agents	List of `Occurrence`	List of all occurrences of the character agent verbs
patients	List of `Occurrence`	List of all occurrences of the character patient verbs
predicatives	List of `Occurrence`	List of all occurrences of the character predicatives
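
As an illustration of how the rank and mentions attributes might be used, the sketch below lists the most prominent characters with their mention counts. The dataclasses are simplified stand-ins that mirror only the attributes documented above, so the example runs without Buskin; top_characters is a hypothetical helper, not part of the package.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Occurrence:  # stand-in with a subset of the documented attributes
    text: str
    sentence_id: int


@dataclass
class Character:  # stand-in with a subset of the documented attributes
    rank: int
    name: str
    mentions: List[Occurrence] = field(default_factory=list)


def top_characters(characters, n=5):
    """Return (name, mention count) for the n best-ranked characters."""
    ranked = sorted(characters, key=lambda c: c.rank)  # rank 1 comes first
    return [(c.name, len(c.mentions)) for c in ranked[:n]]


if __name__ == "__main__":
    cast = [
        Character(2, "Darcy", [Occurrence("Darcy", 4)]),
        Character(1, "Elizabeth", [Occurrence("Elizabeth", 1), Occurrence("she", 2)]),
    ]
    print(top_characters(cast, n=2))
```

With a parsed book, the same function would presumably be called as top_characters(book.characters).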

TokenTags

TokenTags(token_id=None, token_global_id=None, token=None, lemma=None, pos=None, tag=None, dep=None, head_global_id=None)

Attribute	Type	Description
token_id	`int`	ID of the token
token_global_id	`int`	Global ID of the token
token	`str`	Text of the token
lemma	`str`	Lemma of the token
pos	`str`	Part of Speech of the token
tag	`str`	POS Tag of the token
dep	`str`	Dependency Parse tag of the token
head_global_id	`int`	ID of the parse-head of the token
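
The head_global_id attribute makes it possible to walk the dependency parse. The sketch below pairs each token with the text of its parse head. It uses a stand-in dataclass with a subset of the documented attributes, and it assumes that the root token points to itself, as in spaCy; whether Buskin keeps that convention is not confirmed here. head_pairs is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class TokenTags:  # stand-in with a subset of the documented attributes
    token_global_id: int
    token: str
    dep: str
    head_global_id: int


def head_pairs(token_tags):
    """Return (token, dep, head token) triples, resolving heads by global ID."""
    by_id = {t.token_global_id: t for t in token_tags}
    pairs = []
    for t in token_tags:
        head = by_id.get(t.head_global_id)
        # The head is None when it lies outside the tokens that were passed in.
        pairs.append((t.token, t.dep, head.token if head else None))
    return pairs


if __name__ == "__main__":
    sentence = [
        TokenTags(0, "Emma", "nsubj", 1),
        TokenTags(1, "smiled", "ROOT", 1),  # assumed: the root is its own head
    ]
    print(head_pairs(sentence))
```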

Emotion

Emotion(emotion=None, mini_emotion=None, probability=None)

Attribute	Type	Description
emotion	`str`	Emotion of the sentence (28 values)
mini_emotion	`str`	Reduced Emotion of the sentence (3 values)
probability	`float`	Probability of the emotion [0,1]
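
One way to summarize these attributes is to pick the most probable Emotion for each sentence and tally its reduced label. The sketch below uses a stand-in dataclass mirroring the documented attributes. The label strings ("joy", "positive" and so on) are hypothetical placeholders, since the actual 28 and 3 values are not listed here, and both helper functions are illustrative rather than part of Buskin.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Emotion:  # stand-in mirroring the documented attributes
    emotion: str
    mini_emotion: str
    probability: float


def dominant_emotion(emotion_tags):
    """Return the Emotion with the highest probability, or None if empty."""
    return max(emotion_tags, key=lambda e: e.probability, default=None)


def mini_emotion_counts(emotion_tags_per_sentence):
    """Tally the reduced label of each sentence's dominant emotion."""
    counts = Counter()
    for tags in emotion_tags_per_sentence:
        top = dominant_emotion(tags)
        if top is not None:
            counts[top.mini_emotion] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical labels, used only for illustration.
    per_sentence = [
        [Emotion("joy", "positive", 0.8), Emotion("fear", "negative", 0.1)],
        [Emotion("anger", "negative", 0.6)],
        [],
    ]
    print(mini_emotion_counts(per_sentence))
```

With a parsed book, the input would presumably be built as [s.emotion_tags for s in book.sentences].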

Occurrence

Occurrence(text=None, sentence_id=None, cluster_id=None, start=None, end=None)

Attribute	Type	Description
text	`str`	Text of the occurrence
sentence_id	`int`	ID of the sentence with the occurrence
cluster_id	`int`	ID of the cluster with the occurrence
start	`int`	Start token ID of the occurrence
end	`int`	End token ID of the occurrence
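
Because every Occurrence carries a sentence_id, a character's occurrences can be joined back to the sentences they appear in. The sketch below does this with a stand-in dataclass mirroring the documented attributes. It assumes that sentence IDs are unique across the whole book, which is not confirmed here, and sentences_mentioning is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class Occurrence:  # stand-in mirroring the documented attributes
    text: str
    sentence_id: int
    cluster_id: int
    start: int
    end: int


def sentences_mentioning(occurrences, sentence_texts):
    """Return (sentence_id, text) pairs, in ID order, for the given occurrences.

    sentence_texts maps sentence_id to the sentence text; IDs without an
    entry in the mapping are skipped.
    """
    ids = sorted({o.sentence_id for o in occurrences})
    return [(i, sentence_texts[i]) for i in ids if i in sentence_texts]


if __name__ == "__main__":
    texts = {0: "Emma smiled.", 1: "The rain stopped.", 2: "She went out."}
    mentions = [
        Occurrence("She", 2, 0, 6, 7),
        Occurrence("Emma", 0, 0, 0, 1),
    ]
    print(sentences_mentioning(mentions, texts))
```

With a parsed book, the mapping would presumably be built as {s.sentence_id: s.text for s in book.sentences}, and the occurrences taken from a character's mentions, agents, patients or predicatives.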

Icon made by Freepik from www.flaticon.com

Release history

0.1.1 : Dec 24, 2020 (this version)
0.1.0 : Dec 24, 2020

Download files

Source Distribution

buskin-0.1.1.tar.gz (12.0 kB view details)

Uploaded Dec 24, 2020 Source

Built Distribution

buskin-0.1.1-py2.py3-none-any.whl (9.7 kB view details)

Uploaded Dec 24, 2020 Python 2, Python 3

File details

Details for the file buskin-0.1.1.tar.gz.

File metadata

Download URL: buskin-0.1.1.tar.gz
Upload date: Dec 24, 2020
Size: 12.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/50.3.1.post20201107 requests-toolbelt/0.9.1 tqdm/4.50.2 CPython/3.8.5

File hashes

Hashes for buskin-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`03154050ac9e845b625921efcb6549089d80d23cfd563f5c4f5c21c41d261140`
MD5	`11c65b79b1bb9b9c4cf2c8526902f638`
BLAKE2b-256	`29fc7e60d79887f3a57676c41b0af59934b7b9a25449f57575ee7209d30d7e0d`

File details

Details for the file buskin-0.1.1-py2.py3-none-any.whl.

File metadata

Download URL: buskin-0.1.1-py2.py3-none-any.whl
Upload date: Dec 24, 2020
Size: 9.7 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/50.3.1.post20201107 requests-toolbelt/0.9.1 tqdm/4.50.2 CPython/3.8.5

File hashes

Hashes for buskin-0.1.1-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`6e53bc1b64a38ac841662c904454fe121a712ed5a9e915109528c5df727bdab2`
MD5	`6e7b378601001211502f6b9cd222f875`
BLAKE2b-256	`b936a458b48fa4b4b201d5d7dc886e7bc653ed98a3fb1c8a767559d4b6377221`
