
Calculates Burrows Delta


Burrows Delta

By Thomas Wood, https://freelancedatascientist.net, Fast Data Science https://fastdatascience.com

Source code at https://github.com/woodthom2/faststylometry

Python library for calculating Burrows’ Delta.

Burrows’ Delta is an algorithm for comparing the writing styles of documents, a technique used in forensic stylometry: https://fastdatascience.com/how-you-can-identify-the-author-of-a-document
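The README does not describe the library's internals, so the following is a from-scratch sketch of the classic Burrows’ Delta procedure in pure Python, not faststylometry's implementation. It takes the most frequent words across the training texts, converts each text's counts to relative frequencies, standardises each word's frequency into a z-score using the mean and population standard deviation across the training authors, and scores a test text against each author as the mean absolute difference of z-scores. The names `simple_tokenise`, `relative_frequencies` and `burrows_delta`, the regex tokeniser, and the default `vocab_size=30` are illustrative assumptions.

```python
import re
import statistics
from collections import Counter
from typing import Dict, List


def simple_tokenise(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens (illustrative tokeniser)."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequencies(tokens: List[str], vocab: List[str]) -> List[float]:
    """Frequency of each vocabulary word as a share of all tokens (assumes tokens is non-empty)."""
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[word] / total for word in vocab]


def burrows_delta(train_texts: Dict[str, str], test_text: str,
                  vocab_size: int = 30) -> Dict[str, float]:
    """Return a Delta score per training author; a lower score means a closer style."""
    train_tokens = {author: simple_tokenise(text) for author, text in train_texts.items()}

    # 1. Vocabulary: the most frequent words across the whole training corpus.
    corpus_counts = Counter()
    for tokens in train_tokens.values():
        corpus_counts.update(tokens)
    vocab = [word for word, _ in corpus_counts.most_common(vocab_size)]

    # 2. Relative frequency of each vocabulary word, per author and for the test text.
    train_freqs = {a: relative_frequencies(t, vocab) for a, t in train_tokens.items()}
    test_freqs = relative_frequencies(simple_tokenise(test_text), vocab)

    # 3. Mean and population standard deviation of each word's frequency across authors.
    columns = list(zip(*train_freqs.values()))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]

    def z_scores(freqs: List[float]) -> List[float]:
        # Assumption: a word with zero spread across authors gets a z-score of 0.
        return [(f - m) / s if s > 0 else 0.0 for f, m, s in zip(freqs, means, stdevs)]

    # 4. Delta: mean absolute difference between the author's and the test text's z-scores.
    test_z = z_scores(test_freqs)
    return {
        author: sum(abs(x - y) for x, y in zip(z_scores(freqs), test_z)) / len(vocab)
        for author, freqs in train_freqs.items()
    }
```

The author with the lowest Delta is the best stylistic match, for example `min(scores, key=scores.get)`. On real novels you would use much longer texts than a toy example, so that word frequencies are stable.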

Requirements

Python 3.6 and above

Installation

pip install faststylometry

Usage examples

Demonstration of Burrows’ Delta on a small corpus downloaded from Project Gutenberg.

We will test the Burrows’ Delta code on two “unknown” texts: Sense and Sensibility by Jane Austen, and Villette by Charlotte Brontë. Both authors are in our training corpus.

You can get the training corpus by cloning https://github.com/woodthom2/faststylometry; the data is in the faststylometry/data folder.
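For experimenting with your own plain-text files outside the library, a folder can be read into a dictionary of texts. The helper below, `read_text_folder`, is hypothetical and is not `load_corpus_from_folder`. Its own `pattern` argument keeps only files whose name contains the given substring (case-insensitively); whether the library's `pattern` behaves the same way is an assumption not confirmed here.

```python
from pathlib import Path
from typing import Dict, Optional


def read_text_folder(folder: str, pattern: Optional[str] = None) -> Dict[str, str]:
    """Map each .txt file's name (without extension) to its UTF-8 contents.

    Hypothetical helper: if pattern is given, only files whose name contains
    it (case-insensitively) are kept.
    """
    texts = {}
    for path in sorted(Path(folder).glob("*.txt")):
        if pattern is not None and pattern.lower() not in path.name.lower():
            continue
        texts[path.stem] = path.read_text(encoding="utf-8")
    return texts
```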

Example 1

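```python
from faststylometry.util import load_corpus_from_folder
from faststylometry.en import tokenise_remove_pronouns_en
from faststylometry.burrows_delta import calculate_burrows_delta

# Load the training corpus (texts by known authors) and tokenise it.
train_corpus = load_corpus_from_folder("faststylometry/data/train")
train_corpus.tokenise(tokenise_remove_pronouns_en)

# Load the test text selected by pattern="sense" (Sense and Sensibility)
# and tokenise it the same way as the training corpus.
test_corpus_sense_and_sensibility = load_corpus_from_folder("faststylometry/data/test", pattern="sense")
test_corpus_sense_and_sensibility.tokenise(tokenise_remove_pronouns_en)

# Compare the test text with each author in the training corpus.
calculate_burrows_delta(train_corpus, test_corpus_sense_and_sensibility)
```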

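This returns a pandas DataFrame of Burrows’ Delta scores. A lower score means the test text is stylistically closer to the corresponding training author.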
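Example 1 passes `tokenise_remove_pronouns_en` to `tokenise`. As the name suggests, pronouns are dropped before counting, presumably because pronoun use reflects narrative point of view (first- versus third-person narration) more than authorial habit. The sketch below is a hypothetical stand-in that illustrates the idea; it is not the library's function, and its pronoun list is an illustrative assumption.

```python
import re
from typing import List

# Hypothetical, illustrative list of English personal pronouns (not faststylometry's list).
PRONOUNS = frozenset({
    "i", "me", "my", "mine", "myself",
    "you", "your", "yours", "yourself", "yourselves",
    "he", "him", "his", "himself",
    "she", "her", "hers", "herself",
    "it", "its", "itself",
    "we", "us", "our", "ours", "ourselves",
    "they", "them", "their", "theirs", "themselves",
})


def tokenise_without_pronouns(text: str) -> List[str]:
    """Lowercase, split into words (keeping simple contractions), then drop pronouns."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return [token for token in tokens if token not in PRONOUNS]
```

For example, `tokenise_without_pronouns("She said that I would come.")` gives `['said', 'that', 'would', 'come']`.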

Example 2

Using the probability calibration functionality, you can estimate the probability that two books were written by the same author.

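```python
from faststylometry.probability import predict_proba, calibrate

# Reuses train_corpus and test_corpus_sense_and_sensibility from Example 1.
# Calibrate on the training corpus first...
calibrate(train_corpus)
# ...then convert the comparison into same-author probabilities.
predict_proba(train_corpus, test_corpus_sense_and_sensibility)
```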

This outputs a pandas DataFrame of probabilities.
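The README does not say how `calibrate` turns Delta scores into probabilities. One standard approach, assumed here and not confirmed as faststylometry's method, is logistic regression: fit P(same author | Δ) = 1 / (1 + e^-(a + bΔ)) on Delta scores from pairs whose authorship is known, so that b comes out negative and small Deltas map to high probabilities. The helper names, learning rate and epoch count below are illustrative.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def fit_delta_calibration(deltas: List[float], same_author: List[int],
                          learning_rate: float = 0.5,
                          epochs: int = 5000) -> Tuple[float, float]:
    """Fit P(same author | delta) = sigmoid(a + b * delta) by batch gradient descent on log loss.

    same_author holds 1 for pairs known to share an author and 0 otherwise.
    """
    a, b = 0.0, 0.0
    n = len(deltas)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for delta, label in zip(deltas, same_author):
            error = sigmoid(a + b * delta) - label  # derivative of log loss w.r.t. the logit
            grad_a += error
            grad_b += error * delta
        a -= learning_rate * grad_a / n
        b -= learning_rate * grad_b / n
    return a, b


def probability_same_author(delta: float, a: float, b: float) -> float:
    """Calibrated probability that a pair with this Delta shares an author."""
    return sigmoid(a + b * delta)
```

Plain gradient descent keeps the sketch dependency-free; in practice scikit-learn's `LogisticRegression` would be the usual choice for this fit.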

Who to contact

Thomas Wood at Fast Data Science https://fastdatascience.com

Download files

Source distribution: faststylometry-0.2.tar.gz (4.5 kB)
