Skip to main content

No project description provided

Project description

SaysWho

SaysWho is a Python package for identifying and attributing quotes in text. It uses a combination of grammar and logic to find quotes and their speakers, then uses a coreferencing model to better clarify who is speaking.

Installation

Install and update using pip:

$ pip install sayswho

You will probably need to install the coreferencing model manually, then re-update SpaCy.

$ pip install pip install https://github.com/explosion/spacy-experimental/releases/download/v0.6.0/en_coreference_web_trf-3.4.0a0-py3-none-any.whl
$ pip install spacy -U

You also might need to download the main large SpaCy english model.

$ spacy download en_core_web_lg

Notes

  • Coreferencing (which this package uses to resolve speakers) is an experimental feature not fully implemented in SpaCy. SaysWho uses a pretrained model which SpaCy made available, but it was built with older code. This explains the version roulette required on installation.

A Simple Example

from sayswho.sayswho import Attributor
from sayswho import quote_helpers
test_text = open("./tests/qa_test_file.txt").read()
test_text = quote_helpers.prep_text_for_quote_detection(test_text)

Instantiate Attributor and run .attribute on target text.

This will probably raise an warning about differing SpaCy models, which you can ignore.

a = Attributor()
a.attribute(test_text)
/usr/local/lib/python3.11/site-packages/spacy/util.py:910: UserWarning: [W095] Model 'en_coreference_web_trf' (3.4.0a2) was trained with spaCy v3.3 and may not be 100% compatible with the current version (3.6.0). If you see errors or degraded performance, download a newer compatible model or retrain your custom model with the current spaCy version. For more details and available updates, run: python -m spacy validate
  warnings.warn(warn_msg)
/usr/local/lib/python3.11/site-packages/spacy/util.py:910: UserWarning: [W095] Model 'en_core_web_lg' (3.5.0) was trained with spaCy v3.5 and may not be 100% compatible with the current version (3.6.0). If you see errors or degraded performance, download a newer compatible model or retrain your custom model with the current spaCy version. For more details and available updates, run: python -m spacy validate
  warnings.warn(warn_msg)

See speaker, cue and content of every quote with .quotes.

a.quotes
[DQTriple(speaker=[he], cue=[said], content=“Based on the actions the group was making, based on everything the gentlemen who came in had told me — if I allowed anyone in the store, they would try to cause harm to people,”),
 DQTriple(speaker=[he], cue=[said], content=“I couldn’t see how big the group was. I thought, ’It’s just seven to 10 people. Maybe they’ll back off.’”),
 DQTriple(speaker=[she], cue=[noted], content=“They were all late teens, early 20s, clean-cut, typical blondie, blue-eyed, wholesome Utah boys,”),
 DQTriple(speaker=[Rogers], cue=[added], content=“Then this African-American guy came from the restaurant,”)]

See resolved entity clusters with .clusters.

a.clusters
[[four frightened, breathless men,
  They,
  they,
  them,
  the group,
  they,
  the group,
  they,
  the group,
  the group],
 [Terrance Mannery,
  Mannery,
  me,
  I,
  he,
  I,
  I,
  Mannery,
  Mannery,
  his,
  him,
  Mannery],
 [the Doki Doki dessert shop,
  Doki Doki,
  the store,
  the shop,
  Doki Doki,
  Doki Doki,
  the shop],
 [just seven to 10 people, that],
 [the Utah Pride Festival, Pride, Pride],
 [the mob, the crowd, They],
 [Michelle Turpin, who was walking west near Doki Doki with friends after Pride,
  she,
  Turpin],
 [Jen Parsons-Soren, who had attended Pride with Turpin,, she],
 [Lyft driver Ross Rogers, he, Rogers, Rogers],
 [the entrance of Doki Doki, the entrance],
 [this African-American guy, That],
 [a small group, the attackers],
 [the door, the door],
 [one of the attackers, the man]]

Use .print_clusters() to see unique text in each cluster, easier to read.

a.print_clusters()
0 {'them', 'the group', 'four frightened, breathless men', 'They', 'they'}
1 {'I', 'his', 'Terrance Mannery', 'he', 'him', 'me', 'Mannery'}
2 {'Doki Doki', 'the Doki Doki dessert shop', 'the shop', 'the store'}
3 {'that', 'just seven to 10 people'}
4 {'Pride', 'the Utah Pride Festival'}
5 {'the crowd', 'They', 'the mob'}
6 {'Michelle Turpin, who was walking west near Doki Doki with friends after Pride', 'she', 'Turpin'}
7 {'she', 'Jen Parsons-Soren, who had attended Pride with Turpin,'}
8 {'Rogers', 'Lyft driver Ross Rogers', 'he'}
9 {'the entrance', 'the entrance of Doki Doki'}
10 {'That', 'this African-American guy'}
11 {'a small group', 'the attackers'}
12 {'the door'}
13 {'one of the attackers', 'the man'}

Quote/cluster matches are saved to .quote_matches as namedtuples.

for qm in a.quote_matches:
    print(qm)
QuoteClusterMatch(quote_index=0, cluster_index=1)
QuoteClusterMatch(quote_index=1, cluster_index=1)
QuoteClusterMatch(quote_index=2, cluster_index=6)
QuoteClusterMatch(quote_index=3, cluster_index=8)

Use .expand_match() to interpret quote/cluster matches.

a.expand_match()
QUOTE : 0
 DQTriple(speaker=[he], cue=[said], content=“Based on the actions the group was making, based on everything the gentlemen who came in had told me — if I allowed anyone in the store, they would try to cause harm to people,”) 

CLUSTER : 1
 ['Terrance Mannery', 'Mannery'] 

QUOTE : 1
 DQTriple(speaker=[he], cue=[said], content=“I couldn’t see how big the group was. I thought, ’It’s just seven to 10 people. Maybe they’ll back off.’”) 

CLUSTER : 1
 ['Terrance Mannery', 'Mannery'] 

QUOTE : 2
 DQTriple(speaker=[she], cue=[noted], content=“They were all late teens, early 20s, clean-cut, typical blondie, blue-eyed, wholesome Utah boys,”) 

CLUSTER : 6
 ['Michelle Turpin, who was walking west near Doki Doki with friends after Pride', 'Turpin'] 

QUOTE : 3
 DQTriple(speaker=[Rogers], cue=[added], content=“Then this African-American guy came from the restaurant,”) 

CLUSTER : 8
 ['Rogers', 'Lyft driver Ross Rogers'] 

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sayswho-0.1.0.tar.gz (13.8 kB view hashes)

Uploaded Source

Built Distribution

sayswho-0.1.0-py3-none-any.whl (14.4 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page