It provides a framework to label a text according to the main elements of a narrative (events, participants, time) and their relations.

Text2Story main package

The Text2Story main package contains the main classes and methods for the T2S pipeline: from text to formal representation to visualization or other representation.

Relation to Brat2Viz

The Text2Story package is a generalization of Brat2Viz and should contain all the functionalities and variants of the T2S project output.

Installation

Language and OS Requirements

The Text2Story package is written entirely in Python 3.8, which ensures compatibility with UNIX-like operating systems.

Swap Size

T2S is an NLP project, which means it is intended to operate over large amounts of data using complex models, and some of the third-party libraries it relies on demand considerable computing resources.

To ensure enough computing resources, use a computer where the sum of physical and virtual (swap) RAM is at least 16 GB.

How to increase swap/virtual memory size in Linux systems
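
The 16 GB requirement above can be checked before installing. The following is a minimal sketch, assuming a Linux system that exposes `/proc/meminfo`; the helper name `total_memory_gib` is hypothetical and not part of text2story. It reports GiB, which is close to, but not exactly, the GB figure quoted above.

```python
import re
from pathlib import Path

MIN_TOTAL_GIB = 16  # threshold taken from the requirement above


def total_memory_gib(meminfo_text):
    """Return MemTotal + SwapTotal from /proc/meminfo-style text, in GiB."""
    values_kib = {}
    for line in meminfo_text.splitlines():
        match = re.match(r"^(MemTotal|SwapTotal):\s+(\d+)\s+kB", line)
        if match:
            values_kib[match.group(1)] = int(match.group(2))
    total_kib = values_kib.get("MemTotal", 0) + values_kib.get("SwapTotal", 0)
    return total_kib / (1024 ** 2)  # KiB -> GiB


meminfo = Path("/proc/meminfo")  # only present on Linux
if meminfo.exists():
    total = total_memory_gib(meminfo.read_text())
    status = "OK" if total >= MIN_TOTAL_GIB else "below the recommended minimum"
    print(f"RAM + swap: {total:.1f} GiB ({status})")
```

If the reported total is below the threshold, increasing the swap size is the cheaper of the two options.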

Steps for installation

Create a virtual environment with the following command:
```
python3.8 -m venv venv
```
Activate the virtual environment with the following command:
```
source venv/bin/activate
```
Install the py_heideltime package (more detailed instructions at https://github.com/JMendes1995/py_heideltime):
```
pip install git+https://github.com/JMendes1995/py_heideltime.git
```

Give the TreeTagger binary of the py_heideltime package permission to execute (VENV_HOME is the path to your virtual environment, e.g. venv):

 chmod +x "${VENV_HOME}/lib/python3.8/site-packages/py_heideltime/Heideltime/TreeTaggerLinux/bin/tree-tagger"
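
The same permission change can be made from Python, which avoids typing the virtual environment path by hand. This is a minimal sketch, assuming it runs with the virtual environment's interpreter so that `sys.prefix` points at the environment; `make_executable` is a hypothetical helper, and it sets the execute bit for user, group and others, which is roughly what `chmod +x` does.

```python
import os
import stat
import sys
from pathlib import Path


def make_executable(path):
    """Add execute permission for user, group and others to an existing file."""
    path = Path(path)
    mode = path.stat().st_mode
    path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return os.access(path, os.X_OK)


# Path from the step above, with sys.prefix standing in for the venv root.
tree_tagger = (
    Path(sys.prefix)
    / "lib" / "python3.8" / "site-packages" / "py_heideltime"
    / "Heideltime" / "TreeTaggerLinux" / "bin" / "tree-tagger"
)
if tree_tagger.exists():
    make_executable(tree_tagger)
else:
    print(f"tree-tagger not found at {tree_tagger}")
```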

Installation of the text2story package.
```
  python -m pip install text2story
```

The following steps are optional for using the text2story package, but essential for running our TLDR Python notebook locally (https://bit.ly/3s36Bxf).

Install ipykernel so that Jupyter Notebook can use the virtual environment.

   python3.8 -m pip install --user ipykernel

Register your virtual environment as a Jupyter kernel.

   python -m ipykernel install --user --name=venv

Change the kernel in Jupyter by clicking Kernel -> Change Kernel -> (kernel name).

Usage

import text2story as t2s # Import the package

t2s.start('en') # Load the pipelines for the English language

text = 'On Friday morning, Max Healthcare, which runs 10 private hospitals around Delhi, put out an "SOS" message, saying it had less than an hour\'s supply remaining at two of its sites. The shortage was later resolved.'

doc = t2s.Narrative('en', text, '2020-05-30')

doc.extract_actors('sparknlp') # Extraction done with just the SPARKNLP tool

doc.extract_times() # Extraction done with all tools (same as specifying 'py_heideltime', since there is just one tool to extract timexs)

doc.extract_events('allennlp') # Extraction of events with the allennlp tool
doc.extract_semantic_role_link('allennlp') # Extraction of semantic role links with the allennlp tool (should be done after extracting events, since most semantic relations are between an actor and an event)

doc.ISO_annotation('annotations.ann') # Writes the ISO annotation to a text file called 'annotations.ann', in the standoff .ann format used by the BRAT annotation tool
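
The resulting .ann file is plain text, so it can be inspected without BRAT. Below is a minimal sketch of reading the text-bound annotations (lines whose ID starts with `T`) of the BRAT standoff format. The sample content and its labels (`Actor`, `Time`, `ExampleRelation`) are hypothetical; the labels that text2story actually writes may differ.

```python
from typing import List, NamedTuple


class TextBound(NamedTuple):
    ann_id: str
    label: str
    start: int
    end: int
    text: str


def parse_text_bound(ann_content: str) -> List[TextBound]:
    """Collect text-bound annotations (IDs starting with 'T') from .ann content."""
    spans = []
    for line in ann_content.splitlines():
        if not line.startswith("T"):
            continue  # relations, attributes and notes are skipped in this sketch
        ann_id, type_and_offsets, text = line.split("\t", 2)
        label, offsets = type_and_offsets.split(" ", 1)
        if ";" in offsets:
            continue  # discontinuous spans are out of scope here
        start, end = (int(value) for value in offsets.split())
        spans.append(TextBound(ann_id, label, start, end, text))
    return spans


# Hypothetical .ann content for the first sentence of the usage example.
sample = (
    "T1\tActor 19 33\tMax Healthcare\n"
    "T2\tTime 0 17\tOn Friday morning\n"
    "R1\tExampleRelation Arg1:T1 Arg2:T2\n"
)
for span in parse_text_bound(sample):
    print(span.ann_id, span.label, span.start, span.end, span.text)
```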


Structure

.
│   README.md
│   env.yml
│   requirements.txt
│   pyproject.toml
│   MANIFEST.in
│   LICENSE
│
└── src
    └─ text2story
        └─ core
        │   annotator.py (META-annotator)
        │   entity_structures.py (ActorEntity, TimexEntity and EventEntity classes)
        │   exceptions.py (Exceptions raised by the package)
        │   link_structures.py (TemporalLink, AspectualLink, SubordinationLink, SemanticRoleLink and ObjectalLink classes)
        │   narrative.py (Narrative class)
        │   utils.py (Utility functions)

        └─ annotators (tools supported by the package to do the extractions)
        │   NLTK
        │   PY_HEIDELTIME
        │   SPACY
        │   SPARKNLP
        │   ALLENNLP
        │   CUSTOMPT (a customized CRF model to detect events in the Portuguese language)

        └─ brat2viz (tool devoted to creating visual representations of ann files)
        │   brat2drs (scripts that convert the brat standoff format (.ann) to DRS format)
        │   drs2viz (scripts that convert DRS format to a visual representation)

        └─ readers (module dedicated to reading different kinds of corpora)
        │   fn-lirics.json (conversion map from framenet to lirics: semlink project -> https://github.com/cu-clear/semlink)
        │   pb-vn2.json (conversion map from propbank to verbnet: semlink project -> https://github.com/cu-clear/semlink)
        │   vn-lirics.json (conversion map from verbnet to lirics: semlink project -> https://github.com/cu-clear/semlink)
        │   read_brat.py (reads the brat standoff format)
        │   read_ecb.py (reads the ecb+ format)
        │   read_framenet.py (reads the nltk data of the framenet dataset)
        │   read_propbank.py (reads the nltk data of the propbank dataset)
        │   read.py (META-reader)
        │   token_corpus.py (Token representation of data)
        │   utils.py (Utility functions for readers)

        └─ experiments (module dedicated to performing batch experiments with narrative datasets)
        │   evaluation.py (performs experiments on only one dataset)
        │   metrics.py (implements some classification metrics: recall, precision, and F1, in strict and relaxed versions (ref. SemEval-2013 Task 1: TempEval-3))
        │   run_experiments.py (implements batch experiments for narrative datasets)
        │   stats.py (implements methods to compute some statistics of narrative datasets)
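
The strict and relaxed metrics mentioned for metrics.py can be illustrated with a short sketch. This is not the code in metrics.py; it assumes the TempEval-3 style convention that a strict match requires identical span offsets while a relaxed match only requires overlap, and the function names are hypothetical.

```python
def overlaps(a, b):
    """True if the half-open spans a = (start, end) and b share a character."""
    return a[0] < b[1] and b[0] < a[1]


def span_scores(gold, predicted, relaxed=False):
    """Return (precision, recall, f1) for two lists of (start, end) spans."""
    if relaxed:
        matched_pred = sum(any(overlaps(p, g) for g in gold) for p in predicted)
        matched_gold = sum(any(overlaps(g, p) for p in predicted) for g in gold)
    else:
        gold_set, pred_set = set(gold), set(predicted)
        matched_pred = sum(p in gold_set for p in predicted)
        matched_gold = sum(g in pred_set for g in gold)
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = [(0, 17), (19, 33)]
predicted = [(3, 17), (19, 33), (40, 45)]
print("strict :", span_scores(gold, predicted))
print("relaxed:", span_scores(gold, predicted, relaxed=True))
```

In this example the prediction (3, 17) counts only under the relaxed setting, because it overlaps the gold span (0, 17) without matching it exactly.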

Annotators

All annotators have the same interface: they implement a function called 'extract_' followed by the name of the particular extraction. E.g., if they are extracting actors, then they implement a function named 'extract_actors', which takes the language of the text and the text itself. Some extractions take the publication time as a third argument, as listed in the table below.

| Extractions | Interface | Supporting tools |
| --- | --- | --- |
| Actor | extract_actors(lang, text) | SPACY, SPARKNLP, NLTK |
| Timexs | extract_timexs(lang, text, publication_time) | PY_HEIDELTIME |
| ObjectalLink | extract_objectal_links(lang, text, publication_time) | ALLENNLP |
| Event | extract_events(lang, text, publication_time) | ALLENNLP, CUSTOMPT |
| SemanticLink | extract_semantic_role_link(lang, text, publication_time) | ALLENNLP |
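
Because every tool exposes functions named `extract_<something>`, a caller can select the tool by name and look the function up dynamically. The sketch below illustrates that idea only; it is not the package's META-annotator, and `TOOLS`, `TOYTOOL` and `extract` are hypothetical names.

```python
from types import SimpleNamespace


def _toy_extract_actors(lang, text):
    """Toy stand-in for a real tool: every capitalised word is an 'actor'."""
    return [word for word in text.split() if word[:1].isupper()]


# Hypothetical registry: tool name -> object exposing extract_* functions.
TOOLS = {"TOYTOOL": SimpleNamespace(extract_actors=_toy_extract_actors)}


def extract(kind, tool, *args):
    """Call extract_<kind> on the named tool, e.g. extract('actors', ...)."""
    function = getattr(TOOLS[tool], f"extract_{kind}", None)
    if function is None:
        raise ValueError(f"{tool} does not support extract_{kind}")
    return function(*args)


print(extract("actors", "TOYTOOL", "en", "Max went to Delhi"))
```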

To change a model used by one of the supported tools, go to text2story/annotators/ANNOTATOR_TO_BE_CHANGED and change the model in the file __init__.py.

To add a new tool, add a folder to text2story/annotators with the name of the annotator in all capitals (just a convention; useful to avoid name collisions). In that folder, create a file called '__init__.py' and implement there a function load() and the desired extraction functions. The function load() should load the pipeline into a variable defined by you, so that the pipeline does not need to be loaded again for every extraction. (Implement it even if your annotator doesn't load anything; in that case, leave it with an empty body.)

In the text2story/annotators/__init__.py file, add a call to the load() function and to the extract functions. (See the already implemented tools for guidance.)

After that, the new tool should be ready to use.

PS: Don't forget to normalize the labels to our semantic framework!
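
As an illustration of the steps above, here is a minimal sketch of what the '__init__.py' of a new tool could look like. Everything specific in it is an assumption: the regular expression stands in for a real model, the label names in `LABEL_MAP` are hypothetical, and the return shape of `extract_actors` is a guess, so check the already implemented tools for the structure the package expects.

```python
import re

_pipeline = None  # filled by load() and reused by every extraction

# Hypothetical mapping from the tool's own labels to the framework's labels.
LABEL_MAP = {"CAPITALIZED": "Actor"}


def load():
    """Build the pipeline once, so extractions do not reload it each time."""
    global _pipeline
    if _pipeline is None:
        # Stand-in for an expensive model load: sequences of capitalised words.
        _pipeline = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)*\b")


def extract_actors(lang, text):
    """Return (start, end, surface text, label) tuples (assumed return shape)."""
    if _pipeline is None:
        load()
    raw_label = "CAPITALIZED"  # label as the toy tool would report it
    label = LABEL_MAP.get(raw_label, "Other")  # normalisation step
    return [(m.start(), m.end(), m.group(), label) for m in _pipeline.finditer(text)]


load()
print(extract_actors("en", "Max Healthcare runs hospitals around Delhi."))
```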
