A Python package for extracting and exploring context-enriched word networks from corpora
Implicit Word Network
Introduction
This Python package extracts context-enriched implicit word networks as described by Spitz and Gertz. The theoretical background is explained in the following publications:
- Spitz, A. (2019). Implicit Entity Networks: A Versatile Document Model. Heidelberg University Library. https://doi.org/10.11588/HEIDOK.00026328
- Spitz, A., & Gertz, M. (2018). Exploring Entity-centric Networks in Entangled News Streams. In Companion of The Web Conference 2018 (WWW ’18). ACM Press. https://doi.org/10.1145/3184558.3188726
Dependencies
This project uses models from the spaCy and sentence_transformers packages. These packages are not installed automatically. Use the following commands to install them, along with the English spaCy model used in the example below.
pip install sentence_transformers
pip install spacy
python -m spacy download en_core_web_sm
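Because these dependencies are not pulled in automatically, it can help to confirm that they are importable before running the example. The following is a minimal sketch that uses only the standard library; the helper name `missing_packages` is hypothetical and not part of this package.

```python
import importlib.util

# Import names of the dependencies installed by the commands above.
# The spaCy model en_core_web_sm is installed as a regular Python package.
REQUIRED = ["spacy", "sentence_transformers", "en_core_web_sm"]


def missing_packages(names):
    """Hypothetical helper: return the names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing dependencies:", ", ".join(missing))
else:
    print("All dependencies found.")
```

The check only looks up whether each top-level package can be located; it does not import the packages or verify their versions.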
Example Usage
import spacy as sp
import implicit_word_network as wn
# Path to text file
path = "data.txt"
# Entities to search for in corpus
entity_types = ["PERSON", "LOC", "NORP", "ORG", "WORK_OF_ART"]
c = 2 # Cut-off parameter
# Importing data ...
D = wn.readDocuments(path)
# Parsing data ...
nlp = sp.load("en_core_web_sm")
D_parsed = wn.parseDocuments(D, entity_types, nlp=nlp)
# Converting parsing results ...
D_mat = wn.createCorpMat(D_parsed)
# Building graph ...
V, Ep = wn.buildGraph(D_mat, c)
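For a rough intuition of what a cut-off parameter such as `c` can control, the sketch below builds a toy co-occurrence network from pre-tokenized sentences. It is an illustration under our own assumptions, not the implementation behind `wn.buildGraph`: the function name `toy_implicit_network`, the input format, and the `exp(-distance)` edge weighting are all hypothetical simplifications chosen for the example. Two words are linked when they occur at most `c` sentences apart, and closer occurrences contribute more weight.

```python
import math
from collections import defaultdict
from itertools import combinations


def toy_implicit_network(sentences, c):
    """Hypothetical illustration: link words that occur at most c sentences apart.

    sentences: list of sentences, each a list of word strings.
    c: maximum sentence distance for two words to be connected.
    Returns (nodes, edges), where edges maps a sorted word pair to a weight.
    """
    edges = defaultdict(float)
    for i, left in enumerate(sentences):
        last = min(i + c, len(sentences) - 1)
        for j in range(i, last + 1):
            distance = j - i
            weight = math.exp(-distance)  # assumed weighting, for illustration only
            if distance == 0:
                # Pairs of distinct words inside the same sentence
                pairs = combinations(sorted(set(left)), 2)
            else:
                # Pairs spanning two different sentences
                pairs = ((a, b) for a in set(left) for b in set(sentences[j]) if a != b)
            for a, b in pairs:
                edges[tuple(sorted((a, b)))] += weight
    nodes = sorted({word for sentence in sentences for word in sentence})
    return nodes, dict(edges)


toy_sentences = [["alice", "bob"], ["carol"], ["dave"]]
nodes, edges = toy_implicit_network(toy_sentences, c=1)
print(nodes)
print(edges)
```

With `c=1`, "alice" and "dave" stay unconnected because they are two sentences apart; raising `c` to 2 adds that edge with a smaller weight. The actual structure of `V` and `Ep` returned by the package may differ from this toy representation.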
File details
Details for the file implicit-word-network-0.0.6.tar.gz.
File metadata
- Download URL: implicit-word-network-0.0.6.tar.gz
- Upload date:
- Size: 36.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.0.1 pkginfo/1.4.2 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.57.0 CPython/3.9.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 81a01cc1c188df881720b13b9b617164dbccff24afdf1393ea9fa6057cfa2ff2 |
| MD5 | fce160ba286f48b74222da3902569a2d |
| BLAKE2b-256 | 4dd1324d236ccee7639cc307320dc6eeff80ef213f8ed4a52fcbfda98bfd686d |