
A Python package for extracting and exploring context-enriched word networks from corpora


Implicit Word Network

Introduction

This Python package extracts context-enriched implicit word networks as described by Spitz and Gertz. The theoretical background is explained in the following publications:

  1. Spitz, A. (2019). Implicit Entity Networks: A Versatile Document Model. Heidelberg University Library. https://doi.org/10.11588/HEIDOK.00026328
  2. Spitz, A., & Gertz, M. (2018). Exploring Entity-centric Networks in Entangled News Streams. In Companion of The Web Conference 2018 (WWW ’18). ACM Press. https://doi.org/10.1145/3184558.3188726

Dependencies

This project uses models from the spaCy and sentence_transformers packages. Neither is installed automatically; use the following commands to install them and to download the spaCy model en_core_web_sm:

pip install sentence_transformers
pip install spacy
python -m spacy download en_core_web_sm
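
Because these dependencies are not installed automatically, it can help to confirm that they are importable before running the example. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of this package.

```python
import importlib.util

# Top-level import names of the two dependencies named above.
REQUIRED = ("spacy", "sentence_transformers")


def missing_packages(names=REQUIRED):
    """Return the names in `names` that cannot be found for import.

    Hypothetical helper for illustration; not part of implicit_word_network.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies found.")
```

This only checks that the packages can be located; it does not verify that the en_core_web_sm model has been downloaded.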

Example Usage

import spacy as sp
import implicit_word_network as wn

# Path to text file
path = "data.txt"

# Entities to search for in corpus
entity_types = ["PERSON", "LOC", "NORP", "ORG", "WORK_OF_ART"]

c = 2  # Cut-off parameter

# Importing data ...
D = wn.readDocuments(path)

# Parsing data ...
nlp = sp.load("en_core_web_sm")
D_parsed = wn.parseDocuments(D, entity_types, nlp=nlp)

# Converting parsing results ...
D_mat = wn.createCorpMat(D_parsed)

# Building graph ...
V, Ep = wn.buildGraph(D_mat, c)
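
To give some intuition for the cut-off parameter, here is a toy sketch of distance-weighted co-occurrence. It assumes, following the general idea in the publications cited above, that two words are linked when they occur at most c sentences apart and that closer occurrences contribute more weight (here exp(-d) for sentence distance d). It is not the package's implementation: the function `toy_cooccurrence`, its input format, and its output format are hypothetical and say nothing about what `buildGraph` returns.

```python
import math
from collections import defaultdict
from itertools import combinations


def toy_cooccurrence(sentences, c):
    """Weight word pairs by exp(-d), summed over occurrences with distance d <= c.

    `sentences` is a list of token lists; d is the distance in sentences.
    Illustrative only; not the algorithm used by implicit_word_network.
    """
    # Record the sentence indices in which each word occurs.
    positions = defaultdict(set)
    for i, sentence in enumerate(sentences):
        for word in sentence:
            positions[word].add(i)

    weights = defaultdict(float)
    for v, w in combinations(sorted(positions), 2):
        for i in positions[v]:
            for j in positions[w]:
                d = abs(i - j)
                if d <= c:  # occurrences farther apart than c are ignored
                    weights[(v, w)] += math.exp(-d)
    return dict(weights)


sentences = [["alice", "met", "bob"], ["bob", "left"], ["carol", "arrived"]]
edges = toy_cooccurrence(sentences, c=1)
```

With c=1, "alice" and "bob" are linked (same sentence and adjacent sentences), while "alice" and "carol" are two sentences apart and get no edge. A larger c admits more distant pairs at lower weight.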

