Explicit Semantic Analysis

ESA-Wiki

Explicit Semantic Analysis based on Wikipedia

This is a Python library containing code to 1) construct a semantic interpreter from Wikipedia data and 2) apply it to various kinds of text.

To construct an interpreter, first obtain a Wikipedia XML dump from http://dumps.wikimedia.org/enwiki/

  1. Then run python3 -m esa_wiki.xml_parse <file> with the downloaded file as its argument. This outputs some temporary files containing information on the words, links and articles encountered.

  2. Next, run python3 -m esa_wiki.generate_indices to generate index lists for the unique words and articles encountered.

  3. Finally, run python3 -m esa_wiki.matrix_builder to construct a very large sparse interpretation matrix. Each row corresponds to a unique word and each column to a 'concept', i.e. a Wikipedia article; the entry in row i, column j is the TF-IDF score of word i in article j. The matrix is saved in separate chunks to conserve memory.
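To make the shape of the interpretation matrix concrete, here is a minimal sketch of a sparse word-by-concept TF-IDF table built from a toy corpus. It is not the code in esa_wiki.matrix_builder: the function name build_tfidf, the whitespace tokenizer, the three toy articles and the particular TF-IDF variant (relative term frequency times log(N / document frequency)) are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def build_tfidf(articles):
    """Return a sparse matrix as {word: {article_title: tfidf_score}}.

    `articles` maps an article title (the 'concept') to its plain text.
    Hypothetical helper for illustration; not part of esa_wiki.
    """
    n_articles = len(articles)
    # Per-article word counts, using a naive lowercase whitespace tokenizer.
    term_counts = {
        title: Counter(text.lower().split()) for title, text in articles.items()
    }
    # Document frequency: the number of articles in which each word occurs.
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    matrix = defaultdict(dict)
    for title, counts in term_counts.items():
        total = sum(counts.values())
        for word, count in counts.items():
            tf = count / total
            idf = math.log(n_articles / doc_freq[word])
            score = tf * idf
            if score > 0:  # store only non-zero entries to keep the matrix sparse
                matrix[word][title] = score
    return dict(matrix)


# Toy corpus standing in for Wikipedia articles (made-up text).
articles = {
    "Cat": "cat purr cat whisker animal",
    "Dog": "dog bark dog animal",
    "Piano": "piano key music",
}
matrix = build_tfidf(articles)
```

Each outer key plays the role of a matrix row (a word) and each inner key the role of a column (a concept). Storing only the non-zero scores is what keeps the structure sparse; the real matrix is additionally split into chunks on disk.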

medium_wiki.xml can be used as an example file for demonstration/testing purposes, as it contains only the first 100 or so Wikipedia articles.
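The three steps above can also be scripted. The following is a small sketch that runs them in order using only the module invocations listed in the steps; the helper names build_commands and run_pipeline are hypothetical, and it assumes esa_wiki is installed in the current Python environment and that medium_wiki.xml (or another dump) is available at the path you pass in.

```python
import subprocess
import sys

# Module names taken from the three steps above, in the order they must run.
STEPS = ("xml_parse", "generate_indices", "matrix_builder")


def build_commands(dump_path):
    """Return the argv list for each pipeline step (hypothetical helper)."""
    commands = []
    for step in STEPS:
        cmd = [sys.executable, "-m", f"esa_wiki.{step}"]
        if step == "xml_parse":
            # Only the parsing step takes the dump file as an argument.
            cmd.append(str(dump_path))
        commands.append(cmd)
    return commands


def run_pipeline(dump_path):
    """Run each step in turn, stopping at the first one that fails."""
    for cmd in build_commands(dump_path):
        subprocess.run(cmd, check=True)


# Example call (requires esa_wiki to be installed):
# run_pipeline("medium_wiki.xml")
```

Using sys.executable rather than a hard-coded python3 keeps the steps in the same interpreter and virtual environment as the calling script.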

cunning_linguistics.py contains classes that perform text analysis and harvest tweets to analyze.
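The classes in cunning_linguistics.py are not documented here, so the sketch below does not use them. Instead it illustrates the general idea behind applying an ESA interpreter to text: a text is mapped to a concept vector by summing the matrix rows of its words, and two texts are compared by the cosine similarity of their vectors. The toy matrix, its weights, and the function names interpret and cosine are assumptions made for this example.

```python
import math
from collections import defaultdict

# Toy word -> {concept: weight} matrix with made-up weights (illustration only).
MATRIX = {
    "cat": {"Cat": 0.9, "Pet": 0.4},
    "kitten": {"Cat": 0.7, "Pet": 0.3},
    "dog": {"Dog": 0.9, "Pet": 0.5},
    "piano": {"Piano": 1.0, "Music": 0.6},
}


def interpret(text, matrix):
    """Map a text to a sparse concept vector by summing its words' rows."""
    vector = defaultdict(float)
    for word in text.lower().split():
        # Words missing from the matrix contribute nothing.
        for concept, weight in matrix.get(word, {}).items():
            vector[concept] += weight
    return dict(vector)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(concept, 0.0) for concept, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


cat_vec = interpret("cat", MATRIX)
sim_kitten = cosine(cat_vec, interpret("kitten", MATRIX))
sim_dog = cosine(cat_vec, interpret("dog", MATRIX))
sim_piano = cosine(cat_vec, interpret("piano", MATRIX))
```

With these toy weights, "cat" comes out closest to "kitten", weakly related to "dog" through the shared Pet concept, and unrelated to "piano".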

Download files

Source Distribution

esa_wiki-0.0.1.tar.gz (15.0 kB, Source)

Built Distribution

esa_wiki-0.0.1-py3-none-any.whl (17.3 kB, Python 3)

File details

Details for the file esa_wiki-0.0.1.tar.gz.

File metadata

  • Download URL: esa_wiki-0.0.1.tar.gz
  • Upload date:
  • Size: 15.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.5.0.1 requests/2.18.4 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.23.3 CPython/3.6.5

File hashes

Hashes for esa_wiki-0.0.1.tar.gz:

  • SHA256: 78a4bd39e7d340ea8dc76fad00248d6ab1e464649b09f88909e47ac99038c0d3
  • MD5: aa056bc065caa1de49f6a6d3d80d5790
  • BLAKE2b-256: a20efa497789f9a3c1a50fdba2ee60d7d765266ddaff51720d78594b7b6f9177

File details

Details for the file esa_wiki-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: esa_wiki-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 17.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.5.0.1 requests/2.18.4 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.23.3 CPython/3.6.5

File hashes

Hashes for esa_wiki-0.0.1-py3-none-any.whl:

  • SHA256: 3571a5fe21f8692761a2f1a4f3b663f91942e878f53bc6ea2078868c5888e5ee
  • MD5: a6ab72790cf81ec66ea90b2d8f83f277
  • BLAKE2b-256: 0ee09d54bc3e1ddadf03eb26cd730d61f2ac13201d1d629aca0952a624ab3d4b
