Explicit Semantic Analysis
Project description
ESA-Wiki
Explicit Semantic Analysis based on Wikipedia
This is a Python library containing code to (1) construct a semantic interpreter from Wikipedia data and (2) apply that interpreter to various kinds of text.
To construct an interpreter, first download a Wikipedia XML dump from http://dumps.wikimedia.org/enwiki/
Then run
python3 -m esa_wiki.xml_parse <file>
with the downloaded dump file as its argument. This writes temporary files containing information on the words, links and articles encountered.
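The internals and output format of esa_wiki.xml_parse are not described here. As a rough, hypothetical sketch of the kind of pass it performs, the code below streams a MediaWiki XML dump with the standard library's xml.etree.ElementTree.iterparse and collects, for each article, its title, word counts and link targets. The names iter_articles, LINK_RE and WORD_RE, the regex tokenizer and the two-page sample dump are illustrative assumptions; redirects, non-article namespaces and most wiki markup are not handled.

```python
import io
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Link target of [[Target]], [[Target|label]] or [[Target#Section]].
LINK_RE = re.compile(r"\[\[([^\]|#]+)[^\]]*\]\]")
# Deliberately crude tokenizer: runs of lowercase ASCII letters.
WORD_RE = re.compile(r"[a-z]+")


def local_name(tag):
    """Drop an XML namespace prefix such as '{http://www.mediawiki.org/xml/export-0.10/}'."""
    return tag.rsplit("}", 1)[-1]


def iter_articles(source):
    """Yield (title, word_counts, link_targets) for every <page> in a dump.

    `source` is a filename or a binary file object. The file is streamed with
    iterparse and each finished page is cleared, so the dump's full text is
    not held in memory.
    """
    title, text = "", ""
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            words = Counter(WORD_RE.findall(text.lower()))
            links = [target.strip() for target in LINK_RE.findall(text)]
            yield title, words, links
            elem.clear()  # release the finished page's children
            title, text = "", ""


# A two-article dump in MediaWiki export format, for demonstration only.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Cat</title>
    <revision><text>The cat is a [[mammal]]. A cat purrs. See [[Dog|dogs]].</text></revision>
  </page>
  <page>
    <title>Dog</title>
    <revision><text>The dog is a [[mammal]] too.</text></revision>
  </page>
</mediawiki>"""

for title, words, links in iter_articles(io.BytesIO(SAMPLE)):
    print(title, words["cat"], links)
# Cat 2 ['mammal', 'Dog']
# Dog 0 ['mammal']
```

Passing a filename instead of io.BytesIO(...) streams a real dump from disk in the same way.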
Next, run
python3 -m esa_wiki.generate_indices
to generate lists of indices for the unique words and articles encountered.
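A minimal sketch of index generation, assuming an index simply maps each distinct word or article title to a consecutive integer (its row or column in the interpretation matrix). build_index, save_index, load_index and the one-item-per-line file format are hypothetical and are not the actual interface of esa_wiki.generate_indices.

```python
import os
import tempfile


def build_index(items):
    """Map each distinct item to an integer, in sorted order for reproducibility."""
    return {item: i for i, item in enumerate(sorted(set(items)))}


def save_index(index, path):
    """Write one item per line, so an item's line number equals its index."""
    with open(path, "w", encoding="utf-8") as f:
        for item in sorted(index, key=index.__getitem__):
            f.write(item + "\n")


def load_index(path):
    """Inverse of save_index: rebuild the item -> index mapping."""
    with open(path, encoding="utf-8") as f:
        return {line.rstrip("\n"): i for i, line in enumerate(f)}


# Toy per-article word counts, e.g. as collected by a parsing pass.
word_counts = {
    "Cat": {"the": 1, "cat": 2, "purrs": 1},
    "Dog": {"the": 1, "dog": 1, "barks": 1},
}
word_index = build_index(w for counts in word_counts.values() for w in counts)
article_index = build_index(word_counts)
print(word_index)     # {'barks': 0, 'cat': 1, 'dog': 2, 'purrs': 3, 'the': 4}
print(article_index)  # {'Cat': 0, 'Dog': 1}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "words.txt")
    save_index(word_index, path)
    print(load_index(path) == word_index)  # True
```

Sorting makes the indices independent of the order in which articles were parsed; a first-seen numbering would work equally well if reproducibility across runs does not matter.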
Finally, run
python3 -m esa_wiki.matrix_builder
to construct a very large sparse interpretation matrix. Each row corresponds to a unique word and each column to a 'concept', i.e. a Wikipedia article; entry (i, j) is the TF-IDF score of word i in article j. The matrix is saved in separate chunks to conserve memory.
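The TF-IDF weighting can be sketched with the standard library alone. This assumes the common variant tfidf(i, j) = tf(i, j) * log(N / df(i)), where tf(i, j) is the raw count of word i in article j, N is the number of articles and df(i) is the number of articles containing word i; the exact weighting and chunk format used by esa_wiki.matrix_builder may differ. The sparse matrix is a dict of rows, {row: {column: score}}, and save_in_chunks is a hypothetical helper that pickles consecutive row ranges to separate files.

```python
import math
import os
import pickle
import tempfile


def tfidf_matrix(word_counts, word_index, article_index):
    """Sparse word-by-article matrix {row: {col: tf * log(N / df)}}."""
    n_articles = len(word_counts)
    df = {}  # document frequency: number of articles containing each word
    for counts in word_counts.values():
        for word in counts:
            df[word] = df.get(word, 0) + 1
    matrix = {}
    for title, counts in word_counts.items():
        col = article_index[title]
        for word, tf in counts.items():
            score = tf * math.log(n_articles / df[word])
            if score > 0:  # a word found in every article scores 0; leave it out
                matrix.setdefault(word_index[word], {})[col] = score
    return matrix


def save_in_chunks(matrix, n_rows, rows_per_chunk, directory):
    """Pickle consecutive row ranges to separate files; return the file paths."""
    paths = []
    for start in range(0, n_rows, rows_per_chunk):
        stop = min(start + rows_per_chunk, n_rows)
        chunk = {i: matrix[i] for i in range(start, stop) if i in matrix}
        path = os.path.join(directory, f"matrix_rows_{start}_{stop}.pkl")
        with open(path, "wb") as f:
            pickle.dump(chunk, f)
        paths.append(path)
    return paths


# Toy inputs: per-article word counts plus word and article index maps.
word_counts = {
    "Cat": {"the": 1, "cat": 2, "purrs": 1},
    "Dog": {"the": 1, "dog": 1, "barks": 1},
}
word_index = {"barks": 0, "cat": 1, "dog": 2, "purrs": 3, "the": 4}
article_index = {"Cat": 0, "Dog": 1}

matrix = tfidf_matrix(word_counts, word_index, article_index)
print(matrix[word_index["cat"]])  # {0: 1.3862943611198906}, i.e. 2 * log(2)
with tempfile.TemporaryDirectory() as tmp:
    print(len(save_in_chunks(matrix, len(word_index), 2, tmp)))  # 3
```

For simplicity this builds the whole matrix before splitting it; to actually bound memory, a builder would compute and write one row range at a time. At Wikipedia scale, scipy.sparse would be the more usual container than nested dicts.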
The file medium_wiki.xml, which contains only the first 100 or so Wikipedia articles, can be used as an example input for demonstration and testing.
The module cunning_linguistics.py contains classes for performing text analysis and for harvesting tweets to analyse.
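The classes in cunning_linguistics.py are not documented here, so the following is only a sketch of the standard ESA way of applying an interpreter: a text is mapped to a weighted sum of the matrix rows of its words, giving a vector over Wikipedia concepts, and two texts are compared by the cosine of their concept vectors. interpret and cosine are hypothetical names, the toy matrix is made up, and weighting each word by its raw count in the text is a simplification (the original ESA formulation also uses TF-IDF weights for the input text).

```python
import math
import re
from collections import Counter


def interpret(text, word_index, matrix):
    """Map a text to a sparse concept vector {article_col: weight}.

    Each known word contributes its row of the word-by-article matrix,
    scaled by how often the word occurs in the text; unknown words are skipped.
    """
    vector = {}
    for word, count in Counter(re.findall(r"[a-z]+", text.lower())).items():
        row = word_index.get(word)
        if row is None:
            continue
        for col, score in matrix.get(row, {}).items():
            vector[col] = vector.get(col, 0.0) + count * score
    return vector


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy matrix: columns 0 and 1 stand for the articles "Cat" and "Dog".
word_index = {"barks": 0, "cat": 1, "dog": 2, "purrs": 3, "the": 4}
matrix = {0: {1: 0.5}, 1: {0: 1.0}, 2: {1: 0.5}, 3: {0: 0.5}}

a = interpret("The cat purrs", word_index, matrix)  # {0: 1.5}
b = interpret("My cat", word_index, matrix)         # {0: 1.0}
c = interpret("A dog barks", word_index, matrix)    # {1: 1.0}
print(cosine(a, b), cosine(a, c))  # 1.0 0.0
```

The two cat sentences share no words except "cat" yet map to the same concept, while the dog sentence lands on a different one; with a real Wikipedia matrix, texts that use different but related vocabulary can still score as similar in this way.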
Download files
- Source distribution: esa_wiki-0.0.1.tar.gz
- Built distribution: esa_wiki-0.0.1-py3-none-any.whl
File details
Details for the file esa_wiki-0.0.1.tar.gz
File metadata
- Download URL: esa_wiki-0.0.1.tar.gz
- Upload date:
- Size: 15.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.5.0.1 requests/2.18.4 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.23.3 CPython/3.6.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 78a4bd39e7d340ea8dc76fad00248d6ab1e464649b09f88909e47ac99038c0d3
MD5 | aa056bc065caa1de49f6a6d3d80d5790
BLAKE2b-256 | a20efa497789f9a3c1a50fdba2ee60d7d765266ddaff51720d78594b7b6f9177
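To check a download against the published digests, Python's hashlib can compute all three in one pass over the file. This is a sketch: file_digests and EXPECTED_SHA256 are illustrative names, and BLAKE2b-256 is computed as BLAKE2b with a 32-byte (256-bit) digest.

```python
import hashlib


def file_digests(path, block_size=1 << 16):
    """Return the SHA256, MD5 and BLAKE2b-256 hex digests of a file.

    The file is read in blocks, so large downloads need not fit in memory.
    """
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            for hasher in hashers.values():
                hasher.update(block)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


# SHA256 published for esa_wiki-0.0.1.tar.gz.
EXPECTED_SHA256 = "78a4bd39e7d340ea8dc76fad00248d6ab1e464649b09f88909e47ac99038c0d3"
```

With the source distribution in the current directory, `file_digests("esa_wiki-0.0.1.tar.gz")["SHA256"] == EXPECTED_SHA256` should be True; a mismatch means the file is corrupt or is not the published release.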
File details
Details for the file esa_wiki-0.0.1-py3-none-any.whl
File metadata
- Download URL: esa_wiki-0.0.1-py3-none-any.whl
- Upload date:
- Size: 17.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.5.0.1 requests/2.18.4 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.23.3 CPython/3.6.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3571a5fe21f8692761a2f1a4f3b663f91942e878f53bc6ea2078868c5888e5ee
MD5 | a6ab72790cf81ec66ea90b2d8f83f277
BLAKE2b-256 | 0ee09d54bc3e1ddadf03eb26cd730d61f2ac13201d1d629aca0952a624ab3d4b