Explicit Semantic Analysis

Project description

ESA-Wiki

Explicit Semantic Analysis based on Wikipedia

This is a Python library containing code to (1) construct a semantic interpreter from Wikipedia data and (2) apply that interpreter to various kinds of text.

To construct an interpreter, first obtain a Wikipedia XML dump from http://dumps.wikimedia.org/enwiki/

Then run python3 -m esa_wiki.xml_parse <file>, passing the downloaded file as the argument. This writes temporary files containing information on the words, links and articles encountered.
Next, run python3 -m esa_wiki.generate_indices to generate lists of indices corresponding to the unique words and articles encountered.
Finally, run python3 -m esa_wiki.matrix_builder to construct a very large sparse interpretation matrix. Each row corresponds to a unique word and each column to a 'concept', i.e. a Wikipedia article; entry (i, j) is the TF-IDF score for word i in article j. The matrix is saved in separate chunks to conserve memory.
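To make the shape of the interpretation matrix concrete, here is a minimal sketch in plain Python. It is not the code in esa_wiki.matrix_builder: the function name build_interpretation_matrix, the whitespace tokenisation and the particular TF-IDF variant (relative term frequency times log(N / document frequency)) are all assumptions chosen for illustration, and the matrix is held as nested dictionaries rather than as chunked sparse arrays.

```python
import math
from collections import Counter, defaultdict


def build_interpretation_matrix(articles):
    """Return {word: {article_title: tf-idf score}} for a {title: text} dict.

    Assumed weighting (not taken from esa_wiki): tf = count / article length,
    idf = log(number of articles / number of articles containing the word).
    """
    # Term counts per article, using naive lowercase whitespace tokenisation.
    term_counts = {
        title: Counter(text.lower().split()) for title, text in articles.items()
    }

    # Document frequency: in how many articles does each word occur?
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    n_articles = len(articles)
    matrix = defaultdict(dict)
    for title, counts in term_counts.items():
        total = sum(counts.values())
        for word, count in counts.items():
            score = (count / total) * math.log(n_articles / doc_freq[word])
            if score > 0:  # keep the structure sparse: zero entries are omitted
                matrix[word][title] = score
    return dict(matrix)


# Hypothetical three-article corpus, standing in for Wikipedia.
articles = {
    "Cat": "cat purrs cat",
    "Dog": "dog barks",
    "Pet": "cat dog pet",
}
matrix = build_interpretation_matrix(articles)
print(sorted(matrix["cat"]))  # concepts with a non-zero score for "cat"
```

Each outer key plays the role of a matrix row (a word) and each inner key the role of a column (a concept). A word that occurs in every article gets an IDF of zero under this variant and therefore no entries at all.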

medium_wiki.xml can be used as an example file for demonstration/testing purposes, as it contains only the first 100 or so Wikipedia articles.
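The three stages can be chained from a small driver script, sketched below. The module names are the ones given above; everything else is an assumption: that esa_wiki is installed for the interpreter in use, that medium_wiki.xml is in the current directory, and that the second and third stages take no arguments and pick up the temporary files left by the first. The helper names pipeline_commands and run_pipeline are hypothetical.

```python
import subprocess
import sys


def pipeline_commands(dump_path, python=sys.executable):
    """Return the three pipeline commands, in order, as argument lists."""
    return [
        [python, "-m", "esa_wiki.xml_parse", dump_path],
        [python, "-m", "esa_wiki.generate_indices"],
        [python, "-m", "esa_wiki.matrix_builder"],
    ]


def run_pipeline(dump_path):
    """Run each stage in turn; check=True stops at the first failing stage."""
    for command in pipeline_commands(dump_path):
        subprocess.run(command, check=True)


# Uncomment to run the demonstration file through all three stages:
# run_pipeline("medium_wiki.xml")
```

Using sys.executable rather than a hard-coded python3 keeps the stages in the same environment as the driver, which matters when the package is installed in a virtual environment.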

cunning_linguistics.py contains classes for performing text analysis and for harvesting tweets to analyse.
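The classes in cunning_linguistics.py are not documented here, so the following is not their API. It is a generic sketch of how Explicit Semantic Analysis is usually applied once a word-by-concept matrix exists: a text is mapped to a concept vector by summing the rows of its words, and two texts are compared by the cosine of their concept vectors. The names interpret, cosine_similarity and toy_matrix, and the toy weights, are hypothetical.

```python
import math
from collections import defaultdict


def interpret(text, matrix):
    """Map a text to a concept vector by summing the matrix rows of its words."""
    concept_vector = defaultdict(float)
    for word in text.lower().split():
        # Words missing from the matrix contribute nothing.
        for concept, weight in matrix.get(word, {}).items():
            concept_vector[concept] += weight
    return dict(concept_vector)


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(concept, 0.0) for concept, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as unrelated to everything
    return dot / (norm_u * norm_v)


# Toy word-by-concept weights standing in for the real interpretation matrix.
toy_matrix = {
    "cat": {"Cat": 0.8, "Pet": 0.3},
    "dog": {"Dog": 0.9, "Pet": 0.4},
    "kitten": {"Cat": 0.7},
}

cat_text = interpret("cat kitten", toy_matrix)
dog_text = interpret("dog", toy_matrix)
print(cosine_similarity(cat_text, dog_text))
```

The two texts share no words, yet their similarity is non-zero because both load on the "Pet" concept. That overlap in concept space is what ESA adds over direct word matching.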

Project details

Release history

0.0.1 (this version), Mar 4, 2019

Download files

Source Distribution

esa_wiki-0.0.1.tar.gz (15.0 kB)

Uploaded Mar 4, 2019 Source

Built Distribution

esa_wiki-0.0.1-py3-none-any.whl (17.3 kB)

Uploaded Mar 4, 2019 Python 3

File details

Details for the file esa_wiki-0.0.1.tar.gz.

File metadata

Download URL: esa_wiki-0.0.1.tar.gz
Upload date: Mar 4, 2019
Size: 15.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.12.1 pkginfo/1.5.0.1 requests/2.18.4 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.23.3 CPython/3.6.5

File hashes

Hashes for esa_wiki-0.0.1.tar.gz
Algorithm	Hash digest
SHA256	`78a4bd39e7d340ea8dc76fad00248d6ab1e464649b09f88909e47ac99038c0d3`
MD5	`aa056bc065caa1de49f6a6d3d80d5790`
BLAKE2b-256	`a20efa497789f9a3c1a50fdba2ee60d7d765266ddaff51720d78594b7b6f9177`
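A downloaded archive can be checked against the SHA256 digest listed above with the standard library. This is a sketch under assumptions: the helper names sha256_of_file and matches_published_hash are hypothetical, and the file is assumed to have been saved under its original name in the current directory.

```python
import hashlib

# SHA256 digest published for esa_wiki-0.0.1.tar.gz.
EXPECTED_SHA256 = "78a4bd39e7d340ea8dc76fad00248d6ab1e464649b09f88909e47ac99038c0d3"


def sha256_of_file(path, chunk_size=1 << 16):
    """Hash a file in fixed-size chunks so it never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected.lower()


# Example call, assuming the archive is in the current directory:
# print(matches_published_hash("esa_wiki-0.0.1.tar.gz"))
```

pip can perform the same check at install time when a requirements file pins the digest with --hash and hash-checking mode is active.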

File details

Details for the file esa_wiki-0.0.1-py3-none-any.whl.

File metadata

Download URL: esa_wiki-0.0.1-py3-none-any.whl
Upload date: Mar 4, 2019
Size: 17.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.12.1 pkginfo/1.5.0.1 requests/2.18.4 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.23.3 CPython/3.6.5

File hashes

Hashes for esa_wiki-0.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3571a5fe21f8692761a2f1a4f3b663f91942e878f53bc6ea2078868c5888e5ee`
MD5	`a6ab72790cf81ec66ea90b2d8f83f277`
BLAKE2b-256	`0ee09d54bc3e1ddadf03eb26cd730d61f2ac13201d1d629aca0952a624ab3d4b`
