Thesaurus library implemented using a SOM (self-organizing map)
Thesaurus Visualization
Currently supported languages:
- English (eng)
- Russian (rus)
How to run
Minimal usage
Install the library:
pip install thesaurus-lib
Create an object and specify the language:
obj = Thesaurus(lang='eng')
Show output:
obj.show_map()
Run with your own foregrounds
After you install the library and create the object, do the following:
- Pass your foregrounds to the library:
text1 = obj.read_pickle('2017')
text2 = obj.read_txt('shakespeare.txt')
text3 = obj.read_text('My foreground in string format')
- Preprocess your foreground:
texts = dict()
foreground_name = 'Physics articles 2017'
texts[foreground_name] = obj.custom_preprocessing_of_data(text1)
- Process foregrounds:
foreground_names = list(texts.keys())  # assumed: the names used as keys in texts
processed_foregrounds = obj.process_foreground(foreground_names, texts)
- Show output:
obj.show_map()
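With several foregrounds, the preprocessing step above can be wrapped in a small helper. The following is a minimal sketch, not part of the library: build_texts is a hypothetical helper, and StubThesaurus is a stand-in used only so the example runs without the library installed. Its lowercase-and-split preprocessing is an assumption and does not reflect what custom_preprocessing_of_data actually returns.

```python
def build_texts(obj, raw_foregrounds):
    """Preprocess each raw foreground and key the result by its name.

    `obj` is any object exposing custom_preprocessing_of_data(), as the
    Thesaurus object does in the steps above. (Hypothetical helper.)
    """
    texts = {}
    for name, raw in raw_foregrounds.items():
        texts[name] = obj.custom_preprocessing_of_data(raw)
    return texts


class StubThesaurus:
    """Stand-in for the real object; the preprocessing here is an assumption."""

    def custom_preprocessing_of_data(self, raw):
        return raw.lower().split()


raw_foregrounds = {
    "Physics articles 2017": "Quantum fields and particles",
    "Shakespeare": "To be or not to be",
}

texts = build_texts(StubThesaurus(), raw_foregrounds)
foreground_names = list(texts)  # dict keys, in insertion order
print(foreground_names)
```

With the real library, obj would replace StubThesaurus(), and texts and foreground_names would then be passed to obj.process_foreground as shown above.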
Use your own configurations
After installing the library, create a file called 'config.cfg' in your working directory and fill in the values with your own files:
[paths]
som_path =
index_path =
back_tokens_path =
back_embeds_path =
stopwords_path =
foregrounds_path =
[lang]
som_url =
embeds_url =
som_file =
index_file =
back_tokens =
back_embeds =
embeddings_file =
STOPWORDS_FILE =
model =
Note: Do not leave any field empty in config.cfg. For example, if you are not providing a som_file, delete that key from config.cfg instead of leaving it blank like this:
# fill it or delete it
som_file =
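One way to catch blank fields before running the library is to scan the file with Python's standard configparser module. This is a minimal sketch, not something the library provides: find_empty_fields is a hypothetical helper, and the sample values (such as /data/som.pkl) are made up for illustration.

```python
import configparser


def find_empty_fields(config_text):
    """Return (section, key) pairs whose value is blank."""
    # interpolation=None so values containing '%' are read literally
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    empty = []
    for section in parser.sections():
        for key, value in parser.items(section):
            if not value.strip():
                empty.append((section, key))
    return empty


# Hypothetical config content with two blank fields
sample = """
[paths]
som_path = /data/som.pkl
index_path =
[lang]
som_file =
model = my_model
"""

print(find_empty_fields(sample))
# → [('paths', 'index_path'), ('lang', 'som_file')]
```

To check a real file, read its contents first, for example with open('config.cfg').read(), and pass the string to the helper. Note that configparser lowercases key names by default, so STOPWORDS_FILE would be reported as stopwords_file.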