Thesaurus library implemented using a SOM (self-organizing map)
Thesaurus Visualization
Currently supported languages (with the code to pass as lang):
- English: eng
- Russian: rus
How to run
Minimalistic way
Install the library:
pip install thesaurus-lib
Create an object and specify the language:
obj = Thesaurus(lang='eng')
Show output:
obj.show_map()
Run with your own foregrounds
After you install the library and create the object, do the following:
- Pass your foregrounds to the library:
text1 = obj.read_pickle('2017')
text2 = obj.read_txt('shakespeare.txt')
text3 = obj.read_text('My foreground in string format')
- Preprocess your foreground:
texts = dict()
foreground_name = 'Physics articles 2017'
texts[foreground_name] = obj.custom_preprocessing_of_data(text1)
- Process foregrounds:
foreground_names = [foreground_name]  # assumed: a list of the keys used in texts
processed_foregrounds = obj.process_foreground(foreground_names, texts)
- Show output:
obj.show_map()
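The steps above can be collected into one helper. This is only a minimal sketch and not part of the library: the helper name build_foregrounds and its sources argument are hypothetical, and it assumes that process_foreground accepts a list of foreground names together with the dict of preprocessed texts, which the snippet above suggests but does not confirm. obj is a Thesaurus object created as shown earlier.

```python
def build_foregrounds(obj, sources):
    """Read, preprocess and process several plain-text foregrounds.

    obj     -- a Thesaurus object created beforehand
    sources -- hypothetical dict mapping a foreground name to a .txt path
    """
    texts = {}
    for name, path in sources.items():
        raw = obj.read_txt(path)  # load the raw text
        texts[name] = obj.custom_preprocessing_of_data(raw)
    # Assumption: process_foreground takes the list of names plus the dict.
    foreground_names = list(texts)
    return obj.process_foreground(foreground_names, texts)

# Possible usage (file name taken from the example above):
# processed = build_foregrounds(obj, {'Shakespeare': 'shakespeare.txt'})
# obj.show_map()
```

Keeping the names and the texts dict in one place avoids passing a name to process_foreground that has no matching entry in texts.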
Use your own configurations
After installing the library, create a file called 'config.cfg' in your working directory and set the values to your own files:
[paths]
som_path =
index_path =
back_tokens_path =
back_embeds_path =
stopwords_path =
foregrounds_path =
[lang]
som_url =
embeds_url =
som_file =
index_file =
back_tokens =
back_embeds =
embeddings_file =
STOPWORDS_FILE =
model =
Note: Don't leave any field empty in config.cfg. For example, if you aren't providing a som_file, delete that line from config.cfg instead of leaving it like this:
# fill it or delete it
som_file =
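A quick way to catch empty fields before running the library is to scan the file yourself. The sketch below is an illustration only: it assumes config.cfg is an INI-style file that Python's standard configparser can read, and the function name find_empty_fields is hypothetical. How the library itself parses the file is not stated here.

```python
import configparser

def find_empty_fields(path="config.cfg"):
    """Return (section, key) pairs whose value is empty or whitespace."""
    # interpolation=None keeps '%' characters in URLs from raising errors
    parser = configparser.ConfigParser(interpolation=None)
    # read() silently skips missing files, so check what was actually read
    if not parser.read(path):
        raise FileNotFoundError(path)
    return [
        (section, key)  # note: configparser lowercases key names
        for section in parser.sections()
        for key, value in parser.items(section)
        if not value.strip()
    ]

# Possible usage:
# for section, key in find_empty_fields():
#     print(f"[{section}] {key} is empty: fill it or delete it")
```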
Download files
Source Distribution
thesaurus_lib-0.1.8.tar.gz (2.0 MB)
Built Distribution
Hashes for thesaurus_lib-0.1.8-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 629893e5a1089202275f4d2791296b435b416558c5b604ad5f94876e22a871ec
MD5 | 38e29ac9a2b9d5ee81b845a5187f09e1
BLAKE2b-256 | ddfdc774cc69375b4fe14b7004eaeddd31fd90272b7e29ba641d06c4c5cc9400
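If you download the wheel manually, you can compare it against the published SHA256 digest. The following is a minimal sketch using only Python's standard hashlib; the function name sha256_of is hypothetical and the local file path is an assumption about where you saved the download.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns empty bytes
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Published SHA256 digest for thesaurus_lib-0.1.8-py3-none-any.whl
EXPECTED = "629893e5a1089202275f4d2791296b435b416558c5b604ad5f94876e22a871ec"

# Possible usage, assuming the wheel is in the current directory:
# if sha256_of("thesaurus_lib-0.1.8-py3-none-any.whl") != EXPECTED:
#     raise SystemExit("hash mismatch: do not install this file")
```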