A module to create a network of words based on their relative senses within a corpus of documents.
# WordNet
[![Build Status](https://travis-ci.org/anuragkumarak95/wordnet.svg?branch=master)](https://travis-ci.org/anuragkumarak95/wordnet)
[![codecov](https://codecov.io/gh/anuragkumarak95/wordnet/branch/master/graph/badge.svg)](https://codecov.io/gh/anuragkumarak95/wordnet)
[![Requirements Status](https://requires.io/github/anuragkumarak95/wordnet/requirements.svg?branch=master)](https://requires.io/github/anuragkumarak95/wordnet/requirements/?branch=master)
Create a Simple **network of words** related to each other using **Twitter Streaming API**.
![Made with python-3.5](http://forthebadge.com/images/badges/made-with-python.svg)
Major parts of this project:
* `Streamer` : ~/twitter_streaming.py
* `TF-IDF` generator : ~/wordnet/tf_idf_generator.py
* `NN` words generator : ~/wordnet/nn_words.py
* `NETWORK` generator : ~/wordnet/word_net.py
## Using Streamer Functionality
1. `Clone this repo` and run '`$ pip install -r requirements.txt`' from the root directory, and you will be ready to go.
1. In the root directory (~), create a `config.py` file with the details below:
```python
# Variables that contain the user credentials to access the Twitter Streaming API
# this tutorial may help: http://socialmedia-class.org/twittertutorial.html
access_token = "xxx-xx-xxxx"
access_token_secret = "xxxxx"
consumer_key = "xxxxxx"
consumer_secret = "xxxxxxxx"
```
1. Run the `Streamer` with the filter words you want to fetch tweets for, e.g. `$ python twitter_streaming.py hello hi hallo namaste > data_file.txt`. This saves words from the filtered tweets, line by line, to `data_file.txt`.
## Using WordNet Module
1. `Clone this repo` and install the wordnet module with:
```shell
python setup.py install
```
1. To create a `TF-IDF` structure file for every doc, use:
```python
from wordnet import find_tf_idf
df, tf_idf = find_tf_idf(
    file_names=['file/path1','file/path2',..], # paths of files to be processed (create using twitter_streaming.py)
    prev_file_path='prev/tf/idf/file/path.tfidfpkl', # previous TF-IDF file to build upon; format standard is .tfidfpkl. default=None
    dump_path='path/to/dump/file.tfidfpkl' # path to dump the tf-idf at, if needed; format standard is .tfidfpkl. default=None
)
'''
If nothing is passed via the prev_file_path parameter, a new TF-IDF file is generated;
otherwise the TF-IDF values are combined with those of the previous file. The result is
dumped at dump_path if one is given; either way the function returns the new tf-idf list
of dictionaries and the df dictionary.
'''
```
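For intuition, the structures above can be pictured with a tiny hand computation. This is an illustrative sketch using the standard tf-idf formula; the module's exact weighting scheme may differ, and the docs above only state that `df` is a dictionary and `tf_idf` is a list of dictionaries:

```python
import math

# Illustrative only: plain tf-idf computed by hand for two tiny "documents",
# to show the shapes find_tf_idf() is documented to return: a df dictionary
# and a tf-idf list of dictionaries (one per document).
docs = [['hello', 'world', 'hello'], ['hello', 'there']]

df = {}  # document frequency: how many docs each word appears in
for doc in docs:
    for word in set(doc):
        df[word] = df.get(word, 0) + 1

tf_idf = []  # one dict of word -> tf*idf score per document
for doc in docs:
    scores = {}
    for word in set(doc):
        tf = doc.count(word) / len(doc)        # term frequency in this doc
        idf = math.log(len(docs) / df[word])   # inverse document frequency
        scores[word] = tf * idf
    tf_idf.append(scores)
```

Words that appear in every document (like `hello` here) get an idf of zero, so only discriminative words carry weight in the network.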
1. To use the `NN` word generator of this module, simply use `wordnet.find_knn`:
```python
from wordnet import find_knn
words = find_knn(
    tf_idf=tf_idf,        # the tf_idf returned by find_tf_idf() above.
    input_word='german',  # the word whose k nearest neighbours are required.
    k=10,                 # number of neighbours required. default=10
    rand_on=True          # whether to randomly skip a few words, or just show the first k words. default=True
)
'''
This function returns a list of words closely related to the given input_word, referring to
the tf_idf variable passed in. Either use find_tf_idf() to obtain this variable, or
pickle.load() a dump file written by that function to your chosen directory; the file
contains 2 lists in the format (idf, tf_idf).
'''
```
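Since the docstring above says a `.tfidfpkl` dump holds the pair `(idf, tf_idf)`, loading one back should be a plain `pickle.load()`. A round-trip sketch with hypothetical stand-in data (not the module's real output, and not verified against its dump code):

```python
import pickle

# Hypothetical stand-in data; real values come from find_tf_idf().
idf = {'german': 1.2, 'hello': 0.3}
tf_idf = [{'german': 0.6}, {'hello': 0.1}]

# Dump and reload the pair, mirroring the (idf, tf_idf) format the
# docstring describes for .tfidfpkl files.
with open('sample.tfidfpkl', 'wb') as f:
    pickle.dump((idf, tf_idf), f)

with open('sample.tfidfpkl', 'rb') as f:
    idf_loaded, tf_idf_loaded = pickle.load(f)
```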
1. To create a Word `Network`, use:
```python
from wordnet import generate_net
word_net = generate_net(
    df=df,                         # the df returned by find_tf_idf() above.
    tf_idf=tf_idf,                 # the tf_idf returned by find_tf_idf() above.
    dump_path='path/to/dump.wrnt'  # path to dump the generated network; format standard is .wrnt. default=None
)
'''
This function returns a dict of Word entities, with the word as key.
'''
```
1. To retrieve a Word `Network`, use:
```python
from wordnet import retrieve_net
word_net = retrieve_net(
    'path/to/network.wrnt'  # path to the network file; format standard is .wrnt.
)
'''
This function returns a dictionary of Word entities, with the word as key.
'''
```
1. To retrieve the list of words that are at some depth from a root word in the network, use:
```python
from wordnet import return_net
words = return_net(
    word,      # root word for this traversal.
    word_net,  # the word network generated by generate_net().
    depth=1    # depth to which the word collector should traverse.
)
'''
This function returns a list of words that are at the given depth from the root word in the
provided network.
'''
```
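The depth traversal can be pictured as a breadth-first walk. A minimal stand-in sketch, assuming the network reduces to plain neighbour lists (the real `word_net` maps words to Word entities, so this is illustrative only):

```python
# Illustrative breadth-first collection of the words at a given depth from
# a root word, over a plain dict of neighbour lists (an assumption for the
# sketch; the real network holds Word entities).
def words_at_depth(root, net, depth=1):
    frontier, seen = {root}, {root}
    for _ in range(depth):
        nxt = set()
        for word in frontier:
            for neighbour in net.get(word, []):
                if neighbour not in seen:
                    seen.add(neighbour)
                    nxt.add(neighbour)
        frontier = nxt
    return sorted(frontier)

net = {'german': ['beer', 'berlin'], 'berlin': ['wall'], 'beer': []}
```

With this toy network, depth 1 from `'german'` yields `['beer', 'berlin']` and depth 2 yields `['wall']`.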
### Test Run
To run a formal test, simply run `python test.py`; the script exits with **0** if everything worked as expected.
`test.py` uses the sample data provided [here](https://github.com/anuragkumarak95/wordnet/tree/master/test) and runs unit tests on `find_tf_idf()`, `find_knn()` & `generate_net()`.
> The `Streamer` functionality is not included in the distribution of this code; it is just a standalone script, independent from the module.
#### Contributions are welcome here
![BUILT WITH LOVE](http://forthebadge.com/images/badges/built-with-love.svg)
by [@Anurag](https://github.com/anuragkumarak95)
### Source Distribution

`wordnet-0.0.1b2.tar.gz` (8.8 kB)

#### File metadata

- Download URL: wordnet-0.0.1b2.tar.gz
- Size: 8.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No

#### File hashes

Algorithm | Hash digest
---|---
SHA256 | 0ec876cbab8fc997eaebe1056e6ccef5678811fe5e75734156b27482e761dcc9
MD5 | 29d275f6fdd5b2e7b3a101cd8e474eb9
BLAKE2b-256 | e5c993f89fc3613db301ff92be67aa67a5f9e4b5e212081ce3569e84a9e57304