This is a fake-content detector trained on several languages, mainly on COVID-19 data.
Project description
This library detects fake content on the internet. It supports seven languages: English, four European languages (Italian, German, French and Spanish) and two Indian languages (Hindi and Bengali).
Installing the library (first step)
pip install soumayan4==1.0.0
The following steps show how to use the library for English and the four European languages.
First, download the required models. Run these commands only after installing the library with pip as shown above; otherwise you will get an error.
!polyglot download ner2.en  # named-entity recognition (NER) model
!polyglot download pos2.en  # part-of-speech (POS) model
!polyglot download sentiment2.en  # sentiment model
!polyglot download embeddings2.en  # word embeddings
!python -m spacy download en_core_web_sm
!polyglot download embeddings2.fr
!polyglot download pos2.fr
!polyglot download sentiment2.fr
!python -m spacy download fr_core_news_sm
!polyglot download embeddings2.es
!polyglot download pos2.es
!polyglot download sentiment2.es
!python -m spacy download es_core_news_sm
!polyglot download embeddings2.de
!polyglot download pos2.de
!polyglot download sentiment2.de
!python -m spacy download de_core_news_sm
!polyglot download embeddings2.it
!polyglot download pos2.it
!polyglot download sentiment2.it
!python -m spacy download it_core_news_sm
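The download commands above follow a regular per-language pattern. As a minimal sketch for running them outside a notebook, where the `!` prefix is not available, the hypothetical helpers `download_commands` and `download_all` below rebuild the same commands for chosen language codes and run them with `subprocess`. The language table is copied from the commands above (including `ner2` for English only). The sketch assumes the `polyglot` command-line tool and spaCy are already installed; the note above suggests installing the library first provides them, but the page does not state this explicitly.

```python
import importlib.util
import shutil
import subprocess
import sys

# spaCy model names used in the commands above, keyed by language code
SPACY_MODELS = {
    "en": "en_core_web_sm",
    "fr": "fr_core_news_sm",
    "de": "de_core_news_sm",
    "it": "it_core_news_sm",
    "es": "es_core_news_sm",
}


def download_commands(lang):
    """Hypothetical helper: return the polyglot and spaCy download commands for one language."""
    if lang not in SPACY_MODELS:
        raise ValueError(f"unsupported language code: {lang!r}")
    resources = ["embeddings2", "pos2", "sentiment2"]
    if lang == "en":
        # the commands above download the NER model only for English
        resources.insert(0, "ner2")
    commands = [["polyglot", "download", f"{res}.{lang}"] for res in resources]
    # use the current interpreter so spaCy models land in the same environment
    commands.append([sys.executable, "-m", "spacy", "download", SPACY_MODELS[lang]])
    return commands


def download_all(langs=("en", "fr", "es", "de", "it")):
    """Hypothetical helper: run every download command, stopping early if tools are missing."""
    if shutil.which("polyglot") is None or importlib.util.find_spec("spacy") is None:
        raise RuntimeError("polyglot and spaCy must be installed before downloading models")
    for lang in langs:
        for cmd in download_commands(lang):
            subprocess.run(cmd, check=True)  # raises CalledProcessError if a download fails
```

Calling `download_all()` downloads everything listed above; `download_all(("it",))` fetches only the Italian resources.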
Now we will see how to use this library
import pandas as pd
from soumayan4 import italian_fake  # other functions, such as german_fake, can be imported the same way
# a small dataset for testing the library (the German sentence means "warm water removes corona")
data = {'text': ['warmes Wasser entfernt Korona', 'how are you?', 'we are all fine']}
df = pd.DataFrame(data)
# download the trained model from GitHub; change the language and model name in the URL to use other languages and models
!wget https://github.com/soumayan/fake-news-spreader/blob/main/italian/italian_model_svm.sav?raw=true
# first argument: the DataFrame; second: the name of the column to analyse ('text' here);
# third: the model name, which must match the model downloaded with wget above ('svm' here)
italian_fake(df, 'text', 'svm')
df.head()
# the DataFrame now has many extra features, such as NER and POS, plus a news_output column:
# news_output is 0 if the content is real; otherwise the content is fake
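The model URLs in this guide share one pattern: `https://github.com/soumayan/fake-news-spreader/blob/main/<language>/<language>_model_<model>.sav?raw=true`, and the same `<model>` string is passed as the third argument to the detector function. A minimal sketch, assuming that pattern holds for every language and model (the page does not list which models exist), builds the URL and downloads it with `urllib`. `model_url` and `download_model` are hypothetical helpers. Because the page does not say which file name the detector functions look for, the file is saved under the name `wget` would choose by default, which keeps the `?raw=true` suffix.

```python
import os
import urllib.request

BASE_URL = "https://github.com/soumayan/fake-news-spreader/blob/main"


def model_url(language, model):
    """Hypothetical helper: build the GitHub URL of a trained model, e.g. ('italian', 'svm')."""
    language = language.lower()
    return f"{BASE_URL}/{language}/{language}_model_{model}.sav?raw=true"


def download_model(language, model, dest_dir="."):
    """Hypothetical helper: download a model, naming the file the way wget does by default."""
    url = model_url(language, model)
    filename = url.rsplit("/", 1)[-1]  # e.g. 'italian_model_svm.sav?raw=true'
    path = os.path.join(dest_dir, filename)
    urllib.request.urlretrieve(url, path)
    return path
```

For example, `download_model("italian", "svm")` fetches the same file as the `!wget` command above, after which `italian_fake(df, 'text', 'svm')` can be called.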
Now we will see how to use the library for Bengali.
First, install bnlp_toolkit, create a models directory, make it the current directory, and download the required models into it:
!pip install -U bnlp_toolkit
!mkdir models
%cd models
!wget https://github.com/sagorbrur/bnlp/raw/master/model/bn_spm.model
!wget https://github.com/sagorbrur/bnlp/raw/master/model/bn_spm.vocab
!wget https://github.com/sagorbrur/bnlp/raw/master/model/bn_ner.pkl
!wget https://github.com/sagorbrur/bnlp/raw/master/model/bn_pos.pkl
!wget https://github.com/soumayan/fake-news-spreader/blob/main/bengali/bengali_model_knn.sav?raw=true
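Outside a notebook, the `!mkdir`, `%cd` and `!wget` steps above have no direct equivalent. A minimal standard-library sketch of the same setup creates the `models` directory, downloads the listed files into it and then switches into it, so later calls find the files in the working directory. It assumes the URLs above are still valid and, as with `wget`, keeps each URL's last path segment as the file name. `target_paths` and `prepare_models` are hypothetical helpers; installing `bnlp_toolkit` is still done with pip.

```python
import os
import urllib.request

# the files downloaded in the Bengali setup above
BENGALI_MODEL_URLS = [
    "https://github.com/sagorbrur/bnlp/raw/master/model/bn_spm.model",
    "https://github.com/sagorbrur/bnlp/raw/master/model/bn_spm.vocab",
    "https://github.com/sagorbrur/bnlp/raw/master/model/bn_ner.pkl",
    "https://github.com/sagorbrur/bnlp/raw/master/model/bn_pos.pkl",
    "https://github.com/soumayan/fake-news-spreader/blob/main/bengali/bengali_model_knn.sav?raw=true",
]


def target_paths(urls, dest_dir):
    """Hypothetical helper: local path for each URL, using the last URL segment as the file name."""
    return [os.path.join(dest_dir, url.rsplit("/", 1)[-1]) for url in urls]


def prepare_models(urls=BENGALI_MODEL_URLS, dest_dir="models", download=True):
    """Hypothetical helper: create dest_dir, fetch missing files, then make it the working directory."""
    os.makedirs(dest_dir, exist_ok=True)  # like !mkdir models, but no error if it already exists
    # absolute paths stay valid after the chdir below
    paths = [os.path.abspath(p) for p in target_paths(urls, dest_dir)]
    if download:
        for url, path in zip(urls, paths):
            if not os.path.exists(path):  # skip files downloaded earlier
                urllib.request.urlretrieve(url, path)
    os.chdir(dest_dir)  # like %cd models
    return paths
```

Skipping existing files means re-running the setup cell does not download everything again.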
Now we will create a small dataset and apply the library to it:
import pandas as pd
from soumayan4 import bengali_fake
data = {'text': ['বিজেপি কখনও জাল খবর ছড়ায় না']}  # "BJP never spreads fake news"
# data = {'text': ['কিছু লোক ভুয়া খবর ছড়িয়ে দেয়']}  # alternative example: "Some people spread fake news"
t = pd.DataFrame(data)  # create the DataFrame
train = bengali_fake(t, 'text', 'knn')  # 'knn' matches the model downloaded above
train.head()  # the result is in the news_output column, along with additional features such as NER and POS
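As described above, `news_output` is 0 for real content and any other value for fake content. A minimal pandas sketch, relying only on that column and the text column, splits a result DataFrame into real and fake texts. `split_results` is a hypothetical helper, and the DataFrame in the check is a hand-made stand-in, not real output of the library.

```python
import pandas as pd


def split_results(result, text_column="text", label_column="news_output"):
    """Hypothetical helper: return (real_texts, fake_texts) from a detector's result DataFrame.

    news_output == 0 is treated as real; any other value is treated as fake.
    """
    is_fake = result[label_column] != 0
    real = result.loc[~is_fake, text_column].tolist()
    fake = result.loc[is_fake, text_column].tolist()
    return real, fake
```

With the Bengali result above, `real, fake = split_results(train)` followed by `print(f"{len(fake)} of {len(train)} texts flagged as fake")` gives a quick summary.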
Now we will see how to use the library for Hindi.
# download the model(s) you need; change the model name in the URL for other models
!wget https://github.com/soumayan/fake-news-spreader/blob/main/hindi/hindi_model_svm.sav?raw=true
import pandas as pd
from soumayan4 import hindi_fake
# create a small dataset for testing
data = {'text': ['हर कोई फर्जी खबर फैलाता है']}  # "Everyone spreads fake news"
df = pd.DataFrame(data)
t = hindi_fake(df, 'text', 'svm')
t.head()  # the output is in the news_output column of DataFrame t
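All three examples follow the same steps: put the texts in a DataFrame column, call a `<language>_fake(df, column, model)` function, and read `news_output`. A minimal wrapper sketch, assuming only that calling convention, takes the detector function as an argument. The Italian example calls `italian_fake` without using a return value, while the Bengali and Hindi examples use the returned DataFrame, so the wrapper also falls back to the input DataFrame if the function returns `None`; that fallback is an assumption, not documented behaviour. `classify_texts` is a hypothetical helper.

```python
import pandas as pd


def classify_texts(texts, detector, model, column="text"):
    """Hypothetical helper: run a detector function (e.g. hindi_fake) on a list of strings.

    model must match the downloaded model ('svm', 'knn', ...).
    Returns the news_output labels as a list.
    """
    df = pd.DataFrame({column: list(texts)})
    result = detector(df, column, model)
    if result is None:
        # assumption: a None return means the detector added its columns to df in place
        result = df
    return result["news_output"].tolist()
```

For example, `classify_texts(['हर कोई फर्जी खबर फैलाता है'], hindi_fake, 'svm')` returns the label list for the Hindi example.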
Note: for larger datasets (around 12,000 to 15,000 entries), processing can take up to 20 minutes. The library currently runs without errors in Google Colab; other environments have not been verified.
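For long runs like these, processing the data in chunks gives visible progress. The sketch below assumes each row is classified independently of the others (the page does not say whether feature extraction depends on the whole DataFrame); only under that assumption does chunking give the same labels as a single call. `classify_in_chunks` is a hypothetical helper that takes the detector function as an argument and treats a `None` return as an in-place update, which is also an assumption.

```python
import time

import pandas as pd


def classify_in_chunks(df, detector, column, model, chunk_size=1000):
    """Hypothetical helper: apply a detector function chunk by chunk and concatenate the results."""
    results = []
    start = time.perf_counter()
    for begin in range(0, len(df), chunk_size):
        chunk = df.iloc[begin:begin + chunk_size].copy()  # keeps the original index
        out = detector(chunk, column, model)
        results.append(chunk if out is None else out)  # assumption: None means updated in place
        done = min(begin + chunk_size, len(df))
        print(f"{done}/{len(df)} rows processed in {time.perf_counter() - start:.0f} s")
    if not results:
        return df.copy()  # empty input: nothing to classify
    return pd.concat(results)
```

Each chunk is a separate call, so any per-call setup (for example, if the function loads the model file on every call) is repeated; larger chunks reduce that overhead at the cost of less frequent progress updates.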
Download files
File details
Details for the file soumayan4-1.0.1.tar.gz.
File metadata
- Download URL: soumayan4-1.0.1.tar.gz
- Upload date:
- Size: 7.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fd60cf91c6c8cbd252d6b9c2ce17e02278d254b368cb3618eaf078d8da017e72 |
| MD5 | a8da0db2804de12a446856b43442edaa |
| BLAKE2b-256 | a6dccf4f94835cfa629c7ef6138e2b5426c38507f0ee107a3375a2bb803423d8 |
File details
Details for the file soumayan4-1.0.1-py3-none-any.whl.
File metadata
- Download URL: soumayan4-1.0.1-py3-none-any.whl
- Upload date:
- Size: 7.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a3867cab1a34169aef11640b46b92e1ecb00aaf71b55f870a45bf2b6efbd3eab |
| MD5 | 212a1d6af73dd61b985b94e64b643be2 |
| BLAKE2b-256 | 58f3e275d2fae85578f80912c8cfd12fc3399037e92f56e65285aa9f70c89b35 |