
A fake content detector trained on several languages, mainly on COVID-19 domain data.

Project description

This library detects fake content found on the internet. It supports 7 languages: English, four other European languages (Italian, German, French and Spanish) and two Indian languages (Hindi and Bengali).


Installing the library (first step)

pip install soumayan4==1.0.0


Now we will see how to use this library with English and the other 4 European languages.

Downloading the models for English and the 4 European languages. Run these commands only after installing the library with pip as shown above, otherwise you will get an error.

!polyglot download ner2.en         # named-entity recognition (NER) model
!polyglot download embeddings2.en  # word embeddings
!polyglot download pos2.en         # part-of-speech (POS) model

!polyglot download embeddings2.fr
!polyglot download pos2.fr

!polyglot download embeddings2.es
!polyglot download pos2.es

!polyglot download embeddings2.de
!polyglot download pos2.de

!polyglot download embeddings2.it
!polyglot download pos2.it

!python -m spacy download en_core_web_sm
!polyglot download sentiment2.en
!python -m spacy download fr_core_news_sm
!polyglot download sentiment2.fr
!python -m spacy download de_core_news_sm
!polyglot download sentiment2.de
!python -m spacy download it_core_news_sm
!polyglot download sentiment2.it
!python -m spacy download es_core_news_sm
!polyglot download sentiment2.es
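spaCy models downloaded this way are installed as ordinary Python packages, so one quick way to confirm they are in place is to ask Python whether each package can be found. The following is a minimal sketch using only the standard library. The helper name missing_models is hypothetical and not part of soumayan4, and the check does not cover the polyglot downloads.

```python
import importlib
from importlib.util import find_spec

# spaCy model packages named in the download commands above
SPACY_MODELS = [
    "en_core_web_sm",
    "fr_core_news_sm",
    "de_core_news_sm",
    "it_core_news_sm",
    "es_core_news_sm",
]


def missing_models(names):
    """Return the names that cannot be found as importable packages."""
    # Refresh the import caches so packages installed during the
    # current session are taken into account.
    importlib.invalidate_caches()
    return [name for name in names if find_spec(name) is None]


missing = missing_models(SPACY_MODELS)
if missing:
    print("Not installed yet:", ", ".join(missing))
else:
    print("All spaCy models are available.")
```

If any model is listed as not installed, rerun the matching download command before calling the library.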

Now we will see how to use this library.

from soumayan4 import italian_fake  # other functions, such as german_fake, can be imported the same way

import pandas as pd
data = {'text': ['warmes Wasser entfernt Korona', 'how are you?', 'we are all fine']}
df = pd.DataFrame(data)  # a small dataset for testing the library

!wget https://github.com/soumayan/fake-news-spreader/blob/main/italian/italian_model_svm.sav?raw=true
# The command above downloads a model from GitHub. Change the language and model name in the URL to use a different language or model type.
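Because the download URL ends in ?raw=true, the saved file name may not be exactly italian_model_svm.sav, depending on how wget names it. The sketch below lists the files in a directory whose names start with the expected model name, so you can see what was actually saved. The helper find_model_files is hypothetical (not part of soumayan4); the naming pattern is taken from the download URLs in this description.

```python
from pathlib import Path


def find_model_files(language, model, directory="."):
    """List file names in `directory` starting with '<language>_model_<model>.sav'."""
    prefix = f"{language}_model_{model}.sav"
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and p.name.startswith(prefix)
    )


# Example: look for the Italian SVM model in the current directory
print(find_model_files("italian", "svm"))
```

An empty list means the download did not produce a matching file in that directory.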

italian_fake(df,'text','svm')
# How to call the function: the first argument is your dataframe, the second is the name of the column to analyze (here 'text'), and the third is the model name, which must match the model you downloaded with wget.

df.head()
# The dataframe now contains many extracted features, such as NER and POS features, plus a news_output column. If news_output is 0 the content is real; otherwise it is fake.
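To make the result easier to read, the news_output column can be mapped to text labels. This is a small sketch on a stand-in dataframe: the news_output values below are made up for illustration and are not real predictions, and the label column name is an assumption.

```python
import pandas as pd

# Stand-in for the dataframe after the detector has run; the values
# in news_output are invented for illustration only.
df = pd.DataFrame({
    "text": ["example one", "example two", "example three"],
    "news_output": [0, 1, 0],
})

# 0 means real; any other value is treated as fake
df["label"] = df["news_output"].apply(lambda v: "real" if v == 0 else "fake")

print(df[["text", "label"]])
print(df["label"].value_counts())
```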

Now we will see how to use this library with the Bengali language.

First we install bnlp_toolkit, create a models directory, change the current directory to models, and download the required model files there.

!pip install -U bnlp_toolkit
!mkdir models
%cd models
!wget https://github.com/sagorbrur/bnlp/raw/master/model/bn_spm.model
!wget https://github.com/sagorbrur/bnlp/raw/master/model/bn_spm.vocab
!wget https://github.com/sagorbrur/bnlp/raw/master/model/bn_ner.pkl
!wget https://github.com/sagorbrur/bnlp/raw/master/model/bn_pos.pkl
!wget https://github.com/soumayan/fake-news-spreader/blob/main/bengali/bengali_model_knn.sav?raw=true

Now we will create a small dataset and apply the library to it.

data={'text':['বিজেপি কখনও জাল খবর ছড়ায় না']}
#data={'text':['কিছু লোক ভুয়া খবর ছড়িয়ে দেয়']}

import pandas as pd
t = pd.DataFrame(data) # creating dataframe

from soumayan4 import bengali_fake

train=bengali_fake(t,'text','knn')
train.head()  # the result is in the news_output column; additional features such as NER and POS are also included

Now we will see how to use this library with the Hindi language.

!wget https://github.com/soumayan/fake-news-spreader/blob/main/hindi/hindi_model_svm.sav?raw=true
# download the model you need

import pandas as pd
data = {'text': ['हर कोई फर्जी खबर फैलाता है']}
df = pd.DataFrame(data)  # a small dataset for testing

from soumayan4 import hindi_fake
t=hindi_fake(df,'text','svm')

t.head()
# the output is in the news_output column of dataframe t

Note - For bigger datasets (around 12,000 to 15,000 rows), processing can take up to 20 minutes. The library currently runs error-free in Google Colab; it has not been verified in other environments.
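For long runs, one option is to process the dataframe in batches so that progress is visible while it works. The sketch below is only an illustration: run_in_batches and fake_detector are hypothetical names, the placeholder detector stands in for a function such as hindi_fake, and it assumes the result for each row does not depend on the other rows in the dataframe, which this description does not confirm.

```python
import time

import pandas as pd


def run_in_batches(df, detect, batch_size=1000):
    """Apply `detect` to consecutive slices of `df` and print progress."""
    parts = []
    started = time.perf_counter()
    for start in range(0, len(df), batch_size):
        chunk = df.iloc[start:start + batch_size].copy()
        result = detect(chunk)
        # Some examples above read the input dataframe afterwards and
        # others use the return value, so accept either convention.
        parts.append(chunk if result is None else result)
        done = min(start + batch_size, len(df))
        elapsed = time.perf_counter() - started
        print(f"{done}/{len(df)} rows processed in {elapsed:.1f}s")
    return pd.concat(parts) if parts else df.copy()


# Placeholder detector used only to make the sketch runnable;
# it marks every row as real (0) and is NOT the library's model.
def fake_detector(chunk):
    chunk["news_output"] = 0
    return chunk


demo = pd.DataFrame({"text": [f"text {i}" for i in range(5)]})
out = run_in_batches(demo, fake_detector, batch_size=2)
print(out)
```

With the real library, the detector argument could be something like lambda chunk: hindi_fake(chunk, 'text', 'svm'), under the same assumption about row independence.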
