
A fake-content detector trained on several languages, mainly on COVID-19-domain data.

Project description

This library detects fake content on the internet. It supports 7 languages: English, 4 European languages (Italian, German, French and Spanish), and 2 Indian languages (Hindi and Bengali).

GitHub

Installing the library (first step):

pip install soumayan4==1.0.2
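To confirm that the pinned version is the one actually installed, a small standard-library check can help. This is a minimal sketch; the helper name `installed_version` is hypothetical and is not part of soumayan4.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("soumayan4")
    if version is None:
        print("soumayan4 is not installed; run the pip command above first")
    else:
        print("soumayan4 version:", version)
```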


The following shows how to use this library with English and the other 4 European languages.

Download the models for English and the 4 European languages. Run these commands only after installing the library with pip as shown above; otherwise they will fail with an error.

!polyglot download ner2.en          # named-entity recognition model (English)

# polyglot embeddings, part-of-speech and sentiment models for each language
!polyglot download embeddings2.en
!polyglot download pos2.en
!polyglot download sentiment2.en

!polyglot download embeddings2.fr
!polyglot download pos2.fr
!polyglot download sentiment2.fr

!polyglot download embeddings2.es
!polyglot download pos2.es
!polyglot download sentiment2.es

!polyglot download embeddings2.de
!polyglot download pos2.de
!polyglot download sentiment2.de

!polyglot download embeddings2.it
!polyglot download pos2.it
!polyglot download sentiment2.it

# spaCy small pipelines for each language
!python -m spacy download en_core_web_sm
!python -m spacy download fr_core_news_sm
!python -m spacy download de_core_news_sm
!python -m spacy download it_core_news_sm
!python -m spacy download es_core_news_sm
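The download commands follow a regular pattern per language, so the list can also be generated rather than typed by hand. The sketch below only builds the command strings; the names `LANGUAGES`, `SPACY_MODELS` and `download_commands` are hypothetical helpers, not part of soumayan4, polyglot or spaCy.

```python
# Language codes and spaCy pipeline names taken from the commands listed above.
LANGUAGES = ["en", "fr", "es", "de", "it"]
SPACY_MODELS = {
    "en": "en_core_web_sm",
    "fr": "fr_core_news_sm",
    "es": "es_core_news_sm",
    "de": "de_core_news_sm",
    "it": "it_core_news_sm",
}


def download_commands(languages=LANGUAGES):
    """Build the shell commands that fetch every model for the given languages."""
    commands = ["polyglot download ner2.en"]  # NER is only downloaded for English
    for lang in languages:
        for prefix in ("embeddings2", "pos2", "sentiment2"):
            commands.append(f"polyglot download {prefix}.{lang}")
        commands.append(f"python -m spacy download {SPACY_MODELS[lang]}")
    return commands


if __name__ == "__main__":
    for command in download_commands():
        print(command)
```

Outside a notebook, each string could be run with `subprocess.run(command.split(), check=True)`; in Google Colab, prefixing the printed commands with `!` has the same effect.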

Now we will see how to use this library

import pandas as pd
from soumayan4 import italian_fake  # other functions, such as german_fake, can be imported the same way

# A small dataset for testing the library
data = {'text': ['warmes Wasser entfernt Korona', 'how are you?', 'we are all fine']}
df = pd.DataFrame(data)

!wget https://github.com/soumayan/fake-news-spreader/blob/main/italian/italian_model_svm.sav?raw=true
# The command above downloads a model from GitHub; change the language and model name in the URL to use other models and languages

italian_fake(df,'text','svm')
# The first argument is the dataframe, the second is the name of the column to analyze (here 'text'), and the third is the model name, which must match the model downloaded with wget above

df.head()
# The dataframe now contains many features, such as NER and POS counts, plus a news_output column. If news_output is 0 the content is real; otherwise it is fake
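The model URLs used on this page share one pattern: the language name appears as a directory and again as a file-name prefix, followed by the model name. A minimal sketch of a URL builder is shown below. The helper `model_url` is hypothetical, and only the combinations shown on this page (italian/svm, bengali/knn, hindi/svm) are confirmed; other combinations are an assumption and may not exist in the repository.

```python
BASE_URL = "https://github.com/soumayan/fake-news-spreader/blob/main"


def model_url(language, model):
    """Build the download URL for a model, following the pattern used on this page."""
    return f"{BASE_URL}/{language}/{language}_model_{model}.sav?raw=true"


if __name__ == "__main__":
    # Prints the same URL that the wget command above uses.
    print(model_url("italian", "svm"))
```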

The following shows how to use this library with Bengali.

First, install bnlp_toolkit, then create a models directory, change the current directory to it, and download the required model files there.

!pip install -U bnlp_toolkit
!mkdir models
%cd models
!wget https://github.com/sagorbrur/bnlp/raw/master/model/bn_spm.model
!wget https://github.com/sagorbrur/bnlp/raw/master/model/bn_spm.vocab
!wget https://github.com/sagorbrur/bnlp/raw/master/model/bn_ner.pkl
!wget https://github.com/sagorbrur/bnlp/raw/master/model/bn_pos.pkl
!wget https://github.com/soumayan/fake-news-spreader/blob/main/bengali/bengali_model_knn.sav?raw=true
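Before calling the Bengali function, it can be useful to check that the BNLP files really landed in the current directory. The sketch below assumes the four file names from the wget commands above; `missing_files` is a hypothetical helper. The classifier file is left out of the default list because its saved name depends on how wget handled the `?raw=true` suffix.

```python
import os

# File names taken from the BNLP download URLs above.
BNLP_FILES = ["bn_spm.model", "bn_spm.vocab", "bn_ner.pkl", "bn_pos.pkl"]


def missing_files(directory, names=BNLP_FILES):
    """Return the names from `names` that are not regular files in `directory`."""
    return [
        name for name in names
        if not os.path.isfile(os.path.join(directory, name))
    ]


if __name__ == "__main__":
    missing = missing_files(os.getcwd())
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All BNLP model files are present")
```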

Now create a small dataset and apply the library to it.

data={'text':['বিজেপি কখনও জাল খবর ছড়ায় না']}
#data={'text':['কিছু লোক ভুয়া খবর ছড়িয়ে দেয়']}

import pandas as pd
t = pd.DataFrame(data) # creating dataframe

from soumayan4 import bengali_fake

train = bengali_fake(t, 'text', 'knn')
train.head()  # the result is in the news_output column, alongside additional features such as NER and POS

The following shows how to use this library with Hindi.

!wget https://github.com/soumayan/fake-news-spreader/blob/main/hindi/hindi_model_svm.sav?raw=true
# download the model you need

import pandas as pd

# A small dataset for testing
data = {'text': ['हर कोई फर्जी खबर फैलाता है']}
df = pd.DataFrame(data)

from soumayan4 import hindi_fake
t=hindi_fake(df,'text','svm')

t.head()
# the output is in the news_output column of dataframe t
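As described for the European-language example, a news_output of 0 means real content and any other value means fake. A minimal sketch for turning the column into readable labels follows; `label_outputs` is a hypothetical helper and not part of soumayan4.

```python
def label_outputs(values):
    """Map news_output values to labels: 0 means real, anything else means fake."""
    return ["real" if int(value) == 0 else "fake" for value in values]


if __name__ == "__main__":
    # Stand-in values; with a real result, pass the news_output column instead.
    print(label_outputs([0, 1, 0]))
```

With a result dataframe such as `t`, the labels could be attached with `t['label'] = label_outputs(t['news_output'])`.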

Note - For larger datasets of around 12,000 to 15,000 entries, processing can take up to 20 minutes. The library currently runs without errors in Google Colab; other environments have not been verified.
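For long runs, one option is to process the dataframe in batches so that partial results can be saved along the way. The sketch below only computes the row ranges. `batch_ranges` is a hypothetical helper, and whether the library's functions behave well on dataframe slices has not been verified here.

```python
def batch_ranges(n_rows, batch_size):
    """Return (start, stop) index pairs that cover n_rows in steps of batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [
        (start, min(start + batch_size, n_rows))
        for start in range(0, n_rows, batch_size)
    ]


if __name__ == "__main__":
    for start, stop in batch_ranges(12000, 5000):
        print(start, stop)
```

Each pair can be used as `df.iloc[start:stop]` to select one batch of rows.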
