
Persian Natural Language Inference Dataset

Project description

FarsTail: A Persian Natural Language Inference Dataset



Natural Language Inference (NLI), also called Textual Entailment, is an important NLP task whose goal is to determine the inference relationship between a premise p and a hypothesis h. It is a three-class problem in which each pair (p, h) is assigned to one of the following classes: "ENTAILMENT" if the hypothesis can be inferred from the premise, "CONTRADICTION" if the hypothesis contradicts the premise, and "NEUTRAL" if the hypothesis can neither be inferred from the premise nor contradicts it.
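To make the three-class definition concrete, here is a minimal Python sketch of one way to represent a labeled pair. The names `NLILabel` and `NLIPair`, the label string values, and the English example sentences are hypothetical; how the farstail package actually encodes labels is not described here.

```python
from enum import Enum
from typing import NamedTuple


class NLILabel(Enum):
    """The three NLI classes (hypothetical encoding, not farstail's)."""
    ENTAILMENT = "entailment"        # hypothesis can be inferred from the premise
    CONTRADICTION = "contradiction"  # hypothesis contradicts the premise
    NEUTRAL = "neutral"              # neither inferred from nor contradicting it


class NLIPair(NamedTuple):
    """One (p, h) pair together with its gold label."""
    premise: str
    hypothesis: str
    label: NLILabel


# Made-up English example for illustration only (not taken from FarsTail).
example = NLIPair(
    premise="The museum is open every day of the week.",
    hypothesis="The museum is closed on Mondays.",
    label=NLILabel.CONTRADICTION,
)
print(example.label.name)  # CONTRADICTION
```

A `NamedTuple` is used instead of a dataclass so the sketch also runs on Python 3.6, the minimum version the package lists.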
In English, large datasets such as SNLI, MNLI, and SciTail have been created for this task. Datasets have also been built for several other languages, improving performance on this task in those languages. Low-resource languages such as Persian, however, have received much less attention.
Persian (Farsi) is a pluricentric language spoken by around 110 million people in countries such as Iran, Afghanistan, and Tajikistan. In this repository, we present FarsTail, the first large-scale Persian corpus for the NLI task.

We split the data into train, dev, and test sets with the following sizes:
Split Number of pairs
Train 7266
Dev 1537
Test 1564
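The split sizes add up to a corpus of 10,367 pairs, divided roughly 70/15/15 into train, dev, and test. A short Python check of that arithmetic, using only the numbers in the table:

```python
# Split sizes as listed in the table.
splits = {"Train": 7266, "Dev": 1537, "Test": 1564}

total = sum(splits.values())
print(f"Total pairs: {total}")  # Total pairs: 10367

for name, count in splits.items():
    # Share of the whole corpus, rounded to one decimal place.
    print(f"{name}: {count / total:.1%}")
# Train: 70.1%
# Dev: 14.8%
# Test: 15.1%
```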

Getting started with the package

We provide an API in the form of a Python package to make reading and using FarsTail easier. The following sections explain how to use this package.

You'll need Python 3.6 or higher.

Installation

pip install farstail

Usage

  • Loading the FarsTail dataset:
from farstail.datasets import farstail
(p_train, h_train, l_train), (p_dev, h_dev, l_dev), (p_test, h_test, l_test) = farstail.load_data()
  • Retrieving a dict mapping words to their index in the FarsTail dataset:
from farstail.datasets import farstail
farstail_word_index = farstail.get_word_index()
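Since `get_word_index()` returns a dict mapping words to indices, a common next step is turning a premise or hypothesis into a sequence of integer ids. Below is a minimal sketch under stated assumptions: the `encode` helper is hypothetical, splitting on whitespace and mapping unknown words to index 0 are assumptions rather than documented farstail behavior, and `toy_word_index` is a made-up stand-in so the example runs without downloading the dataset.

```python
from typing import Dict, List


def encode(sentence: str, word_index: Dict[str, int], oov_index: int = 0) -> List[int]:
    """Map each whitespace-separated token to its index in word_index.

    Assumptions (not confirmed by the farstail docs): tokens are split on
    whitespace, and words missing from word_index map to oov_index.
    """
    return [word_index.get(token, oov_index) for token in sentence.split()]


# Toy stand-in for farstail.get_word_index(); these indices are made up.
toy_word_index = {"کتاب": 1, "خوب": 2, "است": 3}

# "این" is not in the toy vocabulary, so it maps to the assumed OOV index 0.
print(encode("این کتاب خوب است", toy_word_index))  # [0, 1, 2, 3]
```

With the real package, `farstail_word_index` would replace `toy_word_index`, and the sentences would come from `farstail.load_data()`; the exact element types returned by `load_data()` are not documented here, so check them before relying on this helper.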

Download files

Source Distribution

farstail-1.0.3.tar.gz (21.4 kB)

Built Distribution

farstail-1.0.3-py3-none-any.whl (24.8 kB)

File details

Details for the file farstail-1.0.3.tar.gz.

File metadata

  • Download URL: farstail-1.0.3.tar.gz
  • Upload date:
  • Size: 21.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4

File hashes

Hashes for farstail-1.0.3.tar.gz
Algorithm Hash digest
SHA256 8cc39c9380578050b4963932793cda86f16458e7a3f424936a7f986d16145b60
MD5 6875668c1a29edb2260b106932595c8e
BLAKE2b-256 825a476094a522de8e1b896c44bd97536fdab1d44c87809d469c6aeed4b99fec

File details

Details for the file farstail-1.0.3-py3-none-any.whl.

File metadata

  • Download URL: farstail-1.0.3-py3-none-any.whl
  • Upload date:
  • Size: 24.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4

File hashes

Hashes for farstail-1.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 abde973c6f4e77a75241c06624d72f1ade6f7d639d670d5ca73db28f9381752a
MD5 67760f8cb13d2eba7b8979118fdde03f
BLAKE2b-256 7f1f84184df85732227885ef657411729b76d0597c566a12449de3dd64811c7c
