Persian Natural Language Inference DataSet

FarsTail: A Persian Natural Language Inference Dataset



Natural Language Inference (NLI), also called Textual Entailment, is an important NLP task whose goal is to determine the inference relationship between a premise p and a hypothesis h. It is a three-class problem in which each pair (p, h) is assigned to one of the following classes: "ENTAILMENT" if the hypothesis can be inferred from the premise, "CONTRADICTION" if the hypothesis contradicts the premise, and "NEUTRAL" if the hypothesis can be neither inferred from the premise nor shown to contradict it.
For English, large datasets such as SNLI, MNLI, and SciTail have been created for this task. Datasets have also been created for some other languages, which has improved results on this task in those languages. Such resources are much scarcer for low-resource languages like Persian.
Persian (Farsi) is a pluricentric language spoken by around 110 million people in countries such as Iran, Afghanistan, and Tajikistan. In this GitHub repository, we present FarsTail, the first large-scale Persian corpus for the NLI task.
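
To make the three-class setup concrete, here is a minimal sketch of how a labeled (p, h) pair could be represented in Python. The names Label and NLIExample and the English sentences are hypothetical and purely illustrative; they are not taken from FarsTail or from the farstail package.

```python
from enum import Enum
from typing import NamedTuple


class Label(Enum):
    """The three NLI classes described above."""
    ENTAILMENT = "ENTAILMENT"
    CONTRADICTION = "CONTRADICTION"
    NEUTRAL = "NEUTRAL"


class NLIExample(NamedTuple):
    """One labeled premise/hypothesis pair (hypothetical container type)."""
    premise: str
    hypothesis: str
    label: Label


# Illustrative English pairs only; FarsTail itself contains Persian text.
premise = "A man is playing a guitar on stage."
examples = [
    NLIExample(premise, "A person is making music.", Label.ENTAILMENT),
    NLIExample(premise, "The stage is empty.", Label.CONTRADICTION),
    NLIExample(premise, "The man is a famous musician.", Label.NEUTRAL),
]

for ex in examples:
    print(ex.label.value, "|", ex.hypothesis)
```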

We divided the data into train, dev, and test splits with the following distribution:

Split   Number of examples
Train   7266
Dev     1537
Test    1564
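
The split sizes above add up to 10367 pairs. As a quick check, the share of each split can be computed from the counts in the table; the variable names below are only illustrative.

```python
# Split sizes copied from the table above.
split_sizes = {"train": 7266, "dev": 1537, "test": 1564}

total = sum(split_sizes.values())

# Percentage of the full dataset in each split, rounded to one decimal place.
shares = {name: round(100 * size / total, 1) for name, size in split_sizes.items()}

print(total)   # 10367
print(shares)  # {'train': 70.1, 'dev': 14.8, 'test': 15.1}
```

This corresponds to roughly a 70/15/15 split.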

Getting started with the package

We provide an API in the form of a Python package to make reading and using FarsTail easier. The following sections explain how to use this package.

You'll need Python 3.6 or higher.

Installation

pip install farstail

Usage

  • Loading the FarsTail dataset.
from farstail.datasets import farstail
(p_train, h_train, l_train), (p_dev, h_dev, l_dev), (p_test, h_test, l_test) = farstail.load_data()
  • Retrieving a dict mapping words to their index in the FarsTail dataset.
from farstail.datasets import farstail
farstail_word_index = farstail.get_word_index()
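
Once the data is loaded, two common first steps are checking the label balance and turning sentences into index sequences. The sketch below is not part of the farstail package: label_distribution and encode are hypothetical helpers, and the placeholder data stands in for the values returned by load_data() and get_word_index(). It assumes that the label lists are plain sequences, that the word index is a dict from token to integer, and that sentences can be split on whitespace; the fallback index 0 for unknown tokens is also an assumption.

```python
from collections import Counter


def label_distribution(labels):
    """Count how often each label occurs in one split."""
    return Counter(labels)


def encode(sentence, word_index, unknown_index=0):
    """Map whitespace-separated tokens to their indices.

    Tokens missing from word_index get unknown_index (an assumed convention).
    """
    return [word_index.get(token, unknown_index) for token in sentence.split()]


# Placeholder values standing in for l_train and farstail_word_index.
# The label strings and the vocabulary here are made up for illustration.
toy_labels = ["e", "n", "e", "c"]
toy_word_index = {"the": 1, "cat": 2, "sleeps": 3}

print(label_distribution(toy_labels))
print(encode("the cat runs", toy_word_index))  # [1, 2, 0]
```

With the package installed, the same helpers could be applied to l_train and farstail_word_index, provided those assumptions hold for the actual return values.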
