Persian time and date marker extractor

Parstdex (Persian time and date extractor) - پارس تی‌دِکس

How to Install parstdex

pip install parstdex

How to use

from parstdex import Parstdex

model = Parstdex()

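# English gloss: "Maria called Nadia on Saturday evening at exactly 17:23, but there was
# no news from Nadia until three days later, on 18 Shahrivar 1378 SH."
sentence = """ماریا شنبه عصر راس ساعت ۱۷ و بیست و سه دقیقه به نادیا زنگ زد اما تا سه روز بعد در تاریخ ۱۸ شهریور سال ۱۳۷۸ ه.ش. خبری از نادیا نشد"""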

Extract spans

model.extract_span(sentence)

output :

{"datetime": [[6, 47], [68, 78], [82, 111]], "date": [[6, 10], [68, 78], [82, 111]], "time": [[11, 47]]}
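Judging by the example, each span appears to be a half-open [start, end) pair of character offsets into the input sentence (for instance, [6, 10] has length 4 and lines up with "شنبه"). Under that assumption, the covered text can be recovered with plain slicing. The helper name `spans_to_text` and the English sample data below are illustrative and not part of parstdex.

```python
from typing import Dict, List


def spans_to_text(text: str, spans: Dict[str, List[List[int]]]) -> Dict[str, List[str]]:
    """Map each category to the substrings covered by its [start, end) spans."""
    return {
        category: [text[start:end] for start, end in pairs]
        for category, pairs in spans.items()
    }


# Hand-written stand-in for an extract_span result, so the sketch runs
# without parstdex installed (assumption: same dict-of-lists shape).
sample_text = "call me at 5 pm on Friday"
sample_spans = {"time": [[11, 15]], "date": [[19, 25]]}

print(spans_to_text(sample_text, sample_spans))
```

With the library installed, passing `sentence` and the result of `model.extract_span(sentence)` should work the same way, provided the offsets refer to the unmodified input string.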

Extract markers

model.extract_marker(sentence)

output :

{
   "datetime":{
      "[6, 47]":"شنبه عصر راس ساعت ۱۷ و بیست و سه دقیقه به",
      "[68, 78]":"سه روز بعد",
      "[82, 111]":"تاریخ ۱۸ شهریور سال ۱۳۷۸ ه.ش."
   },
   "date":{
      "[6, 10]":"شنبه",
      "[68, 78]":"سه روز بعد",
      "[82, 111]":"تاریخ ۱۸ شهریور سال ۱۳۷۸ ه.ش."
   },
   "time":{
      "[11, 47]":"عصر راس ساعت ۱۷ و بیست و سه دقیقه به"
   }
}
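In the output shown above, the keys of each category are strings such as "[6, 47]" rather than lists. If that is how the dictionary is actually returned (an assumption based only on this printed example), the keys can be turned back into integer pairs with `json.loads`. The helper `parse_marker_keys` and the sample data are hypothetical.

```python
import json
from typing import Dict, List, Tuple

Marker = Tuple[Tuple[int, int], str]


def parse_marker_keys(markers: Dict[str, Dict[str, str]]) -> Dict[str, List[Marker]]:
    """Convert "[start, end]" string keys into (start, end) tuples, sorted by position."""
    return {
        category: sorted(
            (tuple(json.loads(key)), value) for key, value in entries.items()
        )
        for category, entries in markers.items()
    }


# Hand-written stand-in mirroring the shape of the extract_marker output above.
sample_markers = {"date": {"[19, 25]": "Friday", "[0, 6]": "Monday"}}

print(parse_marker_keys(sample_markers))
```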

Extract TimeML scheme

model.extract_time_ml(sentence)

output :

ماریا 
<TIMEX3 type='DATE'>
شنبه
</TIMEX3>
<TIMEX3 type='TIME'>
عصر راس ساعت ۱۷ و بیست و سه دقیقه به
</TIMEX3>
 نادیا زنگ زد اما 
<TIMEX3 type='DURATION'>
تا سه روز بعد
</TIMEX3>
 در 
<TIMEX3 type='DATE'>
تاریخ ۱۸ شهریور سال ۱۳۷۸ ه.ش.
</TIMEX3>
خبری از نادیا نشد
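If the TimeML result is a single string with single-quoted `type` attributes, as the output above suggests, the tagged expressions can be pulled out with a regular expression. This is a minimal sketch under that assumption; `parse_timex3` is an illustrative helper, not a parstdex function, and a proper XML parser would be more robust for nested or escaped markup.

```python
import re
from typing import List, Tuple

# Matches <TIMEX3 type='...'>...</TIMEX3>; DOTALL lets the body span line breaks.
TIMEX3_RE = re.compile(r"<TIMEX3 type='([A-Z]+)'>(.*?)</TIMEX3>", re.DOTALL)


def parse_timex3(markup: str) -> List[Tuple[str, str]]:
    """Return (type, text) pairs for every TIMEX3 element, in document order."""
    return [(m.group(1), m.group(2).strip()) for m in TIMEX3_RE.finditer(markup)]


# Hand-written sample in the same tag style as the output above.
sample_markup = (
    "Maria called <TIMEX3 type='DATE'>\nSaturday\n</TIMEX3> at "
    "<TIMEX3 type='TIME'>5 pm</TIMEX3>"
)

print(parse_timex3(sample_markup))
```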

Extract markers' NER tags

DATTIM mode (Default):

model.extract_ner(sentence, mode="dattim")

output :

[
    ("ماریا", "O"),
    ("شنبه", "B-DAT"),
    ("عصر", "B-TIM"),
    ("راس", "I-TIM"),
    ("ساعت", "I-TIM"),
    ("۱۷", "I-TIM"),
    ("و", "I-TIM"),
    ("بیست", "I-TIM"),
    ("و", "I-TIM"),
    ("سه", "I-TIM"),
    ("دقیقه", "I-TIM"),
    ("به", "I-TIM"),
    ("نادیا", "O"),
    ("زنگ", "O"),
    ("زد", "O"),
    ("اما", "O"),
    ("تا", "B-DAT"),
    ("سه", "I-DAT"),
    ("روز", "I-DAT"),
    ("بعد", "I-DAT"),
    ("در", "I-DAT"),
    ("تاریخ", "I-DAT"),
    ("۱۸", "I-DAT"),
    ("شهریور", "I-DAT"),
    ("سال", "I-DAT"),
    ("۱۳۷۸", "I-DAT"),
    ("ه", "I-DAT"),
    (".", "I-DAT"),
    ("ش", "I-DAT"),
    (".", "I-DAT"),
    ("خبری", "O"),
    ("از", "O"),
    ("نادیا", "O"),
    ("نشد", "O"),
]

TMP mode:

model.extract_ner(sentence, mode="tmp")

output :

[
    ("ماریا", "O"),
    ("شنبه", "B-TMP"),
    ("عصر", "I-TMP"),
    ("راس", "I-TMP"),
    ("ساعت", "I-TMP"),
    ("۱۷", "I-TMP"),
    ("و", "I-TMP"),
    ("بیست", "I-TMP"),
    ("و", "I-TMP"),
    ("سه", "I-TMP"),
    ("دقیقه", "I-TMP"),
    ("به", "I-TMP"),
    ("نادیا", "O"),
    ("زنگ", "O"),
    ("زد", "O"),
    ("اما", "O"),
    ("تا", "B-TMP"),
    ("سه", "I-TMP"),
    ("روز", "I-TMP"),
    ("بعد", "I-TMP"),
    ("در", "I-TMP"),
    ("تاریخ", "I-TMP"),
    ("۱۸", "I-TMP"),
    ("شهریور", "I-TMP"),
    ("سال", "I-TMP"),
    ("۱۳۷۸", "I-TMP"),
    ("ه", "I-TMP"),
    (".", "I-TMP"),
    ("ش", "I-TMP"),
    (".", "I-TMP"),
    ("خبری", "O"),
    ("از", "O"),
    ("نادیا", "O"),
    ("نشد", "O"),
]
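Both modes return a list of (token, tag) pairs in BIO format, so downstream code usually needs to merge the tokens of each entity. The sketch below does this grouping in plain Python. `group_bio` is a hypothetical helper, and joining tokens with a single space is a simplification that will not reproduce the original spacing around punctuation.

```python
from typing import List, Tuple


def group_bio(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Merge BIO-tagged tokens into (label, text) entities."""
    entities: List[Tuple[str, str]] = []
    label, tokens = None, []

    def flush() -> None:
        # Emit the entity collected so far, if any.
        if tokens:
            entities.append((label, " ".join(tokens)))

    for token, tag in tagged:
        prefix, _, tag_label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and tag_label != label):
            # A new entity starts (a stray I- tag is treated like B-).
            flush()
            label, tokens = tag_label, [token]
        elif prefix == "I":
            tokens.append(token)
        else:
            # "O" closes any open entity.
            flush()
            label, tokens = None, []
    flush()
    return entities


# Hand-written sample in the same shape as the extract_ner output above.
sample_tags = [
    ("Maria", "O"), ("Saturday", "B-DAT"), ("evening", "B-TIM"),
    ("at", "I-TIM"), ("5", "I-TIM"), ("called", "O"),
]

print(group_bio(sample_tags))
```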


File Structure:

Parstdex's architecture is designed to be flexible and extensible, so patterns that are not yet covered can be added easily.

├── parstdex
│   ├── utils
│   │   ├── annotation
│   │   │   └── ...
│   │   ├── pattern
│   │   │   └── ...
│   │   ├── special_words
│   │   │   └── words.txt
│   │   ├── const.py
│   │   ├── normalizer.py
│   │   ├── pattern_to_regex.py
│   │   ├── deprecation.py
│   │   ├── regex_tool.py
│   │   ├── spans.py
│   │   └── tokenizer.py
│   ├── marker_extractor.py
│   └── settings.py
├── Test
│   ├── data.json
│   └── test_parstdex.py
├── examples.py
├── performance_test.ipynb
├── requirement.txt
└── setup.py

Performance Test

The executable code and performance test results are available on Google Colab.

Extracting temporal expressions takes 6 ms on average. The test used 264 sentences, with an average length of 50 characters, that together cover all of the patterns.
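A comparable measurement can be sketched with `time.perf_counter`. This is not the authors' benchmark script; `average_latency_ms` is an illustrative helper that accepts any callable, and the stand-in extractor below only exists so the sketch runs without parstdex installed.

```python
import time
from typing import Callable, Iterable


def average_latency_ms(extract: Callable[[str], object], sentences: Iterable[str]) -> float:
    """Return the mean wall-clock time per sentence, in milliseconds."""
    items = list(sentences)
    if not items:
        raise ValueError("at least one sentence is required")
    start = time.perf_counter()
    for sentence in items:
        extract(sentence)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / len(items)


# Stand-in extractor (assumption): replace with model.extract_span for a real run.
def fake_extract(sentence: str) -> list:
    return sentence.split()


print(f"{average_latency_ms(fake_extract, ['call me at 5 pm', 'see you Friday']):.4f} ms")
```

For a real run, create `model = Parstdex()` once outside the timed loop and pass `model.extract_span` as `extract`, so that model construction is not counted in the average.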

How to contribute

Please feel free to provide us with any feedback or suggestions. You can find more information on how to contribute to Parstdex by reading the contribution document.

Citation

If you use any part of this library in your research, please cite it using the following BibTeX entry.

@misc{parstdex,
  author = {Kargaran, Amir Hossein and Mirzababaei, Sajad and Jahad, Hamid},
  title = {Parstdex: Persian Time Date Extractor Python Library},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub Repository},
  howpublished = {\url{https://github.com/kargaranamir/parstdex}},
}
