Persian time and date marker extractor
Parstdex (Persian Time Date Extractor) - پارس تیدِکس
How to Install parstdex
pip install parstdex
How to use
from parstdex import Parstdex
model = Parstdex()
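# English gloss: "Maria called Nadia on Saturday evening at exactly 17:23, but until three days later, on 18 Shahrivar 1378 SH, there was no news from Nadia."
sentence = """ماریا شنبه عصر راس ساعت ۱۷ و بیست و سه دقیقه به نادیا زنگ زد اما تا سه روز بعد در تاریخ ۱۸ شهریور سال ۱۳۷۸ ه.ش. خبری از نادیا نشد"""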
Extract spans
model.extract_span(sentence)
Output:
{"datetime": [[6, 47], [68, 78], [82, 111]], "date": [[6, 10], [68, 78], [82, 111]], "time": [[11, 47]]}
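In this example the spans behave like half-open character offsets into the input string, so plain Python slicing recovers the matched text. The sketch below hard-codes the output shown above instead of calling the library, and `spans_to_text` is a hypothetical helper, not part of parstdex. Whether offsets always line up with the raw input (for example after text normalization) is an assumption to verify on your own data.

```python
# Hypothetical helper (not part of parstdex): map each span to its substring.
# Assumes spans are [start, end) character offsets into the input string.
sentence = """ماریا شنبه عصر راس ساعت ۱۷ و بیست و سه دقیقه به نادیا زنگ زد اما تا سه روز بعد در تاریخ ۱۸ شهریور سال ۱۳۷۸ ه.ش. خبری از نادیا نشد"""

# Output of model.extract_span(sentence), copied from the example above.
spans = {
    "datetime": [[6, 47], [68, 78], [82, 111]],
    "date": [[6, 10], [68, 78], [82, 111]],
    "time": [[11, 47]],
}


def spans_to_text(text, spans_by_type):
    """Return {type: [substring, ...]} for every span of every type."""
    return {
        kind: [text[start:end] for start, end in pairs]
        for kind, pairs in spans_by_type.items()
    }


texts = spans_to_text(sentence, spans)
for kind, fragments in texts.items():
    print(kind, fragments)
```

Keeping the spans as offsets and slicing on demand avoids storing the text twice and lets you highlight matches in the original sentence.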
Extract markers
model.extract_marker(sentence)
{
"datetime":{
"[6, 47]":"شنبه عصر راس ساعت ۱۷ و بیست و سه دقیقه به",
"[68, 78]":"سه روز بعد",
"[82, 111]":"تاریخ ۱۸ شهریور سال ۱۳۷۸ ه.ش."
},
"date":{
"[6, 10]":"شنبه",
"[68, 78]":"سه روز بعد",
"[82, 111]":"تاریخ ۱۸ شهریور سال ۱۳۷۸ ه.ش."
},
"time":{
"[11, 47]":"عصر راس ساعت ۱۷ و بیست و سه دقیقه به"
}
}
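The marker dictionary above is keyed by the span written as a string such as `"[6, 47]"`. If you need the offsets as integers again, one option is to parse each key as JSON. This is a minimal sketch that assumes the keys really are strings in that exact form; `parse_marker_keys` is a hypothetical helper and the dictionary is an excerpt of the output above.

```python
import json

# Excerpt of the extract_marker output shown above (only the "date" entry).
markers = {
    "date": {
        "[6, 10]": "شنبه",
        "[68, 78]": "سه روز بعد",
        "[82, 111]": "تاریخ ۱۸ شهریور سال ۱۳۷۸ ه.ش.",
    }
}


def parse_marker_keys(markers_by_type):
    """Hypothetical helper: turn '[start, end]' keys into (start, end) tuples."""
    return {
        kind: [(tuple(json.loads(key)), text) for key, text in entries.items()]
        for kind, entries in markers_by_type.items()
    }


parsed = parse_marker_keys(markers)
for (start, end), text in parsed["date"]:
    print(start, end, text)
```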
Extract TimeML scheme
model.extract_time_ml(sentence)
Output:
ماریا
<TIMEX3 type='DATE'>
شنبه
</TIMEX3>
<TIMEX3 type='TIME'>
عصر راس ساعت ۱۷ و بیست و سه دقیقه به
</TIMEX3>
نادیا زنگ زد اما
<TIMEX3 type='DURATION'>
تا سه روز بعد
</TIMEX3>
در
<TIMEX3 type='DATE'>
تاریخ ۱۸ شهریور سال ۱۳۷۸ ه.ش.
</TIMEX3>
خبری از نادیا نشد
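To work with the TimeML output programmatically, the `TIMEX3` elements can be pulled out with a regular expression. The sketch below assumes that `extract_time_ml` returns the annotated sentence as a single string and that the line breaks in the listing above are only a rendering artifact; the sample string is rebuilt by hand from that listing, and `timex_entries` is a hypothetical helper.

```python
import re

# Annotated sentence rebuilt from the listing above (assumed to be one string).
time_ml = (
    "ماریا <TIMEX3 type='DATE'>شنبه</TIMEX3> "
    "<TIMEX3 type='TIME'>عصر راس ساعت ۱۷ و بیست و سه دقیقه به</TIMEX3> "
    "نادیا زنگ زد اما <TIMEX3 type='DURATION'>تا سه روز بعد</TIMEX3> در "
    "<TIMEX3 type='DATE'>تاریخ ۱۸ شهریور سال ۱۳۷۸ ه.ش.</TIMEX3> خبری از نادیا نشد"
)

# Non-greedy match so each element is captured separately.
TIMEX3 = re.compile(
    r"<TIMEX3 type='(?P<type>[A-Z]+)'>(?P<text>.*?)</TIMEX3>", re.DOTALL
)


def timex_entries(annotated):
    """Hypothetical helper: list (type, text) pairs for every TIMEX3 element."""
    return [
        (match.group("type"), match.group("text").strip())
        for match in TIMEX3.finditer(annotated)
    ]


entries = timex_entries(time_ml)
for timex_type, text in entries:
    print(timex_type, text)
```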
Extract markers' NER tags
DATTIM mode (Default):
model.extract_ner(sentence, mode="dattim")
Output:
[
("ماریا", "O"),
("شنبه", "B-DAT"),
("عصر", "B-TIM"),
("راس", "I-TIM"),
("ساعت", "I-TIM"),
("۱۷", "I-TIM"),
("و", "I-TIM"),
("بیست", "I-TIM"),
("و", "I-TIM"),
("سه", "I-TIM"),
("دقیقه", "I-TIM"),
("به", "I-TIM"),
("نادیا", "O"),
("زنگ", "O"),
("زد", "O"),
("اما", "O"),
("تا", "B-DAT"),
("سه", "I-DAT"),
("روز", "I-DAT"),
("بعد", "I-DAT"),
("در", "I-DAT"),
("تاریخ", "I-DAT"),
("۱۸", "I-DAT"),
("شهریور", "I-DAT"),
("سال", "I-DAT"),
("۱۳۷۸", "I-DAT"),
("ه", "I-DAT"),
(".", "I-DAT"),
("ش", "I-DAT"),
(".", "I-DAT"),
("خبری", "O"),
("از", "O"),
("نادیا", "O"),
("نشد", "O"),
]
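The tags follow the usual BIO convention: `B-` opens an entity, `I-` continues it, and `O` marks tokens outside any entity. A common next step is to group the tagged tokens back into entities. The sketch below does that for the first thirteen tokens of the output above; `group_bio` is a hypothetical helper, not a parstdex function.

```python
# First thirteen (token, tag) pairs of the DATTIM output shown above.
tagged = [
    ("ماریا", "O"), ("شنبه", "B-DAT"), ("عصر", "B-TIM"), ("راس", "I-TIM"),
    ("ساعت", "I-TIM"), ("۱۷", "I-TIM"), ("و", "I-TIM"), ("بیست", "I-TIM"),
    ("و", "I-TIM"), ("سه", "I-TIM"), ("دقیقه", "I-TIM"), ("به", "I-TIM"),
    ("نادیا", "O"),
]


def group_bio(tagged_tokens):
    """Hypothetical helper: merge BIO-tagged tokens into (label, text) entities."""
    entities = []
    label, tokens = None, []
    for token, tag in tagged_tokens:
        prefix, _, kind = tag.partition("-")
        if prefix == "I" and kind == label:
            tokens.append(token)  # continue the open entity
            continue
        if tokens:
            entities.append((label, " ".join(tokens)))  # close the open entity
        if prefix in ("B", "I"):
            label, tokens = kind, [token]  # start a new entity
        else:
            label, tokens = None, []  # "O": outside any entity
    if tokens:
        entities.append((label, " ".join(tokens)))
    return entities


entities = group_bio(tagged)
for label, text in entities:
    print(label, text)
```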
TMP mode:
model.extract_ner(sentence, mode="tmp")
Output:
[
("ماریا", "O"),
("شنبه", "B-TMP"),
("عصر", "I-TMP"),
("راس", "I-TMP"),
("ساعت", "I-TMP"),
("۱۷", "I-TMP"),
("و", "I-TMP"),
("بیست", "I-TMP"),
("و", "I-TMP"),
("سه", "I-TMP"),
("دقیقه", "I-TMP"),
("به", "I-TMP"),
("نادیا", "O"),
("زنگ", "O"),
("زد", "O"),
("اما", "O"),
("تا", "B-TMP"),
("سه", "I-TMP"),
("روز", "I-TMP"),
("بعد", "I-TMP"),
("در", "I-TMP"),
("تاریخ", "I-TMP"),
("۱۸", "I-TMP"),
("شهریور", "I-TMP"),
("سال", "I-TMP"),
("۱۳۷۸", "I-TMP"),
("ه", "I-TMP"),
(".", "I-TMP"),
("ش", "I-TMP"),
(".", "I-TMP"),
("خبری", "O"),
("از", "O"),
("نادیا", "O"),
("نشد", "O"),
]
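Comparing the two listings, TMP mode looks like DATTIM mode with the date/time distinction dropped: adjacent date and time markers merge into one `TMP` entity, so a `B-TIM` that directly follows a date tag becomes `I-TMP`. The sketch below reproduces that relationship for the first five tokens. It is an observation drawn from this example, not a description of how the library computes the tags, and `dattim_to_tmp` is a hypothetical helper.

```python
# First five (token, tag) pairs of the DATTIM output shown earlier.
dattim = [
    ("ماریا", "O"), ("شنبه", "B-DAT"), ("عصر", "B-TIM"),
    ("راس", "I-TIM"), ("ساعت", "I-TIM"),
]


def dattim_to_tmp(tagged_tokens):
    """Hypothetical helper: collapse DAT/TIM tags into a single TMP label."""
    converted = []
    previous = "O"
    for token, tag in tagged_tokens:
        if tag == "O":
            new_tag = "O"
        elif previous == "O":
            new_tag = "B-TMP"  # first temporal token after a non-temporal one
        else:
            new_tag = "I-TMP"  # any temporal token that follows another
        converted.append((token, new_tag))
        previous = tag
    return converted


tmp = dattim_to_tmp(dattim)
for token, tag in tmp:
    print(token, tag)
```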
File Structure:
Parstdex's architecture is flexible and scalable, which makes it easy to add support for new patterns that are not covered yet.
├── parstdex
│   ├── utils
│   │   ├── annotation
│   │   │   └── ...
│   │   ├── pattern
│   │   │   └── ...
│   │   ├── special_words
│   │   │   └── words.txt
│   │   ├── const.py
│   │   ├── normalizer.py
│   │   ├── pattern_to_regex.py
│   │   ├── deprecation.py
│   │   ├── regex_tool.py
│   │   ├── spans.py
│   │   └── tokenizer.py
│   ├── marker_extractor.py
│   └── settings.py
├── Test
│   ├── data.json
│   └── test_parstdex.py
├── examples.py
├── performance_test.ipynb
├── requirement.txt
└── setup.py
Performance Test
The executable code and performance test results are available on Google Colab.
The average time required to extract temporal expressions is 6 ms. This test was conducted on 264 sentences, with an average length of 50 characters, that cover all of the patterns.
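If you want to take a comparable measurement on your own sentences, a small timing harness is enough. The sketch below is not the authors' benchmark (that lives in the notebook mentioned above); `average_latency_ms` and `dummy_extract` are hypothetical names, and the stand-in extractor exists only so the code runs without parstdex installed.

```python
import time


def average_latency_ms(extract, sentences, repeats=3):
    """Average wall-clock time per call to `extract`, in milliseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        for sentence in sentences:
            extract(sentence)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / (repeats * len(sentences))


# Stand-in extractor so the sketch runs on its own; to time the library,
# pass Parstdex().extract_span instead.
def dummy_extract(sentence):
    return {"datetime": [], "date": [], "time": []}


latency = average_latency_ms(dummy_extract, ["سه روز بعد", "ساعت ۱۷"])
print(f"{latency:.4f} ms per sentence")
```

Create the model once outside the timed loop so that start-up cost is not counted as extraction time.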
How to contribute
Please feel free to provide us with any feedback or suggestions. You can find more information on how to contribute to Parstdex by reading the contribution document.
Citation
If you use any part of this library in your research, please cite it using the following BibTeX entry.
@misc{parstdex,
author = {Kargaran, Amir Hossein and Mirzababaei, Sajad and Jahad, Hamid},
title = {Parstdex: Persian Time Date Extractor Python Library},
year = {2021},
publisher = {GitHub},
journal = {GitHub Repository},
howpublished = {\url{https://github.com/kargaranamir/parstdex}},
}