An AI-backed name parsing package for Middle Eastern and other languages.
Project description
MEHDIE Arabic Name Parser
This is an AI-backed name parser that splits names into their constituent parts. The parser uses an OpenAI GPT chat
model that is prompted with explanations of how to parse names in the chosen language, script and, where relevant,
historical period. The GPT model is also given a language- and script-specific example.
The model's reply is converted into a structured response by the phi.agent framework, and post-processing rules are
then applied to correct common mistakes the agent makes.
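The flow described above (language-specific instructions plus one example go into the prompt, and the structured reply is cleaned up afterwards) can be sketched roughly as follows. This is only an illustration: the PROMPTS table, the ParsedName fields, and the post_process rule are hypothetical and are not taken from the package's source.

```python
from dataclasses import dataclass, field

# Hypothetical prompt table keyed by a language-script code; the real
# package keeps its own prompts and examples.
PROMPTS = {
    "arL": {
        "instructions": "Split the Arabic name (Latin script) into its parts.",
        "example": "Abū Bakr Muḥammad b. Zakariyyā al-Rāzī",
    },
}


@dataclass
class ParsedName:
    # Hypothetical structured response; the field names are assumptions.
    given_name: str = ""
    patronymics: list[str] = field(default_factory=list)


def build_prompt(name: str, language: str, background_info: str = "") -> str:
    """Combine instructions, one example, and the name into one prompt."""
    entry = PROMPTS[language]
    lines = [entry["instructions"], f"Example: {entry['example']}"]
    if background_info:
        lines.append(f"Background: {background_info}")
    lines.append(f"Name: {name}")
    return "\n".join(lines)


def post_process(parsed: ParsedName) -> ParsedName:
    """Example rule: trim whitespace and drop empty patronymics."""
    cleaned = [p.strip() for p in parsed.patronymics if p.strip()]
    return ParsedName(given_name=parsed.given_name.strip(), patronymics=cleaned)
```

Keeping the prompt text in a table keyed by language and script means a new language can be supported by adding an entry rather than changing code, which is one plausible reason for the contribution model described later.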
The parser was developed as part of the MEHDIE project (https://mehdie.org/).
MEHDIE is funded by the Israel Ministry of Science and Technology (MOST).
Usage
The parser can be used to parse a single name or a tab-separated file containing names.
Set up
- Choose which language and script to parse from. Supported language-script combinations can be shown by running:
print(parse.get_supported_languages())
- Choose an AI model to use. The model string needs to be one of the valid models specified in the OpenAI API or the Anthropic API.
- Set an environment variable for your chosen AI provider's API key (e.g.
export OPENAI_API_KEY=your-api-key or export ANTHROPIC_API_KEY=your-api-key).
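Before calling the parser, it can help to confirm that the key for the chosen provider is actually set. The helper below is a minimal sketch and not part of parse_me; it assumes that Anthropic model names start with "claude" and that every other model name belongs to OpenAI, which may not match how the package itself selects a provider.

```python
import os


# Assumption: Anthropic model names start with "claude"; everything else
# is treated as an OpenAI model. The package may decide this differently.
def required_api_key_var(model_name: str) -> str:
    if model_name.lower().startswith("claude"):
        return "ANTHROPIC_API_KEY"
    return "OPENAI_API_KEY"


def api_key_is_set(model_name: str, env=None) -> bool:
    """Return True if the key variable for this model is set and non-empty."""
    env = os.environ if env is None else env
    return bool(env.get(required_api_key_var(model_name)))
```

For example, api_key_is_set("gpt-4o") checks OPENAI_API_KEY in the current environment.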
Parsing a single name
from parse_me.parse import parse_name
result = parse_name(name="Abū Ayyūb Sulaymān b. Yaḥyā b. Ǧabīrūl al-Qurṭubī", language="arL",
background_info="A fighter and a poet", model_name="gpt-o1-mini")
Use the background_info parameter to provide additional context that can help the parser understand the name better.
Use the model_name parameter to specify the AI model to use, e.g. gpt-4o or claude-3-5-sonnet-latest.
The model must belong to the provider whose API key you have set up.
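To parse a handful of names without preparing a file, the single-name call can be wrapped in a loop. The helper below is a hypothetical sketch, not part of parse_me: it takes the parser as an argument, passes the remaining keyword arguments through unchanged, and treats the return value as opaque because its type is not described here.

```python
from typing import Any, Callable, Iterable


def parse_many(
    names: Iterable[str],
    parser: Callable[..., Any],
    **parser_kwargs: Any,
) -> dict[str, Any]:
    """Call `parser` once per name; store the result or the raised exception."""
    results: dict[str, Any] = {}
    for name in names:
        try:
            results[name] = parser(name=name, **parser_kwargs)
        except Exception as exc:  # keep going if one name fails
            results[name] = exc
    return results
```

With the package installed, a call would look like parse_many(names, parse_name, language="arL", model_name="gpt-4o"). Each name triggers one model request, so a failed request for one name does not discard the results already collected.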
Parsing a file with names
from parse_me.parse import parse_tsv
result_file = parse_tsv(tsv_file='/data/names.tsv', column_name='person_name', language='he', background_column_name='description',
model_name='gpt-4o')
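The call above reads the names from one column and the optional background text from another. A minimal sketch for producing such a file with the standard csv module is shown below; it assumes that parse_tsv expects a header row whose column names match column_name and background_column_name, which the parameter names suggest but this description does not confirm. The write_names_tsv helper is hypothetical.

```python
import csv


def write_names_tsv(path, rows, name_col="person_name", info_col="description"):
    """Write (name, background) pairs to a tab-separated file with a header."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow([name_col, info_col])
        writer.writerows(rows)
    return path
```

Writing the file as UTF-8 keeps diacritics and non-Latin scripts intact, which matters for transliterated names such as the Arabic example earlier.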
Contributing
We invite users to contribute new prompts and examples for existing and new languages, scripts and historical periods. To contribute, edit the parsing_prompts and open a pull request, or open an issue describing suggested post-processing rules or mistakes the parser made.
Download files
File details
Details for the file parse_me-0.2.0.tar.gz.
File metadata
- Download URL: parse_me-0.2.0.tar.gz
- Upload date:
- Size: 12.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 39723d016ea7f32e568a71c5640d55c0d2ced28a168ce511e558a1e1e4ed07e3 |
| MD5 | bbe6e85c3d25666f83cb6339bd14c077 |
| BLAKE2b-256 | d3e57f3de7e11bd4b6e43354515baedaee2b8ee6dea8cb3a3115466c0e387848 |
File details
Details for the file parse_me-0.2.0-py3-none-any.whl.
File metadata
- Download URL: parse_me-0.2.0-py3-none-any.whl
- Upload date:
- Size: 12.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f24feb9d6edee481999ede83cbe8c96302f3c6d56412bebf4d00191bef59c67b |
| MD5 | 47f93dbd35e0c57d71888da4fd004bb8 |
| BLAKE2b-256 | b2ca9a6d541d3484b3307760374adb34614e5b389e66da405df771b707107ba1 |