An AI-backed name-parsing package for Middle Eastern and other languages.

Project description

MEHDIE Arabic Name Parser

This is an AI-backed name parser that splits names into their constituent parts. The parser uses an OpenAI GPT chat model, prompted with explanations of how to parse names in the chosen language, script and, in some cases, historical period. The model is also given a language- and script-specific example. The phi.agent framework parses the model's reply into a structured response, and post-processing rules then correct some of the agent's common mistakes.
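
As a rough illustration of the flow described above, the sketch below assembles a language-specific prompt and applies one clean-up rule to a structured result. The model call is left out entirely, and PROMPTS, ParsedName, build_prompt and post_process are hypothetical names chosen for this example; they are not the package's actual prompts, fields or rules.

```python
from dataclasses import dataclass, field

# Hypothetical per-language prompt table; the real package keeps its own prompts.
PROMPTS = {
    "arL": {
        "instructions": "Split the name into its constituent parts.",
        "example": "Abū Bakr Muḥammad b. Zakariyyā al-Rāzī",
    },
}


@dataclass
class ParsedName:
    """Hypothetical structured response; the field names are illustrative only."""
    given_name: str = ""
    patronymics: list = field(default_factory=list)


def build_prompt(name, language, background_info=""):
    """Combine the language-specific instructions, one example and the name."""
    entry = PROMPTS[language]
    parts = [entry["instructions"], f"Example: {entry['example']}"]
    if background_info:
        parts.append(f"Background: {background_info}")
    parts.append(f"Name: {name}")
    return "\n".join(parts)


def post_process(parsed):
    """Illustrative clean-up rule: trim whitespace and drop empty patronymics."""
    return ParsedName(
        given_name=parsed.given_name.strip(),
        patronymics=[p.strip() for p in parsed.patronymics if p.strip()],
    )
```

Keeping prompt assembly and post-processing as separate functions means either stage can be checked without calling a model.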

The parser was developed as part of the MEHDIE project (https://mehdie.org/).

MEHDIE is funded by the Israel Ministry of Science and Technology (MOST).

Usage

The parser can parse a single name or a tab-separated file of names.

Set up

  1. Choose which language and script to parse from. Supported language-script combinations can be shown by running: print(parse.get_supported_languages())
  2. Choose an AI model to use. The model string needs to be one of the valid models specified in the OpenAI API or the Anthropic API.
  3. Set an environment variable for your chosen AI provider's API key (e.g. export OPENAI_API_KEY=your-api-key or export ANTHROPIC_API_KEY=your-api-key).
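
Before calling the parser, it can help to confirm that the key matching the chosen model is present. The sketch below is a minimal check, assuming that model names beginning with "gpt" belong to OpenAI and those beginning with "claude" belong to Anthropic; that prefix rule and the helper names are assumptions made for this example, not part of the package, and other model names would need their own entries.

```python
import os

# Assumed prefix-to-variable mapping, based on the two providers named above.
PROVIDER_KEYS = {"gpt": "OPENAI_API_KEY", "claude": "ANTHROPIC_API_KEY"}


def required_key_variable(model_name):
    """Return the environment variable expected for the given model name."""
    for prefix, variable in PROVIDER_KEYS.items():
        if model_name.startswith(prefix):
            return variable
    raise ValueError(f"Cannot infer the provider for model {model_name!r}")


def api_key_is_set(model_name, environ=None):
    """Check that the matching key exists and is non-empty."""
    environ = os.environ if environ is None else environ
    return bool(environ.get(required_key_variable(model_name)))
```

For example, api_key_is_set("gpt-4o") reports whether OPENAI_API_KEY is defined in the current environment.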

Parsing a single name

from parse_me.parse import parse_name

# Parse one name; background_info gives the model extra context about the person.
result = parse_name(name="Abū Ayyūb Sulaymān b. Yaḥyā b. Ǧabīrūl al-Qurṭubī", language="arL",
                    background_info="A fighter and a poet", model_name="gpt-o1-mini")

Use the background_info parameter to provide additional context that can help the parser understand the name better. Use the model_name parameter to specify the AI model to use, e.g. gpt-4o or claude-3-5-sonnet-latest. This model should match the API key you have set up.

Parsing a file with names

from parse_me.parse import parse_tsv

result_file = parse_tsv(tsv_file='/data/names.tsv',
                        column_name='person_name',
                        language='he',
                        background_column_name='description',
                        model_name='gpt-4o')
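
The call above names two columns, person_name and description. The sketch below writes an input file in that shape using only the standard library. It assumes the file starts with a header row carrying those column names, which the source does not state explicitly; write_names_tsv and the sample row are illustrative.

```python
import csv


def write_names_tsv(path, rows):
    """Write rows (dicts with person_name and description) as a tab-separated file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle,
            fieldnames=["person_name", "description"],
            delimiter="\t",
        )
        writer.writeheader()  # assumed: the parser looks columns up by header name
        writer.writerows(rows)
```

A call such as write_names_tsv("names.tsv", [{"person_name": "משה בן מימון", "description": "A philosopher and physician"}]) produces a file whose path can then be passed as tsv_file.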

Contributing

We invite users to contribute new prompts and examples for existing and new languages, scripts and historical periods. Edit the parsing_prompts and open a pull request, or open an issue describing suggested post-processing rules or mistakes the parser made.

Download files

Source Distribution

parse_me-0.2.0.tar.gz (12.3 kB view details)

Uploaded Source

Built Distribution

parse_me-0.2.0-py3-none-any.whl (12.1 kB view details)

Uploaded Python 3

File details

Details for the file parse_me-0.2.0.tar.gz.

File metadata

  • Download URL: parse_me-0.2.0.tar.gz
  • Upload date:
  • Size: 12.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.9

File hashes

Hashes for parse_me-0.2.0.tar.gz
Algorithm    Hash digest
SHA256       39723d016ea7f32e568a71c5640d55c0d2ced28a168ce511e558a1e1e4ed07e3
MD5          bbe6e85c3d25666f83cb6339bd14c077
BLAKE2b-256  d3e57f3de7e11bd4b6e43354515baedaee2b8ee6dea8cb3a3115466c0e387848
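
A downloaded archive can be checked against the SHA256 digest listed above. The sketch below uses only the standard library; sha256_of_file is a helper name chosen for this example, and the local file path is whatever location the archive was saved to.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Digest published for parse_me-0.2.0.tar.gz in the listing above.
EXPECTED_SHA256 = "39723d016ea7f32e568a71c5640d55c0d2ced28a168ce511e558a1e1e4ed07e3"
```

Comparing sha256_of_file("parse_me-0.2.0.tar.gz") with EXPECTED_SHA256 shows whether the local copy matches the published file.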

File details

Details for the file parse_me-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: parse_me-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 12.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.9

File hashes

Hashes for parse_me-0.2.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       f24feb9d6edee481999ede83cbe8c96302f3c6d56412bebf4d00191bef59c67b
MD5          47f93dbd35e0c57d71888da4fd004bb8
BLAKE2b-256  b2ca9a6d541d3484b3307760374adb34614e5b389e66da405df771b707107ba1
