
An AI-backed name parsing package for Middle Eastern and other languages.

Project description

MEHDIE Arabic Name Parser

This is an AI-backed name parser that splits names into their constituent parts. The parser uses an OpenAI GPT chat model that is prompted with explanations of how to parse names in the chosen language, script and, in some cases, historical period. The model is also given a language- and script-specific example. The phi.agent framework parses the response into a structured result, and post-processing rules are then applied to correct some of the agent's common mistakes.
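The flow described above (language-specific instructions plus one example, followed by rule-based clean-up) can be sketched roughly as follows. This is an illustration only: `PROMPTS`, `build_prompt` and `post_process` are hypothetical names that are not part of parse_me, the prompt wording is invented for the example, and the model call itself is left out. The prompts and rules actually shipped with the package may look quite different.

```python
# Hypothetical prompt table keyed by language code. The real package keeps
# its own prompts; the wording here is made up for illustration.
PROMPTS = {
    "arL": {
        "instructions": (
            "Split the Arabic name (Latin script) into kunya, ism, "
            "nasab and nisba."
        ),
        "example": (
            "Abū Bakr Muḥammad b. Zakariyyā al-Rāzī -> kunya: Abū Bakr; "
            "ism: Muḥammad; nasab: b. Zakariyyā; nisba: al-Rāzī"
        ),
    },
}


def build_prompt(name, language, background_info=""):
    """Assemble instructions, one worked example and the name to parse."""
    entry = PROMPTS[language]
    parts = [entry["instructions"], "Example: " + entry["example"]]
    if background_info:
        # Background is optional context, e.g. a short description of the person.
        parts.append("Background: " + background_info)
    parts.append("Name: " + name)
    return "\n".join(parts)


def post_process(parsed):
    """Illustrative clean-up rule: trim whitespace and drop empty fields."""
    cleaned = {}
    for field, value in parsed.items():
        value = value.strip()
        if value:
            cleaned[field] = value
    return cleaned
```

In this sketch the prompt text would be sent to the chat model, and the structured reply (here assumed to be a plain dictionary of name parts) would be passed through `post_process` before being returned.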

The parser was developed as part of the MEHDIE project (https://mehdie.org/).

MEHDIE is funded by the Israel Ministry of Science and Technology (MOST).

Usage

The parser can be used to parse a single name or a tab-separated file containing names.

Set up

  1. Choose which language and script to parse. The supported language-script combinations can be listed by running: print(parse.get_supported_languages())
  2. Choose an OpenAI model to use. The model string must be one of the valid model names specified in the OpenAI API.

Parsing a single name

from parse_me.parse import parse_name

# model_name must be a model identifier accepted by the OpenAI API
result = parse_name(name="Abū Ayyūb Sulaymān b. Yaḥyā b. Ǧabīrūl al-Qurṭubī", language="arL",
                    background_info="A fighter and a poet", model_name="o1-mini")

Parsing a file with names

from parse_me.parse import parse_tsv

result_file = parse_tsv(
    tsv_file='/data/names.tsv',
    column_name='person_name',
    language='he',
    background_column_name='description',
    model_name='gpt-4o',
)
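As a hedged illustration of the input the call above expects, the following sketch writes a small tab-separated file with a `person_name` column and a `description` column, then reads the name column back. It uses only the Python standard library; `write_names_tsv` and `read_column` are hypothetical helpers and are not part of parse_me, and the exact file layout the package requires is an assumption based on the parameters shown above.

```python
import csv
import os
import tempfile


def write_names_tsv(path, rows):
    """Write dicts with 'person_name' and 'description' keys as a TSV file."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.DictWriter(
            handle,
            fieldnames=["person_name", "description"],
            delimiter="\t",
        )
        writer.writeheader()
        writer.writerows(rows)


def read_column(path, column_name):
    """Return the values of one column from a TSV file with a header row."""
    with open(path, encoding="utf-8", newline="") as handle:
        return [row[column_name] for row in csv.DictReader(handle, delimiter="\t")]


rows = [
    {"person_name": "משה בן מימון", "description": "Philosopher and physician"},
    {"person_name": "יהודה הלוי", "description": "Poet"},
]

# Write to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "names.tsv")
    write_names_tsv(path, rows)
    names = read_column(path, "person_name")
```

A file prepared this way would then be passed to parse_tsv through tsv_file, with column_name and background_column_name matching the header row.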

Contributing

We invite users to contribute new prompts and examples for existing and new languages, scripts and historical periods. To contribute, edit the parsing_prompts and open a pull request, or open an issue describing suggested post-processing rules or mistakes the parser made.

Download files


Source Distribution

parse_me-0.1.0.tar.gz (11.7 kB)

Uploaded Source

Built Distribution


parse_me-0.1.0-py3-none-any.whl (11.6 kB)

Uploaded Python 3

File details

Details for the file parse_me-0.1.0.tar.gz.

File metadata

  • Download URL: parse_me-0.1.0.tar.gz
  • Size: 11.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.9

File hashes

Hashes for parse_me-0.1.0.tar.gz
Algorithm Hash digest
SHA256 c4a5dbf6c8bdcd382c53e7642658da3b6c95fe1a6834ac53b22a3b592e2cd848
MD5 5e8f0ff4ccbc31b3910f85974fef460b
BLAKE2b-256 530ddd0022b23905131e4dfe626ad85dee2d2df9ae49206a0690fa191912a582


File details

Details for the file parse_me-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: parse_me-0.1.0-py3-none-any.whl
  • Size: 11.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.9

File hashes

Hashes for parse_me-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 0632da236bd71e6e978b873e15004c5b2c1f205dfacfa0d711461d276fc042b7
MD5 386e862f17cafa51307b8d3fb30faa6c
BLAKE2b-256 a612678615d93a88a89b7a80324c1764d02a234e4165fc36d39f2118f77da1be

