Regex-based parser for structuring academic author affiliation strings.

Affiliation Regex Parser

A lightweight, regex-based Python library for parsing academic author affiliation strings into structured fields (departments, institutes, universities, organizations, cities, countries, emails, postcodes, and unknown segments).
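To make the idea of "structured fields" concrete, here is a minimal, standalone sketch of regex-based bucketing. It does not use this package's API: the function `parse_affiliation`, the patterns, and the output layout are all hypothetical and written only for illustration. The library's real patterns and result type may differ, so see cookbook.ipynb for actual usage.

```python
import re

# Hypothetical patterns for illustration only; the library's real patterns may differ.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
FIELD_PATTERNS = [
    ("departments", re.compile(r"\b(?:Department|Dept\.?)\b", re.IGNORECASE)),
    ("institutes", re.compile(r"\bInstitute\b", re.IGNORECASE)),
    ("universities", re.compile(r"\bUniversity\b", re.IGNORECASE)),
]


def parse_affiliation(text: str) -> dict[str, list[str]]:
    """Split an affiliation string on commas and bucket each segment."""
    result: dict[str, list[str]] = {name: [] for name, _ in FIELD_PATTERNS}
    result["emails"] = EMAIL_RE.findall(text)
    result["unknown"] = []
    # Remove emails first so they do not end up inside a comma-separated segment.
    remainder = EMAIL_RE.sub("", text)
    for segment in (s.strip(" .;") for s in remainder.split(",")):
        if not segment:
            continue
        for name, pattern in FIELD_PATTERNS:
            if pattern.search(segment):
                result[name].append(segment)
                break
        else:
            # No pattern matched: keep the segment rather than dropping it.
            result["unknown"].append(segment)
    return result


parsed = parse_affiliation(
    "Department of Physics, University of Oslo, Oslo, Norway, jane.doe@example.org"
)
```

In this sketch, segments that match no pattern (here "Oslo" and "Norway") land in `unknown` instead of being discarded, which mirrors the "unknown segments" field mentioned above.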

Examples and usage patterns are documented in cookbook.ipynb.

Install

Core install:

pip install affiliation-regex-parser

Optional Excel support (for worldcities.xlsx-based city→country inference):

pip install "affiliation-regex-parser[excel]"

Features

  • Regex-first, deterministic parsing (no ML dependencies).
  • Pluggable architecture via providers (e.g., custom city lists, custom inference).
  • Optional city→country inference using a world cities dataset.
  • Designed for batch parsing (reuse one parser instance).
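The provider and batch-reuse ideas in the list above can be sketched as follows. This is an assumption-laden illustration, not the package's actual interface: `CountryProvider`, `DictCountryProvider`, `SketchParser`, and `infer_country` are hypothetical names, and the tiny in-memory mapping stands in for a real world cities dataset.

```python
import re
from typing import Optional, Protocol


class CountryProvider(Protocol):
    """Hypothetical provider interface: maps a city name to a country."""

    def country_for(self, city: str) -> Optional[str]: ...


class DictCountryProvider:
    """Provider backed by an in-memory mapping (e.g. built from a cities table)."""

    def __init__(self, city_to_country: dict[str, str]) -> None:
        # Case-fold keys once so lookups are case-insensitive.
        self._lookup = {c.casefold(): country for c, country in city_to_country.items()}

    def country_for(self, city: str) -> Optional[str]:
        return self._lookup.get(city.strip().casefold())


class SketchParser:
    """Illustrative parser: compiles its pattern once and is reused across a batch."""

    def __init__(self, provider: CountryProvider) -> None:
        self._provider = provider
        self._split = re.compile(r"\s*[,;]\s*")

    def infer_country(self, affiliation: str) -> Optional[str]:
        # Scan segments right to left, since the location usually comes last.
        for segment in reversed(self._split.split(affiliation.strip())):
            country = self._provider.country_for(segment)
            if country is not None:
                return country
        return None


# Build one parser and reuse it for every string in the batch.
parser = SketchParser(DictCountryProvider({"Oslo": "Norway", "Kyoto": "Japan"}))
batch = [
    "Department of Physics, University of Oslo, Oslo",
    "Graduate School of Informatics, Kyoto University, Kyoto",
    "Unknown Lab, Nowhere",
]
countries = [parser.infer_country(a) for a in batch]
```

Because the lookup is a plain dictionary and the pattern is fixed, the same input always yields the same result, and swapping in a different provider changes the inference without touching the parser.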

Data attribution (worldcities.xlsx)

This project can use a worldcities.xlsx dataset sourced from SimpleMaps World Cities (free version), licensed under Creative Commons Attribution 4.0 (CC BY 4.0).

Source:

https://simplemaps.com/data/world-cities

If you redistribute the dataset with this package, ensure attribution is preserved per CC BY 4.0.

Documentation

  • See cookbook.ipynb for practical examples and common configurations.
  • See the docstrings in AffiliationRegexParser and provider classes for API details.

License

MIT License. See LICENSE.

Project status

This package is under active development. The public API is intended to remain stable, but parsed outputs may change between releases as patterns and fixtures evolve.
