Regex-based parser for structuring academic author affiliation strings.
Affiliation Regex Parser
A lightweight, regex-based Python library for parsing academic author affiliation strings into structured fields (departments, institutes, universities, organizations, cities, countries, emails, postcodes, and unknown segments).
Examples and usage patterns are documented in cookbook.ipynb.
Install
Core install:
pip install affiliation-regex-parser
Optional Excel support (for worldcities.xlsx-based city→country inference):
pip install "affiliation-regex-parser[excel]"
Features
- Regex-first, deterministic parsing (no ML dependencies).
- Pluggable architecture via providers (e.g., custom city lists, custom inference).
- Optional city→country inference using a world cities dataset.
- Designed for batch parsing (reuse one parser instance).
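To make the regex-first, reuse-one-instance idea concrete, here is a minimal sketch built only on the standard library. It is not this package's implementation or API: SketchParser, ParsedAffiliation, and the patterns below are hypothetical stand-ins chosen for illustration, and the real AffiliationRegexParser may be structured quite differently.

```python
import re
from dataclasses import dataclass, field

# Hypothetical patterns for illustration only; the package's real rules are not shown here.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# Ordered (field name, pattern) rules; the first matching rule claims a segment.
RULES = [
    ("departments", re.compile(r"\b(?:department|dept\.?)\b", re.I)),
    ("institutes", re.compile(r"\binstitute\b", re.I)),
    ("universities", re.compile(r"\buniversit", re.I)),
]


@dataclass
class ParsedAffiliation:
    departments: list = field(default_factory=list)
    institutes: list = field(default_factory=list)
    universities: list = field(default_factory=list)
    emails: list = field(default_factory=list)
    unknown: list = field(default_factory=list)


class SketchParser:
    """Hypothetical stand-in, not the package's AffiliationRegexParser."""

    def parse(self, text: str) -> ParsedAffiliation:
        result = ParsedAffiliation()
        result.emails = EMAIL_RE.findall(text)
        # Drop emails first so their dots and domains do not pollute segments.
        remainder = EMAIL_RE.sub("", text)
        for segment in remainder.split(","):
            segment = segment.strip(" .;")
            if not segment:
                continue
            for name, pattern in RULES:
                if pattern.search(segment):
                    getattr(result, name).append(segment)
                    break
            else:
                # No rule matched: keep the segment rather than discard it.
                result.unknown.append(segment)
        return result


# Batch use: build the parser once and reuse it for every string.
parser = SketchParser()
affiliations = [
    "Department of Physics, University of Oslo, Oslo, Norway. jane.doe@uio.no",
    "Institute of Mathematics, Example University",
]
parsed = [parser.parse(text) for text in affiliations]
```

Because the rules are plain compiled patterns applied in a fixed order, the same input always yields the same output, which is the property the "deterministic" bullet refers to. Segments that match no rule land in the unknown list instead of being dropped.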
Data attribution (worldcities.xlsx)
This project can use a worldcities.xlsx dataset sourced from SimpleMaps World Cities (free version), licensed under Creative Commons Attribution 4.0 (CC BY 4.0).
Source:
https://simplemaps.com/data/world-cities
If you redistribute the dataset with this package, ensure attribution is preserved per CC BY 4.0.
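As a rough illustration of how city→country inference over such a dataset could work, the sketch below builds a lookup from already-loaded rows. The function names and the "city" and "country" column names are assumptions made for this example, not a description of the package's loader or provider interface, and reading the .xlsx file itself is left out.

```python
from collections import defaultdict


def build_city_index(rows):
    """Map each lowercased city name to the set of countries that contain it."""
    index = defaultdict(set)
    for row in rows:
        # "city" and "country" are assumed column names for this sketch.
        index[row["city"].strip().lower()].add(row["country"])
    return index


def infer_country(city, index):
    """Return a country only when the city name maps to exactly one country."""
    countries = index.get(city.strip().lower(), set())
    return next(iter(countries)) if len(countries) == 1 else None


# Hypothetical sample rows standing in for the loaded dataset.
sample_rows = [
    {"city": "Oslo", "country": "Norway"},
    {"city": "Cambridge", "country": "United Kingdom"},
    {"city": "Cambridge", "country": "United States"},
]
city_index = build_city_index(sample_rows)
```

Returning None for a name shared by several countries is one conservative design choice; a real provider might instead rank candidates by population or use other fields of the affiliation string.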
Documentation
- See cookbook.ipynb for practical examples and common configurations.
- See the docstrings in AffiliationRegexParser and provider classes for API details.
License
MIT License. See LICENSE.
Project status
This package is under active development. The public API is intended to remain stable, but parsed output may change between releases as patterns and fixtures evolve.
Download files
File details
Details for the file affiliation_regex_parser-0.1.0.tar.gz.
File metadata
- Download URL: affiliation_regex_parser-0.1.0.tar.gz
- Upload date:
- Size: 3.5 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 06fd594fcefab55cbaf6fda59840cb4aa02dace2e2d35fe6ca13fb731ae7cc09 |
| MD5 | 02a5ac6236415089dd267c64ff32d31d |
| BLAKE2b-256 | 09b25bbcdf9ce852b97d455c67fac02772465c317f37c49afd31b663afc74ebf |
File details
Details for the file affiliation_regex_parser-0.1.0-py3-none-any.whl.
File metadata
- Download URL: affiliation_regex_parser-0.1.0-py3-none-any.whl
- Upload date:
- Size: 3.5 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8587171ddfbd382560f7743bcf445bc636cd9ac7a532b9265f7cbd2fab0c0bfe |
| MD5 | fa88ad1e02e350e287227acc2b3b5781 |
| BLAKE2b-256 | d603876f638dbe746c9848c10328f0d5bcdb4cd460bd847acf8b07f68781d2f4 |
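A downloaded file can be checked against the SHA256 digests listed above with the standard library alone. The sketch below is one way to do it; the helper names sha256_of and matches_published are made up for this example, and the local path in the usage note is whatever location the file was saved to.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex):
    """Compare a file's digest with a published value, ignoring case and padding."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches_published("affiliation_regex_parser-0.1.0.tar.gz", "06fd594fcefab55cbaf6fda59840cb4aa02dace2e2d35fe6ca13fb731ae7cc09") should be true for an intact copy of the source distribution saved in the current directory.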