Enhanced parser for medical professional names with advanced NLP methods, supporting middle names, generational suffixes, and credentials
ProbableDoctor
Enhanced Medical Professional Name Parser with advanced multi-credential support.
ProbableDoctor is a Python library for parsing medical professional names, with proper handling of multiple comma-separated credentials and titles. Built on top of the probablepeople library, it adds specialized parsing for healthcare professionals.
Key Features
- Multi-Credential Parsing: Properly handles multiple credentials like "MD, PhD, FACP"
- Medical Titles: Recognizes medical prefixes and professional titles
- Structured Output: Returns parsed components as dictionaries for easy processing
- Backwards Compatible: Works with existing `probablepeople` code
- High Accuracy: Uses a statistical machine-learning model for tagging (built on python-crfsuite)
Quick Start
Installation
pip install probabledoctor
Basic Usage
import probabledoctor
# Parse a medical professional's name
result, name_type = probabledoctor.tag("Dr. Sarah Johnson MD, PhD, FACP")
print(result)
# Output: {
# 'PrefixOther': 'Dr.',
# 'GivenName': 'Sarah',
# 'Surname': 'Johnson',
# 'SuffixOther': 'MD, PhD, FACP'
# }
print(f"Name type: {name_type}")
# Output: Name type: Person
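Because the credentials come back as one comma-separated string under `SuffixOther`, downstream code may want them as a list. The following is a minimal sketch that assumes the output shape shown above; `split_credentials` is a hypothetical helper and not part of probabledoctor.

```python
def split_credentials(tagged: dict) -> list[str]:
    """Split the comma-separated 'SuffixOther' value into individual credentials.

    Hypothetical helper (not part of probabledoctor); it assumes the dict
    shape shown in the Basic Usage output.
    """
    raw = tagged.get("SuffixOther", "")
    # Drop empty pieces so a missing value or a trailing comma yields a clean list.
    return [part.strip() for part in raw.split(",") if part.strip()]


# Sample dict copied from the documented output, so this runs without the library.
tagged = {
    "PrefixOther": "Dr.",
    "GivenName": "Sarah",
    "Surname": "Johnson",
    "SuffixOther": "MD, PhD, FACP",
}
print(split_credentials(tagged))  # ['MD', 'PhD', 'FACP']
```

In real use, the dict would come from `probabledoctor.tag(...)` rather than being written by hand.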
CLI Usage
# Parse and tag a name
probabledoctor "Dr. Jane Doe MD" --tag
# Parse without tagging
probabledoctor "Dr. Jane Doe MD"
# Use specific model type
probabledoctor "Smith Medical Corp" --type company --tag
Examples
Multiple Credentials
import probabledoctor
names = [
    "Taylor Anne Jordan ATC, LAT",
    "Dr. Sarah Johnson MD, PhD, FACP",
    "Michael Smith RN, BSN, CEN",
    "David Wilson PhD, MD"
]
for name in names:
    result, name_type = probabledoctor.tag(name)
    credentials = result.get('SuffixOther', '')
    full_name = f"{result.get('GivenName', '')} {result.get('Surname', '')}"
    print(f"Name: {full_name}")
    print(f"Credentials: {credentials}")
    print(f"Type: {name_type}")
    print()
Output:
Name: Taylor Jordan
Credentials: ATC, LAT
Type: Person

Name: Sarah Johnson
Credentials: MD, PhD, FACP
Type: Person

Name: Michael Smith
Credentials: RN, BSN, CEN
Type: Person

Name: David Wilson
Credentials: PhD, MD
Type: Person
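The loop above builds the display name from `GivenName` and `Surname` only, which is why "Anne" does not appear for the first entry. The sketch below shows one way to keep the middle name and expose the credentials as a list. `ProviderName` and `to_provider` are hypothetical names, and the `MiddleName` / `MiddleInitial` labels are assumed to follow probablepeople's label set.

```python
from dataclasses import dataclass, field


@dataclass
class ProviderName:
    """Hypothetical record type for illustration; not part of probabledoctor."""
    given: str = ""
    middle: str = ""
    surname: str = ""
    credentials: list[str] = field(default_factory=list)

    @property
    def display(self) -> str:
        # Join only the name parts that are actually present.
        return " ".join(p for p in (self.given, self.middle, self.surname) if p)


def to_provider(tagged: dict) -> ProviderName:
    # 'MiddleName' / 'MiddleInitial' are assumed label names (probablepeople-style).
    middle = tagged.get("MiddleName") or tagged.get("MiddleInitial") or ""
    creds = [c.strip() for c in tagged.get("SuffixOther", "").split(",") if c.strip()]
    return ProviderName(tagged.get("GivenName", ""), middle, tagged.get("Surname", ""), creds)


# Hand-written sample: an assumed tagging of "Taylor Anne Jordan ATC, LAT".
sample = {
    "GivenName": "Taylor",
    "MiddleName": "Anne",
    "Surname": "Jordan",
    "SuffixOther": "ATC, LAT",
}
provider = to_provider(sample)
print(provider.display, provider.credentials)
```

The sample dict is written by hand so the sketch runs without the library installed; the labels a real call returns should be checked before relying on them.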
Backwards Compatibility
ProbableDoctor maintains full compatibility with probablepeople:
import probabledoctor as pp
# Use standard probablepeople functions
result = pp.parse("John Smith")
tagged = pp.tag("Dr. Jane Doe MD")
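In probablepeople, `parse` returns a list of `(token, label)` pairs, while `tag` merges consecutive tokens that share a label into a single dictionary entry. The sketch below illustrates that relationship on a hand-written token list. `merge_tokens` is a hypothetical function, the labels in `pairs` are assumed rather than actual library output, and the library's own merging logic may differ in detail.

```python
def merge_tokens(pairs):
    """Merge consecutive (token, label) pairs that share a label.

    Illustrative only: a simplified picture of how a parse-style token list
    relates to a tag-style dictionary.
    """
    merged = {}
    previous = None
    for token, label in pairs:
        if label == previous:
            # Same label as the previous token: extend the current entry.
            merged[label] += " " + token
        elif label in merged:
            # The label reappears after a different one, so it cannot be merged cleanly.
            raise ValueError(f"label {label!r} appears in two separate runs")
        else:
            merged[label] = token
        previous = label
    return merged


# Assumed parse-style output for "Dr. Sarah Johnson MD, PhD, FACP".
pairs = [
    ("Dr.", "PrefixOther"),
    ("Sarah", "GivenName"),
    ("Johnson", "Surname"),
    ("MD,", "SuffixOther"),
    ("PhD,", "SuffixOther"),
    ("FACP", "SuffixOther"),
]
print(merge_tokens(pairs))
```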
What's Enhanced?
| Feature | probablepeople | probabledoctor |
|---|---|---|
| Basic name parsing | Yes | Yes |
| Single credentials | Yes | Yes |
| Multiple credentials | Returns as string | Returns as string |
| Medical titles | Limited | Enhanced |
| Complex credentials | "MD, PhD, FACP" | "MD, PhD, FACP" |
Known Limitations
- Parsing of trailing initials: Names with a trailing initial (e.g., "John Doe A") may have the initial misclassified. For example, running `probabledoctor "Kashani Jamshid A" --tag` may currently identify "A" as `SuffixOther` instead of as a `MiddleInitial` or part of the main name components. This stems from the statistical model used for parsing. Efforts to improve accuracy for these patterns are ongoing.
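Until the model handles this pattern, a post-processing step can catch the most obvious case. The sketch below is a heuristic workaround and not library behavior: it assumes real credentials are longer than one letter, and `reclassify_trailing_initial` is a hypothetical helper. The sample dict is an assumed mis-tagged result, since only the `SuffixOther` label for "A" is stated above.

```python
def reclassify_trailing_initial(tagged: dict) -> dict:
    """Move a one-letter 'SuffixOther' value to 'MiddleInitial'.

    Heuristic workaround (not part of probabledoctor): a single letter is
    assumed to be an initial rather than a credential.
    """
    fixed = dict(tagged)  # copy so the caller's dict is left untouched
    suffix = fixed.get("SuffixOther", "").strip().rstrip(".")
    if len(suffix) == 1 and suffix.isalpha() and "MiddleInitial" not in fixed:
        fixed["MiddleInitial"] = suffix
        del fixed["SuffixOther"]
    return fixed


# Assumed (hypothetical) mis-tagged result for "Kashani Jamshid A".
tagged = {"Surname": "Kashani", "GivenName": "Jamshid", "SuffixOther": "A"}
print(reclassify_trailing_initial(tagged))
```

Multi-letter suffixes such as "MD" pass through unchanged, so the helper can be applied to every result.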
Use Cases
- Healthcare Systems: Parse doctor names from databases
- Medical Records: Extract credentials from patient records
- Research: Analyze medical professional credentials
- HR Systems: Process healthcare worker information
- Compliance: Verify medical professional credentials
Advanced Usage
Custom Model Types
import probabledoctor
# Use different parsing models; tag() returns a (result, name_type) pair
result, name_type = probabledoctor.tag("Dr. Smith", type="person")
result, name_type = probabledoctor.tag("Smith Medical Corp", type="company")
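When the kind of name is not known ahead of time, the type label returned alongside the parsed components can be used to route records. The sketch below groups names by that label. `group_by_type` is a hypothetical helper, and `fake_tag` is a stand-in so the example runs without the library; the "Corporation" label it returns is an assumption about what a real call would produce.

```python
from collections import defaultdict


def group_by_type(names, tagger):
    """Group raw names by the type label that the tagger returns.

    `tagger` is any callable returning a (components, name_type) pair.
    """
    groups = defaultdict(list)
    for name in names:
        _, name_type = tagger(name)
        groups[name_type].append(name)
    return dict(groups)


def fake_tag(name):
    # Stand-in for illustration only; the returned labels are assumed.
    return {}, ("Corporation" if "Corp" in name else "Person")


print(group_by_type(["Dr. Smith", "Smith Medical Corp"], fake_tag))
```

In real use, `probabledoctor.tag` would be passed in place of `fake_tag`.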
Integration with Existing Code
# Drop-in replacement for probablepeople
import probabledoctor as pp
# Your existing probablepeople code works unchanged
names = ["John Smith", "Dr. Jane Doe MD"]
for name in names:
    parsed = pp.parse(name)
    tagged = pp.tag(name)
Requirements
- Python 3.9+
- python-crfsuite>=0.7
- probableparsing
- doublemetaphone
Installation from Source
git clone https://github.com/atk81-candor/probabledoctor
cd probabledoctor
pip install -e .
Running Tests
pytest tests/
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
Support
- Documentation
- Issue Tracker
- Discussions
Acknowledgments
Built on top of the excellent probablepeople library by DataMade.
Made with ❤️ for the healthcare community