Enhanced parser for medical professional names with advanced NLP methods, supporting middle names, generational suffixes, and credentials

🏥 ProbableDoctor

Enhanced Medical Professional Name Parser with advanced multi-credential support.

ProbableDoctor is an enhanced Python library for parsing medical professional names with proper handling of multiple comma-separated credentials and titles. Built on top of the robust probablepeople library, it provides specialized parsing for healthcare professionals.

✨ Key Features

  • 🎓 Multi-Credential Parsing: Properly handles multiple credentials like "MD, PhD, FACP"
  • 👨‍⚕️ Medical Titles: Recognizes medical prefixes and professional titles
  • 📋 Structured Output: Returns parsed components as dictionaries for easy processing
  • 🔄 Backwards Compatible: Works with existing probablepeople code
  • 🎯 High Accuracy: Uses a trained statistical model (a conditional random field via python-crfsuite, as in probablepeople)

🚀 Quick Start

Installation

pip install probabledoctor

Basic Usage

import probabledoctor

# Parse a medical professional's name
result, name_type = probabledoctor.tag("Dr. Sarah Johnson MD, PhD, FACP")

print(result)
# Output: {
#     'PrefixOther': 'Dr.',
#     'GivenName': 'Sarah', 
#     'Surname': 'Johnson',
#     'SuffixOther': 'MD, PhD, FACP'
# }

print(f"Name type: {name_type}")
# Output: Name type: Person
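The dictionary returned by tag() can be reassembled into a single display string. The following is a minimal sketch, not part of probabledoctor: format_display_name and NAME_ORDER are hypothetical helpers, and the label list assumes the probablepeople labels MiddleName, MiddleInitial, and SuffixGenerational in addition to those shown above. The input dictionary is hand-written in the shape of the output above, so the snippet runs without the library installed.

```python
# Hypothetical helper: the order in which name components are joined.
NAME_ORDER = [
    "PrefixOther",
    "GivenName",
    "MiddleName",
    "MiddleInitial",
    "Surname",
    "SuffixGenerational",
]


def format_display_name(tagged: dict) -> str:
    """Join the tagged name components, then append credentials after a comma."""
    name = " ".join(tagged[label] for label in NAME_ORDER if tagged.get(label))
    credentials = tagged.get("SuffixOther", "")
    return ", ".join(part for part in (name, credentials) if part)


# Hand-written dictionary mirroring the Basic Usage output above.
tagged = {
    "PrefixOther": "Dr.",
    "GivenName": "Sarah",
    "Surname": "Johnson",
    "SuffixOther": "MD, PhD, FACP",
}

print(format_display_name(tagged))
# Dr. Sarah Johnson, MD, PhD, FACP
```

Keeping the component order in one list makes it easy to adapt the output, for example by dropping PrefixOther when the title should not be shown.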

CLI Usage

# Parse and tag a name
probabledoctor "Dr. Jane Doe MD" --tag

# Parse without tagging
probabledoctor "Dr. Jane Doe MD"

# Use specific model type
probabledoctor "Smith Medical Corp" --type company --tag

📊 Examples

Multiple Credentials

import probabledoctor

names = [
    "Taylor Anne Jordan ATC, LAT",
    "Dr. Sarah Johnson MD, PhD, FACP", 
    "Michael Smith RN, BSN, CEN",
    "David Wilson PhD, MD"
]

for name in names:
    result, name_type = probabledoctor.tag(name)
    credentials = result.get('SuffixOther', '')
    full_name = f"{result.get('GivenName', '')} {result.get('Surname', '')}"
    
    print(f"👤 {full_name}")
    print(f"🎓 Credentials: {credentials}")
    print(f"📋 Type: {name_type}")
    print()

Output:

👤 Taylor Jordan
🎓 Credentials: ATC, LAT
📋 Type: Person

👤 Sarah Johnson
🎓 Credentials: MD, PhD, FACP
📋 Type: Person

👤 Michael Smith
🎓 Credentials: RN, BSN, CEN
📋 Type: Person

👤 David Wilson
🎓 Credentials: PhD, MD
📋 Type: Person
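Because the credentials come back as one comma-separated string under SuffixOther, downstream code often needs them as individual items. A minimal sketch of that post-processing step follows; split_credentials is a hypothetical helper, not a probabledoctor function, and it only assumes the comma-separated format shown in the output above.

```python
def split_credentials(suffix: str) -> list[str]:
    """Split a comma-separated credential string into a list of clean items."""
    return [part.strip() for part in suffix.split(",") if part.strip()]


# Credential strings copied from the example output above.
for suffix in ["ATC, LAT", "MD, PhD, FACP", "RN, BSN, CEN", "PhD, MD"]:
    print(split_credentials(suffix))
# ['ATC', 'LAT']
# ['MD', 'PhD', 'FACP']
# ['RN', 'BSN', 'CEN']
# ['PhD', 'MD']
```

An empty or missing SuffixOther value yields an empty list, so the helper can be applied directly to result.get('SuffixOther', '').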

Backwards Compatibility

ProbableDoctor maintains full compatibility with probablepeople:

import probabledoctor as pp

# Use standard probablepeople functions
result = pp.parse("John Smith")
tagged = pp.tag("Dr. Jane Doe MD")
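In probablepeople, parse() returns a list of (token, label) pairs, while tag() returns a tuple of a label-to-text dictionary and a name type. Assuming probabledoctor keeps those return shapes, the relationship between the two can be sketched as below. group_by_label is a hypothetical helper, the pairs are hand-written for illustration rather than real library output, and the real tag() does more than this (for example, it raises an error when a label repeats unexpectedly).

```python
from collections import defaultdict


def group_by_label(pairs):
    """Collapse (token, label) pairs into a {label: 'joined tokens'} dict."""
    grouped = defaultdict(list)
    for token, label in pairs:
        grouped[label].append(token)
    return {label: " ".join(tokens) for label, tokens in grouped.items()}


# Hand-written pairs in the shape that parse() returns (illustrative only).
pairs = [
    ("Dr.", "PrefixOther"),
    ("Jane", "GivenName"),
    ("Doe", "Surname"),
    ("MD", "SuffixOther"),
]

print(group_by_label(pairs))
# {'PrefixOther': 'Dr.', 'GivenName': 'Jane', 'Surname': 'Doe', 'SuffixOther': 'MD'}
```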

🏗️ What's Enhanced?

Feature                 probablepeople          probabledoctor
Basic name parsing      ✅                      ✅
Single credentials      ✅                      ✅
Multiple credentials    ❌ Returns as string    ✅ Returns as string
Medical titles          ⚠️ Limited              ✅ Enhanced
Complex credentials     ❌ "MD, PhD, FACP"      ✅ "MD, PhD, FACP"

⚠️ Known Limitations

  • Parsing of trailing initials: In some cases, names with a trailing initial (e.g., "John Doe A") might have the initial misclassified. For example, running probabledoctor "Kashani Jamshid A" --tag may currently identify "A" as SuffixOther instead of a MiddleInitial or as part of the main name components. This is related to the intricacies of the statistical model used for parsing. Efforts to improve accuracy for these patterns are ongoing.
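Until the model handles this pattern, callers who know their data contains trailing initials can correct the result after tagging. The following is a rough workaround sketch, not a feature of probabledoctor: fix_trailing_initial is a hypothetical helper, and it assumes that a single-letter SuffixOther is really an initial, which will not hold for every dataset. The input dictionary is hand-written for illustration.

```python
def fix_trailing_initial(tagged: dict) -> dict:
    """Move a one-letter SuffixOther to MiddleInitial; leave everything else alone."""
    fixed = dict(tagged)  # copy so the caller's dictionary is not modified
    suffix = fixed.get("SuffixOther", "").strip().rstrip(".")
    if len(suffix) == 1 and suffix.isalpha() and "MiddleInitial" not in fixed:
        fixed["MiddleInitial"] = fixed.pop("SuffixOther")
    return fixed


# Hand-written example of a misclassified trailing initial (illustrative only).
tagged = {"GivenName": "John", "Surname": "Doe", "SuffixOther": "A"}
print(fix_trailing_initial(tagged))
# {'GivenName': 'John', 'Surname': 'Doe', 'MiddleInitial': 'A'}
```

Real credentials such as "MD" or "MD, PhD" are longer than one letter and pass through unchanged.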

🎯 Use Cases

  • Healthcare Systems: Parse doctor names from databases
  • Medical Records: Extract credentials from patient records
  • Research: Analyze medical professional credentials
  • HR Systems: Process healthcare worker information
  • Compliance: Verify medical professional credentials

📚 Advanced Usage

Custom Model Types

import probabledoctor

# Select the parsing model explicitly; tag() returns (components, name_type)
result, name_type = probabledoctor.tag("Dr. Smith", type="person")
result, name_type = probabledoctor.tag("Smith Medical Corp", type="company")

Integration with Existing Code

# Drop-in replacement for probablepeople
import probabledoctor as pp

# Your existing probablepeople code works unchanged
names = ["John Smith", "Dr. Jane Doe MD"]
for name in names:
    parsed = pp.parse(name)
    tagged = pp.tag(name)

Requirements

  • Python 3.9+
  • python-crfsuite>=0.7
  • probableparsing
  • doublemetaphone

Installation from Source

git clone https://github.com/atk81-candor/probabledoctor
cd probabledoctor
pip install -e .

Running Tests

pytest tests/

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

🙏 Acknowledgments

Built on top of the excellent probablepeople library by DataMade.


Made with ❤️ for the healthcare community
