A Python package for extracting electronic health transcripts and classifying them based on human-annotated data.

Project description

pytranscripts

An open-source 👨‍🔧 Python library for automated classification of electronic medical records

Installation

To install, run:

pip install pytranscripts
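
To confirm the install succeeded, the package version can be read back from the environment's metadata. This is a minimal sketch using only the standard library; the helper name `installed_version` is illustrative and not part of pytranscripts.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Hypothetical check: prints the version if pip installed the package.
found = installed_version("pytranscripts")
if found is None:
    print("pytranscripts is not installed in this environment")
else:
    print(f"pytranscripts {found} is installed")
```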

Pipeline Summary

pipeline image

Stages

  1. Data extraction
  2. Target identification
  3. Fine-tuning pretrained models (BERT & ELECTRA) on the annotated data
  4. Extracting interviewer/interviewee records from the specified docx file storage
  5. Metrics evaluation (accuracy & Cohen's kappa score)
  6. Reordering records into a neatly arranged, flagged spreadsheet, alongside metrics and reports from the pretrained models.
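
The two metrics named in stage 5 can be sketched in plain Python to show what they measure. This is an illustration only, not the library's own implementation: the function names and the sample labels are made up, and returning 1.0 when chance agreement is total is a convention chosen here.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of positions where the two label sequences agree."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: probability both raters pick the same label independently.
    p_expected = sum(true_counts[k] * pred_counts[k] for k in true_counts) / (n * n)
    if p_expected == 1.0:
        # Both raters used one identical label throughout (convention chosen here).
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Made-up labels: human annotations versus a model's predictions.
human = ["yes", "no", "yes", "yes", "no", "no"]
model = ["yes", "no", "no", "yes", "no", "yes"]
print(round(accuracy(human, model), 3))     # 0.667
print(round(cohen_kappa(human, model), 3))  # 0.333
```

Kappa is lower than accuracy here because half of the agreement would be expected by chance when both label sets are evenly split.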

Example Usage

Generating the Survey Dataset

# Extract the survey information from the .docx files and write it to a spreadsheet

from pytranscripts import docx_transcripts_to_excel

input_directory = "Docx_Records_folder"  # folder containing the .docx transcripts
output_file = "SURVEY_TABLE.xlsx"  # spreadsheet to create

docx_transcripts_to_excel(input_directory, output_file)
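
To give a feel for what interviewer/interviewee extraction involves, here is a rough standalone sketch that pairs speaker turns into question/answer rows. It does not reproduce how `docx_transcripts_to_excel` works internally: the speaker tags `Interviewer:` and `Interviewee:`, the function `pair_turns`, and the sample paragraphs are all assumptions made for illustration.

```python
def pair_turns(paragraphs, interviewer_tag="Interviewer:", interviewee_tag="Interviewee:"):
    """Pair each interviewer paragraph with the next interviewee paragraph.

    The speaker tags are hypothetical; real transcripts may label turns differently.
    """
    rows = []
    question = None
    for text in paragraphs:
        text = text.strip()
        if text.startswith(interviewer_tag):
            # Remember the most recent question until an answer arrives.
            question = text[len(interviewer_tag):].strip()
        elif text.startswith(interviewee_tag) and question is not None:
            answer = text[len(interviewee_tag):].strip()
            rows.append({"question": question, "answer": answer})
            question = None
    return rows


# Made-up paragraphs standing in for text read from a transcript file.
sample = [
    "Interviewer: How often do you use the patient portal?",
    "Interviewee: About once a week.",
    "Interviewer: Is it easy to find your results?",
    "Interviewee: Mostly, yes.",
]
for row in pair_turns(sample):
    print(row["question"], "->", row["answer"])
```

Rows in this shape map naturally onto spreadsheet columns, which is the kind of table the survey file is expected to hold.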

Training the model

from pytranscripts import NLPModelTrainer

# Initialize the trainer with paths to your datasets and drive
trainer = NLPModelTrainer(
    base_path="/path/to/basedir", # base directory
    refined_data_path="/path/to/Refined_targets.xlsx", # path to refined human annotations
    survey_data_path="/path/to/SURVEY_TABLE.xlsx", # path to the survey data from extracted documents
)

# Train both BERT and Electra models
trainer.train_models(bert=True, electra=True)

# Classify a piece of text using the trained BERT model
result = trainer.classify_text("This is a sample interview response.", "bert")
print(result)

# Generate encoded evaluation files for human annotations, BERT, and Electra
trainer.generate_encoded_evaluation_files()

Dependencies

  • Python 3.12
  • Transformers
  • PyTorch
