A Python package for extracting electronic health transcripts and classifying them based on human-annotated data.
pytranscripts
An open-source Python library for automated classification of electronic medical records
Installation
To install, run:
pip install pytranscripts
Pipeline Summary
Stages
- Data Extraction
- Target Identification
- Fine-tuning pretrained models (BERT and ELECTRA) on the annotated data
- Extracting interviewer/interviewee records from the specified docx file storage
- Metrics evaluation (accuracy and Cohen's kappa score)
- Reordering records into a neatly arranged, flagged spreadsheet, alongside metrics and reports from the pretrained models
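To make the two evaluation metrics concrete, here is a minimal standard-library sketch of accuracy and Cohen's kappa computed between human and model labels. It is an illustration only, not the package's own implementation; the function names and the sample labels are hypothetical.

```python
from collections import Counter


def accuracy(human, model):
    """Fraction of records where the model label equals the human label."""
    if len(human) != len(model) or not human:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(h == m for h, m in zip(human, model)) / len(human)


def cohen_kappa(human, model):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(human)
    p_o = accuracy(human, model)  # observed agreement
    human_counts = Counter(human)
    model_counts = Counter(model)
    # Expected agreement if both raters labelled independently
    # with their own observed label frequencies.
    p_e = sum(human_counts[c] * model_counts[c] for c in human_counts) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels, for illustration only.
human_labels = [1, 0, 1, 1, 0, 0, 1, 0]
model_labels = [1, 0, 0, 1, 0, 1, 1, 0]

print(f"accuracy = {accuracy(human_labels, model_labels):.2f}")
print(f"kappa    = {cohen_kappa(human_labels, model_labels):.2f}")
```

Kappa is lower than raw accuracy here because half of the agreement on a balanced two-class problem would be expected by chance alone.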
Example Usage
Generating the Survey Dataset
# Extract the survey information from the docx file storage
from pytranscripts import docx_transcripts_to_excel
input_directory = "Docx_Records_folder"
output_file = "SURVEY_TABLE.xlsx"
docx_transcripts_to_excel(input_directory, output_file)
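How the library separates interviewer and interviewee text is not documented here. As a rough sketch of the idea, the snippet below pairs speaker turns from a list of paragraph strings into question/answer rows. The `Interviewer:` / `Interviewee:` prefixes, the `pair_turns` helper, and the sample transcript are all assumptions made for illustration; real transcripts may label speakers differently.

```python
import re

# Assumed label format: a turn starts with "Interviewer:" or "Interviewee:".
SPEAKER_RE = re.compile(r"^(Interviewer|Interviewee)\s*:\s*(.*)$", re.IGNORECASE)


def pair_turns(paragraphs):
    """Group transcript paragraphs into {"question", "answer"} rows."""
    rows = []
    question, answer, speaker = None, [], None
    for text in paragraphs:
        text = text.strip()
        if not text:
            continue
        match = SPEAKER_RE.match(text)
        if match:
            speaker = match.group(1).lower()
            content = match.group(2)
            if speaker == "interviewer":
                # A new question closes the previous question/answer pair.
                if question is not None:
                    rows.append({"question": question, "answer": " ".join(answer)})
                question, answer = content, []
            else:
                answer.append(content)
        elif speaker == "interviewee":
            # Unlabelled paragraph: continuation of the current answer.
            answer.append(text)
        elif speaker == "interviewer" and question is not None:
            # Unlabelled paragraph: continuation of the current question.
            question += " " + text
    if question is not None:
        rows.append({"question": question, "answer": " ".join(answer)})
    return rows


# Hypothetical transcript paragraphs, for illustration only.
paragraphs = [
    "Interviewer: How often do you use the patient portal?",
    "Interviewee: Mostly to check lab results.",
    "Maybe twice a month.",
    "Interviewer: Any problems logging in?",
    "Interviewee: Not really.",
]

rows = pair_turns(paragraphs)
for row in rows:
    print(row)
```

Each resulting row corresponds to one line of a survey table; answers that appear before any question are dropped in this sketch.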
Training the model
from pytranscripts import NLPModelTrainer
# Initialize the trainer with paths to your datasets and drive
trainer = NLPModelTrainer(
    base_path="/path/to/basedir",  # base directory
    refined_data_path="/path/to/Refined_targets.xlsx",  # path to refined human annotations
    survey_data_path="/path/to/SURVEY_TABLE.xlsx",  # path to the survey data from extracted documents
)
# Train both BERT and Electra models
trainer.train_models(bert=True, electra=True)
# Classify a piece of text using the trained BERT model
result = trainer.classify_text("This is a sample interview response.", "bert")
print(result)
# Generate encoded evaluation files for human annotations, BERT, and Electra
trainer.generate_encoded_evaluation_files()
Dependencies
- Python 3.12
- Transformers
- PyTorch