CNN and U-Net library for lung cancer classification and segmentation
LungScan - Advanced Lung Cancer Detection & Segmentation Library
LungScan is a comprehensive medical AI library for automated lung cancer detection, classification, and segmentation from CT scans. Built with clinical accuracy and ease of use in mind, LungScan combines state-of-the-art deep learning models with medical imaging best practices to deliver reliable diagnostic support.
Table of Contents
- Features
- Installation
- Quick Start
- Dataset Preparation
- Classification Pipeline
- Segmentation Pipeline
- Inference & Prediction
- Curriculum Learning
- Evaluation & Metrics
- API Reference
- Pre-trained Models
- Example Workflows
- License
Features
Core Capabilities
- Multi-Class Lung Cancer Classification: Detect 4 lung cancer types with confidence scoring
  - Adenocarcinoma
  - Squamous Cell Carcinoma
  - Large Cell Carcinoma
  - Normal (Healthy)
- Precise Lung Segmentation: Attention U-Net architecture for accurate lung region extraction
- Metalung Augmentation: Advanced medical-specific data augmentation for improved generalization
- Curriculum Learning: Progressive training on lesion sizes (Small → Medium → Large → XLarge)
- Medical Priority Balancing: Handles class imbalance with clinical-aware weighting
- GPU Acceleration: Auto-detects Intel GPU for optimized training
- Comprehensive Metrics: Precision, Recall, F1-Score, IoU, Dice Coefficient, ROC-AUC
- Visual Diagnostics: Built-in visualization tools for training monitoring and result interpretation
Installation
Prerequisites
- Python 3.13 or higher
Install LungScan
# Install from PyPI
pip install lungscan
Quick Start
1. Prepare Your Dataset
from lungscan import convert_pkl2images_metalung, LungDatasetSplitter

# Convert training data with augmentation
convert_pkl2images_metalung(
    pickle_path='dataset/source/lung_cancer_train.pkl',
    output_base_dir='dataset/image_data/train',
    num_augments=2
)

# Convert test data
convert_pkl2images_metalung(
    pickle_path='dataset/source/lung_cancer_test.pkl',
    output_base_dir='dataset/image_data/test',
    num_augments=0  # No augmentation for test set
)

# Initialize splitter with pixel range definitions
splitter = LungDatasetSplitter(
    source_dir='dataset/image_data',
    pixel_ranges={
        'xlarge': (150, 301),  # 150-300 pixels
        'large': (50, 151),    # 50-150 pixels
        'medium': (20, 51),    # 20-50 pixels
        'small': (9, 21)       # 9-20 pixels
    }
)

# Analyze lesion distribution
splitter.analyze()

# Create curriculum dataset
splitter.split(output_dir='dataset/image_split')
2. Train Classification Model
from lungscan import LungClassificationPipeline
# Initialize and train
pipeline = LungClassificationPipeline()
pipeline.load_data('dataset/lung_classes')
pipeline.train(epochs=10, load_pretrained=True)
# Evaluate
metrics = pipeline.evaluate(num_samples=20)
print(metrics)
3. Make Classification Prediction
# Classification prediction
result = pipeline.predict(
    'path/to/ct_scan.png',
    visualize=True
)
print(f"Diagnosis: {result['class']} ({result['confidence']:.1%})")
4. Train Segmentation Model
from lungscan import LungSegmentationPipeline
# Initialize segmentation pipeline
pipeline = LungSegmentationPipeline(
    img_size=(256, 256, 1),
    model_type='att_unet'
)
pipeline.load_data('dataset/image_split')
pipeline.train(epochs_per_stage=10)
5. Make Segmentation Predictions
# Segmentation prediction
result = pipeline.predict(
    'path/to/ct_scan.png',
    visualize=True
)
Dataset Preparation
Input Format
LungScan expects data in .pkl format containing:
- CT scan images
- Corresponding masks (for segmentation)
- Class labels (for classification)
Directory Structure
dataset/
├── source/
│   ├── lung_cancer_train.pkl
│   └── lung_cancer_test.pkl
├── image_data/
│   ├── train/
│   │   ├── images/
│   │   └── masks/
│   └── test/
│       ├── images/
│       └── masks/
├── image_split/
│   ├── train/
│   │   ├── xlarge/
│   │   │   ├── images/
│   │   │   └── masks/
│   │   ├── large/
│   │   │   ├── images/
│   │   │   └── masks/
│   │   ├── medium/
│   │   │   ├── images/
│   │   │   └── masks/
│   │   └── small/
│   │       ├── images/
│   │       └── masks/
│   └── test/
│       ├── xlarge/
│       │   ├── images/
│       │   └── masks/
│       ├── large/
│       │   ├── images/
│       │   └── masks/
│       ├── medium/
│       │   ├── images/
│       │   └── masks/
│       └── small/
│           ├── images/
│           └── masks/
└── lung_classes/
    ├── train/
    │   ├── adenocarcinoma/
    │   ├── squamous_cell_carcinoma/
    │   ├── large_cell_carcinoma/
    │   └── normal/
    └── test/
        ├── adenocarcinoma/
        ├── squamous_cell_carcinoma/
        ├── large_cell_carcinoma/
        └── normal/
Metalung Augmentation
The convert_pkl2images_metalung function applies medical-specific augmentations:
- Random rotation and flipping
- Intensity adjustments (simulating different scanner settings)
- Random cancer relocation (generating multiple cancer variations of the same sample)
- Noise injection (simulating acquisition artifacts)
Classification Pipeline
LungClassificationPipeline
A complete end-to-end pipeline for lung cancer classification.
Key Methods
from lungscan import LungClassificationPipeline
pipeline = LungClassificationPipeline(
    img_size=(224, 224, 3),  # Input image dimensions
    verbose=True             # Enable detailed logging
)
# Load balanced dataset
pipeline.load_data('dataset/lung_classes')
# Visualize samples
pipeline.view_sample(data_type='train', is_notebook=True)
# Train model
pipeline.train(
    epochs=10,
    load_pretrained=True,  # Use pre-trained weights
    learning_rate=1e-4
)
# Calculate metrics
metrics = pipeline.calcuate_metrics(data_type='test')
# Predict on new image
result = pipeline.predict(
    'path/to/image.png',
    visualize=True,
    is_notebook=True
)
# Returns: {'class': 'adenocarcinoma', 'confidence': 0.94, 'probabilities': {...}}
Training Features
- Transfer Learning: Leverages a pre-trained CNN backbone (EfficientNetB0)
- Class Balancing: Automatic handling of imbalanced datasets
- Early Stopping: Prevents overfitting with patience monitoring
- Checkpoint Saving: Saves best model weights automatically
Segmentation Pipeline
LungSegmentationPipeline
Advanced lung segmentation using Attention U-Net architecture.
Key Methods
from lungscan import LungSegmentationPipeline
pipeline = LungSegmentationPipeline(
    img_size=(256, 256, 1),  # Grayscale input
    model_type='att_unet',   # Attention U-Net
    pretrained_path=None,    # Path to pre-trained weights
    verbose=True
)
# Load dataset
pipeline.load_data('dataset/image_split')
# Visualize samples
pipeline.view_sample(sample_type='train', is_notebook=True)
# Train with curriculum learning
pipeline.train(epochs_per_stage=10)
# Predict segmentation
result = pipeline.predict(
    'path/to/ct_scan.png',
    output_path='results/mask.png',
    is_notebook=True
)
# Returns: {'image': array, 'mask': array, 'overlay': array}
# Evaluate performance
pipeline.evaluate(num_samples=10, is_notebook=True)
pipeline.calcuate_metrics(data_type='test')
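The `overlay` array returned by `predict` is the predicted mask blended over the input scan. A minimal NumPy sketch of that kind of blend follows; the `make_overlay` helper and its parameters are assumptions for illustration, not the library's actual implementation:

```python
import numpy as np

def make_overlay(image, mask, color=(255, 0, 0), alpha=0.4):
    """Blend a binary mask over a grayscale image as a colored overlay."""
    rgb = np.stack([image] * 3, axis=-1).astype(float)  # grayscale -> RGB
    color = np.asarray(color, dtype=float)
    blended = rgb.copy()
    region = np.asarray(mask).astype(bool)
    # Alpha-blend the chosen color only inside the mask region
    blended[region] = (1 - alpha) * rgb[region] + alpha * color
    return blended.astype(np.uint8)
```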
Model Architecture
- Attention Gates: Focus on relevant regions, suppress noise
- Skip Connections: Preserve spatial information
- Multi-scale Feature Extraction: Captures details at different resolutions
- Dice Loss: Optimized for medical segmentation tasks
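For reference, the Dice loss named above can be sketched in plain NumPy. A soft Dice with a smoothing term is a common formulation; this is illustrative, not LungScan's exact implementation:

```python
import numpy as np

def dice_coefficient(y_true, y_pred, smooth=1.0):
    """Soft Dice over flattened masks; smooth avoids division by zero."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    intersection = (y_true * y_pred).sum()
    return (2.0 * intersection + smooth) / (y_true.sum() + y_pred.sum() + smooth)

def dice_loss(y_true, y_pred):
    """Loss = 1 - Dice, minimized when the prediction matches the mask."""
    return 1.0 - dice_coefficient(y_true, y_pred)
```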
Curriculum Learning
LungDatasetSplitter
Progressive training strategy based on lesion size for improved convergence.
from lungscan import LungDatasetSplitter
# Initialize splitter with pixel range definitions
splitter = LungDatasetSplitter(
    source_dir='dataset/image_data',
    pixel_ranges={
        'xlarge': (150, 301),  # 150-300 pixels
        'large': (50, 151),    # 50-150 pixels
        'medium': (20, 51),    # 20-50 pixels
        'small': (9, 21)       # 9-20 pixels
    }
)
# Analyze lesion distribution
splitter.analyze()
# Create curriculum dataset
splitter.split(output_dir='dataset/image_split')
Benefits
- Faster Convergence: Start with easier (xlarge) lesions
- Better Generalization: Gradually learn complex patterns
- Reduced Overfitting: Progressive complexity prevents memorization
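The staged training performed by `train(epochs_per_stage=...)` can be pictured as a simple schedule. This sketch assumes the stage order runs from largest to smallest lesions, per the benefits listed above; the `curriculum_schedule` helper is illustrative only, not part of lungscan:

```python
# Stages ordered easiest (largest lesions) to hardest (smallest),
# mirroring the splitter's size buckets.
STAGES = ["xlarge", "large", "medium", "small"]

def curriculum_schedule(epochs_per_stage):
    """Yield (stage, epoch) pairs in curriculum order."""
    for stage in STAGES:
        for epoch in range(epochs_per_stage):
            yield stage, epoch
```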
Inference & Prediction
Standalone Prediction Tool
For quick predictions without training:
from lungscan import LungSegmentationPipeline, LungClassificationPipeline
# Initialize models
seg_model = LungSegmentationPipeline(
    img_size=(256, 256, 1),
    model_type='att_unet',
    pretrained_path='checkpoints/best_medium.keras'
)
classi_model = LungClassificationPipeline(
    img_size=(224, 224, 3)
)
# Predict
classification = classi_model.predict('path/to/image.png', visualize=False)
segmentation = seg_model.predict('path/to/image.png', visualize=False)
print(f"Diagnosis: {classification['class']}")
print(f"Confidence: {classification['confidence']:.1%}")
GUI-Based Prediction
Interactive file selection for batch prediction:
from lungscan import select_files
# Open file dialog
image_paths = select_files()
# Process each image
for path in image_paths:
    # Your prediction logic here
    pass
Evaluation & Metrics
Classification Metrics
- Accuracy: Overall correctness
- Precision: True positive rate among predicted positives
- Recall: True positive rate among actual positives
- F1-Score: Harmonic mean of precision and recall
- Sensitivity: True positive rate among actual positives (equivalent to recall)
- Specificity: True negative rate among actual negatives
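All of these classification metrics derive from binary confusion-matrix counts. A small illustrative helper (`classification_rates` is not part of lungscan):

```python
def classification_rates(tp, fp, fn, tn):
    """Derive the listed metrics from binary confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall: TP rate among actual positives
    specificity = tn / (tn + fp)   # TN rate among actual negatives
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1, "accuracy": accuracy}
```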
Segmentation Metrics
- IoU (Intersection over Union): Overlap between predicted and ground truth
- Dice Coefficient: Similarity measure (2 * IoU / (IoU + 1))
- Pixel Accuracy: Percentage of correctly classified pixels
- Precision: True positive rate among predicted positives
- Recall: True positive rate among actual positives
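The IoU/Dice identity quoted above can be checked directly on binary masks. An illustrative sketch (`iou_score` and `dice_from_iou` are not lungscan functions):

```python
import numpy as np

def iou_score(pred, truth):
    """Intersection over union of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 1.0  # both masks empty: perfect agreement
    return np.logical_and(pred, truth).sum() / union

def dice_from_iou(iou):
    """Dice = 2 * IoU / (1 + IoU)."""
    return 2.0 * iou / (1.0 + iou)
```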
Visualization Tools
from lungscan import disp_image, add_text_to_image
# Display image in notebook or save to file
disp_image(image_array, isNotebook=True, save_path='output.png')
# Add diagnostic text to image
annotated = add_text_to_image(
    image_array,
    text="Diagnosis: Adenocarcinoma",
    position=(5, 5),
    font_size=12,
    color=(255, 0, 0)  # Red text
)
API Reference
Core Functions
convert_pkl2images_metalung(pickle_path, output_base_dir, num_augments)
Converts pickle dataset to images with metalung augmentation.
Parameters:
- pickle_path (str): Path to .pkl file
- output_base_dir (str): Output directory for images/masks
- num_augments (int): Number of augmented copies per image
Returns: None
LungClassificationPipeline(img_size, verbose)
End-to-end classification pipeline.
Methods:
- load_data(base_dir): Load dataset from directory
- view_sample(data_type, is_notebook): Visualize sample images
- train(epochs, load_pretrained, batch_size, learning_rate): Train model
- predict(image_path, visualize, is_notebook): Predict on single image
- evaluate(num_samples, is_notebook): Comprehensive evaluation
- calcuate_metrics(data_type): Calculate performance metrics
LungSegmentationPipeline(img_size, model_type, pretrained_path, verbose)
Lung segmentation pipeline with Attention U-Net.
Methods:
- load_data(data_dir): Load segmentation dataset
- view_sample(sample_type, is_notebook): Visualize samples
- train(epochs_per_stage): Train with curriculum learning
- predict(image_path, output_path, is_notebook): Predict segmentation mask
- evaluate(num_samples, is_notebook): Evaluate on test set
- calcuate_metrics(data_type): Calculate segmentation metrics
LungDatasetSplitter(source_dir, pixel_ranges)
Split dataset by lesion size for curriculum learning.
Methods:
- analyze(): Display lesion size distribution
- split(output_dir): Create curriculum dataset splits
select_files()
Open file dialog for image selection.
Returns: List of selected file paths
disp_image(image, isNotebook, save_path)
Display or save image.
Parameters:
- image: PIL Image or numpy array
- isNotebook: Display in Jupyter notebook
- save_path: Save path (optional)
add_text_to_image(image, text, position, font_size, color)
Add text annotation to image.
Parameters:
- image: Input image
- text: Text to add
- position: (x, y) coordinates
- font_size: Font size
- color: RGB tuple
Returns: Annotated image array
fetch_dirs(base_dir, is_semantic)
Fetch image and mask file lists from directory structure.
Returns: Tuple of (image_dict, mask_dict)
Pre-trained Models
Available Checkpoints
| Model | Task | Path | Performance |
|---|---|---|---|
| best_medium.keras | Segmentation | checkpoints/2nd advance/best_medium.keras | IoU: 0.2726, Dice: 0.4285 |
| lung_classification.keras | Classification | models/lung_classification.keras | Accuracy: 80.21%, F1: 0.8 |
Loading Pre-trained Weights
# Segmentation
seg_pipeline = LungSegmentationPipeline(
    pretrained_path='checkpoints/2nd advance/best_medium.keras'
)
# Classification
class_pipeline = LungClassificationPipeline()
class_pipeline.train(load_pretrained=True)
Example Workflows
Complete Training Pipeline
# Step 1: Prepare dataset
from lungscan import convert_pkl2images_metalung
convert_pkl2images_metalung('dataset/source/train.pkl', 'dataset/image_data/train', 2)
# Step 2: Split for curriculum learning
from lungscan import LungDatasetSplitter
splitter = LungDatasetSplitter('dataset/image_data')
splitter.split('dataset/image_split')
# Step 3: Train segmentation
from lungscan import LungSegmentationPipeline
seg = LungSegmentationPipeline(img_size=(256, 256, 1))
seg.load_data('dataset/image_split')
seg.train(epochs_per_stage=10)
# Step 4: Train classification
from lungscan import LungClassificationPipeline
clf = LungClassificationPipeline()
clf.load_data('dataset/lung_classes')
clf.train(epochs=20, load_pretrained=True)
Batch Evaluation
from lungscan import LungClassificationPipeline, LungSegmentationPipeline, fetch_dirs
# Get test images
img_flst, _ = fetch_dirs('dataset/lung_classes', is_semantic=False)
# Initialize model
seg_line = LungSegmentationPipeline(
    pretrained_path='checkpoints/2nd advance/best_medium.keras',
)
class_line = LungClassificationPipeline()

# Evaluate all test samples
results = []
for class_name in img_flst['test'].keys():
    for img_path in img_flst['test'][class_name]:
        pred_mask = seg_line.predict(img_path, visualize=False)
        pred = class_line.predict(img_path, visualize=False)
        results.append({
            'path': img_path,
            'true_class': class_name,
            'input': pred_mask['image'],
            'mask': pred_mask['overlay'],
            'predicted': pred['class'],
            'confidence': pred['confidence']
        })
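The `results` list built above can then be reduced to summary statistics. A minimal illustrative helper (`summarize` is not part of lungscan):

```python
from collections import Counter

def summarize(results):
    """Aggregate per-image predictions into overall accuracy and per-class counts."""
    correct = sum(r['predicted'] == r['true_class'] for r in results)
    accuracy = correct / len(results) if results else 0.0
    per_class = Counter(r['true_class'] for r in results)
    return accuracy, per_class
```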
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Medical imaging datasets and annotations
- PyTorch and Keras development teams
- Attention U-Net original authors
- Open-source medical AI community
Contact
For questions, issues, or collaboration opportunities:
- Email: hosam.bosati@gmail.com
Citation
If you use LungScan in your research, please cite:
@software{lungscan2026,
  author = {Hosam Hatim Osman},
  title = {LungScan: Advanced Lung Cancer Detection and Segmentation Library},
  year = {2026},
  url = {https://pypi.org/project/lungscan/}
}
Built with ❤️ for medical AI advancement
Project details
Download files
File details
Details for the file lungscan-0.1.5.tar.gz.
File metadata
- Download URL: lungscan-0.1.5.tar.gz
- Upload date:
- Size: 46.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 773c9ce572526c3aff5f05ffde05e0f21ef61259549c463f166ac8c7d4ea5037 |
| MD5 | 70a02aeffa9507b681b465944ca5d817 |
| BLAKE2b-256 | 4e5341c7271ccb9499b92fc3dae462974b8bee8c6e115c1951036aae460d3358 |
File details
Details for the file lungscan-0.1.5-py3-none-any.whl.
File metadata
- Download URL: lungscan-0.1.5-py3-none-any.whl
- Upload date:
- Size: 44.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5dae4a61482d14b1b566f64c55d2b377b2147490dc18e403ed05a7d1f31a3963 |
| MD5 | e85b3e3d8f200467e7bba18fee41f9d2 |
| BLAKE2b-256 | 5ff2ad3e8c7a1fba4c45da48232b2499f40e2e36d25d0eaab8a56b6f6b9e23b2 |