
CNN and U-Net library for lung cancer classification and segmentation

Project description

LungScan - Advanced Lung Cancer Detection & Segmentation Library


LungScan is a comprehensive medical AI library for automated lung cancer detection, classification, and segmentation from CT scans. Built with clinical accuracy and ease of use in mind, LungScan combines state-of-the-art deep learning models with medical imaging best practices to deliver reliable diagnostic support.




✨ Features

Core Capabilities

  • Multi-Class Lung Cancer Classification: Detect 4 lung cancer types with confidence scoring

    • Adenocarcinoma
    • Squamous Cell Carcinoma
    • Large Cell Carcinoma
    • Normal (Healthy)
  • Precise Lung Segmentation: Attention U-Net architecture for accurate lung region extraction

  • Metalung Augmentation: Advanced medical-specific data augmentation for improved generalization

  • Curriculum Learning: Progressive training on lesion sizes (Small → Medium → Large → XLarge)

  • Medical Priority Balancing: Handles class imbalance with clinical-aware weighting

  • GPU Acceleration: Auto-detects Intel GPU for optimized training

  • Comprehensive Metrics: Precision, Recall, F1-Score, IoU, Dice Coefficient, ROC-AUC

  • Visual Diagnostics: Built-in visualization tools for training monitoring and result interpretation


📦 Installation

Prerequisites

  • Python 3.13 or higher

Install LungScan

# Install from PyPI (when published)
pip install lungscan

🚀 Quick Start

1. Prepare Your Dataset

from lungscan import convert_pkl2images_metalung, LungDatasetSplitter

# Convert training data with augmentation
convert_pkl2images_metalung(
    pickle_path='dataset/source/lung_cancer_train.pkl',
    output_base_dir='dataset/image_data/train',
    num_augments=2
)

# Convert test data
convert_pkl2images_metalung(
    pickle_path='dataset/source/lung_cancer_test.pkl',
    output_base_dir='dataset/image_data/test',
    num_augments=0  # No augmentation for test set
)

# Initialize splitter with pixel range definitions
splitter = LungDatasetSplitter(
    source_dir='dataset/image_data',
    pixel_ranges={
        'xlarge': (150, 301),   # 150-300 pixels
        'large': (50, 151),     # 50-150 pixels
        'medium': (20, 51),     # 20-50 pixels
        'small': (9, 21)        # 9-20 pixels
    }
)

# Analyze lesion distribution
splitter.analyze()

# Create curriculum dataset
splitter.split(output_dir='dataset/image_split')

2. Train Classification Model

from lungscan import LungClassificationPipeline

# Initialize and train
pipeline = LungClassificationPipeline()
pipeline.load_data('dataset/lung_classes')
pipeline.train(epochs=10, load_pretrained=True)

# Evaluate
metrics = pipeline.evaluate(num_samples=20)
print(metrics)

3. Make Classification Prediction

# Classification prediction
result = pipeline.predict(
    'path/to/ct_scan.png',
    visualize=True
)
print(f"Diagnosis: {result['class']} ({result['confidence']:.1%})")

4. Train Segmentation Model

from lungscan import LungSegmentationPipeline

# Initialize segmentation pipeline
pipeline = LungSegmentationPipeline(
    img_size=(256, 256, 1),
    model_type='att_unet'
)

pipeline.load_data('dataset/image_split')
pipeline.train(epochs_per_stage=10)

5. Make Segmentation Predictions

# Segmentation prediction
result = pipeline.predict(
    'path/to/ct_scan.png',
    visualize=True
)

📊 Dataset Preparation

Input Format

LungScan expects data in .pkl format containing:

  • CT scan images
  • Corresponding masks (for segmentation)
  • Class labels (for classification)

Directory Structure

dataset/
├── source/
│   ├── lung_cancer_train.pkl
│   └── lung_cancer_test.pkl
├── image_data/
│   ├── train/
│   │   ├── images/
│   │   └── masks/
│   └── test/
│       ├── images/
│       └── masks/
├── image_split/
│   ├── train/
│   │   ├── xlarge/
│   │   │   ├── images/
│   │   │   └── masks/
│   │   ├── large/
│   │   │   ├── images/
│   │   │   └── masks/
│   │   ├── medium/
│   │   │   ├── images/
│   │   │   └── masks/
│   │   └── small/
│   │       ├── images/
│   │       └── masks/
│   └── test/
│       ├── xlarge/
│       │   ├── images/
│       │   └── masks/
│       ├── large/
│       │   ├── images/
│       │   └── masks/
│       ├── medium/
│       │   ├── images/
│       │   └── masks/
│       └── small/
│           ├── images/
│           └── masks/
└── lung_classes/
    ├── train/
    │   ├── adenocarcinoma/
    │   ├── squamous_cell_carcinoma/
    │   ├── large_cell_carcinoma/
    │   └── normal/
    └── test/
        ├── adenocarcinoma/
        ├── squamous_cell_carcinoma/
        ├── large_cell_carcinoma/
        └── normal/

Metalung Augmentation

The convert_pkl2images_metalung function applies medical-specific augmentations:

  • Random rotation and flipping
  • Intensity adjustments (simulating different scanner settings)
  • Random cancer relocation (generating multiple cancer variations of the same sample)
  • Noise injection (simulating acquisition artifacts)
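The geometric and intensity transforms above can be sketched with plain NumPy. This is an illustrative sketch, not the library's actual implementation, and it omits the cancer-relocation step, which requires mask-aware compositing:

```python
import numpy as np

def augment_pair(image, mask, rng=None):
    """One augmentation pass (illustrative): geometric transforms applied
    to image and mask together; intensity jitter and noise to the image only."""
    if rng is None:
        rng = np.random.default_rng()
    k = int(rng.integers(0, 4))              # random 90-degree rotation
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    if rng.random() < 0.5:                   # random horizontal flip
        image, mask = np.fliplr(image), np.fliplr(mask)
    # Intensity jitter simulates different scanner settings
    image = np.clip(image * rng.uniform(0.9, 1.1), 0.0, 1.0)
    # Gaussian noise simulates acquisition artifacts
    image = np.clip(image + rng.normal(0.0, 0.02, image.shape), 0.0, 1.0)
    return image, mask
```

Note that geometric transforms must be applied to the image and its mask with the same parameters, or the mask no longer aligns with the lesion.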

๐Ÿ” Classification Pipeline

LungClassificationPipeline

A complete end-to-end pipeline for lung cancer classification.

Key Methods

from lungscan import LungClassificationPipeline

pipeline = LungClassificationPipeline(
    img_size=(224, 224, 3),  # Input image dimensions
    verbose=True              # Enable detailed logging
)

# Load balanced dataset
pipeline.load_data('dataset/lung_classes')

# Visualize samples
pipeline.view_sample(data_type='train', is_notebook=True)

# Train model
pipeline.train(
    epochs=10,
    load_pretrained=True,      # Use pre-trained weights
    learning_rate=1e-4
)

# Calculate metrics
metrics = pipeline.calcuate_metrics(data_type='test')

# Predict on new image
result = pipeline.predict(
    'path/to/image.png',
    visualize=True,
    is_notebook=True
)
# Returns: {'class': 'adenocarcinoma', 'confidence': 0.94, 'probabilities': {...}}

Training Features

  • Transfer Learning: Leverages pre-trained CNN architectures (EfficientNetB0)
  • Class Balancing: Automatic handling of imbalanced datasets
  • Early Stopping: Prevents overfitting with patience monitoring
  • Checkpoint Saving: Saves best model weights automatically
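Class balancing typically comes down to inverse-frequency weights, optionally scaled by clinical priority. A minimal sketch of that idea (the `priority` scheme and `clinical_class_weights` helper are hypothetical, not LungScan's exact mechanism):

```python
import numpy as np

def clinical_class_weights(labels, priority=None):
    """Inverse-frequency class weights; optionally scale each class by a
    clinical priority factor (hypothetical weighting scheme)."""
    classes, counts = np.unique(labels, return_counts=True)
    # Rare classes get proportionally larger weights
    weights = counts.sum() / (len(classes) * counts)
    if priority:
        weights = weights * np.array([priority.get(c, 1.0) for c in classes])
    return dict(zip(classes.tolist(), weights.tolist()))
```

A dict like this is the shape Keras's `Model.fit(class_weight=...)` expects (with integer class indices as keys).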

โœ‚๏ธ Segmentation Pipeline

LungSegmentationPipeline

Advanced lung segmentation using Attention U-Net architecture.

Key Methods

from lungscan import LungSegmentationPipeline

pipeline = LungSegmentationPipeline(
    img_size=(256, 256, 1),           # Grayscale input
    model_type='att_unet',            # Attention U-Net
    pretrained_path=None,             # Path to pre-trained weights
    verbose=True
)

# Load dataset
pipeline.load_data('dataset/image_split')

# Visualize samples
pipeline.view_sample(sample_type='train', is_notebook=True)

# Train with curriculum learning
pipeline.train(epochs_per_stage=10)

# Predict segmentation
result = pipeline.predict(
    'path/to/ct_scan.png',
    output_path='results/mask.png',
    is_notebook=True
)
# Returns: {'image': array, 'mask': array, 'overlay': array}

# Evaluate performance
pipeline.evaluate(num_samples=10, is_notebook=True)
pipeline.calcuate_metrics(data_type='test')

Model Architecture

  • Attention Gates: Focus on relevant regions, suppress noise
  • Skip Connections: Preserve spatial information
  • Multi-scale Feature Extraction: Captures details at different resolutions
  • Dice Loss: Optimized for medical segmentation tasks
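The Dice loss is the standard soft-Dice formulation, 1 - 2|A∩B| / (|A| + |B|). A NumPy sketch for clarity (the pipeline's internal implementation may differ, e.g. operating on batched tensors):

```python
import numpy as np

def dice_loss(y_true, y_pred, smooth=1e-6):
    """Soft Dice loss: 1 - 2*intersection / (sum of both masks).
    The smooth term avoids division by zero on empty masks."""
    y_true, y_pred = y_true.ravel(), y_pred.ravel()
    intersection = (y_true * y_pred).sum()
    dice = (2.0 * intersection + smooth) / (y_true.sum() + y_pred.sum() + smooth)
    return 1.0 - dice
```

Unlike pixel-wise cross-entropy, Dice loss is insensitive to the large background class, which is why it is favored for small-lesion segmentation.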

🎓 Curriculum Learning

LungDatasetSplitter

Progressive training strategy based on lesion size for improved convergence.

from lungscan import LungDatasetSplitter

# Initialize splitter with pixel range definitions
splitter = LungDatasetSplitter(
    source_dir='dataset/image_data',
    pixel_ranges={
        'xlarge': (150, 301),   # 150-300 pixels
        'large': (50, 151),     # 50-150 pixels
        'medium': (20, 51),     # 20-50 pixels
        'small': (9, 21)        # 9-20 pixels
    }
)

# Analyze lesion distribution
splitter.analyze()

# Create curriculum dataset
splitter.split(output_dir='dataset/image_split')

Benefits

  • Faster Convergence: Start with easier (larger) lesions
  • Better Generalization: Gradually learn complex patterns
  • Reduced Overfitting: Progressive complexity prevents memorization
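The bucketing behind the splitter reduces to a pixel-count rule over the lesion mask. An illustrative sketch using the `pixel_ranges` from the example above (`lesion_bucket` is a hypothetical helper; the library's exact rule may differ):

```python
import numpy as np

def lesion_bucket(mask, pixel_ranges):
    """Assign a mask to a curriculum bucket by its lesion pixel count,
    treating each (lo, hi) range as half-open [lo, hi)."""
    n = int((mask > 0).sum())
    for name, (lo, hi) in pixel_ranges.items():
        if lo <= n < hi:
            return name
    return None  # lesion too small/large for any bucket, or no lesion
```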

🔮 Inference & Prediction

Standalone Prediction Tool

For quick predictions without training:

from lungscan import LungSegmentationPipeline, LungClassificationPipeline

# Initialize models
seg_model = LungSegmentationPipeline(
    img_size=(256, 256, 1),
    model_type='att_unet',
    pretrained_path='checkpoints/best_medium.keras'
)

classi_model = LungClassificationPipeline(
    img_size=(224, 224, 3)
)

# Predict
classification = classi_model.predict('path/to/image.png', visualize=False)
segmentation = seg_model.predict('path/to/image.png', visualize=False)

print(f"Diagnosis: {classification['class']}")
print(f"Confidence: {classification['confidence']:.1%}")

GUI-Based Prediction

Interactive file selection for batch prediction:

from lungscan import select_files

# Open file dialog
image_paths = select_files()

# Process each image
for path in image_paths:
    # Your prediction logic here
    pass

📈 Evaluation & Metrics

Classification Metrics

  • Accuracy: Overall correctness
  • Precision: True positive rate among predicted positives
  • Recall: True positive rate among actual positives
  • F1-Score: Harmonic mean of precision and recall
  • Sensitivity: True positive rate among actual positives (equivalent to recall)
  • Specificity: True negative rate among actual negatives
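All of these derive from the binary confusion matrix. A minimal reference implementation (a generic sketch, not the library's code):

```python
def binary_metrics(tp, fp, fn, tn):
    """Derive the listed metrics from binary confusion-matrix counts.
    Sensitivity equals recall; specificity is its negative-class counterpart."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)              # a.k.a. sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {'precision': precision, 'recall': recall,
            'specificity': specificity, 'f1': f1, 'accuracy': accuracy}
```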

Segmentation Metrics

  • IoU (Intersection over Union): Overlap between predicted and ground truth
  • Dice Coefficient: Similarity measure (2 * IoU / (IoU + 1))
  • Pixel Accuracy: Percentage of correctly classified pixels
  • Precision: True positive rate among predicted positives
  • Recall: True positive rate among actual positives
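For two binary masks, IoU and Dice can be computed directly, and the identity Dice = 2 * IoU / (1 + IoU) noted above holds. A short sketch (not the library's implementation):

```python
import numpy as np

def overlap_metrics(pred, truth):
    """IoU and Dice between two binary masks; empty-vs-empty counts as perfect."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    total = pred.sum() + truth.sum()
    iou = inter / union if union else 1.0
    dice = 2 * inter / total if total else 1.0
    return iou, dice
```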

Visualization Tools

from lungscan import disp_image, add_text_to_image

# Display image in notebook or save to file
disp_image(image_array, isNotebook=True, save_path='output.png')

# Add diagnostic text to image
annotated = add_text_to_image(
    image_array,
    text="Diagnosis: Adenocarcinoma",
    position=(5, 5),
    font_size=12,
    color=(255, 0, 0)  # Red text
)

📚 API Reference

Core Functions

convert_pkl2images_metalung(pickle_path, output_base_dir, num_augments)

Converts pickle dataset to images with metalung augmentation.

Parameters:

  • pickle_path (str): Path to .pkl file
  • output_base_dir (str): Output directory for images/masks
  • num_augments (int): Number of augmented copies per image

Returns: None


LungClassificationPipeline(img_size, verbose)

End-to-end classification pipeline.

Methods:

  • load_data(base_dir): Load dataset from directory
  • view_sample(data_type, is_notebook): Visualize sample images
  • train(epochs, load_pretrained, batch_size, learning_rate): Train model
  • predict(image_path, visualize, is_notebook): Predict on single image
  • evaluate(num_samples, is_notebook): Comprehensive evaluation
  • calcuate_metrics(data_type): Calculate performance metrics

LungSegmentationPipeline(img_size, model_type, pretrained_path, verbose)

Lung segmentation pipeline with Attention U-Net.

Methods:

  • load_data(data_dir): Load segmentation dataset
  • view_sample(sample_type, is_notebook): Visualize samples
  • train(epochs_per_stage): Train with curriculum learning
  • predict(image_path, output_path, is_notebook): Predict segmentation mask
  • evaluate(num_samples, is_notebook): Evaluate on test set
  • calcuate_metrics(data_type): Calculate segmentation metrics

LungDatasetSplitter(source_dir, pixel_ranges)

Split dataset by lesion size for curriculum learning.

Methods:

  • analyze(): Display lesion size distribution
  • split(output_dir): Create curriculum dataset splits

select_files()

Open file dialog for image selection.

Returns: List of selected file paths


disp_image(image, isNotebook, save_path)

Display or save image.

Parameters:

  • image: PIL Image or numpy array
  • isNotebook: Display in Jupyter notebook
  • save_path: Save path (optional)

add_text_to_image(image, text, position, font_size, color)

Add text annotation to image.

Parameters:

  • image: Input image
  • text: Text to add
  • position: (x, y) coordinates
  • font_size: Font size
  • color: RGB tuple

Returns: Annotated image array


fetch_dirs(base_dir, is_semantic)

Fetch image and mask file lists from directory structure.

Returns: Tuple of (image_dict, mask_dict)


๐Ÿ† Pre-trained Models

Available Checkpoints

Model                      Task            Path                                        Performance
best_medium.keras          Segmentation    checkpoints/2nd advance/best_medium.keras   IoU: 0.2726, Dice: 0.4285
lung_classification.keras  Classification  models/lung_classification.keras            Accuracy: 80.21%, F1: 0.8

Loading Pre-trained Weights

# Segmentation
seg_pipeline = LungSegmentationPipeline(
    pretrained_path='checkpoints/2nd advance/best_medium.keras'
)

# Classification
class_pipeline = LungClassificationPipeline()
class_pipeline.train(load_pretrained=True)

๐Ÿ“ Example Workflows

Complete Training Pipeline

# Step 1: Prepare dataset
from lungscan import convert_pkl2images_metalung
convert_pkl2images_metalung('dataset/source/train.pkl', 'dataset/image_data/train', 2)

# Step 2: Split for curriculum learning
from lungscan import LungDatasetSplitter
splitter = LungDatasetSplitter('dataset/image_data')
splitter.split('dataset/image_split')

# Step 3: Train segmentation
from lungscan import LungSegmentationPipeline
seg = LungSegmentationPipeline(img_size=(256, 256, 1))
seg.load_data('dataset/image_split')
seg.train(epochs_per_stage=10)

# Step 4: Train classification
from lungscan import LungClassificationPipeline
clf = LungClassificationPipeline()
clf.load_data('dataset/lung_classes')
clf.train(epochs=20, load_pretrained=True)

Batch Evaluation

from lungscan import LungClassificationPipeline, LungSegmentationPipeline, fetch_dirs

# Get test images
img_flst, _ = fetch_dirs('dataset/lung_classes', is_semantic=False)

# Initialize model
seg_line = LungSegmentationPipeline(
    pretrained_path='checkpoints/2nd advance/best_medium.keras',
)

class_line = LungClassificationPipeline()

# Evaluate all test samples
results = []
for class_name in img_flst['test'].keys():
    for img_path in img_flst['test'][class_name]:
        pred_mask = seg_line.predict(img_path, visualize=False)
        pred = class_line.predict(img_path, visualize=False)
        results.append({
            'path': img_path,
            'true_class': class_name,
            'input': pred_mask['image'],
            'mask': pred_mask['overlay'],
            'predicted': pred['class'],
            'confidence': pred['confidence']
        })

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.


๐Ÿ™ Acknowledgments

  • Medical imaging datasets and annotations
  • PyTorch/Keras development teams
  • Attention U-Net original authors
  • Open-source medical AI community

📧 Contact

For questions, issues, or collaboration opportunities:


📊 Citation

If you use LungScan in your research, please cite:

@software{lungscan2026,
  author = {Hosam Hatim Osman},
  title = {LungScan: Advanced Lung Cancer Detection and Segmentation Library},
  year = {2026},
  url = {https://pypi.org/project/lungscan/}
}

Built with ❤️ for medical AI advancement
