A package for breast cancer diagnosis using an MLP classifier.
Breast Cancer Diagnosis with MLP 🩺💻
🧠 This project harnesses the power of a Multi-Layer Perceptron (MLP) neural network, implemented with scikit-learn, to perform breast cancer diagnosis based on tumor characteristics extracted from biopsy samples. The MLP model is a type of artificial neural network designed to learn complex patterns in data, making it well-suited for tasks like medical diagnosis.
🔬 The MLP model is trained on the Breast Cancer Wisconsin (Diagnostic) dataset, whose features are derived from digitized images of breast tissue samples. Ten measurements are taken for each cell nucleus: radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry, and fractal dimension. Each measurement is summarized per image as a mean, a standard error, and a worst value, giving 30 input features that describe the size, shape, and texture of the nuclei and allow the model to learn to distinguish between benign and malignant tumors.
💡 By analyzing these features, the MLP model classifies breast tumors as either benign or malignant, providing diagnostic information to support healthcare professionals. Because the features come from fine needle aspirate samples, this approach offers a minimally invasive and automated aid to cancer detection, potentially improving patient outcomes through earlier detection and treatment.
Purpose 🎯
The primary objective of this project is to develop an accurate and reliable system for diagnosing breast cancer based on quantitative analysis of cell nuclei characteristics. By leveraging machine learning techniques, specifically MLP neural networks, we aim to create a predictive model capable of classifying tumors as either malignant (cancerous) or benign (non-cancerous) with high accuracy. Early and accurate diagnosis of breast cancer can significantly improve patient outcomes by enabling timely treatment and intervention.
Key Features 🔑
- Utilizes an MLP neural network for breast cancer diagnosis. 🤖
- Preprocesses input data using feature scaling with StandardScaler. 📊
- Implements training and evaluation functionalities. 📈
- Provides prediction capabilities for new biopsy samples. ⚡
- Offers detailed model evaluation metrics, including accuracy and confusion matrix. 📊
- Supports easy integration into Python applications for breast cancer diagnosis tasks. 🐍
Installation 🚀
You can easily install BreastCancerMLPModel using pip:
pip install BreastCancerMLPModel
How BreastCancerMLPModel Works 🤖
BreastCancerMLPModel utilizes an MLP neural network for breast cancer diagnosis. Here's how it works:
- Initializing the Model 🛠️:
  - The model is initialized using the `BreastCancerMLPModel` class from the package.
  - This class encapsulates an MLPClassifier from scikit-learn with predefined parameters.
- Preprocessing Input Data 📊:
  - Input data undergoes preprocessing using feature scaling with StandardScaler.
  - Scaling standardizes each feature to zero mean and unit variance, so that large-valued features such as area do not dominate small-valued ones such as smoothness during training.
- Training and Evaluation 📈:
  - The model is trained using the `fit()` method, which splits the dataset, scales features, and trains the MLP model.
  - Evaluation metrics, including accuracy and confusion matrix, are provided to assess model performance.
- Making Predictions ⚡:
  - The `predict()` method enables prediction capabilities for new biopsy samples.
  - Input data, such as tumor characteristics, is provided to the model for prediction.
- Integration with Python Applications 🐍:
  - BreastCancerMLPModel supports easy integration into Python applications for breast cancer diagnosis tasks.
  - This allows seamless incorporation of the model into existing workflows for efficient diagnosis.
This approach aims to provide accurate and reliable breast cancer classification based on tumor characteristics, supporting better patient care and treatment decisions.
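The steps above can be sketched directly with scikit-learn. This is a minimal illustration, not the package's actual implementation: the train/test split ratio, the hidden layer sizes, `max_iter`, and `random_state` are all assumptions chosen for the example, and the package's predefined parameters may differ.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load the dataset and hold out 20% of the samples for evaluation
# (the split ratio is an assumption for this sketch).
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, stratify=data.target, random_state=42
)

# Chaining the scaler and the classifier means the scaler is fit on the
# training data only and reused unchanged at prediction time.
# The hyperparameters below are illustrative, not the package's values.
pipeline = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=42),
)
pipeline.fit(X_train, y_train)

test_accuracy = pipeline.score(X_test, y_test)
print(f"Held-out accuracy: {test_accuracy:.3f}")

# Predict a single sample and report the label with its class probability.
sample = X_test[:1]
probabilities = pipeline.predict_proba(sample)[0]
best = probabilities.argmax()
prediction = (str(data.target_names[best]), float(probabilities[best]))
print(prediction)
```

Wrapping both steps in one pipeline is a design choice that avoids leaking test-set statistics into the scaler; the (label, probability) pair mirrors the shape of the values printed in the usage example.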
from BreastCancerMLPModel.BreastCancerMLPModel import BreastCancerMLPModel
# Example usage
# Initialize the model
model = BreastCancerMLPModel()
# Train the model
model.fit()
# Make predictions
# Data for prediction 1
data1 = "mean_radius: 17.99, mean_texture: 10.38, mean_perimeter: 122.8, mean_area: 1001, mean_smoothness: 0.1184, mean_compactness: 0.2776, mean_concavity: 0.3001, mean_concave_points: 0.1471, mean_symmetry: 0.2419, mean_fractal_dimension: 0.07871, se_radius: 1.095, se_texture: 0.9053, se_perimeter: 8.589, se_area: 153.4, se_smoothness: 0.006399, se_compactness: 0.04904, se_concavity: 0.05373, se_concave_points: 0.01587, se_symmetry: 0.03003, se_fractal_dimension: 0.006193, worst_radius: 25.38, worst_texture: 17.33, worst_perimeter: 184.6, worst_area: 2019, worst_smoothness: 0.1622, worst_compactness: 0.6656, worst_concavity: 0.7119, worst_concave_points: 0.2654, worst_symmetry: 0.4601, worst_fractal_dimension: 0.1189"
prediction1 = model.predict(data1)
print("Predicted diagnosis for data 1:", prediction1)  # ('Maligno', 1.0), i.e. malignant
# Data for prediction 2
data2 = "mean_radius: 13.08, mean_texture: 15.71, mean_perimeter: 85.63, mean_area: 520, mean_smoothness: 0.1075, mean_compactness: 0.127, mean_concavity: 0.04568, mean_concave_points: 0.0311, mean_symmetry: 0.1967, mean_fractal_dimension: 0.06811, se_radius: 0.1852, se_texture: 0.7477, se_perimeter: 1.383, se_area: 14.67, se_smoothness: 0.004097, se_compactness: 0.01898, se_concavity: 0.01698, se_concave_points: 0.00649, se_symmetry: 0.01678, se_fractal_dimension: 0.002425, worst_radius: 14.5, worst_texture: 20.49, worst_perimeter: 96.09, worst_area: 630.5, worst_smoothness: 0.1312, worst_compactness: 0.2776, worst_concavity: 0.189, worst_concave_points: 0.07283, worst_symmetry: 0.3184, worst_fractal_dimension: 0.08183"
prediction2 = model.predict(data2)
print("Predicted diagnosis for data 2:", prediction2)  # ('Benigno', 0.9999982189891156), i.e. benign
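The examples above pass each sample to `predict()` as one comma-separated string of `name: value` pairs. How the package parses that string internally is not shown here; the following is a minimal sketch of one way to turn such a string into a numeric row, where `parse_features` and `to_row` are hypothetical helper names introduced only for illustration.

```python
import numpy as np


def parse_features(text: str) -> dict[str, float]:
    """Parse 'name: value, name: value, ...' into an ordered dict of floats."""
    features = {}
    for item in text.split(","):
        name, _, value = item.partition(":")
        features[name.strip()] = float(value.strip())
    return features


def to_row(features: dict[str, float], order: list[str]) -> np.ndarray:
    """Arrange the parsed values in a fixed column order as a (1, n) array."""
    return np.array([[features[name] for name in order]])


# A shortened input string; a full sample would carry all 30 features.
parsed = parse_features("mean_radius: 17.99, mean_texture: 10.38, mean_area: 1001")
row = to_row(parsed, ["mean_radius", "mean_texture", "mean_area"])
print(parsed)
print(row.shape)
```

Fixing the column order explicitly matters because a scaler and classifier fit on one feature order will silently give wrong results if the values arrive in another.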
Dataset 📊
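The dataset used in this project is the Breast Cancer Wisconsin (Diagnostic) dataset, available in scikit-learn's built-in datasets module as load_breast_cancer. It contains 569 samples (212 malignant, 357 benign), each with 30 numeric features computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. The features describe characteristics of the cell nuclei present in the image: for each of ten measurements, the dataset records the mean, the standard error, and the "worst" value (the mean of the three largest values). Because every sample is labeled as malignant or benign, the dataset is suitable for binary classification tasks.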
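As a quick check of the dataset described above, it can be loaded and inspected with scikit-learn. This sketch uses only `load_breast_cancer`; note that scikit-learn encodes the target as 0 for malignant and 1 for benign, which may differ from the labels the package reports.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

dataset = load_breast_cancer()
X, y = dataset.data, dataset.target

# One row per biopsy sample, one column per feature.
print(X.shape)  # (569, 30)

# Count samples per class; index 0 is 'malignant', index 1 is 'benign'.
class_counts = {
    str(name): int(count)
    for name, count in zip(dataset.target_names, np.bincount(y))
}
print(class_counts)  # {'malignant': 212, 'benign': 357}
```

The classes are moderately imbalanced, which is one reason to look at the confusion matrix and not only at overall accuracy.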
Features and Descriptions
Label | Meaning | Weight in Diagnosis | Description |
---|---|---|---|
Diagnosis | Diagnosis (M = malignant, B = benign) | Not used | Result of breast cancer diagnosis |
mean_radius | Mean radius of cell nuclei | High | Average distance from the center to the points on the perimeter of cell nuclei |
mean_texture | Mean texture of cell nuclei | Low | Standard deviation of gray-scale values in the image of cell nuclei |
mean_perimeter | Mean perimeter of cell nuclei | High | Average lengths of perimeters of cell nuclei |
mean_area | Mean area of cell nuclei | Very High | Average areas of cell nuclei |
mean_smoothness | Mean smoothness of cell nuclei | Low | Local variation in lengths of cell nuclei radii |
mean_compactness | Mean compactness of cell nuclei | High | (Perimeter^2 / area) - 1.0 |
mean_concavity | Mean concavity of cell nuclei | Very High | Severity of concave portions of cell nuclei contour |
mean_concave_points | Mean concave points of cell nuclei | Very High | Number of concave portions of cell nuclei contour |
mean_symmetry | Mean symmetry of cell nuclei | Low | Symmetry of cell nuclei |
mean_fractal_dimension | Mean fractal dimension of cell nuclei | Low | Coastline approximation of cell nuclei |
se_radius | Standard error of radius | Medium | Standard error of cell nuclei radius |
se_texture | Standard error of texture | Low | Standard error of cell nuclei texture |
se_perimeter | Standard error of perimeter | Medium | Standard error of cell nuclei perimeter |
se_area | Standard error of area | Medium | Standard error of cell nuclei area |
se_smoothness | Standard error of smoothness | Low | Standard error of cell nuclei smoothness |
se_compactness | Standard error of compactness | Medium | Standard error of cell nuclei compactness |
se_concavity | Standard error of concavity | High | Standard error of cell nuclei concavity |
se_concave_points | Standard error of concave points | High | Standard error of cell nuclei concave points |
se_symmetry | Standard error of symmetry | Low | Standard error of cell nuclei symmetry |
se_fractal_dimension | Standard error of fractal dimension | Low | Standard error of cell nuclei fractal dimension |
worst_radius | Worst value of radius | High | Worst value of cell nuclei radius |
worst_texture | Worst value of texture | Low | Worst value of cell nuclei texture |
worst_perimeter | Worst value of perimeter | High | Worst value of cell nuclei perimeter |
worst_area | Worst value of area | Very High | Worst value of cell nuclei area |
worst_smoothness | Worst value of smoothness | Low | Worst value of cell nuclei smoothness |
worst_compactness | Worst value of compactness | High | Worst value of cell nuclei compactness |
worst_concavity | Worst value of concavity | Very High | Worst value of cell nuclei concavity |
worst_concave_points | Worst value of concave points | Very High | Worst value of cell nuclei concave points |
worst_symmetry | Worst value of symmetry | Low | Worst value of cell nuclei symmetry |
worst_fractal_dimension | Worst value of fractal dimension | Low | Worst value of cell nuclei fractal dimension |
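The labels in the table use underscores and an `se_` prefix, while scikit-learn's copy of the dataset names the same columns with spaces and an `error` suffix (for example `radius error`). The following sketch shows how the two naming schemes line up; `to_sklearn_name` is a hypothetical helper introduced for illustration, and whether the package performs such a mapping internally is an assumption.

```python
from sklearn.datasets import load_breast_cancer

# The ten per-nucleus measurements, in the order used by the table above.
MEASUREMENTS = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry",
    "fractal_dimension",
]


def to_sklearn_name(label: str) -> str:
    """Convert a table label such as 'se_concave_points' to scikit-learn's name."""
    prefix, _, rest = label.partition("_")
    base = rest.replace("_", " ")
    if prefix == "se":
        # scikit-learn writes standard errors as '<measurement> error'.
        return f"{base} error"
    return f"{prefix} {base}"


# Build the 30 table labels: mean_*, then se_*, then worst_*.
labels = [f"{prefix}_{m}" for prefix in ("mean", "se", "worst") for m in MEASUREMENTS]
mapped = [to_sklearn_name(label) for label in labels]
sklearn_names = [str(name) for name in load_breast_cancer().feature_names]

print(mapped[:3])
print(mapped == sklearn_names)
```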
Usage 🚀
- Training the Model: The model is trained using the `fit` method, which loads the dataset, preprocesses the input features, and trains the MLP classifier.
- Making Predictions: After training, the model can be used to make predictions on new biopsy samples using the `predict` method. The input data should be provided as a comma-separated string of name: value pairs covering all 30 features (mean_radius, mean_texture, mean_perimeter, and so on), as in the example above.
- Evaluation: The model's performance can be evaluated using various metrics, including accuracy and confusion matrix, to assess its diagnostic capabilities.
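To make the evaluation metrics concrete, here is a small worked example using scikit-learn's metric functions on six hand-written labels. The labels are made up for illustration and are not results from this package.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Illustrative ground truth and predictions for six samples (not real results).
y_true = ["malignant", "malignant", "benign", "benign", "benign", "malignant"]
y_pred = ["malignant", "benign", "benign", "benign", "malignant", "malignant"]

# Fraction of samples classified correctly: 4 of 6.
accuracy = accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes, in the given label order.
matrix = confusion_matrix(y_true, y_pred, labels=["benign", "malignant"])

# Recall for the malignant class: the share of malignant samples that were caught.
malignant_recall = recall_score(y_true, y_pred, pos_label="malignant")

print(f"accuracy = {accuracy:.3f}")
print(matrix)
print(f"malignant recall = {malignant_recall:.3f}")
```

In the matrix, the bottom-left cell counts malignant samples predicted as benign. In a diagnostic setting that cell is usually the most costly error, which is why the confusion matrix is worth reading alongside accuracy.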
Dependencies 🛠️
- scikit-learn
- numpy
License 📜
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments 🙏
This dataset is a copy of the UCI ML Breast Cancer Wisconsin (Diagnostic) dataset from the UCI Machine Learning Repository.
The input features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image.
In the original work on this dataset, the separating plane was obtained using the Multiple Surface Method Tree (MSM-T) [K. P. Bennett, "Constructing a Decision Tree by Linear Programming". Proceedings of the 4th Midwest Artificial Intelligence and Cognitive Science Society, pp. 97-101, 1992], a classification method that uses linear programming to build a decision tree. Relevant features were selected through an exhaustive search in the space of 1-4 features and 1-3 separating planes.
The actual linear program used to obtain the separation plane in the three-dimensional space is described in: [K. P. Bennett and O. L. Mangasarian: "Robust Linear Programming Discrimination of Two Linearly Inseparable Sets", Optimization Methods and Software 1, 1992, 23-34].
This database is also available through the UW CS ftp server:
ftp ftp.cs.wisc.edu
cd math-prog/cpo-dataset/machine-learn/WDBC/
References:
- W.N. Street, W.H. Wolberg and O.L. Mangasarian. Nuclear feature extraction for breast tumor diagnosis. IS&T/SPIE 1993 International Symposium on Electronic Imaging: Science and Technology, volume 1905, pages 861-870, San Jose, CA, 1993.
- O.L. Mangasarian, W.N. Street and W.H. Wolberg. Breast cancer diagnosis and prognosis via linear programming. Operations Research, 43(4), pages 570-577, July-August 1995.
- W.H. Wolberg, W.N. Street, and O.L. Mangasarian. Machine learning techniques to diagnose breast cancer from fine-needle aspirates. Cancer Letters 77 (1994) 163-171.
Contribution
Contributions to BreastCancerMLPModel are highly encouraged! If you're interested in adding new features, resolving bugs, or enhancing the project's functionality, please feel free to submit pull requests.
Get in Touch 📬
BreastCancerMLPModel is developed and maintained by Sergio Sánchez Sánchez (Dream Software). Special thanks to the open-source community and the contributors who have made this project possible. If you have any questions, feedback, or suggestions, feel free to reach out at dreamsoftware92@gmail.com.
Please Share & Star the repository to keep me motivated.