Interactive Knowledge Discovery & Data Mining (KDD) study guide with comprehensive documentation
SKIEARN - Knowledge Discovery & Data Mining Study Guide
SKIEARN is an interactive Python package providing comprehensive documentation for Knowledge Discovery in Databases (KDD) and Data Mining concepts. Perfect for university students preparing for KDD/Data Mining exams!
🚀 Installation
pip install skiearn-kdd
📖 Usage
Simply import and run:
import skiearn
skiearn.print()
This launches an interactive menu for browsing the 13 main documentation files and the additional resources listed below.
📚 What's Included
Main Documentation (13 Files)
- Data Preparation & Formalism - Variable types, IID, splits, data leakage, curse of dimensionality
- Statistics & Distributions - Descriptive statistics, normality tests, CLT
- Hypothesis Testing - Z-tests, t-tests, ANOVA, Chi-square, bootstrap, permutation tests
- Causality & Feature Selection - Causality concepts, Simpson's paradox, feature selection methods
- Outliers & Robust Statistics - Detection methods (Z-score, IQR, MAD, LOF, Isolation Forest)
- Supervised Learning - All major ML algorithms with formal explanations
- Model Evaluation & Comparison - Metrics, cross-validation, statistical model comparison
- Imbalanced & Missing Data - SMOTE, sampling techniques, imputation methods
- Explainability & Visualization - SHAP, LIME, all visualization techniques
- Dimensionality & Clustering - PCA, t-SNE, K-Means, DBSCAN, cluster validation
- Advanced Topics - Time series, association rules, information theory, probability theory
- Encoding & Validation - All encoding techniques, validation strategies
- Exam Traps & Pitfalls ⚠️ - Common mistakes and how to avoid them
Additional Resources
- README - Study guide overview and organization
- Study Guide - Recommended week-by-week study path
- Verification - Complete section mapping (all 74 topics covered)
💡 Features
- ✅ 74 comprehensive topics covering all KDD/Data Mining concepts
- ✅ Executable Python code examples for every concept
- ✅ Statistical tests with proper interpretation
- ✅ Exam traps highlighted - avoid common mistakes
- ✅ Interactive menu - easy navigation
- ✅ No external dependencies - pure Python
🎯 Perfect For
- University students taking KDD/Data Mining courses
- Exam preparation and quick reference
- Understanding statistical foundations of ML
- Learning proper data preprocessing
- Avoiding common data science pitfalls
📋 Example Session
>>> import skiearn
>>> skiearn.print()
================================================================================
SKIEARN DOCUMENTATION VIEWER
Knowledge Discovery & Data Mining Study Guide
================================================================================
📚 MAIN DOCUMENTATION:
[ 1] Data Preparation & Formalism
└─ Variable types, IID, splits, leakage, dimensionality, bias-variance
[ 2] Statistics & Distributions
└─ Mean/median/variance, skewness/kurtosis, normality tests
...
📖 Enter your choice: 3
📦 Package Contents
skiearn/
├── __init__.py    # Package initialization
├── viewer.py      # Interactive documentation viewer
└── docs/          # All documentation files
    ├── 01_data_preparation.txt
    ├── 02_statistics_distributions.txt
    └── ... (13 main files + 3 additional resources)
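To see which documentation files an installed copy actually ships, something like the following could be used. It is a minimal sketch that assumes the layout shown above (plain `.txt` files inside a `docs/` folder next to `__init__.py`); `list_bundled_docs` is a hypothetical helper, not part of the package.

```python
from pathlib import Path
from typing import List


def list_bundled_docs(package_dir: Path) -> List[str]:
    """Return the sorted names of the .txt files under <package_dir>/docs.

    Hypothetical helper for illustration; assumes the docs/ layout
    shown in the package tree above.
    """
    docs_dir = package_dir / "docs"
    if not docs_dir.is_dir():
        # No docs/ folder at this location: nothing to list.
        return []
    return sorted(p.name for p in docs_dir.glob("*.txt"))
```

With the package installed, `list_bundled_docs(Path(skiearn.__file__).parent)` should return the file names if the layout matches the tree above.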
🔧 Requirements
- Python 3.7 or higher
- No external dependencies!
📝 License
MIT License - feel free to use for educational purposes.
🤝 Contributing
Contributions welcome! If you find errors or want to add topics:
- Fork the repository
- Create a feature branch
- Submit a pull request
⭐ Support
If this package helped you ace your KDD exam, consider:
- Giving it a star on GitHub
- Sharing with classmates
- Reporting issues or suggestions
📧 Contact
For questions or suggestions, please open an issue on GitHub.
Good luck with your KDD exam! 🎓
Quick Start Example
import skiearn
# Launch interactive viewer
skiearn.print()
# Navigate using:
# - Numbers 1-13: View specific documentation
# - R: README
# - S: Study Guide
# - A: View all files
# - Q: Quit
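The viewer's internals are not shown on this page. As a rough illustration of how key-based navigation like the above could be dispatched, here is a minimal sketch; `resolve_choice` and the action names are hypothetical and not part of the skiearn API.

```python
from typing import Optional, Tuple

MAIN_DOC_COUNT = 13  # the guide lists 13 main documentation files


def resolve_choice(raw: str) -> Optional[Tuple[str, Optional[int]]]:
    """Map raw menu input to an (action, doc_number) pair, or None if invalid.

    Hypothetical helper for illustration only.
    """
    choice = raw.strip().upper()
    # Letter shortcuts taken from the navigation list above.
    letter_actions = {"R": "readme", "S": "study_guide", "A": "all", "Q": "quit"}
    if choice in letter_actions:
        return (letter_actions[choice], None)
    # Numeric input selects one of the main documentation files.
    if choice.isdecimal() and 1 <= int(choice) <= MAIN_DOC_COUNT:
        return ("doc", int(choice))
    return None
```

Returning `None` for anything unrecognized lets the calling loop re-prompt instead of raising on a typo.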
Topics Covered
- Data preparation and preprocessing
- Statistical hypothesis testing
- Causality and feature selection
- Machine learning algorithms
- Model evaluation metrics
- Handling imbalanced and missing data
- Explainability (SHAP, LIME)
- Clustering and dimensionality reduction
- Time series analysis
- Information theory
- And much more!
All with executable Python code and detailed explanations!
File details
Details for the file skiearn_kdd-1.0.0.tar.gz.
File metadata
- Download URL: skiearn_kdd-1.0.0.tar.gz
- Upload date:
- Size: 49.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7e18203044423af6b75e5354ce05961e5dd52d251d52783224462e457772e728 |
| MD5 | 411f627f43bb3259ddddac07f2df2417 |
| BLAKE2b-256 | 134dfd15976e00adf197c8b51eb0760c973e3bb8bb353c5db9395d53d5b812dc |
File details
Details for the file skiearn_kdd-1.0.0-py3-none-any.whl.
File metadata
- Download URL: skiearn_kdd-1.0.0-py3-none-any.whl
- Upload date:
- Size: 56.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0d5444004b1c7224bd47e7b3be718cd3b8229147b7f122ba12195f7b1c719e88 |
| MD5 | f8b7dcf5f518872d98a92bdcb48485bb |
| BLAKE2b-256 | 257a5e40e61d870be550a605a2074d1b4affa2b627d911025b65cec7d89e08ad |
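The published digests can be used to check a downloaded file before installing it. The following is a minimal sketch using only the standard library; the helper names `sha256_of` and `matches_published_hash` are illustrative, and the local file path is whatever location the archive was saved to.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """Return True if the file's SHA256 equals the published digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash(Path("skiearn_kdd-1.0.0-py3-none-any.whl"), "0d5444004b1c7224bd47e7b3be718cd3b8229147b7f122ba12195f7b1c719e88")` should be `True` for an unmodified copy of the wheel.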