
Interactive Knowledge Discovery & Data Mining (KDD) study guide with comprehensive documentation

Project description

SKIEARN - Knowledge Discovery & Data Mining Study Guide


SKIEARN is an interactive Python package providing comprehensive documentation for Knowledge Discovery in Databases (KDD) and Data Mining concepts. Perfect for university students preparing for KDD/Data Mining exams!

🚀 Installation

pip install skiearn-kdd

📖 Usage

Simply import and run:

import skiearn

skiearn.print()

This launches an interactive menu for browsing the 13 documentation files listed below.

📚 What's Included

Main Documentation (13 Files)

  1. Data Preparation & Formalism - Variable types, IID, splits, data leakage, curse of dimensionality
  2. Statistics & Distributions - Descriptive statistics, normality tests, CLT
  3. Hypothesis Testing - Z-tests, t-tests, ANOVA, Chi-square, bootstrap, permutation tests
  4. Causality & Feature Selection - Causality concepts, Simpson's paradox, feature selection methods
  5. Outliers & Robust Statistics - Detection methods (Z-score, IQR, MAD, LOF, Isolation Forest)
  6. Supervised Learning - All major ML algorithms with formal explanations
  7. Model Evaluation & Comparison - Metrics, cross-validation, statistical model comparison
  8. Imbalanced & Missing Data - SMOTE, sampling techniques, imputation methods
  9. Explainability & Visualization - SHAP, LIME, all visualization techniques
  10. Dimensionality & Clustering - PCA, t-SNE, K-Means, DBSCAN, cluster validation
  11. Advanced Topics - Time series, association rules, information theory, probability theory
  12. Encoding & Validation - All encoding techniques, validation strategies
  13. Exam Traps & Pitfalls ⚠️ - Common mistakes and how to avoid them
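To give a flavor of the material in file 5 (Outliers & Robust Statistics), here is a minimal pure-Python sketch of two of the listed detection methods, IQR fences and the MAD-based modified z-score. It is an illustration written for this page, not code taken from the package; the function names, the linear-interpolation quantile, and the default thresholds (`k=1.5`, `threshold=3.5`) are assumptions chosen to match common textbook conventions.

```python
import statistics


def quantile(sorted_vals, q):
    """Quantile by linear interpolation between neighbouring order statistics."""
    pos = (len(sorted_vals) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR], in input order."""
    s = sorted(values)
    q1, q3 = quantile(s, 0.25), quantile(s, 0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


def mad_outliers(values, threshold=3.5):
    """Return values whose modified z-score (based on the MAD) exceeds threshold."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # the score is undefined when at least half the values equal the median
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


data = [10, 12, 11, 13, 12, 11, 95]
print(iqr_outliers(data))
print(mad_outliers(data))
```

Both functions rely on the median and quartiles rather than the mean and standard deviation, so a single extreme value such as 95 does not inflate the very spread it is measured against.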

Additional Resources

  • README - Study guide overview and organization
  • Study Guide - Recommended week-by-week study path
  • Verification - Complete section mapping (all 74 topics covered)

💡 Features

  • 74 comprehensive topics covering all KDD/Data Mining concepts
  • Executable Python code examples for every concept
  • Statistical tests with proper interpretation
  • Exam traps highlighted - avoid common mistakes
  • Interactive menu - easy navigation
  • No external dependencies - pure Python

🎯 Perfect For

  • University students taking KDD/Data Mining courses
  • Exam preparation and quick reference
  • Understanding statistical foundations of ML
  • Learning proper data preprocessing
  • Avoiding common data science pitfalls

📋 Example Session

>>> import skiearn
>>> skiearn.print()

================================================================================
                    SKIEARN DOCUMENTATION VIEWER
               Knowledge Discovery & Data Mining Study Guide
================================================================================

📚 MAIN DOCUMENTATION:
  [ 1] Data Preparation & Formalism
      └─ Variable types, IID, splits, leakage, dimensionality, bias-variance

  [ 2] Statistics & Distributions
      └─ Mean/median/variance, skewness/kurtosis, normality tests

  ...

📖 Enter your choice: 3

📦 Package Contents

skiearn/
├── __init__.py          # Package initialization
├── viewer.py            # Interactive documentation viewer
└── docs/                # All documentation files
    ├── 01_data_preparation.txt
    ├── 02_statistics_distributions.txt
    └── ... (13 main files + 3 additional resources)
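Because the documentation ships as plain `.txt` files, it can presumably also be read without the interactive viewer. The sketch below lists the text files in a package's `docs/` folder. The helper name `list_doc_files` is hypothetical, and the layout is assumed to match the tree above (a `docs/` directory next to `__init__.py`); it has not been checked against the installed package.

```python
from pathlib import Path


def list_doc_files(package_dir):
    """Return the names of the .txt files under <package_dir>/docs, sorted."""
    docs_dir = Path(package_dir) / "docs"
    return sorted(p.name for p in docs_dir.glob("*.txt"))


# Hypothetical usage once the package is installed:
#   import skiearn
#   for name in list_doc_files(Path(skiearn.__file__).parent):
#       print(name)
```

The numeric prefixes (`01_`, `02_`, ...) mean a plain alphabetical sort returns the files in the same order as the menu.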

🔧 Requirements

  • Python 3.7 or higher
  • No external dependencies!

📝 License

MIT License - feel free to use for educational purposes.

🤝 Contributing

Contributions welcome! If you find errors or want to add topics:

  1. Fork the repository
  2. Create a feature branch
  3. Submit a pull request

⭐ Support

If this package helped you ace your KDD exam, consider:

  • Giving it a star on GitHub
  • Sharing with classmates
  • Reporting issues or suggestions

📧 Contact

For questions or suggestions, please open an issue on GitHub.


Good luck with your KDD exam! 🎓

Quick Start Example

import skiearn

# Launch interactive viewer
skiearn.print()

# Navigate using:
# - Numbers 1-13: View specific documentation
# - R: README
# - S: Study Guide
# - A: View all files
# - Q: Quit
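For readers curious how a menu like this might map keystrokes to actions, here is a minimal sketch of a choice parser for the keys listed above. It is not the package's actual `viewer.py`; the function name `parse_choice`, the action labels, and the case-insensitive handling of letters are all assumptions made for illustration.

```python
def parse_choice(raw, n_docs=13):
    """Map raw menu input to an (action, argument) pair.

    Hypothetical helper: numbers 1..n_docs select a document,
    R/S/A/Q select the actions from the key list above.
    """
    choice = raw.strip().upper()
    letters = {"R": "readme", "S": "study_guide", "A": "all", "Q": "quit"}
    if choice in letters:
        return (letters[choice], None)
    if choice.isdecimal() and 1 <= int(choice) <= n_docs:
        return ("doc", int(choice))
    return ("invalid", None)


print(parse_choice("3"))
print(parse_choice("q"))
print(parse_choice("14"))
```

Returning an explicit `"invalid"` action rather than raising keeps the menu loop simple: the caller can reprint the prompt and ask again.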

Topics Covered

  • Data preparation and preprocessing
  • Statistical hypothesis testing
  • Causality and feature selection
  • Machine learning algorithms
  • Model evaluation metrics
  • Handling imbalanced and missing data
  • Explainability (SHAP, LIME)
  • Clustering and dimensionality reduction
  • Time series analysis
  • Information theory
  • And much more!

All with executable Python code and detailed explanations!
