Interactive Knowledge Discovery & Data Mining (KDD) study guide with comprehensive documentation

SKIEARN - Knowledge Discovery & Data Mining Study Guide

SKIEARN is an interactive Python package providing comprehensive documentation for Knowledge Discovery in Databases (KDD) and Data Mining concepts. Perfect for university students preparing for KDD/Data Mining exams!

🚀 Installation

pip install skiearn-kdd
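
To confirm that the install worked, one option is to ask Python for the installed version of the distribution. The sketch below assumes Python 3.8 or newer (for `importlib.metadata`) and uses the distribution name from the pip command above; `installed_version` is a hypothetical helper for illustration, not part of SKIEARN.

```python
from importlib import metadata  # in the standard library since Python 3.8


def installed_version(dist_name="skiearn-kdd"):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("skiearn-kdd is not installed in this environment")
    else:
        print("skiearn-kdd", version)
```

Note that the distribution is installed as `skiearn-kdd` but imported as `skiearn`.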

📖 Usage

Simply import and run:

import skiearn

skiearn.print()

This launches an interactive menu for browsing the 13 main documentation files and the additional resources listed below.

📚 What's Included

Main Documentation (13 Files)

  1. Data Preparation & Formalism - Variable types, IID, splits, data leakage, curse of dimensionality
  2. Statistics & Distributions - Descriptive statistics, normality tests, CLT
  3. Hypothesis Testing - Z-tests, t-tests, ANOVA, Chi-square, bootstrap, permutation tests
  4. Causality & Feature Selection - Causality concepts, Simpson's paradox, feature selection methods
  5. Outliers & Robust Statistics - Detection methods (Z-score, IQR, MAD, LOF, Isolation Forest)
  6. Supervised Learning - All major ML algorithms with formal explanations
  7. Model Evaluation & Comparison - Metrics, cross-validation, statistical model comparison
  8. Imbalanced & Missing Data - SMOTE, sampling techniques, imputation methods
  9. Explainability & Visualization - SHAP, LIME, all visualization techniques
  10. Dimensionality & Clustering - PCA, t-SNE, K-Means, DBSCAN, cluster validation
  11. Advanced Topics - Time series, association rules, information theory, probability theory
  12. Encoding & Validation - All encoding techniques, validation strategies
  13. Exam Traps & Pitfalls ⚠️ - Common mistakes and how to avoid them
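
As a taste of the kind of material in file 5, here is a minimal sketch of two of the listed detection methods, the IQR rule and the MAD-based modified Z-score, using only the standard library. It is an independent illustration, not code taken from the package; the function names, the sample data, and the thresholds (1.5 for IQR, 3.5 for the modified Z-score) are assumptions chosen for the example, and `statistics.quantiles` needs Python 3.8 or newer.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def mad_outliers(values, threshold=3.5):
    """Flag points whose modified Z-score exceeds the threshold."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread around the median, so the score is undefined
    # 0.6745 rescales the MAD so it is comparable to a standard deviation
    # when the data are roughly normal.
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]


if __name__ == "__main__":
    data = [10, 12, 11, 13, 12, 11, 10, 12, 95]  # made-up sample
    print(iqr_outliers(data))  # [95]
    print(mad_outliers(data))  # [95]
```

Both rules rely on medians and quartiles rather than the mean and standard deviation, so a single extreme value cannot mask itself by inflating the spread estimate.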

Additional Resources

  • README - Study guide overview and organization
  • Study Guide - Recommended week-by-week study path
  • Verification - Complete section mapping (all 74 topics covered)

💡 Features

  • 74 comprehensive topics covering all KDD/Data Mining concepts
  • Executable Python code examples for every concept
  • Statistical tests with proper interpretation
  • Exam traps highlighted - avoid common mistakes
  • Interactive menu - easy navigation
  • No external dependencies - pure Python
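
To illustrate what a dependency-free statistical test with an interpretation can look like, the following sketch implements a two-sided permutation test for a difference in means using only the standard library. It is not taken from the package; `permutation_test`, its parameters, and the sample groups are assumptions made for this example, and `statistics.fmean` needs Python 3.8 or newer.

```python
import random
import statistics


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference and an approximate p-value.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    observed = statistics.fmean(a) - statistics.fmean(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    group_a = [10, 11, 12, 13, 14]  # made-up measurements
    group_b = [1, 2, 3, 4, 5]
    diff, p = permutation_test(group_a, group_b)
    print(f"observed difference = {diff:.2f}, p = {p:.4f}")
```

Interpretation: the p-value estimates how often a difference at least as large as the observed one would appear if the group labels were exchangeable. A small value is evidence against that assumption; it is not the probability that the null hypothesis is true.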

🎯 Perfect For

  • University students taking KDD/Data Mining courses
  • Exam preparation and quick reference
  • Understanding statistical foundations of ML
  • Learning proper data preprocessing
  • Avoiding common data science pitfalls

📋 Example Session

>>> import skiearn
>>> skiearn.print()

================================================================================
                    SKIEARN DOCUMENTATION VIEWER
               Knowledge Discovery & Data Mining Study Guide
================================================================================

📚 MAIN DOCUMENTATION:
  [ 1] Data Preparation & Formalism
      └─ Variable types, IID, splits, leakage, dimensionality, bias-variance

  [ 2] Statistics & Distributions
      └─ Mean/median/variance, skewness/kurtosis, normality tests

  ...

📖 Enter your choice: 3

📦 Package Contents

skiearn/
├── __init__.py          # Package initialization
├── viewer.py            # Interactive documentation viewer
└── docs/                # All documentation files
    ├── 01_data_preparation.txt
    ├── 02_statistics_distributions.txt
    ├── ... (13 main files + 3 additional resources)
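
If the documentation files are shipped as package data in the layout shown above, they could in principle be listed without going through the interactive viewer. The sketch below is an assumption-laden illustration: `list_doc_files` is a hypothetical helper, it relies on `importlib.resources.files` (Python 3.9 or newer, which is above the package's stated minimum), and it returns an empty list when the package or its `docs/` directory cannot be found.

```python
from importlib import resources  # resources.files() requires Python 3.9+


def list_doc_files(package="skiearn", subdir="docs"):
    """Return sorted .txt file names under <package>/<subdir>, or [] if unavailable."""
    try:
        docs_dir = resources.files(package).joinpath(subdir)
        return sorted(
            entry.name for entry in docs_dir.iterdir() if entry.name.endswith(".txt")
        )
    except (ModuleNotFoundError, FileNotFoundError, NotADirectoryError):
        # Package not installed, or the assumed docs/ directory is missing.
        return []


if __name__ == "__main__":
    for name in list_doc_files():
        print(name)
```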

🔧 Requirements

  • Python 3.7 or higher
  • No external dependencies!

📝 License

MIT License - feel free to use for educational purposes.

🤝 Contributing

Contributions welcome! If you find errors or want to add topics:

  1. Fork the repository
  2. Create a feature branch
  3. Submit a pull request

⭐ Support

If this package helped you ace your KDD exam, consider:

  • Giving it a star on GitHub
  • Sharing with classmates
  • Reporting issues or suggestions

📧 Contact

For questions or suggestions, please open an issue on GitHub.


Good luck with your KDD exam! 🎓

Quick Start Example

import skiearn

# Launch interactive viewer
skiearn.print()

# Navigate using:
# - Numbers 1-13: View specific documentation
# - R: README
# - S: Study Guide
# - A: View all files
# - Q: Quit
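
For readers curious how a menu like this can map input to actions, here is a minimal sketch of an input parser for the keys listed above. It is a hypothetical illustration and not the code in `viewer.py`; the function name `parse_choice`, the action labels, and the case-insensitive handling are all assumptions.

```python
def parse_choice(raw, n_docs=13):
    """Map raw menu input to an (action, argument) pair (hypothetical helper)."""
    text = raw.strip().upper()
    letters = {"R": "readme", "S": "study_guide", "A": "all", "Q": "quit"}
    if text in letters:
        return letters[text], None
    # Accept only plain ASCII digits within the range of available documents.
    if text.isascii() and text.isdigit() and 1 <= int(text) <= n_docs:
        return "doc", int(text)
    return "invalid", None


if __name__ == "__main__":
    for sample in ["3", "q", "14", "hello"]:
        print(repr(sample), "->", parse_choice(sample))
```

Returning an explicit `"invalid"` action, rather than raising, lets a menu loop re-prompt on bad input.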

Topics Covered

  • Data preparation and preprocessing
  • Statistical hypothesis testing
  • Causality and feature selection
  • Machine learning algorithms
  • Model evaluation metrics
  • Handling imbalanced and missing data
  • Explainability (SHAP, LIME)
  • Clustering and dimensionality reduction
  • Time series analysis
  • Information theory
  • And much more!

All with executable Python code and detailed explanations!

Download files

Source Distribution

skiearn_kdd-1.0.4.tar.gz (63.6 kB)

Uploaded Source

Built Distribution

skiearn_kdd-1.0.4-py3-none-any.whl (70.7 kB)

Uploaded Python 3

File details

Details for the file skiearn_kdd-1.0.4.tar.gz.

File metadata

  • Download URL: skiearn_kdd-1.0.4.tar.gz
  • Upload date:
  • Size: 63.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for skiearn_kdd-1.0.4.tar.gz

  • SHA256: 2780c5a1de053ab01e6996789467a528d54ce375c26f8242a6388d2482e95307
  • MD5: 45f5c0852b90e7a43dfa24a1697cf099
  • BLAKE2b-256: 0f4fbf9ef4566ecd1d7de19d5876e453d536edbc8f9d4e63ef42e5b8c6d84c7e

File details

Details for the file skiearn_kdd-1.0.4-py3-none-any.whl.

File metadata

  • Download URL: skiearn_kdd-1.0.4-py3-none-any.whl
  • Upload date:
  • Size: 70.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for skiearn_kdd-1.0.4-py3-none-any.whl

  • SHA256: e96650ef5fb4e59c952ce1a413f56883e2004790f7debf29124479ac898218a1
  • MD5: c21a786afe07ab6cf22572e285e1f327
  • BLAKE2b-256: c9f313ef14dd28f99f125a14ebcbf50b40a385198874090a29ec493c167879f5
