Interactive Knowledge Discovery & Data Mining (KDD) study guide with comprehensive documentation

SKIEARN - Knowledge Discovery & Data Mining Study Guide

SKIEARN is an interactive Python package providing comprehensive documentation for Knowledge Discovery in Databases (KDD) and Data Mining concepts. Perfect for university students preparing for KDD/Data Mining exams!

🚀 Installation

pip install skiearn-kdd

📖 Usage

Simply import and run:

import skiearn

skiearn.print()

This will launch an interactive menu where you can browse through 13 comprehensive documentation files covering all KDD topics.

📚 What's Included

Main Documentation (13 Files)

  1. Data Preparation & Formalism - Variable types, IID, splits, data leakage, curse of dimensionality
  2. Statistics & Distributions - Descriptive statistics, normality tests, CLT
  3. Hypothesis Testing - Z-tests, t-tests, ANOVA, Chi-square, bootstrap, permutation tests
  4. Causality & Feature Selection - Causality concepts, Simpson's paradox, feature selection methods
  5. Outliers & Robust Statistics - Detection methods (Z-score, IQR, MAD, LOF, Isolation Forest)
  6. Supervised Learning - All major ML algorithms with formal explanations
  7. Model Evaluation & Comparison - Metrics, cross-validation, statistical model comparison
  8. Imbalanced & Missing Data - SMOTE, sampling techniques, imputation methods
  9. Explainability & Visualization - SHAP, LIME, all visualization techniques
  10. Dimensionality & Clustering - PCA, t-SNE, K-Means, DBSCAN, cluster validation
  11. Advanced Topics - Time series, association rules, information theory, probability theory
  12. Encoding & Validation - All encoding techniques, validation strategies
  13. Exam Traps & Pitfalls ⚠️ - Common mistakes and how to avoid them
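To give a flavour of the kind of example topic 5 refers to, here is a minimal pure-Python sketch of MAD-based outlier detection. It is an illustration only and is not taken from the package's documentation files; the function name `mad_outliers` and the sample data are hypothetical, and the 3.5 cutoff on the modified z-score is a commonly used convention rather than a fixed rule.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold.

    Hypothetical helper for illustration; not part of the skiearn package.
    """
    med = median(values)
    # Median absolute deviation: the median distance from the median.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # More than half of the values are identical; the score is undefined.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # when the data are roughly normal.
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]


sample = [10, 11, 12, 11, 10, 12, 11, 95]
flagged = mad_outliers(sample)
print(flagged)
```

Because the median and the MAD are barely affected by the extreme value, this approach tends to be more robust than a mean-and-standard-deviation Z-score on small samples.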

Additional Resources

  • README - Study guide overview and organization
  • Study Guide - Recommended week-by-week study path
  • Verification - Complete section mapping (all 74 topics covered)

💡 Features

  • 74 comprehensive topics covering all KDD/Data Mining concepts
  • Executable Python code examples for every concept
  • Statistical tests with proper interpretation
  • Exam traps highlighted - avoid common mistakes
  • Interactive menu - easy navigation
  • No external dependencies - pure Python

🎯 Perfect For

  • University students taking KDD/Data Mining courses
  • Exam preparation and quick reference
  • Understanding statistical foundations of ML
  • Learning proper data preprocessing
  • Avoiding common data science pitfalls

📋 Example Session

>>> import skiearn
>>> skiearn.print()

================================================================================
                    SKIEARN DOCUMENTATION VIEWER
               Knowledge Discovery & Data Mining Study Guide
================================================================================

📚 MAIN DOCUMENTATION:
  [ 1] Data Preparation & Formalism
      └─ Variable types, IID, splits, leakage, dimensionality, bias-variance

  [ 2] Statistics & Distributions
      └─ Mean/median/variance, skewness/kurtosis, normality tests

  ...

📖 Enter your choice: 3

📦 Package Contents

skiearn/
├── __init__.py          # Package initialization
├── viewer.py            # Interactive documentation viewer
└── docs/                # All documentation files
    ├── 01_data_preparation.txt
    ├── 02_statistics_distributions.txt
    └── ... (13 main files + 3 additional resources)
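
Given the layout above, the bundled text files could be listed with the standard library alone. The sketch below is an assumption about how one might do this, not the package's own code: `list_doc_files` is a hypothetical helper, and the demo runs against a temporary directory with placeholder files rather than the installed `docs/` folder.

```python
import tempfile
from pathlib import Path


def list_doc_files(docs_dir):
    """Return the names of the .txt files in docs_dir, sorted by name.

    Hypothetical helper for illustration; not part of the skiearn package.
    """
    return sorted(p.name for p in Path(docs_dir).glob("*.txt"))


# Demo against a throwaway directory that mimics the docs/ layout.
with tempfile.TemporaryDirectory() as tmp:
    for name in (
        "02_statistics_distributions.txt",
        "01_data_preparation.txt",
        "notes.md",  # ignored: not a .txt file
    ):
        (Path(tmp) / name).write_text("placeholder", encoding="utf-8")
    names = list_doc_files(tmp)

print(names)
```

The numeric prefixes in the file names mean a plain alphabetical sort matches the menu order.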

🔧 Requirements

  • Python 3.7 or higher
  • No external dependencies!

📝 License

MIT License - feel free to use for educational purposes.

🤝 Contributing

Contributions welcome! If you find errors or want to add topics:

  1. Fork the repository
  2. Create a feature branch
  3. Submit a pull request

⭐ Support

If this package helped you ace your KDD exam, consider:

  • Giving it a star on GitHub
  • Sharing with classmates
  • Reporting issues or suggestions

📧 Contact

For questions or suggestions, please open an issue on GitHub.


Good luck with your KDD exam! 🎓

Quick Start Example

import skiearn

# Launch interactive viewer
skiearn.print()

# Navigate using:
# - Numbers 1-13: View specific documentation
# - R: README
# - S: Study Guide
# - A: View all files
# - Q: Quit
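
The navigation keys listed above could be handled by a small input parser along these lines. This is a sketch under assumptions, not the viewer's actual implementation: the function name `parse_choice`, the action labels, and the case-insensitive handling are all hypothetical.

```python
def parse_choice(raw, doc_count=13):
    """Map raw menu input to an (action, argument) pair.

    Hypothetical helper for illustration; the real viewer may differ.
    """
    choice = raw.strip().upper()
    letters = {"R": "readme", "S": "study_guide", "A": "all", "Q": "quit"}
    if choice in letters:
        return (letters[choice], None)
    # isdecimal() only accepts characters that int() can convert.
    if choice.isdecimal() and 1 <= int(choice) <= doc_count:
        return ("doc", int(choice))
    return ("invalid", None)


print(parse_choice("3"))
print(parse_choice(" q "))
print(parse_choice("14"))
```

Returning a pair instead of acting directly keeps the parsing separate from the display logic, which makes the menu easy to test without simulating keyboard input.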

Topics Covered

  • Data preparation and preprocessing
  • Statistical hypothesis testing
  • Causality and feature selection
  • Machine learning algorithms
  • Model evaluation metrics
  • Handling imbalanced and missing data
  • Explainability (SHAP, LIME)
  • Clustering and dimensionality reduction
  • Time series analysis
  • Information theory
  • And much more!

All with executable Python code and detailed explanations!
