Standardized Indian housing datasets for data analysis, visualization, and machine learning practice.

🏠 Indian Housing Datasets - Python Library

A lightweight Python library providing standardized housing datasets for major Indian cities. Every dataset is returned as a pandas DataFrame, so it works directly with the rest of the Python data science ecosystem.

It is aimed at students, beginners, and ML practitioners who want to practice data analysis, regression modeling, and housing price prediction on Indian real estate data.


📦 Installation

Install via pip:

pip install india-housing-datasets

🚀 Quick Start

Load a city's housing dataset and explore it as a pandas DataFrame:

from india_housing_datasets import load_housing

# Load Mumbai housing data
df = load_housing("mumbai")

# Explore the data
print(df.head())
df.info()  # info() prints its summary directly and returns None
print(df.describe())

# Check for missing values
print(df.isnull().sum())

📊 Visualization Example

Visualize relationships between housing features:

import matplotlib.pyplot as plt
from india_housing_datasets import load_housing

df = load_housing("bangalore")

# Scatter plot: Area vs Price
df.plot.scatter(x="area_sqft", y="price_lakhs", alpha=0.5, figsize=(10, 6))
plt.title("Housing Prices in Bangalore")
plt.xlabel("Area (sq ft)")
plt.ylabel("Price (Lakhs ₹)")
plt.show()

# Distribution of BHK types
df["bhk"].value_counts().plot(kind="bar", color="steelblue")
plt.title("Distribution of BHK Types")
plt.xlabel("BHK")
plt.ylabel("Count")
plt.show()

🤖 Machine Learning Example

Build a simple housing price prediction model using linear regression:

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from india_housing_datasets import load_housing

# Load Delhi housing data
df = load_housing("delhi")

# Prepare features and target
X = df[["area_sqft", "bhk", "bath", "age_years"]]
y = df["price_lakhs"]

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
print(f"Mean Absolute Error: {mean_absolute_error(y_test, y_pred):.2f} Lakhs")
print(f"R² Score: {r2_score(y_test, y_pred):.2f}")
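For readers new to the two metrics printed above, the following sketch computes them by hand in plain Python using their standard definitions. The function names `mae` and `r2` and the four price values are invented for illustration; they are not part of this library or of scikit-learn.

```python
def mae(y_true, y_pred):
    """Mean absolute error: the average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot, with SS_tot measured around the mean of y_true."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Illustrative prices in lakhs (made-up values, not taken from any dataset)
actual = [50.0, 80.0, 120.0, 150.0]
predicted = [55.0, 75.0, 110.0, 160.0]

print(f"MAE: {mae(actual, predicted):.2f} Lakhs")
print(f"R²: {r2(actual, predicted):.2f}")
```

An MAE in lakhs reads directly as the typical size of a prediction error, while R² close to 1 means the model explains most of the variation in price.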

🌆 Available Cities

The library currently supports housing datasets for the following Indian cities:

  • Mumbai
  • Delhi
  • Bangalore
  • Hyderabad
  • Chennai
  • Pune
  • Ahmedabad
  • Kolkata
  • Jaipur
  • Chandigarh

Load any city using:

df = load_housing("city_name")  # e.g., "mumbai", "delhi", "bangalore"
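When the city name comes from user input, it may help to normalize and check it before calling `load_housing`. The sketch below is a hypothetical helper, not part of the library. It assumes every city is keyed by its lowercase name, which the README confirms only for "mumbai", "delhi", and "bangalore".

```python
# Cities listed above. Lowercase keys are an assumption based on the
# "mumbai", "delhi", and "bangalore" examples.
SUPPORTED_CITIES = (
    "mumbai", "delhi", "bangalore", "hyderabad", "chennai",
    "pune", "ahmedabad", "kolkata", "jaipur", "chandigarh",
)


def normalize_city(name: str) -> str:
    """Return a trimmed, lowercase city key, or raise ValueError if it is not listed."""
    key = name.strip().lower()
    if key not in SUPPORTED_CITIES:
        raise ValueError(
            f"Unknown city {name!r}; expected one of: {', '.join(SUPPORTED_CITIES)}"
        )
    return key


print(normalize_city("  Mumbai "))
```

The returned key could then be passed on, for example `load_housing(normalize_city(user_input))`.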

📋 Dataset Schema

Each dataset contains the following standardized columns:

| Column      | Type    | Description                      |
|-------------|---------|----------------------------------|
| city        | string  | Name of the city                 |
| locality    | string  | Locality/area within the city    |
| area_sqft   | integer | Built-up area in square feet     |
| bhk         | integer | Number of bedrooms (BHK)         |
| bath        | integer | Number of bathrooms              |
| floor       | integer | Floor number                     |
| age_years   | integer | Age of the property in years     |
| price_lakhs | float   | Property price in lakhs (₹)      |
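Since `price_lakhs` is expressed in lakhs, a small conversion helper can make values easier to read. The sketch below relies only on the standard units (1 lakh = 100,000 rupees, 1 crore = 100 lakhs). The function names are hypothetical and are not provided by the library.

```python
RUPEES_PER_LAKH = 100_000  # 1 lakh = 100,000 rupees
LAKHS_PER_CRORE = 100      # 1 crore = 100 lakhs


def lakhs_to_rupees(price_lakhs: float) -> float:
    """Convert a price in lakhs to plain rupees."""
    return price_lakhs * RUPEES_PER_LAKH


def format_price(price_lakhs: float) -> str:
    """Render a price in crores once it reaches 100 lakhs, otherwise in lakhs."""
    if price_lakhs >= LAKHS_PER_CRORE:
        return f"₹{price_lakhs / LAKHS_PER_CRORE:.2f} Cr"
    return f"₹{price_lakhs:.2f} L"


print(lakhs_to_rupees(85.5))
print(format_price(85.5))
print(format_price(250))
```

With a loaded DataFrame, the same function could be applied column-wise, for example `df["price_lakhs"].map(format_price)`.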

💡 Use Cases

This library is designed for:

  • Learning Data Science: Practice pandas, data cleaning, and exploratory data analysis (EDA)
  • Housing Price Prediction: Build regression models to predict Indian real estate prices
  • Data Visualization: Create charts and dashboards with matplotlib, seaborn, or plotly
  • Machine Learning Practice: Experiment with feature engineering, model training, and evaluation
  • Academic Projects: Use standardized datasets for coursework and research
  • Portfolio Building: Showcase data science skills with Indian housing market analysis

⚠️ Deprecation Notice

Important: The older per-city fetch_* functions (for example, fetch_mumbai_housing) are deprecated and will be removed in a future version.

Please migrate to the new API:

# ❌ Old (Deprecated)
from india_housing_datasets import fetch_mumbai_housing
df = fetch_mumbai_housing()

# ✅ New (Recommended)
from india_housing_datasets import load_housing
df = load_housing("mumbai")
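For codebases with many old calls, the replacement can be derived mechanically from the old function name. The sketch below is a hypothetical migration aid, not part of the library. It assumes every deprecated function follows the `fetch_<city>_housing` pattern, which the README shows only for Mumbai.

```python
import re

# Assumed naming pattern for deprecated functions: fetch_<city>_housing
_OLD_NAME = re.compile(r"^fetch_([a-z]+)_housing$")


def city_from_old_name(func_name: str) -> str:
    """Extract the city key from a deprecated function name."""
    match = _OLD_NAME.match(func_name)
    if match is None:
        raise ValueError(f"{func_name!r} does not look like fetch_<city>_housing")
    return match.group(1)


def new_call(func_name: str) -> str:
    """Return the source text of the equivalent load_housing(...) call."""
    return f'load_housing("{city_from_old_name(func_name)}")'


print(new_call("fetch_mumbai_housing"))
```

Any name that does not match the assumed pattern raises `ValueError`, so unexpected cases surface for manual review instead of being rewritten silently.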

⚖️ Disclaimer

The datasets provided in this library are synthetically generated and standardized to resemble Indian housing markets. They are intended for educational purposes, data visualization practice, and machine learning experimentation only.

This data should not be used for:

  • Real estate investment decisions
  • Analysis of, or research on, actual housing markets
  • Commercial applications

For real-world applications, please use authentic data sources.


👨‍💻 Author

Vishal Baghel
📧 baghelvishal264@gmail.com
🌐 GitHub Repository


📜 License

MIT License © 2025 Vishal Baghel


🤝 Contributing

Contributions, issues, and feature requests are welcome! Feel free to check the issues page.


⭐ Support

If you find this library helpful, please consider giving it a ⭐ on GitHub!
