Standardized Indian housing datasets for data analysis, visualization, and machine learning practice.
🏠 Indian Housing Datasets - Python Library
A lightweight Python library providing standardized, synthetically generated housing datasets for major Indian cities. Every dataset is returned as a pandas DataFrame, so it plugs directly into the Python data science ecosystem.
It is aimed at students, beginners, and ML practitioners who want to practice data analysis, housing price prediction, and regression modeling on data shaped like Indian real estate listings.
📦 Installation
Install via pip:
pip install india-housing-datasets
🚀 Quick Start
Load a city's housing dataset and explore it as a pandas DataFrame:
from india_housing_datasets import load_housing
# Load Mumbai housing data
df = load_housing("mumbai")
# Explore the data
print(df.head())
df.info()  # info() prints its summary directly and returns None, so no print() is needed
print(df.describe())
# Check for missing values
print(df.isnull().sum())
📊 Visualization Example
Visualize relationships between housing features:
import matplotlib.pyplot as plt
from india_housing_datasets import load_housing
df = load_housing("bangalore")
# Scatter plot: Area vs Price
df.plot.scatter(x="area_sqft", y="price_lakhs", alpha=0.5, figsize=(10, 6))
plt.title("Housing Prices in Bangalore")
plt.xlabel("Area (sq ft)")
plt.ylabel("Price (Lakhs ₹)")
plt.show()
# Distribution of BHK types
df["bhk"].value_counts().plot(kind="bar", color="steelblue")
plt.title("Distribution of BHK Types")
plt.xlabel("BHK")
plt.ylabel("Count")
plt.show()
🤖 Machine Learning Example
Build a simple housing price prediction model using linear regression:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from india_housing_datasets import load_housing
# Load Delhi housing data
df = load_housing("delhi")
# Prepare features and target
X = df[["area_sqft", "bhk", "bath", "age_years"]]
y = df["price_lakhs"]
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train model
model = LinearRegression()
model.fit(X_train, y_train)
# Predict and evaluate
y_pred = model.predict(X_test)
print(f"Mean Absolute Error: {mean_absolute_error(y_test, y_pred):.2f} Lakhs")
print(f"R² Score: {r2_score(y_test, y_pred):.2f}")
🌆 Available Cities
The library currently supports housing datasets for the following Indian cities:
- Mumbai
- Delhi
- Bangalore
- Hyderabad
- Chennai
- Pune
- Ahmedabad
- Kolkata
- Jaipur
- Chandigarh
Load any city using:
df = load_housing("city_name") # e.g., "mumbai", "delhi", "bangalore"
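The README does not say how load_housing reacts to capitalization, stray whitespace, or an unlisted city. A minimal sketch of a caller-side guard, assuming the lowercase names from the list above are the accepted keys (normalize_city and SUPPORTED_CITIES are hypothetical names, not part of the library):

```python
# Cities listed in this README. Whether load_housing() accepts other
# spellings or capitalizations is an assumption left untested here.
SUPPORTED_CITIES = {
    "mumbai", "delhi", "bangalore", "hyderabad", "chennai",
    "pune", "ahmedabad", "kolkata", "jaipur", "chandigarh",
}


def normalize_city(name: str) -> str:
    """Hypothetical helper: trim and lowercase a city name, then validate it."""
    key = name.strip().lower()
    if key not in SUPPORTED_CITIES:
        raise ValueError(
            f"Unsupported city {name!r}; expected one of {sorted(SUPPORTED_CITIES)}"
        )
    return key
```

With the library installed, the result can be passed straight through, for example load_housing(normalize_city(" Mumbai ")).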
📋 Dataset Schema
Each dataset contains the following standardized columns:
| Column | Type | Description |
|---|---|---|
| `city` | string | Name of the city |
| `locality` | string | Locality/area within the city |
| `area_sqft` | integer | Built-up area in square feet |
| `bhk` | integer | Number of bedrooms (BHK) |
| `bath` | integer | Number of bathrooms |
| `floor` | integer | Floor number |
| `age_years` | integer | Age of the property in years |
| `price_lakhs` | float | Property price in lakhs (₹) |
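Because every city is meant to share this schema, a quick column check can catch a mismatch before any analysis runs. The following is a rough sketch rather than anything the library ships: missing_columns and EXPECTED_COLUMNS are hypothetical names, and only column names are checked, not dtypes.

```python
# Column names copied from the schema table in this README.
EXPECTED_COLUMNS = (
    "city", "locality", "area_sqft", "bhk",
    "bath", "floor", "age_years", "price_lakhs",
)


def missing_columns(columns):
    """Return the expected column names absent from `columns`, in schema order."""
    present = set(columns)
    return [name for name in EXPECTED_COLUMNS if name not in present]
```

For a loaded DataFrame, missing_columns(df.columns) should return an empty list if the dataset follows the documented schema.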
💡 Use Cases
This library is designed for:
- Learning Data Science: Practice pandas, data cleaning, and exploratory data analysis (EDA)
- Housing Price Prediction: Build regression models to predict Indian real estate prices
- Data Visualization: Create charts and dashboards with matplotlib, seaborn, or plotly
- Machine Learning Practice: Experiment with feature engineering, model training, and evaluation
- Academic Projects: Use standardized datasets for coursework and research
- Portfolio Building: Showcase data science skills with Indian housing market analysis
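As one small feature-engineering illustration, price per square foot can be derived from two of the documented columns. This sketch relies only on the fact that 1 lakh equals 100,000 rupees; price_per_sqft_inr is a hypothetical helper name.

```python
RUPEES_PER_LAKH = 100_000  # 1 lakh = 100,000 rupees


def price_per_sqft_inr(price_lakhs, area_sqft):
    """Convert a price in lakhs to rupees per square foot.

    Plain arithmetic, so it accepts scalars and should also work
    element-wise on pandas Series.
    """
    return price_lakhs * RUPEES_PER_LAKH / area_sqft
```

On a loaded dataset this could be applied as df["price_per_sqft_inr"] = price_per_sqft_inr(df["price_lakhs"], df["area_sqft"]).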
⚠️ Deprecation Notice
Important: The older fetch_* dataset functions are deprecated and will be removed in a future version.
Please migrate to the new API:
# ❌ Old (Deprecated)
from india_housing_datasets import fetch_mumbai_housing
df = fetch_mumbai_housing()
# ✅ New (Recommended)
from india_housing_datasets import load_housing
df = load_housing("mumbai")
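The README does not state how the deprecated functions signal their status at runtime. If they emit Python's standard DeprecationWarning (an assumption), leftover calls can be found by recording or escalating warnings. The sketch below uses stand-in functions (load_housing_stub and fetch_mumbai_housing_stub are hypothetical, not the library's code) so that it runs on its own:

```python
import warnings


def load_housing_stub(city):
    """Stand-in for the real loader so this sketch is self-contained."""
    return f"<dataset for {city}>"


def fetch_mumbai_housing_stub():
    """Hypothetical deprecated wrapper: warn, then delegate to the new API."""
    warnings.warn(
        "fetch_mumbai_housing() is deprecated; use load_housing('mumbai') instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return load_housing_stub("mumbai")


def find_deprecated_calls(func):
    """Call `func` and return the DeprecationWarning messages it triggered."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure nothing is suppressed
        func()
    return [
        str(w.message) for w in caught
        if issubclass(w.category, DeprecationWarning)
    ]
```

In a test suite, running Python with -W error::DeprecationWarning is a common way to turn such warnings into failures during a migration.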
⚖️ Disclaimer
The datasets provided in this library are synthetically generated and standardized to resemble Indian housing markets. They are intended for educational purposes, data visualization practice, and machine learning experimentation only.
This data should not be used for:
- Real estate investment decisions
- Real-world market analysis or market research
- Commercial applications
For real-world applications, please use authentic data sources.
👨‍💻 Author
Vishal Baghel
📧 baghelvishal264@gmail.com
🌐 GitHub Repository
📜 License
MIT License © 2025 Vishal Baghel
🤝 Contributing
Contributions, issues, and feature requests are welcome! Feel free to check the issues page.
⭐ Support
If you find this library helpful, please consider giving it a ⭐ on GitHub!