An intuitive platform designed to simplify the analysis of Iran household budget survey data.

These details have not been verified by PyPI

Project links

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

Household Budget Survey of Iran (HBSIR)

Simplify Your Analysis of Household Budget Survey Data

Analyzing the Household Budget Survey of Iran (HBSIR) data has historically been challenging due to the constant changes in codings and table schemas, the lack of standardization and alignment with universal coding systems, absence of clear and consistent specifications for data types assigned to variables, and the improper distribution format. With this package, you can seamlessly load, process, and analyze HBSIR data, allowing you to focus on your research and derive insights instead of grappling with these challenges.

Package Structure

The HBSIR package consists of three core components, each designed to not only streamline your experience working with the HBSIR data but also maximize the reproducibility of results and offer a high degree of flexibility for customizations.

Data Extraction:
This component focuses on extracting data from Microsoft Access files, applying labels, and performing essential data cleaning tasks. Its purpose is to ensure that the data is transformed into a usable format suitable for further analysis.
Data Manipulation:
The package offers a uniform API designed to streamline the process of working with HBSIR data. It simplifies tasks like data loading, adding goods and other classifications, household attributes, and sampling weights.
Schema System
We introduce a schema system that empowers users to define customized table structures based on household budget data.

Motivation

The motivation behind the HBSIR Data Python Package stems from the recognition of the extreme difficulty faced by users when dealing with HBSIR data. The data's original format in MS Access, coupled with its lack of standardization across different years, posed significant challenges for analysis. This package was conceived to provide a solution that simplifies the entire process – from data extraction to analysis – making HBSIR data more accessible and usable for everyone.

Installation

Getting started with the package is straightforward. Simply install it using pip:

pip install hbsir

For those interested in the under-development version, you can also download it as follows:

pip install git+https://github.com/Iran-Open-Data/HBSIR.git

Quick Usage Example

Breakdown of food expenditure by urban families in Tehran
The purpose of this example is to provide a brief overview of package usage. In the following steps, we will guide you through various stages, such as loading specific tables, and incorporating attributes, classifications, and sampling weights. As we conclude this example, we will have generated a breakdown of food expenses among the urban residents of Tehran.

# Import the hbsir package
import hbsir

# Load food table for year 1400 into dataframe
food_table = hbsir.load_table("food", 1400)

By employing the provided code, you will read the table into a Pandas DataFrame format.
Below is an illustration depicting the potential presentation of the loaded table:

	Year	ID	Code	Provision_Method	Amount	Duration	Price	Expenditure
0	1400	10004004227	11112	Purchase	10.0	30	250000.0	2500000.0
1	1400	10004004227	11142	Purchase	10.0	30	30000.0	300000.0
2	1400	10004004227	11143	Purchase	10.0	30	30000.0	300000.0
3	1400	10004004227	11174	Purchase	2.0	30	350000.0	700000.0
4	1400	10004004227	11211	Purchase	2.0	30	1300000.0	2600000.0

Next, we incorporate the urban-rural and province attributes into the table and apply filtering:

# Drop unnecessary columns 
df = df.drop(columns=["Table_Name", "Provision_Method", "Amount", "Duration"])

# Add urban/rural attribute column
df = hbsir.add_attribute(df, "Urban_Rural")

# Add province attribute column
df = hbsir.add_attribute(df, "Province")

# Filter to only urban Tehran rows
filt = (df["Urban_Rural"] == "Urban") & (df["Province"] == "Tehran")
df = df.loc[filt]

	Year	ID	Code	Price	Expenditure	Urban_Rural	Province
107928	1400	12313290719	11144	25000.0	125000.0	Urban	Tehran
107929	1400	12313290719	11151	25000.0	400000.0	Urban	Tehran
107930	1400	12313290719	11164	128571.0	180000.0	Urban	Tehran
107931	1400	12313290719	11165	140000.0	70000.0	Urban	Tehran
107932	1400	12313290719	11172	350000.0	350000.0	Urban	Tehran

In the next phase, we'll incorporate classification and sampling weights to compute weighted expenditures.

# Remove unnecessary columns
df = df.drop(columns=["Urban_Rural", "Province"])

# Classify based on food types
df = hbsir.add_classification(df, "original", levels=[2], output_column_names=["Food_Type"])

# Integrate sampling weights
df = hbsir.add_weight(df)

# Compute weighted expenditure
df["Weighted_Expenditure"] = df["Expenditure"] * df["Weight"]

	Year	ID	Code	Price	Expenditure	Food_Type	Weight	Weighted_Expenditure
107928	1400	12313290719	11144	25000.0	125000.0	cereals_and_cereal_products	2155	269375008.0
107929	1400	12313290719	11151	25000.0	400000.0	cereals_and_cereal_products	2155	862000000.0
107930	1400	12313290719	11164	128571.0	180000.0	cereals_and_cereal_products	2155	387900000.0
107931	1400	12313290719	11165	140000.0	70000.0	cereals_and_cereal_products	2155	150850000.0
107932	1400	12313290719	11172	350000.0	350000.0	cereals_and_cereal_products	2155	754249984.0

Additionally, we will calculate the sum of weights for urban residents of Tehran.

# Load weights table for year 1400
weights = hbsir.load_table("Weights", 1400)

	Year	ID	Weight
0	1400	10001000226	1245.0
1	1400	10001000235	1245.0
2	1400	10011009720	201.0
3	1400	10011009735	201.0
4	1400	10003003235	237.0

# Calculate the sum of weights
weights_sum = (
    weights
    .pipe(hbsir.add_attribute, "Urban_Rural")
    .pipe(hbsir.add_attribute, "Province")
    .query("`Urban_Rural`=='Urban' & Province=='Tehran' & Weight.notnull()")
    .sum()
    .loc["Weight"]
)

print(weights_sum)
>>> 4466717.0

At this final stage, we proceed to calculate the weighted expenditure associated with each type of food expense.

# Compute the weighted mean
df = df.groupby("Food_Type")[["Weighted_Expenditure"]].sum()
df = df / weights_sum / 1e4

# Calculate the sum of expenditures
df["sum"] = df.sum()

# Rename column and sort results
df = df.rename(columns={"Weighted_Expenditure": "Food_Expenditure (k Tomans)"})
df = df.sort_values("Food_Expenditure (k Tomans)", ascending=False)

Outcome of the analysis:

Food_Type	Food_Expenditure (k Tomans)
sum	2,255
cereals_and_cereal_products	481
meat	425
fruits_and_nuts	325
vegetables_and_pulses	307
milk_other_dairy_product_excluding_butter_and_eggs	268
sugar_confectionery_and_desserts	106
spices_condiments_and_other_food_products	85
oils_fats_and_butter	84
tea_coffee_and_cocoa_drinks	69
soft_drinks	51
fish_and_other_sea_foods	48

Project details

These details have not been verified by PyPI

Project links

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Release history Release notifications | RSS feed

0.4.2

Jan 17, 2024

0.4.1

Dec 20, 2023

0.4.0

Dec 18, 2023

0.3.0

Nov 7, 2023

0.2.5

Oct 28, 2023

This version

0.2.4

Oct 13, 2023

0.2.3

Oct 8, 2023

0.2.2

Sep 22, 2023

0.2.1

Aug 29, 2023

0.2.0

Aug 11, 2023

0.1.0

Mar 30, 2023

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hbsir-0.2.4.tar.gz (75.5 kB view hashes)

Uploaded Oct 13, 2023 Source

Built Distribution

hbsir-0.2.4-py3-none-any.whl (84.0 kB view hashes)

Uploaded Oct 13, 2023 Python 3

Hashes for hbsir-0.2.4.tar.gz

Hashes for hbsir-0.2.4.tar.gz
Algorithm	Hash digest
SHA256	`f3ac4aecf5a29a3a0817b6f0040513fc8532867647f58e0171462ae717edf16a`
MD5	`a69e051c5c523c29a6146baffad8ba2f`
BLAKE2b-256	`436dbd5f928e92db5c0232d034c25e6e7f82a5f3803e4982fed53d1f03d54929`

Hashes for hbsir-0.2.4-py3-none-any.whl

Hashes for hbsir-0.2.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`bc0a2150456fbc7a35896e0606f9f522dc44f34246089378486dfbcbcd918cd6`
MD5	`7c81a347b4e8c9bd807d85c994f1a3a9`
BLAKE2b-256	`81025096ae75abd9cf2c1f24647412bc2bc16cfb3270c2f56b8eccde8f05eefa`