A simple survey data validation package using pandas.
🧾 pandasdv — Pandas Data Validator for Survey Datasets
pandasdv is a lightweight Python library designed to validate survey and structured datasets (e.g., SPSS .sav files) with pandas.
It provides ready-to-use validation functions for common survey question types such as Single Response, Multiple Response, Grid, Ranking, and Open-Ended checks.
🚀 Features
- ✅ Easy integration with pandas
- 📊 Supports validation of .sav files directly
- 🧠 Ready-to-use functions for survey logic validation:
  - SR: Single Response Validation
  - MULTI: Multiple Response Validation
  - GRID: Grid & Conditional Validation
  - RANK_CHECK: Rank Order Validation
  - OETEXT: Open-ended Text Validation
  - NULL_CHECK: Null or Blank Check
- 🧾 Automatic output logging to text file
- 🪄 Simple, readable validation results
📦 Installation
pip install pandasdv
(Make sure you have pandas and numpy installed.)
🧰 Basic Usage
from pandasdv import initial_setup, SR, MULTI, GRID, RANK_CHECK, OETEXT, NULL_CHECK, FLT_LIST, lst_no
# Alternatively, import everything at once:
# from pandasdv import *
# Load SPSS file (.sav)
df = initial_setup("survey_data.sav")
# Validate a single-response question
SR(Rout='QFILTER', QVAR='Q1', RNG=[1, 2, 3, 4], LIST=['Q1'])
# Alternatively, build the code list with lst_no:
# SR(Rout='QFILTER', QVAR='Q1', RNG=lst_no(1,4), LIST=['Q1'])
# Validate a multi-response question
MULTI(Rout='QFILTER', QVAR=['Q2_1', 'Q2_2', 'Q2_3'], QEX=['Q2_99'])
🧾 Core Functions
initial_setup(input_file)
Reads a .sav file and sets pandas display options.
output_setup(out_file='python_output.txt')
Writes validation output to a text file and prints to console.
FLT_LIST(COND, LIST)
Filters cases based on a logical condition and lists specified variables.
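To make the idea behind FLT_LIST concrete, here is a minimal sketch of the same kind of filtering in plain pandas. The helper name flt_list_sketch and the sample data are hypothetical, and this is not pandasdv's actual implementation; it only illustrates selecting the rows that match a condition and showing the chosen columns.

```python
import pandas as pd


def flt_list_sketch(df, cond, cols):
    """Hypothetical helper: return the rows where `cond` is True, limited to `cols`."""
    return df.loc[cond, cols]


# Made-up sample data: one missing ID, one negative ID, one duplicated ID.
df = pd.DataFrame({"RespID": [1, 2, 2, None, -5], "Q1": [1, 2, 1, 2, 1]})

# Rows with a missing or non-positive respondent ID.
bad_ids = flt_list_sketch(df, df["RespID"].isna() | (df["RespID"] <= 0), ["RespID"])

# Rows whose respondent ID occurs more than once.
dupes = flt_list_sketch(df, df["RespID"].duplicated(keep=False), ["RespID"])

print(bad_ids)
print(dupes)
```

An empty result means no case matched the condition, which is usually the outcome you want from a validation check.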
🧪 Validation Functions
- SR: Single Response Validation
- MULTI: Multiple Response Validation
- GRID: Grid Validation
- RANK_CHECK: Rank Order Validation
- OETEXT: Open-ended Text Validation
- NULL_CHECK: Null or Blank Validation
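For readers new to survey checks, the sketch below shows one plausible reading of a single-response check in plain pandas. The rules it applies are assumptions, not taken from pandasdv: respondents inside the base must hold one of the valid codes, and respondents outside the base must be blank. The function name sr_check_sketch and the sample data are hypothetical.

```python
import pandas as pd


def sr_check_sketch(df, qvar, valid_codes, base=None):
    """Hypothetical single-response check (assumed rules, not pandasdv's code).

    Flags rows that are in the base without a valid code,
    and rows that are outside the base but still answered.
    """
    if base is None:
        # No filter given: treat every respondent as in the base.
        base = pd.Series(True, index=df.index)
    answered = df[qvar].notna()
    in_range = df[qvar].isin(valid_codes)
    bad = (base & ~in_range) | (~base & answered)
    return df.loc[bad, [qvar]]


# Made-up sample data: QFILTER == 1 marks respondents who should answer Q1.
df = pd.DataFrame({
    "QFILTER": [1, 1, 1, 0, 0],
    "Q1": [1, 5, None, None, 2],
})

errors = sr_check_sketch(df, "Q1", [1, 2], base=df["QFILTER"] == 1)
print(errors)
```

In this sample, the flagged rows are an out-of-range code, a blank inside the base, and an answer from a respondent outside the base.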
🧭 Example Workflow
from pandasdv import *
df = initial_setup("Consumer_Brand_Preference_Data_50.sav")
# Unique ID check
FLT_LIST(COND=df['RespID'].isna() | (df['RespID'] <= 0), LIST=['RespID'])
FLT_LIST(COND=df['RespID'].duplicated(keep=False), LIST=['RespID'])
# SR validation
SR(Rout='QFILTER', QVAR='Q1', RNG=[1, 2])
# Conditional SR
df['QFILTER'] = 0
df.loc[df['Q30'].between(2,5), 'QFILTER'] = 1
SR(Rout='QFILTER', QVAR='Q30a', RNG=lst_no(1,16)+[97], LIST=['Q30a','Q30'])
# Multi Response
MULTI(QVAR=['Q5_1', 'Q5_2', 'Q5_3'], QEX=['Q5_7'])
# Grid
GRID(QVAR=['Q56_1', 'Q56_2'], COD=[1,2,3,4,5])
# Rank check
RANK_CHECK(
Rout='QFILTER',
QVAR=[f'Q180_Orderr{i}' for i in range(1, 6)],
MINR=1,
MAXR=3
)
# OE Text
OETEXT(Rout='QFILTER', QVAR='Q8_oth', LIST=['Q8_97'])
# Output results
output_setup('validation_results.txt')
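The rank check in the workflow above can be pictured with a small plain-pandas sketch. The rules here are assumptions made for illustration and may differ from what RANK_CHECK actually enforces: every non-blank rank must fall between the minimum and maximum rank, and no rank may repeat within a respondent. The function name rank_check_sketch, the column names, and the data are hypothetical.

```python
import pandas as pd


def rank_check_sketch(df, rank_cols, min_rank, max_rank):
    """Hypothetical rank check (assumed rules, not pandasdv's code)."""
    ranks = df[rank_cols]
    allowed = list(range(min_rank, max_rank + 1))
    # A non-blank rank outside min_rank..max_rank is an error.
    out_of_range = (ranks.notna() & ~ranks.isin(allowed)).any(axis=1)
    # The same rank given twice by one respondent is an error.
    repeated = ranks.apply(lambda row: bool(row.dropna().duplicated().any()), axis=1)
    return df.loc[out_of_range | repeated, rank_cols]


# Made-up sample data with three rank columns.
df = pd.DataFrame({
    "R1": [1, 1, 1, None],
    "R2": [2, 1, 4, None],
    "R3": [3, None, None, None],
})

flagged = rank_check_sketch(df, ["R1", "R2", "R3"], min_rank=1, max_rank=3)
print(flagged)
```

Fully blank rows pass in this sketch; whether they should depends on the base filter for the question.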
🛠️ Notes
- Always set base filters (Rout) before validation for conditional questions.
- Use lst_no(min, max) to avoid manually writing long code lists.
- FLT_LIST is useful for quick debugging of any custom conditions.
- The first column in the dataset is assumed to be the respondent ID.
- See the GitHub repository for sample files and syntax files: https://github.com/ChandraCherupally/pandasdv
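The usage examples treat lst_no(1,4) as interchangeable with [1, 2, 3, 4], which suggests an inclusive range returned as a list. A minimal stand-in under that assumption is shown below; lst_no_sketch is a hypothetical name, not the library's function.

```python
def lst_no_sketch(low, high):
    """Hypothetical stand-in: integer codes from low to high, both inclusive."""
    return list(range(low, high + 1))


# Codes 1 to 16 plus a separate code 97, as in the conditional SR example.
codes = lst_no_sketch(1, 16) + [97]
print(codes)
```

Because the result is an ordinary list, extra codes can be appended with +.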
🧑‍💻 Contributing
- Fork the repository
- Create a new branch (feature/my-feature)
- Commit your changes
- Open a Pull Request
🙌 Acknowledgements
- Built on top of pandas
- Inspired by real-world survey data quality validation workflows.