A simple survey data validation package using pandas.

Project description

🧾 pandasdv — Pandas Data Validator for Survey Datasets

pandasdv is a lightweight Python library designed to validate survey and structured datasets (e.g., SPSS .sav files) with pandas.
It provides ready-to-use validation functions for common survey question types such as Single Response, Multiple Response, Grid, Ranking, and Open-Ended checks.

🚀 Features

✅ Easy integration with pandas
📊 Supports validation of .sav files directly
🧠 Ready-to-use functions for survey logic validation:
- SR — Single Response Validation
- MULTI — Multiple Response Validation
- GRID — Grid & Conditional Validation
- RANK_CHECK — Rank Order Validation
- OETEXT — Open-ended Text Validation
- NULL_CHECK — Null or Blank Check
🧾 Automatic output logging to text file
🪄 Simple, readable validation results

📦 Installation

pip install pandasdv

(Make sure you have pandas and numpy installed.)

🧰 Basic Usage

from pandasdv import initial_setup, SR, MULTI, GRID, RANK_CHECK, OETEXT, NULL_CHECK, FLT_LIST, lst_no
# Alternatively, import everything at once:
# from pandasdv import *

# Load SPSS file (.sav)
df = initial_setup("survey_data.sav")

# Validate a single-response question
SR(Rout='QFILTER', QVAR='Q1', RNG=[1, 2, 3, 4], LIST=['Q1'])
# Alternatively, build the code list with lst_no:
# SR(Rout='QFILTER', QVAR='Q1', RNG=lst_no(1, 4), LIST=['Q1'])

# Validate a multi-response question
MULTI(Rout='QFILTER', QVAR=['Q2_1', 'Q2_2', 'Q2_3'], QEX=['Q2_99'])
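
To make the idea behind a multiple-response check concrete, here is a minimal plain-pandas sketch. It is not pandasdv's implementation: the helper name `multi_response_issues`, the 0/1 coding, and the reading of `QEX` as exclusive options (for example "None of the above") are all assumptions made for illustration.

```python
import pandas as pd

def multi_response_issues(df, qvars, qex, base=None):
    """Return rows in the base that break a typical multi-response rule.

    Assumes 1 = selected and 0 = not selected (hypothetical coding).
    """
    if base is None:
        # No filter given: every respondent is in the base.
        base = pd.Series(True, index=df.index)
    n_regular = df[qvars].eq(1).sum(axis=1)
    n_exclusive = df[qex].eq(1).sum(axis=1)
    # Rule 1: at least one option must be selected.
    nothing_selected = (n_regular + n_exclusive) == 0
    # Rule 2: an exclusive option cannot be combined with a regular one.
    exclusive_conflict = (n_exclusive > 0) & (n_regular > 0)
    return df.loc[base & (nothing_selected | exclusive_conflict), qvars + qex]

# Made-up data: rows 2 and 3 are deliberately invalid.
sample = pd.DataFrame({
    "Q2_1": [1, 0, 0, 1],
    "Q2_2": [0, 0, 0, 0],
    "Q2_3": [1, 0, 0, 0],
    "Q2_99": [0, 1, 0, 1],
})
issues = multi_response_issues(sample, ["Q2_1", "Q2_2", "Q2_3"], ["Q2_99"])
print(issues)
```

The function returns the offending rows rather than a pass/fail flag, so they can be inspected directly.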

🧾 Core Functions

`initial_setup(input_file)`

Reads a .sav file into a DataFrame and sets pandas display options.
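
For readers who want to see what such a setup step might involve, the sketch below uses only public pandas calls. It is an assumption about the general approach, not the actual body of `initial_setup`; the name `load_sav` and the chosen display settings are hypothetical.

```python
import pandas as pd

def load_sav(path):
    """Hypothetical stand-in loader: read an SPSS .sav file into a DataFrame."""
    # pandas.read_spss relies on the optional pyreadstat package.
    # convert_categoricals=False keeps numeric codes instead of value labels,
    # which is usually what range checks need.
    return pd.read_spss(path, convert_categoricals=False)

# Print all columns and use a wider console width for listings.
pd.set_option("display.max_columns", None)
pd.set_option("display.width", 200)
```

`load_sav` is only defined here, not called, so the snippet runs without a data file.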

`output_setup(out_file='python_output.txt')`

Writes validation output to a text file and prints to console.
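
One common way to send the same text to a file and to the console is a small "tee" object combined with `contextlib.redirect_stdout`. The sketch below shows that pattern with the standard library only; it is not taken from pandasdv, and the `Tee` class and the sample message are made up for illustration.

```python
import contextlib
import sys
from pathlib import Path

class Tee:
    """Minimal file-like object that forwards writes to several streams."""

    def __init__(self, *streams):
        self.streams = streams

    def write(self, text):
        for stream in self.streams:
            stream.write(text)
        return len(text)

    def flush(self):
        for stream in self.streams:
            stream.flush()

console = sys.stdout
with open("python_output.txt", "w", encoding="utf-8") as log_file:
    # Everything printed inside this block goes to both destinations.
    with contextlib.redirect_stdout(Tee(console, log_file)):
        print("Q1: 0 cases failed the range check")  # made-up sample message

logged_text = Path("python_output.txt").read_text(encoding="utf-8")
```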

`FLT_LIST(COND, LIST)`

Filters cases based on a logical condition and lists specified variables.
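
The description above amounts to a boolean-mask selection in pandas. The following sketch shows that equivalent on made-up data; `filter_and_list` is a hypothetical name, and the real `FLT_LIST` may print or format its result differently.

```python
import pandas as pd

def filter_and_list(df, cond, cols):
    """Return the listed columns for rows where the boolean mask is True."""
    return df.loc[cond, cols]

# Made-up data: one duplicated ID (102) and one missing ID.
sample = pd.DataFrame({"RespID": [101, 102, 102, None], "Q1": [1, 2, 2, 1]})

missing_or_invalid = sample["RespID"].isna() | (sample["RespID"] <= 0)
duplicated = sample["RespID"].duplicated(keep=False)
flagged = filter_and_list(sample, missing_or_invalid | duplicated, ["RespID", "Q1"])
print(flagged)
```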

🧪 Validation Functions

SR — Single Response Validation
MULTI — Multiple Response Validation
GRID — Grid Validation
RANK_CHECK — Rank Order Validation
OETEXT — Open-ended Text Validation
NULL_CHECK — Null or Blank Validation
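
As an illustration of what a single-response validation typically checks, the sketch below flags two kinds of problems in plain pandas: respondents in the base with a missing or out-of-range code, and respondents outside the base who answered anyway. The helper `single_response_issues` and its rules are assumptions, not the library's `SR` implementation.

```python
import pandas as pd

def single_response_issues(df, qvar, valid_codes, base):
    """Flag rows where a single-response variable disagrees with its base.

    base is a boolean Series: True where the question should have been asked.
    """
    answered = df[qvar].notna()
    # In the base but blank or holding a code outside the allowed list.
    out_of_range = base & ~df[qvar].isin(valid_codes)
    # Outside the base but answered anyway.
    off_base = ~base & answered
    return df.loc[out_of_range | off_base, [qvar]]

# Made-up data: rows 1, 2 and 4 are deliberately invalid.
sample = pd.DataFrame({
    "QFILTER": [1, 1, 1, 0, 0],
    "Q1": [1, 5, None, None, 2],
})
issues = single_response_issues(sample, "Q1", [1, 2, 3, 4], sample["QFILTER"].eq(1))
print(issues)
```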

🧭 Example Workflow

from pandasdv import *

df = initial_setup("Consumer_Brand_Preference_Data_50.sav")

# Unique ID check
FLT_LIST(COND=df['RespID'].isna() | (df['RespID'] <= 0), LIST=['RespID'])
FLT_LIST(COND=df['RespID'].duplicated(keep=False), LIST=['RespID'])

# SR validation
SR(Rout='QFILTER', QVAR='Q1', RNG=[1, 2])

# Conditional SR
df['QFILTER'] = 0
df.loc[df['Q30'].between(2,5), 'QFILTER'] = 1
SR(Rout='QFILTER', QVAR='Q30a', RNG=lst_no(1,16)+[97], LIST=['Q30a','Q30'])

# Multi Response
MULTI(QVAR=['Q5_1', 'Q5_2', 'Q5_3'], QEX=['Q5_7'])

# Grid
GRID(QVAR=['Q56_1', 'Q56_2'], COD=[1,2,3,4,5])

# Rank check
RANK_CHECK(
    Rout='QFILTER',
    QVAR=[f'Q180_Orderr{i}' for i in range(1, 6)],
    MINR=1,
    MAXR=3
)

# OE Text
OETEXT(Rout='QFILTER', QVAR='Q8_oth', LIST=['Q8_97'])

# Output results
output_setup('validation_results.txt')
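
The rank check in the workflow above passes a minimum and maximum rank. One plausible reading, assumed here rather than confirmed by the source, is that each respondent must use every rank from the minimum to the maximum exactly once and leave the remaining items blank. A plain-pandas sketch of that rule, with hypothetical names and made-up data:

```python
import pandas as pd

def rank_issues(df, qvars, min_rank, max_rank):
    """Flag rows whose ranks are not exactly min_rank..max_rank, each used once."""
    expected = list(range(min_rank, max_rank + 1))

    def row_ok(row):
        # Ignore blanks, then compare the sorted ranks with the expected run.
        return sorted(row.dropna().tolist()) == expected

    ok = df[qvars].apply(row_ok, axis=1).astype(bool)
    return df.loc[~ok, qvars]

# Made-up data: row 1 repeats rank 1, row 2 never assigns rank 3.
sample = pd.DataFrame({
    "R1": [1, 1, 1],
    "R2": [2, 1, 2],
    "R3": [3, 2, None],
    "R4": [None, 3, None],
})
issues = rank_issues(sample, ["R1", "R2", "R3", "R4"], min_rank=1, max_rank=3)
print(issues)
```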

🛠️ Notes

Always set base filters (Rout) before validation for conditional questions.
Use lst_no(min, max) to avoid manually writing long code lists.
FLT_LIST is useful for quick debugging of any custom conditions.
The first column in the dataset is assumed to be the respondent ID.
For sample data files and syntax files, see the GitHub repository:
https://github.com/ChandraCherupally/pandasdv
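
Going by the two usages in this README (`lst_no(1,4)` is offered as an alternative to `[1, 2, 3, 4]`, and `lst_no(1,16)+[97]` appends an extra code), `lst_no` appears to return an inclusive list of integers. A minimal sketch of equivalent behaviour, with `inclusive_codes` as a hypothetical stand-in name:

```python
def inclusive_codes(first, last):
    """Return [first, first + 1, ..., last], with both ends included."""
    return list(range(first, last + 1))

# Codes 1 to 16 plus a separate "other" code, as in the Q30a example.
codes = inclusive_codes(1, 16) + [97]
print(codes)
```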

🧑‍💻 Contributing

Fork the repository
Create a new branch (feature/my-feature)
Commit your changes
Open a Pull Request

🙌 Acknowledgements

Built on top of pandas
Inspired by real-world survey data quality validation workflows.

Project details

Release history

0.1.5 (Apr 16, 2026), this version
0.1.3 (Oct 19, 2025)
0.1.1 (Oct 19, 2025)
0.1.0 (Oct 19, 2025)

Download files

Source Distribution

pandasdv-0.1.3.tar.gz (5.9 kB view details)

Uploaded Oct 19, 2025 Source

Built Distribution

pandasdv-0.1.3-py3-none-any.whl (5.7 kB view details)

Uploaded Oct 19, 2025 Python 3

File details

Details for the file pandasdv-0.1.3.tar.gz.

File metadata

Download URL: pandasdv-0.1.3.tar.gz
Upload date: Oct 19, 2025
Size: 5.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.24

File hashes

Hashes for pandasdv-0.1.3.tar.gz
Algorithm	Hash digest
SHA256	`d118dacff5326f554e3a1a9377e60386acf09788564fa8a47282b112cdba48f3`
MD5	`2085ea6711e808aa06c5282dff2de0b0`
BLAKE2b-256	`a12e9d4214cd4cd64d83f89f176806de985c90ba5ed1b26fbf93e95b41e8f6cd`

File details

Details for the file pandasdv-0.1.3-py3-none-any.whl.

File metadata

Download URL: pandasdv-0.1.3-py3-none-any.whl
Upload date: Oct 19, 2025
Size: 5.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.24

File hashes

Hashes for pandasdv-0.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9c2ca89a81a77411e5b3350863b5c21c627120f2818869972187b27fb2eea862`
MD5	`7e783d3385cc2afb4666c219f832032c`
BLAKE2b-256	`640f80e9becf788a78a798c4b92e0c7b46c17d137bf894f4253d7f4c62feb18f`
