A simple survey data validation package using pandas.

🧾 pandasdv — Pandas Data Validator for Survey Datasets

pandasdv is a lightweight Python library designed to validate survey and structured datasets (e.g., SPSS .sav files) with pandas.
It provides ready-to-use validation functions for common survey question types such as Single Response, Multiple Response, Grid, Ranking, and Open-Ended checks.


🚀 Features

  • ✅ Easy integration with pandas
  • 📊 Supports validation of .sav files directly
  • 🧠 Ready-to-use functions for survey logic validation:
    • SR — Single Response Validation
    • MULTI — Multiple Response Validation
    • GRID — Grid & Conditional Validation
    • RANK_CHECK — Rank Order Validation
    • OETEXT — Open-ended Text Validation
    • NULL_CHECK — Null or Blank Check
  • 🧾 Automatic output logging to text file
  • 🪄 Simple, readable validation results

📦 Installation

pip install pandasdv

(Make sure you have pandas and numpy installed.)


🧰 Basic Usage

from pandasdv import initial_setup, SR, MULTI, GRID, RANK_CHECK, OETEXT, NULL_CHECK, FLT_LIST, lst_no
# Alternatively, import everything at once:
# from pandasdv import *

# Load SPSS file (.sav)
df = initial_setup("survey_data.sav")

# Validate a single-response question
SR(Rout='QFILTER', QVAR='Q1', RNG=[1, 2, 3, 4], LIST=['Q1'])
# Alternatively, build the code list with lst_no:
# SR(Rout='QFILTER', QVAR='Q1', RNG=lst_no(1,4), LIST=['Q1'])

# Validate a multi-response question
MULTI(Rout='QFILTER', QVAR=['Q2_1', 'Q2_2', 'Q2_3'], QEX=['Q2_99'])
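
To make the idea of a single-response check more concrete, here is a minimal pandas-only sketch. It is not pandasdv's implementation: the helper name check_single_response and the exact rules (respondents in the base must give one valid code, respondents outside the base must be blank) are assumptions made for illustration.

```python
import pandas as pd

def check_single_response(df, filter_col, qvar, valid_codes):
    """Return rows that break a basic single-response rule (hypothetical helper)."""
    in_base = df[filter_col] == 1
    answered = df[qvar].notna()
    # Inside the base: the answer must exist and be one of the valid codes.
    bad_in_base = in_base & ~(answered & df[qvar].isin(valid_codes))
    # Outside the base: the question should have been skipped.
    bad_out_of_base = ~in_base & answered
    return df[bad_in_base | bad_out_of_base]

# Small made-up dataset: respondent 2 has an invalid code, respondent 3
# answered although filtered out, respondent 4 is missing an answer.
data = pd.DataFrame({
    "RespID": [1, 2, 3, 4],
    "QFILTER": [1, 1, 0, 1],
    "Q1": [1, 9, 2, None],
})
errors = check_single_response(data, "QFILTER", "Q1", [1, 2, 3, 4])
print(errors)
```

Returning the offending rows, rather than only a count, keeps the result easy to inspect alongside the respondent ID.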

🧾 Core Functions

initial_setup(input_file)

Reads a .sav file into a DataFrame and sets pandas display options.

output_setup(out_file='python_output.txt')

Writes validation output to a text file and prints to console.
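
One common way to send the same output to both a file and the console is a small "tee" object combined with contextlib.redirect_stdout. The sketch below uses only the standard library; the Tee class is hypothetical and is not taken from pandasdv, whose internals may differ.

```python
import contextlib
import os
import sys
import tempfile

class Tee:
    """Forward every write to several text streams (hypothetical helper)."""

    def __init__(self, *streams):
        self.streams = streams

    def write(self, text):
        for stream in self.streams:
            stream.write(text)
        return len(text)

    def flush(self):
        for stream in self.streams:
            stream.flush()

with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = os.path.join(tmp_dir, "python_output.txt")
    with open(log_path, "w", encoding="utf-8") as log_file:
        # sys.stdout is captured here, before the redirect takes effect,
        # so the message still reaches the real console.
        with contextlib.redirect_stdout(Tee(sys.stdout, log_file)):
            print("Q1: 0 errors found")
    # Read the log back to confirm the message was also written to the file.
    with open(log_path, encoding="utf-8") as handle:
        logged = handle.read()
```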

FLT_LIST(COND, LIST)

Filters cases based on a logical condition and lists specified variables.
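
In plain pandas, filtering on a condition and listing a few variables comes down to a single .loc call. The sketch below illustrates that idea under the assumption that COND is a boolean Series and LIST is a list of column names; filter_and_list is a hypothetical stand-in, not the library's code.

```python
import pandas as pd

def filter_and_list(df, cond, columns):
    """Return the listed columns for rows where cond is True (hypothetical helper)."""
    return df.loc[cond, columns]

# Made-up data: Q1 should hold a code from 1 to 4.
data = pd.DataFrame({
    "RespID": [101, 102, 103, 104],
    "Q1": [1, 2, None, 5],
})
# Flag blank answers and answers outside the 1-4 range.
cond = data["Q1"].isna() | ~data["Q1"].between(1, 4)
flagged = filter_and_list(data, cond, ["RespID", "Q1"])
print(flagged)
```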


🧪 Validation Functions

  • SR — Single Response Validation
  • MULTI — Multiple Response Validation
  • GRID — Grid Validation
  • RANK_CHECK — Rank Order Validation
  • OETEXT — Open-ended Text Validation
  • NULL_CHECK — Null or Blank Validation
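
As an illustration of what a multiple-response check might look for, the pandas-only sketch below assumes each option is stored as a 0/1 column, that at least one option must be selected, and that an exclusive option (such as "None of the above") cannot be combined with any other. These rules and the helper name check_multi_response are assumptions, not pandasdv's documented behavior.

```python
import pandas as pd

def check_multi_response(df, qvars, exclusive):
    """Return rows that break basic multi-response rules (hypothetical helper)."""
    selected = df[qvars + exclusive].eq(1)
    # Rule 1: at least one option (regular or exclusive) must be selected.
    none_selected = ~selected.any(axis=1)
    # Rule 2: an exclusive option cannot be combined with a regular one.
    conflict = selected[exclusive].any(axis=1) & selected[qvars].any(axis=1)
    return df[none_selected | conflict]

# Made-up data: respondent 2 selected nothing, respondent 3 combined
# a regular option with the exclusive code.
data = pd.DataFrame({
    "RespID": [1, 2, 3],
    "Q2_1": [1, 0, 1],
    "Q2_2": [0, 0, 0],
    "Q2_99": [0, 0, 1],
})
errors = check_multi_response(data, ["Q2_1", "Q2_2"], ["Q2_99"])
print(errors)
```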

🧭 Example Workflow

from pandasdv import *

df = initial_setup("Consumer_Brand_Preference_Data_50.sav")

# Unique ID check
FLT_LIST(COND=df['RespID'].isna() | (df['RespID'] <= 0), LIST=['RespID'])
FLT_LIST(COND=df['RespID'].duplicated(keep=False), LIST=['RespID'])

# SR validation
SR(Rout='QFILTER', QVAR='Q1', RNG=[1, 2])

# Conditional SR
df['QFILTER'] = 0
df.loc[df['Q30'].between(2,5), 'QFILTER'] = 1
SR(Rout='QFILTER', QVAR='Q30a', RNG=lst_no(1,16)+[97], LIST=['Q30a','Q30'])

# Multi Response
MULTI(QVAR=['Q5_1', 'Q5_2', 'Q5_3'], QEX=['Q5_7'])

# Grid
GRID(QVAR=['Q56_1', 'Q56_2'], COD=[1,2,3,4,5])

# Rank check
RANK_CHECK(
    Rout='QFILTER',
    QVAR=[f'Q180_Orderr{i}' for i in range(1, 6)],
    MINR=1,
    MAXR=3
)

# OE Text
OETEXT(Rout='QFILTER', QVAR='Q8_oth', LIST=['Q8_97'])

# Output results
output_setup('validation_results.txt')
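
For readers curious about what a rank-order check involves, here is one plausible reading sketched in plain pandas: every rank a respondent gives must fall between a minimum and a maximum, and no rank may be used twice. Whether MINR and MAXR in RANK_CHECK mean exactly this is an assumption; check_ranks is a hypothetical helper and the column names are made up.

```python
import pandas as pd

def check_ranks(df, qvars, min_rank, max_rank):
    """Return rows with out-of-range or repeated ranks (hypothetical helper)."""
    ranks = df[qvars]
    given = ranks.notna()
    allowed = list(range(min_rank, max_rank + 1))
    # A rank that was given but is not an allowed value.
    out_of_range = (given & ~ranks.isin(allowed)).any(axis=1)
    # Fewer distinct ranks than ranks given means a rank was repeated.
    repeated = ranks.nunique(axis=1) < given.sum(axis=1)
    return df[out_of_range | repeated]

# Made-up data: respondent 2 uses rank 1 twice, respondent 3 gives rank 4.
data = pd.DataFrame({
    "RespID": [1, 2, 3],
    "R1": [1, 1, 4],
    "R2": [2, 1, None],
    "R3": [3, None, None],
})
errors = check_ranks(data, ["R1", "R2", "R3"], min_rank=1, max_rank=3)
print(errors)
```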

🛠️ Notes

  • Always set base filters (Rout) before validation for conditional questions.
  • Use lst_no(min, max) to avoid manually writing long code lists.
  • FLT_LIST is useful for quick debugging of any custom conditions.
  • The first column in the dataset is assumed to be the respondent ID.
  • Refer to the GitHub repository below for sample data files and syntax files:
  • https://github.com/ChandraCherupally/pandasdv
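
Judging from the usage shown earlier, where lst_no(1,4) stands in for [1, 2, 3, 4] and lst_no(1,16)+[97] appends an extra code, lst_no appears to return an inclusive list of integers. A minimal stand-in under that assumption is sketched below; code_list is a hypothetical name, not part of pandasdv.

```python
def code_list(first, last):
    """Inclusive list of integer codes (hypothetical stand-in for lst_no)."""
    return list(range(first, last + 1))

# Codes 1 to 16 plus a separate "Other" code, as in the conditional SR example.
valid_codes = code_list(1, 16) + [97]
print(valid_codes)
```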

🧑‍💻 Contributing

  1. Fork the repository
  2. Create a new branch (feature/my-feature)
  3. Commit your changes
  4. Open a Pull Request

🙌 Acknowledgements

  • Built on top of pandas
  • Inspired by real-world survey data quality validation workflows.
