A simple survey data validation package using pandas.

🧾 pandasdv — Pandas Data Validator for Survey Datasets

pandasdv is a lightweight Python library designed to validate survey and structured datasets (e.g., SPSS .sav files) with pandas.
It provides ready-to-use validation functions for common survey question types such as Single Response, Multiple Response, Grid, Ranking, and Open-Ended checks.


🚀 Features

  • ✅ Easy integration with pandas
  • 📊 Supports validation of .sav files directly
  • 🧠 Ready-to-use functions for survey logic validation:
    • SR — Single Response Validation
    • MULTI — Multiple Response Validation
    • GRID — Grid & Conditional Validation
    • RANK_CHECK — Rank Order Validation
    • OETEXT — Open-ended Text Validation
    • NULL_CHECK — Null or Blank Check
  • 🧾 Automatic output logging to a text file
  • 🪄 Simple, readable validation results

📦 Installation

pip install pandasdv

(Make sure you have pandas and numpy installed.)


🧰 Basic Usage

from pandasdv import initial_setup, output_setup, SR, MULTI, GRID, RANK_CHECK, OETEXT, NULL_CHECK, FLT_LIST, lst_no
# Or import everything at once:
# from pandasdv import *

# Load SPSS file (.sav)
df = initial_setup("survey_data.sav")

# Validate a single-response question
SR(Rout='QFILTER', QVAR='Q1', RNG=[1, 2, 3, 4], LIST=['Q1'])
# Or build the code list with lst_no instead of typing it out:
# SR(Rout='QFILTER', QVAR='Q1', RNG=lst_no(1,4), LIST=['Q1'])

# Validate a multi-response question
MULTI(Rout='QFILTER', QVAR=['Q2_1', 'Q2_2', 'Q2_3'], QEX=['Q2_99'])

# Output results
output_setup('validation_results.txt')

🧾 Core Functions

initial_setup(input_file)

Reads a .sav file into a pandas DataFrame and sets pandas display options.

output_setup(out_file='python_output.txt')

Writes the validation output to a text file and also prints it to the console.

FLT_LIST(COND, LIST)

Filters cases based on a logical condition and lists specified variables.
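As a rough illustration of what a call like FLT_LIST(COND=..., LIST=[...]) selects, the sketch below does the equivalent row filtering in plain pandas. The helper name flt_list_sketch and the toy data are hypothetical, and pandasdv's own implementation and output format may differ.

```python
import pandas as pd


def flt_list_sketch(df, cond, cols):
    """Return the rows where `cond` is True, restricted to `cols`.

    Hypothetical stand-in for illustration only; not pandasdv's code.
    """
    return df.loc[cond, cols]


# Toy data: one missing ID and one duplicated ID
df = pd.DataFrame({
    "RespID": [1, 2, 2, None],
    "Q1": [1, 2, 3, 1],
})

# Cases with a missing or non-positive respondent ID
bad_ids = flt_list_sketch(df, df["RespID"].isna() | (df["RespID"] <= 0), ["RespID"])

# Cases whose respondent ID appears more than once
dup_ids = flt_list_sketch(df, df["RespID"].duplicated(keep=False), ["RespID"])

print(bad_ids)
print(dup_ids)
```

Passing keep=False to duplicated marks every copy of a repeated ID, so all of the conflicting cases are listed rather than only the later ones.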


🧪 Validation Functions

  • SR — Single Response Validation
  • MULTI — Multiple Response Validation
  • GRID — Grid Validation
  • RANK_CHECK — Rank Order Validation
  • OETEXT — Open-ended Text Validation
  • NULL_CHECK — Null or Blank Validation
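To make the idea of a single-response check concrete, here is a minimal plain-pandas sketch. It assumes the check means: respondents in the base (filter column equal to 1) must hold one of the valid codes, and respondents outside the base must be blank. That rule, the helper name single_response_issues, and the toy data are assumptions for illustration; the exact logic and report format of SR are defined by pandasdv itself.

```python
import pandas as pd


def single_response_issues(df, base_col, qvar, valid_codes):
    """List cases that break a simple single-response rule.

    Assumed rule (not taken from pandasdv):
      - in base (base_col == 1): value must be one of valid_codes
      - out of base: value must be missing
    """
    in_base = df[base_col] == 1
    # isin() is False for missing values, so blanks in the base are flagged too
    invalid_in_base = in_base & ~df[qvar].isin(valid_codes)
    answered_out_of_base = ~in_base & df[qvar].notna()
    return df.loc[invalid_in_base | answered_out_of_base, [base_col, qvar]]


df = pd.DataFrame({
    "QFILTER": [1, 1, 0, 0, 1],
    "Q1": [1, 5, None, 2, None],
})

issues = single_response_issues(df, "QFILTER", "Q1", [1, 2, 3, 4])
print(issues)
```

In the toy data, the second case has an out-of-range code, the fourth answered despite being outside the base, and the fifth is blank despite being in the base.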

🧭 Example Workflow

from pandasdv import *

df = initial_setup("Consumer_Brand_Preference_Data_50.sav")

# Unique ID check
FLT_LIST(COND=df['RespID'].isna() | (df['RespID'] <= 0), LIST=['RespID'])
FLT_LIST(COND=df['RespID'].duplicated(keep=False), LIST=['RespID'])

# SR validation
SR(Rout='QFILTER', QVAR='Q1', RNG=[1, 2])

# Conditional SR
df['QFILTER'] = 0
df.loc[df['Q30'].between(2,5), 'QFILTER'] = 1
SR(Rout='QFILTER', QVAR='Q30a', RNG=lst_no(1,16)+[97], LIST=['Q30a','Q30'])

# Multi Response
MULTI(QVAR=['Q5_1', 'Q5_2', 'Q5_3'], QEX=['Q5_7'])

# Grid
GRID(QVAR=['Q56_1', 'Q56_2'], COD=[1,2,3,4,5])

# Rank check
RANK_CHECK(
    Rout='QFILTER',
    QVAR=[f'Q180_Orderr{i}' for i in range(1, 6)],
    MINR=1,
    MAXR=3
)

# OE Text
OETEXT(Rout='QFILTER', QVAR='Q8_oth', LIST=['Q8_97'])

# Output results
output_setup('validation_results.txt')
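For readers unfamiliar with rank-order checks, the sketch below shows one plausible rule in plain pandas: each respondent must rank between a minimum and maximum number of items, and the ranks used must be 1, 2, 3, ... with no gaps or repeats. Whether RANK_CHECK applies exactly this rule for MINR and MAXR is an assumption; the helper name rank_issues and the column names R1 to R3 are hypothetical.

```python
import pandas as pd


def rank_issues(df, rank_cols, min_ranked, max_ranked):
    """List cases whose ranks are not a clean 1..n sequence.

    Assumed rule (not taken from pandasdv): the number of ranked items n
    must satisfy min_ranked <= n <= max_ranked, and the ranks must be
    exactly 1..n with no duplicates or gaps.
    """
    def row_ok(row):
        ranks = sorted(int(v) for v in row.dropna())
        n = len(ranks)
        return min_ranked <= n <= max_ranked and ranks == list(range(1, n + 1))

    ok = df[rank_cols].apply(row_ok, axis=1).astype(bool)
    return df.loc[~ok, rank_cols]


df = pd.DataFrame({
    "R1": [1, 1, None, 2],
    "R2": [2, 1, None, None],
    "R3": [3, None, None, 1],
})

problems = rank_issues(df, ["R1", "R2", "R3"], min_ranked=1, max_ranked=3)
print(problems)
```

Here the second case repeats rank 1 and the third ranks nothing at all, so both are listed; the other two cases pass.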

🛠️ Notes

  • Always set the base filter column (Rout) before validating a conditional question.
  • Use lst_no(min, max) to avoid manually writing long code lists.
  • FLT_LIST is useful for quick debugging of any custom conditions.
  • The first column in the dataset is assumed to be the respondent ID.
  • See the GitHub repository for sample data and syntax files: https://github.com/ChandraCherupally/pandasdv
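The usage examples show lst_no(1,4) as an alternative to [1, 2, 3, 4], which suggests it returns an inclusive list of integer codes. Under that assumption, a plain-Python equivalent might look like the sketch below; lst_no_sketch is a hypothetical name, not part of pandasdv.

```python
def lst_no_sketch(start, stop):
    """Return the integers from start to stop, both ends included.

    Hypothetical equivalent of lst_no, assuming inclusive bounds.
    """
    return list(range(start, stop + 1))


# Codes 1 to 16 plus a special "other" code 97, as in the workflow example
codes = lst_no_sketch(1, 16) + [97]
print(codes)
```

Because the result is an ordinary list, special codes such as 97 can be appended with +.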

🧑‍💻 Contributing

  1. Fork the repository
  2. Create a new branch (feature/my-feature)
  3. Commit your changes
  4. Open a Pull Request

🙌 Acknowledgements

  • Built on top of pandas
  • Inspired by real-world survey data quality validation workflows.
