Survey data pipeline: rawdata.csv + metadata.json + datatable.xlsx with sig testing

Project description

surveyflow

A Python library for processing survey data: it parses survey definitions and responses into structured outputs ready for analysis.

Features

  • Parse survey definition (question structure, types, positions) into metadata.json
  • Parse survey response rows into rawdata.csv with numeric codes
    • Single-choice → integer code (e.g. 1)
    • Multi-choice / ranking → semicolon-separated codes (e.g. "1;3;5")
    • Open-ended / matrix / number → raw text
  • Filter responses by status (default: approved only)
  • Consistent columns: each question column in rawdata.csv (e.g. q6, q18) has a matching key in metadata.json
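
To make the encoding rules above concrete, here is a minimal sketch of how an answer label could be turned into a code. It is not surveyflow's internal implementation: `encode_answer` is a hypothetical helper, and it assumes the code → label mapping has the same shape as `values` in metadata.json and that multi-choice codes keep the order in which the labels were given.

```python
def encode_answer(answer_type, answer, values):
    """Encode one answer using a code -> label mapping (hypothetical helper)."""
    # Invert the mapping so labels can be looked up: {"Ward 1": 1, ...}
    label_to_code = {label: int(code) for code, label in values.items()}
    if answer_type == "singlechoice":
        return label_to_code[answer]
    if answer_type in ("multiplechoice", "ranking"):
        # Assumption: codes are emitted in the order the labels were given
        return ";".join(str(label_to_code[label]) for label in answer)
    # Open-ended, matrix and number answers are passed through unchanged
    return answer


wards = {"1": "Ward 1", "2": "Ward 2", "3": "Ward 3"}
household = {"1": "Spouse", "2": "Parents", "3": "Children"}

single = encode_answer("singlechoice", "Ward 2", wards)
multi = encode_answer("multiplechoice", ["Spouse", "Children"], household)
print(single, multi)  # 2 1;3
```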

Installation

pip install surveyflow

Quick Start

from surveyflow.steps.ingestion import IngestionStep

# definition: dict from your survey platform's definition API
# rows_pages: list of paginated response pages from your survey platform

step = IngestionStep()
context = step.run({
    "definition":  definition,
    "rows_pages":  rows_pages,
    "output_dir":  "./output",
})

df       = context["rawdata"]      # pandas DataFrame
metadata = context["metadata"]     # dict with question info + value labels

Output

rawdata.csv

task_id   date_time   q6  q7  q10  q18
task_001  2026-03-01  1   2   1    1;3
task_002  2026-03-01  2   1   2    2
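
For analysis, a semicolon-separated multi-choice cell such as q18 usually needs to be expanded into one indicator per option. The sketch below shows one way to do that in plain Python; `expand_multichoice` is a hypothetical helper, not part of surveyflow, and the list of codes is assumed to come from the question's `values` in metadata.json.

```python
def expand_multichoice(cell, codes):
    """Turn a cell such as "1;3" into {code: 0/1} indicator flags."""
    selected = {int(part) for part in str(cell).split(";") if part.strip()}
    return {code: int(code in selected) for code in codes}


rows = [
    {"task_id": "task_001", "q18": "1;3"},
    {"task_id": "task_002", "q18": "2"},
]
flags = [expand_multichoice(row["q18"], codes=[1, 2, 3]) for row in rows]
print(flags)  # [{1: 1, 2: 0, 3: 1}, {1: 0, 2: 1, 3: 0}]
```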

metadata.json

{
  "survey_id": 12345,
  "questions": {
    "q6": {
      "position": 6,
      "english_question": "Please provide your current address",
      "answer_type": "singlechoice",
      "values": { "1": "Ward 1", "2": "Ward 2", "3": "Ward 3" }
    },
    "q18": {
      "position": 18,
      "english_question": "Who do you live with",
      "answer_type": "multiplechoice",
      "values": { "1": "Spouse", "2": "Parents", "3": "Children" }
    }
  }
}
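
Because `values` maps codes to labels, the metadata can be used to turn rawdata cells back into readable answers. The following is a small sketch under that assumption; `decode` is a hypothetical helper, and the metadata is trimmed to the fields the example needs.

```python
import json

metadata = json.loads("""
{
  "survey_id": 12345,
  "questions": {
    "q6": {"answer_type": "singlechoice",
           "values": {"1": "Ward 1", "2": "Ward 2", "3": "Ward 3"}},
    "q18": {"answer_type": "multiplechoice",
            "values": {"1": "Spouse", "2": "Parents", "3": "Children"}}
  }
}
""")


def decode(column, cell, metadata):
    """Map a rawdata cell back to its label(s) (hypothetical helper)."""
    question = metadata["questions"][column]
    # JSON object keys are strings, so codes are looked up as strings
    labels = [question["values"][code] for code in str(cell).split(";")]
    if question["answer_type"] == "singlechoice":
        return labels[0]
    return labels


print(decode("q6", 1, metadata))       # Ward 1
print(decode("q18", "1;3", metadata))  # ['Spouse', 'Children']
```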

Input Format

definition

{
    "survey": { "survey_id": 12345, "title": "...", ... },
    "questions": [
        {
            "question_id": 1001,
            "position": 6,
            "question": "...",
            "english_question": "Please provide your current address",
            "type": 2,        # 2=singlechoice, 3=multiplechoice, 6=ranking, 4=matrix, ...
            "input_type": 0,
            "mandatory": True,
            "status": 1
        },
        ...
    ]
}

rows_pages

[
    {   # page 1
        "rows": [
            {
                "task_id": "task_001",
                "date_time": "2026-03-01 09:00:00",
                "profile_status": "approved",
                "questions": [
                    { "type": "singlechoice", "question": "Please provide your current address", "answer": "Ward 1" },
                    { "type": "multiplechoice", "question": "Who do you live with",
                      "answer": [{"answer_name": "Spouse"}, {"answer_name": "Children"}] },
                    ...
                ]
            },
            ...
        ]
    },
    # page 2, page 3, ...
]
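
Since `rows_pages` is a list of pages that each carry their own `rows` list, any code that inspects responses first has to flatten the pages. A minimal sketch of that step follows; `iter_rows` is a hypothetical helper, and the sample rows are cut down to two fields.

```python
from itertools import chain


def iter_rows(rows_pages):
    """Yield every response row across all pages, in page order."""
    return chain.from_iterable(page.get("rows", []) for page in rows_pages)


rows_pages = [
    {"rows": [{"task_id": "task_001", "profile_status": "approved"}]},
    {"rows": [{"task_id": "task_002", "profile_status": "pending"},
              {"task_id": "task_003", "profile_status": "approved"}]},
]
task_ids = [row["task_id"] for row in iter_rows(rows_pages)]
print(task_ids)  # ['task_001', 'task_002', 'task_003']
```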

Answer Types

type value            answer_type     Encoded in rawdata?
2                     singlechoice    Yes → int
3                     multiplechoice  Yes → "1;3;5"
6                     ranking         Yes → "2;1;3"
4                     matrix          No → "row:col|row:col"
1 + input_type=100    multiplenumber  No → "label:num|label:num"
1                     freetext        No → raw text
1 + input_type=3      singlenumber    No → raw number
1109                  area            No → raw text

Excluded from output: audio, user-name, user-phone, instruction, reward.
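
The table above can be read as a lookup from `type` (and, for type 1, `input_type`) to an `answer_type` name. The sketch below expresses that reading; both functions are hypothetical helpers rather than surveyflow APIs, and it assumes that a type 1 question with any `input_type` other than 100 or 3 is free text. `parse_pairs` shows how the pipe-separated matrix and multiplenumber cells could be split for analysis.

```python
def resolve_answer_type(type_value, input_type=0):
    """Return the answer_type name for a question, following the table."""
    if type_value == 1:
        # Assumption: type 1 is free text unless input_type refines it
        return {100: "multiplenumber", 3: "singlenumber"}.get(input_type, "freetext")
    names = {2: "singlechoice", 3: "multiplechoice", 6: "ranking",
             4: "matrix", 1109: "area"}
    return names.get(type_value)  # None for types not listed in the table


def parse_pairs(cell):
    """Split "a:b|c:d" (matrix / multiplenumber cells) into a dict."""
    return dict(pair.split(":", 1) for pair in cell.split("|") if pair)


print(resolve_answer_type(1, input_type=100))  # multiplenumber
print(parse_pairs("row1:col2|row2:col1"))      # {'row1': 'col2', 'row2': 'col1'}
```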

Profile Status Filter

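# Shared inputs, as in Quick Start
base = {"definition": definition, "rows_pages": rows_pages, "output_dir": "./output"}

# Default: approved only
step.run({**base, "profile_status": ["approved"]})

# Include all statuses
step.run({**base, "profile_status": []})

# Custom filter
step.run({**base, "profile_status": ["approved", "pending"]})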
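
The filter semantics described above (a list of allowed statuses, with an empty list meaning "keep everything") can be sketched as follows. `filter_by_status` is a hypothetical stand-in for illustration only; how surveyflow applies the filter internally is not shown in this document.

```python
def filter_by_status(rows, profile_status=("approved",)):
    """Keep rows whose profile_status is allowed; an empty filter keeps all."""
    allowed = set(profile_status)
    if not allowed:
        return list(rows)
    return [row for row in rows if row.get("profile_status") in allowed]


rows = [
    {"task_id": "task_001", "profile_status": "approved"},
    {"task_id": "task_002", "profile_status": "pending"},
    {"task_id": "task_003", "profile_status": "rejected"},
]
default_ids = [r["task_id"] for r in filter_by_status(rows)]
custom_ids = [r["task_id"] for r in filter_by_status(rows, ["approved", "pending"])]
print(default_ids, custom_ids)  # ['task_001'] ['task_001', 'task_002']
```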

Requirements

  • Python >= 3.10
  • pandas >= 2.0
  • openpyxl >= 3.1

Download files

Source Distribution

surveyflow-0.4.5.tar.gz (346.9 kB)

Uploaded Source

Built Distribution

surveyflow-0.4.5-py3-none-any.whl (37.4 kB)

Uploaded Python 3

File details

Details for the file surveyflow-0.4.5.tar.gz.

File metadata

  • Download URL: surveyflow-0.4.5.tar.gz
  • Upload date:
  • Size: 346.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for surveyflow-0.4.5.tar.gz
Algorithm Hash digest
SHA256 f89fd50c657adc113a52bf9aa6643b7737e8ca58d88c02fa80077ebda0396ff1
MD5 31ee739ea44f16e689d5e0e459b104d4
BLAKE2b-256 78402e3779fdc742673d7e57def61c37c3f07f209f53956fb68b99cfc6920e20

File details

Details for the file surveyflow-0.4.5-py3-none-any.whl.

File metadata

  • Download URL: surveyflow-0.4.5-py3-none-any.whl
  • Upload date:
  • Size: 37.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for surveyflow-0.4.5-py3-none-any.whl
Algorithm Hash digest
SHA256 cbed597b7359ebe75d77124c72df269026734f2fe25a66c846b05177730bec0e
MD5 09f6c8a8af432d336902ac2e1c808196
BLAKE2b-256 831af5c1f6e6f48b9fe9a25f4cea17e2a56793f6b175353ea3eb6ee0899eec85
