
Survey data pipeline: rawdata.csv + metadata.json + datatable.xlsx with sig testing


surveyflow

A Python library for processing survey data: it parses survey definitions and responses into structured outputs ready for analysis.

Features

  • Parse survey definition (question structure, types, positions) into metadata.json
  • Parse survey response rows into rawdata.csv with numeric codes
    • Single-choice → integer code (e.g. 1)
    • Multi-choice / ranking → semicolon-separated codes (e.g. "1;3;5")
    • Open-ended / matrix / number → raw text
  • Filter responses by status (default: approved only)
  • Column names in rawdata.csv match the question keys in metadata.json (e.g. q6, q18)
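The encodings listed above can be read back with plain pandas. The following is a minimal sketch, assuming a DataFrame shaped like the rawdata.csv example in this README; the sample values and the helper name `split_codes` are illustrative and not part of surveyflow.

```python
import pandas as pd

# Hypothetical in-memory sample shaped like rawdata.csv (not real survey data).
df = pd.DataFrame({
    "task_id": ["task_001", "task_002"],
    "q6": [1, 2],            # single-choice: one integer code per row
    "q18": ["1;3", "2"],     # multi-choice: semicolon-separated codes
})

def split_codes(cell):
    """Turn "1;3" into [1, 3]; treat empty or missing cells as []."""
    if pd.isna(cell) or str(cell).strip() == "":
        return []
    return [int(code) for code in str(cell).split(";")]

# Store the parsed codes next to the original column.
df["q18_codes"] = df["q18"].map(split_codes)
```

Keeping the parsed list in a separate column leaves the original semicolon-separated string untouched, so the frame can still be written back in the same format.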

Installation

pip install surveyflow

Quick Start

from surveyflow.steps.ingestion import IngestionStep

# definition: dict from your survey platform's definition API
# rows_pages: list of paginated response pages from your survey platform

step = IngestionStep()
context = step.run({
    "definition":  definition,
    "rows_pages":  rows_pages,
    "output_dir":  "./output",
})

df       = context["rawdata"]      # pandas DataFrame
metadata = context["metadata"]     # dict with question info + value labels

Output

rawdata.csv

task_id   date_time   q6  q7  q10  q18
task_001  2026-03-01  1   2   1    1;3
task_002  2026-03-01  2   1   2    2
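For analysis it is often convenient to expand a multi-choice column such as q18 into one 0/1 column per code. A minimal sketch using pandas' `Series.str.get_dummies`, assuming the column holds semicolon-separated code strings as in the table above; the `q18_` prefix is an arbitrary naming choice, not something surveyflow produces.

```python
import pandas as pd

# Hypothetical sample mirroring the q18 column shown above.
raw = pd.DataFrame({
    "task_id": ["task_001", "task_002"],
    "q18": ["1;3", "2"],
})

# One indicator column per code; column names are the code strings with a prefix.
flags = raw["q18"].str.get_dummies(sep=";").astype(int).add_prefix("q18_")

# Attach the indicators to the respondent identifier.
wide = pd.concat([raw[["task_id"]], flags], axis=1)
```

When loading rawdata.csv from disk, passing `dtype=str` for such columns to `pandas.read_csv` prevents pandas from inferring an integer column when no row happens to contain a semicolon.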

metadata.json

{
  "survey_id": 12345,
  "questions": {
    "q6": {
      "position": 6,
      "english_question": "Please provide your current address",
      "answer_type": "singlechoice",
      "values": { "1": "Ward 1", "2": "Ward 2", "3": "Ward 3" }
    },
    "q18": {
      "position": 18,
      "english_question": "Who do you live with",
      "answer_type": "multiplechoice",
      "values": { "1": "Spouse", "2": "Parents", "3": "Children" }
    }
  }
}
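The `values` map makes it possible to turn codes in rawdata.csv back into labels. The sketch below assumes the metadata layout shown above; the `decode` helper is hypothetical and not part of surveyflow. Note that JSON object keys are strings, so integer codes are converted with `str()` before the lookup.

```python
import json

# Fragment of the metadata.json example above.
metadata = json.loads("""
{
  "questions": {
    "q6":  {"answer_type": "singlechoice",
            "values": {"1": "Ward 1", "2": "Ward 2", "3": "Ward 3"}},
    "q18": {"answer_type": "multiplechoice",
            "values": {"1": "Spouse", "2": "Parents", "3": "Children"}}
  }
}
""")

def decode(column, cell, metadata):
    """Return the label(s) for a rawdata cell as a list of strings."""
    values = metadata["questions"][column]["values"]
    codes = str(cell).split(";")
    # Codes missing from the map are passed through unchanged.
    return [values.get(code, code) for code in codes]

single = decode("q6", 1, metadata)
multi = decode("q18", "1;3", metadata)
```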

Input Format

definition

{
    "survey": { "survey_id": 12345, "title": "...", ... },
    "questions": [
        {
            "question_id": 1001,
            "position": 6,
            "question": "...",
            "english_question": "Please provide your current address",
            "type": 2,        # 2=singlechoice, 3=multiplechoice, 6=ranking, 4=matrix, ...
            "input_type": 0,
            "mandatory": True,
            "status": 1
        },
        ...
    ]
}

rows_pages

[
    {   # page 1
        "rows": [
            {
                "task_id": "task_001",
                "date_time": "2026-03-01 09:00:00",
                "profile_status": "approved",
                "questions": [
                    { "type": "singlechoice", "question": "Please provide your current address", "answer": "Ward 1" },
                    { "type": "multiplechoice", "question": "Who do you live with",
                      "answer": [{"answer_name": "Spouse"}, {"answer_name": "Children"}] },
                    ...
                ]
            },
            ...
        ]
    },
    # page 2, page 3, ...
]
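Response rows carry answers as labels, while rawdata.csv holds numeric codes. The following sketch illustrates one way such an encoding could work, assuming a `values` map like the one in metadata.json. It is not surveyflow's implementation: the `encode_answer` helper is hypothetical, codes are kept in answer order, and how the library treats labels missing from the map is not stated in this README.

```python
# Hypothetical inputs shaped like the examples in this README.
ward_values = {"1": "Ward 1", "2": "Ward 2", "3": "Ward 3"}
household_values = {"1": "Spouse", "2": "Parents", "3": "Children"}

def encode_answer(answer, values):
    """Encode a label (str) or a list of {"answer_name": label} dicts as codes."""
    label_to_code = {label: code for code, label in values.items()}
    if isinstance(answer, str):
        # Single-choice: one label becomes one integer code.
        return int(label_to_code[answer])
    # Multi-choice: each selected label becomes a code, joined with ";".
    codes = [label_to_code[item["answer_name"]] for item in answer]
    return ";".join(codes)

single = encode_answer("Ward 1", ward_values)
multi = encode_answer(
    [{"answer_name": "Spouse"}, {"answer_name": "Children"}],
    household_values,
)
```

In this sketch an unknown label raises `KeyError`, which surfaces mismatches between the definition and the responses early.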

Answer Types

type value            answer_type     Encoded in rawdata?
2                     singlechoice    Yes → int
3                     multiplechoice  Yes → "1;3;5"
6                     ranking         Yes → "2;1;3"
4                     matrix          No → "row:col|row:col"
1 + input_type=100    multiplenumber  No → "label:num|label:num"
1                     freetext        No → raw text
1 + input_type=3      singlenumber    No → raw number
1109                  area            No → raw text

Excluded from output: audio, user-name, user-phone, instruction, reward.
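The matrix and multiplenumber formats above are pipe-separated pairs. A minimal sketch for splitting them, assuming that labels contain neither `|` nor `:`; the `parse_pairs` helper and the sample strings are illustrative only.

```python
def parse_pairs(cell):
    """Split "a:b|c:d" into [("a", "b"), ("c", "d")]."""
    pairs = []
    for chunk in cell.split("|"):
        if not chunk:
            continue  # skip empty segments, e.g. from an empty cell
        left, _, right = chunk.partition(":")
        pairs.append((left, right))
    return pairs

# Hypothetical cell contents following the documented formats.
matrix = parse_pairs("Price:Good|Service:Poor")
numbers = [(label, float(num)) for label, num in parse_pairs("Adults:2|Children:1")]
```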

Profile Status Filter

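# base: the run() arguments from Quick Start
base = {"definition": definition, "rows_pages": rows_pages, "output_dir": "./output"}

# Default: approved only
step.run({**base, "profile_status": ["approved"]})

# Include all statuses
step.run({**base, "profile_status": []})

# Custom filter
step.run({**base, "profile_status": ["approved", "pending"]})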
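The filter semantics described above (a list of allowed statuses, with an empty list meaning no filtering) can be sketched over the paginated `rows_pages` structure as follows. This is an illustration under those assumptions rather than the library's code; `iter_rows` and the sample pages are hypothetical.

```python
def iter_rows(rows_pages, profile_status=("approved",)):
    """Yield rows from every page whose profile_status is allowed."""
    allowed = set(profile_status)
    for page in rows_pages:
        for row in page.get("rows", []):
            # An empty filter means "include all statuses".
            if not allowed or row.get("profile_status") in allowed:
                yield row

# Hypothetical two-page sample.
pages = [
    {"rows": [
        {"task_id": "task_001", "profile_status": "approved"},
        {"task_id": "task_002", "profile_status": "pending"},
    ]},
    {"rows": [
        {"task_id": "task_003", "profile_status": "rejected"},
    ]},
]

approved_only = [row["task_id"] for row in iter_rows(pages)]
everything = [row["task_id"] for row in iter_rows(pages, [])]
custom = [row["task_id"] for row in iter_rows(pages, ["approved", "pending"])]
```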

Requirements

  • Python >= 3.10
  • pandas >= 2.0
  • openpyxl >= 3.1

