Survey data pipeline: rawdata.csv + metadata.json + datatable.xlsx with sig testing

surveyflow

A Python library for processing survey data — parse survey definitions and responses into structured outputs ready for analysis.

Features

  • Parse survey definition (question structure, types, positions) into metadata.json
  • Parse survey response rows into rawdata.csv with numeric codes
    • Single-choice → integer code (e.g. 1)
    • Multi-choice / ranking → semicolon-separated codes (e.g. "1;3;5")
    • Open-ended / matrix / number → stored as text, not integer-coded (formats listed under Answer Types)
  • Filter responses by status (default: approved only)
  • Consistent columns between rawdata.csv and metadata.json
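
The encoding rules above can be illustrated with a small standalone sketch. `encode_answer` and its `label_to_code` argument are hypothetical names used only for illustration; they are not part of surveyflow's API, and the library's own implementation may differ.

```python
def encode_answer(answer_type, answer, label_to_code):
    """Encode one answer following the rules listed above (illustrative only)."""
    if answer_type == "singlechoice":
        # Single-choice: one integer code
        return label_to_code[answer]
    if answer_type in ("multiplechoice", "ranking"):
        # Multi-choice / ranking: codes joined with semicolons, order preserved
        return ";".join(str(label_to_code[label]) for label in answer)
    # Open-ended and other types are passed through as text
    return str(answer)


# Hypothetical label-to-code map, mirroring the "values" table in metadata.json
codes = {"Spouse": 1, "Parents": 2, "Children": 3}
print(encode_answer("singlechoice", "Parents", codes))                 # 2
print(encode_answer("multiplechoice", ["Spouse", "Children"], codes))  # 1;3
print(encode_answer("freetext", "No comment", codes))                  # No comment
```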

Installation

pip install surveyflow

Quick Start

from surveyflow.steps.ingestion import IngestionStep

# definition: dict from your survey platform's definition API
# rows_pages: list of paginated response pages from your survey platform

step = IngestionStep()
context = step.run({
    "definition":  definition,
    "rows_pages":  rows_pages,
    "output_dir":  "./output",
})

df       = context["rawdata"]      # pandas DataFrame
metadata = context["metadata"]     # dict with question info + value labels

Output

rawdata.csv

task_id   date_time   q6  q7  q10  q18
task_001  2026-03-01  1   2   1    1;3
task_002  2026-03-01  2   1   2    2
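
Because multi-choice cells hold several codes in one string, counting selections requires splitting them first. The sketch below uses only the standard library; `tally_codes` is a hypothetical helper, and it assumes cells arrive as strings, integers, or `None` (pandas `NaN` values would need to be dropped beforehand).

```python
from collections import Counter


def tally_codes(cells):
    """Count how often each code appears in a column of "1;3;5"-style cells."""
    counts = Counter()
    for cell in cells:
        if cell is None or str(cell).strip() == "":
            continue  # skip unanswered rows
        counts.update(int(code) for code in str(cell).split(";"))
    return counts


q18 = ["1;3", "2"]  # the q18 column from the sample above
print(tally_codes(q18))
```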

metadata.json

{
  "survey_id": 12345,
  "questions": {
    "q6": {
      "position": 6,
      "english_question": "Please provide your current address",
      "answer_type": "singlechoice",
      "values": { "1": "Ward 1", "2": "Ward 2", "3": "Ward 3" }
    },
    "q18": {
      "position": 18,
      "english_question": "Who do you live with",
      "answer_type": "multiplechoice",
      "values": { "1": "Spouse", "2": "Parents", "3": "Children" }
    }
  }
}
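
The `values` map makes it possible to turn codes in rawdata.csv back into labels. The following is a minimal sketch under the assumption that metadata has the shape shown above; `decode_cell` is a hypothetical helper, not a surveyflow function.

```python
import json

# A trimmed copy of the metadata.json sample above
metadata = json.loads("""
{
  "questions": {
    "q6":  {"answer_type": "singlechoice",
            "values": {"1": "Ward 1", "2": "Ward 2", "3": "Ward 3"}},
    "q18": {"answer_type": "multiplechoice",
            "values": {"1": "Spouse", "2": "Parents", "3": "Children"}}
  }
}
""")


def decode_cell(column, cell, metadata):
    """Translate a rawdata cell back into its label(s) via metadata["questions"]."""
    values = metadata["questions"][column].get("values")
    if not values:
        return cell  # no value labels: leave the cell as-is
    # Keys in "values" are strings, so codes are looked up as strings
    return [values[code] for code in str(cell).split(";")]


print(decode_cell("q6", 1, metadata))       # ['Ward 1']
print(decode_cell("q18", "1;3", metadata))  # ['Spouse', 'Children']
```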

Input Format

definition

{
    "survey": { "survey_id": 12345, "title": "...", ... },
    "questions": [
        {
            "question_id": 1001,
            "position": 6,
            "question": "...",
            "english_question": "Please provide your current address",
            "type": 2,        # 2=singlechoice, 3=multiplechoice, 6=ranking, 4=matrix, ...
            "input_type": 0,
            "mandatory": True,
            "status": 1
        },
        ...
    ]
}
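
The numeric `type` and `input_type` fields determine the `answer_type` that ends up in metadata.json. As a rough sketch of that lookup, using the mapping from the Answer Types table in this README: `resolve_answer_type` is a hypothetical name, and treating any other `input_type` on type 1 as free text is an assumption.

```python
def resolve_answer_type(question):
    """Map a definition question to an answer_type name per the Answer Types table."""
    qtype = question["type"]
    input_type = question.get("input_type", 0)
    if qtype == 1:
        # Type 1 is split by input_type; other values fall back to freetext (assumed)
        return {100: "multiplenumber", 3: "singlenumber"}.get(input_type, "freetext")
    # Types not listed in the table return None
    by_type = {2: "singlechoice", 3: "multiplechoice", 6: "ranking",
               4: "matrix", 1109: "area"}
    return by_type.get(qtype)


print(resolve_answer_type({"type": 2, "input_type": 0}))    # singlechoice
print(resolve_answer_type({"type": 1, "input_type": 100}))  # multiplenumber
print(resolve_answer_type({"type": 1, "input_type": 0}))    # freetext
```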

rows_pages

[
    {   # page 1
        "rows": [
            {
                "task_id": "task_001",
                "date_time": "2026-03-01 09:00:00",
                "profile_status": "approved",
                "questions": [
                    { "type": "singlechoice", "question": "Please provide your current address", "answer": "Ward 1" },
                    { "type": "multiplechoice", "question": "Who do you live with",
                      "answer": [{"answer_name": "Spouse"}, {"answer_name": "Children"}] },
                    ...
                ]
            },
            ...
        ]
    },
    # page 2, page 3, ...
]
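
Since responses arrive as a list of pages, any processing has to walk the pages and their `rows` in order. A minimal sketch of that flattening step follows; `iter_rows` is a hypothetical helper, and treating a page without a `rows` key as empty is an assumption.

```python
def iter_rows(rows_pages):
    """Yield every response row across all pages, in page order."""
    for page in rows_pages:
        # A page without a "rows" key is treated as empty (an assumption)
        yield from page.get("rows", [])


# Made-up sample pages in the shape shown above
rows_pages = [
    {"rows": [{"task_id": "task_001"}, {"task_id": "task_002"}]},
    {"rows": [{"task_id": "task_003"}]},
]
print([row["task_id"] for row in iter_rows(rows_pages)])
# ['task_001', 'task_002', 'task_003']
```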

Answer Types

type value            answer_type     Encoded in rawdata?
2                     singlechoice    Yes → int
3                     multiplechoice  Yes → "1;3;5"
6                     ranking         Yes → "2;1;3"
4                     matrix          No → "row:col|row:col"
1 + input_type=100    multiplenumber  No → "label:num|label:num"
1                     freetext        No → raw text
1 + input_type=3      singlenumber    No → raw number
1109                  area            No → raw text

Excluded from output: audio, user-name, user-phone, instruction, reward.
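
Matrix and multiplenumber cells are left as delimited strings, so downstream code needs to split them. The sketch below assumes the `|` and `:` separators shown in the table and no escaping inside labels; `parse_pairs` is a hypothetical helper, not part of surveyflow.

```python
def parse_pairs(cell):
    """Split a "row:col|row:col" or "label:num|label:num" cell into (left, right) pairs."""
    if not cell:
        return []
    pairs = []
    for item in cell.split("|"):
        # Split on the last colon so a label containing ":" stays intact
        left, _, right = item.rpartition(":")
        pairs.append((left, right))
    return pairs


print(parse_pairs("1:2|2:4"))          # [('1', '2'), ('2', '4')]
print(parse_pairs("Adults:2|Kids:3"))  # [('Adults', '2'), ('Kids', '3')]
```

Both halves are returned as strings; convert the right-hand side with `int()` or `float()` where a number is expected.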

Profile Status Filter

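# Shared inputs, as in Quick Start
inputs = {"definition": definition, "rows_pages": rows_pages, "output_dir": "./output"}

# Default: approved only
step.run({**inputs, "profile_status": ["approved"]})

# Include all statuses
step.run({**inputs, "profile_status": []})

# Custom filter
step.run({**inputs, "profile_status": ["approved", "pending"]})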
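
The filter semantics described above (a list of allowed statuses, with an empty list meaning "keep everything") can be sketched in plain Python. `filter_by_status` is a hypothetical helper for illustration, and the "rejected" status in the sample rows is made up.

```python
def filter_by_status(rows, allowed=("approved",)):
    """Keep rows whose profile_status is in `allowed`; an empty `allowed` keeps all rows."""
    if not allowed:
        return list(rows)
    return [row for row in rows if row.get("profile_status") in allowed]


# Made-up sample rows; only profile_status matters here
rows = [
    {"task_id": "task_001", "profile_status": "approved"},
    {"task_id": "task_002", "profile_status": "pending"},
    {"task_id": "task_003", "profile_status": "rejected"},
]
print(len(filter_by_status(rows)))                           # 1
print(len(filter_by_status(rows, [])))                       # 3
print(len(filter_by_status(rows, ["approved", "pending"])))  # 2
```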

Requirements

  • Python >= 3.10
  • pandas >= 2.0
  • openpyxl >= 3.1
