Skip to main content

Survey data pipeline: rawdata.csv + metadata.json + datatable.xlsx with sig testing

Project description

surveyflow

A Python library for processing survey data — parse survey definitions and responses into structured outputs ready for analysis.

Features

  • Parse survey definition (question structure, types, positions) into metadata.json
  • Parse survey response rows into rawdata.csv with numeric codes
    • Single-choice → integer code (e.g. 1)
    • Multi-choice / ranking → semicolon-separated codes (e.g. "1;3;5")
    • Open-ended / matrix / number → raw text
  • Filter responses by status (default: approved only)
  • Consistent columns between rawdata.csv and metadata.json

Installation

pip install surveyflow

Quick Start

from surveyflow.steps.ingestion import IngestionStep

# definition: dict from your survey platform's definition API
# rows_pages: list of paginated response pages from your survey platform

step = IngestionStep()
context = step.run({
    "definition":  definition,
    "rows_pages":  rows_pages,
    "output_dir":  "./output",
})

df       = context["rawdata"]      # pandas DataFrame
metadata = context["metadata"]     # dict with question info + value labels

Output

rawdata.csv

task_id date_time q6 q7 q10 q18
task_001 2026-03-01 1 2 1 1;3
task_002 2026-03-01 2 1 2 2

metadata.json

{
  "survey_id": 12345,
  "questions": {
    "q6": {
      "position": 6,
      "english_question": "Please provide your current address",
      "answer_type": "singlechoice",
      "values": { "1": "Ward 1", "2": "Ward 2", "3": "Ward 3" }
    },
    "q18": {
      "position": 18,
      "english_question": "Who do you live with",
      "answer_type": "multiplechoice",
      "values": { "1": "Spouse", "2": "Parents", "3": "Children" }
    }
  }
}

Input Format

definition

{
    "survey": { "survey_id": 12345, "title": "...", ... },
    "questions": [
        {
            "question_id": 1001,
            "position": 6,
            "question": "...",
            "english_question": "Please provide your current address",
            "type": 2,        # 2=singlechoice, 3=multiplechoice, 6=ranking, 4=matrix, ...
            "input_type": 0,
            "mandatory": True,
            "status": 1
        },
        ...
    ]
}

rows_pages

[
    {   # page 1
        "rows": [
            {
                "task_id": "task_001",
                "date_time": "2026-03-01 09:00:00",
                "profile_status": "approved",
                "questions": [
                    { "type": "singlechoice", "question": "Please provide your current address", "answer": "Ward 1" },
                    { "type": "multiplechoice", "question": "Who do you live with",
                      "answer": [{"answer_name": "Spouse"}, {"answer_name": "Children"}] },
                    ...
                ]
            },
            ...
        ]
    },
    # page 2, page 3, ...
]

Answer Types

type value answer_type Encoded in rawdata?
2 singlechoice Yes → int
3 multiplechoice Yes → "1;3;5"
6 ranking Yes → "2;1;3"
4 matrix No → "row:col|row:col"
1 + input_type=100 multiplenumber No → "label:num|label:num"
1 freetext No → raw text
1 + input_type=3 singlenumber No → raw number
1109 area No → raw text

Excluded from output: audio, user-name, user-phone, instruction, reward.

Profile Status Filter

# Default: approved only
step.run({ ..., "profile_status": ["approved"] })

# Include all statuses
step.run({ ..., "profile_status": [] })

# Custom filter
step.run({ ..., "profile_status": ["approved", "pending"] })

Requirements

  • Python >= 3.10
  • pandas >= 2.0
  • openpyxl >= 3.1

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

surveyflow-0.4.16.tar.gz (81.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

surveyflow-0.4.16-py3-none-any.whl (65.4 kB view details)

Uploaded Python 3

File details

Details for the file surveyflow-0.4.16.tar.gz.

File metadata

  • Download URL: surveyflow-0.4.16.tar.gz
  • Upload date:
  • Size: 81.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for surveyflow-0.4.16.tar.gz
Algorithm Hash digest
SHA256 19718ddf8176156e773d168878c50617cb6e4cd577007e485181e8d048a37697
MD5 5013cb4ea6d5fbbffdd4055756422abc
BLAKE2b-256 3f48bf613dee8db7a525d2bc42ccb42b05c6f14e64499254b99b1a2919815737

See more details on using hashes here.

File details

Details for the file surveyflow-0.4.16-py3-none-any.whl.

File metadata

  • Download URL: surveyflow-0.4.16-py3-none-any.whl
  • Upload date:
  • Size: 65.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for surveyflow-0.4.16-py3-none-any.whl
Algorithm Hash digest
SHA256 0e2769126871396978dbde3ff339940d6b09a5957e2a23681fe1733f12dca155
MD5 09c5bd512f064a2926c69de7cd06b233
BLAKE2b-256 ee596251da319cb1b32ad666630ee26e5664f342810d70e63b071f42a324b9c2

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page