Skip to main content

Survey data pipeline: parse survey definitions and responses into rawdata.csv and metadata.json

Project description

surveyflow

A Python library for processing survey data — parse survey definitions and responses into structured outputs ready for analysis.

Features

  • Parse survey definition (question structure, types, positions) into metadata.json
  • Parse survey response rows into rawdata.csv with numeric codes
    • Single-choice → integer code (e.g. 1)
    • Multi-choice / ranking → semicolon-separated codes (e.g. "1;3;5")
    • Open-ended / matrix / number → raw text
  • Filter responses by status (default: approved only)
  • Consistent columns between rawdata.csv and metadata.json

Installation

pip install surveyflow

Quick Start

from surveyflow.steps.ingestion import IngestionStep

# definition: dict from your survey platform's definition API
# rows_pages: list of paginated response pages from your survey platform

step = IngestionStep()
context = step.run({
    "definition":  definition,
    "rows_pages":  rows_pages,
    "output_dir":  "./output",
})

df       = context["rawdata"]      # pandas DataFrame
metadata = context["metadata"]     # dict with question info + value labels

Output

rawdata.csv

task_id date_time q6 q7 q10 q18
task_001 2026-03-01 1 2 1 1;3
task_002 2026-03-01 2 1 2 2

metadata.json

{
  "survey_id": 12345,
  "questions": {
    "q6": {
      "position": 6,
      "english_question": "Please provide your current address",
      "answer_type": "singlechoice",
      "values": { "1": "Ward 1", "2": "Ward 2", "3": "Ward 3" }
    },
    "q18": {
      "position": 18,
      "english_question": "Who do you live with",
      "answer_type": "multiplechoice",
      "values": { "1": "Spouse", "2": "Parents", "3": "Children" }
    }
  }
}

Input Format

definition

{
    "survey": { "survey_id": 12345, "title": "...", ... },
    "questions": [
        {
            "question_id": 1001,
            "position": 6,
            "question": "...",
            "english_question": "Please provide your current address",
            "type": 2,        # 2=singlechoice, 3=multiplechoice, 6=ranking, 4=matrix, ...
            "input_type": 0,
            "mandatory": True,
            "status": 1
        },
        ...
    ]
}

rows_pages

[
    {   # page 1
        "rows": [
            {
                "task_id": "task_001",
                "date_time": "2026-03-01 09:00:00",
                "profile_status": "approved",
                "questions": [
                    { "type": "singlechoice", "question": "Please provide your current address", "answer": "Ward 1" },
                    { "type": "multiplechoice", "question": "Who do you live with",
                      "answer": [{"answer_name": "Spouse"}, {"answer_name": "Children"}] },
                    ...
                ]
            },
            ...
        ]
    },
    # page 2, page 3, ...
]

Answer Types

type value answer_type Encoded in rawdata?
2 singlechoice Yes → int
3 multiplechoice Yes → "1;3;5"
6 ranking Yes → "2;1;3"
4 matrix No → "row:col|row:col"
1 + input_type=100 multiplenumber No → "label:num|label:num"
1 freetext No → raw text
1 + input_type=3 singlenumber No → raw number
1109 area No → raw text

Excluded from output: audio, user-name, user-phone, instruction, reward.

Profile Status Filter

# Default: approved only
step.run({ ..., "profile_status": ["approved"] })

# Include all statuses
step.run({ ..., "profile_status": [] })

# Custom filter
step.run({ ..., "profile_status": ["approved", "pending"] })

Requirements

  • Python >= 3.10
  • pandas >= 2.0
  • openpyxl >= 3.1

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

surveyflow-0.1.0.tar.gz (16.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

surveyflow-0.1.0-py3-none-any.whl (12.5 kB view details)

Uploaded Python 3

File details

Details for the file surveyflow-0.1.0.tar.gz.

File metadata

  • Download URL: surveyflow-0.1.0.tar.gz
  • Upload date:
  • Size: 16.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for surveyflow-0.1.0.tar.gz
Algorithm Hash digest
SHA256 92b8c35e5b028c0b0a10f6b880489777607987a6ed298507ff912560c6ae457c
MD5 2b8e87b28843c646979523a392b7c973
BLAKE2b-256 a9b6c178cc737bea8969bf5620faa340e57b1d59bdac5cb81657cb89f9f37eea

See more details on using hashes here.

File details

Details for the file surveyflow-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: surveyflow-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 12.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for surveyflow-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c7c413c8730e6b1c8a20068ca239897ba81106a8aeb16f5fdaaed1581ee109da
MD5 6360ce2a04a6a24903378de5cc62f481
BLAKE2b-256 c030e574866282c3be8f6b668d48d1fa371dcb82e9c0a7114568257f309cfd42

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page