Survey data pipeline: parse survey definitions and responses into rawdata.csv and metadata.json
Project description
surveyflow
A Python library for processing survey data. It parses survey definitions and responses into structured outputs ready for analysis.
Features
- Parse survey definition (question structure, types, positions) into metadata.json
- Parse survey response rows into rawdata.csv with numeric codes
  - Single-choice → integer code (e.g. 1)
  - Multi-choice / ranking → semicolon-separated codes (e.g. "1;3;5")
  - Open-ended / matrix / number → raw text
- Filter responses by status (default: approved only)
- Consistent columns between rawdata.csv and metadata.json
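The encoding rules above can be sketched in a few lines. This is an illustration only, not surveyflow's internal code: `encode_answer` and the `label_to_code` mapping are hypothetical names, and the sketch assumes multi-choice answers have already been reduced to a plain list of labels.

```python
def encode_answer(answer_type, answer, label_to_code):
    """Encode one answer with a label -> code mapping (hypothetical helper)."""
    if answer_type == "singlechoice":
        # One label becomes one integer code.
        return label_to_code[answer]
    if answer_type in ("multiplechoice", "ranking"):
        # A list of labels becomes semicolon-separated codes, in the given order.
        return ";".join(str(label_to_code[label]) for label in answer)
    # Open-ended, matrix, and number answers pass through as raw text.
    return str(answer)


codes = {"Spouse": 1, "Parents": 2, "Children": 3}
single = encode_answer("singlechoice", "Parents", codes)
multi = encode_answer("multiplechoice", ["Spouse", "Children"], codes)
text = encode_answer("freetext", "no comment", codes)
```

Keeping the order of the input list matters for ranking questions, where the position of each code carries the rank.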
Installation
pip install surveyflow
Quick Start
from surveyflow.steps.ingestion import IngestionStep
# definition: dict from your survey platform's definition API
# rows_pages: list of paginated response pages from your survey platform
step = IngestionStep()
context = step.run({
    "definition": definition,
    "rows_pages": rows_pages,
    "output_dir": "./output",
})
df = context["rawdata"] # pandas DataFrame
metadata = context["metadata"] # dict with question info + value labels
Output
rawdata.csv
| task_id | date_time | q6 | q7 | q10 | q18 |
|---|---|---|---|---|---|
| task_001 | 2026-03-01 | 1 | 2 | 1 | 1;3 |
| task_002 | 2026-03-01 | 2 | 1 | 2 | 2 |
metadata.json
{
  "survey_id": 12345,
  "questions": {
    "q6": {
      "position": 6,
      "english_question": "Please provide your current address",
      "answer_type": "singlechoice",
      "values": { "1": "Ward 1", "2": "Ward 2", "3": "Ward 3" }
    },
    "q18": {
      "position": 18,
      "english_question": "Who do you live with",
      "answer_type": "multiplechoice",
      "values": { "1": "Spouse", "2": "Parents", "3": "Children" }
    }
  }
}
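Because rawdata.csv stores codes and metadata.json stores the value labels, the two can be joined to recover readable answers. The following is a minimal sketch using only the structure shown above; `decode_cell` is a hypothetical helper, not part of surveyflow, and it assumes cells are read as strings (as the csv module would return them).

```python
def decode_cell(column, cell, metadata):
    """Translate a coded cell back to its label(s) using metadata["questions"]."""
    question = metadata["questions"].get(column)
    if question is None or "values" not in question or cell == "":
        # Not a coded question column (e.g. task_id), or an empty answer.
        return cell
    values = question["values"]
    # Multi-choice cells hold semicolon-separated codes; single-choice has one.
    return ";".join(values.get(code, code) for code in str(cell).split(";"))


metadata = {
    "questions": {
        "q6": {"values": {"1": "Ward 1", "2": "Ward 2", "3": "Ward 3"}},
        "q18": {"values": {"1": "Spouse", "2": "Parents", "3": "Children"}},
    }
}
row = {"task_id": "task_001", "q6": "1", "q18": "1;3"}
decoded = {col: decode_cell(col, val, metadata) for col, val in row.items()}
```

Unknown codes are left unchanged rather than raising, which makes mismatches between the two files easy to spot in the decoded output.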
Input Format
definition
{
    "survey": { "survey_id": 12345, "title": "...", ... },
    "questions": [
        {
            "question_id": 1001,
            "position": 6,
            "question": "...",
            "english_question": "Please provide your current address",
            "type": 2,  # 2=singlechoice, 3=multiplechoice, 6=ranking, 4=matrix, ...
            "input_type": 0,
            "mandatory": True,
            "status": 1
        },
        ...
    ]
}
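The output examples suggest that each question becomes a column named after its position (position 6 → q6, position 18 → q18). Assuming that convention holds, the column list can be derived from a definition as sketched below. `column_names` is a hypothetical helper, and the second question (including its question_id) is made up for illustration.

```python
def column_names(definition):
    """Map "q<position>" -> english_question, ordered by position."""
    ordered = sorted(definition["questions"], key=lambda q: q["position"])
    return {f"q{q['position']}": q["english_question"] for q in ordered}


definition = {
    "questions": [
        # Illustrative entry; question_id 1002 is not from a real survey.
        {"question_id": 1002, "position": 18,
         "english_question": "Who do you live with", "type": 3},
        {"question_id": 1001, "position": 6,
         "english_question": "Please provide your current address", "type": 2},
    ]
}
columns = column_names(definition)
```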
rows_pages
[
    {  # page 1
        "rows": [
            {
                "task_id": "task_001",
                "date_time": "2026-03-01 09:00:00",
                "profile_status": "approved",
                "questions": [
                    { "type": "singlechoice", "question": "Please provide your current address", "answer": "Ward 1" },
                    { "type": "multiplechoice", "question": "Who do you live with",
                      "answer": [{"answer_name": "Spouse"}, {"answer_name": "Children"}] },
                    ...
                ]
            },
            ...
        ]
    },
    # page 2, page 3, ...
]
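To inspect this input before running the pipeline, the pages can be flattened into a single stream of rows and each row reduced to a question → answer mapping. This is a sketch under the structure shown above; `iter_rows` and `answers_by_question` are hypothetical helpers, not surveyflow functions.

```python
def iter_rows(rows_pages):
    """Yield every response row across all pages, in page order."""
    for page in rows_pages:
        yield from page.get("rows", [])


def answers_by_question(row):
    """Map question text -> answer, unwrapping multi-choice answer_name dicts."""
    result = {}
    for item in row["questions"]:
        answer = item["answer"]
        if isinstance(answer, list):
            answer = [choice["answer_name"] for choice in answer]
        result[item["question"]] = answer
    return result


rows_pages = [
    {"rows": [{
        "task_id": "task_001",
        "profile_status": "approved",
        "questions": [
            {"type": "singlechoice", "question": "Please provide your current address",
             "answer": "Ward 1"},
            {"type": "multiplechoice", "question": "Who do you live with",
             "answer": [{"answer_name": "Spouse"}, {"answer_name": "Children"}]},
        ],
    }]},
    {"rows": []},  # an empty page contributes no rows
]
rows = list(iter_rows(rows_pages))
answers = answers_by_question(rows[0])
```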
Answer Types
| type value | answer_type | Encoded in rawdata? |
|---|---|---|
| 2 | singlechoice | Yes → int |
| 3 | multiplechoice | Yes → "1;3;5" |
| 6 | ranking | Yes → "2;1;3" |
| 4 | matrix | No → "row:col\|row:col" |
| 1 + input_type=100 | multiplenumber | No → "label:num\|label:num" |
| 1 | freetext | No → raw text |
| 1 + input_type=3 | singlenumber | No → raw number |
| 1109 | area | No → raw text |
Excluded from output: audio, user-name, user-phone, instruction, reward.
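The table can be read as a lookup on type, with input_type refining type 1. The sketch below encodes that reading; `resolve_answer_type` is a hypothetical helper, and returning None for unlisted type values is an assumption, since the numeric codes of the excluded types are not given here.

```python
# Types resolved by the "type" value alone.
TYPE_NAMES = {
    2: "singlechoice",
    3: "multiplechoice",
    6: "ranking",
    4: "matrix",
    1109: "area",
}

# Type 1 is refined by "input_type"; anything else falls back to freetext.
TEXT_INPUT_TYPES = {100: "multiplenumber", 3: "singlenumber"}


def resolve_answer_type(type_value, input_type=0):
    """Return the answer_type name, or None for a type not in the table."""
    if type_value == 1:
        return TEXT_INPUT_TYPES.get(input_type, "freetext")
    return TYPE_NAMES.get(type_value)
```

For example, `resolve_answer_type(1, 100)` gives "multiplenumber", while `resolve_answer_type(1)` gives "freetext".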
Profile Status Filter
# Default: approved only
step.run({ ..., "profile_status": ["approved"] })
# Include all statuses
step.run({ ..., "profile_status": [] })
# Custom filter
step.run({ ..., "profile_status": ["approved", "pending"] })
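The filter semantics shown above (a list of allowed statuses, with an empty list meaning no filtering) can be expressed as a small standalone function. This is a sketch of the behavior as described, not the library's implementation; `filter_by_status` is a hypothetical name.

```python
def filter_by_status(rows, profile_status=("approved",)):
    """Keep rows whose profile_status is allowed; an empty filter keeps all rows."""
    if not profile_status:
        return list(rows)
    allowed = set(profile_status)
    return [row for row in rows if row.get("profile_status") in allowed]


rows = [
    {"task_id": "task_001", "profile_status": "approved"},
    {"task_id": "task_002", "profile_status": "pending"},
    {"task_id": "task_003", "profile_status": "rejected"},
]
default_ids = [r["task_id"] for r in filter_by_status(rows)]
all_ids = [r["task_id"] for r in filter_by_status(rows, [])]
custom_ids = [r["task_id"] for r in filter_by_status(rows, ["approved", "pending"])]
```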
Requirements
- Python >= 3.10
- pandas >= 2.0
- openpyxl >= 3.1
File details
Details for the file surveyflow-0.1.0.tar.gz.
File metadata
- Download URL: surveyflow-0.1.0.tar.gz
- Upload date:
- Size: 16.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 92b8c35e5b028c0b0a10f6b880489777607987a6ed298507ff912560c6ae457c |
| MD5 | 2b8e87b28843c646979523a392b7c973 |
| BLAKE2b-256 | a9b6c178cc737bea8969bf5620faa340e57b1d59bdac5cb81657cb89f9f37eea |
File details
Details for the file surveyflow-0.1.0-py3-none-any.whl.
File metadata
- Download URL: surveyflow-0.1.0-py3-none-any.whl
- Upload date:
- Size: 12.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c7c413c8730e6b1c8a20068ca239897ba81106a8aeb16f5fdaaed1581ee109da |
| MD5 | 6360ce2a04a6a24903378de5cc62f481 |
| BLAKE2b-256 | c030e574866282c3be8f6b668d48d1fa371dcb82e9c0a7114568257f309cfd42 |