Survey data pipeline: rawdata.csv + metadata.json + datatable.xlsx with sig testing
surveyflow
A Python library for processing survey data — parse survey definitions and responses into structured outputs ready for analysis.
Features
- Parse survey definition (question structure, types, positions) into metadata.json
- Parse survey response rows into rawdata.csv with numeric codes
  - Single-choice → integer code (e.g. 1)
  - Multi-choice / ranking → semicolon-separated codes (e.g. "1;3;5")
  - Open-ended / matrix / number → raw text
- Filter responses by status (default: approved only)
- Consistent columns between rawdata.csv and metadata.json
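The encoding rules listed above can be illustrated with a minimal sketch. This is not surveyflow's internal code: `encode_answer` is a hypothetical helper, and it assumes that a code → label mapping (like the `values` entry in metadata.json) is available and that ranking answers arrive in the same `answer_name` list shape as multi-choice answers.

```python
def encode_answer(answer_type, answer, values):
    """Encode one answer using a code -> label mapping.

    Hypothetical helper for illustration; not part of the surveyflow API.
    """
    # Invert {"1": "Ward 1"} into {"Ward 1": 1} so labels can be looked up.
    label_to_code = {label: int(code) for code, label in (values or {}).items()}

    if answer_type == "singlechoice":
        # Single-choice -> integer code
        return label_to_code[answer]

    if answer_type in ("multiplechoice", "ranking"):
        # Multi-choice / ranking -> semicolon-separated codes, in answer order
        return ";".join(str(label_to_code[item["answer_name"]]) for item in answer)

    # Open-ended / matrix / number -> raw text, passed through unchanged
    return answer


wards = {"1": "Ward 1", "2": "Ward 2", "3": "Ward 3"}
household = {"1": "Spouse", "2": "Parents", "3": "Children"}

single = encode_answer("singlechoice", "Ward 1", wards)
multi = encode_answer(
    "multiplechoice",
    [{"answer_name": "Spouse"}, {"answer_name": "Children"}],
    household,
)
```

In this sketch an answer label that is missing from the mapping raises `KeyError`; how the library itself treats unknown labels is not stated here.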
Installation
pip install surveyflow
Quick Start
from surveyflow.steps.ingestion import IngestionStep
# definition: dict from your survey platform's definition API
# rows_pages: list of paginated response pages from your survey platform
step = IngestionStep()
context = step.run({
    "definition": definition,
    "rows_pages": rows_pages,
    "output_dir": "./output",
})
df = context["rawdata"] # pandas DataFrame
metadata = context["metadata"] # dict with question info + value labels
Output
rawdata.csv
| task_id | date_time | q6 | q7 | q10 | q18 |
|---|---|---|---|---|---|
| task_001 | 2026-03-01 | 1 | 2 | 1 | 1;3 |
| task_002 | 2026-03-01 | 2 | 1 | 2 | 2 |
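As a rough illustration of working with this layout, the sketch below tallies how often each code appears in a semicolon-coded column. It assumes rawdata.csv is comma-delimited with a header row, and it uses an in-memory copy of the sample rows above; `count_codes` is a hypothetical helper, not a surveyflow function.

```python
import csv
import io
from collections import Counter

# In-memory copy of the sample rows shown above (assumed comma-delimited).
SAMPLE = """task_id,date_time,q6,q7,q10,q18
task_001,2026-03-01,1,2,1,1;3
task_002,2026-03-01,2,1,2,2
"""


def count_codes(csv_text, column):
    """Count occurrences of each code in one column, splitting on ';'."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        cell = row[column].strip()
        if cell:  # skip unanswered questions
            counts.update(cell.split(";"))
    return counts


q18_counts = count_codes(SAMPLE, "q18")
q6_counts = count_codes(SAMPLE, "q6")
```

Codes are kept as strings here so they can be used directly as keys into the `values` mapping in metadata.json.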
metadata.json
{
  "survey_id": 12345,
  "questions": {
    "q6": {
      "position": 6,
      "english_question": "Please provide your current address",
      "answer_type": "singlechoice",
      "values": { "1": "Ward 1", "2": "Ward 2", "3": "Ward 3" }
    },
    "q18": {
      "position": 18,
      "english_question": "Who do you live with",
      "answer_type": "multiplechoice",
      "values": { "1": "Spouse", "2": "Parents", "3": "Children" }
    }
  }
}
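Because `values` maps each code to its label, coded cells can be turned back into readable labels. The following is a minimal sketch under that assumption; `decode_cell` is a hypothetical helper, and the metadata is a copy of the example above.

```python
import json

# Copy of the example metadata above, parsed from JSON text.
metadata = json.loads("""
{
  "survey_id": 12345,
  "questions": {
    "q6": {
      "position": 6,
      "english_question": "Please provide your current address",
      "answer_type": "singlechoice",
      "values": { "1": "Ward 1", "2": "Ward 2", "3": "Ward 3" }
    },
    "q18": {
      "position": 18,
      "english_question": "Who do you live with",
      "answer_type": "multiplechoice",
      "values": { "1": "Spouse", "2": "Parents", "3": "Children" }
    }
  }
}
""")


def decode_cell(question, cell):
    """Map a coded rawdata cell back to its label(s) (illustrative helper)."""
    values = question["values"]
    if question["answer_type"] == "singlechoice":
        return values[str(cell)]  # JSON keys are strings, so cast the code
    # Multi-choice / ranking cells hold semicolon-separated codes.
    return [values[code] for code in str(cell).split(";")]


address = decode_cell(metadata["questions"]["q6"], 1)
household = decode_cell(metadata["questions"]["q18"], "1;3")
```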
Input Format
definition
{
    "survey": { "survey_id": 12345, "title": "...", ... },
    "questions": [
        {
            "question_id": 1001,
            "position": 6,
            "question": "...",
            "english_question": "Please provide your current address",
            "type": 2,  # 2=singlechoice, 3=multiplechoice, 6=ranking, 4=matrix, ...
            "input_type": 0,
            "mandatory": True,
            "status": 1
        },
        ...
    ]
}
rows_pages
[
    {  # page 1
        "rows": [
            {
                "task_id": "task_001",
                "date_time": "2026-03-01 09:00:00",
                "profile_status": "approved",
                "questions": [
                    { "type": "singlechoice", "question": "Please provide your current address", "answer": "Ward 1" },
                    { "type": "multiplechoice", "question": "Who do you live with",
                      "answer": [{"answer_name": "Spouse"}, {"answer_name": "Children"}] },
                    ...
                ]
            },
            ...
        ]
    },
    # page 2, page 3, ...
]
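To inspect paginated input before handing it to the pipeline, the pages can be flattened into a single stream of rows. This is a small sketch of that idea, not the library's own ingestion logic; `iter_rows` is a hypothetical name, and a page without a "rows" key is assumed to be empty.

```python
from itertools import chain


def iter_rows(rows_pages):
    """Yield every response row across all pages, in page order."""
    return chain.from_iterable(page.get("rows", []) for page in rows_pages)


# Two small pages in the shape shown above (question lists omitted).
rows_pages = [
    {"rows": [{"task_id": "task_001", "profile_status": "approved"}]},
    {"rows": [{"task_id": "task_002", "profile_status": "pending"},
              {"task_id": "task_003", "profile_status": "approved"}]},
]

task_ids = [row["task_id"] for row in iter_rows(rows_pages)]
```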
Answer Types
| type value | answer_type | Encoded in rawdata? |
|---|---|---|
| 2 | singlechoice | Yes → int |
| 3 | multiplechoice | Yes → "1;3;5" |
| 6 | ranking | Yes → "2;1;3" |
| 4 | matrix | No → "row:col\|row:col" |
| 1 + input_type=100 | multiplenumber | No → "label:num\|label:num" |
| 1 | freetext | No → raw text |
| 1 + input_type=3 | singlenumber | No → raw number |
| 1109 | area | No → raw text |
Excluded from output: audio, user-name, user-phone, instruction, reward.
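The mapping from `type` and `input_type` to `answer_type` can be sketched as a small lookup. This covers only the combinations listed in this section; `resolve_answer_type` is a hypothetical function, and returning `None` for anything else (including the excluded types, whose numeric values are not given here) is an assumption of the sketch.

```python
# type values listed in this section that do not depend on input_type
TYPE_NAMES = {
    2: "singlechoice",
    3: "multiplechoice",
    6: "ranking",
    4: "matrix",
    1109: "area",
}

# type 1 is refined by input_type; any other input_type is treated as freetext
TEXT_INPUT_NAMES = {
    100: "multiplenumber",
    3: "singlenumber",
}


def resolve_answer_type(type_value, input_type=0):
    """Return the answer_type name, or None if the type is not listed."""
    if type_value == 1:
        return TEXT_INPUT_NAMES.get(input_type, "freetext")
    return TYPE_NAMES.get(type_value)


question = {"type": 2, "input_type": 0}
answer_type = resolve_answer_type(question["type"], question["input_type"])
```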
Profile Status Filter
# Default: approved only
step.run({ ..., "profile_status": ["approved"] })
# Include all statuses
step.run({ ..., "profile_status": [] })
# Custom filter
step.run({ ..., "profile_status": ["approved", "pending"] })
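The filter semantics shown above (a list of allowed statuses, with an empty list meaning "include everything") can be expressed in a few lines. This is a sketch of the behavior as described, not the library's implementation; `filter_by_status` is a hypothetical helper.

```python
def filter_by_status(rows, profile_status=("approved",)):
    """Keep rows whose profile_status is allowed; an empty filter keeps all rows."""
    if not profile_status:
        return list(rows)
    allowed = set(profile_status)
    return [row for row in rows if row.get("profile_status") in allowed]


rows = [
    {"task_id": "task_001", "profile_status": "approved"},
    {"task_id": "task_002", "profile_status": "pending"},
    {"task_id": "task_003", "profile_status": "rejected"},
]

default_ids = [r["task_id"] for r in filter_by_status(rows)]
all_ids = [r["task_id"] for r in filter_by_status(rows, [])]
custom_ids = [r["task_id"] for r in filter_by_status(rows, ["approved", "pending"])]
```

The "rejected" status in the sample rows is invented for illustration; only "approved" and "pending" appear in this document.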
Requirements
- Python >= 3.10
- pandas >= 2.0
- openpyxl >= 3.1