The HIFIS Surveyval Framework is a tool to analyse survey data from LimeSurvey data exports.
Project description
HIFIS Surveyval Framework
What Is the Framework Intended For?
The HIFIS Surveyval Framework is a tool to analyse data from LimeSurvey data exports. It is meant to fill the gap between LimeSurvey and Python data analysis and visualisation libraries such as Numpy, Pandas and Matplotlib. It is not meant to replace them. We aim at making life easier for people who are about to conduct LimeSurvey surveys and who want to analyse the resulting survey data by providing a tested and working set of features, so that you need not reinvent the wheel for your data analysis. We recommend getting familiar with the framework before the survey is planned and conducted to make sure that your use case can be covered by the framework.
What Do You Need to Provide Yourself?
Metadata of Your Survey
The metadata that is describing your survey such as the list of questions and answer options including their translations that are not included in the LimeSurvey data export file are given in a metadata YAML file that you need to provide by yourself. Please have a look into the example metadata YAML file given in this project to get an idea how this file is structured and which information need to be put there.
Pre-processing Steps on Raw Data
Before the analysis is started, the raw data is pre-processed by a pre-processing script that you need to write by yourself as well if data cleaning and other pre-processing steps need to be done for your survey data beforehand. In order to get an impression how the pre-processing script can look like please see the example pre-processing scripts provided in this project to get you started.
Analysis Scripts for Your Survey Data Analysis
Under the hood the architecture of the framework is executing a series of independent data analysis scripts written by yourself in an analysis run. As a result, output will be generated in form of
- terminal output on the CLI,
- plots in different formats and
- text files.
There are also example analysis scripts available in this project that you can take a look at to understand how these analysis scripts make use of the framework.
Your Use Cases Are Not Covered?
As the name of the framework suggests it has been developed to aid in the analysis of HIFIS surveys conducted for the Helmholtz Federated IT Services (HIFIS) platform. Please be aware that as we try to develop a generic framework that can be used in many use cases, it might still be the case that the implemented features arose from our use cases and are not applicable in your case. If this is the case, please get in touch with us, so we can make things happen for you. You are also most welcome to contribute to the project with issues, bug fixes, feature implementations, documentations and testing. Thank you very much in advance for your valuable contributions!
Table of Content
- Installation
- Getting Started
- Start Analysis from Command-Line-Interface
- Contribute with Own Analysis Scripts
- Resources
- Author Information
- Contributors
- License
Getting Started
The project's documentation contains a section to help you as a user of the analysis scripts to run the analysis scripts or as a developer of the framework to set up the development environment.
Getting Started for Users
Installation
To install the package locally, you can use Pip.
pip install hifis-surveyval
After the installation, you can use the tool from the command line with
hifis-surveyval --help
.
Getting Started for Developers
Installation
To install the package locally, you can use Poetry.
Using Poetry
If you want to actively contribute changes to the project, you are required to also install the development packages alongside the framework.
git clone https://gitlab.hzdr.de/hifis/overall/surveys/hifis-surveyval.git
cd hifis-surveyval
poetry install
After the installation, you can use the tool from the command line with
poetry run hifis-surveyval --help
Poetry installs some packages that are required for performing quality checks. Usually they are also performed via GitLab CI, but can also be executed locally.
It is common practice to run some checks locally before pushing them online. Therefore, execute below commands:
$ # Order your imports
$ isort -rc .
$ make lint
The following documentation references the pip installation.
You can use the same commands with a poetry installation, if you prefix your
commands with poetry run COMMAND
.
Start Analysis from Command-Line-Interface
The survey analysis package is a program to be executed on the Command-Line-Interface (CLI).
Quick Start Example: Run Analysis
Due to sensible defaults of the project's configurations you need to have the
analysis scripts, the preprocessing script as well as metadata and data files
in certain locations in order to run the survey analysis.
This configuration file hifis-surveyval.yml which includes these defaults is
created with the command hifis-surveyval init
.
Please put your analysis scripts into a sub-folder called scripts.
The preprocessing script preprocess.py is expected in the root folder of the
project.
Make sure that the file meta.yml is put into sub-folder metadata.
Finally, copy the CSV data file of your survey to a central location like a
data sub-folder and tell the program the path to that data file on the
command line when running the survey analysis.
Now you can do the following to start the survey analysis from the CLI:
hifis-surveyval analyze data/<data_file_name>.csv
The output is then put into a sub-folder within the folder output which is named after the stamp of the current date-time if not specified differently.
Caution:
Depending on the Operating System used an issue with the file
encoding might occur.
There might be data-CSV-files around which are encoded with UTF-8-BOM
which causes errors when read in on Windows OS.
In this case you need to change the encoding to UTF-8
before running
the survey analysis.
Flags
The program accepts two flags:
- Help flag
- Verbosity flag
Help flag
Calling the program with the help-flag is the first thing to do when being encountered with this program. It outputs a so-called Usage-message to the CLI:
$ hifis-surveyval --help
Please issue this command on the CLI and read the detailed Usage-message before continuing with reading the documentation of the Usage-message here.
Verbosity flag
The verbosity-flag can be provided in order to specify the verbosity
of the output to the CLI.
This flag is called --verbose
or -v
for short:
hifis-surveyval --verbose <COMMAND>
hifis-surveyval -v <COMMAND>
The verbosity of the output can be increased even more
by duplicating the flag --verbose
or -v
up to two times:
hifis-surveyval --verbose --verbose --verbose <COMMAND>
hifis-surveyval -vvv <COMMAND>
Commands
There are three different commands implemented which come with its own set of flags and parameters:
- Command version
- Command init
- Command analyze
Command version
The version
command outputs the version number of this CLI-program like so:
hifis-surveyval version
Command init
Before you start the analysis you may want to change the defaults of the configuration variables. In order to do so, you can create a configuration file that is named hifis-surveyval.yml by issuing the init command:
hifis-surveyval init
This file contains the following information:
ANONYMOUS_QUESTION_ID: _
DATA_ID_SEPARATOR: _
HIERARCHY_SEPARATOR: /
ID_COLUMN_NAME: id
METADATA: metadata
OUTPUT_FOLDER: output
OUTPUT_FORMAT: SCREEN
PREPROCESSING_FILENAME: preprocess.py
SCRIPT_FOLDER: scripts
SCRIPT_NAMES: []
CUSTOM_PLOT_STYLE: "report_style" # Optional
Concepts that You Need to Know
- QuestionCollection: This concept refers to a set of
Questions
that cover the same topic.- Question: This concept refers to an atomic
Question
that can not be on its own and needs to be wrapped up into aQuestionCollection
.Note: Other terms that may describe similar concepts are
question
(which equals toQuestionCollection
) andsub-question
(which equals toQuestion
).
Configuration File Entries Explained
ANONYMOUS_QUESTION_ID
defines a placeholder for Question IDs. The CSV data might not explicitly mention a full ID of a Question but solely the QuestionCollection ID. In this case the HIFIS Surveyval Framework adds a character, by default _ (underscore), to the QuestionCollection ID to mark this situation.- The CSV data file is structured into header and body rows. The header row consists of a comma-separated list of column names. Some column names contain a separator character that concatenates the QuestionCollection ID with the Question ID, the
DATA_ID_SEPARATOR
. This variable indicates which character is used to separate these IDs. If not specified otherwise, it defaults to _ (underscore).- This
DATA_ID_SEPARATOR
character is internally replaced by a different character, the so calledHIERARCHY_SEPARATOR
, which defaults to a / (slash).- With
ID_COLUMN_NAME
you may want to specify the name of the id column in the CSV data file.- Each analysis needs metadata about the questions asked in the survey and answers that participants may give. Setting
METADATA
specifies the location of the metadata files which are by default located in a folder called metadata. Be aware that it is recommended to have one YAML file per QuestionCollection. Each YAML file then covers the metadata of a single QuestionCollection and is named according to the ID of this QuestionCollection.- You may specify the output folder by setting
OUTPUT_FOLDER
which is named output by default.- You may prefer a specific output format like PDF, PNG, SVG or SCREEN which you may select via
OUTPUT_FORMAT
. The default value is SCREEN. Note: Be aware that other output formats like text or markdown files may be created, which depends largely upon the implementation of the analysis scripts.- You might want to tell the program where to find the preprocessing file preprocess.py that preprocesses and filters your survey data according to specific rules. You can do so by setting
PREPROCESSING_FILENAME
.- You may specify the folder which contains the analysis scripts with setting
SCRIPT_FOLDER
, which is the scripts folder by default.- With
SCRIPT_NAMES
you may select a subset of the analysis scripts available as a list that ought to be executed. This list is empty by default, which means, all scripts are executed.
Hint for LimeSurvey Users
There is an option
Expression Manager code
in LimeSurvey when exporting the data into a CSV file that uses the separator character_
(underscore) to concatenate QuestionCollection ID and Question ID in the CSV data header. Otherwise the default[]
is used which is not compatible with the HIFIS Surveyval Framework.
Additional Files Generated
Additional to the configuration file, three more files are created:
- File preprocess.py is created in the root folder of the project.
- File example_script.py is created in the scripts folder of the project.
- File style_template.mplstyle is created in the custom_plot_styles folder of the project.
Command analyze
The more interesting command is the analyze
command
which comes with a data-parameter.
The data-parameter can not be omitted and need to be given explicitly
in order to be able to start the analysis.
This is an example of how to do the analysis:
hifis-surveyval analyze data/<data_file_name>.csv
Contribute with Own Analysis Scripts
Essential Requirements for Developing Own Analysis Scripts
As you might have read in the previous sections the actual analysis scripts
reside in a specific folder called scripts
.
All scripts in that folder will be automatically discovered by the package
hifis-surveyval
when running the analysis.
In order that the program recognizes the scripts in that folder as
analysis scripts they need to fulfill the following two criteria:
- The filename need to end on
.py
. - The file need to contain a function called
run
without any parameters.
"""
A dummy script for testing the function dispatch
.. currentmodule:: hifis_surveyval.scripts.dummy
.. moduleauthor:: HIFIS Software <software@hifis.net>
"""
def run():
print("Example Script")
If both requirements are satisfied the program will execute the run
-functions
of the analysis scripts in an arbitrary order.
File-System Structure of the Core Component
$ tree hifis_surveyval/
hifis_surveyval
├── cli.py
├── core
│ ├── dispatch.py
│ ├── preprocess.py
│ ├── settings.py
│ └── util.py
├── data_container.py
├── files
│ ├── example_preprocess.py
│ ├── example_script.py
│ ├── preprocess.py
│ ├── scripts
│ │ └── example-01-accessing-data.py
│ └── style_template.mplstyle
├── hifis_surveyval.py
├── models
│ ├── answer_option.py
│ ├── answer_types.py
│ ├── mixins
│ │ ├── mixins.py
│ │ ├── uses_settings.py
│ │ └── yaml_constructable.py
│ ├── question_collection.py
│ ├── question.py
│ └── translated.py
├── plotting
│ ├── matplotlib_plotter.py
│ ├── plot_styles
│ │ └── report_style.mplstyle
│ ├── plotter.py
│ └── supported_output_format.py
└── printing
└── printer.py
Matplotlib Custom Plot Styles
Files with file ending .mplstyle
contain styling information for Matplotlib
plots.
An example Matplotlib plot style template called style_template.mplstyle
is
created with the init
-command of the HIFIS-Surveyval framework.
These files can be put in a folder called custom_plot_styles
.
Custom plot style files that are generally used within the framework need to be
put into folder hifis_surveyval/plotting/plot_styles/
.
Plots created with Matplotlib accept a kwargs
-argument plot_style_name
which chooses which plot style to use for the plot.
Styles in folder custom_plot_styles
have precedence over styles in folder
hifis_surveyval/plotting/plot_styles/
if they have the same name.
Resources
Below are some handy resource links:
- Project Documentation
- Click is a Python package for creating beautiful command line interfaces in a composable way with as little code as necessary.
- Sphinx is a tool that makes it easy to create intelligent and beautiful documentation, written by Geog Brandl and licnsed under the BSD license.
- pytest helps you write better programs.
- GNU Make is a tool which controls the generation of executables and other non-source files of a program from the program's source files.
Author Information
HIFIS-Surveyval was created by HIFIS Software Services.
Contributors
We would like to thank and give credits to the following contributors of this project:
- Be the first to be named here!
License
Copyright © 2021 HIFIS Software support@hifis.net
This work is licensed under the following license(s):
- Everything else is licensed under GPL-3.0-or-later
Please see the individual files for more accurate information.
Hint: We provided the copyright and license information in accordance to the REUSE Specification 3.0.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for hifis_surveyval-1.5.2-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 70fef468af703d2badd4602c254b135ffcdd5f8a6f0d13d4033735c06deb6fbf |
|
MD5 | b4288de78fe942bd4a1338f3d1daec99 |
|
BLAKE2b-256 | 16e5a99ac7cf295b0fe494bc831cd2b4fc10addd67235f3c30d5a8f380797ab0 |