Skip to main content

The HIFIS Surveyval Framework is a tool to analyse survey data from LimeSurvey data exports.

Project description

HIFIS Surveyval Framework

What Is the Framework Intended For?

The HIFIS Surveyval Framework is a tool to analyse data from LimeSurvey data exports. It is meant to fill the gap between LimeSurvey and Python data analysis and visualisation libraries such as Numpy, Pandas and Matplotlib. It is not meant to replace them. We aim at making life easier for people who are about to conduct LimeSurvey surveys and who want to analyse the resulting survey data by providing a tested and working set of features, so that you need not reinvent the wheel for your data analysis. We recommend getting familiar with the framework before the survey is planned and conducted to make sure that your use case can be covered by the framework.

What Do You Need to Provide Yourself?

Metadata of Your Survey

The metadata that is describing your survey such as the list of questions and answer options including their translations that are not included in the LimeSurvey data export file are given in a metadata YAML file that you need to provide by yourself. Please have a look into the example metadata YAML file given in this project to get an idea how this file is structured and which information need to be put there.

Pre-processing Steps on Raw Data

Before the analysis is started, the raw data is pre-processed by a pre-processing script that you need to write by yourself as well if data cleaning and other pre-processing steps need to be done for your survey data beforehand. In order to get an impression how the pre-processing script can look like please see the example pre-processing scripts provided in this project to get you started.

Analysis Scripts for Your Survey Data Analysis

Under the hood the architecture of the framework is executing a series of independent data analysis scripts written by yourself in an analysis run. As a result, output will be generated in form of

  • terminal output on the CLI,
  • plots in different formats and
  • text files.

There are also example analysis scripts available in this project that you can take a look at to understand how these analysis scripts make use of the framework.

Your Use Cases Are Not Covered?

As the name of the framework suggests it has been developed to aid in the analysis of HIFIS surveys conducted for the Helmholtz Federated IT Services (HIFIS) platform. Please be aware that as we try to develop a generic framework that can be used in many use cases, it might still be the case that the implemented features arose from our use cases and are not applicable in your case. If this is the case, please get in touch with us, so we can make things happen for you. You are also most welcome to contribute to the project with issues, bug fixes, feature implementations, documentations and testing. Thank you very much in advance for your valuable contributions!

Table of Content

Getting Started

The project's documentation contains a section to help you as a user of the analysis scripts to run the analysis scripts or as a developer of the framework to set up the development environment.

Getting Started for Users

Installation

To install the package locally, you can use Pip.

pip install hifis-surveyval

After the installation, you can use the tool from the command line with hifis-surveyval --help.

Getting Started for Developers

Installation

To install the package locally, you can use Poetry.

Using Poetry

If you want to actively contribute changes to the project, you are required to also install the development packages alongside the framework.

git clone https://gitlab.hzdr.de/hifis/overall/surveys/hifis-surveyval.git
cd hifis-surveyval
poetry install

After the installation, you can use the tool from the command line with poetry run hifis-surveyval --help

Poetry installs some packages that are required for performing quality checks. Usually they are also performed via GitLab CI, but can also be executed locally.

It is common practice to run some checks locally before pushing them online. Therefore, execute below commands:

$ # Order your imports
$ isort -rc .
$ make lint

The following documentation references the pip installation. You can use the same commands with a poetry installation, if you prefix your commands with poetry run COMMAND.

Start Analysis from Command-Line-Interface

The survey analysis package is a program to be executed on the Command-Line-Interface (CLI).

Quick Start Example: Run Analysis

Due to sensible defaults of the project's configurations you need to have the analysis scripts, the preprocessing script as well as metadata and data files in certain locations in order to run the survey analysis. This configuration file hifis-surveyval.yml which includes these defaults is created with the command hifis-surveyval init. Please put your analysis scripts into a sub-folder called scripts. The preprocessing script preprocess.py is expected in the root folder of the project. Make sure that the file meta.yml is put into sub-folder metadata. Finally, copy the CSV data file of your survey to a central location like a data sub-folder and tell the program the path to that data file on the command line when running the survey analysis.

Now you can do the following to start the survey analysis from the CLI:

hifis-surveyval analyze data/<data_file_name>.csv

The output is then put into a sub-folder within the folder output which is named after the stamp of the current date-time if not specified differently.

Caution: Depending on the Operating System used an issue with the file encoding might occur. There might be data-CSV-files around which are encoded with UTF-8-BOM which causes errors when read in on Windows OS. In this case you need to change the encoding to UTF-8 before running the survey analysis.

Flags

The program accepts two flags:

  1. Help flag
  2. Verbosity flag

Help flag

Calling the program with the help-flag is the first thing to do when being encountered with this program. It outputs a so-called Usage-message to the CLI:

$ hifis-surveyval --help

Please issue this command on the CLI and read the detailed Usage-message before continuing with reading the documentation of the Usage-message here.

Verbosity flag

The verbosity-flag can be provided in order to specify the verbosity of the output to the CLI. This flag is called --verbose or -v for short:

hifis-surveyval --verbose <COMMAND>
hifis-surveyval -v <COMMAND>

The verbosity of the output can be increased even more by duplicating the flag --verbose or -v up to two times:

hifis-surveyval --verbose --verbose --verbose <COMMAND>
hifis-surveyval -vvv <COMMAND>

Commands

There are three different commands implemented which come with its own set of flags and parameters:

  1. Command version
  2. Command init
  3. Command analyze

Command version

The version command outputs the version number of this CLI-program like so:

hifis-surveyval version

Command init

Before you start the analysis you may want to change the defaults of the configuration variables. In order to do so, you can create a configuration file that is named hifis-surveyval.yml by issuing the init command:

hifis-surveyval init

This file contains the following information:

ANONYMOUS_QUESTION_ID: _
DATA_ID_SEPARATOR: _
HIERARCHY_SEPARATOR: /
ID_COLUMN_NAME: id
METADATA: metadata
OUTPUT_FOLDER: output
OUTPUT_FORMAT: SCREEN
PREPROCESSING_FILENAME: preprocess.py
SCRIPT_FOLDER: scripts
SCRIPT_NAMES: []
CUSTOM_PLOT_STYLE: "report_style"  # Optional
Concepts that You Need to Know
  • QuestionCollection: This concept refers to a set of Questions that cover the same topic.
  • Question: This concept refers to an atomic Question that can not be on its own and needs to be wrapped up into a QuestionCollection.

Note: Other terms that may describe similar concepts are question (which equals to QuestionCollection) and sub-question (which equals to Question).

Configuration File Entries Explained
  • ANONYMOUS_QUESTION_ID defines a placeholder for Question IDs. The CSV data might not explicitly mention a full ID of a Question but solely the QuestionCollection ID. In this case the HIFIS Surveyval Framework adds a character, by default _ (underscore), to the QuestionCollection ID to mark this situation.
  • The CSV data file is structured into header and body rows. The header row consists of a comma-separated list of column names. Some column names contain a separator character that concatenates the QuestionCollection ID with the Question ID, the DATA_ID_SEPARATOR. This variable indicates which character is used to separate these IDs. If not specified otherwise, it defaults to _ (underscore).
  • This DATA_ID_SEPARATOR character is internally replaced by a different character, the so called HIERARCHY_SEPARATOR, which defaults to a / (slash).
  • With ID_COLUMN_NAME you may want to specify the name of the id column in the CSV data file.
  • Each analysis needs metadata about the questions asked in the survey and answers that participants may give. Setting METADATA specifies the location of the metadata files which are by default located in a folder called metadata. Be aware that it is recommended to have one YAML file per QuestionCollection. Each YAML file then covers the metadata of a single QuestionCollection and is named according to the ID of this QuestionCollection.
  • You may specify the output folder by setting OUTPUT_FOLDER which is named output by default.
  • You may prefer a specific output format like PDF, PNG, SVG or SCREEN which you may select via OUTPUT_FORMAT. The default value is SCREEN. Note: Be aware that other output formats like text or markdown files may be created, which depends largely upon the implementation of the analysis scripts.
  • You might want to tell the program where to find the preprocessing file preprocess.py that preprocesses and filters your survey data according to specific rules. You can do so by setting PREPROCESSING_FILENAME.
  • You may specify the folder which contains the analysis scripts with setting SCRIPT_FOLDER, which is the scripts folder by default.
  • With SCRIPT_NAMES you may select a subset of the analysis scripts available as a list that ought to be executed. This list is empty by default, which means, all scripts are executed.

Hint for LimeSurvey Users

There is an option Expression Manager code in LimeSurvey when exporting the data into a CSV file that uses the separator character _ (underscore) to concatenate QuestionCollection ID and Question ID in the CSV data header. Otherwise the default [] is used which is not compatible with the HIFIS Surveyval Framework.


Additional Files Generated

Additional to the configuration file, three more files are created:

  1. File preprocess.py is created in the root folder of the project.
  2. File example_script.py is created in the scripts folder of the project.
  3. File style_template.mplstyle is created in the custom_plot_styles folder of the project.

Command analyze

The more interesting command is the analyze command which comes with a data-parameter. The data-parameter can not be omitted and need to be given explicitly in order to be able to start the analysis. This is an example of how to do the analysis:

hifis-surveyval analyze data/<data_file_name>.csv

Contribute with Own Analysis Scripts

Essential Requirements for Developing Own Analysis Scripts

As you might have read in the previous sections the actual analysis scripts reside in a specific folder called scripts. All scripts in that folder will be automatically discovered by the package hifis-surveyval when running the analysis. In order that the program recognizes the scripts in that folder as analysis scripts they need to fulfill the following two criteria:

  1. The filename need to end on .py.
  2. The file need to contain a function called run without any parameters.
"""
A dummy script for testing the function dispatch

.. currentmodule:: hifis_surveyval.scripts.dummy
.. moduleauthor:: HIFIS Software <software@hifis.net>
"""

def run():
    print("Example Script")

If both requirements are satisfied the program will execute the run-functions of the analysis scripts in an arbitrary order.

File-System Structure of the Core Component

$ tree hifis_surveyval/
hifis_surveyval
├── cli.py
├── core
│   ├── dispatch.py
│   ├── preprocess.py
│   ├── settings.py
│   └── util.py
├── data_container.py
├── files
│   ├── example_preprocess.py
│   ├── example_script.py
│   ├── preprocess.py
│   ├── scripts
│      └── example-01-accessing-data.py
│   └── style_template.mplstyle
├── hifis_surveyval.py
├── models
│   ├── answer_option.py
│   ├── answer_types.py
│   ├── mixins
│      ├── mixins.py
│      ├── uses_settings.py
│      └── yaml_constructable.py
│   ├── question_collection.py
│   ├── question.py
│   └── translated.py
├── plotting
│   ├── matplotlib_plotter.py
│   ├── plot_styles
│      └── report_style.mplstyle
│   ├── plotter.py
│   └── supported_output_format.py
└── printing
    └── printer.py

Matplotlib Custom Plot Styles

Files with file ending .mplstyle contain styling information for Matplotlib plots. An example Matplotlib plot style template called style_template.mplstyle is created with the init-command of the HIFIS-Surveyval framework. These files can be put in a folder called custom_plot_styles. Custom plot style files that are generally used within the framework need to be put into folder hifis_surveyval/plotting/plot_styles/. Plots created with Matplotlib accept a kwargs-argument plot_style_name which chooses which plot style to use for the plot. Styles in folder custom_plot_styles have precedence over styles in folder hifis_surveyval/plotting/plot_styles/ if they have the same name.

Resources

Below are some handy resource links:

  • Project Documentation
  • Click is a Python package for creating beautiful command line interfaces in a composable way with as little code as necessary.
  • Sphinx is a tool that makes it easy to create intelligent and beautiful documentation, written by Geog Brandl and licnsed under the BSD license.
  • pytest helps you write better programs.
  • GNU Make is a tool which controls the generation of executables and other non-source files of a program from the program's source files.

Author Information

HIFIS-Surveyval was created by HIFIS Software Services.

Contributors

We would like to thank and give credits to the following contributors of this project:

  • Be the first to be named here!

License

Copyright © 2021 HIFIS Software support@hifis.net

This work is licensed under the following license(s):

Please see the individual files for more accurate information.

Hint: We provided the copyright and license information in accordance to the REUSE Specification 3.0.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hifis_surveyval-1.5.2.tar.gz (3.1 MB view hashes)

Uploaded Source

Built Distribution

hifis_surveyval-1.5.2-py3-none-any.whl (3.2 MB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page