A Python module that adds features to OpenLA data to make it easier to use for ML
openla-feature-representation: generate features for EventStream data
Introduction
openla-feature-representation is an open-source Python module that generates features from OpenLA EventStream data, to make the data easier to use for ML.
Installation
This module is available on PyPI and can be installed using pip as follows:
pip install openla-feature-representation
Downloading the model
For the E2Vec class, you will need the openla-feature-representation-fastText_1min.bin model.
Feel free to download it from the OpenLA models download site.
Usage of the E2Vec class
First, import the openla_feature_representation package with an arbitrary name, here lafr.
import openla_feature_representation as lafr
Initializing the class
This is the constructor:
e2Vec = lafr.E2Vec(ftmodel_path, input_csv_dir_path, course_id)
- ftmodel_path is the path to a fastText language model trained for this task
- input_csv_dir_path is the path to a directory with the dataset (see below)
- course_id is a string to identify files for the course to analyze within the info_dir directory (e.g. 'A-2023')
After creating your own e2Vec object, you can call any of the methods the class provides on it.
Generate sentences for the event log
The fastText model uses an artificial language to express event log entries as sentences. This is how you can generate them:
sentences = e2Vec.generate_sentences(
    sentences_dir_path=sentences_dir_path,
    eventstream_file_path=eventstream_file_path,
    input_csv_dir_path=input_csv_dir_path,
    course_id=course_id,
)
If you need to select or filter a time span:
sentences = e2Vec.generate_sentences(
    sentences_dir_path=sentences_dir_path,
    use_timespan=True,
    start_minute=0,
    total_minutes=90,
    eventstream_file_path=eventstream_file_path,
    input_csv_dir_path=input_csv_dir_path,
    course_id=course_id,
)
- sentences_dir_path is the path to the directory where you want the sentence files to be written
- eventstream_file_path is the path to the event stream CSV file
- input_csv_dir_path is the path to a directory with the dataset (see below)
- course_id is a string to identify files for the course to analyze within the info_dir directory
- use_timespan: if True, the arguments below are used to extract a time span from the data (optional)
- start_minute is the minute in the data at which sentence generation should start (optional)
- total_minutes is the number of minutes' worth of sentences that should be generated (optional)
This function saves the sentences to a text file and returns a path to it.
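To make the start_minute and total_minutes arguments more concrete, here is a minimal, standalone sketch of selecting a time span from an event log. It does not use the library and is not its implementation: the select_timespan helper, the tuple layout of the events, and the choice to measure minutes from the first event are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def select_timespan(events, start_minute, total_minutes):
    """Keep events whose timestamp falls inside the requested window.

    `events` is a list of (timestamp, operation) tuples sorted by time.
    Assumption: minutes are counted from the first event in the log.
    """
    if not events:
        return []
    origin = events[0][0]
    window_start = origin + timedelta(minutes=start_minute)
    window_end = window_start + timedelta(minutes=total_minutes)
    # Half-open window: the start is included, the end is excluded.
    return [(ts, op) for ts, op in events if window_start <= ts < window_end]


# Hypothetical event log with one operation every 30 minutes.
base = datetime(2023, 4, 10, 9, 0)
log = [(base + timedelta(minutes=30 * i), f"OP_{i}") for i in range(5)]

selected = select_timespan(log, start_minute=0, total_minutes=90)
print([op for _, op in selected])
```

With start_minute=0 and total_minutes=90, only the events in the first 90 minutes of this hypothetical log are kept.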
Vectorize the sentences
This function returns a pandas DataFrame with the vectors generated from the sentences.
user_vectors = e2Vec.vectorize_sentences(sentences_file_path)
- sentences_file_path is the path to the sentence file generated in the previous step
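The two steps above can be chained in a small helper. This is only a sketch: sentences_to_vectors is a hypothetical name, and it relies solely on the generate_sentences and vectorize_sentences calls as they are described in this README (the first returning a path, the second accepting it).

```python
def sentences_to_vectors(e2vec, sentences_dir_path, eventstream_file_path,
                         input_csv_dir_path, course_id):
    """Generate sentences, then vectorize them, using an E2Vec object.

    `e2vec` is expected to be an object created with lafr.E2Vec(...).
    """
    # Step 1: write the sentences to a text file and get its path back.
    sentences_file_path = e2vec.generate_sentences(
        sentences_dir_path=sentences_dir_path,
        eventstream_file_path=eventstream_file_path,
        input_csv_dir_path=input_csv_dir_path,
        course_id=course_id,
    )
    # Step 2: turn the sentences into vectors.
    return e2vec.vectorize_sentences(sentences_file_path)


# Usage, assuming the package and the model file are available:
# import openla_feature_representation as lafr
# e2Vec = lafr.E2Vec(ftmodel_path, input_csv_dir_path, course_id)
# user_vectors = sentences_to_vectors(
#     e2Vec, sentences_dir_path, eventstream_file_path,
#     input_csv_dir_path, course_id,
# )
```

Passing the E2Vec object in as an argument keeps the helper independent of how the model path and dataset directory are configured.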
Concatenation
The class has a function to concatenate vectors by time (minutes) or weeks.
This will concatenate the vectors in 10-minute spans.
vectors = e2Vec.concatenate_vectors(
    sentences_dir_path=sentences_dir_path,
    eventstream_file_path=eventstream_file_path,
    input_csv_dir_path=input_csv_dir_path,
    course_id=course_id,
    start_minute=0,
    total_minutes=10,
)
This will concatenate the vectors by the week or lesson.
vectors = e2Vec.concatenate_vectors(
    sentences_dir_path=sentences_dir_path,
    eventstream_file_path=eventstream_file_path,
    input_csv_dir_path=input_csv_dir_path,
    course_id=course_id,
    by_weeks=True,
    start_minute=0,
)
- sentences_dir_path is the path to the sentence files generated in the previous step
- eventstream_file_path is the path to the event stream CSV file
- input_csv_dir_path is the path to a directory with the dataset (see below)
- course_id is a string to identify files for the course to analyze within the info_dir directory
- by_weeks concatenates vectors by week if True (by time by default)
- start_minute is the minute in the data at which sentence generation should start (optional)
- total_minutes is the number of minutes' worth of sentences that should be generated each time (optional)
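As a rough illustration of what concatenating vectors by time span could look like, the sketch below joins one user's per-span vectors end to end in chronological order. This is an assumed interpretation, not the library's code: concatenate_spans, the dictionary layout, and the 2-dimensional vectors are all hypothetical.

```python
def concatenate_spans(span_vectors):
    """Join per-span vectors end to end, ordered by span index.

    `span_vectors` maps a span index (0 for the first 10 minutes,
    1 for the next 10 minutes, and so on) to that span's vector.
    """
    combined = []
    for span_index in sorted(span_vectors):
        combined.extend(span_vectors[span_index])
    return combined


# Hypothetical 2-dimensional vectors for three consecutive 10-minute spans.
per_span = {1: [0.3, 0.4], 0: [0.1, 0.2], 2: [0.5, 0.6]}
user_vector = concatenate_spans(per_span)
print(len(user_vector))
```

Under this interpretation, the length of the combined vector grows with the number of spans, so every user needs the same number of spans for the vectors to be comparable.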
Usage of the ALP class
ALP (Active Learner Point) is a set of metrics that take BookRoll (ebook) and Moodle activity per lecture into account: attendance, report submissions, course views, slide views, adding markers or memos, and other actions.
from openla_feature_representation import Alp
alp = Alp(course_id="114")
- course_id is a string to identify files for the course to analyze within the Dataset directory
The Alp class constructor above makes three DataFrames available as properties of the returned Alp object:
- features_df: aggregated totals of how many times each user took any of the relevant actions for each lecture
- alp_df: the features above replaced by a number from 0 to 5 following the criteria below
- alp_df_normalized: same as above, only the 0 to 5 numbers are normalized between 0 and 1
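The kind of aggregation behind features_df can be pictured with a short standalone sketch that counts actions per user, lecture, and action type. The aggregate_features helper, the tuple layout, and the action names are hypothetical; the real class reads its input from the dataset files and returns DataFrames.

```python
from collections import Counter


def aggregate_features(events):
    """Count how many times each user took each action in each lecture.

    `events` is an iterable of (user_id, lecture, action) tuples.
    """
    counts = Counter()
    for user_id, lecture, action in events:
        counts[(user_id, lecture, action)] += 1
    return counts


# Hypothetical activity log for lecture 1.
events = [
    ("u1", 1, "slide_view"),
    ("u1", 1, "slide_view"),
    ("u1", 1, "marker"),
    ("u2", 1, "slide_view"),
]
totals = aggregate_features(events)
print(totals[("u1", 1, "slide_view")])
```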
Criteria for the ALP 0-to-5 scale
value | description |
---|---|
5 | Top 10%, or attending the lecture, or submitting a report |
4 | Top 20% |
3 | Top 30%, or being late to the lecture, or submitting late |
2 | Top 40% |
1 | Top 50% |
0 | Bottom 50%, or not attending, or not submitting |
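The percentile rows of the table can be sketched as a ranking rule. The code below is an illustration under stated assumptions, not the library's implementation: the function names and attendance labels are hypothetical, ties are simply broken by input order, and normalization is assumed to mean dividing the score by 5.

```python
# Percentile thresholds from the table: (top N percent, score).
THRESHOLDS = [(10, 5), (20, 4), (30, 3), (40, 2), (50, 1)]

# Categorical rows from the table (the label strings are hypothetical).
ATTENDANCE_SCORES = {"attended": 5, "late": 3, "absent": 0}


def percentile_score(rank, total):
    """Return the 0-to-5 score for a 1-based rank among `total` users."""
    for top_percent, score in THRESHOLDS:
        # Integer comparison avoids floating-point edge cases.
        if rank * 100 <= top_percent * total:
            return score
    return 0  # bottom 50%


def alp_scores(counts):
    """Score users by how their action count ranks among all users."""
    ranked = sorted(counts, key=counts.get, reverse=True)
    return {
        user: percentile_score(position + 1, len(ranked))
        for position, user in enumerate(ranked)
    }


def normalize(scores):
    """Scale 0-to-5 scores to the 0-to-1 range (assumed: divide by 5)."""
    return {user: score / 5 for user, score in scores.items()}


# Ten hypothetical users with 10, 9, ..., 1 slide views.
counts = {f"u{i}": 10 - i for i in range(10)}
scores = alp_scores(counts)
normalized = normalize(scores)
print(scores)
```

A real implementation would also need a rule for tied counts, for example giving all tied users the score of the best rank in the tie.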
Examples for the class's methods and instance variables
alp.features_df # These are the aggregated features
alp.alp_df # These are the ALP 0-to-5 values
alp.alp_df_normalized # These are the normalized values
# The following will write CSV files for the relevant DataFrame
# Paths and filenames can be specified, but there are default
# filenames and the paths default to the present directory
alp.write_features_csv()
alp.write_alp_csv()
alp.write_alp_normalized_csv()
Datasets for OpenLA
This module uses data in the same or a similar format as OpenLA. Please refer to the OpenLA documentation for further information.
Datasets for the Alp class must be placed in the "Dataset/" directory relative to the Python script they are called from.
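Before constructing an Alp object, it can help to check that the directory is where it is expected. The helper below is a hypothetical convenience, not part of the module, and it assumes that "relative to the Python script" can be checked against a base directory you pass in (the current working directory by default).

```python
from pathlib import Path


def dataset_dir_exists(base_dir="."):
    """Return True if a "Dataset" directory exists under `base_dir`."""
    return (Path(base_dir) / "Dataset").is_dir()


if not dataset_dir_exists():
    print('No "Dataset/" directory found in the current directory.')
```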