API for adding content to the Kolibri content curation server
ricecooker
The ricecooker library is a framework for automating the conversion of educational content into
Kolibri content channels and uploading them to Kolibri Studio,
which is the central content server for Kolibri.
Overview
ricecooker is used to take openly licensed educational content available on the
web and convert it into an offline-friendly package that can be imported into Kolibri.
The basic process of getting new content into Kolibri is as follows:
- Create and upload a new Kolibri Channel, either by using a ricecooker integration script or by manually uploading content through the Kolibri Studio web interface.
- Publish the new channel using Kolibri Studio to make it accessible to Kolibri.
- Copy the channel's token in Kolibri Studio, and paste it into Kolibri's import screen to import the channel.
The diagram below illustrates the three steps of this process:
Key Concepts
Before we go any further, let us provide more details on some key concepts in the Kolibri Content Pipeline.
Kolibri Channel
- A Kolibri Channel is a tree-like data structure that consists of the following types of content:
  - Topics (folders)
  - Content of a type supported by Kolibri, including:
    - Document (ePub and PDF files)
    - Audio (mp3 files)
    - Video (mp4 files)
    - HTML5App zip files (generic container for web content: HTML+JS+CSS)
    - SlidesShow (jpg and png slide images)
    - Exercises, which contain different types of questions:
      - SingleSelectQuestion (multiple choice)
      - MultipleSelectQuestion (multiple choice with multiple correct answers)
      - InputQuestion (good for numeric inputs)
      - PerseusQuestion (a rich exercise question format developed at Khan Academy)
ContentNode
A ContentNode is a technical term used to describe a piece of content in Kolibri, along with
the metadata associated with it, such as the licensing, description, and thumbnail. A Kolibri Channel
contains a content tree (i.e. table of contents) made up of ContentNodes.
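To make the tree structure concrete, here is a minimal sketch in plain Python. It does not use the ricecooker classes: the Node dataclass, its fields, and the count_kinds helper are hypothetical names chosen for illustration only, and the sample titles are made up.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a node in a channel's content tree."""
    title: str
    kind: str  # e.g. "channel", "topic", "document", "audio"
    children: List["Node"] = field(default_factory=list)

    def add_child(self, node: "Node") -> "Node":
        """Attach a child node and return it, so calls can be chained."""
        self.children.append(node)
        return node


def count_kinds(node: Node) -> Counter:
    """Walk the tree depth-first and count the nodes of each kind."""
    counts = Counter({node.kind: 1})
    for child in node.children:
        counts.update(count_kinds(child))
    return counts


# A channel root with one topic (folder) that holds two pieces of content.
channel = Node("Potatoes info channel", "channel")
topic = channel.add_child(Node("Potatoes!", "topic"))
topic.add_child(Node("Growing potatoes", "document"))
topic.add_child(Node("Potato harvest song", "audio"))

kind_counts = count_kinds(channel)
```

Topics only group other nodes, while the leaves carry the actual content; the same parent and child pattern appears in the add_child calls of a chef script.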
Content Integration Script (aka SushiChef)
The content integration scripts that use the ricecooker library to generate Kolibri Channels
are commonly referred to as SushiChef scripts. The responsibility of a SushiChef
is to download the source content, perform any necessary format or structure conversions
to create a content tree viewable in Kolibri, and then upload the output of this process
to Kolibri Studio for review and publishing.
Conceptually, SushiChef scripts are very similar to web scrapers, but with specialized functions
for optimizing the content for Kolibri's data structures and capabilities.
Content Pipeline
The Content Pipeline is the combination of software tools and procedures that content
moves through, from starting as an external content source to becoming a Kolibri Channel
available for use in the Kolibri Learning Platform. The ricecooker framework is the
"main actor" in the first part of the content pipeline, and touches all aspects
of the pipeline within the region highlighted in blue in the above diagram.
Installation
We'll assume you have a Python 3 installation on your computer and are familiar
with best practices for working with Python code (e.g. virtualenv or pipenv).
If this is not the case, you can consult the Kolibri developer docs as a guide for
setting up a Python virtualenv.
The ricecooker library is a standard Python library distributed through PyPI:
- Run pip install ricecooker to install ricecooker and all Python dependencies.
- Some of the utility functions in ricecooker.utils require additional software:
  - The multimedia command line tool ffmpeg
  - The imagemagick (version 6) image manipulation tools
  - The poppler library for PDF utilities
For details about the installation steps, see docs/installation.md.
In order to upload your ricecooker-generated channels to Kolibri Studio and make them importable
into Kolibri, you will also need to create an account on Kolibri Studio. To do so, visit
Kolibri Studio and click the "Create an Account" link.
The instructions below assume you have already completed this step.
Creating Your First Content Integration Script
Below is code for a simple sushi chef script that uses the ricecooker library to create a Kolibri
channel with a single topic node (Folder), and puts a single PDF content node inside that folder.
To get started, create a new project folder and save the following code in a file called sushichef.py:
Important Note: Be sure to give unique values for CHANNEL_SOURCE_DOMAIN and CHANNEL_SOURCE_ID,
as these values are used to determine your channel's ID, and using duplicate values will lead
to an error when trying to upload.
#!/usr/bin/env python
from ricecooker.chefs import SushiChef
from ricecooker.classes.nodes import ChannelNode, TopicNode, DocumentNode
from ricecooker.classes.files import DocumentFile
from ricecooker.classes.licenses import get_license


class SimpleChef(SushiChef):
    channel_info = {
        'CHANNEL_TITLE': 'Potatoes info channel',
        'CHANNEL_SOURCE_DOMAIN': '<domain.org>',  # where you got the content (change me!!)
        'CHANNEL_SOURCE_ID': '<unique id for channel>',  # channel's unique id (change me!!)
        'CHANNEL_LANGUAGE': 'en',  # le_utils language code
        'CHANNEL_THUMBNAIL': 'https://upload.wikimedia.org/wikipedia/commons/b/b7/A_Grande_Batata.jpg',  # (optional)
        'CHANNEL_DESCRIPTION': 'What is this channel about?',  # (optional)
    }

    def construct_channel(self, **kwargs):
        channel = self.get_channel(**kwargs)
        potato_topic = TopicNode(title="Potatoes!", source_id="<potatos_id>")
        channel.add_child(potato_topic)
        doc_node = DocumentNode(
            title='Growing potatoes',
            description='An article about growing potatoes on your rooftop.',
            source_id='pubs/mafri-potatoe',
            license=get_license('CC BY', copyright_holder='University of Alberta'),
            language='en',
            files=[DocumentFile(path='https://www.gov.mb.ca/inr/pdf/pubs/mafri-potatoe.pdf',
                                language='en')],
        )
        potato_topic.add_child(doc_node)
        return channel


if __name__ == '__main__':
    """
    Run this script on the command line using:
        python sushichef.py -v --reset --token=YOURTOKENHERE9139139f3a23232
    """
    simple_chef = SimpleChef()
    simple_chef.main()
You can run the chef script by passing the appropriate command line arguments:
python sushichef.py --reset --token=YOURTOKENHERE9139139f3a23232
The most important argument when running a chef script is --token, which passes in
the Studio Access Token that grants upload access. You can find this token
by going to the settings page of the account you created earlier and copying the token it displays.
The flag --reset is generally useful in development. It ensures the chef script
starts the upload process from scratch every time you run the script
(otherwise the script will prompt you to resume from the last saved checkpoint).
To see all the ricecooker command line options, run python sushichef.py -h.
For more details about running chef scripts see the chefops page.
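If you launch the chef from another Python program (for example a scheduled job), it can help to assemble the command line in one place. The following is a minimal sketch that uses only the flags mentioned above (-v, --reset, --token); the helper name build_chef_command and its parameters are hypothetical and not part of ricecooker.

```python
import shlex
from typing import List


def build_chef_command(token: str, reset: bool = True, verbose: bool = False,
                       script: str = "sushichef.py") -> List[str]:
    """Assemble the argument list for running a chef script (hypothetical helper)."""
    if not token:
        raise ValueError("A Studio access token is required")
    command = ["python", script]
    if verbose:
        command.append("-v")
    if reset:
        command.append("--reset")
    command.append("--token=" + token)
    return command


# Placeholder token taken from the example above; replace it with your own.
command = build_chef_command("YOURTOKENHERE9139139f3a23232")
command_line = shlex.join(command)  # printable form of the same command
```

The resulting list can be passed to subprocess.run(command, check=True), which avoids shell quoting problems because the arguments are never joined into a single string for execution.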
If you get an error when running the chef, make sure you've replaced
YOURTOKENHERE9139139f3a23232 with the token you obtained from Studio.
Also make sure you've changed the values of channel_info['CHANNEL_SOURCE_DOMAIN']
and channel_info['CHANNEL_SOURCE_ID'] instead of using the default values.
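To see why reusing these two values causes a collision, consider an identifier derived deterministically from them. The sketch below assumes a name-based UUID (version 5) scheme; this is an illustrative assumption, and the exact derivation used by ricecooker and Kolibri Studio may differ. The function name derive_channel_id and the sample domain and IDs are hypothetical.

```python
import uuid


def derive_channel_id(source_domain: str, source_id: str) -> str:
    """Derive a deterministic hex ID from a domain and a source ID (assumed scheme)."""
    # First map the domain to a namespace, then the source ID within that namespace.
    domain_namespace = uuid.uuid5(uuid.NAMESPACE_DNS, source_domain)
    return uuid.uuid5(domain_namespace, source_id).hex


# The same inputs always give the same ID; changing either input gives a new one.
id_a = derive_channel_id("example.org", "potatoes-en")
id_b = derive_channel_id("example.org", "potatoes-en")
id_c = derive_channel_id("example.org", "potatoes-fr")
```

Under this assumption, two chefs that keep the same domain and source ID would target the same channel, which is why each channel needs its own pair of values.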
Next Steps
The Kolibri Content Pipeline is a collaborative effort between educational experts and software developers. As such, we have provided some getting-started docs of particular relevance for each role in the process:
- Content specialists and Administrators can read the non-technical part of the documentation to learn about how content works in the Kolibri platform.
  - The best place to start is the Kolibri Platform overview.
  - The page on content workflows also has a useful overview of the steps of the process.
  - You can read about the supported content types here.
  - The page on Reviewing Channel provides more information about the possible content issues to watch out for.
- Chef authors can read the remainder of this README, and get started using the ricecooker library by following these first steps:
  - Quickstart, which will introduce you to the steps needed to create a sushi chef script.
  - After the quickstart, you should be ready to take things into your own hands, and complete all steps in the ricecooker tutorial.
  - The next step after that is to read the ricecooker usage docs, which are also available as Jupyter notebooks under docs/tutorial/.
  More detailed technical documentation is available on the following topics:
  - Installation
  - Content Nodes
  - File types
  - Exercises
  - HTML5 apps
  - Parsing HTML
  - Running chef scripts, to learn about the command line args for controlling chef operation, managing caches, and other options.
  - Sushi chef style guide
- Ricecooker developers should read all the documentation for chef authors, and also consult the docs in the developer/ folder for additional information about the "behind the scenes" work needed to support the Kolibri content pipeline:
  - Running chef scripts, also known as chefops.
  - Running chef scripts in daemon mode
  - Managing the content pipeline, also known as sushops.
Further reading
- Read the Kolibri Studio docs to learn more about the Kolibri Studio features
- Read the Kolibri user guide to learn how to install Kolibri on your machine (useful for testing channels)
- Read the Kolibri developer docs to learn about the inner workings of Kolibri.
History
0.6.42 (2020-04-10)
- Added --sample N command line option. Run the script with --sample 10 to produce a test version of the channel with 10 randomly selected nodes from the full channel. Use this to check that transformations are working as expected.
- Added dryrun command. Use the command ./sushichef.py dryrun to run the chef as normal but skip the step where the files get uploaded to Studio.
- Added HTTP proxy functionality for YouTubeVideoFile and YouTubeSubtitleFile. Set the PROXY_LIST env variable to a ;-separated list of {ip}:{port}. Ricecooker will detect the presence of PROXY_LIST and use it when accessing resources via YoutubeDL. Alternatively, set the USEPROXY env var to use a list of free proxy servers, which are very slow and not reliable.
- Improved colored logging functionality and customizability of logging output.
0.6.40 (2020-02-07)
- Changed default behaviour to upload the staging tree instead of the main tree
- Added --deploy flag to reproduce the old behaviour (upload to main tree)
- Added thumbnail generating methods for audio, HTML5, PDF, and ePub nodes. Set derive_thumbnail=True when creating the Node instance, or pass the command line argument --thumbnails to generate thumbnails for all nodes. Note: automatic thumbnail generation will only work if thumbnail is None.
0.6.38 (2019-12-27)
- Added support for the h5p content kind and h5p file type
- Removed monkey-patching of localStorage and document.cookie in the helper method download_static_assets
- Added validation logic for tags
- Improved error reporting
0.6.36 (2019-09-25)
- Added support for tags using the JsonChef workflow
- Added validation step to ensure subtitle files are unique for each language code
- Documented new SlidesShow content kind coming in Kolibri 0.13
- Added docs with detailed instructions for content upload and update workflows
- Bugfixes to file extension logic and improved error handling around subtitles
0.6.32 (2019-08-01)
- Updated documentation to use top-level headings
- Removed support for Python 3.4
- Removed support for the "sous chef" workflow
0.6.31 (2019-07-01)
- Handle more subtitle convertible formats: SRT, TTML, SCC, DFXP, and SAMI
0.6.30 (2019-05-01)
- Updated docs build scripts to make ricecooker docs available on Read the Docs
- Added corrections command line script for making bulk edits to content metadata
- Added StudioApi client to support CRUD (create, read, update, delete) Studio actions
- Added pdf-splitting helper methods (see ricecooker/utils/pdf.py)
0.6.23 (2018-11-08)
- Updated le-utils and pressurecooker dependencies to latest version
- Added support for ePub files (EPubFiles can be added to DocumentNodes)
- Added tag support
- Changed default value for STUDIO_URL to api.studio.learningequality.org
- Added aggregator and provider fields for content nodes
- Various bugfixes to image processing in exercises
- Changed validation logic to use self.filename to check file format is in self.allowed_formats
- Added is_youtube_subtitle_file_supported_language helper function to support importing youtube subs
- Added srt2vtt subtitles conversion
- Added static assets downloader helper method in utils.downloader.download_static_assets
- Added LineCook chef functions to --generate CSV from directory structure
- Fixed the always randomize=True bug
- Docs: general content node metadata guidelines
- Docs: video compression instructions and helper scripts convertvideo.bat and convertvideo.sh
0.6.17 (2018-04-20)
- Added support for role attribute on ContentNodes (currently coach || learner)
- Updated pressurecooker dependency (to catch compression errors)
- Docs improvements, see https://github.com/learningequality/ricecooker/tree/master/docs
0.6.15 (2018-03-06)
- Added support for non-mp4 video files, with auto-conversion using ffmpeg. See git diff b1d15fa 87f2528
- Added CSV exercises workflow support to LineCook chef class
- Added --nomonitor CLI argument to disable sushibar functionality
- Defined new ENV variables:
  - PHANTOMJS_PATH: set this to a phantomjs binary (instead of assuming one in node_modules)
  - STUDIO_URL (alias CONTENTWORKSHOP_URL): set to URL of Kolibri Studio server where to upload files
- Various fixes to support sushi chefs
- Removed minimize_html_css_js utility function from ricecooker/utils/html.py to remove dependency on css_html_js_minify and support Py3.4 fully.
0.6.9 (2017-11-14)
- Changed default logging level to --verbose
- Added support for cronjobs scripts via --cmdsock (see docs/daemonization.md)
- Added tools for creating HTML5Zip files in utils/html_writer.py
- Added utility for downloading HTML with optional js support in utils/downloader.py
- Added utils/path_builder.py and utils/data_writer.py for creating souschef archives (zip archive that contains files in a folder hierarchy + Channel.csv + Content.csv)
0.6.7 (2017-10-04)
- Sibling content nodes are now required to have unique source_id
- The field copyright_holder is required for all licenses other than public domain
0.6.6 (2017-09-29)
- Added JsonTreeChef class for creating channels from ricecooker json trees
- Added LineCook chef class to support souschef-based channel workflows
0.6.4 (2017-08-31)
- Added language attribute for ContentNode (string key in internal repr. defined in le-utils)
- Made language a required attribute for ChannelNode
- Enabled sushibar.learningequality.org progress monitoring by default. Set SUSHIBAR_URL env. var to control where progress is reported (e.g. http://localhost:8001)
- Updated le-utils and pressurecooker dependencies to latest
0.6.2 (2017-07-07)
- Clarify ricecooker is Python3 only (for now)
- Use https:// and wss:// for SushiBar reporting
0.6.0 (2017-06-28)
- Remote progress reporting and logging to SushiBar (MVP version)
- New API based on the SushiChef classes
- Support existing old-API chefs in compatibility mode
0.5.13 (2017-06-15)
- Last stable release before SushiBar functionality was added
- Renamed --do-not-activate argument to --stage
0.1.0 (2016-09-30)
- First release on PyPI.