API for adding content to the Kolibri content curation server

ricecooker
==========

The ``ricecooker`` library is a framework for creating Kolibri content
channels and uploading them to `Kolibri
Studio <https://studio.learningequality.org/>`__, which is the central
content server that `Kolibri <http://learningequality.org/kolibri/>`__
applications talk to when they import content.

The Kolibri content pipeline is pictured below:

.. figure:: https://raw.githubusercontent.com/learningequality/ricecooker/master/docs/figures/content_pipeline_diagram.png
   :alt: The Kolibri Content Pipeline

   The Kolibri Content Pipeline

This ``ricecooker`` framework is the “main actor” in the first part of
the content pipeline, and touches all aspects of the pipeline within the
region highlighted in blue in the above diagram.

Before we continue, let’s have some definitions:

- A **Kolibri channel** is a tree-like data structure that consists of
  the following content nodes:

  - Topic nodes (folders)
  - Content types:

    - Document (PDF files)
    - Audio (mp3 files)
    - Video (mp4 files)
    - HTML5App zip files (generic container for web content:
      HTML+JS+CSS)
    - Exercises

- A **sushi chef** is a Python script that uses the ``ricecooker``
  library to import content from various sources, organize content into
  Kolibri channels and upload the channel to Kolibri Studio.
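To make the tree structure concrete, here is a minimal sketch in plain Python that does not use ``ricecooker`` at all. The ``Node`` class, the ``count_by_kind`` helper, and the sample titles are hypothetical and exist only for illustration; the real node classes live in ``ricecooker.classes.nodes``.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical stand-in for a channel node (not the ricecooker API)."""
    title: str
    kind: str  # e.g. "channel", "topic", "document", "audio", "video"
    children: List["Node"] = field(default_factory=list)

    def add_child(self, child: "Node") -> "Node":
        """Attach a child node and return it so calls can be chained."""
        self.children.append(child)
        return child


def count_by_kind(node: Node) -> Dict[str, int]:
    """Walk the tree depth-first and count the nodes of each kind."""
    counts: Dict[str, int] = {node.kind: 1}
    for child in node.children:
        for kind, n in count_by_kind(child).items():
            counts[kind] = counts.get(kind, 0) + n
    return counts


# A channel is the root; topics act as folders; content nodes are leaves.
channel = Node("Potatoes info channel", "channel")
topic = channel.add_child(Node("Potatoes!", "topic"))
topic.add_child(Node("Growing potatoes", "document"))
topic.add_child(Node("Potato harvest song", "audio"))  # made-up sample item
```

Calling ``count_by_kind(channel)`` on this sample tree reports one node of each kind used.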

Overview
--------

Use the following shortcuts to jump to the most relevant parts of the
``ricecooker`` documentation depending on your role:

- **Content specialists and Administrators** can read the non-technical
  part of the documentation to learn about how content works in the
  Kolibri platform.

  - The best place to start is the `Kolibri Platform
    overview <https://github.com/learningequality/ricecooker/blob/master/docs/platform/README.md>`__.
  - Read more about the supported `content types
    here <https://github.com/learningequality/ricecooker/blob/master/docs/platform/content_types.md>`__.
  - Content curators can consult `this
    document <https://docs.google.com/document/d/1slwoNT90Wqu0Rr8MJMAEsA-9LWLRvSeOgdg9u7HrZB8/edit?usp=sharing>`__
    for information about how to prepare “spec sheets” that guide
    developers on how to import content into the Kolibri ecosystem.
  - Of particular interest to non-technical users is the `CSV
    workflow <https://github.com/learningequality/ricecooker/blob/master/docs/csv_metadata/README.md>`__
    for providing channel metadata as spreadsheets.

- **Chef authors** can read the remainder of this README, and get
  started using the ``ricecooker`` library by following these first
  steps:

  - `Quickstart <https://github.com/learningequality/ricecooker/blob/master/docs/tutorial/quickstart.ipynb>`__,
    which will introduce you to the steps needed to create a sushi
    chef script.
  - After the quickstart, you should be ready to take things into your
    own hands, and complete all steps in the `ricecooker
    tutorial <https://gist.github.com/jayoshih/6678546d2a2fa3e7f04fc9090d81aff6>`__.
  - The next step after that is to read the `ricecooker usage
    docs <https://github.com/learningequality/ricecooker/blob/master/docs/usage.md>`__,
    which are also available as Jupyter notebooks under
    `docs/tutorial/ <https://github.com/learningequality/ricecooker/blob/master/docs/tutorial/>`__.
    More detailed technical documentation is available on the
    following topics:

    - `Installation <https://github.com/learningequality/ricecooker/blob/master/docs/installation.md>`__
    - `Content
      Nodes <https://github.com/learningequality/ricecooker/blob/master/docs/nodes.md>`__
    - `File
      types <https://github.com/learningequality/ricecooker/blob/master/docs/files.md>`__
    - `Exercises <https://github.com/learningequality/ricecooker/blob/master/docs/exercises.md>`__
    - `HTML5
      apps <https://github.com/learningequality/ricecooker/blob/master/docs/htmlapps.md>`__
    - `Parsing
      HTML <https://github.com/learningequality/ricecooker/blob/master/docs/parsing_html.md>`__
    - `Running chef
      scripts <https://github.com/learningequality/ricecooker/blob/master/docs/chefops.md>`__
      to learn about the command line arguments for controlling chef
      operation, managing caches, and other options.
    - `Sushi chef style
      guide <https://docs.google.com/document/d/1_Wh7IxPmFScQSuIb9k58XXMbXeSM0ZQLkoXFnzKyi_s/edit>`__

- **Ricecooker developers** should read all the documentation for chef
  authors, and also consult the docs in the
  `developer/ <https://github.com/learningequality/ricecooker/blob/master/docs/developer>`__
  folder for additional information about the “behind the scenes”
  work needed to support the Kolibri content pipeline:

  - `Running chef scripts <https://github.com/learningequality/ricecooker/blob/master/docs/chefops.md>`__,
    also known as **chefops**.
  - `Running chef scripts in daemon
    mode <https://github.com/learningequality/ricecooker/blob/master/docs/developer/daemonization.md>`__
  - `Managing the content
    pipeline <https://github.com/learningequality/ricecooker/blob/master/docs/developer/sushops.md>`__,
    also known as **sushops**.

Installation
------------

We’ll assume you have a Python 3 installation on your computer and are
familiar with best practices for working with Python code (e.g.
``virtualenv`` or ``pipenv``). If this is not the case, you can consult
the Kolibri developer docs as a guide for `setting up a Python
virtualenv <http://kolibri-dev.readthedocs.io/en/latest/start/getting_started.html#virtual-environment>`__.

The ``ricecooker`` library is a standard Python library distributed
through PyPI:

- Run ``pip install ricecooker`` to install it. You can then use
  ``import ricecooker`` in your chef script.
- Some of the functions in ``ricecooker.utils`` require additional
  software:

  - Make sure you install the command line tool
    `ffmpeg <https://ffmpeg.org/>`__.
  - Running JavaScript code while scraping webpages requires the
    PhantomJS browser. You can run ``npm install phantomjs-prebuilt``
    in your chef’s working directory.

For more details and install options, see
`docs/installation.md <https://github.com/learningequality/ricecooker/blob/master/docs/installation.md>`__.
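Before running a chef that relies on these external tools, it can help to confirm they are reachable. The following is a minimal sketch using only the Python standard library; the ``find_tools`` helper is a hypothetical name and is not part of ``ricecooker``. It only searches the ``PATH``, so a PhantomJS binary installed locally through ``npm`` under ``node_modules`` may not be detected this way.

```python
import shutil
from typing import Dict, Iterable, Optional


def find_tools(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each executable name to its full path on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


# Report which of the optional external tools are visible on PATH.
for tool, path in find_tools(["ffmpeg", "phantomjs"]).items():
    print(f"{tool}: {path or 'NOT FOUND on PATH'}")
```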

Simple chef example
-------------------

This is a sushi chef script that uses the ``ricecooker`` library to
create a Kolibri channel with a single topic node (Folder), and puts a
single PDF content node inside that folder.

::

    #!/usr/bin/env python
    from ricecooker.chefs import SushiChef
    from ricecooker.classes.nodes import ChannelNode, TopicNode, DocumentNode
    from ricecooker.classes.files import DocumentFile
    from ricecooker.classes.licenses import get_license


    class SimpleChef(SushiChef):
        channel_info = {
            'CHANNEL_TITLE': 'Potatoes info channel',
            'CHANNEL_SOURCE_DOMAIN': '<domain.org>',         # where you got the content (change me!!)
            'CHANNEL_SOURCE_ID': '<unique id for channel>',  # channel's unique id (change me!!)
            'CHANNEL_LANGUAGE': 'en',                        # le_utils language code
            'CHANNEL_THUMBNAIL': 'https://upload.wikimedia.org/wikipedia/commons/b/b7/A_Grande_Batata.jpg', # (optional)
            'CHANNEL_DESCRIPTION': 'What is this channel about?',      # (optional)
        }

        def construct_channel(self, **kwargs):
            channel = self.get_channel(**kwargs)
            potato_topic = TopicNode(title="Potatoes!", source_id="<potatos_id>")
            channel.add_child(potato_topic)
            doc_node = DocumentNode(
                title='Growing potatoes',
                description='An article about growing potatoes on your rooftop.',
                source_id='pubs/mafri-potatoe',
                license=get_license('CC BY', copyright_holder='University of Alberta'),
                language='en',
                files=[DocumentFile(path='https://www.gov.mb.ca/inr/pdf/pubs/mafri-potatoe.pdf',
                                    language='en')],
            )
            potato_topic.add_child(doc_node)
            return channel


    if __name__ == '__main__':
        """
        Run this script on the command line using:
            python simple_chef.py -v --reset --token=YOURTOKENHERE9139139f3a23232
        """
        simple_chef = SimpleChef()
        simple_chef.main()

Let’s assume the above code snippet is saved as the file
``simple_chef.py``.

You can run the chef script by passing the appropriate command line
arguments:

::

    python simple_chef.py -v --reset --token=YOURTOKENHERE9139139f3a23232

The most important argument when running a chef script is ``--token``,
which passes in your Studio Access Token. You can obtain the token
from your profile’s `settings
page <http://studio.learningequality.org/settings/tokens>`__.
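If you prefer not to type the token on every run, one option is to keep it in an environment variable and build the command from it. This is only a sketch: the variable name ``STUDIO_TOKEN`` and the ``build_chef_command`` helper are assumptions made for this example and are not defined by ``ricecooker``. The flags are the same ones used in the command shown earlier.

```python
import sys
from typing import List, Mapping


def build_chef_command(script: str, env: Mapping[str, str]) -> List[str]:
    """Build the argument list for running a chef script with a token from env."""
    token = env.get("STUDIO_TOKEN")  # hypothetical variable name, not a ricecooker setting
    if not token:
        raise RuntimeError("Set STUDIO_TOKEN to your Studio Access Token first.")
    # Same flags as the documented command: verbose, reset, and the token.
    return [sys.executable, script, "-v", "--reset", f"--token={token}"]
```

The resulting list can be passed to ``subprocess.run(cmd, check=True)`` with ``env`` set to ``os.environ``, which avoids leaving the token in your shell history.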

The flags ``-v`` (verbose) and ``--reset`` are generally useful in
development. They make sure the chef script starts the process from
scratch and displays useful debugging information on the command line.

To see all the ``ricecooker`` command line options, run
``python simple_chef.py -h``. For more details about running chef
scripts see `the chefops
page <https://github.com/learningequality/ricecooker/blob/master/docs/chefops.md>`__.

If you get an error when running the chef, make sure you’ve replaced
``YOURTOKENHERE9139139f3a23232`` with the token you obtained from Studio.
Also make sure you’ve changed the values of
``channel_info['CHANNEL_SOURCE_DOMAIN']`` and
``channel_info['CHANNEL_SOURCE_ID']`` instead of using the default
values.
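A quick pre-flight check can catch forgotten placeholders before the chef runs. The sketch below is plain Python; the ``find_placeholder_keys`` helper is a hypothetical name, and the angle-bracket test simply mirrors the ``<...>`` defaults used in the example above. Any validation that ``ricecooker`` itself performs may differ.

```python
from typing import Dict, List

# Keys whose default values in the example must be changed before running.
REQUIRED_KEYS = ("CHANNEL_SOURCE_DOMAIN", "CHANNEL_SOURCE_ID")


def find_placeholder_keys(channel_info: Dict[str, str]) -> List[str]:
    """Return required keys that are missing, empty, or still look like '<...>'."""
    problems = []
    for key in REQUIRED_KEYS:
        value = channel_info.get(key, "").strip()
        if not value or (value.startswith("<") and value.endswith(">")):
            problems.append(key)
    return problems
```

For example, calling it with ``SimpleChef.channel_info`` from the unmodified example would flag both keys.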

Next steps
----------

- See the `usage
  docs <https://github.com/learningequality/ricecooker/blob/master/docs/usage.md>`__
  for more explanations about the above code.
- See
  `nodes <https://github.com/learningequality/ricecooker/blob/master/docs/nodes.md>`__
  to learn how to create different content node types.
- See
  `files <https://github.com/learningequality/ricecooker/blob/master/docs/files.md>`__
  to learn about the file types supported, and how to create them.

Further reading
---------------

- Read the `Kolibri Studio
  docs <http://kolibri-studio.readthedocs.io/en/latest/>`__ to learn
  more about the Kolibri Studio features.
- Read the `Kolibri user
  guide <http://kolibri.readthedocs.io/en/latest/>`__ to learn how to
  install Kolibri on your machine (useful for testing channels).
- Read the `Kolibri developer
  docs <http://kolibri-dev.readthedocs.io/en/latest/>`__ to learn about
  the inner workings of Kolibri.


=======
History
=======

0.6.23 (2018-11-08)
-------------------
* Updated ``le-utils`` and ``pressurecooker`` dependencies to latest version
* Added support for ePub files (an ``EPubFile`` can be added to a ``DocumentNode``)
* Added tag support
* Changed default value for ``STUDIO_URL`` to ``api.studio.learningequality.org``
* Added ``aggregator`` and ``provider`` fields for content nodes
* Various bugfixes to image processing in exercises
* Changed validation logic to use ``self.filename`` to check file format is in ``self.allowed_formats``
* Added ``is_youtube_subtitle_file_supported_language`` helper function to support importing youtube subs
* Added ``srt2vtt`` subtitles conversion
* Added static assets downloader helper method in ``utils.downloader.download_static_assets``
* Added LineCook chef functions to ``--generate`` CSV from directory structure
* Fixed the always ``randomize=True`` bug
* Docs: general content node metadata guidelines
* Docs: video compression instructions and helper scripts ``convertvideo.bat`` and ``convertvideo.sh``


0.6.17 (2018-04-20)
-------------------
* Added support for ``role`` attribute on ContentNodes (currently ``coach`` or ``learner``)
* Update pressurecooker dependency (to catch compression errors)
* Docs improvements, see https://github.com/learningequality/ricecooker/tree/master/docs


0.6.15 (2018-03-06)
-------------------
* Added support for non-mp4 video files, with auto-conversion using ffmpeg. See ``git diff b1d15fa 87f2528``
* Added CSV exercises workflow support to ``LineCook`` chef class
* Added --nomonitor CLI argument to disable sushibar functionality
* Defined new ENV variables:

  * PHANTOMJS_PATH: set this to a phantomjs binary (instead of assuming one in node_modules)
  * STUDIO_URL (alias CONTENTWORKSHOP_URL): set to URL of Kolibri Studio server where to upload files

* Various fixes to support sushi chefs
* Removed ``minimize_html_css_js`` utility function from ``ricecooker/utils/html.py``
  to remove dependency on ``css_html_js_minify`` and support Py3.4 fully.


0.6.9 (2017-11-14)
------------------
* Changed default logging level to --verbose
* Added support for cronjobs scripts via `--cmdsock` (see docs/daemonization.md)
* Added tools for creating HTML5Zip files in utils/html_writer.py
* Added utility for downloading HTML with optional js support in utils/downloader.py
* Added utils/path_builder.py and utils/data_writer.py for creating souschef archives
  (zip archive that contains files in a folder hierarchy + Channel.csv + Content.csv)


0.6.7 (2017-10-04)
------------------
* Sibling content nodes are now required to have unique source_id
* The field `copyright_holder` is required for all licenses other than public domain


0.6.6 (2017-09-29)
------------------
* Added `JsonTreeChef` class for creating channels from ricecooker json trees
* Added `LineCook` chef class to support souschef-based channel workflows


0.6.4 (2017-08-31)
------------------
* Added `language` attribute for `ContentNode` (string key in internal repr. defined in le-utils)
* Made `language` a required attribute for ChannelNode
* Enabled sushibar.learningequality.org progress monitoring by default.
  Set SUSHIBAR_URL env. var to control where progress is reported (e.g. http://localhost:8001)
* Updated le-utils and pressurecooker dependencies to latest


0.6.2 (2017-07-07)
------------------
* Clarify ricecooker is Python3 only (for now)
* Use https:// and wss:// for SushiBar reporting


0.6.0 (2017-06-28)
------------------
* Remote progress reporting and logging to SushiBar (MVP version)
* New API based on the SushiChef classes
* Support existing old-API chefs in compatibility mode



0.5.13 (2017-06-15)
-------------------
* Last stable release before SushiBar functionality was added
* Renamed --do-not-activate argument to --stage



0.1.0 (2016-09-30)
------------------
* First release on PyPI.
