A Flask blueprint that provides a faceted search interface for bibliographies based on Zotero.
Kerko
Kerko is a web application component for the Flask framework that provides a user-friendly search and browsing interface for sharing a bibliography managed with the Zotero reference manager.
How it works
Kerko does not provide any tools for managing bibliographic records. Instead, a well-established reference management software, Zotero, is used for that purpose. The Zotero desktop application provides powerful tools to individuals or teams for managing bibliographic data, which it stores in the cloud on zotero.org. Kerko can be configured to automatically synchronize its search index from zotero.org on a regular basis, ensuring that visitors get an up-to-date bibliography even if it is changing frequently. When users interact with the Kerko application component, Kerko gets all its data from its own search index; it is only at indexing time that Kerko contacts zotero.org.
The combination of Kerko and Zotero gives you the best of both worlds: a user-friendly interface for end-users of the bibliography, and a powerful bibliographic reference management tool for working on the bibliography's content.
Kerko is implemented in Python as a Flask blueprint and, as such, cannot do much unless it is incorporated into a Flask application. A sample application is available, KerkoApp, which anyone with basic requirements could deploy directly on a web server. It is expected, however, that Kerko will usually be integrated into a larger application, either derived from KerkoApp or custom-built to specific needs. The Kerko-powered bibliography might be just one section of a larger website.
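The separation between indexing time and query time described above can be illustrated with a toy sketch. It is not Kerko's implementation: the `LocalIndexSketch` class and the `fake_zotero_fetch` function are hypothetical names, and the hard-coded records stand in for data that would come from zotero.org.

```python
from typing import Callable, Dict, List


class LocalIndexSketch:
    """Toy stand-in for a search index that is filled only during sync."""

    def __init__(self, fetch_items: Callable[[], List[Dict[str, str]]]) -> None:
        self._fetch_items = fetch_items  # hypothetical remote call
        self._items: List[Dict[str, str]] = []
        self.remote_calls = 0

    def sync(self) -> None:
        """Rebuild the local index from the remote source (indexing time)."""
        self.remote_calls += 1
        self._items = list(self._fetch_items())

    def search(self, keyword: str) -> List[str]:
        """Answer a query from local data only; no remote call happens here."""
        needle = keyword.lower()
        return [item["title"] for item in self._items if needle in item["title"].lower()]


def fake_zotero_fetch() -> List[Dict[str, str]]:
    # Hard-coded records standing in for a real API response.
    return [{"title": "Faceted search basics"}, {"title": "Reference managers compared"}]


index = LocalIndexSketch(fake_zotero_fetch)
index.sync()                         # the only step that reaches the remote source
first = index.search("faceted")      # served from the local index
second = index.search("reference")   # served from the local index
```

However many searches are run, the remote source is contacted once per sync, which is why visitors are unaffected by the availability or speed of the remote service between synchronizations.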
Demo site
A demo site is available for you to try. You may also view the Zotero library that contains the source data for the demo site.
Features
The following features are implemented in Kerko:
- Faceted search interface: allows exploration of the bibliography both in search mode and in browsing mode, potentially suiting different user needs, behaviors and abilities. For example, users with a prior idea of the topic or expected results are able to enter keywords or a more complex query in a search field, while those who wish to become familiar with the content of the bibliography or discover new topics may choose to navigate along the proposed facets, to narrow or broaden their search. Since both modes are integrated into a single interface, it is possible to combine them.
- Keyword search features:
  - Boolean operators:
    - `AND`: matches items that contain all specified terms. This is the default relation between terms when no operator is specified, e.g., `a b` is the same as `a AND b`.
    - `OR`: matches items that contain any of the specified terms, e.g., `a OR b`.
    - `NOT`: excludes items that match the term, e.g., `NOT a`.
    - Boolean operators must be specified in uppercase and may be translated into other languages.
  - Logical grouping (with parentheses), e.g., `(a OR b) AND c`.
  - Sequence of words (with double quotes), e.g., `"a b c"`. The default maximum distance between word positions is 1, meaning that an item matches only if it contains the words next to each other. A different maximum distance may be selected with the tilde character, e.g., `"web search"~2` allows up to 1 word between `web` and `search`, meaning it could match `web site search` as well as `web search`.
  - Term boosting (with the caret), e.g., `faceted^2 search browsing^0.5` specifies that `faceted` is twice as important as `search` when computing the relevance score of results, while `browsing` is half as important. Boosting may be applied to a logical grouping, e.g., `(a b)^3 c`.
  - Keyword search is case-insensitive, accents are folded, and punctuation is ignored. To further improve recall (albeit at the cost of precision), stemming is also performed on terms from most text fields, e.g., title, abstract, notes. Stemming relieves the user from having to specify all variants of a word when searching, e.g., terms such as `search`, `searches`, and `searching` all return the same results. The Snowball algorithm is used for that purpose.
  - Field search: users may target all fields, author/contributor fields only, or titles only. Applications may provide additional choices.
- Faceted browsing: allows filtering by topic (Zotero tag), by resource type (Zotero item type), by publication year. Moreover, an application may define facets modeled on collections and subcollections; in such case, any collection can be represented as a facet, and each subcollection as a value within that facet. By taking advantage of Zotero's ability to assign any given item to multiple collections, a faceted classification scheme can be modeled (including hierarchies within facets).
- Relevance scoring: provided by the Whoosh library and based on the BM25F algorithm, which determines how important a term is to a document in the context of the whole collection of documents, while taking into account its relation to document structure (in this regard most fields are neutral, but the score is boosted when a term appears in specific fields, e.g., DOI, ISBN, ISSN, title, author/contributor). Any keyword search asks the question "how well does this document match this query clause?", which requires calculating a relevance score for each document. Filtering with facets, on the other hand, has no effect on the score because it asks "does this document match this query clause?", which leads to a yes or no answer.
- Sort options: by relevance score (only applicable with keyword search), by publication date, by author, by title.
- Citation styles: any from the Zotero Style Repository, or custom stylesheet defined in the Citation Style Language (stylesheet must be accessible by URL).
- Language support: the default user interface is in English, but some translations are provided. Additional translations may be created using gettext-compatible tools; see the Translating Kerko section below. Also to consider: locales supported by Zotero (which provides the names of fields, item types and author types displayed by Kerko), and languages supported by Whoosh (which provides the search capabilities): ar, da, nl, en, fi, fr, de, hu, it, no, pt, ro, ru, es, sv, tr.
- Responsive design: the simple default implementation works on large monitors as well as on small screens. It is based on Bootstrap.
- Customizable front-end: applications may partly or fully replace the default templates, scripts and stylesheets with their own.
- Semantic markup: users may easily import citations into their own reference manager software, either from search results pages or individual bibliographic record pages, both of which embed bibliographic metadata (using the OpenURL COinS model). Zotero Connector, for example, will automatically detect the metadata present in the page, and many other reference managers behave similarly.
- Exporting: users may export individual citations as well as complete bibliographies corresponding to search results. By default, download links are provided for the RIS and BibTeX formats, but applications may be configured to export any format supported by the Zotero API.
- Printing: stylesheets are provided for printing individual bibliographic records as well as lists of search results. When printing search results, all results get printed (not just the current page of results).
- Notes and attachments: notes and file attachments are synchronized from zotero.org and made available to users of the bibliography. Regular expressions may be used to include or exclude notes or attachments from the bibliography based on their tags.
- Modularity: although a standalone application is available, Kerko is designed not as a standalone application, but to be part of a larger Flask application.
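To make the keyword search rules above more concrete, here is a minimal sketch of two of them: text normalization (case folding, accent folding, punctuation removal) and word sequences with a maximum distance. It uses only the standard library and is a simplification written for illustration, not the Whoosh implementation that Kerko relies on. The `normalize` and `phrase_matches` functions are hypothetical names, and stemming is left out.

```python
import re
import unicodedata
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase, fold accents, and drop punctuation, returning word tokens."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.findall(r"[a-z0-9]+", without_accents.lower())


def phrase_matches(phrase: str, document: str, slop: int = 1) -> bool:
    """Return True if the phrase words occur in order, each one at most
    `slop` positions after the previous one (simplified semantics)."""
    words = normalize(phrase)
    tokens = normalize(document)
    if not words:
        return False

    def search_from(word_index: int, previous_position: int) -> bool:
        # All words of the phrase have been placed: the phrase matches.
        if word_index == len(words):
            return True
        last_allowed = min(previous_position + slop, len(tokens) - 1)
        for position in range(previous_position + 1, last_allowed + 1):
            if tokens[position] == words[word_index] and search_from(word_index + 1, position):
                return True
        return False

    # Try every occurrence of the first word as a starting point.
    return any(
        token == words[0] and search_from(1, position)
        for position, token in enumerate(tokens)
    )
```

With the default distance of 1, `phrase_matches("web search", "web site search")` is false, while passing `slop=2` makes it true, mirroring the `"web search"~2` example.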
Requirements
Kerko requires Python 3.6 or later.
Dependencies
The following packages will be automatically installed when installing Kerko:
- Babel: utilities for internationalization and localization.
- Bootstrap-Flask: helper for integrating Bootstrap.
- environs: helper for separating configuration from code.
- Flask: web application framework.
- Flask-BabelEx: allows Kerko to provide its own translations, at the blueprint level.
- Flask-WTF: simple integration of Flask and WTForms.
- Jinja2: template engine.
- Pyzotero: Python client for the Zotero API.
- Werkzeug: WSGI web application library (also required by Flask).
- Whoosh: pure Python full-text indexing and searching library.
- WTForms: web forms validation and rendering library.
The following front-end resources are loaded from CDNs by Kerko's default templates (but could be completely removed or replaced by your application):
- Bootstrap: front-end component library for web applications.
- FontAwesome: beautiful open source icons.
- jQuery: JavaScript library (required by Bootstrap).
- Popper.js: JavaScript library for handling tooltips, popovers, etc. (used by Bootstrap).
Getting started
This section only applies if you intend to integrate Kerko into your own application. If you are more interested in the standalone KerkoApp application, please refer to its installation instructions.
We'll assume that you have some familiarity with Flask and suggest steps for building a minimal app, let's call it `hello_kerko.py`, to get you started.
- The first step is to install Kerko. As with any Python library, it is highly recommended to install Kerko within a virtualenv.

  Once the virtualenv is set and active, use the following command:

      pip install kerko

- In `hello_kerko.py`, configure variables required by Kerko and create your `app` object, as in the example below:

      from flask import Flask
      from kerko.composer import Composer

      app = Flask(__name__)
      app.config['SECRET_KEY'] = '_5#y2L"F4Q8z\n\xec]/'  # Replace this value.
      app.config['KERKO_ZOTERO_API_KEY'] = 'xxxxxxxxxxxxxxxxxxxxxxxx'  # Replace this value.
      app.config['KERKO_ZOTERO_LIBRARY_ID'] = '9999999'  # Replace this value.
      app.config['KERKO_ZOTERO_LIBRARY_TYPE'] = 'group'  # Replace this value if necessary.
      app.config['KERKO_COMPOSER'] = Composer()

  - `SECRET_KEY`: This variable is required for generating secure tokens in web forms. It should have a secure, random value and it really has to be secret. It is usually set in an environment variable rather than in Python code, to make sure it never ends up in a code repository. But here we're taking the minimal route and thus are cutting some corners!
  - `KERKO_ZOTERO_API_KEY`, `KERKO_ZOTERO_LIBRARY_ID` and `KERKO_ZOTERO_LIBRARY_TYPE`: These variables are required for Kerko to be able to access your Zotero library. See the Configuration variables section for details on how to properly set these variables.
  - `KERKO_COMPOSER`: This variable specifies key elements needed by Kerko, e.g., fields for display and search, facets for filtering. These are defined by instantiating the `Composer` class. Your application may manipulate the resulting object at configuration time to add, remove or alter fields, facets, sort options, search scopes, citation download formats, or badges. See the Kerko Recipes section for some examples.

- Also configure the Flask-BabelEx and Bootstrap-Flask extensions:

      from flask_babelex import Babel
      from flask_bootstrap import Bootstrap

      babel = Babel(app)
      bootstrap = Bootstrap(app)

  See the respective docs of Flask-BabelEx and Bootstrap-Flask for more details.

- Import the Kerko blueprint and register it in your app:

      from kerko import blueprint as kerko_blueprint

      app.register_blueprint(kerko_blueprint, url_prefix='/bibliography')

  The `url_prefix` argument defines the base path for every URL provided by Kerko.

- In the same directory as `hello_kerko.py`, with your virtualenv active, run the following shell commands:

      export FLASK_APP=hello_kerko.py
      flask kerko sync

  Kerko will retrieve your bibliographic data from zotero.org. If you have a large bibliography or large attachments, this may take a while (and there is no progress indicator). In production use, that command is usually added to the crontab file for regular execution (with enough time between executions for each to complete before the next one starts).

  To list all commands provided by Kerko:

      flask kerko --help

- Run your application:

      flask run

- Open http://127.0.0.1:5000/bibliography/ in your browser and explore the bibliography.
You have just built a really minimal application for Kerko. Check KerkoApp for a slightly more complete example.
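The steps above hard-code `SECRET_KEY` and the Zotero API key for brevity. As noted, such values are usually kept in environment variables. The sketch below shows one way to do that with the standard library only; `load_secret_settings` is a hypothetical helper, not part of Kerko or Flask, and the fake environment exists only so the example runs on its own.

```python
import secrets
from typing import Dict, Mapping


def load_secret_settings(environ: Mapping[str, str]) -> Dict[str, str]:
    """Collect secret settings from environment variables.

    Raises KeyError if a variable is missing, so that a misconfigured
    deployment fails early instead of running with a placeholder value.
    """
    names = ("SECRET_KEY", "KERKO_ZOTERO_API_KEY")
    return {name: environ[name] for name in names}


# One way to produce a random value to store in the SECRET_KEY environment variable.
example_secret = secrets.token_hex(32)

# Demonstration with a fake environment. A real application would pass
# os.environ instead, then copy the result into its Flask configuration,
# e.g. with app.config.update(settings).
fake_environ = {
    "SECRET_KEY": example_secret,
    "KERKO_ZOTERO_API_KEY": "xxxxxxxxxxxxxxxxxxxxxxxx",
}
settings = load_secret_settings(fake_environ)
```

Keeping the lookup in one small function makes it easy to see which secrets the application expects, and the values never need to appear in the code repository.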
Configuration variables
The variables below are required and have no default values:
- `KERKO_COMPOSER`: An instance of the `kerko.composer.Composer` class.
- `KERKO_ZOTERO_API_KEY`: The API key associated with the library on zotero.org. You have to create that key.
- `KERKO_ZOTERO_LIBRARY_ID`: Your personal userID for API calls, as given on zotero.org (you must be logged in on zotero.org).
- `KERKO_ZOTERO_LIBRARY_TYPE`: The type of library on zotero.org (either `'user'` for your main personal library, or `'group'` for a group library).
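Because these variables have no defaults, an application may want to check them at startup. The following is a minimal sketch of such a check over any mapping of settings (such as a Flask `app.config`). The `find_config_problems` helper is hypothetical and not provided by Kerko.

```python
from typing import List, Mapping

REQUIRED_VARIABLES = (
    "KERKO_COMPOSER",
    "KERKO_ZOTERO_API_KEY",
    "KERKO_ZOTERO_LIBRARY_ID",
    "KERKO_ZOTERO_LIBRARY_TYPE",
)


def find_config_problems(config: Mapping[str, object]) -> List[str]:
    """Return human-readable problems found in the required settings."""
    problems = [
        f"{name} is missing"
        for name in REQUIRED_VARIABLES
        if config.get(name) is None
    ]
    library_type = config.get("KERKO_ZOTERO_LIBRARY_TYPE")
    if library_type is not None and library_type not in ("user", "group"):
        problems.append("KERKO_ZOTERO_LIBRARY_TYPE must be 'user' or 'group'")
    return problems
```

An application could call `find_config_problems(app.config)` right after loading its configuration and refuse to start if the returned list is not empty.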
Any of the following variables may be added to your configuration if you wish to override their default value:
- `KERKO_CSL_STYLE`: The citation style to use for formatted references. Can be either the file name (without the `.csl` extension) of one of the styles in the Zotero Styles Repository (e.g., `apa`) or the URL of a remote CSL file. Defaults to `'apa'`.
- `KERKO_DATA_DIR`: The directory in which to store the search index and the file attachments. Defaults to `data/kerko`. Subdirectories `index` and `attachments` will be created if they don't already exist.
- `KERKO_DOWNLOAD_CITATIONS_LINK`: Provide a citation download button on search results pages. Defaults to `True`.
- `KERKO_DOWNLOAD_CITATIONS_MAX_COUNT`: Limit over which the citation download button should be hidden from search results pages. Defaults to `0` (i.e. no limit).
- `KERKO_FACET_COLLAPSING`: Allow collapsible facets. Defaults to `False`.
- `KERKO_PAGE_LEN`: The number of search results per page. Defaults to `20`.
- `KERKO_PAGER_LINKS`: Number of pages to show in the pager (not counting the current page). Defaults to `8`.
- `KERKO_PRINT_ITEM_LINK`: Provide a print button on item pages. Defaults to `False`.
- `KERKO_PRINT_CITATIONS_LINK`: Provide a print button on search results pages. Defaults to `False`.
- `KERKO_PRINT_CITATIONS_MAX_COUNT`: Limit over which the print button should be hidden from search results pages. Defaults to `0` (i.e. no limit).
- `KERKO_RESULTS_ABSTRACT`: Show abstracts in search result pages. Defaults to `False`.
- `KERKO_RESULTS_FIELDS`: List of item fields to retrieve for use in search results pages (i.e. in the `KERKO_TEMPLATE_SEARCH` template). Values are keys identifying fields or facets assigned to the `kerko.composer.Composer` instance. Defaults to `['id', 'bib', 'coins']`. Note that `'data'` gets added to the list if `KERKO_RESULTS_ABSTRACT` is `True`, as well as any field that is required by badges.
- `KERKO_TEMPLATE_SEARCH`: Name of the Jinja2 template to render for the search page with list of results. Defaults to `kerko/search.html.jinja2`.
- `KERKO_TEMPLATE_SEARCH_ITEM`: Name of the Jinja2 template to render for the search page with a single bibliographic record. Defaults to `kerko/search-item.html.jinja2`.
- `KERKO_TEMPLATE_ITEM`: Name of the Jinja2 template to render for the bibliographic record view. Defaults to `kerko/item.html.jinja2`.
- `KERKO_TEMPLATE_LAYOUT`: Name of the Jinja2 template that is extended by the search, search-item, and item templates. Defaults to `kerko/layout.html.jinja2`.
- `KERKO_TEMPLATE_BASE`: Name of the Jinja2 template that is extended by the layout template. Defaults to `kerko/base.html.jinja2`.
- `KERKO_TITLE`: The title to display in web pages. Defaults to `'Kerko'`.
- `KERKO_ZOTERO_BATCH_SIZE`: Number of items to request on each call to the Zotero API. Defaults to `100` (which is the maximum currently allowed by the API).
- `KERKO_ZOTERO_MAX_ATTEMPTS`: Maximum number of tries after the Zotero API has returned an error or not responded during indexing. Defaults to `10`.
- `KERKO_ZOTERO_WAIT`: Time to wait (in seconds) between failed attempts to call the Zotero API. Defaults to `120`.
- Localization-related variables:
  - `BABEL_DEFAULT_LOCALE`: The default language of the user interface. Defaults to `'en'`. Your application may set this variable and/or implement a locale selector function to override it (see the Flask-BabelEx documentation).
  - `KERKO_USE_TRANSLATIONS`: Use translations provided by the Kerko package. Defaults to `True`. When this is set to `False`, translations may be provided by the application's own translation catalog.
  - `KERKO_WHOOSH_LANGUAGE`: The language of search requests. Defaults to `'en'`. You may refer to Whoosh's source to get the list of supported languages (`whoosh.lang.languages`) and the list of languages that support stemming (`whoosh.lang.has_stemmer()`).
  - `KERKO_ZOTERO_LOCALE`: The locale to use with Zotero API calls. This dictates the locale of Zotero item types, field names, creator types and citations. Defaults to `'en-US'`. Supported locales are listed at https://api.zotero.org/schema, under "locales".
- Development/test-related variables:
  - `KERKO_ZOTERO_START`: Skip items, start at the specified position. Defaults to `0`. Useful only for development/tests.
  - `KERKO_ZOTERO_END`: Load items from Zotero until the specified position. Defaults to `0` (no limit). Useful only for development/tests.
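The interplay between `KERKO_ZOTERO_MAX_ATTEMPTS` and `KERKO_ZOTERO_WAIT` can be pictured as a simple retry loop. The sketch below is not Kerko's code: `call_with_retries` is a hypothetical helper, and it assumes that the maximum counts the total number of attempts, which may differ from how Kerko counts them.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    request: Callable[[], T],
    max_attempts: int = 10,   # same value as the KERKO_ZOTERO_MAX_ATTEMPTS default
    wait: float = 120,        # same value as the KERKO_ZOTERO_WAIT default (seconds)
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request` until it succeeds or `max_attempts` tries have failed."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except Exception:
            if attempt == max_attempts:
                raise  # Out of attempts: let the last error propagate.
            sleep(wait)  # Pause before the next attempt.
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter exists only so the pause can be replaced in tests. With the default values, a run that fails every time would spend 9 pauses of 120 seconds, i.e. 18 minutes, before giving up, which is worth keeping in mind when scheduling synchronizations.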
Kerko Recipes
TODO
Known limitations
- The system can probably handle relatively large bibliographies (it has been tested so far with ~15k entries), but the number of distinct facet values has more impact on response times. For the best response times, it is recommended to limit the number of distinct facet values to a few hundred.
- Kerko can only manage a single bibliography per application.
- Although Kerko might be integrated in a multilingual web application where the visitor may select a language, Zotero does not provide a way to manage tags or collections in multiple languages. Thus, there is no easy way for Kerko to provide those names in the user's language.
- Whoosh does not provide much out-of-the-box support for non-Western languages. Therefore, search might not work very well with such languages.
- No reference management tool other than Zotero may serve as a back-end for Kerko.
Design choices
- Do not build a back-end. Let Zotero act as the "content management" system.
- Allow Kerko to integrate into richer web applications.
- Only implement in Kerko features that are related to the exploration of a bibliography. Let other parts of the web application handle all other features that might be needed.
- Use a lightweight framework (Flask) to avoid carrying many features that are not needed.
- Use pure Python dependencies to keep installation and deployment simple. Hence the use of Whoosh for search, for example, instead of Elasticsearch or Solr.
- Use a classic architecture for the front-end. Keep it simple and avoid asset management. Some will want to replace the front-end anyway.
Translating Kerko
Kerko can be translated using Babel's setuptools integration.
The following commands should be executed from the directory that contains `setup.py`, and the appropriate virtualenv must have been activated beforehand.
Create or update the PO template (POT) file:
python setup.py extract_messages
Create a new PO file (for a new locale) based on the POT file. Replace `YOUR_LOCALE` with the appropriate language code, e.g., `de`, `es`, `fr`:
python setup.py init_catalog --locale YOUR_LOCALE
Update an existing PO file based on the POT file:
python setup.py update_catalog --locale YOUR_LOCALE
Compile MO files:
python setup.py compile_catalog
You are welcome to contribute your translation. See the Submitting a translation section.
Contributing
Reporting issues
Issues may be submitted on Kerko's issue tracker. Please consider the following guidelines:
- Make sure that the same issue has not already been reported or fixed in the repository.
- Describe what you expected to happen.
- If possible, include a minimal reproducible example to help others identify the issue.
- Describe what actually happened. Include the full traceback if there was an exception.
Submitting code changes
Pull requests may be submitted against Kerko's repository. Please consider the following guidelines:
- Use Yapf to autoformat your code (with option `--style='{based_on_style: facebook, column_limit: 100}'`). Many editors provide Yapf integration.
- Include a string like "Fixes #123" in your commit message (where 123 is the issue you fixed). See Closing issues using keywords.
- If a Jinja2 template represents a page fragment or a collection of macros, prefix its file name with the underscore character.
Submitting a translation
Some guidelines:
- The PO file encoding must be UTF-8.
- The header of the PO file must be filled out appropriately.
- All messages of the PO file must be translated.
Please submit your translation as a pull request against Kerko's repository, or by e-mail, with the PO file included as an attachment (do not copy the PO file's content into an e-mail's body, since that could introduce formatting or encoding issues).
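Before submitting, the "all messages must be translated" guideline can be checked with a small script. The sketch below is a deliberately simplified reader written for illustration, not a full PO parser: `untranslated_msgids` is a hypothetical helper, it does not process escape sequences or fuzzy flags, and it treats a plural entry as translated as soon as any of its forms is filled in. Dedicated gettext tools remain the more reliable option.

```python
from typing import List, Optional


def untranslated_msgids(po_text: str) -> List[str]:
    """Return the msgid of every entry whose msgstr is empty."""
    entries: List[List[str]] = []   # [msgid, msgstr] pairs, in file order
    field: Optional[int] = None     # 0 while reading a msgid, 1 for a msgstr
    for raw_line in po_text.splitlines():
        line = raw_line.strip()
        if line.startswith("msgid "):
            entries.append(["", ""])
            field = 0
            quoted = line[len("msgid "):]
        elif line.startswith("msgstr") and entries and " " in line:
            field = 1
            quoted = line.split(" ", 1)[1]
        elif line.startswith('"') and field is not None:
            quoted = line  # continuation of the current msgid or msgstr
        else:
            field = None  # comment, blank line, or unsupported keyword
            continue
        entries[-1][field] += quoted.strip()[1:-1]  # drop the surrounding quotes
    # The header entry has an empty msgid and is therefore skipped.
    return [msgid for msgid, msgstr in entries if msgid and not msgstr]


sample_po = '''
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\\n"

# A translated entry.
msgid "Search"
msgstr "Rechercher"

msgid "Print"
msgstr ""

msgid ""
"Long "
"message"
msgstr ""
'''

missing = untranslated_msgids(sample_po)
```

To check a real catalog, the file would be read with `open(path, encoding="utf-8")`, which also fails loudly if the file is not valid UTF-8.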
Supporting the project
Nurturing an open source project such as Kerko, following up on issues, and helping others work with the system is a lot of work. Hiring the original developers of Kerko can go a long way toward ensuring continued support and development of the project.
If you need professional support related to Kerko, have requirements not currently implemented in Kerko, want to make sure that some Kerko issue important to you gets resolved, or if you just like our work and would like to hire us for an unrelated project, please e-mail us.
Project background
Kerko was inspired by two prior projects:
- Bibliographie sur l’histoire de Montréal, developed in 2014 by David Lesieur and Patrick Fournier, of Whisky Echo Bravo, for the Laboratoire d'histoire et de patrimoine de Montréal (Université du Québec à Montréal, Canada).
- Bibliography on English-speaking Quebec, developed in 2017 by David Lesieur, for the Quebec English-Speaking Communities Research Network (QUESCREN) (Concordia University, Canada).
Later on, it became clear that other organizations needed a similar solution. However, software from the prior projects had to be rewritten so it could more easily be configured for different bibliographies from organizations with different needs. That led to Kerko, whose development was made possible through the following project:
- Bibliographie francophone sur l'archivistique, funded by the Association internationale des archives francophones (AIAF) and hosted by the École de bibliothéconomie et des sciences de l’information (EBSI) (Université de Montréal, Canada).
Etymology
The name Zotero reportedly derives from the Albanian word zotëroj, which means "to learn something extremely well, that is to master or acquire a skill in learning" (Source: Etymology of Zotero).
The name Kerko is a nod to Zotero as it takes a similar etymological route: it derives from the Albanian word kërkoj, which means "to ask, to request, to seek, to look for, to demand, to search" and seems fit to describe a search tool.
Powered by Kerko
The following online bibliographies are powered by Kerko:
If you wish to add your Kerko-powered online bibliography to this list, please e-mail us or submit a pull request.