A Flask blueprint that provides a faceted search interface for bibliographies based on Zotero.

Kerko

Kerko is a web application component implemented in Python for the Flask framework that provides a user-friendly search and browsing interface for sharing a bibliography managed with the Zotero reference manager.

The combination of Kerko and Zotero gives you the best of both worlds: a rich but easy to use web interface for end-users of the bibliography, and a well-established and powerful bibliographic reference management tool for individuals or teams working on the bibliography's content.

How it works

A Kerko-powered bibliography is managed using Zotero, and stored in the cloud on zotero.org, while Kerko itself is incorporated into an application which is installed on a web server. The bibliographic references may reside in a Zotero group library, where multiple users may collaborate to manage the content, or in a Zotero private library. On the web server, Kerko maintains a search index, which is a copy of the Zotero library that is optimized for search. When users interact with the web application, Kerko gets all the required data from that search index, without ever contacting zotero.org. It is through a scheduled task, which runs at regular intervals, that Kerko automatically brings its search index up to date by using the Zotero Web API to retrieve the latest data from zotero.org.

As a Flask blueprint (a "blueprint" is Flask's term for an application component, similar to what some other systems might call a plugin or an extension), Kerko only works when incorporated into a Flask application. However, a sample stand-alone application is available, KerkoApp, which is pre-built with Kerko and ready to be deployed on a web server. KerkoApp might work for you if you like the default appearance and if the provided configuration options are sufficient for your needs; otherwise, you should probably consider building a custom application. In a custom application, the Kerko-powered bibliography might be just one section of a larger website.

Demo site

A KerkoApp-based demo site is available for you to try. You may also view the Zotero library that contains the source data for the demo site.

Features

The following features are implemented in Kerko:

  • Faceted search interface: allows exploration of the bibliography both in search mode and in browsing mode, potentially suiting different user needs, behaviors and abilities. For example, users with a prior idea of the topic or expected results are able to enter keywords or a more complex query in a search field, while those who wish to become familiar with the content of the bibliography or discover new topics may choose to navigate along the proposed facets, to narrow or broaden their search. Since both modes are integrated into a single interface, it is possible to combine them.
  • Keyword search features:
    • Boolean operators:
      • AND: matches items that contain all specified terms. This is the default relation between terms when no operator is specified, e.g., a b is the same as a AND b.
      • OR: matches items that contain any of the specified terms, e.g., a OR b.
      • NOT: excludes items that match the term, e.g., NOT a.
      • Boolean operators must be specified in uppercase and may be translated in other languages.
    • Logical grouping (with parentheses), e.g., (a OR b) AND c.
    • Sequence of words (with double quotes), e.g., "a b c". The default difference between word positions is 1, meaning that an item will match if it contains the words next to each other, but a different maximum distance may be selected (with the tilde character), e.g. "web search"~2 allows up to 1 word between web and search, meaning it could match web site search as well as web search.
    • Term boosting (with the caret), e.g., faceted^2 search browsing^0.5 specifies that faceted is twice as important as search when computing the relevance score of results, while browsing is half as important. Boosting may be applied to a logical grouping, e.g., (a b)^3 c.
    • Keyword search is case-insensitive, accents are folded, and punctuation is ignored. To further improve recall (albeit at the cost of precision), stemming is also performed on terms from most text fields, e.g., title, abstract, notes. Stemming relieves the user from having to specify all variants of a word when searching, e.g., terms such as search, searches, and searching all return the same results. The Snowball algorithm is used for that purpose.
    • Full-text search: the text content of PDF attachments can be searched.
    • Scope of search: users may choose to search everywhere, in author/contributor names, in titles, in all fields (i.e., in metadata and notes), or in documents (i.e., in the text content of attachments). Applications may provide additional choices.
  • Faceted browsing: allows filtering by topic (Zotero tag), by resource type (Zotero item type), and by publication year. Moreover, an application may define facets modeled on collections and subcollections; in such a case, any collection can be represented as a facet, and each subcollection as a value within that facet. By taking advantage of Zotero's ability to assign any given item to multiple collections, a faceted classification scheme can be modeled (including hierarchies within facets).
  • Relevance scoring: provided by the Whoosh library and based on the BM25F algorithm, which determines how important a term is to a document in the context of the whole collection of documents, while taking into account its relation to document structure (in this regard most fields are neutral, but the score is boosted when a term appears in specific fields, e.g., DOI, ISBN, ISSN, title, author/contributor). Any keyword search asks the question "how well does this document match this query clause?", which requires calculating a relevance score for each document. Filtering with facets, on the other hand, has no effect on the score because it asks "does this document match this query clause?", which leads to a yes or no answer.
  • Sort options: by relevance score (only applicable with keyword search), by publication date, by author, by title.
  • Citation styles: any from the Zotero Style Repository, or custom stylesheet defined in the Citation Style Language (stylesheet must be accessible by URL).
  • Language support: the default user interface is in English, but some translations are provided. Additional translations may be created using gettext-compatible tools; see Translating Kerko. Also to consider: locales supported by the Zotero Data Schema (which provides the names of fields, item types and author types displayed by Kerko); languages supported by Whoosh (which provides the search capabilities), i.e., ar, da, nl, en, fi, fr, de, hu, it, no, pt, ro, ru, es, sv, tr.
  • Responsive design: the simple default implementation works on large monitors as well as on small screens. It is based on Bootstrap.
  • Customizable front-end: applications may partly or fully replace the default templates, scripts and stylesheets with their own.
  • Semantic markup: pages generated by Kerko embed HTML markup that can be detected by web crawlers (helping the indexing of your records by search engines) or by web browsers (allowing users of reference management tools to easily import metadata in their library). Supported schemes are:
    • OpenURL COinS, in search results pages and individual bibliographic record pages. COinS is recognized by many reference management tools, including the Zotero Connector browser extension.
    • Highwire Press tags, in the individual bibliographic record pages of book, conference paper, journal article, report or thesis items. These tags are recommended for indexing by Google Scholar, and are recognized by many other databases and reference management tools, including the Zotero Connector browser extension.
  • Exporting: users may export individual records as well as complete bibliographies corresponding to search results. By default, download links are provided for the RIS and BibTeX formats, but applications may be configured to export any format supported by the Zotero API.
  • Printing: stylesheets are provided for printing individual bibliographic records as well as lists of search results. When printing search results, all results get printed (not just the current page of results).
  • Notes and attachments: notes, attached files, and attached links to URIs are synchronized from zotero.org and made available to users of the bibliography. Regular expressions may be used to include or exclude such child items from the bibliography, based on their tags.
  • DOI, ISBN and ISSN resolver: items that have such an identifier in your library can be referenced by appending their identifier to your Kerko site's base URL.
  • Relations: bibliographic record pages show links to related items, if any. You may define such relations using Zotero's Related field. Moreover, Kerko adds the Cites and Cited by relation types, which can be managed in Zotero through notes (see Kerko Recipes). Custom applications can add more types of relations if desired.
  • Badges: icons can be displayed next to items, based on custom conditions.
  • Integration: although a standalone application is available, Kerko itself is not an application, but it can be integrated into any Flask application.
  • Command line interface: Kerko provides commands for synchronizing or deleting its data. These can be invoked through the flask command (see Command line interface).
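
To make the relevance scoring item above more concrete, here is a minimal sketch of a simplified BM25F-style score for a single term. It is not Whoosh's implementation: the function name bm25f_term_score, the field boosts, and the k1 and b values are illustrative assumptions, and documents are modeled as plain dictionaries mapping field names to lists of words.

```python
import math


def bm25f_term_score(term, doc, corpus, boosts, k1=1.2, b=0.75):
    """Score one term for one document with a simplified BM25F formula.

    `doc` and every entry of `corpus` map a field name to a list of words.
    `boosts` maps a field name to its weight (fields not listed weigh 1.0).
    All names and default values here are illustrative assumptions.
    """
    n_docs = len(corpus)
    # Number of documents that contain the term in at least one field.
    n_match = sum(
        1 for d in corpus if any(term in words for words in d.values())
    )
    if n_match == 0:
        return 0.0
    # Rare terms get a higher inverse document frequency.
    idf = math.log(1 + (n_docs - n_match + 0.5) / (n_match + 0.5))

    weighted_tf = 0.0
    for field, words in doc.items():
        avg_len = sum(len(d.get(field, [])) for d in corpus) / n_docs
        if not words or avg_len == 0:
            continue
        # Normalize by field length, then apply the field boost.
        norm = 1 - b + b * len(words) / avg_len
        weighted_tf += boosts.get(field, 1.0) * words.count(term) / norm
    # Saturation: repeated occurrences add less and less to the score.
    return idf * weighted_tf / (k1 + weighted_tf)
```

With a boost on the title field, a document that has the term in its title scores higher than one that has it only in its abstract, and a document without the term scores zero. A facet filter, by contrast, would only keep or drop each document without touching these scores.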

Requirements

Kerko requires Python 3.7 or later.

Dependencies

The following packages will be automatically installed when installing Kerko:

  • Babel: utilities for internationalization and localization.
  • Bootstrap-Flask: helper for integrating Bootstrap.
  • Flask: web application framework.
  • Flask-Babel: helps Kerko provide its own translations, at the blueprint level.
  • Flask-WTF: simple integration of Flask and WTForms.
  • Jinja2: template engine.
  • Pyzotero: Python client for the Zotero API.
  • w3lib: URL and HTML manipulation utilities.
  • Werkzeug: WSGI web application library (also required by Flask).
  • Whoosh: pure Python full-text indexing and searching library.
  • WTForms: web forms validation and rendering library.

The following front-end resources are loaded from CDNs by Kerko's default templates (but could be completely removed or replaced by your application):

  • Bootstrap: front-end component library for web applications.
  • FontAwesome: beautiful open source icons.
  • jQuery: JavaScript library (required by Bootstrap).
  • Popper.js: JavaScript library for handling tooltips, popovers, etc. (used by Bootstrap).

Getting started

This section only applies if you intend to integrate Kerko into your own application. If you are more interested in the standalone KerkoApp application, please refer to its installation instructions.

We'll assume that you have some familiarity with Flask and suggest steps for building a minimal app (let's call it hello_kerko.py) to get you started.

  1. The first step is to install Kerko. As with any Python library, it is highly recommended to install Kerko within a virtual environment.

    Once the virtual environment is set and active, use the following command:

    pip install kerko
    
  2. In hello_kerko.py, configure variables required by Kerko and create your app object, as in the example below:

    import pathlib
    
    from flask import Flask
    from kerko.composer import Composer
    
    app = Flask(__name__)
    app.config['SECRET_KEY'] = '_5#y2L"F4Q8z\n\xec]/'  # Replace this value.
    app.config['KERKO_ZOTERO_API_KEY'] = 'xxxxxxxxxxxxxxxxxxxxxxxx'  # Replace this value.
    app.config['KERKO_ZOTERO_LIBRARY_ID'] = '9999999'  # Replace this value.
    app.config['KERKO_ZOTERO_LIBRARY_TYPE'] = 'group'  # Replace this value if necessary.
    app.config['KERKO_DATA_DIR'] = str(pathlib.Path(__file__).parent / 'data' / 'kerko')
    app.config['KERKO_COMPOSER'] = Composer()
    
    • SECRET_KEY: This variable is required for generating secure tokens in web forms. It should have a secure, random value and it really has to be secret. It is usually set in an environment variable rather than in Python code, to make sure it never ends up in a code repository. But here we're taking the minimal route and thus are cutting some corners!
    • KERKO_ZOTERO_API_KEY, KERKO_ZOTERO_LIBRARY_ID and KERKO_ZOTERO_LIBRARY_TYPE: These variables are required for Kerko to be able to access your Zotero library. See Configuration variables for details on how to properly set these variables.
    • KERKO_DATA_DIR: This variable specifies the directory where to store the search index and the file attachments. If the specified directory does not already exist, Kerko will try to create it.
    • KERKO_COMPOSER: This variable specifies key elements needed by Kerko, e.g., fields for display and search, facets for filtering. These are defined by instantiating the Composer class. Your application may manipulate the resulting object at configuration time to add, remove or alter fields, facets, sort options, search scopes, record download formats, or badges. See Kerko Recipes for some examples.
  3. Also configure the Flask-Babel and Bootstrap-Flask extensions:

    from flask_babel import Babel
    from flask_bootstrap import Bootstrap
    
    babel = Babel(app)
    bootstrap = Bootstrap(app)
    

    See the respective docs of Flask-Babel and Bootstrap-Flask for more details.

  4. Instantiate the Kerko blueprint and register it in your app:

    from kerko import blueprint as kerko_blueprint
    
    app.register_blueprint(kerko_blueprint, url_prefix='/bibliography')
    

    The url_prefix argument defines the base path for every URL provided by Kerko.

  5. In the same directory as hello_kerko.py with your virtual environment active, run the following shell commands:

    export FLASK_APP=hello_kerko.py
    flask kerko sync
    

    Kerko will retrieve your bibliographic data from zotero.org. If you have a large bibliography or large attachments, this may take a while (and there is no progress indicator). In production use, that command is usually added to the crontab file for regular execution (with enough time between executions for each to complete before the next one starts).

    To list all commands provided by Kerko:

    flask kerko --help
    
  6. Run your application:

    flask run
    
  7. Open http://127.0.0.1:5000/bibliography/ in your browser and explore the bibliography.

You have just built a really minimal application for Kerko. This code example is available at KerkoStart. See also KerkoApp for a slightly more complete example.
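
As noted in step 2, secrets such as SECRET_KEY are usually read from environment variables rather than written in Python code. The sketch below shows one possible way to do that for the minimal app. The helper name load_settings is hypothetical (it is not part of Kerko), and using environment variables named after the configuration variables is an assumption made for this example.

```python
import os

# Configuration variables that this sketch expects to find in the environment.
REQUIRED = (
    "SECRET_KEY",
    "KERKO_ZOTERO_API_KEY",
    "KERKO_ZOTERO_LIBRARY_ID",
    "KERKO_ZOTERO_LIBRARY_TYPE",
)


def load_settings(environ=None):
    """Return the required settings, or fail early if any is missing.

    Hypothetical helper, not part of Kerko. `environ` defaults to the
    process environment; passing a plain dict makes the function easy to test.
    """
    if environ is None:
        environ = os.environ
    missing = [name for name in REQUIRED if not environ.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
        )
    return {name: environ[name] for name in REQUIRED}
```

In hello_kerko.py, the hard-coded assignments could then be replaced with app.config.update(load_settings()), with the values exported in the shell before running the flask command.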

Configuration variables

The variables below are required and have no default values:

  • KERKO_COMPOSER: An instance of the kerko.composer.Composer class.
  • KERKO_DATA_DIR: The directory where to store the search index and the file attachments. Subdirectories index and attachments will be created if they do not already exist.
  • KERKO_ZOTERO_API_KEY: Your API key, as created on zotero.org.
  • KERKO_ZOTERO_LIBRARY_ID: The identifier of the library to get data from. For your personal library this value should be your userID, as found on https://www.zotero.org/settings/keys (you must be logged-in). For a group library this value should be the groupID of the library, as found in the URL of that library (e.g., in https://www.zotero.org/groups/2348869/kerko_demo, the groupID is 2348869).
  • KERKO_ZOTERO_LIBRARY_TYPE: The type of library to get data from, either 'user' for your personal library, or 'group' for a group library.

Any of the following variables may be added to your configuration if you wish to override their default value:

  • KERKO_CSL_STYLE: The citation style to use for formatted references. Can be either the file name (without the .csl extension) of one of the styles in the Zotero Styles Repository (e.g., apa) or the URL of a remote CSL file. Defaults to 'apa'.
  • KERKO_DOWNLOAD_ATTACHMENT_NEW_WINDOW: Open attachments in new windows, i.e., add the target="_blank" attribute to attachment links. Defaults to False.
  • KERKO_DOWNLOAD_CITATIONS_LINK: Provide a record download button on search results pages. Defaults to True.
  • KERKO_DOWNLOAD_CITATIONS_MAX_COUNT: Limit over which the record download button should be hidden from search results pages. Defaults to 0 (i.e. no limit).
  • KERKO_FACET_COLLAPSING: Allow collapsible facets. Defaults to False.
  • KERKO_FULLTEXT_SEARCH: Allow full-text search of PDF attachments. Defaults to True. To get consistent results, see Ensuring full-text indexing of your attachments in Zotero.
  • KERKO_HIGHWIREPRESS_TAGS: Embed Highwire Press tags into the HTML of item pages. This should help search engines such as Google Scholar index your items, but works only with book, conference paper, journal article, report or thesis items. Defaults to True (i.e. enabled).
  • KERKO_PAGE_LEN: The number of search results per page. Defaults to 20.
  • KERKO_PAGER_LINKS: Number of pages to show in the pager (not counting the current page). Defaults to 4.
  • KERKO_PRINT_ITEM_LINK: Provide a print button on item pages. Defaults to False.
  • KERKO_PRINT_CITATIONS_LINK: Provide a print button on search results pages. Defaults to False.
  • KERKO_PRINT_CITATIONS_MAX_COUNT: Limit over which the print button should be hidden from search results pages. Defaults to 0 (i.e. no limit).
  • KERKO_RELATIONS_INITIAL_LIMIT: Number of related items to show above the "view all" link. Defaults to 5.
  • KERKO_RELATIONS_LINKS: Show item links in lists of related items. Defaults to False. Enabling this only has an effect if at least one of the following variables is also set to True: KERKO_RESULTS_ATTACHMENT_LINKS, KERKO_RESULTS_URL_LINKS.
  • KERKO_RESULTS_ABSTRACTS: Show abstracts on search result pages. Defaults to False (abstracts are hidden).
  • KERKO_RESULTS_ABSTRACTS_TOGGLER: Show a button letting users show or hide abstracts on search results pages. Defaults to True (toggle is displayed).
  • KERKO_RESULTS_ABSTRACTS_MAX_LENGTH: Truncate abstracts at the given length (in number of characters). If text is to be truncated in the middle of a word, the whole word is discarded instead. Truncated text is appended with an ellipsis sign ("..."). Defaults to 0 (abstracts get displayed in their full length, without any truncation).
  • KERKO_RESULTS_ABSTRACTS_MAX_LENGTH_LEEWAY: If the length of an abstract only exceeds KERKO_RESULTS_ABSTRACTS_MAX_LENGTH by this tolerance margin (in number of characters), it will not be truncated. Defaults to 0 (no tolerance margin).
  • KERKO_RESULTS_ATTACHMENT_LINKS: Provide links to attachments in search results. Defaults to True.
  • KERKO_RESULTS_URL_LINKS: Provide links to online resources in search results (for items whose URL field has a value). Defaults to True.
  • KERKO_RESULTS_FIELDS: List of item fields to retrieve for search results (most notably used by the KERKO_TEMPLATE_SEARCH_ITEM template). Values in this list are keys identifying fields defined in the kerko.composer.Composer instance. One probably only needs to change the default list when overriding the template to display additional fields. Note that some fields from the default list may be required by other Kerko functions.
  • KERKO_TEMPLATE_SEARCH: Name of the Jinja2 template to render for the search page with list of results. Defaults to kerko/search.html.jinja2.
  • KERKO_TEMPLATE_SEARCH_ITEM: Name of the Jinja2 template to render for the search page with a single bibliographic record. Defaults to kerko/search-item.html.jinja2.
  • KERKO_TEMPLATE_ITEM: Name of the Jinja2 template to render for the bibliographic record view. Defaults to kerko/item.html.jinja2.
  • KERKO_TEMPLATE_LAYOUT: Name of the Jinja2 template that is extended by the search, search-item, and item templates. Defaults to kerko/layout.html.jinja2.
  • KERKO_TEMPLATE_BASE: Name of the Jinja2 template that is extended by the layout template. Defaults to kerko/base.html.jinja2.
  • KERKO_TITLE: The title to display in web pages. Defaults to 'Kerko'.
  • KERKO_ZOTERO_BATCH_SIZE: Number of items to request on each call to the Zotero API. Defaults to 100 (which is the maximum currently allowed by the API).
  • KERKO_ZOTERO_MAX_ATTEMPTS: Maximum number of tries after the Zotero API has returned an error or not responded during indexing. Defaults to 10.
  • KERKO_ZOTERO_WAIT: Time to wait (in seconds) between failed attempts to call the Zotero API. Defaults to 120.
  • Localization-related variables:
    • BABEL_DEFAULT_LOCALE: The default language of the user interface. Defaults to 'en'. Your application may set this variable and/or implement a locale selector function to override it (see the Flask-Babel documentation).
    • BABEL_DEFAULT_TIMEZONE: The timezone to use for user facing dates. Defaults to 'UTC'. Your application may set this variable and/or implement a timezone selector function to override it (see the Flask-Babel documentation). Any timezone name supported by the pytz package should work.
    • KERKO_USE_TRANSLATIONS: Use translations provided by the Kerko package. Defaults to True. When this is set to False, translations may be provided by the application's own translation catalog.
    • KERKO_WHOOSH_LANGUAGE: The language of search requests. Defaults to 'en'. You may refer to Whoosh's source to get the list of supported languages (whoosh.lang.languages) and the list of languages that support stemming (whoosh.lang.has_stemmer()).
    • KERKO_ZOTERO_LOCALE: The locale to use with Zotero API calls. This dictates the locale of Zotero item types, field names, creator types and citations. Defaults to 'en-US'. Supported locales are listed at https://api.zotero.org/schema, under "locales".
  • GOOGLE_ANALYTICS_ID: A Google Analytics property ID, e.g., 'UA-99999-9'. This variable is optional and there is no default value. If set, the Google Analytics tag is inserted into the pages.
  • Development/test-related variables:
    • KERKO_ZOTERO_START: Skip items, start at the specified position. Defaults to 0. Useful only for development/tests.
    • KERKO_ZOTERO_END: Load items from Zotero until the specified position. Defaults to 0 (no limit). Useful only for development/tests.

Caution: Many of the configuration variables cause changes to the structure of Kerko's cache or search index. Changing those variables may require that you rebuild the cache or the search index, and restart the application. See the command line interface for the cleaning and synchronization commands.
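
The interplay between KERKO_RESULTS_ABSTRACTS_MAX_LENGTH and KERKO_RESULTS_ABSTRACTS_MAX_LENGTH_LEEWAY described above can be sketched as follows. This is an illustration of the documented behavior only, not Kerko's code; the function name truncate_abstract is an assumption.

```python
import re


def truncate_abstract(text, max_length=0, leeway=0):
    """Truncate `text` following the rules documented for abstracts.

    Illustrative sketch only. A `max_length` of 0 disables truncation, and
    text that exceeds `max_length` by no more than `leeway` is left alone.
    """
    if max_length <= 0 or len(text) <= max_length + leeway:
        return text
    cut = text[:max_length]
    if not text[max_length].isspace():
        # The cut falls in the middle of a word: discard the partial word.
        cut = re.sub(r"\S+$", "", cut)
    return cut.rstrip() + "..."
```

For instance, with a maximum length of 10, the text "faceted search interface" would be cut inside "search", so that whole word is dropped before the ellipsis is appended.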

Synchronization process

Kerko does one-way data synchronization from zotero.org through a 3-step process:

  1. Synchronize the Zotero library into a local cache.
  2. Update the search index from the cache.
  3. Download the file attachments from Zotero.

The first step performs incremental updates of the local cache. After an initial full update, the subsequent synchronization runs will request only new and updated items from Zotero. This greatly reduces the number of Zotero API calls, and thus the time required to complete the synchronization process.

The second step reads data from the cache to update the search index. If the cache has changed since the last update, it performs a full update of the search index, otherwise it skips to the next step. Any changes to the search index are "committed" as a whole at the end of this step, thus up to that point any user using the application sees the data that was available prior to the synchronization run.

The third and last step reads the list of file attachments from the search index, with their MD5 hashes. It compares those with the available local copies of the files, and downloads new or changed files from Zotero. It also deletes any local files that may no longer be used.
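
The comparison performed in that third step can be sketched as below. This is a simplified illustration, not Kerko's implementation: the function names are hypothetical, and it assumes that each local file is named after the identifier used as key in the `indexed` mapping.

```python
import hashlib
from pathlib import Path


def md5_of(path):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_attachment_sync(indexed, attachments_dir):
    """Decide which files to download and which local files to delete.

    `indexed` maps a file name to the MD5 hash recorded in the search index
    (an assumed data shape for this sketch).
    """
    local = {p.name: p for p in Path(attachments_dir).iterdir() if p.is_file()}
    # New files, or files whose local copy no longer matches the hash.
    to_download = sorted(
        name
        for name, md5 in indexed.items()
        if name not in local or md5_of(local[name]) != md5
    )
    # Local files that are no longer referenced by the index.
    to_delete = sorted(name for name in local if name not in indexed)
    return to_download, to_delete
```

Comparing hashes rather than re-downloading everything is what keeps repeated synchronization runs cheap when few attachments have changed.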

Normally, all synchronization steps are executed. But under certain circumstances it can be useful to execute a given step individually. For example, after changing some configuration settings, one may clean just the search index and rebuild it from the cache (see the command line interface below), which will be much faster than re-synchronizing from Zotero.

Command line interface (CLI)

Kerko provides an integration with the Flask command line interface. The flask command will work with your virtual environment active, and with the FLASK_APP environment variable set to tell it where to find your application.

Some frequently used commands are:

# List all commands provided by Kerko:
flask kerko --help

# Delete all of Kerko's data.
flask kerko clean

# Get help about the clean command:
flask kerko clean --help

# Synchronize everything from Zotero.
flask kerko sync

# Get help about the sync command:
flask kerko sync --help

# Delete the cache (the next sync will perform a full update from Zotero, but
# it will not re-download all file attachments).
flask kerko clean cache

# Delete just the search index.
flask kerko clean index

# Synchronize just the search index.
flask kerko sync index

Known limitations

  • The system can probably handle relatively large bibliographies (it has been tested so far with ~15k entries), but the number of distinct facet values has more impact on response times. For the best response times, it is recommended to limit the number of distinct facet values to a few hundred.
  • Kerko can only manage a single bibliography per application.
  • Although Kerko can be integrated in a multilingual web application where the visitor may select a language, Zotero does not provide a way to manage tags or collections in multiple languages. Thus, there is no easy way for Kerko to provide those names in the user's language.
  • Whoosh does not provide much out-of-the-box support for non-Western languages. Therefore, search might not work very well with such languages.
  • Zotero is the sole reference management tool supported as a back-end to Kerko.

Design choices

  • Do not build a back-end. Let Zotero act as the "content management" system.
  • Allow Kerko to integrate into richer web applications.
  • Only implement in Kerko features that are related to the exploration of a bibliography. Let other parts of the web application handle all other features that might be needed.
  • Use a lightweight framework (Flask) to avoid carrying many features that are not needed.
  • Use pure Python dependencies to keep installation and deployment simple. Hence the use of Whoosh for search, for example, instead of Elasticsearch or Solr.
  • Use a classic fullstack architecture. Keep it simple and avoid asset management. Some will want to replace the templates and stylesheets anyway.

Kerko Recipes

TODO: More recipes!

Ensuring full-text indexing of your attachments in Zotero

Kerko's full-text indexing relies on text content extracted from attachments by Zotero. Consequently, for Kerko's full-text search to work, you must make sure that full-text indexing works in Zotero first; see Zotero's documentation on full-text indexing.

Individual attachments in Zotero can be indexed, partially indexed, or unindexed. Various conditions may cause an attachment to be partially indexed or unindexed, e.g., file is large, has not been processed yet, or does not contain text.

Zotero shows the indexing status in the attachment's right pane. If it shows "Indexed: Yes", all is good. If it shows "Indexed: No" or "Indexed: Partial", then clicking the "Reindex Item" button (next to the indexing status) should ensure that the attachment gets fully indexed, that is if the file actually contains text. If there is no "Reindex Item" button, it probably means that Zotero does not support that file type for full-text indexing (at the moment, it only supports PDF and plain text files).

It can be tedious to go through hundreds of attachments just to find out whether they are indexed or not. To make things easier, you could create a saved search in your Zotero library to get an always up-to-date list of unindexed PDFs. Use the following search conditions:

  • Match all of the following:
    • Attachment File Type is PDF
    • Attachment Content does not contain . (that's a period; also select RegExp in the small dropdown list, as that will make the period match any character)

Controlling the indexing status will not only improve full-text search on your Kerko site, but also full-text search from within Zotero!

Providing Cites and Cited by relations

Zotero allows one to link items together through its Related field. However, such relations are not typed nor directed, making it impossible (1) to indicate the nature of the relation, or (2) to distinguish which of two related items is the citing entity, and which is the one being cited. Consequently, Kerko has its own method for setting up those relations.

To establish Cites relations in your Zotero library, you must follow the procedure below:

  • Install the Zutilo plugin for Zotero. Once it is installed, go to Tools > Zutilo Preferences... in Zotero. Then, under Zotero item menu, select Zotero context menu next to the Copy Zotero URIs menu item. This configuration step only needs to be done once.
  • Select one or more items from your library that you wish to show as cited by another. Right-click on one of the selected items to open the context menu, and select Copy Zotero URIs from that menu. This copies the references of the selected items to the clipboard.
  • Right-click the item from your library that cites the items. Select Add Note from that item's context menu to add a child note.
  • In the note editor, paste the content of the clipboard. The note should then contain a series of URIs looking like https://www.zotero.org/groups/9999999/items/ABCDEFGH or https://www.zotero.org/users/9999999/items/ABCDEFGH.
  • At the bottom of the note editor, click into the Tags field and type _cites. That tag will tell Kerko that this particular note is special: it contains relations.

At the next synchronization, Kerko will retrieve the references found in notes tagged with _cites. Afterwards, proper hyperlinked citations will appear in the Cites and Cited by sections of the related bibliographic records.

Remarks:

  • Enter only the Cites relations. The reverse Cited by relations will be inferred automatically.
  • You may only relate items that belong to the same Zotero library.
  • You may use Zotero Item Selects (URIs starting with zotero://select/) in the notes, if you prefer those to Zotero URIs.
  • If entered as plain text, URIs must be separated by one or more whitespace character(s). Alternatively, URIs may be entered in HTML links, i.e., in the href attribute of <a> elements.
  • Hopefully, Zotero will provide nicer ways for handling relation types in the future. In the meantime, using child notes is how Kerko handles it. If relation types are important to you, consider describing your use case in the Zotero forums.
  • Custom Kerko applications can provide more types of relations, if desired, in addition to Cites and Cited by.
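
To illustrate how references might be read from such a note, here is a minimal sketch that extracts item keys from Zotero URIs and Zotero Item Selects, whether they appear as plain text or inside href attributes. It is not Kerko's parser. The function name extract_cited_keys, the 8-character key pattern, and the exact shape assumed for zotero://select/ URIs are assumptions made for this example.

```python
import re

# Matches the two URI shapes shown above, plus zotero://select/ URIs
# (assumed here to end with ".../items/<KEY>"), and captures the item key.
ITEM_URI = re.compile(
    r"(?:https?://(?:www\.)?zotero\.org/(?:groups|users)/\d+/items/"
    r'|zotero://select/[^\s"<>]*?items/)'
    r"([A-Z0-9]{8})"
)


def extract_cited_keys(note):
    """Return the item keys found in a note, in order, without duplicates."""
    keys = []
    for key in ITEM_URI.findall(note):
        if key not in keys:
            keys.append(key)
    return keys
```

Because the pattern looks for URIs anywhere in the text, the same function handles whitespace-separated plain text and HTML links alike.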

Translating Kerko

Kerko can be translated using Babel's setuptools integration.

The following commands should be executed from the directory that contains setup.py, and the appropriate virtual environment must have been activated beforehand.

Create or update the PO template (POT) file:

python setup.py extract_messages

Create a new PO file (for a new locale) based on the POT file. Replace YOUR_LOCALE with the appropriate language code, e.g., de, es, fr:

python setup.py init_catalog --locale YOUR_LOCALE

Update an existing PO file based on the POT file:

python setup.py update_catalog --locale YOUR_LOCALE

Compile MO files:

python setup.py compile_catalog

You are welcome to contribute your translation. See Submitting a translation.

Contributing

Reporting issues

Issues may be submitted on Kerko's issue tracker. Please consider the following guidelines:

  • Make sure that the same issue has not already been reported or fixed in the repository.
  • Describe what you expected to happen.
  • If possible, include a minimal reproducible example to help others identify the issue.
  • Describe what actually happened. Include the full traceback if there was an exception.

Making code changes

Clone the Kerko repository into a local directory. Set up a virtual environment, then install this local version of Kerko in the virtual environment, including development and testing dependencies by running the following command from Kerko's root directory, i.e., where setup.cfg resides:

pip install -e .[dev,tests]

Running the tests

To run basic tests in your current environment:

python -m unittest

To check code coverage as well, use this command instead:

coverage run -m unittest

Then generate the coverage report:

coverage report

Note: Test coverage is still very low at the moment. You are welcome to contribute new tests!

To run the full test suite under different environments (using the various Python interpreters available on your machine):

tox

Submitting code changes

Pull requests may be submitted against Kerko's repository. Please consider the following guidelines:

  • Before submitting, run the tests and make sure they pass. Add tests relevant to your change (those should fail if run without your patch).
  • Use Yapf to autoformat your code (with option --style='{based_on_style: facebook, column_limit: 100}'). Many editors provide Yapf integration.
  • Include a string like "Fixes #123" in your commit message (where 123 is the issue you fixed). See Closing issues using keywords.
  • If a Jinja2 template represents a page fragment or a collection of macros, prefix its file name with the underscore character.

Submitting a translation

Some guidelines:

  • The PO file encoding must be UTF-8.
  • The header of the PO file must be filled out appropriately.
  • All messages of the PO file must be translated.

Please submit your translation as a pull request against Kerko's repository, or by e-mail, with the PO file included as an attachment (do not copy the PO file's content into an e-mail's body, since that could introduce formatting or encoding issues).

Supporting the project

Nurturing an open source project such as Kerko, following up on issues, and helping others work with the system is a lot of work. Hiring the original developers of Kerko is one way to help ensure continued support and development of the project.

If you need professional support related to Kerko, have requirements not currently implemented in Kerko, want to make sure that some Kerko issue important to you gets resolved, or if you just like our work and would like to hire us for an unrelated project, please e-mail us.

Changelog

For a summary of changes by release version, see the changelog.

Project background

Kerko was inspired by two prior projects:

Later on, it became clear that other organizations needed a similar solution. However, software from the prior projects had to be rewritten so it could more easily be configured for different bibliographies from organizations with different needs. That led to Kerko, whose development was made possible through the following project:

Etymology

The name Zotero reportedly derives from the Albanian word zotëroj, which means "to learn something extremely well, that is to master or acquire a skill in learning" (Source: Mark Dingemanse, 2008, Etymology of Zotero).

The name Kerko is a nod to Zotero as it takes a similar etymological route: it derives from the Albanian word kërkoj, which means "to ask, to request, to seek, to look for, to demand, to search" and seems fit to describe a search tool.

Powered by Kerko

The following online bibliographies are powered by Kerko:

If you wish to add your Kerko-powered online bibliography to this list, please e-mail us or submit a pull request.

Changelog

0.8 (2021-11-16)

Warning: Upgrading from version 0.7.x or earlier will require that you clean and re-sync your existing search index. Use the following commands, then restart the application:

flask kerko clean index
flask kerko sync

Features:

  • Allow full-text search of PDF attachments. This can be disabled by setting KERKO_FULLTEXT_SEARCH to False. Since this feature relies on Zotero's full-text indexing, you must make sure that it works in Zotero first; see Zotero's documentation.
  • Add new search scopes "Everywhere" (to search both metadata fields and the text content of attached documents) and "In documents" (to search the text content of attached documents). The scope "In all fields" searches all metadata fields, but not the text content of attached documents.
  • Display "View on {hostname}" links under search result items, for quick access to the items' URLs. These can be disabled by setting KERKO_RESULTS_URL_LINKS to False.
  • Move the "Read" buttons under search result items, as "Read document" links. These can now be disabled by setting KERKO_RESULTS_ATTACHMENT_LINKS to False.
  • Display DOI field values as hyperlinks (both in DOI fields, and in the Extra field when lines are prefixed with 'DOI:').
  • Add support for imported file attachments, e.g., PDF files imported in your Zotero library through the Zotero Connector. Previously, only "attached copies of files" were supported.
  • Standalone notes and file attachments are now allowed into the search index. Kerko filters them out of search results, but custom applications could search them. A new view, standalone_attachment_download, lets one retrieve a standalone file attachment.
  • Add configuration options for truncating long abstracts in search results (KERKO_RESULTS_ABSTRACTS_MAX_LENGTH and KERKO_RESULTS_ABSTRACTS_MAX_LENGTH_LEEWAY).
  • Embed Highwire Press tags in item pages. This is enabled by default but can be disabled by setting KERKO_HIGHWIREPRESS_TAGS to False.
  • Allow tracking with Google Analytics (optional).
  • Allow relations in child notes to be specified as HTML links, i.e., in the href attribute of <a> elements.
  • Allow inclusion or exclusion of items based on multiple tags (previously, only a single pattern could be checked).
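The interplay between the two abstract truncation options listed above (KERKO_RESULTS_ABSTRACTS_MAX_LENGTH and KERKO_RESULTS_ABSTRACTS_MAX_LENGTH_LEEWAY) can be sketched as follows. This is not Kerko's code. The function name and the ending marker are hypothetical, and the reading of "leeway" as a tolerance margin (an abstract that exceeds the maximum length by the leeway or less is left intact) is an assumption modeled on the same-named parameter of Jinja2's truncate filter.

```python
def truncate_abstract(text, max_length, leeway=0, end=" [...]"):
    """Truncate text to about max_length characters.

    Assumed semantics: text that exceeds max_length by `leeway`
    characters or less is returned unchanged.
    """
    if len(text) <= max_length + leeway:
        return text
    # Cut at the last space before the limit so that words stay whole.
    cut = text[:max_length].rsplit(" ", 1)[0]
    return cut + end
```

The leeway avoids truncating an abstract only to hide a handful of characters.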

Bug fixes:

  • Fix irrelevant sync warnings, from extractors running on attachment items.
  • Fix empty prev/next links in search pages metadata.

Other changes:

  • Make synchronization from Zotero much more efficient through incremental updates. Instead of performing a full synchronization each time, Kerko now retrieves just the newly added or updated items. This dramatically reduces the number of Zotero API calls (and time) required to update Kerko's search index. Note: More work is planned to eliminate some Zotero API calls that Kerko still makes early in the synchronization process and that could be avoided when its cache is already up-to-date.
  • Add a sync cache command to the command line interface.
  • On narrow screens, stack search form controls for better usability.
  • Respond with an HTTP 503 (Service Unavailable) when the search index is empty or unreadable.
  • Make sorts more efficient by setting the sortable Whoosh flag on relevant fields.
  • Leading and trailing underscore characters (_) are now trimmed from facet value labels. This happens after sorting the values, which means that the underscore can still be used as a prefix to alter the alphabetical order.
  • Support more timezone names. Timezone names such as 'US/Eastern' or 'Europe/London' previously did not work, and times could not be converted to daylight saving times.
  • Change labels:
    • "Print this citation" → "Print this record" (on item pages)
    • "Download this citation" → "Download this record" (on item & search pages)
  • Inject blocks in item Jinja2 template to facilitate theming.
  • Slightly increase some top/bottom margins.
  • Add the type HTML attribute to record download links.
  • Add the rel="alternate" HTML attribute to record download links on item pages. Also add a corresponding link element to the page head.
  • Added utilities for running automated integration tests. This will allow testing many areas of Kerko that previously could hardly be tested.
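The change to facet value labels described above (underscores trimmed after sorting) can be illustrated with a short sketch. It is an illustration only, not Kerko's code: the function name is hypothetical, and it uses Python's default code point ordering, which may differ from the ordering Kerko applies.

```python
def facet_labels(values):
    """Sort raw facet values, then strip leading/trailing underscores for display."""
    return [value.strip("_") for value in sorted(values)]


# The underscore affects the position of a value, but is not displayed.
labels = facet_labels(["Zebra", "_Other", "Apple"])
```

Because trimming happens after sorting, a prefix such as the underscore still influences where a value lands in the list. In code point order it sorts after uppercase letters and before lowercase ones.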

Backwards incompatible changes:

  • Remove deprecated kerko index CLI command (use kerko sync instead).

Possibly backwards incompatible changes (more or less internal API changes):

  • Upgrade many dependencies, including new major versions of Flask (2.x), Jinja2 (3.x), Werkzeug (2.x), Click (8.x).
  • The default list for the KERKO_RESULTS_FIELDS setting now includes the 'url' field. If you have overridden that setting in your application and KERKO_RESULTS_URL_LINKS is enabled, you'll probably have to add 'url' too.
  • The schema field item_type has been renamed to item_type_label. If you have custom templates, please review any use of item.item_type.
  • The structure of the kerko/_search-result.html.jinja2 template has changed somewhat. If you have overridden it, you'll need to review the changes.
  • The ItemContext class has been eliminated. The Extractor.extract() method now receives an item's dictionary instead of an ItemContext object, and if an item has children these are now available directly in the item (with the children key). If you have created custom extractor classes, their extract() method will need to be adapted accordingly.
  • Some extractor classes have been renamed:
    • BaseAttachmentsExtractor → BaseChildAttachmentsExtractor
    • BaseNotesExtractor → BaseChildNotesExtractor
    • LinkedURIAttachmentsExtractor → ChildLinkedURIAttachmentsExtractor
    • NotesTextExtractor → ChildNotesTextExtractor
    • RawNotesExtractor → RawChildNotesExtractor
    • RelationsInNotesExtractor → RelationsInChildNotesExtractor
    • StoredFileAttachmentsExtractor → ChildFileAttachmentsExtractor
  • A view has been renamed:
    • item_attachment_download → child_attachment_download
  • A default field has been renamed:
    • alternateId → alternate_id

0.7.1 (2021-02-04)

Security fixes:

  • Fix unescaped date fields, causing a vulnerability to XSS attacks. This vulnerability was introduced in version 0.7.

Bug fixes:

  • Fix wrong locale separator in the HTML lang attribute.

Other changes:

  • Remove unwanted spacing after dropdown labels.

Documentation changes:

  • Fix missing info about library groupID in configuration docs. Thanks @drmikeuk for reporting the issue.

0.7 (2021-01-08)

Warning: Upgrading from version 0.6 or earlier will require that you clean and re-sync your existing search index. Use the following commands, then restart the application:

flask kerko clean index
flask kerko sync

Features:

  • Allow users to toggle the display of abstracts on search results pages.
  • Allow inclusion or exclusion of items based on their tags (#4).
  • Show attached links to URIs on item pages.
  • Show relations on item pages. The relation types provided by default are:
    • Related, based on Zotero's Related field.
    • Cites, managed through child notes containing Zotero URIs and tagged with the _cites tag.
    • Cited by, automatically inferred from Cites relations.
  • The Extra field is now searched when searching "in any fields".
  • Items that have a DOI, ISBN or ISSN identifier can be referenced by appending their identifier to your Kerko site's base URL.
  • Requests for the older URL of an item whose ID has changed are now automatically redirected to the item's current URL. This relies on the dc.replaces relation that's managed internally by Zotero on some operations such as item merges.
  • Help users who might mistakenly bookmark a search result's URL rather than the item's permanent URL: Add an id parameter to the search result URLs, and redirect the user to that item's permanent URL if the search result no longer matches because of database changes.
  • Redirect to the parent item's page when the user tries to request an attachment that no longer exists.
  • Improve accessibility based on WCAG recommendations and WAI-ARIA standards:
    • Add labels to search form elements.
    • Add landmark role search to the search form.
    • Make the purpose of various links more obvious through improved or added labels.
    • Add the aria-label attribute to many elements.
    • Add text to indicate the current value of widgets.
    • Add the aria-current attribute to indicate the current value of widgets.
    • Remove useless link to the current page from the pagination widget.

Bug fixes:

  • Fix crash when trying to sync a link attachment (#3).
  • Fix unhandled exception during sync when an attachment cannot be downloaded.
  • Fix page numbers greater than the page count in search URLs generating wrong page numbers for search result item URLs.
  • Fix secondary keys getting sorted in reverse order with some sort options, e.g., when sorting by newest first, results having the same date were then sorted by creator name in reverse alphabetical order instead of alphabetical order.
  • Fix empty HTML element taking up horizontal space when there are no badges.
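The secondary sort key issue mentioned in the bug fixes above is a common pitfall, which the following sketch reproduces in plain Python. It does not reflect how Kerko sorts results internally; the field names and functions are hypothetical. Reversing a composite key reverses every key, whereas two stable sorts keep the secondary key in ascending order.

```python
from operator import itemgetter


def naive_newest_first(items):
    """Reverse a composite key: the creator ends up in reverse order too."""
    return sorted(items, key=lambda i: (i["date"], i["creator"]), reverse=True)


def newest_first(items):
    """Sort by creator (ascending), then by date (descending).

    Python's sort is stable, even with reverse=True, so items that share
    the same date keep their alphabetical creator order.
    """
    by_creator = sorted(items, key=itemgetter("creator"))
    return sorted(by_creator, key=itemgetter("date"), reverse=True)


items = [
    {"date": "2020", "creator": "Baker"},
    {"date": "2021", "creator": "Young"},
    {"date": "2020", "creator": "Adams"},
]
```

With these items, newest_first() lists Young, Adams, Baker, while naive_newest_first() lists Young, Baker, Adams.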

Other changes:

  • Display ISO 8601 calendar dates in a more readable format, using the formatting style of the locale.
  • Show a timezone abbreviation along with time of last update from Zotero.
  • Add German translation. Thanks @mmoole.
  • Fix broken "Getting started" example in README.
  • Migrate most package distribution options and metadata from setup.py to setup.cfg.
  • Migrate project to a src layout.
  • Use Flask-Babel instead of its fork Flask-BabelEx, now that it has merged the translation domain features from Flask-BabelEx.

Backwards incompatible changes:

  • Drop support for Python 3.6. Kerko is no longer being tested under Python 3.6. Known issue with 3.6 at this point: some ISO 8601 dates cannot be parsed and reformatted; instead of being displayed in a locale-sensitive manner, these get displayed as is. More issues might arise in the future with Python 3.6 as Kerko continues to evolve.
  • All values of the pager dict passed to the _pager.html.jinja2 template are now lists. Previously, only the values at keys 'before' and 'after' were lists; now the values at keys 'previous', 'first', 'current', 'last', and 'next' are lists as well.
  • The words 'blacklist' and 'whitelist' in variable names are replaced with 'exclude' and 'include'.
  • The KERKO_RESULTS_ABSTRACT configuration variable is replaced by two variables, KERKO_RESULTS_ABSTRACTS (note the now plural form) and KERKO_RESULTS_ABSTRACTS_TOGGLER.
  • Citation download URLs now have the form {url_prefix}/{itemID}/export/{format} for individual items ('export' has been inserted), and {url_prefix}/export/{format}/ for search result pages ('download' has been replaced by 'export').
  • The Extractor class' interface has changed, improving consistency and separation of concerns:
    • All arguments to __init__() must now be specified as keyword arguments.
    • The extract() method no longer has a document argument, and the spec argument is now the last one. The method now returns a value instead of assigning it to the document.
    • The new extract_and_store() method handles extraction, encoding, and assignment to the document, assigning the value only when it is not None.
  • The AttachmentsExtractor class has been renamed to StoredFileAttachmentsExtractor.
  • InCollectionExtractor now extends collection membership to subcollections. To preserve the previous behavior, set the check_subcollections parameter to False when initializing the extractor.
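The new citation download URL forms listed above can be written out as a small sketch. The helper names are hypothetical and are not part of Kerko; the item ID and the 'ris' format key in the example are placeholders chosen for illustration.

```python
def item_export_url(url_prefix, item_id, fmt):
    """Build {url_prefix}/{itemID}/export/{format} for an individual item."""
    return f"{url_prefix.rstrip('/')}/{item_id}/export/{fmt}"


def search_export_url(url_prefix, fmt):
    """Build {url_prefix}/export/{format}/ for a search results page."""
    return f"{url_prefix.rstrip('/')}/export/{fmt}/"


# Placeholder values, for illustration only.
example = item_export_url("/bibliography", "ABCD1234", "ris")
```

Links or bookmarks that still use the older 'download' form would need to be updated to these patterns.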

Possibly backwards incompatible changes (more or less internal API changes):

  • The search_results variable passed to the search.html.jinja2 template is now an iterator of tuples, where the first element of each tuple is a result, and the second element the URL of the result.

0.6 (2020-06-15)

Security fixes:

  • Fix multiple vulnerabilities to XSS attacks. All previous versions of Kerko were vulnerable, thus an upgrade is highly recommended.

Backwards incompatible changes:

  • Remove default value for the KERKO_DATA_DIR configuration variable. KerkoApp users don't need to worry about this as KerkoApp takes care of it, but custom apps that did not already set this variable now have to.

Features:

  • Open PDF documents in the browser's built-in PDF viewer (instead of opening the browser's file download popup).
  • Add buttons for opening documents directly from search result pages (these replace the previous paperclip badges).
  • Add button at the top of item pages for opening documents (makes the availability of such documents much more obvious).
  • Add the KERKO_DOWNLOAD_ATTACHMENT_NEW_WINDOW configuration variable to control whether to open documents in a new window or in the same window.
  • Display the date and time of the last successful synchronization from Zotero at the bottom of search results.

Bug fixes:

  • Preserve newlines when displaying the value of the Extra field.
  • Preserve newlines when displaying abstracts in search result pages.
  • Fix filters missing on search pages that have no results.
  • Avoid empty box in print media when there are no search criteria.
  • Avoid empty box when the search index is missing.
  • Fix pluralization in CLI time elapsed messages.

Other changes:

  • Refer to attachments as "documents" in the interface, and replace the paperclip icon with a file icon.
  • Remove CSRF token from search form. Token expiration can impede legitimate users, and the token is unnecessary as the form does not change the application's state.
  • Add a proper message when none of the filters provided in the URL are recognized.
  • Improve documentation.
  • Add INFO-level log message to report successful synchronization from Zotero.
  • Add blocks in templates to facilitate theming.

Possibly backwards incompatible changes (more or less internal API changes):

  • Rename the content_with_badges template macro as badges, and leave it to the caller to display content.
  • Remove badges that are related to attachments.

0.5 (2019-11-19)

Warning: Upgrading from version 0.4 or earlier will require that you clean and re-sync your existing search index. Use the following commands:

flask kerko clean index
flask kerko sync

Features:

  • Add support for Zotero attachments.
  • Allow configuration of badges on items. The 'attachment' badge is provided by default, displaying an icon on items that have one or more attachments.
  • Add help modal.
  • Improve customizability:
    • Add KERKO_TEMPLATE_* configuration variables for page template names.
    • Use configurable, separate templates to render facets and badges (see the renderer argument to kerko.specs.FacetSpec, kerko.specs.BadgeSpec).
    • Add the KERKO_RESULTS_FIELDS configuration variable to specify which fields to retrieve with search queries.
  • Add building blocks for creating boolean facets based on collection membership (new class kerko.extractors.InCollectionExtractor, new parameters for kerko.codecs.BooleanFacetCodec).

Bug fixes:

  • Fix facets not ordered by weight on item page.
  • Preserve newlines in abstract display.
  • Fix incorrect use of bookmark link on item pages, set canonical link instead.
  • Prevent text overflow in some browsers on citations containing long URLs.

Other changes:

  • Deprecate CLI command kerko index in favor of new command kerko sync.
  • Change title of the "Refine" panel to "Explore".
  • Change labels of the "Print" and "Download" buttons to "Print this citation" and "Download this citation", to prevent any confusion with attachment downloading.
  • Show the facets in a more robust and accessible Bootstrap modal, on small screens, instead of the home-built drawer.
  • Use compact pagination widget on small screens.
  • Tweak sizing, positioning, and spacing of various UI elements.
  • Improve accessibility of various UI elements.
  • Make citation stand out more in item page.
  • Hide some elements and decorations in print media.
  • Make search query more efficient on item page.

Possibly backwards incompatible changes (more or less internal API changes):

  • Force keyword arguments with kerko.composer.Composer.__init__().
  • Rename kerko.composer.Composer.__init__() arguments default_note_whitelist_re as default_child_whitelist_re, default_note_blacklist_re as default_child_blacklist_re.
  • Rename method kerko.views.item() as kerko.views.item_view().
  • Rename template file _facet.html.jinja2 as _facets.html.jinja2.
  • Replace argument checkboxes in template macro field() with add_link_icon and remove_link_icon.

0.4 (2019-09-28)

Features:

  • Allow search term boosting in relevance score calculation, e.g. faceted^2 search browsing^0.5.

Security fixes:

  • Update minimum Werkzeug version to 0.15.3. See CVE-2019-14806: "Pallets Werkzeug before 0.15.3, when used with Docker, has insufficient debugger PIN randomness because Docker containers share the same machine id."

Other changes:

  • Update jQuery version to 3.4.1.
  • Update French translations (translate boolean search operators).
  • Improve search form validation and error display.
  • Disable not-so-intuitive boolean search operators (AndNot, AndMaybe, Require were unwanted but enabled by default by Whoosh's OperatorsPlugin).
  • Improve documentation.
  • Code cleanup.

0.3 (2019-07-29)

Features:

  • Exporting: users may export individual citations as well as complete bibliographies corresponding to search results. By default, download links are provided for the RIS and BibTeX formats, but applications may be configured to export any format supported by the Zotero API.

Bug fixes:

  • Fix bad alignment of field names in print mode.
  • Remove warning when indexing an item with no authors (#1).

Other changes:

  • Move print button to bottom of search pages (next to the new download dropdown).
  • Improve documentation.
  • Compile message catalog before building sdist and wheel.

Possibly backwards incompatible changes (more or less internal API changes):

  • Method kerko.composer.Composer.get_ordered_specs() replaces get_ordered_scopes(), get_ordered_facets() and get_ordered_sorts().

0.3alpha1 (2019-07-17)

  • Fix broken links in documentation.

0.3alpha0 (2019-07-16)

  • First PyPI release.

