Similar articles for Pelican
A Pelican plugin that adds support for similar articles: for each article, readers get a list of related articles selected by a similarity calculation on their tags.
Installation
pip install pelican-similar-articles-light
# Or locally
python setup.py develop
Template integration
Bare version:
{% if article.similar_articles %}
<ul>
{% for sub_article in article.similar_articles %}
<li><a href="{{ SITEURL }}/{{ sub_article.url }}">{{ sub_article.title }}</a></li>
{% endfor %}
</ul>
{% endif %}
With Bootstrap and translation support:
{% if article.similar_articles %}
<div class="alert alert-warning text-left" role="alert">
<p><strong>{{ _("You might be interested in") ~ ' ' ~ ngettext("the following article:", "the following articles:", article.similar_articles|count) }}</strong></p>
<ul>
{% for sub_article in article.similar_articles %}
<li><a href="{{ SITEURL }}/{{ sub_article.url }}" class="alert-link">{{ sub_article.title }}</a></li>
{% endfor %}
</ul>
</div>
{% endif %}
Pelican configuration
In your pelicanconf.py, please add or update these lines:
PLUGINS += ['pelican.plugins.similar_articles_light',]
You can customize certain features of the plugin.
The default values are listed below; each can be overridden by a statement in the pelicanconf.py file.
The maximum number of similar articles:
SIMILAR_ARTICLES_MAX_COUNT = 2
The minimum score for an article to be considered similar:
SIMILAR_ARTICLES_MIN_SCORE = 0.0001
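As a rough illustration of how these two settings could interact, the sketch below filters and truncates a set of already computed scores. The function name select_similar and the "greater than or equal" comparison are assumptions made for this example; the plugin's actual code may differ.

```python
def select_similar(scores, max_count=2, min_score=0.0001):
    """Return up to max_count article names, best score first.

    Hypothetical helper: `scores` maps an article name to its similarity
    score. Assumption: candidates scoring below min_score are dropped.
    """
    # Keep only the candidates that reach the minimum score.
    kept = [(name, score) for name, score in scores.items() if score >= min_score]
    # Highest score first, then cut the list at max_count entries.
    kept.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in kept[:max_count]]


example_scores = {"a": 0.9, "b": 0.0, "c": 0.5, "d": 0.2}
print(select_similar(example_scores))
```

With the default values, an article sharing no tag with the current one (score 0) is never listed, and at most two articles are shown.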
About the implementation
The plugin computes a similarity score from the tags of the articles. It builds a global bag of words (a dictionary of all tags) and a bag of words for each article, so that each article is represented as an n-dimensional vector.
The terms are weighted using the TF-IDF method, according to their rarity within the corpus formed by all the tags of the site.
The vector of each article is then compared to all the others using cosine similarity, a measure widely used in text mining that is derived from the angle between two vectors. The maximum similarity is 1 (the documents have all their important tags in common), while the minimum is 0 (the documents have no tag in common).
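The approach described above can be sketched in a few lines of pure Python. This is a minimal illustration and not the plugin's actual code: the function names, the sample tags and the exact TF-IDF variant (term frequency multiplied by log(N / document frequency)) are assumptions chosen for the example.

```python
import math
from collections import Counter


def tfidf_vectors(tags_by_article):
    """Map each article name to a sparse {tag: weight} TF-IDF vector."""
    n_docs = len(tags_by_article)
    # Document frequency: the number of articles in which each tag appears.
    df = Counter(tag for tags in tags_by_article.values() for tag in set(tags))
    vectors = {}
    for name, tags in tags_by_article.items():
        tf = Counter(tags)
        # A rare tag gets a large weight; a tag used by every article gets 0.
        vectors[name] = {
            tag: (count / len(tags)) * math.log(n_docs / df[tag])
            for tag, count in tf.items()
        }
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if one is null)."""
    dot = sum(weight * v[tag] for tag, weight in u.items() if tag in v)
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical articles and tags, for illustration only.
tags_by_article = {
    "intro": ["python", "pelican"],
    "tips": ["python", "pelican"],
    "recipe": ["cooking"],
    "web": ["python", "django"],
}
vectors = tfidf_vectors(tags_by_article)
for other in ("tips", "recipe", "web"):
    print(other, cosine_similarity(vectors["intro"], vectors[other]))
```

In this sample, "intro" and "tips" carry identical tags and score close to 1, "recipe" shares no tag with "intro" and scores 0, and "web" shares only the common tag "python" and lands in between.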
Comparison with Similar Posts plugin
The Similar Posts plugin uses exactly the same technique, so I don't expect any difference in the results obtained. However, its dependencies are rather large and somewhat oversized for the intended purpose: comparing a few words (tags) that summarize each article among a handful of articles from a Pelican blog.
Similar Articles Light is implemented in pure Python. That said, reinventing the wheel should never be a selling point in itself, so please consider this plugin a proof of concept: a few dozen lines of code, fully functional and without dependencies, and therefore probably slightly faster to run than Similar Posts.