
Creates a complete, full-text historical archive for an RSS or ATOM feed.


history4feed


Before you begin...

We use history4feed in the web version of Obstracts, which includes many features beyond those in this codebase. You can find out more about the web version here.

Overview

It is common for feeds (RSS or ATOM) to include only a limited number of posts. I generally see the latest 3 to 5 posts of a blog in its feed. For blogs that have been running for years, this means potentially thousands of posts are missing from the feed.

There is no way to page through historic articles using an RSS or ATOM feed (the formats were not designed for this), so the first poll of a feed only returns the articles it currently contains. The blog owner decides how many that is.

history4feed can be used to create a complete history for a blog and output it as an RSS feed.

history4feed offers an API that:

  1. takes an RSS / ATOM feed URL
  2. downloads the Wayback Machine archive for the feed
  3. identifies all unique blog posts in the downloaded historic feeds
  4. downloads an HTML version of the article content of each post
  5. stores the post records in the database
  6. exposes the posts as JSON or as an XML RSS feed
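The first three steps can be illustrated with a small shell sketch. This is not how history4feed itself is written; it assumes the public Wayback Machine CDX API (`https://web.archive.org/cdx/search/cdx`) for listing archived copies of a feed, the `id_` URL modifier for fetching a snapshot's original bytes, and post URLs that have already been extracted from each snapshot, one per line. The function names are hypothetical.

```shell
# Sketch only: hypothetical helpers, not part of history4feed.

# Build a Wayback Machine CDX query that lists archived copies of a feed URL.
#   output=json            -> JSON array of rows
#   fl=timestamp,original  -> return only these two fields
#   filter=statuscode:200  -> skip redirects and errors
#   collapse=digest        -> drop consecutive captures with identical content
# The feed URL is inserted as-is; URL-encode it first if it has its own query string.
cdx_query_url() {
  printf 'https://web.archive.org/cdx/search/cdx?url=%s&output=json&fl=timestamp,original&filter=statuscode:200&collapse=digest\n' "$1"
}

# Build the URL of one snapshot. The "id_" suffix asks the Wayback Machine
# for the original archived bytes instead of the page wrapped in its toolbar.
snapshot_url() {
  printf 'https://web.archive.org/web/%sid_/%s\n' "$1" "$2"
}

# Step 3: merge post URLs taken from many snapshots (one URL per line on stdin)
# and keep each post once. Trailing whitespace and blank lines are dropped first.
unique_post_links() {
  sed 's/[[:space:]]*$//' | grep -v '^$' | sort -u
}
```

For example, `cdx_query_url https://example.com/feed.xml` prints a query URL that can be passed to `curl`; each returned `timestamp` can then be combined with the feed URL via `snapshot_url` to fetch that archived copy. Collapsing on `digest` reduces the number of downloads, because consecutive identical captures are skipped.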

Install

Download and configure

# clone the latest code and move into the repository
git clone https://github.com/muchdogesec/history4feed
cd history4feed

Configuration options

history4feed has various settings that are defined in an .env file.

To create the file from the provided template:

cp .env.example .env
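After copying the template, it can help to check that `.env` still defines every variable listed in `.env.example`, for example after pulling a newer version of the code. The helper below is a hypothetical convenience, not something shipped with history4feed; it assumes simple `KEY=value` lines that start at the beginning of a line and ignores everything else.

```shell
# Hypothetical helper: print variable names present in the template but
# missing from the real .env file. Only lines shaped like KEY=value count;
# comments, blank lines and indented lines are ignored.
missing_env_keys() {
  template="${1:-.env.example}"
  actual="${2:-.env}"
  awk -F= '
    # While reading the first file (the real .env), remember its keys.
    FILENAME == ARGV[1] {
      if ($0 ~ /^[A-Za-z_][A-Za-z0-9_]*=/) seen[$1] = 1
      next
    }
    # While reading the template, report keys that were not seen.
    /^[A-Za-z_][A-Za-z0-9_]*=/ && !($1 in seen) { print $1 }
  ' "$actual" "$template"
}
```

Run from the repository root, `missing_env_keys` with no arguments compares `.env.example` against `.env` and prints nothing when every key is present. Matching on `FILENAME` rather than the common `NR == FNR` idiom keeps the check correct even when `.env` is empty.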

For more information on what each variable does and how to set it, read the .env.markdown file.

Build the Docker Image

sudo docker compose build

Start the server

sudo docker compose up

Access the server

The webserver (Django) should now be running on: http://127.0.0.1:8002/

You can access the Swagger UI for the API in a browser at: http://127.0.0.1:8002/api/schema/swagger-ui/
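Because the containers can take a while to start, scripts that call the API may want to wait until the server answers before sending requests. A minimal sketch, assuming `curl` is installed; the function name and retry count are illustrative and not part of history4feed.

```shell
# Hypothetical helper: poll a URL until it answers, or give up after N attempts.
# Usage: wait_for_http URL [ATTEMPTS]
wait_for_http() {
  url="$1"
  attempts="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f: HTTP status 400 and above counts as failure
    # -s: no progress output; -o /dev/null: discard the body
    # --max-time 2: do not hang on a half-started server
    if curl -fs -o /dev/null --max-time 2 "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

For example, `wait_for_http http://127.0.0.1:8002/api/schema/swagger-ui/ 60 && echo "history4feed is up"` waits up to about a minute for the Swagger UI to respond.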

Useful supporting tools

Support

Minimal support provided via the DOGESEC community.

License

Apache 2.0.



Download files


Source Distribution

history4feed-1.2.1.tar.gz (1.4 MB)

Built Distribution


history4feed-1.2.1-py3-none-any.whl (50.1 kB)

File details

Details for the file history4feed-1.2.1.tar.gz.

File metadata

  • Download URL: history4feed-1.2.1.tar.gz
  • Size: 1.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for history4feed-1.2.1.tar.gz
Algorithm    Hash digest
SHA256       37ca140604a395d903fe21c0883f6c1bc6a766f8f203c9b0db378b9fbb7abc6e
MD5          856e22c3770bb1bda8e4162f704a3f8c
BLAKE2b-256  59c928af46510a35e8995787aed4a3b28affb326ba5190b68b3473fe1d2004a1


Provenance

The following attestation bundles were made for history4feed-1.2.1.tar.gz:

Publisher: create-release.yml on muchdogesec/history4feed

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file history4feed-1.2.1-py3-none-any.whl.

File metadata

  • Download URL: history4feed-1.2.1-py3-none-any.whl
  • Size: 50.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for history4feed-1.2.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       cd0ddb4013cb6d55e4534f8615f8e07b197e178c4a8b03710a8abe51ad1c0e42
MD5          81e0d55e8ff08a0809ea2e260699cc96
BLAKE2b-256  d049851654d6c1464d2a56ebf44eb41326b2c8a7d64e05bbb7885e230a66a759


Provenance

The following attestation bundles were made for history4feed-1.2.1-py3-none-any.whl:

Publisher: create-release.yml on muchdogesec/history4feed

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
