Creates a complete, full-text historical archive of an RSS or Atom feed.
history4feed
Before you begin...
We use history4feed in the web version of Obstracts, which includes many additional features not found in this codebase. You can find out more about the web version here.
Overview
It is common for feeds (RSS or Atom) to include only a limited number of posts. I generally see only the latest 3 to 5 posts of a blog in its feed. For blogs that have been operating for years, this can mean thousands of posts are missed.
RSS and Atom feeds offer no practical way to page through historic articles (they were not designed for this), so the first poll of a feed only returns the articles it currently contains. The number of articles in the feed is set by the blog owner.
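To make the limitation concrete, here is a minimal sketch (not code from history4feed) that counts the posts in a single feed document using only Python's standard library. The `count_feed_entries` helper and the example feed are hypothetical. However long a blog has been running, one fetch of its feed only returns what the `<item>` (RSS 2.0) or `<entry>` (Atom) elements currently hold.

```python
import xml.etree.ElementTree as ET

# Atom elements live in this XML namespace (RFC 4287).
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def count_feed_entries(feed_xml: str) -> int:
    """Return how many posts a single RSS 2.0 or Atom document contains."""
    root = ET.fromstring(feed_xml)
    if root.tag == f"{ATOM_NS}feed":
        # Atom: <feed><entry>...</entry></feed>
        return len(root.findall(f"{ATOM_NS}entry"))
    # RSS 2.0: <rss><channel><item>...</item></channel></rss>
    return len(root.findall("./channel/item"))


# Hypothetical feed that, like many real ones, exposes only the latest 3 posts.
rss = """<rss version="2.0"><channel><title>Example blog</title>
<item><link>https://example.com/post-3</link></item>
<item><link>https://example.com/post-2</link></item>
<item><link>https://example.com/post-1</link></item>
</channel></rss>"""

print(count_feed_entries(rss))  # 3
```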
history4feed can be used to create a complete history for a blog and output it as an RSS feed.
history4feed offers an API that:
- takes an RSS or Atom feed URL
- downloads the Wayback Machine archive of the feed
- identifies all unique blog posts in the downloaded historic feeds
- downloads an HTML version of the article content for each post
- stores each post record in the database
- exposes the posts as JSON or as an XML RSS feed
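The archive-and-deduplicate steps above can be sketched with Python's standard library. This is a hedged illustration, not history4feed's actual code: it assumes the public Wayback Machine CDX API (`https://web.archive.org/cdx/search/cdx`) is used to list archived copies of the feed, and it treats a post's link URL as its unique identity. The function names `cdx_query_url`, `snapshot_urls`, `post_links` and `unique_posts` are hypothetical, and the network fetching of snapshots and article HTML is left out.

```python
import json
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Atom elements live in this XML namespace (RFC 4287).
ATOM_NS = "{http://www.w3.org/2005/Atom}"
# Public Wayback Machine CDX endpoint (assumption: history4feed may query it differently).
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(feed_url: str) -> str:
    """Build a CDX query that lists archived captures of feed_url."""
    params = {
        "url": feed_url,
        "output": "json",
        "filter": "statuscode:200",  # keep only successful captures
        "collapse": "digest",  # drop consecutive captures with identical content
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def snapshot_urls(cdx_json: str) -> list[str]:
    """Turn a CDX JSON response into URLs for the raw archived feed files."""
    rows = json.loads(cdx_json)
    if not rows:
        return []
    header, captures = rows[0], rows[1:]  # first row names the columns
    ts, orig = header.index("timestamp"), header.index("original")
    # The "id_" suffix asks the Wayback Machine for the original, unrewritten bytes.
    return [f"https://web.archive.org/web/{row[ts]}id_/{row[orig]}" for row in captures]


def post_links(feed_xml: str) -> list[str]:
    """Extract post URLs from one RSS 2.0 or Atom snapshot."""
    root = ET.fromstring(feed_xml)
    links = []
    if root.tag == f"{ATOM_NS}feed":
        for entry in root.findall(f"{ATOM_NS}entry"):
            for link in entry.findall(f"{ATOM_NS}link"):
                # rel defaults to "alternate", which points at the post itself
                if link.get("rel", "alternate") == "alternate" and link.get("href"):
                    links.append(link.get("href"))
                    break
        return links
    for item in root.findall("./channel/item"):
        link = (item.findtext("link") or "").strip()
        if link:
            links.append(link)
    return links


def unique_posts(snapshots: list[str]) -> list[str]:
    """Merge snapshots, keeping each post URL once in first-seen order."""
    seen: dict[str, None] = {}  # dicts preserve insertion order
    for feed_xml in snapshots:
        for link in post_links(feed_xml):
            seen.setdefault(link, None)
    return list(seen)
```

Each snapshot only holds the few posts that were live when it was captured, so merging many snapshots is what recovers the older posts. Deduplicating by URL keeps a post that appears in dozens of captures from being stored dozens of times.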
Install
Download and configure
# clone the latest code
git clone https://github.com/muchdogesec/history4feed
# move into the project directory
cd history4feed
Configuration options
history4feed has various settings that are defined in an .env file.
To create a template for the file:
cp .env.example .env
For details on what each variable does and how to set it, read the .env.markdown file.
Build the Docker Image
sudo docker compose build
Start the server
sudo docker compose up
Access the server
The web server (Django) should now be running at: http://127.0.0.1:8002/
You can access the Swagger UI for the API in a browser at: http://127.0.0.1:8002/api/schema/swagger-ui/
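If you script against the API, it can help to wait until the server answers before sending requests. Below is a minimal sketch using only the Python standard library, assuming the default address and Swagger UI path given above; `is_up` and `wait_for_server` are hypothetical helpers, not part of history4feed.

```python
import time
import urllib.error
import urllib.request

# Defaults taken from the addresses above; change them if you remapped the port.
BASE_URL = "http://127.0.0.1:8002/"
SWAGGER_UI = BASE_URL + "api/schema/swagger-ui/"


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if url answers with an HTTP status below 500."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        # A 4xx response still proves the web server is running.
        return err.code < 500
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, etc.
        return False


def wait_for_server(url: str = SWAGGER_UI, attempts: int = 30, delay: float = 2.0) -> bool:
    """Poll url until it responds or the attempts run out."""
    for _ in range(attempts):
        if is_up(url):
            return True
        time.sleep(delay)
    return False
```

For example, calling `wait_for_server()` after `sudo docker compose up` returns True once the Swagger UI responds, or False after roughly a minute with the default settings.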
Useful supporting tools
- Full Text, Full Archive RSS Feeds for any Blog
- An up-to-date list of threat intel blogs that post cyber threat intelligence research
- Donate to the Wayback Machine
Support
Minimal support provided via the DOGESEC community.
License
Download files
File details
Details for the file history4feed-1.0.9.tar.gz.
File metadata
- Download URL: history4feed-1.0.9.tar.gz
- Upload date:
- Size: 1.4 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ef0ae7b4f11c99770157f8779bfc6bf4ed5b318b525b7c6b523d3f25e97f0431 |
| MD5 | e5849d31bac9c030da61f406f9a49192 |
| BLAKE2b-256 | 52f8587d91505ac41f7bfb1c505efb35bcfd482861d17f9759505df0ab0492b4 |
Provenance
The following attestation bundles were made for history4feed-1.0.9.tar.gz:
- Publisher: create-release.yml on muchdogesec/history4feed
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: history4feed-1.0.9.tar.gz
- Subject digest: ef0ae7b4f11c99770157f8779bfc6bf4ed5b318b525b7c6b523d3f25e97f0431
- Sigstore transparency entry: 737968175
- Sigstore integration time:
- Permalink: muchdogesec/history4feed@c73837a594201205055a1a10e4354989956b27fb
- Branch / Tag: refs/heads/main
- Owner: https://github.com/muchdogesec
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: create-release.yml@c73837a594201205055a1a10e4354989956b27fb
- Trigger Event: push
File details
Details for the file history4feed-1.0.9-py3-none-any.whl.
File metadata
- Download URL: history4feed-1.0.9-py3-none-any.whl
- Upload date:
- Size: 45.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | becc96e37e228b598fdcc14b52b361f478403a801d3b023772e66023e4ea08e5 |
| MD5 | 4956b156651ff6774b7e95aff79a34d4 |
| BLAKE2b-256 | 58b61c414caf3787298b4f541039ffe09df9377e5c8a445dfef3349ab7bab820 |
Provenance
The following attestation bundles were made for history4feed-1.0.9-py3-none-any.whl:
- Publisher: create-release.yml on muchdogesec/history4feed
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: history4feed-1.0.9-py3-none-any.whl
- Subject digest: becc96e37e228b598fdcc14b52b361f478403a801d3b023772e66023e4ea08e5
- Sigstore transparency entry: 737968180
- Sigstore integration time:
- Permalink: muchdogesec/history4feed@c73837a594201205055a1a10e4354989956b27fb
- Branch / Tag: refs/heads/main
- Owner: https://github.com/muchdogesec
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: create-release.yml@c73837a594201205055a1a10e4354989956b27fb
- Trigger Event: push