A personal feed/website

A personal feed/website using HPI

Live at https://purarue.xyz/feed/

This uses:

  • python: gets my data using HPI, and cleans it up/enriches it with some local data and cached API requests. my_feed index is called in the feed_index script, which syncs a JSON file up to the server; the backend combines that into the sqlite database
  • golang: a basic REST API that lets the frontend paginate through the data, plus authenticated endpoints for updating the sqlite database
  • typescript: the public-facing frontend; makes requests to the backend and lets the user filter/order/search the data

Data Sources:

If a source is not mentioned here, it's likely a module in HPI

I periodically index all my data in the background:

Extracting my_feed.sources.listens.history...
Extracting my_feed.sources.listens.history: 5388 items (took 0.14 seconds)
Extracting my_feed.sources.games.steam...
Extracting my_feed.sources.games.steam: 285 items (took 0.01 seconds)
Extracting my_feed.sources.games.osrs...
Extracting my_feed.sources.games.osrs: 924 items (took 0.03 seconds)
Extracting my_feed.sources.games.game_center...
Extracting my_feed.sources.games.game_center: 141 items (took 0.02 seconds)
Extracting my_feed.sources.games.grouvee...
Extracting my_feed.sources.games.grouvee: 243 items (took 0.15 seconds)
Extracting my_feed.sources.games.chess...
Extracting my_feed.sources.games.chess: 681 items (took 2.98 seconds)
Extracting my_feed.sources.trakt.history...
Extracting my_feed.sources.trakt.history: 15327 items (took 11.51 seconds)
Extracting my_feed.sources.mpv.history...
Extracting my_feed.sources.mpv.history: 13807 items (took 13.67 seconds)
Extracting my_feed.sources.nextalbums.history...
Extracting my_feed.sources.nextalbums.history: 1938 items (took 2.36 seconds)
Extracting my_feed.sources.mal.history...
Extracting my_feed.sources.mal.history: 20865 items (took 3.58 seconds)
Total: 59599 items
Writing to 'backend/data/1644267551.json'

... which then gets synced up and combined into the sqlite database on the backend; all of this is handled by the feed_index script

The frontend lets me view/filter/sort the data and browse it as an infinitely scrollable list

Served with nginx in prod, like:

location /feed/ {
  proxy_pass http://127.0.0.1:4500/feed;
}

location /feed/_next/ {
  # required since the above proxy pass doesn't end with '/'
  proxy_pass http://127.0.0.1:4500/feed/_next/;
}

location /feed_api/ {
  proxy_pass http://127.0.0.1:5100/;
}

Install/Config:

For the python library:

git clone https://github.com/purarue/my_feed
pip install -e ./my_feed

... which installs the my_feed command (also runnable as python3 -m my_feed)

This uses the HPI config structure (which you'd probably already have set up if you're using this)

To install dependencies for the servers, check the frontend and backend directories.

So, in ~/.config/my/my/config/feed.py, create a top-level sources function, which yields each source function:

from typing import Iterator, Callable, TYPE_CHECKING

if TYPE_CHECKING:
    from my_feed.sources.model import FeedItem


def sources() -> Iterator[Callable[[], Iterator["FeedItem"]]]:
    # yields functions which, when called, yield FeedItems
    from my_feed.sources import games

    yield games.steam
    yield games.osrs
    yield games.game_center
    yield games.grouvee
    yield games.chess

    from my_feed.sources import (
        trakt,
        listens,
        nextalbums,
        mal,
        mpv,
        facebook_spotify_listens,
    )

    yield trakt.history
    yield listens.history
    yield nextalbums.history
    yield mal.history
    yield mpv.history
    yield facebook_spotify_listens.history
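
For illustration, here's a minimal, self-contained sketch of how an indexer could consume a sources() function like the one above. The FeedItem fields and the index() helper are simplified stand-ins, not my_feed's actual internals:

```python
from typing import Callable, Iterator, NamedTuple


class FeedItem(NamedTuple):
    # simplified stand-in for my_feed.sources.model.FeedItem
    id: str
    title: str


def steam() -> Iterator[FeedItem]:
    yield FeedItem(id="steam|440", title="Team Fortress 2")


def trakt_history() -> Iterator[FeedItem]:
    yield FeedItem(id="trakt|123", title="Some Episode")


def sources() -> Iterator[Callable[[], Iterator[FeedItem]]]:
    # mirrors the config above: yield the source functions themselves,
    # so slow sources are only evaluated when actually called
    yield steam
    yield trakt_history


def index() -> list[FeedItem]:
    # call each source function and flatten the results into one feed
    return [item for source in sources() for item in source()]
```

Yielding functions rather than items means a whole source can be skipped cheaply without computing any of its data.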

The feed_index script in this repo:

  • warms the my.time.tz.via_location cache, so that timezones can be estimated for some of the data sources here
  • does an rsync for some images hosted here
  • requests the /data/ids endpoint on the server, which returns a list of known IDs (those are used to filter out duplicates before syncing)
  • runs my_feed index to save JSON objects to a local file
  • syncs the JSON up to my server with scp
  • pings the server (at /check), which makes it process the JSON files, updating the local sqlite database
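
As a rough sketch of the ID-exclusion step (the helper name and data shapes here are assumptions, not my_feed's real code), the idea is just a set-membership filter before writing the JSON:

```python
import json
from typing import Any


def exclude_known(
    items: list[dict[str, Any]], known_ids: list[str]
) -> list[dict[str, Any]]:
    # drop items whose id the server already returned from /data/ids,
    # so only new items are written to the JSON file and synced up
    known = set(known_ids)
    return [item for item in items if item["id"] not in known]


items = [{"id": "mpv|1"}, {"id": "trakt|2"}, {"id": "mpv|3"}]
new_items = exclude_known(items, known_ids=["trakt|2"])
print(json.dumps(new_items))  # [{"id": "mpv|1"}, {"id": "mpv|3"}]
```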

To blur images, my_feed index accepts a -B flag, which lets you match against the id, title, or image_url with an fnmatch pattern or a regex. Those patterns are placed in a file, one per line, for example:

id:*up_2009_*
title:*up_2009_*
image_url:*up_2009_*
id_regex:.*up_2009_.*
title_regex:.*up_2009_.*
image_url_regex:.*up_2009_.*
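
The blur-file format above could be matched with fnmatch for the glob variants and re for the _regex variants. This is a hedged sketch of that parsing logic (parse_blur_line is a hypothetical helper, not my_feed's actual code):

```python
import re
from fnmatch import fnmatch
from typing import Callable


def parse_blur_line(line: str) -> Callable[[dict], bool]:
    # each line is 'field:pattern'; a '_regex' suffix switches from
    # fnmatch globs to regular expressions
    field, _, pattern = line.partition(":")
    if field.endswith("_regex"):
        attr = field[: -len("_regex")]
        rx = re.compile(pattern)
        return lambda item: rx.fullmatch(item.get(attr) or "") is not None
    return lambda item: fnmatch(item.get(field) or "", pattern)


matchers = [parse_blur_line(ln) for ln in ["id:*up_2009_*", "title_regex:.*up_2009_.*"]]
item = {"id": "trakt|up_2009_movie", "title": "Up", "image_url": None}
print(any(m(item) for m in matchers))  # True
```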

my_feed index has a couple of options that have developed over time, letting me ignore specific IDs (if I know they're already in the database) and skip sources which take a while to process (only running those once a week or so):

Usage: my_feed index [OPTIONS] [OUTPUT]

Options:
  --echo / --no-echo           Print feed items as they're computed
  -i, --include-sources TEXT   A comma delimited list of substrings of sources
                               to include. e.g. 'mpv,trakt,listens'
  -e, --exclude-sources TEXT   A comma delimited list of substrings of sources
                               to exclude. e.g. 'mpv,trakt,listens'
  -E, --exclude-id-file PATH   A json file containing a list of IDs to
                               exclude, from the /data/ids endpoint. reduces
                               amount of data to sync to the server
  -C, --write-count-to PATH    Write the number of items to this file
  -B, --blur-images-file PATH  A file containing a list of image URLs to blur,
                               one per line
  --help                       Show this message and exit.

feed_check

feed_check refreshes the data that changes more often (music (both mpv and listenbrainz), tv shows (trakt), chess, albums) by comparing the IDs of the latest items in the remote database against the corresponding live data sources.

This is pretty personal as it relies on my anacron-like bgproc tool to handle updating data periodically.

So all of these follow a pattern like (e.g. for chess):

  • get the end_time of the last couple of items from the my_feed database (using the same JSON endpoints the frontend uses)
  • get the first page of my chess games from the chess.com API using chess_export
  • if there's new data (the last end_time is not in the first page of the API), print a line marking that source as expired
  • if anything was printed by the script:
    • I know at least one thing has expired, so I run bgproc_on_machine to update all the expired data
    • run scripts/feed_index to update the my_feed database on my server
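
The expiry heuristic in the steps above boils down to a membership check. A minimal sketch, assuming the end_times are comparable timestamps (the function name here is hypothetical):

```python
from datetime import datetime


def is_expired(remote_end_times: list[datetime], live_end_times: list[datetime]) -> bool:
    # expired when the newest end_time in the remote my_feed database
    # doesn't appear in the first page of the live API, i.e. new items exist
    if not remote_end_times:
        return True  # nothing indexed yet, definitely stale
    return max(remote_end_times) not in live_end_times


live = [datetime(2023, 9, 21, 15, 0), datetime(2023, 9, 20, 10, 0)]
print(is_expired([datetime(2023, 9, 20, 10, 0)], live))  # False: still current
print(is_expired([datetime(2023, 9, 1, 9, 0)], live))    # True: new games since
```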

feed_check runs once every 15 minutes, so my data is never more than 15 minutes out of date.

Example output:

[I 230921 15:44:15 feed_check:213] Checking 'check_albums'
[I 230921 15:44:18 feed_check:42] Requesting https://purarue.xyz/feed_api/data/?offset=0&order_by=when&sort=desc&limit=500&ftype=album
[I 230921 15:44:18 feed_check:213] Checking 'check_trakt'
[D 230921 15:44:18 export:32] Requesting 'https://api-v2launch.trakt.tv/users/purplepinapples/history?limit=100&page=1'...
[D 230921 15:44:20 export:46] First item: {'id': 9230963378, 'watched_at': '2023-09-21T08:03:23.000Z', 'action': 'watch', 'type': 'episode', 'episode': {'season': 1, 'number': 1, 'title': 'ROMANCE DAWN', 'ids': {'trakt': 5437335, 'tvdb': 8651297, 'imdb': 'tt11748904', 'tmdb': 2454621, 'tvrage': None}}, 'show': {'title': 'ONE PIECE', 'year': 2023, 'ids': {'trakt': 184618, 'slug': 'one-piece-2023', 'tvdb': 392276, 'imdb': 'tt11737520', 'tmdb': 111110, 'tvrage': None}}}
[I 230921 15:44:20 feed_check:42] Requesting https://purarue.xyz/feed_api/data/?offset=0&order_by=when&sort=desc&limit=10&ftype=trakt_history_movie,trakt_history_episode
[I 230921 15:44:21 feed_check:213] Checking 'check_chess'
[I 230921 15:44:21 feed_check:42] Requesting https://purarue.xyz/feed_api/data/?offset=0&order_by=when&sort=desc&limit=10&ftype=chess
Requesting https://api.chess.com/pub/player/purarue/games/archives
Requesting https://api.chess.com/pub/player/purarue/games/2023/09
[I 230921 15:44:22 feed_check:213] Checking 'check_mpv'
[I 230921 15:44:23 feed_check:42] Requesting https://purarue.xyz/feed_api/data/?offset=0&order_by=when&sort=desc&limit=500&ftype=listen
[I 230921 15:44:23 feed_check:213] Checking 'check_listens'
[I 230921 15:44:23 feed_check:42] Requesting https://purarue.xyz/feed_api/data/?offset=0&order_by=when&sort=desc&limit=500&ftype=listen
[D 230921 15:44:25 export:62] Requesting https://api.listenbrainz.org/1/user/purarue/listens?count=100
[D 230921 15:44:25 export:84] Have 100, now searching for listens before 2023-09-11 04:39:08...
[I 230921 15:44:25 feed_check:213] Checking 'check_mal'
[I 230921 15:44:25 feed_check:42] Requesting https://purarue.xyz/feed_api/data/?offset=0&order_by=when&sort=desc&limit=50&ftype=anime,anime_episode
Expired: mpv.history
removed '/home/username/.local/share/evry/data/my-feed-index-bg'
2023-09-21T15-44-35:bg-feed-index:running my_feed index...
Indexing...

This also has the upside of updating my local data whenever there are any changes to the data sources, which means any scripts using the corresponding HPI modules also stay up to date.
