
Mutimon




A generic, config-driven web scraper that monitors websites for changes and sends email notifications. Define what to scrape using CSS selectors in a JSON config file, and format notifications with Liquid templates.

Designed to run as a cron job. Each rule has its own schedule (cron expression), so the script can be invoked frequently (e.g. every hour) and each rule runs only when its schedule is due.

Installation

From PyPI

pip install mutimon

This installs the mon command.

From source

git clone https://github.com/jcubic/mutimon.git
cd mutimon
pip install .

This installs the mon command from the local source, including all dependencies.

First run

On the first run, the tool creates ~/.mutimon/ with a skeleton config and example rules (Hacker News + Bitcoin price alerts):

mon
# Config not found at /home/user/.mutimon/config.json
# Creating skeleton configuration in /home/user/.mutimon...
# Done. Edit /home/user/.mutimon/config.json to configure your scraping rules.

Edit ~/.mutimon/config.json with your SMTP credentials and scraping rules, then run again.

Usage

mon                    # process rules; only prints notifications and errors
mon --force            # ignore schedules, run all rules now
mon --force <rule>     # ignore schedule, run only the named rule
mon --dry-run          # fetch and display data, bypass schedules, no state changes
mon --save-email       # save email to file instead of sending via SMTP
mon --validate         # validate config against schema and exit
mon --list             # list all rule names (usable with --force <rule>)
mon --ai-guide         # print the path to the AI instruction file for adding websites
mon --cron             # print a cron entry with resolved path (default: every 5 min)
mon --cron "0 8 * * *" # print a cron entry with a custom schedule
mon -v, --verbose      # show detailed progress (page fetches, counts, skipped rules)
mon -q, --quiet        # suppress all output including errors

Cron setup

Use --cron to generate a cron entry with the correct resolved path (works with pyenv, virtualenvs, etc.):

mon --cron                  # default: every 5 minutes
mon --cron "0 * * * *"      # custom: every hour

Install it directly:

(crontab -l 2>/dev/null; mon --cron) | crontab -

Each rule's schedule field controls when it actually executes, so running mon frequently (e.g. every 5 minutes) is safe — rules only fire when their cron expression matches.

File structure

~/.mutimon/
  config.json              # main configuration
  templates/               # Liquid email templates
    hackernews
  data/                    # state files (tracked items per rule)
    hackernews
    .lastrun_hackernews    # last run timestamp for schedule tracking
    emails/                # saved copies of sent emails

Configuration

A JSON Schema is provided for editor autocompletion and validation. Add "$schema": "./config.schema.json" to your config file, or point to the raw URL if hosted on GitHub.

The config is validated against the schema on every run. If the config is invalid, an error email with all validation errors is sent to all rule recipients and the script exits.

The config file (~/.mutimon/config.json) has three sections:

email -- SMTP server

"email": {
  "server": {
    "host": "smtp.example.com",
    "port": 587,
    "password": "your-password",
    "email": "you@example.com"
  }
}

defs -- Reusable scraping definitions and commands

Each definition describes how to fetch and parse data from a website. The optional commands key defines reusable Liquid tag commands (see Commands).

"hackernews": {
  "url": "https://news.ycombinator.com",
  "pagination": { ... },
  "query": {
    "type": "list",
    "selector": "tr.athing.submission",
    "id": { ... },
    "filter": { ... },
    "variables": { ... }
  }
}

Fields:

Field Required Description
url yes URL to fetch. Supports Liquid variables from rule params, e.g. https://example.com?q={{query}}
format no "html" (default) or "xml". Use "xml" for RSS/Atom feeds or any XML document. Switches BeautifulSoup to the lxml XML parser (requires lxml).
userAgent no Custom User-Agent header. If omitted, a default browser-like User-Agent is used. Useful for RSS feeds or APIs that require a specific User-Agent.
params no List of parameter names used in the URL template
pagination no Pagination config (see below)
query.type yes "list" (multiple items) or "single" (one item)
query.selector yes CSS selector for item container(s). For XML, use XML element names (e.g. item for RSS, entry for Atom).
query.id no How to extract a unique ID per item (see below)
query.filter no Filter to exclude items (see below)
query.expect no List of CSS selectors that must exist on the page (see Expected structure). Sends error email if missing.
query.reject no List of CSS selectors that indicate no real results (see Reject selectors). Returns 0 items if any match.
query.variables yes Named fields to extract (see below)

rules -- What to run

Each rule references a definition and can override params, email recipient, template, etc.

{
  "ref": "hackernews",
  "name": "hackernews",
  "schedule": "0 */6 * * *",
  "subject": "Hacker News: {{count}} new stories",
  "template": "./templates/hackernews",
  "email": "you@example.com"
}

Fields:

Field Required Description
ref yes Name of the definition in defs
name yes Unique rule name. Used for state file (~/.mutimon/data/<name>)
schedule no Cron expression or array of expressions (see Schedule). If omitted, runs every time.
subject yes Liquid template for the email subject line
template yes Path to the Liquid template file (relative to ~/.mutimon/)
email yes Recipient email address
params no Values for the definition's URL template variables. Used when input is not specified.
input no One or more input entries with params and optional validators (see Multiple inputs). Overrides params.

Variable extraction

Each variable in query.variables defines how to extract a value from a matched element:

"title": {
  "selector": ".titleline > a",
  "value": {
    "type": "text"
  }
}

Value types

Type Description Extra fields
text Inner text of the element
attribute HTML attribute value name -- attribute name (e.g. "href")

Optional value modifiers

Field Description
regex Extract a capture group from the raw value. Uses group(1) if available.
prefix String prepended to the final value. Useful for turning relative URLs into absolute.
parse Convert the extracted string to a typed value. Parsed values are used by validators.
  • "number": plain numeric parsing for integers and floats; strips commas as thousands separators (e.g. "1,234" -> 1234, "3.14" -> 3.14).
  • "money": locale-aware currency parsing via babel; auto-detects the page language from <html lang> or the Content-Language header, strips currency symbols and percent signs, and handles US ($70,528.40), European (11,8000 zł), and mixed (11.800,50 €) formats.
  • "list": split the value into a list using the delimiter regex (default \s*,\s*); use {% for x in item.field %} in templates.
  • "json": parse the value as JSON, then optionally extract structured data with query (see JSON extraction).
delimiter Regex pattern used to split the value when parse is "list". Defaults to \s*,\s* (comma with optional surrounding whitespace).
query Only for parse: "json". Defines how to navigate and extract variables from the parsed JSON using JMESPath (see JSON extraction).
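
For reference, the comma-stripping behavior of parse: "number" can be sketched in Python (a minimal illustration, not mutimon's actual implementation):

```python
# Sketch of parse: "number" — strip thousands-separator commas, then parse as float.
def parse_number(raw: str) -> float:
    return float(raw.replace(",", ""))

assert parse_number("1,234") == 1234.0
assert parse_number("3.14") == 3.14
```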

Optional variable fields

Field Description
default Fallback value if the selector doesn't match or the value is empty
sibling When true, search the next sibling element instead of within the matched element. Needed when data is split across adjacent HTML elements (e.g. Hacker News stores title and score in separate <tr> rows).
collect When true, collect ALL matching elements (using select() instead of select_one()). Returns a list that can be iterated in templates with {% for skill in item.skills %}. Useful for extracting lists of tags, skills, or categories from repeated elements.

Special selectors

Selector Description
:self References the container element itself instead of searching for a child. Useful when the container is an <a> tag and you need its href attribute.

Example with all options

"url": {
  "selector": "a.job__title-link",
  "value": {
    "type": "attribute",
    "name": "href",
    "regex": "^(/.*)",
    "prefix": "https://useme.com"
  }
}

This selects the href attribute from a.job__title-link, extracts the path with a regex, then prepends the domain.
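
Traced in Python, the regex and prefix steps amount to the following (the href value here is a hypothetical example):

```python
import re

# 1. Extract the raw href, 2. apply the regex (group(1) is used), 3. prepend the prefix.
href = "/pl/job/build-a-website,12345/"
m = re.search(r"^(/.*)", href)
value = "https://useme.com" + m.group(1)
```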

Collecting multiple values

When an item contains repeated elements (e.g. skill tags, categories), use collect: true to extract all matches as a list:

"skills": {
  "selector": ".skill-tag",
  "value": { "type": "text" },
  "collect": true
}

This finds all .skill-tag elements inside the item container and returns a list like ["TypeScript", "React", "Node.js"]. Use a loop in the template:

{% for skill in item.skills %}{{ skill }}{% unless forloop.last %}, {% endunless %}{% endfor %}

Self-referencing the container

When the container element itself holds the data you need (e.g. an <a> tag with an href), use :self:

"url": {
  "selector": ":self",
  "value": {
    "type": "attribute",
    "name": "href",
    "prefix": "https://example.com"
  }
}

JSON extraction

Some websites embed structured data as JSON inside <script> tags (e.g. Next.js apps use <script id="__NEXT_DATA__">). When the HTML elements don't contain all the data you need, you can extract it from the embedded JSON instead.

Use parse: "json" combined with a query to navigate the JSON structure using JMESPath expressions.

Basic structure

"locations": {
  "selector": "script#__NEXT_DATA__",
  "value": {
    "type": "text",
    "parse": "json",
    "query": {
      "type": "list",
      "path": "props.pageProps.data.items[?id == `{{id}}`].offers[]",
      "variables": {
        "city": { "path": "displayWorkplace" },
        "url": { "path": "offerAbsoluteUri" }
      }
    }
  }
}

How it works:

  1. selector selects the element containing JSON (e.g. a <script> tag) — standard CSS selector
  2. type: "text" extracts the text content — same as any other variable
  3. parse: "json" parses the text as a JSON object
  4. query navigates the parsed JSON and extracts variables:
Field Required Description
type yes "list" (returns array of objects) or "single" (returns one object)
path no JMESPath expression to navigate the JSON. Supports Liquid variables ({{id}}, {{name}}, etc.) rendered against the current item's data. If omitted, the root JSON object is used.
variables yes Named fields to extract from each result. Each has a path (JMESPath sub-expression).

The path supports Liquid variable interpolation, so you can match JSON entries to the current HTML item. For example, {{id}} is replaced with the item's extracted ID before the JMESPath query runs.

JMESPath syntax

JMESPath is a query language for JSON. Common patterns:

Expression Description
foo.bar.baz Navigate nested objects
items[0] Array index
items[*].name Get name from all array entries
items[?id == `123`] Filter: entries where id equals 123
items[?score > `50`] Filter: entries where score > 50
items[].offers[] Flatten nested arrays

Note: literal values in JMESPath filters use backticks (`), not quotes. See the JMESPath tutorial for full syntax.

Template usage

When query.type is "list", the variable is a list of objects accessible in templates:

{% for loc in item.locations %}
* {{ loc.city }}: {{ loc.url }}
{% endfor %}

When query.type is "single", the variable is a flat object:

{{ item.metadata.author }} - {{ item.metadata.date }}

Works with attributes too

JSON can also appear in HTML attributes. Use type: "attribute" with parse: "json":

"config": {
  "selector": "[data-config]",
  "value": {
    "type": "attribute",
    "name": "data-config",
    "parse": "json",
    "query": {
      "type": "single",
      "variables": {
        "status": { "path": "status" },
        "count": { "path": "meta.count" }
      }
    }
  }
}

Example: Next.js multi-location job offers

Pracuj.pl (a Next.js app) lists job offers with multi-location variants. The HTML card only shows the title, but the city-specific URLs are in __NEXT_DATA__:

"url_list": {
  "selector": "script#__NEXT_DATA__",
  "value": {
    "type": "text",
    "parse": "json",
    "query": {
      "type": "list",
      "path": "props.pageProps.dehydratedState.queries[0].state.data.groupedOffers[?offers[0].partitionId == `{{id}}`].offers[]",
      "variables": {
        "city": { "path": "displayWorkplace" },
        "url": { "path": "offerAbsoluteUri" }
      }
    }
  }
}

The {{id}} in the path is the item's ID extracted from the HTML (data-test-offerid attribute). JMESPath filters the groupedOffers array to find the matching entry, then flattens its offers[] sub-array. Each offer's displayWorkplace and offerAbsoluteUri are extracted as city and url.

Item identity (deduplication)

The id field in the query spec controls how the scraper identifies items it has already seen.

From a variable with regex

"id": {
  "source": "url",
  "regex": ",(\\d+)/$"
}

Takes the url variable value and extracts the ID using a regex. The source can reference either a variable name (from variables) or a param name (from input/params). When using input, params are merged into items before ID extraction, so "source": "symbol" works if symbol is a param.
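
In Python terms, this amounts to (the URL is a made-up example that matches the regex above):

```python
import re

# Apply the id regex to the url variable; group(1) becomes the item's identity.
url = "https://useme.com/pl/jobs/build-a-website,47415919/"
item_id = re.search(r",(\d+)/$", url).group(1)
```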

From an HTML attribute

"id": {
  "type": "attribute",
  "name": "id"
}

Reads the id attribute directly from the matched element (e.g. <tr id="47415919">).

Fallback

If no id spec is provided, the url variable is used as the identity. If there's no url either, a hash of all variables is used.
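
The hash fallback might look roughly like this (a sketch only; mutimon's exact hashing scheme is internal):

```python
import hashlib
import json

# Hash all extracted variables deterministically to get a stable item identity.
item = {"title": "Show HN: My project", "score": "120 points"}
item_id = hashlib.sha256(json.dumps(item, sort_keys=True).encode()).hexdigest()
```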

Filtering

The filter field excludes items based on CSS class:

"filter": {
  "selector": ".job__header-details--date",
  "exclude_class": "job__header-details--closed"
}

This finds .job__header-details--date within each item and skips the item if it has the class job__header-details--closed. Items where the filter selector doesn't match any element are also excluded.

Expected structure

The expect field on a query spec lists CSS selectors that must exist on the page. If any are missing, the scraper sends an error email about HTML structure changes instead of silently producing empty results.

"query": {
  "expect": [".text-center img[alt='Linux']", ".pagination"],
  "selector": "...",
  ...
}

This is checked on the first page only. Useful for detecting when a website redesigns and your selectors break.

Reject selectors

The reject field is the inverse of expect — it lists CSS selectors that indicate the page has no real results. If any selector matches, the page returns 0 items. This is useful for sites that show recommended or unrelated content when there are no actual matches for the search query.

"query": {
  "reject": ["nfj-no-offers-found-header"],
  "selector": "...",
  ...
}

For example, nofluffjobs.com shows a "Brak wyników wyszukiwania" ("no search results") message and recommended jobs when a language has no remote offers. The reject selector detects the no-results element and prevents those recommendations from being treated as real results.

Multiple inputs

The input field allows a single rule to scrape multiple pages with different parameters and combine the results into one email. This is useful for monitoring multiple items on the same website (e.g. multiple stock symbols).

input can be a single object or an array:

{
  "ref": "bankier",
  "name": "akcje",
  "subject": "[bankier.pl] Zmiany Akcji",
  "template": "./templates/bankier",
  "email": "you@example.com",
  "input": [
    { "params": { "symbol": "BIOMAXIMA" }, "validator": { "test": "{{price}} > 10" } },
    { "params": { "symbol": "AGORA" }, "validator": { "test": "{{price}} > 9.5" } },
    { "params": { "symbol": "ASSECOPOL" } },
    { "params": { "symbol": "POLTREG" } }
  ]
}

Each entry fetches the URL with its own params. If input is omitted, the rule's params field is used directly (backward compatible).

Params from each input entry are merged into the extracted items, so they're available in templates (e.g. {{symbol}}).

Validators

Each input entry can have a validator object that filters extracted items. The validator supports two condition types. If both are present, both must pass (AND logic).

test -- Numeric expression

A numexpr expression with Liquid variable placeholders. Variables should use "parse": "number" or "parse": "money" in the definition so they're available as floats.

"validator": {
  "test": "{{price}} > 9.5"
}

Supported operations:

Operator Example
Comparison {{price}} > 10, {{change_pct}} <= -5
AND ({{price}} > 10) & ({{change_pct}} < 0)
OR ({{price}} < 5) | ({{price}} > 100)
Arithmetic {{price}} * {{quantity}} > 1000
Functions abs({{change_pct}}) > 3

Use parentheses to group compound expressions. See the numexpr documentation for the full list of supported operations.
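
Under the hood, the Liquid placeholders are rendered to literals first and the resulting string is evaluated with numexpr. A sketch (assuming numexpr is installed; the price value is made up):

```python
import numexpr

# "({{price}} > 10) & ({{price}} < 100000)" after Liquid rendering with price = 70528.40:
rendered = "(70528.40 > 10) & (70528.40 < 100000)"
passed = bool(numexpr.evaluate(rendered))
```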

match -- Regex match

Matches a variable value against a regex pattern. Uses re.search() so the pattern matches anywhere unless anchored with ^ or $.

"validator": {
  "match": {
    "var": "title",
    "regex": "^Ask HN"
  }
}
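
The anchoring behavior of re.search can be seen directly (the sample strings are hypothetical):

```python
import re

assert re.search(r"^Ask HN", "Ask HN: How do you learn?")   # anchored, matches at start
assert not re.search(r"^Ask HN", "Re: Ask HN thread")       # anchored, fails mid-string
assert re.search(r"Ask HN", "Re: Ask HN thread")            # unanchored, matches anywhere
```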

Match condition fields:

Field Required Description
var one of var or value Direct variable name — returns the raw value, preserving lists from collect: true
value one of var or value Liquid template string rendered against item variables (always produces a string)
regex one of regex, include, or exclude Regex pattern tested with re.search() (matches anywhere unless anchored). For list values, elements are joined with ", " before matching.
include one of regex, include, or exclude Array of strings — passes if any string is found (see below)
exclude one of regex, include, or exclude Array of strings — passes if none are found (see below)
strict no When true, include/exclude use exact string equality instead of substring match. Only affects string values — list values always use exact element matching. Default false.
exist no Whether the regex pattern should exist. Default true. Set to false to pass when the regex does NOT match. Not needed with exclude.

Set "exist": false to pass when the pattern is not found. This is useful for detecting when something disappears from a page:

"validator": {
  "match": {
    "var": "status",
    "regex": "Coming soon",
    "exist": false
  }
}

include / exclude -- String list match

Use include or exclude instead of regex when checking against a list of plain strings. Use var to reference the variable directly — when the variable is a list (from collect: true), each element is compared as an exact match, so "Java" will match the skill "Java" but not "JavaScript". For plain string values, substring matching is used by default (use strict: true for exact matching).

"validator": {
  "match": {
    "var": "skills",
    "exclude": ["Angular", "C#", ".NET", "Java"]
  }
}
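
A rough sketch of these matching rules (hypothetical code illustrating the semantics, not mutimon's implementation):

```python
# exclude passes when none of the listed words are found in the value.
def passes_exclude(value, words, strict=False):
    if isinstance(value, list):                  # lists from collect: true -> exact element match
        return not any(w in value for w in words)
    if strict:                                   # strict: true -> exact string equality
        return value not in words
    return not any(w in value for w in words)    # default for strings: substring match

assert passes_exclude(["TypeScript", "JavaScript"], ["Java"])   # exact: "Java" != "JavaScript"
assert not passes_exclude("Java, JavaScript", ["Java"])         # substring: "Java" found
assert passes_exclude("JavaScript", ["Java"], strict=True)      # strict: exact match only
```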

match can also be an array of match objects (AND logic — all must pass):

"validator": {
  "match": [
    { "var": "platform", "regex": "Linux" },
    { "var": "status", "regex": "Coming soon", "exist": false }
  ]
}

Combined example

Both conditions must pass (AND logic within a single object):

"validator": {
  "test": "{{price}} > 80",
  "match": {
    "var": "company",
    "regex": "Asseco"
  }
}

Array of validators (OR logic)

The validator can also be an array. The item is included if any validator in the array passes. This is useful for defining price thresholds or notification steps:

"validator": [
  { "test": "{{price}} > 8" },
  { "test": "{{price}} > 9" },
  { "test": "{{price}} > 9.5" }
]

Each entry in the array is a full validator object that can use test, match, or both.

Required validators (require)

In a validator array, set "require": true to make a validator mandatory. Required validators must ALL pass (AND logic), while the remaining validators use OR logic (at least one must pass). If only required validators exist, the OR check is skipped.

This is useful for combining a baseline filter with threshold alerts:

"validator": [
  { "require": true, "test": "{{score_num}} > 50" },
  { "test": "{{price}} > 75000" },
  { "test": "{{price}} > 80000" },
  { "test": "{{price}} > 100000" }
]

The require validator acts as a gate — items must pass it before the OR thresholds are even considered.

Reusable validators (@id)

Define shared validators in defs.validators and reference them by name using {"@id": "name"}. This eliminates duplication when multiple rules use the same filter:

"defs": {
  "validators": {
    "job-board": {
      "require": true,
      "match": [
        {"var": "title", "exclude": ["Angular", "C#", ".NET"]},
        {"var": "skills", "exclude": ["Angular", "C#", ".NET", "Java"]}
      ]
    }
  }
}

Then reference it in rules:

"input": {
  "validator": {"@id": "job-board"}
}

@id references work anywhere a validator is expected — as a standalone validator, or as an element in a validator array:

"validator": [
  {"@id": "job-board"},
  { "require": true, "match": { "var": "salary", "regex": "Undisclosed", "exist": false } }
]

Commands

Commands are reusable Liquid tags defined in defs.commands. Each command becomes a custom {% tag %} that can be used in validator test and match expressions, replacing verbose Liquid expressions with short, readable tags.

Defining commands

Commands are defined in the commands key under defs:

"defs": {
  "commands": {
    "fresh": {
      "args": ["field", "seconds"],
      "template": "{{ field | date: \"%s\" }} > {{ \"now\" | date: \"%s\" | minus: seconds }}"
    },
    "today": {
      "args": ["field"],
      "template": "{{ field | date: \"%Y%m%d\" }} == {{ \"now\" | date: \"%Y%m%d\" }}"
    }
  },
  ...
}
Field Required Description
args no Ordered list of argument names. Values are passed positionally when the tag is used.
template yes Liquid template string rendered with bound arguments. Argument names are available as variables.

Using commands

Use commands as {% name arg1 arg2 %} in any validator test or match expression:

"validator": {
  "test": "{% fresh date 604800 %}"
}

This is equivalent to writing the full Liquid expression:

"test": "{{ date | date: \"%s\" }} > {{ \"now\" | date: \"%s\" | minus: 604800 }}"

Arguments are matched positionally to the args list in the command definition. Word arguments (like date) are resolved as variables from the item context. Numeric arguments (like 604800) are passed as literal values.

Built-in commands in skeleton

The skeleton config includes two commands:

{% fresh <field> <seconds> %} — checks whether a date field is newer than a given number of seconds. Useful for filtering stale items from feeds that return non-deterministic results:

"input": {
  "validator": {
    "test": "{% fresh date 604800 %}"
  }
}

This filters out any items where the date field is older than 7 days (604800 seconds).

{% today <field> %} — checks whether a date field matches today's date:

"input": {
  "validator": {
    "require": true,
    "test": "{% today date %}"
  }
}

Filters

Custom Liquid filters defined in defs.filters. Each key becomes a filter usable as {{ value | name }} in templates. Filters are defined using standard Liquid filter expression syntax — the input value is piped through the expression chain.

Defining filters

"defs": {
  "filters": {
    "clean": "replace_regex: '\\s+', ' ' | strip"
  }
}

The expression uses standard Liquid pipe syntax. Built-in Liquid filters (strip, downcase, replace, etc.) and the additional replace_regex filter are available:

Filter Description
replace_regex: pattern, replacement Regex substitution (supports backreferences \1, \2, etc.)

Filters can be chained with | — the output of one becomes the input of the next. Custom filters can also reference other custom filters defined earlier.

Using filters

Use filters with the standard Liquid pipe syntax in any template:

{{ item.snippet | clean }}

The clean filter above collapses all whitespace (newlines, tabs, spaces) into a single space and trims leading/trailing whitespace.
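
Traced with Python's re, the clean filter is equivalent to a regex substitution followed by a strip (the snippet value is a made-up example):

```python
import re

# replace_regex: '\s+', ' '  followed by  strip
snippet = "  Senior\tPython\n  Developer  "
cleaned = re.sub(r"\s+", " ", snippet).strip()
```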

Pagination

Two pagination types are supported:

next_link -- Follow a "next" link

For sites with a single "More" or "Next" link (e.g. Hacker News):

"pagination": {
  "type": "next_link",
  "selector": "a.morelink",
  "base_url": "https://news.ycombinator.com/",
  "max_pages": 2
}

numbered -- Follow numbered page buttons

For sites with numbered pagination (e.g. useme.com):

"pagination": {
  "type": "numbered",
  "selector": ".pagination .pagination__page",
  "active_class": "pagination__page--active",
  "base_url": "https://useme.com/pl/jobs/",
  "max_pages": 5
}

Finds the active page button and follows the link of the next one.

Common fields

Field Required Description
max_pages no Maximum number of pages to fetch (default: 1)
base_url no Base URL for resolving relative href values

Schedule

Each rule can have a schedule field with a standard cron expression or an array of expressions (any match triggers the rule). The script is designed to be invoked frequently (e.g. every 5 minutes via system cron), and it decides internally which rules are due based on their schedule.

The schedule uses croniter to parse standard 5-field cron expressions:

 ┌───────────── minute (0-59)
 │ ┌───────────── hour (0-23)
 │ │ ┌───────────── day of month (1-31)
 │ │ │ ┌───────────── month (1-12)
 │ │ │ │ ┌───────────── day of week (0-7, 0 and 7 are Sunday)
 │ │ │ │ │
 * * * * *

Examples

Expression Meaning
0 8 * * * Daily at 8:00
0 */6 * * * Every 6 hours (0:00, 6:00, 12:00, 18:00)
0 9 * * 1 Every Monday at 9:00
*/30 * * * * Every 30 minutes
0 8,20 * * * Twice daily at 8:00 and 20:00

Array of schedules

When a single cron expression can't cover your needs, use an array. The rule runs if any expression matches:

"schedule": ["0,30 9 * * *", "0 16 * * *"]

This runs at 9:00, 9:30, and 16:00 — something not expressible in a single 5-field cron string.

How it works

The script is designed to be invoked periodically by system cron (e.g. every 5 minutes or every hour). On each invocation:

  1. The current time is truncated to the start of the minute (e.g. 14:03:27 becomes 14:03:00)
  2. Each rule's cron expression is checked against that time using croniter.match
  3. If it matches and the rule hasn't already run in this minute window, it executes
  4. After a successful run, a timestamp is saved to ~/.mutimon/data/.lastrun_<rule_name> to prevent duplicate runs if the script is triggered again within the same minute
  5. If no schedule is set, the rule runs every time
  6. Use --force to bypass all schedules

Email templates

Templates use Liquid syntax via python-liquid. The following variables are available:

Variable Description
{{ count }} Number of new items
{{ now }} Current date and time
{{ search_url }} The rendered URL from the definition
{% for item in items %} Loop over new items
{{ item.index }} 1-based position within the items list
Any rule params e.g. {{ query }}
Any extracted variable e.g. {{ item.title }}, {{ item.url }}, {{ item.score }}

Liquid supports conditionals, filters, and logic — see the Liquid docs.

Example template

Hacker News - New Stories
Checked at: {{ now }}

Number of new stories: {{ count }}
============================================================
{% for item in items %}

{{ item.rank }} {{ item.title }}
     Score: {{ item.score }} point{% if item.score != 1 %}s{% endif %} | {{ item.age }}
     URL:   {{ item.url }}
     HN:    {{ item.comments_url }}
{% endfor %}

============================================================

The subject field in a rule is also a Liquid template with access to the same variables.

How it works

  1. On each run, all rules in the config are processed sequentially
  2. For each rule, the scraper fetches the URL (with pagination) and extracts items using CSS selectors
  3. For parse: "money", the page language is detected from <html lang> or the Content-Language header, and used for locale-aware currency parsing via babel
  4. Items are compared against the saved state in ~/.mutimon/data/<rule_name>
  5. New items, or items that crossed a validator threshold (previously failed, now pass), trigger an email notification
  6. ALL items are saved in state with a _valid flag, so threshold crossings are detected on subsequent runs

Threshold crossing detection

When a rule has validators, the scraper tracks whether each item passed or failed on the previous run. This enables re-notifications when a value crosses a threshold boundary:

  1. Price rises to $75k → validator >= 75000 passes → notify, save _valid: true
  2. Price drops to $72k → validator fails → no notification, save _valid: false
  3. Price rises to $76k → validator passes, previous _valid was false → notify again
  4. Price stays at $76k → validator passes, previous _valid was true → no notification

This works for both upward thresholds (>=) and downward thresholds (<=). The state file stores all fetched items (not just those passing the validator) with a _valid boolean.
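
A hypothetical sketch of the notification rule implied by the steps above (not mutimon's actual code):

```python
# Notify when the item passes now and is either new (no saved state) or previously failed.
def should_notify(passes_now, prev_valid):
    return passes_now and prev_valid is not True

assert should_notify(True, None)        # step 1: new item crosses the threshold
assert not should_notify(False, True)   # step 2: dropped below -> no notification
assert should_notify(True, False)       # step 3: crossed back up -> notify again
assert not should_notify(True, True)    # step 4: still above -> no re-notification
```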

Error handling

The scraper sends error emails for four types of failures. The error email function (send_error_email) uses only Python's standard library (no third-party deps), so it works even when the error is caused by a missing dependency.

Error Email subject Behavior
Missing dependency (e.g. import liquid fails) [mutimon] Missing dependency Sends traceback, exits
Invalid config (schema validation fails) [mutimon] Invalid configuration Sends all validation errors, exits
HTML structure change (expect selectors missing) [mutimon] HTML structure changed for '<rule>' Sends missing selectors, skips that input, continues other rules
Fatal runtime crash (unhandled exception in main()) [mutimon] Fatal error Sends full traceback

Error emails are sent to all unique recipient addresses found across all rules in the config.

Examples

The skeleton/ directory contains two ready-to-use examples that are copied to ~/.mutimon/ on first run.

Hacker News — New stories

Monitors the Hacker News front page for new stories. Uses pagination to fetch 2 pages (60 stories), sibling element extraction for scores, and data-test attribute-based IDs.

Files: skeleton/config.json (hackernews def + rule), skeleton/templates/hackernews

Bitcoin price alerts — CoinMarketCap

Monitors Bitcoin price on CoinMarketCap with threshold-based alerts. Demonstrates:

  • Price going up: notify when Bitcoin crosses above $75k, $80k, $90k, $100k
  • Price going down: notify when Bitcoin drops below $60k, $50k, $40k
  • Threshold crossing detection: if price rises above $75k (notify), drops to $72k (no notify), then rises back above $75k (notify again)
  • Locale-aware money parsing: $70,528.40 is correctly parsed as 70528.40 using parse: "money" (US English format detected from <html lang="en">)
  • Structure validation: expect field checks that [data-test='text-cdp-price-display'] exists on the page

Files: skeleton/config.json (coinmarketcap def + rule), skeleton/templates/coinmarketcap

The bitcoin rule uses two input entries — one for upward thresholds (>=), one for downward thresholds (<=):

"input": [
  {
    "params": { "coin": "bitcoin" },
    "validator": [
      { "test": "{{price}} >= 75000" },
      { "test": "{{price}} >= 80000" },
      { "test": "{{price}} >= 100000" }
    ]
  },
  {
    "params": { "coin": "bitcoin" },
    "validator": [
      { "test": "{{price}} <= 60000" },
      { "test": "{{price}} <= 50000" }
    ]
  }
]
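The crossing behavior described above can be sketched in a few lines of Python. This is a simplified model, not mutimon's actual state handling: a threshold fires only when the previous price was on the other side of it.

```python
def crossed_up(prev: float, curr: float, threshold: float) -> bool:
    """True when the price moves from below the threshold to at/above it."""
    return prev < threshold <= curr

# Price path: rises above 75k (notify), dips to 72k (no notify),
# rises back above 75k (notify again).
path = [74000, 76000, 72000, 76500]
events = [
    crossed_up(prev, curr, 75000)
    for prev, curr in zip(path, path[1:])
]
print(events)  # [True, False, True]
```

Because each check compares against the previously stored price, a price that stays above a threshold produces one notification, not one per run.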

Reddit subreddit — RSS/Atom feed

Monitors a Reddit subreddit via its Atom feed (Reddit serves .rss URLs as Atom XML). Demonstrates:

  • XML format: format: "xml" switches from HTML to XML parsing, so CSS selectors target XML elements (entry, title, link) instead of HTML
  • Custom User-Agent: Reddit blocks default scrapers, so a Liferea RSS reader User-Agent is used
  • Parameterized subreddit: the subreddit param lets the same definition monitor any subreddit
  • Atom-specific selectors: entry for items, link[href] for URLs (Atom uses <link href="..."/> instead of <link>text</link>)

Files: skeleton/config.json (reddit-atom def + rule), skeleton/templates/reddit

"reddit-atom": {
  "params": ["subreddit"],
  "format": "xml",
  "userAgent": "Liferea/1.15.6 (Linux; https://lzone.de/liferea/) AppleWebKit (KHTML, like Gecko)",
  "url": "https://www.reddit.com/r/{{subreddit}}.rss",
  "query": {
    "type": "list",
    "selector": "entry",
    "id": { "source": "entry_id" },
    "variables": {
      "title":    { "selector": "title", "value": { "type": "text" } },
      "url":      { "selector": "link", "value": { "type": "attribute", "name": "href" } },
      "entry_id": { "selector": "id", "value": { "type": "text" } },
      "date":     { "selector": "updated", "value": { "type": "text" }, "default": "" },
      "author":   { "selector": "author name", "value": { "type": "text" }, "default": "" }
    }
  }
}
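The reason `link` needs attribute extraction while `title` uses text content can be seen with the standard library. The snippet below is a stripped-down Atom entry (real feeds carry the Atom XML namespace, which is omitted here for brevity); the id and URL values are invented:

```python
import xml.etree.ElementTree as ET

# Minimal namespace-free Atom fragment for illustration.
atom = """
<feed>
  <entry>
    <id>t3_abc123</id>
    <title>Scheme question</title>
    <link href="https://www.reddit.com/r/scheme/comments/abc123/"/>
    <updated>2026-01-01T00:00:00+00:00</updated>
  </entry>
</feed>
"""

entry = ET.fromstring(atom).find("entry")
title = entry.find("title").text         # text content, like HTML
url = entry.find("link").attrib["href"]  # <link/> is empty; URL is in href
print(title, url)
```

This mirrors the config above: `"value": { "type": "text" }` for `title`, and `"value": { "type": "attribute", "name": "href" }` for the URL.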

Configuring with AI

Mutimon ships with an AI instruction file that teaches any AI assistant how to add websites. Get its path with:

mon --ai-guide

Use it with Claude Code in batch mode:

claude -p "$(cat $(mon --ai-guide)) Add https://github.com/trending to mutimon. Extract repo name, description, URL, language, and stars. Email me daily at 8am at user@example.com."

Or with any AI assistant — just paste the contents of the file as context along with your request.

Example prompts

Monitor a website for new content

Add a rule to monitor Hacker News (https://news.ycombinator.com) for new stories. Extract the title, URL, score, and age. Send me an email every 6 hours at user@example.com with the new stories. Read the README.md, config.schema.json, and skeleton/config.json for reference.

Price alerts with thresholds

Add Bitcoin price monitoring using https://coinmarketcap.com/currencies/bitcoin/. Notify me when the price crosses above $75,000 or drops below $60,000. Check every 4 hours. Send alerts to user@example.com. Read the README.md, config.schema.json, and skeleton/config.json for reference.

Monitor for a feature release

Monitor https://soloterm.com/download for Linux support. The page currently shows "Coming soon" next to Linux. Notify me when that label disappears (use the match validator with exist: false). Also add an expect check so I get an error email if the page structure changes. Read the README.md and config.schema.json for reference.

Monitor an RSS/Atom feed

Add a rule to monitor the r/scheme subreddit via its RSS feed at https://www.reddit.com/r/scheme.rss. Reddit serves Atom XML, so use format "xml" and a Liferea User-Agent. Extract the title, URL, author, and date. Check every 6 hours and email me at user@example.com. Read the README.md, config.schema.json, and skeleton/config.json for reference.

Filter content with regex

Add a rule to monitor Hacker News for "Ask HN" posts only. Use the existing hackernews definition with a match validator that filters titles starting with "Ask HN". Read the README.md and config.schema.json for reference.

Filter RSS feed with a command

Add a rule to monitor the r/scheme subreddit via its Atom feed. Use the {% fresh date 604800 %} command to filter out posts older than 7 days, since Reddit's feed sometimes returns stale posts. Read the README.md, config.schema.json, and skeleton/config.json for reference.

Extract data from Next.js JSON (embedded JSON)

Add a rule to monitor job offers on https://it.pracuj.pl. The site is a Next.js app — some data (like city-specific URLs for multi-location offers) is only in the <script id="__NEXT_DATA__"> JSON, not in the HTML. Use parse: "json" with a JMESPath query to extract city and URL from the embedded JSON. Read the README.md and config.schema.json for reference.
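The embedded-JSON technique from that last prompt can be sketched with the standard library. Mutimon queries the parsed JSON with JMESPath; plain dict access stands in for it here, and the HTML fragment and key names are invented for illustration:

```python
import json
import re

# Hypothetical Next.js page fragment; the props structure is made up.
html = """
<script id="__NEXT_DATA__" type="application/json">
{"props": {"pageProps": {"offers": [
  {"city": "Warszawa", "url": "https://it.pracuj.pl/oferta/1"}
]}}}
</script>
"""

# Pull the JSON payload out of the script tag, then parse it.
payload = re.search(
    r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', html, re.S
).group(1)
data = json.loads(payload)

# Equivalent of a JMESPath query like props.pageProps.offers[0]
offer = data["props"]["pageProps"]["offers"][0]
print(offer["city"], offer["url"])
```

The point of the technique: data that never appears in the rendered HTML (here, a per-city URL) is still reachable once the `__NEXT_DATA__` blob is parsed as JSON.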

Acknowledgments

The logo combines clipart from OpenClipart and uses the Lovelo font.

Name Origin

Mutimon is a concise Latin portmanteau formed from mutare (“to change”) + monere (“to warn / monitor”).

License

Copyright (C) 2026 Jakub T. Jankiewicz
Released under the MIT license
