Tools for performing content audits of Wagtail sites

wagtail-content-audit

Content audit utilities for Wagtail. Still a work in progress.

For Wagtail sites with deeply-nested blocks and a large amount of potentially old content, it can be helpful to inspect block usage and be able to search through the content as it exists in the database. This library is intended to help with these and other challenges of auditing the content in Wagtail.

Dependencies

  • Python 3.10+
  • Django 4.2 (LTS)+
  • Wagtail 5.2 (LTS)+

It should also be compatible with all intermediate versions. If you find that it is not, please file an issue.

Installation

  1. Install wagtail-content-audit:
pip install wagtail-content-audit
  2. Add wagtail_content_audit as an installed app in your Django settings.py:
INSTALLED_APPS = (
    ...
    "wagtail_content_audit",
    ...
)

Usage

wagtail-content-audit provides two primary audit tools at present:

  • Block usage auditing
  • Page field searching

For both, it provides a QuerySet-like object, built with queryish, that returns instances of a dataclass holding the relevant result data.

Block usage

Block usage is intended to audit deeply-nested Wagtail Blocks to discover how often these blocks are used, and within which other blocks and fields that usage occurs.

Block usage management command

wagtail-content-audit provides a management command to run the block usage audit and output CSV results:

./manage.py block_usage

The resulting CSV can be redirected to a file:

./manage.py block_usage > block_usage_audit.csv
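Once the CSV is saved, it can be inspected with Python's standard csv module. The following is only a sketch: it assumes the output has a header row, and the column names used here (block, total_occurrences) and the sample rows are hypothetical stand-ins, so check them against the header of your actual file. top_rows is an illustrative helper, not part of wagtail-content-audit.

```python
import csv
import io


def top_rows(csv_text, sort_column, limit=5):
    """Return the `limit` rows with the largest integer value in `sort_column`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    rows.sort(key=lambda row: int(row[sort_column]), reverse=True)
    return rows[:limit]


# Hypothetical sample data standing in for the contents of block_usage_audit.csv.
SAMPLE_CSV = """block,total_occurrences
myapp.blocks.Hero,12
myapp.blocks.Quote,3
myapp.blocks.Table,40
"""

for row in top_rows(SAMPLE_CSV, "total_occurrences", limit=2):
    print(row["block"], row["total_occurrences"])
```

To run this against a real file, read it with open("block_usage_audit.csv", newline="") and pass its text in place of SAMPLE_CSV.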

The command takes the following arguments:

--pagetype PAGETYPE_AND_FIELD, -p PAGETYPE_AND_FIELD

Limits the audit to the given page type(s) and Wagtail StreamField, specified as a dotted path. For example,

./manage.py block_usage --pagetype myapp.PageWithContent.content

will output the blocks used in the content field of all myapp.PageWithContent pages.
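The dotted path in the example combines an app label, a page model name, and a field name. As a rough illustration of how such a path breaks down into the page_model and field values used by the QuerySet filters described under "Block usage QuerySet", here is a minimal sketch. split_pagetype is a hypothetical helper, not part of wagtail-content-audit, and the command's own parsing may differ or accept other forms.

```python
def split_pagetype(dotted_path):
    """Split 'app_label.ModelName.field_name' into ('app_label.ModelName', 'field_name').

    Assumes exactly three non-empty dot-separated parts, as in the example
    above; this is an illustration, not the command's actual parser.
    """
    parts = dotted_path.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected 'app_label.ModelName.field_name', got {dotted_path!r}"
        )
    app_label, model_name, field_name = parts
    return f"{app_label}.{model_name}", field_name


page_model, field = split_pagetype("myapp.PageWithContent.content")
print(page_model, field)
```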

Block usage QuerySet

from wagtail_content_audit.query import BlockUsageQuerySet

The underlying queryish QuerySet can be used outside of the management command as well. This QuerySet behaves like any queryish QuerySet, with a limited set of available options.

It can be filtered for page types:

filtered_queryset = BlockUsageQuerySet().filter(page_model="myapp.PageWithContent")

It can be filtered for Wagtail StreamFields:

filtered_queryset = BlockUsageQuerySet().filter(field="content")

And these can be combined:

filtered_queryset = BlockUsageQuerySet().filter(page_model="myapp.PageWithContent", field="content")

The queryset can also be sliced:

sliced_queryset = BlockUsageQuerySet()[:5]

The resulting objects in the queryset are wagtail_content_audit.query.AuditedBlock objects with the following schema:

@dataclass
class AuditedBlock:
    page_model: type
    field: str
    path: str
    block: type
    pages: list
    total_occurrences: int = 0
    pages_count: int = 0
    pages_live_count: int = 0
    pages_in_default_site_count: int = 0
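Because the results are plain dataclass instances, they can be post-processed with ordinary Python. The sketch below uses a local stand-in class that copies the fields listed above, so that it runs without Django or Wagtail; in a real project the objects would come from BlockUsageQuerySet instead. It assumes pages_live_count is the number of live pages containing the block, which the schema suggests but does not state. The block and page classes and the sample values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class StandInAuditedBlock:
    """Local copy of the documented fields; not the library's own class."""

    page_model: type
    field: str
    path: str
    block: type
    pages: list
    total_occurrences: int = 0
    pages_count: int = 0
    pages_live_count: int = 0
    pages_in_default_site_count: int = 0


def not_on_live_pages(results):
    """Return results with no live pages, most occurrences first."""
    unused = [r for r in results if r.pages_live_count == 0]
    return sorted(unused, key=lambda r: r.total_occurrences, reverse=True)


# Hypothetical block and page classes, used only as sample data.
class HeroBlock:
    pass


class QuoteBlock:
    pass


class ContentPage:
    pass


results = [
    StandInAuditedBlock(ContentPage, "content", "hero", HeroBlock, pages=[],
                        total_occurrences=4, pages_count=2, pages_live_count=2),
    StandInAuditedBlock(ContentPage, "content", "quote", QuoteBlock, pages=[],
                        total_occurrences=3, pages_count=1, pages_live_count=0),
]

for result in not_on_live_pages(results):
    print(result.path, result.total_occurrences)
```

A list like this can help identify blocks that only survive in draft or unpublished content and may be candidates for removal.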

Page search

Page search is intended to enable searching for specific patterns (using regular expressions) in text content in all Wagtail Page model fields.

For StreamFields specifically, it returns explicit block paths within a StreamField (e.g., 0.list.item.1.richtext for a result found in the second child list item in the first child block in the field) as well as the general block path (e.g., list.item.richtext) so that the blocks can be targeted using Wagtail StreamField migrations.
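The relationship between the two path forms can be shown with a small helper that drops the numeric index segments from an explicit path. This is a sketch of the idea only: general_block_path is a hypothetical function, not the library's implementation, and it assumes that no block is itself named with digits alone.

```python
def general_block_path(explicit_path):
    """Drop numeric index segments from an explicit StreamField block path."""
    return ".".join(
        segment for segment in explicit_path.split(".") if not segment.isdigit()
    )


print(general_block_path("0.list.item.1.richtext"))  # prints list.item.richtext
```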

Page search management command

wagtail-content-audit provides a management command to run the page search audit and output CSV results:

./manage.py page_search -s '[tT]est'

The resulting CSV can be redirected to a file:

./manage.py page_search -s '[tT]est' > page_search_test.csv

The command takes the following arguments:

--pagetype PAGETYPE_AND_FIELD, -p PAGETYPE_AND_FIELD

Limits the search to the given page type(s) and model field, specified as a dotted path. For example,

./manage.py page_search -s '[tT]est' --pagetype myapp.PageWithContent.content

will only search within the content field of myapp.PageWithContent pages.

Page search QuerySet

from wagtail_content_audit.query import PageSearchQuerySet

The underlying queryish QuerySet can be used outside of the management command as well. This QuerySet behaves like any queryish QuerySet, with a limited set of available options.

It can be searched with any regular expression string:

search_queryset = PageSearchQuerySet().filter(search=r"[tT]est")
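It can help to try a pattern against sample text before running it over a whole site. The sketch below uses Python's re module directly and assumes the search applies the same regular expression syntax and matches anywhere within the text, which is not confirmed here. The sample strings are invented.

```python
import re

pattern = re.compile(r"[tT]est")

# The pattern matches substrings, so "latest" and "contest" also match.
# "TEST" does not match, because only the first letter may vary in case.
samples = ["Test page", "a latest draft", "TEST", "contest entry"]
for text in samples:
    print(repr(text), pattern.findall(text))

# Word boundaries restrict the match to the standalone word.
whole_word = re.compile(r"\b[tT]est\b")
print(whole_word.findall("a latest Test"))
```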

It can be filtered for page types:

filtered_queryset = PageSearchQuerySet().filter(search=r"[tT]est", page_model="myapp.PageWithContent")

It can be filtered for model fields:

filtered_queryset = PageSearchQuerySet().filter(search=r"[tT]est", field="content")

And these can be combined:

filtered_queryset = PageSearchQuerySet().filter(search=r"[tT]est", page_model="myapp.PageWithContent", field="content")

The queryset can also be sliced:

sliced_queryset = PageSearchQuerySet().filter(search=r"[tT]est")[:5]

The resulting objects in the queryset are wagtail_content_audit.query.pagesearch.PageMatch objects with the following schema:

@dataclass
class PageMatch:
    page_model: type
    page: Page
    field_name: str
    field_type: str
    stream_field_path: list
    block_type: type
    result_path: list
    matches: list
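As with block usage, the results can be summarized with ordinary Python. The sketch below counts matches per page model and field, using a local stand-in class that copies the fields listed above so that it runs without Django or Wagtail. It assumes matches holds one entry per match found; the sample values, including the field_type strings and the empty path lists, are invented placeholders rather than real output.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class StandInPageMatch:
    """Local copy of the documented fields; not the library's own class."""

    page_model: type
    page: object  # a Wagtail Page in the real dataclass
    field_name: str
    field_type: str
    stream_field_path: list
    block_type: type
    result_path: list
    matches: list


def matches_per_field(results):
    """Count matches per (page model name, field name) pair."""
    counts = Counter()
    for result in results:
        key = (result.page_model.__name__, result.field_name)
        counts[key] += len(result.matches)
    return counts


# Hypothetical page model, used only as sample data.
class BlogPage:
    pass


results = [
    StandInPageMatch(BlogPage, "page-1", "title", "CharField",
                     [], type(None), [], ["Test"]),
    StandInPageMatch(BlogPage, "page-2", "title", "CharField",
                     [], type(None), [], ["test", "Test"]),
    StandInPageMatch(BlogPage, "page-2", "content", "StreamField",
                     [], type(None), [], ["test"]),
]

for (model_name, field_name), count in matches_per_field(results).items():
    print(model_name, field_name, count)
```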

Getting help

Please add issues to the issue tracker.

Getting involved

General instructions on how to contribute can be found in CONTRIBUTING.

Licensing

  1. TERMS
  2. LICENSE
  3. CFPB Source Code Policy
