Tools for performing content audits of Wagtail sites
wagtail-content-audit
Content audit utilities for Wagtail. Still a work in progress.
For Wagtail sites with deeply-nested blocks and a large amount of potentially old content, it can be helpful to inspect block usage and be able to search through the content as it exists in the database. This library is intended to help with these and other challenges of auditing the content in Wagtail.
Dependencies
- Python 3.10+
- Django 4.2 (LTS)+
- Wagtail 5.2 (LTS)+
It should also be compatible with all intermediate versions. If you find that it is not, please file an issue.
Installation
- Install wagtail-content-audit:
pip install wagtail-content-audit
- Add wagtail_content_audit as an installed app in your Django settings.py:
INSTALLED_APPS = (
    ...
    "wagtail_content_audit",
    ...
)
Usage
wagtail-content-audit provides two primary audit tools at present:
- Block usage auditing
- Page field searching
For both, it provides a QuerySet-like object using queryish that returns instances of a dataclass with relevant result data.
Block usage
Block usage auditing inspects deeply-nested Wagtail Blocks to discover how often each block is used, and within which other blocks and fields that usage occurs.
Block usage management command
wagtail-content-audit provides a management command to run the block usage audit and output CSV results:
./manage.py block_usage
The resulting CSV can be redirected to a file:
./manage.py block_usage > block_usage_audit.csv
The command takes the following arguments:
--pagetype PAGETYPE_AND_FIELD, -p PAGETYPE_AND_FIELD
Limits the audit to the given page type(s) and Wagtail StreamField, specified as a dotted path. For example:
./manage.py block_usage --pagetype myapp.PageWithContent.content
This outputs the blocks used in the content field of all myapp.PageWithContent pages.
Block usage QuerySet
from wagtail_content_audit.query import BlockUsageQuerySet
The underlying queryish QuerySet can also be used outside of the management command. It behaves like any queryish QuerySet, with a limited set of available options.
It can be filtered for page types:
filtered_queryset = BlockUsageQuerySet().filter(page_model="myapp.PageWithContent")
It can be filtered for Wagtail StreamFields:
filtered_queryset = BlockUsageQuerySet().filter(field="content")
And these can be combined:
filtered_queryset = BlockUsageQuerySet().filter(page_model="myapp.PageWithContent", field="content")
The queryset can also be sliced:
sliced_queryset = BlockUsageQuerySet()[:5]
The resulting objects in the queryset are wagtail_content_audit.query.AuditedBlock objects with the following schema:
@dataclass
class AuditedBlock:
    page_model: type
    field: str
    path: str
    block: type
    pages: list
    total_occurrences: int = 0
    pages_count: int = 0
    pages_live_count: int = 0
    pages_in_default_site_count: int = 0
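As an illustration of how these results might be consumed, the following sketch picks out blocks that never appear on a live page and ranks them by total occurrences. It uses a stand-in dataclass that mirrors the schema above so that it runs without Wagtail. The sample values, the FakePage and FakeBlock classes, and the helper unused_on_live_pages are all invented for the example, and the meaning of each count is inferred from its field name.

```python
from dataclasses import dataclass


@dataclass
class AuditedBlock:
    # Stand-in mirroring the documented schema; not imported from the library.
    page_model: type
    field: str
    path: str
    block: type
    pages: list
    total_occurrences: int = 0
    pages_count: int = 0
    pages_live_count: int = 0
    pages_in_default_site_count: int = 0


def unused_on_live_pages(results):
    """Return results with no live pages, most occurrences first."""
    stale = [r for r in results if r.pages_live_count == 0]
    return sorted(stale, key=lambda r: r.total_occurrences, reverse=True)


class FakePage:
    """Placeholder for a Wagtail page model."""


class FakeBlock:
    """Placeholder for a Wagtail block class."""


# Invented sample data standing in for a real BlockUsageQuerySet.
sample = [
    AuditedBlock(FakePage, "content", "list.item.richtext", FakeBlock, [], 12, 3, 0, 0),
    AuditedBlock(FakePage, "content", "hero", FakeBlock, [], 4, 4, 4, 4),
    AuditedBlock(FakePage, "content", "quote", FakeBlock, [], 30, 5, 0, 0),
]

stale_paths = [r.path for r in unused_on_live_pages(sample)]
print(stale_paths)
```

With a real audit, the same function could be applied to the objects yielded by iterating a BlockUsageQuerySet, assuming they expose the attributes listed in the schema.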
Page search
Page search is intended to enable searching for specific patterns (using regular expressions) in text content in all Wagtail Page model fields.
For StreamFields specifically, it returns the explicit block path within a StreamField (e.g., 0.list.item.1.richtext for a result found in the second child list item of the first child block in the field) as well as the general block path (e.g., list.item.richtext), so that the blocks can be targeted using Wagtail StreamField migrations.
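The relationship between the two kinds of path can be sketched as follows: the general path is the explicit path with its numeric index segments removed. This is a minimal illustration that works on dotted strings as written in the prose; the function name general_block_path is hypothetical, and the library may represent paths differently (the PageMatch schema declares its path fields as lists).

```python
def general_block_path(explicit_path: str) -> str:
    """Drop numeric index segments from an explicit StreamField block path.

    Hypothetical helper for illustration only.
    """
    parts = explicit_path.split(".")
    # Keep only block names; positions such as "0" or "1" are skipped.
    return ".".join(part for part in parts if not part.isdigit())


general = general_block_path("0.list.item.1.richtext")
print(general)
```

The explicit path identifies one specific occurrence on one page, while the general path identifies every block of that shape, which is the granularity a StreamField data migration typically needs.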
Page search management command
wagtail-content-audit provides a management command to run the page search audit and output CSV results:
./manage.py page_search -s '[tT]est'
The resulting CSV can be redirected to a file:
./manage.py page_search -s '[tT]est' > page_search_test.csv
The command takes the following arguments:
--pagetype PAGETYPE_AND_FIELD, -p PAGETYPE_AND_FIELD
Limits the search to the given page type(s) and model field, specified as a dotted path. For example:
./manage.py page_search -s '[tT]est' --pagetype myapp.PageWithContent.content
This searches only within the content field of myapp.PageWithContent pages.
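Before running a search across a large site, it can help to check what a pattern matches on a few sample strings. The sketch below does this with Python's standard re module. It assumes the search follows Python regular expression semantics, which the source does not state explicitly, and the sample strings are invented.

```python
import re

# The pattern used in the examples above.
pattern = re.compile(r"[tT]est")

# Invented sample strings standing in for page content.
samples = ["Test page", "a latest draft", "TEST", "nothing here"]

# re.search finds the pattern anywhere in the string, not only at the start.
matching = [text for text in samples if pattern.search(text)]
print(matching)
```

Note that the pattern matches inside longer words such as "latest" and does not match the all-capitals "TEST". Word boundaries (\b) or an inline case-insensitive flag ((?i)) are standard re features that could narrow or widen the pattern, again assuming Python semantics apply.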
Page search QuerySet
from wagtail_content_audit.query import PageSearchQuerySet
The underlying queryish QuerySet can also be used outside of the management command. It behaves like any queryish QuerySet, with a limited set of available options.
It can be searched with any regular expression string:
search_queryset = PageSearchQuerySet().filter(search=r"[tT]est")
It can be filtered for page types:
filtered_queryset = PageSearchQuerySet().filter(search=r"[tT]est", page_model="myapp.PageWithContent")
It can be filtered for model fields:
filtered_queryset = PageSearchQuerySet().filter(search=r"[tT]est", field="content")
And these can be combined:
filtered_queryset = PageSearchQuerySet().filter(search=r"[tT]est", page_model="myapp.PageWithContent", field="content")
The queryset can also be sliced:
sliced_queryset = PageSearchQuerySet().filter(search=r"[tT]est")[:5]
The resulting objects in the queryset are wagtail_content_audit.query.pagesearch.PageMatch objects with the following schema:
@dataclass
class PageMatch:
    page_model: type
    page: Page
    field_name: str
    field_type: str
    stream_field_path: list
    block_type: type
    result_path: list
    matches: list
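One way to summarize search results is to count matches per page model and field. The sketch below uses a stand-in dataclass mirroring the schema above, with page typed as a plain object so that it runs without Wagtail. The helper matches_per_field, the FakePage class, and all sample values are invented for illustration, and the contents of the path and matches lists are guesses.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PageMatch:
    # Stand-in mirroring the documented schema; not imported from the library.
    page_model: type
    page: object  # a Wagtail Page in the real schema
    field_name: str
    field_type: str
    stream_field_path: list
    block_type: type
    result_path: list
    matches: list


def matches_per_field(results):
    """Count individual matches, keyed by (model name, field name)."""
    counts = Counter()
    for result in results:
        key = (result.page_model.__name__, result.field_name)
        counts[key] += len(result.matches)
    return counts


class FakePage:
    """Placeholder for a Wagtail page model."""


# Invented sample data standing in for a real PageSearchQuerySet.
sample = [
    PageMatch(FakePage, object(), "content", "StreamField", [], str, [], ["Test", "test"]),
    PageMatch(FakePage, object(), "content", "StreamField", [], str, [], ["test"]),
    PageMatch(FakePage, object(), "title", "CharField", [], str, [], ["Test"]),
]

summary = matches_per_field(sample)
print(summary.most_common())
```

A summary like this can show which fields hold most of the matching content before deciding where a migration or manual cleanup is worth the effort.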
Getting help
Please add issues to the issue tracker.
Getting involved
General instructions on how to contribute can be found in CONTRIBUTING.
Licensing