Pydantic validators for mySociety democracy types
mysoc-validator
A set of pydantic-based validators and classes for common mySociety democracy formats.
Currently supports:
- Popolo database
- Transcript format (old-style XML and new json format)
- Interests format
XML-based formats are tested to round-trip with themselves, but the output is not guaranteed to be string-identical to the original source.
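To make the round-trip point concrete, here is a small standard-library sketch. It does not use mysoc-validator itself, and the sample XML is made up. Parsing and re-serialising normalises quoting and whitespace, so the first output differs from the source string, while further round trips are stable.

```python
import xml.etree.ElementTree as ET

# Made-up sample: a single-quoted attribute and stray whitespace inside the tag.
original = "<publicwhip><speech id='a1'   speakername=\"Example\" >Hello</speech></publicwhip>"


def roundtrip(xml_text: str) -> str:
    """Parse XML text and serialise it again."""
    return ET.tostring(ET.fromstring(xml_text), encoding="unicode")


first = roundtrip(original)
second = roundtrip(first)

print(first != original)  # True: not string-identical to the source
print(first == second)    # True: stable once normalised
```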
Can be installed with pip install mysoc-validator
To use as a cli validator:
python -m mysoc_validator popolo validate path-to-people.json
python -m mysoc_validator transcript validate path-to-transcript.xml
python -m mysoc_validator transcript validate transcripts/
python -m mysoc_validator transcript validate path-to-*.xml --glob
python -m mysoc_validator interests validate path-to-interests.xml
To see all options use python -m mysoc_validator --help or python -m mysoc_validator popolo tui.
Or if using uvx (don't need to install first):
uvx mysoc-validator popolo validate path-to-people.json
To validate and consistently format:
uvx mysoc-validator format people.json
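If you want to call the validator from a script, one option is to build the same command lines with subprocess. This is a sketch only: the helper names are hypothetical, `--glob` is only shown above with transcripts, and treating a zero exit code as "valid" is an assumption rather than documented behaviour.

```python
import subprocess
import sys
from typing import List

KNOWN_FORMATS = {"popolo", "transcript", "interests"}


def build_validate_command(
    kind: str, target: str, use_glob: bool = False, use_uvx: bool = False
) -> List[str]:
    """Build a validate command in the shapes shown above (hypothetical helper)."""
    if kind not in KNOWN_FORMATS:
        raise ValueError(f"unknown format: {kind}")
    if use_uvx:
        prefix = ["uvx", "mysoc-validator"]
    else:
        prefix = [sys.executable, "-m", "mysoc_validator"]
    command = prefix + [kind, "validate", target]
    if use_glob:
        command.append("--glob")
    return command


def run_validation(kind: str, target: str, **kwargs) -> bool:
    """Run the validator; assumes exit code 0 means the file validated."""
    command = build_validate_command(kind, target, **kwargs)
    return subprocess.run(command).returncode == 0
```

For example, `build_validate_command("popolo", "path-to-people.json", use_uvx=True)` gives the uvx form of the first command above.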
Modification functions
See python -m mysoc_validator popolo --help for functions to change parties/whip and add alt names.
Popolo
A pydantic-based validator for the main mySociety people.json file (which mostly follows the Popolo standard, with a few extra bits).
Validates:
- Basic structure
- Unique IDs and ID Patterns
- Foreign key relationships between objects.
It also supports looking up a person from a name or identifier, and generating new IDs for memberships.
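The kinds of checks listed above can be pictured with a plain-Python sketch. This is not the library's implementation (which uses pydantic models); the data is a made-up, heavily simplified fragment, and the membership ID format is an assumption for illustration.

```python
import re
from typing import Dict, List

# Hypothetical, simplified data; the real people.json has many more fields.
data = {
    "persons": [{"id": "uk.org.publicwhip/person/10001"}],
    "memberships": [
        {"id": "uk.org.publicwhip/member/1", "person_id": "uk.org.publicwhip/person/10001"},
        # Deliberately broken: repeated ID and a person that does not exist.
        {"id": "uk.org.publicwhip/member/1", "person_id": "uk.org.publicwhip/person/99999"},
    ],
}

PERSON_ID = re.compile(r"^uk\.org\.publicwhip/person/\d+$")


def check(data: Dict[str, List[dict]]) -> List[str]:
    """Return a list of problems: bad ID patterns, duplicate IDs, broken references."""
    errors: List[str] = []
    person_ids = set()
    for person in data["persons"]:
        if not PERSON_ID.match(person["id"]):
            errors.append(f"bad person id pattern: {person['id']}")
        if person["id"] in person_ids:
            errors.append(f"duplicate person id: {person['id']}")
        person_ids.add(person["id"])

    membership_ids = set()
    for membership in data["memberships"]:
        if membership["id"] in membership_ids:
            errors.append(f"duplicate membership id: {membership['id']}")
        membership_ids.add(membership["id"])
        # Foreign key check: every membership must point at a known person.
        if membership["person_id"] not in person_ids:
            errors.append(f"unknown person_id: {membership['person_id']}")
    return errors


errors = check(data)
```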
Using name or ID lookup
After first use, there is some caching behind the scenes to speed this up.
from mysoc_validator import Popolo
from mysoc_validator.models.popolo import Chamber, IdentifierScheme
from datetime import date
popolo = Popolo.from_parlparse()
keir_starmer_parl_id = popolo.persons.from_identifier(
    "4514", scheme=IdentifierScheme.MNIS
)
keir_starmer_name = popolo.persons.from_name(
    "keir starmer", chamber_id=Chamber.COMMONS, date=date.fromisoformat("2022-07-31")
)
keir_starmer_parl_id.id == keir_starmer_name.id
Transcripts
Python validator and handler for 'publicwhip' style transcript format.
from mysoc_validator import Transcript
from pathlib import Path
transcript_file = Path("data", "debates2023-03-28d.xml")
transcript = Transcript.from_xml_path(transcript_file)
Register of Interests
Python validator and handler for 'publicwhip' style interests format.
For the new-style generic JSON format:
from mysoc_validator import RegmemRegister
from pathlib import Path
register_file = Path("data", "commons-regmem-2025-01-20.json")
interests = RegmemRegister.from_path(register_file)
For the 'publicwhip' style XML format:
from mysoc_validator import XMLRegister
from pathlib import Path
register_file = Path("data", "regmem2024-05-28.xml")
interests = XMLRegister.from_xml_path(register_file)
Info fields
We have various XML files in parlparse that are loaded into TheyWorkForYou (TWFY) as extra info for people or constituencies.
This library has two approaches for this - a general permissive model that can load any file, and tools to create models to add validation for particular files if needed.
Load any file
from mysoc_validator.models.info import InfoCollection, PersonInfo, ConsInfo
social_media_links = InfoCollection[PersonInfo].from_parlparse("social-media-commons")
constituency_links = InfoCollection[ConsInfo].from_parlparse("constituency-links")
And this is an example of creating a more bespoke model for a particular file.
Subclassing PersonInfo switches the 'extras' setting from 'allow' to 'forbid'.
from typing import Optional
from mysoc_validator.models.info import InfoCollection, PersonInfo, ConsInfo
class SocialInfo(PersonInfo):
    facebook_page: Optional[str] = None
    twitter_username: Optional[str] = None
    bluesky_handle: Optional[str] = None
social_media_links = InfoCollection[SocialInfo].from_parlparse("social-media-commons")
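Why the switch from 'allow' to 'forbid' matters can be shown with a plain-Python sketch. This mimics the two policies only; it is not how pydantic or this library implements them, and `load_record` is a hypothetical helper. With 'allow', a misspelled key passes through unnoticed; with 'forbid', it is rejected.

```python
from typing import Any, Dict, Set

# Field names taken from the SocialInfo example above, plus the person ID.
DECLARED = {"person_id", "facebook_page", "twitter_username", "bluesky_handle"}


def load_record(raw: Dict[str, Any], declared: Set[str], extra: str) -> Dict[str, Any]:
    """Mimic the two policies: 'allow' keeps unknown keys, 'forbid' rejects them."""
    unknown = sorted(set(raw) - declared)
    if unknown and extra == "forbid":
        raise ValueError(f"unexpected fields: {unknown}")
    return dict(raw)


# Note the typo in the key: "twiter_username".
row = {"person_id": "uk.org.publicwhip/person/10001", "twiter_username": "example"}

permissive = load_record(row, DECLARED, extra="allow")  # typo is kept silently

try:
    load_record(row, DECLARED, extra="forbid")
    rejected = False
except ValueError:
    rejected = True  # typo is caught
```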
If needing to pass dicts across the XML boundary (although this implies a change to how things are imported), do the following:
from mysoc_validator.models.info import InfoCollection, PersonInfo
from mysoc_validator.models.xml_base import XMLDict, AsAttrStr
class DemoDataModel(PersonInfo):
    regmem_info: XMLDict
    random_string: AsAttrStr
item = DemoDataModel(
    person_id="uk.org.publicwhip/person/10001",
    regmem_info={"hello": ["yes", "no"]},
    random_string="banana",
)
items = InfoCollection[DemoDataModel](items=[item])
xml_data = items.model_dump_xml()
# Which looks like
"""
<twfy>
  <personinfo id="uk.org.publicwhip/person/10001">
    <regmem_info>{"hello": ["yes", "no"]}</regmem_info>
    <random_string>banana</random_string>
  </personinfo>
</twfy>
"""
# which can either be round-tripped in the same model, or read by the generic model like this
generic_read = (
    InfoCollection[PersonInfo].model_validate_xml(xml_data).promote_children()
)
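The underlying idea in the output above is that the dict is stored as JSON text inside an XML element. A standard-library sketch of that idea follows; it does not use XMLDict or the library's models, and it simply mirrors the element names from the example.

```python
import json
import xml.etree.ElementTree as ET

# Write: serialise the dict as JSON text inside a child element.
root = ET.Element("twfy")
info = ET.SubElement(root, "personinfo", id="uk.org.publicwhip/person/10001")
ET.SubElement(info, "regmem_info").text = json.dumps({"hello": ["yes", "no"]})
ET.SubElement(info, "random_string").text = "banana"
xml_text = ET.tostring(root, encoding="unicode")

# Read: parse the XML, then decode the JSON text back into a dict.
parsed = ET.fromstring(xml_text).find("personinfo")
regmem_info = json.loads(parsed.findtext("regmem_info"))
random_string = parsed.findtext("random_string")
```

Any consumer of the XML needs to know which elements hold JSON, which is why the text above notes that this implies a change to how things are imported.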