Canonical company-name normalisation, shared byte-for-byte across EasyBiz services (L-MDS-CR-08).
easybiz-companyname-normalisation
Canonical company-name normalisation, shared byte-for-byte by the MDS company-resolver and accounting-service (locked decision L-MDS-CR-08).
It is a library, not a service: normalise_name() is a pure, deterministic
function on the resolution hot path. A service would add a network hop and an
availability dependency for zero benefit.
```python
from easybiz_companyname_normalisation import normalise_name, NORMALISER_VERSION

normalise_name("ACME S.à r.l.")  # -> "acme sarl"
NORMALISER_VERSION               # -> "1.0.0"
```
Why this is a shared package and not copy-pasted logic
Both services store normalised values (EntitySynonym.normalised_value,
embedding source text). Those stored values are a derived cache; the
raw value is the source of truth. If the two services normalised differently,
a name stored by one would silently fail to match a query from the other. So the
logic lives in exactly one place and both import it.
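To make the failure mode concrete, here is a minimal sketch with two hypothetical normalisers, normalise_a and normalise_b, that differ only in accent handling. Neither is this package's algorithm (the real normalise_name() also handles legal forms). They stand in for two services that each carry their own copy of the logic.

```python
import re
import unicodedata


def normalise_a(name: str) -> str:
    """Hypothetical service-A copy: lowercase, strip accents, drop punctuation."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(re.sub(r"[^\w\s]", "", no_accents.lower()).split())


def normalise_b(name: str) -> str:
    """Hypothetical service-B copy that has drifted: it keeps accents."""
    return " ".join(re.sub(r"[^\w\s]", "", name.lower()).split())


# Service A stores the normalised key; service B later queries with its own copy.
stored = {normalise_a("ACME S.à r.l."): "entity-42"}
query_key = normalise_b("ACME S.à r.l.")

print(normalise_a("ACME S.à r.l."))  # acme sa rl
print(query_key)                      # acme sà rl
print(stored.get(query_key))          # None: the same company silently misses
```

Nothing raises and nothing logs; the lookup just comes back empty, which is why the logic is imported rather than duplicated.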
Versioning policy (the important part)
| Bump | When | Consequence |
|---|---|---|
| MAJOR | normalise_name() output changes for any input (e.g. adding a legal form) | Requires a coordinated re-normalise backfill of stored values from raw_value, and a re-embed, in MDS. Both services upgrade in lockstep. |
| MINOR / PATCH | tests, perf, docs, adding fixture rows | Must never change output. |
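Read as code, the table reduces to one comparison. The sketch below assumes plain MAJOR.MINOR.PATCH version strings such as "1.0.0"; parse_version and requires_backfill are hypothetical helpers, not part of the package. Re-deriving a row after a MINOR or PATCH difference is harmless because output is unchanged, but only a MAJOR difference makes the backfill mandatory.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split a MAJOR.MINOR.PATCH string such as "1.0.0" into integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def requires_backfill(stored_version: str, current_version: str) -> bool:
    """True when the MAJOR components differ, i.e. stored output may be stale."""
    return parse_version(stored_version)[0] != parse_version(current_version)[0]


print(requires_backfill("1.0.0", "1.2.3"))  # False: MINOR/PATCH never change output
print(requires_backfill("1.4.0", "2.0.0"))  # True: re-normalise and re-embed in MDS
```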
NORMALISER_VERSION equals the package version. MDS persists it alongside each
stored value (EntitySynonym.normaliser_version, mirroring
EntityEmbedding.embedding_model) so stale rows are detectable; a
renormalise_stale management command re-derives, from raw_value, every row whose
version != current. Skew only ever degrades a match to a miss (→ human review),
never to a wrong match — the safe failure direction.
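The stale-row pass can be sketched in plain Python. SynonymRow is a hypothetical stand-in for an EntitySynonym row, and the real renormalise_stale command (an MDS management command with database access and the re-embed step) will look different. This sketch only shows the version comparison and the re-derivation from raw_value.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class SynonymRow:
    """Hypothetical stand-in for an EntitySynonym row."""
    raw_value: str
    normalised_value: str
    normaliser_version: str


def renormalise_stale(
    rows: Iterable[SynonymRow],
    normalise: Callable[[str], str],
    current_version: str,
) -> int:
    """Re-derive stale rows from raw_value in place; return how many changed."""
    updated = 0
    for row in rows:
        if row.normaliser_version != current_version:
            # raw_value is the source of truth; normalised_value is a derived cache.
            row.normalised_value = normalise(row.raw_value)
            row.normaliser_version = current_version
            updated += 1
    return updated


# str.casefold stands in for normalise_name so the sketch runs on its own.
rows = [
    SynonymRow("Foo SA", "foo s.a.", "1.0.0"),
    SynonymRow("Bar Sàrl", "bar sàrl", "2.0.0"),
]
count = renormalise_stale(rows, str.casefold, "2.0.0")
print(count, rows[0].normalised_value)  # 1 foo sa
```

Because the pass only ever reads raw_value, running it twice is a no-op the second time.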
The parity contract
tests/test_normalisation.py runs against fixtures/lu_names.csv, which ships
inside the package. Running this test in each consumer's CI proves that the
consumer uses the canonical behaviour for the version it pinned.
Never edit an existing expected value without a MAJOR bump — that is, by
definition, an output change. Adding rows is fine (MINOR) and is the encouraged
way to harden coverage as real LU supplier-name shapes surface. The 20 starter
rows are illustrative, not a target; there is no requirement to reach any count.
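A consumer-side parity check might look like the sketch below. It assumes the CSV has raw and expected columns and sits at fixtures/lu_names.csv inside the installed package. The column names and the helper names (load_fixture, parity_mismatches) are hypothetical, so check them against the shipped fixture and tests/test_normalisation.py.

```python
import csv
import io
from importlib.resources import files
from typing import Callable, List, Tuple


def load_fixture() -> str:
    # Assumption: the fixture sits at fixtures/lu_names.csv inside the package.
    path = files("easybiz_companyname_normalisation") / "fixtures" / "lu_names.csv"
    return path.read_text(encoding="utf-8")


def parity_mismatches(
    csv_text: str, normalise: Callable[[str], str]
) -> List[Tuple[str, str, str]]:
    """Return (raw, expected, actual) for every row whose output differs."""
    mismatches = []
    # Assumption: the header row names the columns "raw" and "expected".
    for row in csv.DictReader(io.StringIO(csv_text)):
        actual = normalise(row["raw"])
        if actual != row["expected"]:
            mismatches.append((row["raw"], row["expected"], actual))
    return mismatches


# Self-contained demo with an inline CSV and str.casefold as the normaliser.
sample = "raw,expected\nFoo SA,foo sa\nBar Sàrl,bar sàrl\n"
print(parity_mismatches(sample, str.casefold))  # []
print(parity_mismatches(sample, str.upper)[0])  # ('Foo SA', 'foo sa', 'FOO SA')
```

In a consumer's CI the assertion would be `parity_mismatches(load_fixture(), normalise_name) == []`, which reports every diverging row at once instead of stopping at the first.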
Installation
Distribution is deliberately deferred while in solo development. The import path is identical across all options below — only the install source changes.
- Now (solo inner loop): clone next to the service repos and install editable. Both services point at one working copy; edits are picked up instantly.

  ```shell
  pip install -e ../easybiz-companyname-normalisation
  ```

- When CI or a teammate appears: switch to a pinned git tag. Pin a tag, not a branch; a branch ref lets two installs drift, which breaks the version-sync contract.

  ```shell
  pip install git+https://github.com/<org>/easybiz-companyname-normalisation@v1.0.0
  ```

- Target later (optional): AWS CodeArtifact private PyPI. Publish with

  ```shell
  python -m build && twine upload --repository codeartifact dist/*
  ```

  then consume via `aws codeartifact login --tool pip ...` and pin `==1.0.0`.
Scope
Company-name normalisation only. Identifier checksum validators (IBAN/VAT/RCS) stay accounting-side (the resolver trusts pre-validated input); they are not part of this package.
TODO (deferred, not yet built)
- Cross-service CI parity guard: once both services have CI, add a check that
fails the build if MDS and accounting pin different
`easybiz-companyname-normalisation` versions. Not needed while both use one editable install.
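Once both services have CI, the guard could be as small as the sketch below. It is hypothetical: it assumes each service pins the package in a requirements file, either as `==X.Y.Z` or as a git `@vX.Y.Z` tag, and it only compares the two pins as text.

```python
import re
from typing import Optional

PACKAGE = "easybiz-companyname-normalisation"

# Accepts hyphen or underscore spelling, then either an exact "==X.Y.Z" pin
# or the "@vX.Y.Z" tag suffix of a git URL.
_PIN = re.compile(
    r"easybiz[-_]companyname[-_]normalisation(?:==|@v)(\d+\.\d+\.\d+)",
    re.IGNORECASE,
)


def pinned_version(requirements_text: str) -> Optional[str]:
    """Return the pinned version found in a requirements file, or None."""
    match = _PIN.search(requirements_text)
    return match.group(1) if match else None


def parity_error(mds_requirements: str, accounting_requirements: str) -> Optional[str]:
    """Return a failure message if the two services' pins are missing or differ."""
    mds = pinned_version(mds_requirements)
    accounting = pinned_version(accounting_requirements)
    if mds is None or accounting is None:
        return f"{PACKAGE} pin missing (mds={mds}, accounting={accounting})"
    if mds != accounting:
        return f"{PACKAGE} version skew: mds={mds}, accounting={accounting}"
    return None
```

A CI step would read both requirements files and call `sys.exit()` with the returned message whenever it is not None. A missing pin counts as a failure, since an unpinned install is exactly the drift the guard exists to catch.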
File details
Details for the file easybiz_companyname_normalisation-1.0.0.tar.gz.
File metadata
- Download URL: easybiz_companyname_normalisation-1.0.0.tar.gz
- Upload date:
- Size: 10.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7d3dab288f582a373452c76fb7d771e1dc5a093bb028ec23e423c5daec2fbafc |
| MD5 | 755e0e36e4cef6e2381666bcc526a9a6 |
| BLAKE2b-256 | e2bde0e8b27a26a68ac3cbf6d29ce484704664e1d3efaa70e478f51965597951 |
File details
Details for the file easybiz_companyname_normalisation-1.0.0-py3-none-any.whl.
File metadata
- Download URL: easybiz_companyname_normalisation-1.0.0-py3-none-any.whl
- Upload date:
- Size: 6.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f499991eeaa3c62301db5dc386afaa3062058548ccb32028cd3dcb01e21bc19a |
| MD5 | e30d187cc8ec616324d0b598aff8b9d0 |
| BLAKE2b-256 | 7c97a533fa40c95ab7ff15baf3346d3950d0df0c7dee3439ea4299c6aded3784 |