Segmenting (classification + caching) for Keble positioning grids.

keble-segmenting

Segmenting, masking, cloning, aggregation, and agentic mutation runtime for Keble positioning grids.

Current package line:

version: 0.11.12
python: >=3.13,<3.14

Version 0.11.12 Coverage And Prompt Context Remediation

Coverage source signatures now include current mask definitions, mask options, and preset mask type values. Changing mask semantics invalidates stale coverage even when grid mask ids stay unchanged.
Coverage phase completion now records completed_envelope_ids, rejects duplicate completion events for the same envelope, and avoids incrementing already COMPLETE or FAILED phase rows.
Prompt context policy fields are enum-backed while preserving their serialized values, and market-channel source URLs stay in registered references instead of prompt prose.
Both Hatch and Poetry metadata now declare the same current package version.
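
The completion rule described in this release can be sketched as below. This is a minimal in-memory illustration, not the package's implementation: PhaseRow, PhaseStatus, DuplicateCompletionError, and complete_envelope are hypothetical names, and the real coverage rows live in MongoDB. Raising on a duplicate and treating an already terminal row as a no-op are assumptions about what "rejects" and "avoids incrementing" mean in practice.

```python
from dataclasses import dataclass, field
from enum import Enum


class PhaseStatus(str, Enum):
    PROCESSING = "PROCESSING"
    COMPLETE = "COMPLETE"
    FAILED = "FAILED"


class DuplicateCompletionError(ValueError):
    """Raised when the same envelope reports completion twice."""


@dataclass
class PhaseRow:
    # Hypothetical stand-in for one coverage phase row.
    expected: int
    completed: int = 0
    status: PhaseStatus = PhaseStatus.PROCESSING
    completed_envelope_ids: set = field(default_factory=set)


def complete_envelope(row: PhaseRow, envelope_id: str) -> bool:
    """Apply one completion event; return True only if the counter moved."""
    if envelope_id in row.completed_envelope_ids:
        raise DuplicateCompletionError(envelope_id)
    if row.status in (PhaseStatus.COMPLETE, PhaseStatus.FAILED):
        return False  # terminal rows are never incremented again
    row.completed_envelope_ids.add(envelope_id)
    row.completed += 1
    if row.completed >= row.expected:
        row.status = PhaseStatus.COMPLETE
    return True
```

Keeping the envelope ids on the row makes the completion handler idempotent per envelope, which matters when queue delivery can repeat an event.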

Version 0.11.10 Dimension Option Rebucketing

UPDATE_DIMENSION_OPTIONS is documented as membership-preserving only. Use it for typo, copy, wording, niche, or role changes when existing item memberships remain valid.
Price-band split, merge, and rebucketing flows must retire old option keys, create fresh buckets, and then run AUTO_SEGMENT_AND_MASK for the affected dimension.
Deleting dimension options now preserves historical SegmentedResult rows in storage. Current aggregations ignore retired option keys, so old rows remain fetchable for audit/history without affecting current positioning cells.

Version 0.11.9 Prompt Context And References

Preset-mask prompts keep user-facing evidence in the requested output language while allowing external tool queries in English or the target marketplace language when that gives better source coverage.
Prompt-owned context now tells agents not to copy helper labels, dimension keys, option keys, cell signatures, or machine-style key=value statistics into evidence text.
Prior preset and parent-cell source digests now render as readable market context, with broad reusable references kept behind exact product/cell/source evidence.

Version 0.11.8 Rare Option Coverage

Bootstrap auto-segment planning now queues every discovered dimension for item segmentation, including custom and niche axes, so rare options receive SegmentedResult rows during the first run.
Preset-mask work remains pricing anchored through the existing pricing+feature and pricing+scene target pairs; richer item coverage does not create extra preset-mask pair fanout.
Dimension discovery, dimension merge, and single-dimension segmentation prompts now share context-backed option discovery guidance. Concrete market-relevant one-item options should stay STANDARD with niche=true instead of being hidden in OTHER.

Version 0.11.7 Exact-Cell Metric Estimates

MARKET_DEMAND and POTENTIAL_SALES AI rows with numeric metric payloads must use area_type=CELLS with exactly one concrete cell. ROW and multi-cell CELLS outputs are rejected before storage expansion.
Numeric preset prompts now require one result per exact cell and forbid copying one broad numeric range across multiple cells. Each estimate must account for the cell's own price band, feature/use-case niche, observed sample, empty-cell state, comparable markets, and seller profile.
Normal custom masks and non-metric preset masks keep their existing ROW expansion behavior.
Mock handler deps now carry the current prompt and queue context fields so focused regressions match the real agent dependency surface.
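
A rough sketch of the exact-cell rule follows. The Area and AreaType classes here are simplified stand-ins with assumed field names, and validate_metric_area is a hypothetical helper; only the rule itself (numeric metric rows need area_type=CELLS with exactly one cell, other rows keep ROW expansion) comes from the notes above.

```python
from dataclasses import dataclass
from enum import Enum


class AreaType(str, Enum):
    ROW = "ROW"
    CELLS = "CELLS"


@dataclass(frozen=True)
class Area:
    area_type: AreaType
    # Assumed shape: each cell is a tuple of (dimension_key, option_key) pairs.
    cells: tuple = ()


def validate_metric_area(area: Area, has_numeric_metric: bool) -> None:
    """Raise ValueError when a numeric metric row is not one exact cell."""
    if not has_numeric_metric:
        return  # non-metric rows keep their existing ROW expansion behavior
    if area.area_type is not AreaType.CELLS:
        raise ValueError("numeric metric rows must use area_type=CELLS")
    if len(area.cells) != 1:
        raise ValueError("numeric metric rows must target exactly one cell")
```

Running the check before storage expansion means a rejected row never fans out into several stored cells that share one copied range.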

Version 0.11.6 Preset Mask Reference Placement

Added MaskResultReferenceUsagePolicy so citation rules are owned by mask-result evidence and metric-range evidence, not generic prompt context.
Preset MARKET_DEMAND and POTENTIAL_SALES prompts now explicitly place directly used source keys on headline result evidence and exact numeric metric evidence while keeping empty references valid when no external source directly supports the claim.
Prompt-owned market-channel assumption sources are registered in the per-call SegmentingReferenceRegistry for preset mask prompts, so public assumptions can persist as normal web references instead of prose-only context.
Downstream preset context now carries readable source labels from stored references and avoids raw reference keys, URLs, provider internals, enum names, and JSON in prior-demand or prior-sales evidence lines.

Version 0.11.5 References And Locale Rule

Segmenting tool registrars now receive a per-call SegmentingReferenceRegistry. Tools register concrete references and return only ReferenceCandidate rows with reference_key, title, and snippet.
LLM output schemas accept reference_keys and resolve them after generation. Unknown keys are logged and dropped so fabricated citations are not persisted or displayed.
Mask results, metric ranges, relational reasonings, aggregate cells, and metric summaries now carry backward-compatible references: [] fields.
Reference dedupe uses normalized URL, product identity, segmenting cell signature, or a type/title fallback.
Preset MARKET_DEMAND, POTENTIAL_SALES, LIKELY_UNACHIEVABLE, and MARKET_OPPORTUNITY_HIGHLIGHT prompts include a strong output-language rule next to preset guidance so user-facing evidence follows the prompt locale.
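
The resolve-after-generation step can be illustrated with a small sketch. The Reference dataclass, the dict-based registry, and resolve_reference_keys are hypothetical simplifications of SegmentingReferenceRegistry; the only behavior taken from the notes is that unknown keys are logged and dropped.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("reference_sketch")


@dataclass(frozen=True)
class Reference:
    # Hypothetical minimal reference row.
    reference_key: str
    title: str
    snippet: str = ""


def resolve_reference_keys(registry: dict, reference_keys: list) -> list:
    """Return registered references in output order, dropping unknown keys."""
    resolved = []
    for key in reference_keys:
        ref = registry.get(key)
        if ref is None:
            # A key the tools never registered is treated as fabricated.
            logger.warning("dropping unknown reference key: %s", key)
            continue
        resolved.append(ref)
    return resolved
```

Because the model only ever sees keys, a citation can be persisted only when a tool registered it during the same call.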

Version 0.11.4 Auto-Mask Allowlist

AutoSegmentAndMaskAction, SegmentingActionConfig, and SegmentingClient.aauto_segment_and_mask(...) now accept auto_mask_types: list[PresetMaskType] | None.
None preserves the package default preset behavior, [] disables preset stages, and a non-empty list selects exactly those preset mask types.
Selected preset stages always run in canonical dependency-safe order: demand, feasibility, potential sales, then opportunity.
Opportunity context is best-effort from available demand and potential-sales rows. Feasibility remains supported and readable, but is included only when the resolved allowlist selected it and matching results exist.
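
The three allowlist cases can be sketched as a small resolver. The enum below mirrors the preset type names used in these notes, but its wire values, DEFAULT_TYPES, and resolve_auto_mask_types are assumptions for illustration; the package default for None is not spelled out here, so the sketch simply uses all four stages.

```python
from enum import Enum


class PresetMaskType(str, Enum):
    MARKET_DEMAND = "MARKET_DEMAND"
    LIKELY_UNACHIEVABLE = "LIKELY_UNACHIEVABLE"
    POTENTIAL_SALES = "POTENTIAL_SALES"
    MARKET_OPPORTUNITY_HIGHLIGHT = "MARKET_OPPORTUNITY_HIGHLIGHT"


# Dependency-safe order: demand, feasibility, potential sales, opportunity.
CANONICAL_ORDER = [
    PresetMaskType.MARKET_DEMAND,
    PresetMaskType.LIKELY_UNACHIEVABLE,
    PresetMaskType.POTENTIAL_SALES,
    PresetMaskType.MARKET_OPPORTUNITY_HIGHLIGHT,
]

# Assumption: stands in for the package default preset behavior.
DEFAULT_TYPES = list(CANONICAL_ORDER)


def resolve_auto_mask_types(auto_mask_types):
    """None -> default, [] -> no preset stages, list -> exactly those, reordered."""
    if auto_mask_types is None:
        return list(DEFAULT_TYPES)
    selected = set(auto_mask_types)
    return [t for t in CANONICAL_ORDER if t in selected]
```

Filtering the canonical list, rather than sorting the caller's list, guarantees the stage order no matter how the allowlist was written.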

Current Branch Prompt Context Contract

Branch fix/segmenting-prompt-context keeps package version metadata unchanged.
SegmentingPromptContext is the typed prompt-owned contract for user prompt, language, marketplace, reference style, pricing policy, and off-scope policy. Client entry points and queued auto-segment jobs thread it through dimension discovery, mask discovery, normal mask matching, preset demand, and preset sales.
Prompt rules now require off-scope products to route to OTHER or omission, pricing options to prefer numeric ranges, demand to mean target-market monthly demand across channels, and sales to mean launchable monthly sales for the target segment.
Evidence should be localized, markdown-renderable, and reference ASIN, brand, title, marketplace, or public source names rather than internal ids.
get_market_channel_assumption(...) records source names, URLs, publication dates, and access dates for supported marketplaces instead of using an unattributed fixed marketplace-share shortcut.
Item prompts use PromptItemReference aliases such as sample_1; backend stable item keys stay in mapping code and are restored only after typed model output returns.
Dimension discovery returns DimensionByAi rows directly. Alias restoration belongs only to item-classification outputs such as SegmentedResultsByAi and SegmentedResultsByAiWithNewOptions.
Dimension and mask worker loops emit safe batch-level AutoSegmentProgressEvent messages while work is still running. Messages must not contain package names, stable keys, raw ids, enum tokens, or JSON payloads, and progress counts come from the runtime progress ledger rather than per-batch indexes.
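
The alias round trip for item prompts might look like the sketch below. build_aliases and restore_item_keys are hypothetical helpers, the row shape is invented for illustration, and dropping rows with unknown aliases is an assumption; only the sample_N alias style and the "restore after typed output returns" ordering come from the contract above.

```python
def build_aliases(item_keys: list) -> dict:
    """Map prompt aliases (sample_1, sample_2, ...) to stable item keys."""
    return {f"sample_{i}": key for i, key in enumerate(item_keys, start=1)}


def restore_item_keys(alias_to_key: dict, model_rows: list) -> list:
    """Swap aliases back to stable keys after the model output is parsed."""
    restored = []
    for row in model_rows:
        key = alias_to_key.get(row["item"])
        if key is None:
            continue  # assumption: rows naming an unknown alias are dropped
        restored.append({**row, "item": key})
    return restored
```

The prompt only ever contains the aliases, so stable backend keys cannot leak into model-written evidence text.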

Version 0.11.3 Pair-Based Positioning Coverage

Auto-segment queue requests now separate work_dimension_keys from coverage_scopes. One bootstrap run can segment pricing, feature, and scene together while still claiming durable coverage rows for exact visible pairs.
Preset mask target order is pricing + feature first, then pricing + scene, so the primary seller decision table becomes useful before secondary views.
Coverage source signatures no longer include grid.updated or infos_id, because Infos generation belongs to the same worker run and must not invalidate the coverage row before the frontend can read it.
Coverage phase updates can filter by envelope id, so completing one pricing+feature preset job does not complete the sibling pricing+scene row.

Version 0.11.2 Update

Added durable segmenting coverage storage for one normalized grid dimension-key scope and one exact source signature.
Coverage uses enum-backed phase/status fields, with phase states for item segmentation, Infos generation, normal masks, and preset mask stages.
Added normalized scope/source signature helpers and an atomic aclaim_or_get_coverage(...) path so lazy positioning queues can create or reuse one active coverage row before worker envelopes are enqueued.
Added phase transition helpers for queued, processing, complete, and failed worker states. BackendAutoSegmentRunMongoObject remains the worker ledger; coverage is the public read-side state for displayed dimension combinations.
Follow-up CRUD coverage now proves processing, completion, and failure transitions update only the targeted phase and preserve sibling phase state.
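
As an illustration of what normalized scope and source signature helpers can look like, here is a minimal sketch. normalize_scope and source_signature are hypothetical names, and both the order-insensitive scope and the SHA-256 over canonical JSON are assumptions rather than the package's actual scheme.

```python
import hashlib
import json


def normalize_scope(dimension_keys) -> tuple:
    """Order-insensitive, duplicate-free scope for one dimension-key set."""
    return tuple(sorted(set(dimension_keys)))


def source_signature(payload: dict) -> str:
    """Stable digest of the inputs that should invalidate coverage."""
    canonical = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Whatever goes into the payload decides what invalidates a coverage row, so volatile fields have to stay out of it while anything that changes mask semantics has to go in.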

Version 0.11.1 Update

Locked the aggregate display status rule for partially segmented visible axes: if any displayed dimension lacks terminal segmentation rows for the scoped items, synthetic cells remain PROCESSING.
EMPTY is now documented as the authoritative terminal state only once every displayed dimension is fully terminal and the exact cell has no matched items.
Added two-axis aggregation regressions proving missing custom-axis rows render as processing while fully terminal no-match cells remain empty.
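
The display status rule for cells without matched items reduces to a short decision, sketched here. synthetic_cell_status and its input shape are hypothetical; the enum lists only the two states this rule chooses between.

```python
from enum import Enum


class AggregateStatus(str, Enum):
    PROCESSING = "PROCESSING"
    EMPTY = "EMPTY"


def synthetic_cell_status(terminal_by_dimension: dict) -> AggregateStatus:
    """Status for a displayed cell that currently has no matched items.

    terminal_by_dimension maps each displayed dimension key to whether all
    scoped items already have terminal segmentation rows for it.
    """
    if not all(terminal_by_dimension.values()):
        return AggregateStatus.PROCESSING
    return AggregateStatus.EMPTY
```

For example, a pricing + custom-axis view stays PROCESSING while the custom axis is still being segmented, even if pricing finished long ago.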

Version 0.11.0 Update

Removed Infos freshness/cache behavior. Infos no longer stores source_signature, generation context no longer stores source_revision, and shared grids update the same linked Infos row in place when generation is explicitly requested.
Added infos_generation_enabled beside mask_generation_enabled on action, client, and queue contracts. Auto segmenting now queues dimensions first, then one shared Infos generation phase when enabled, then normal and preset mask jobs.
Infos generation now uses dimensions, segmented option assignments, segmentable item prompt samples, and parent context. It intentionally does not load masks or mask results.
Infos context is reused by normal/custom masks, preset masks, and prompt-facing grid strings. Missing Infos renders an explicit empty marker instead of failing mask generation.
Renamed the public mask runtime to amask_cells_for_mask(...) and replaced preset-only Exa wiring with generic tool_registrars for discovery, segmentation, normal masks, and preset masks.

Version 0.10.4 Update

Preset mask stage workers now load existing mask results first and return a cheap zero-upsert completion event when every exact pair cell signature is already covered.
Auto-segment queue planning now filters preset market stages by missing exact coverage before queue insertion. Fully covered stages are skipped, and Infos refresh runs only when at least one queued preset stage needs market context.
Preset cell context providers now receive only pending cells, so prior covered cells do not trigger repeat context building, tool work, or masking.
No schema or storage contract changed. The active rule remains retained mask results plus exact (mask_key, cell signature) missing-coverage repair.

Version 0.10.3 Update

Changed dimension and dimension-option mutations to retain existing mask-result rows instead of deleting or staling them. Old results remain effective when their exact (mask_key, cell signature) still matches current work.
Kept _missing_combinations_for_mask(...) as the canonical coverage repair rule. New dimensions/options create new cell signatures, so auto mask workers generate only missing coverage.
Dimension/option semantic edits keep previous mask judgments by stable keys. This intentionally accepts the risk that old evidence can be less precise after wording changes, in exchange for continuity and lower rerun cost.
Dimension/option deletes still remove dependent segmented rows, while persisted mask-result rows are retained and ignored by current-grid readers when their removed references no longer fit the active grid.
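
The missing-coverage repair rule can be pictured with the sketch below. It is not the body of _missing_combinations_for_mask(...); cell_signature, missing_cells, and the signature string format are assumptions chosen only to show why retained rows plus exact (mask_key, cell signature) matching regenerate just the new cells.

```python
from itertools import product


def cell_signature(cell) -> str:
    """Hypothetical order-independent signature for one cell."""
    return "|".join(f"{dim}={opt}" for dim, opt in sorted(cell))


def missing_cells(mask_key: str, dimensions: dict, covered: set) -> list:
    """Return cartesian cells with no stored (mask_key, signature) pair."""
    dim_keys = sorted(dimensions)
    missing = []
    for combo in product(*(dimensions[key] for key in dim_keys)):
        cell = tuple(zip(dim_keys, combo))
        if (mask_key, cell_signature(cell)) not in covered:
            missing.append(cell)
    return missing
```

Adding one option to a dimension only produces signatures that were never stored, so earlier judgments stay effective and the worker fills the gap.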

Version 0.10.2 Update

Kept cloned grids sharing infos_id, but made Infos source signatures and generation prompts semantic-only. Dimension keys and option keys are no longer part of reusable Infos freshness or prompt context.
Added keyless InfosDimensionContext / InfosDimensionOptionContext payloads for Infos generation, preserving names, descriptions, preset roles, selection mode, niche flags, and option type.
Added a shared-Infos stale guard: when multiple grids reference the same Infos row and one grid's semantic context diverges, that grid creates and links a new Infos row instead of overwriting the shared row.
Added grid infosId counting/index support and focused clone/freshness tests proving stable key remaps do not stale shared Infos, while real semantic edits still trigger regeneration.

Version 0.10.1 Update

Preserved DimensionOptionType.OTHER and option niche when cloning grid dimensions, so cloned grids keep fallback buckets hidden and non-maskable.
Dimension-option create, update, and delete callbacks now replace runtime dimensions from the full persisted post-normalization payload. This prevents in-memory drift when OTHER is inserted, collapsed, moved last, or preserved after deletes.
Aggregated cell displays now expose structured metric summaries for preset demand and potential-sales rows. Prompt-facing grid strings render the customer-pool and monthly-sales ranges with evidence, so follow-up agents see the numeric basis instead of only mask labels.
Infos context remains enforced for preset market masks because demand, potential sales, low-feasibility, and opportunity stages need the shared market story and seller methodology. Custom masks intentionally stay bounded by their explicit mask description.
Added focused mock and IRL coverage for clone option roles, runtime dimension sync after option actions, metric aggregation output, prompt-facing metric labels, and the custom-mask no-Infos policy.

Version 0.10.0 Update

Replaced mask-result AI output cell with schema-owned Area (ROW or CELLS) while keeping persisted MaskedResultBase.cell as one concrete stored combination row. There is no backward-compatible dimension_and_option_list alias.
Added segmenting-owned Infos storage/generation with grid infos_id, source signatures, optional parent context, and stale refresh before direct and queued mask work.
Added cell_contains_other(...) and excluded DimensionOptionType.OTHER from normal/custom mask cells, preset cartesian cells, missing-mask planning, and prompt-facing grid dimensions/tables.
Changed preset market masks: MARKET_DEMAND and POTENTIAL_SALES are now single metric channels with stored numeric ranges; LIKELY_UNACHIEVABLE and MARKET_OPPORTUNITY_HIGHLIGHT remain boolean.
Added the seller cake-theory, replacement-cake, budget-as-cake, and 3-4-year lifecycle methodology as enforced demand/sales prompt context so estimates use Infos, positionable-item context, comparable markets, and tool evidence instead of sample-only counts.
Added preset stage groups: base demand/feasibility, dependent potential sales, then final opportunity. Dependent stages receive exact prior demand and sales evidence, and feasibility evidence only for allowlists that include LIKELY_UNACHIEVABLE.
Dimension creates/updates and dimension-option updates now invalidate affected stored mask rows so regenerated masks use current cell semantics.
Dimension-option deletion reuses the same option replacement helper, so the single OTHER invariant is preserved after deletes.
IRL coverage now includes Infos generation, current-Infos preset queueing through all four market stages, and HTTPX SOCKS proxy support for live tests.

Version 0.9.1 Update

Tightened the internal normalize_dimension_options(...) generic bound from broad BaseModel to a structural protocol exposing name, option_type, and pydantic model_copy(...).
This keeps the Phase 1 typed OTHER behavior unchanged while removing Pylance/Pyright attribute-access diagnostics in the normalizer helper.

Version 0.9.0 Update

Added DimensionOptionType with STANDARD and OTHER roles across AI, persisted, create, and update option schemas.
Added normalize_dimension_options(...) as the canonical option normalizer: fallback labels such as unknown, unmentioned, not provided, German fallback labels, and Chinese 未提及 / 其他 labels collapse into one OTHER bucket placed last.
AI-created and AI-updated dimension paths now ensure exactly one OTHER option. Late _asegment_dimension(...) fallback new_options map to the existing OTHER bucket instead of creating duplicate fallback rows.
_asegment_dimension(...) now stores explicit unresolved in-scope AI unmatched_items in OTHER when the dimension has that bucket; OMITTED remains reserved for invalid model output, untouched batch items, or malformed legacy dimensions without OTHER.
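
The collapse-into-one-OTHER behavior can be sketched as follows. This is not normalize_dimension_options(...) itself: Option and normalize_options are simplified stand-ins, and FALLBACK_LABELS is a short illustrative set (the German entry in particular is an assumed example, since the notes do not list the German labels).

```python
from dataclasses import dataclass, replace
from enum import Enum


class DimensionOptionType(str, Enum):
    STANDARD = "STANDARD"
    OTHER = "OTHER"


@dataclass(frozen=True)
class Option:
    name: str
    option_type: DimensionOptionType = DimensionOptionType.STANDARD


# Assumption: illustrative labels only; "sonstiges" is a guessed German example.
FALLBACK_LABELS = {"other", "unknown", "unmentioned", "not provided",
                   "sonstiges", "未提及", "其他"}


def normalize_options(options: list) -> list:
    """Collapse fallback-like options into a single OTHER bucket placed last."""
    standard, other = [], None
    for opt in options:
        is_fallback = (opt.option_type is DimensionOptionType.OTHER
                       or opt.name.strip().casefold() in FALLBACK_LABELS)
        if not is_fallback:
            standard.append(opt)
        elif other is None:
            # Keep the first fallback seen and mark it as the OTHER bucket.
            other = replace(opt, option_type=DimensionOptionType.OTHER)
    return standard + ([other] if other is not None else [])
```

This sketch does not create an OTHER option when none is present; the AI-created and AI-updated paths described above add that guarantee separately.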

Version 0.8.6 Update

SegmentingClient.apreset_mask_stage_job(...) accepts the optional AutoSegmentQueueContext used by queued auto-segment runs and forwards it into abuild_agent_deps(...).
Preset mask cell context providers now receive owner_type from the outer SegmentingAgentDeps.auto_segment_queue_context, matching the actual runtime ownership boundary.
Preview flows remain unchanged: callers can still disable generated masks through mask_generation_enabled=False.

Version 0.8.5 Update

AutoSegmentAndMaskAction now accepts optional mask_generation_enabled. None preserves the client/global default; False lets host-owned preview flows build lean dimension/cell grids without any generated mask work.
SegmentingClient.aauto_segment_and_mask(...) forwards the general mask override through the existing canonical action path.
The old preset-only mask flag was removed rather than kept as an alias, so preview callers must use the general mask toggle.

Version 0.8.4 Update

DiscoverDimensionsAction can now carry optional preset_dimension_types so host flows can request only known axes such as pricing and features/functionality.
SegmentingClient.adiscover_dimensions(...) forwards the requested preset roles through the existing canonical action runtime; normal open-ended discovery is unchanged when the list is empty.
The LLM prompt and handler both enforce the requested preset roles, with a typed validation error if a required role is missing from new or existing dimensions.

Version 0.8.3 Update

AutoSegmentAndMaskAction now carries a preview-facing mask planning override through the existing canonical action path, avoiding a second preview-only queue API.
MARKET_OPPORTUNITY_HIGHLIGHT guidance now explicitly treats under-250 observed groups as limited sample context and asks evidence to say whether external or comparable-market evidence was used, unavailable, or intentionally not used.

Version 0.8.2 Update

MARKET_OPPORTUNITY_HIGHLIGHT evidence must now say whether external or comparable-market evidence was used.
Empty or small-sample cells can no longer be finalized from sample-only reasoning while a parent-owned market-evidence tool is available.
TRUE empty-cell opportunities must name the positive analogy, willingness-to-pay clue, low-competition clue, or unserved customer-sector reason behind the highlight.

Version 0.8.0 Update

Market-opportunity preset prompt guidance now treats observed samples as non-exhaustive evidence instead of the whole market.
Empty observed cells can be highlighted when comparable markets, web evidence, adjacent demand, pricing willingness, or seller-profile fit support a plausible white-space opportunity.
The prompt now asks for optimistic but selective opportunity reasoning that explains demand, competition, feasibility, incumbent barriers, and seller capability rather than rejecting cells solely because no sample product exists.

Version 0.7.1 Update

Auto-segment progress events now derive the generic AgenticActionEvent.status from the typed progress stage. Failed progress emits FAILED, terminal success emits SUCCEEDED, and intermediate work emits STARTED or PROGRESSED so SSE diagnostics cannot mistake failures for successful action events.
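
A possible shape for the stage-to-status derivation is sketched below. The ProgressStage members are hypothetical, and so is the split between STARTED and PROGRESSED based on a completed count; the four status names and the failed/terminal-success mapping come from the note above.

```python
from enum import Enum


class ProgressStage(str, Enum):
    # Hypothetical stage names; the package's typed stages may differ.
    QUEUED = "QUEUED"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


class EventStatus(str, Enum):
    STARTED = "STARTED"
    PROGRESSED = "PROGRESSED"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


def status_for_stage(stage: ProgressStage, completed: int) -> EventStatus:
    """Derive the generic event status from the typed progress stage."""
    if stage is ProgressStage.FAILED:
        return EventStatus.FAILED
    if stage is ProgressStage.COMPLETED:
        return EventStatus.SUCCEEDED
    # Assumption: no finished work yet reads as STARTED, otherwise PROGRESSED.
    return EventStatus.STARTED if completed == 0 else EventStatus.PROGRESSED
```

Deriving the status in one place keeps SSE consumers from having to interpret stage names themselves.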

Version 0.7.0 Update

Added typed AUTO_SEGMENT_PROGRESS events for the auto-segment room stream. Payloads carry run/task/root/grid ids, stage, message, completion counts, normalized percent source value, and update time.
Strengthened preset-mask prompts so search-enabled demand, opportunity, and feasibility stages must use external/comparable-market evidence instead of treating Exa-style search as optional when local context seems sufficient.
Opportunity prompts now explicitly cover limited sample size, empty observed cells, adjacent customer sectors, unserved demand, competition, incumbent brand barriers, and entry barriers.

Version 0.6.1 Update

Removed the unused untyped CRUDSegmentedResult.aiter_item_dimension_maps(...) helper instead of carrying a second grouping path after the option_keys hard-break.
Cleaned CRUD/action/client tests so direct segmented-result payloads use option_keys everywhere. Dimension options and mask results still use their singular option_key fields because those are separate contracts.
The canonical read path for grouped placements remains aggregation; callers should use aggregate/view APIs instead of ad hoc per-item dimension maps.

Version 0.6.0 Update

Segmenting results now persist option_keys instead of the old singular option_key; this is a breaking current-line contract with no old-data fallback or migration.
Dimensions carry selection_mode: pricing is single-select, features and scenes are multi-select by default, and custom dimensions use AI output plus schema-owned conservative inference.
Aggregation expands matched multi-select rows into cartesian cell placements, so item_counts means placement count for the displayed cell.
Preset-mask prompts can receive bounded parent Exa tools for demand, feasibility, and opportunity stages; normal/custom mask prompts remain tool-free.
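
The placement-count meaning of item_counts follows from the cartesian expansion, which the sketch below illustrates. count_placements and the nested dict input are hypothetical; the package's aggregation code is not shown here.

```python
from collections import Counter
from itertools import product


def count_placements(item_rows: dict, dimension_keys: list) -> Counter:
    """Count cell placements for items with per-dimension option_keys.

    item_rows maps item_key -> {dimension_key: [option_key, ...]}.
    """
    counts = Counter()
    for by_dimension in item_rows.values():
        option_lists = [by_dimension.get(key, []) for key in dimension_keys]
        # An item matching several options on one axis lands in several cells.
        for combo in product(*option_lists):
            counts[combo] += 1
    return counts
```

With one single-select pricing option and two matched features, an item contributes two placements, so the sum of item_counts across cells can exceed the number of distinct items.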

Version 0.4.9 Update

Default AUTO_SEGMENT_AND_MASK bootstrap now skips custom non-preset masks together with custom dimensions, keeping the first run display-critical only.
Explicit lazy dimension_keys requests still queue custom non-preset masks so selected non-default axes can receive mask analysis after the user chooses them.
This avoids heavy bootstrap work when grids already have custom masks but the positioning room has not selected those axes yet.

Version 0.4.8 Update

AutoSegmentAndMaskAction now accepts optional dimension_keys for lazy selected-axis segmentation.
Default auto-segment bootstrap queues only display-critical preset dimensions: pricing, features/functionality, and scene/use-case. Custom axes are left for explicit lazy room selection.
Preset mask target planning is scoped to the selected/default dimension set, and unknown explicit dimension keys raise typed client-side errors before queue fanout.

Version 0.4.7 Update

Aggregate reads now synthesize missing PROCESSING/EMPTY cartesian cells only when include_synthetic_status_cells=True.
Full-grid prompt/admin reads keep the default False behavior so large grids return only item-backed and stored mask/reasoning-backed cells.
The fallback-option ordering helper remains internal; public callers should use DimensionOptionByAi.merge_semantic_duplicates(...) or DimensionBase.build_from_dimensions_by_ai(...).

Version 0.4.6 Update

Dimension prompts and DimensionOptionByAi normalization allow one fallback/unclear/other option only when needed, then keep that fallback last across Chinese, English, and German naming variants.
Auto-segment dimension work now enqueues preset roles in priority order: pricing, features/functionality, scene/use-case, then custom dimensions in grid order.
CRUDMaskedResultRelationalReasoning.aupsert_multi(...) keeps created insert-only through $setOnInsert, matching the mask-result bulk upsert pattern and avoiding Mongo update-path conflicts.
Aggregate reads now expose aggregate_status on AggregatedCellDisplay with PROCESSING, EMPTY, AGGREGATED, and UNAVAILABLE cell states.
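
The enqueue priority for dimension work can be expressed as a stable sort, as in this sketch. The role tokens in PRESET_PRIORITY and the (dimension_key, role) tuple shape are assumptions; only the order (pricing, features/functionality, scene/use-case, then custom dimensions in grid order) is taken from the note above.

```python
from typing import Optional

# Hypothetical role tokens; None marks a custom dimension.
PRESET_PRIORITY = {"PRICING": 0, "FEATURE": 1, "SCENE": 2}


def order_dimensions(dimensions: list) -> list:
    """Sort (dimension_key, role) pairs by preset priority.

    sorted() is stable, so custom dimensions keep their grid order.
    """
    def rank(dimension: tuple) -> int:
        role: Optional[str] = dimension[1]
        return PRESET_PRIORITY.get(role, len(PRESET_PRIORITY))

    return sorted(dimensions, key=rank)
```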

Version 0.4.5 Update

Dimension option mutations now preserve the parent dimension's preset_dimension_type, including create/update/reorder option actions and auto-segment-discovered option appends.
Preset-mask creation now links created or recovered preset mask ids into grid.masks, so queued preset-mask stage workers can rebuild runtime state and load the mask by key.
Queued mask workers can recover a persisted mask row by (grid_id, mask_key) when runtime scope is stale, then repair the grid mask scope for future runs.

Version 0.4.4 Update

Preset role inference now recognizes conservative Chinese/German pricing dimension wording such as price bands, budgets, costs, and German price levels.
Removed the broad standalone Chinese 使用 scene keyword so generic usage or instruction dimensions do not become scene/use-case dimensions.
Added focused normalizer regressions for non-English pricing and generic usage non-scene behavior.

Version 0.4.3 Update

Dimension discovery now normalizes preset dimension roles before persistence and preserves preset_dimension_type for both existing and newly discovered dimensions.
Preset role inference covers conservative Chinese/German scene/use-case and feature/functionality terms. Pricing remains conservative; no true pricing dimension means no pricing-anchored preset-mask work.
Added focused regressions for preset role persistence and the no-pricing preset-mask policy.

Version 0.4.2 Update

Added an explicit queue-stage contract regression proving AutoSegmentPresetMaskStage may share wire values with PresetMaskType while remaining a queue/progress stage enum, not the persisted semantic mask role field.

Version 0.4.0 Update

Aggregate reads now union item-backed cell signatures with stored mask results and preset reasoning signatures, so empty analyzed preset-mask cells remain visible in read models.
Preset-mask omission is documented as neutral unknown/unclassified work: missing model rows are completed for progress accounting, not failures, false classifications, or lowest-option outputs.
AggregatedCellDisplay.build_empty_masked_cell(...) owns empty-cell display construction, keeping aggregation code focused on signature collection.
Typed auto worker event exports cover dimension, mask, and preset-mask stages for backend/core/frontend room propagation.

Version 0.3.1 Update

MaskedResultBase.build_signature(...) accepts any sequence of DimensionAndOption; callers should not copy the list before signing.
CRUDMaskedResult.aupsert_multi(...) writes typed Mongo payloads and keeps created insert-only through $setOnInsert, avoiding bulk-write path conflicts while preserving combination-signature identity.
Focused mock/IRL CRUD tests use the current cell contract, not removed single-pair mask result fields.
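
The insert-only created pattern can be shown with plain update documents. build_upsert and the field names (grid_id, mask_key, signature, updated) are hypothetical; what the sketch relies on is MongoDB's rule that $set and $setOnInsert must not target the same field path in one update.

```python
from datetime import datetime, timezone


def build_upsert(row: dict, now: datetime) -> tuple:
    """Build a (filter, update) pair for one upsert of a mask-result row."""
    # Identity fields go in the filter; an upsert insert copies them in.
    identity = {
        "grid_id": row["grid_id"],
        "mask_key": row["mask_key"],
        "signature": row["signature"],
    }
    mutable = {k: v for k, v in row.items()
               if k not in identity and k != "created"}
    update = {
        "$set": {**mutable, "updated": now},
        # created is written only when the upsert inserts a new document.
        "$setOnInsert": {"created": now},
    }
    return identity, update


if __name__ == "__main__":
    flt, upd = build_upsert(
        {"grid_id": "g1", "mask_key": "m1", "signature": "s1", "option_key": "yes"},
        datetime.now(timezone.utc),
    )
    print(flt, upd)
```

Each pair could then be wrapped in a pymongo UpdateOne(filter, update, upsert=True) and sent through bulk_write; keeping created out of $set is what avoids the path conflict.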

Version 0.1.25 Update

Rejects blank normalized dimension and mask names before direct create/update writes and before AI discovery persistence.
Keeps duplicate discovery merging from 0.1.24, but now treats an empty semantic name as invalid instead of allowing it to fail later in runtime name maps.

Version 0.1.23 Update

Hard-breaks action events to canonical action_type only; old discriminator inputs and read aliases are removed from schemas and tests.
Keeps result storage bootstrapping focused on current grid-scoped indexes without obsolete malformed-row cleanup in CRUD startup.

Version 0.1.22 Update

Publishes the current gridless runtime and GridAgentContextRequest contract for downstream positioning/backend wheels.
Keeps prompt context grid-only; task graph and relation enrichment remain outside this package.

MS7 Branch Note

Branch feature/room-display-chat-contract tightens the room display contract. The frontend room consumes aggregate cell counts, aggregate item_keys, typed cell_display.description / cell_display.images, direct mutation events, and authoritative positioning view refetches. Result fanout remains tolerant, and grouped followers stay metadata-only.

Why

This library helps a parent application maintain a typed positioning grid over external items.

The parent module owns the item source. keble-segmenting owns:

grid, dimension, mask, and result storage
typed mutation actions
agent-facing context projection
one unified mutation tool
cloning and aggregation helpers

Canonical Runtime

Use these surfaces as the source of truth:

SegmentingClient: public async client
SegmentingClient.abuild_agent_deps(...): public grid-agnostic deps-builder for backend/runtime integration
keble_segmenting.agent.handler.aapply_actions(...): canonical action executor
keble_segmenting.agent.register_mutation_tools(...): singular agentic mutation tool registration
keble_segmenting.client.cloning: grid cloning workflow
keble_segmenting.client.aggregations: aggregated cell-display workflow

Do not build new orchestration around older client-side action engines or obsolete background mutation wrappers.

The agent runtime is explicit per grid: SegmentingGridRuntime owns the loaded grid structure, structural indexes, Redis progress, lazy item cache, segmented result cache, and masked result cache for one grid. ActionObjs, active_grid_id, and active-grid proxy properties are not part of the runtime contract. Handler and worker helpers receive the resolved runtime explicitly, while Actions.grid_id remains the single public low-level selector for one ordered action batch.

Action events use the shared keble_helpers.AgenticActionEvent envelope through the package-local ActionEvent type. Segmenting still exposes EventCallbacks for callers, but callback failures now propagate to the action executor instead of being logged and ignored. Build callback containers with EventCallbacks.build(...) instead of standalone normalizer helpers.

AUTO_SEGMENT_AND_MASK is now explicit queue scheduling, not inline fanout completion. Callers that want auto work must include one terminal AutoSegmentAndMaskAction; backend supplies the queue scheduler while segmenting supplies typed queue requests, worker job APIs, and direct worker completion events such as SEGMENT_DIMENSION_JOB_COMPLETED.

Queued worker execution also emits batch-level mutation events as CRUD happens. Rooms should treat CREATE_DIMENSION_OPTIONS, UPSERT_RESULTS, and UPSERT_MASK_RESULTS as the store-mutation events; aggregate worker completion events are diagnostic only. As of 0.1.20, those mutation payloads include explicit room indexes such as grid_id and the affected dimension_keys, item_keys, mask_keys, and option_keys, so consumers do not have to infer routing from row contents alone.

The current room display contract keeps CellDisplay display-only: description is compact text, images are thumbnails, and AggregatedCellDisplay.item_keys carries the full item-key identity set for selection/detail sidebars. Parent item adapters own how those display fields are built; segmenting only carries the typed payload.

Downstream TypeScript consumers should use keble-core 0.1.33+ TaskWorkspaceEvent.build(...) for these direct package payloads instead of frontend-local unknown payload normalization.

Core Concepts

SegmentableItemProtocol
- item interface owned by the parent module
- items are not persisted in this package
- the protocol provides key, prompt payloads, and representative/cell-display helpers
SegmentedGridCreate
- public metadata-only payload for creating a new empty grid
GridAgentContextRequest
- explicit prompt-context request for one grid
- requires compare_dimension_key
- optionally narrows visible dimensions with viewing_dimension_keys
Actions.grid_id
- explicit grid scope for one canonical action batch
ActionedResults
- ordered per-action results
- each concrete action result carries explicit grid_id
Dimension
- grid-bounded categorical axis with ordered DimensionOptions
Mask
- grid-bounded classifier with ordered MaskOptions
SegmentedResult
- (grid_id, item_key, dimension_key) -> option_keys plus evidence
MaskedResult
- (grid_id, mask_key, dimension_and_option) -> option_key plus evidence
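To make the SegmentedResult identity concrete, here is a small in-memory sketch of the (grid_id, item_key, dimension_key) -> option_keys mapping. The class and function names are hypothetical and the real rows live in MongoDB; the sketch only illustrates that a second write for the same identity replaces the first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResultKey:
    """Hypothetical identity of one segmented result row."""

    grid_id: str
    item_key: str
    dimension_key: str


@dataclass
class ResultRow:
    """Hypothetical row body: selected options plus evidence text."""

    option_keys: list[str]
    evidence: str


store: dict[ResultKey, ResultRow] = {}


def upsert_result(key: ResultKey, row: ResultRow) -> None:
    """Insert or replace the row for one (grid, item, dimension) identity."""
    store[key] = row


key = ResultKey(grid_id="g1", item_key="sku-1", dimension_key="price_tier")
upsert_result(key, ResultRow(["budget"], "Listed under 10 USD."))
upsert_result(key, ResultRow(["premium"], "Repriced above 50 USD."))
```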

Storage:

MongoDB stores grids, dimensions, masks, segmented results, and masked results
Redis stores SegmentingProgress for action execution progress and interruption

Startup

Use the async client startup hook:

from keble_segmenting import SegmentingClient

client = SegmentingClient(
    agentic_llm_list=[...],
    async_items_loader=...,
)

await client.aensure_indexes(amongo)

Notes:

agentic_llm_list is required for discovery and auto actions
async_items_loader is required for discovery and auto actions
aensure_indexes(...) is the canonical Mongo startup hook

Agent Context APIs

There are now three client context helpers:

aget_grid_for_agent(...)
- structured prompt-facing projection of one requested grid
- request-based: context=GridAgentContextRequest(...)
- includes finite compare-vs-each markdown tables built from real aggregated cells
aget_grid_context_string(...)
- lightweight string form of GridForAgent
aget_agent_context_string(...)
- canonical workspace-ready context string
- request-based: contexts=[GridAgentContextRequest(...), ...]
- includes schema meanings, action runtime rules, action type intentions, and one rendered grid section per request

Recommended pattern for upstream agents:

from keble_segmenting.agent import GridAgentContextRequest

context_text = await client.aget_agent_context_string(
    amongo,
    contexts=[
        GridAgentContextRequest(
            grid_id=grid_id,
            compare_dimension_key="benefit",
            viewing_dimension_keys=None,
        )
    ],
)

Important:

refresh agent context from the latest grid before each reasoning or mutation turn
do not keep a stale cached copy after actions mutate the grid
one prompt may describe multiple grids, but low-level mutation still stays single-grid per Actions batch
if you only need one rendered grid section, aget_grid_context_string(...) is the lighter helper

Actions

The canonical mutation contract is:

from keble_segmenting.schemas import Actions
from keble_helpers import AgenticActionWarningLevel

await client.aapply_actions(
    payload=Actions(
        message="Apply one typed segmenting batch.",
        warning_level=AgenticActionWarningLevel.SAFE,
        grid_id=grid_id,
        actions=[...],
    ),
    db_deps=db_deps,
    language=language,
)

Rules:

Actions is strongly typed; the client does not accept raw dict payloads
Actions.grid_id is required on the canonical action path
nested action/input payloads do not repeat grid_id
concrete ActionedResult payloads do carry explicit grid_id
progress_task is optional and parent-owned on AgentDbDeps.progress_task; segmenting only emits set_message(...) updates through it
action batches execute sequentially
AUTO_SEGMENT_AND_MASK is canonicalized to at most one terminal batch action
create-dimension, create-dimension-option, create-mask, create-mask-option, discover-dimensions, and discover-masks do not imply hidden auto work; callers must add AutoSegmentAndMaskAction explicitly
direct delete actions for segmented results and mask results are not part of the canonical action surface
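The "at most one terminal AUTO_SEGMENT_AND_MASK" rule can be pictured with the sketch below. It works on plain action-type strings and is an assumption about the shape of the rule, not the package's implementation, which may validate or reject a batch rather than reorder it.

```python
AUTO = "AUTO_SEGMENT_AND_MASK"


def canonicalize(action_types: list[str]) -> list[str]:
    """Keep non-auto actions in order and leave at most one auto action, last."""
    others = [action for action in action_types if action != AUTO]
    if AUTO in action_types:
        return others + [AUTO]
    return others


batch = ["CREATE_DIMENSION_OPTIONS", AUTO, "UPSERT_RESULTS", AUTO]
canonical = canonicalize(batch)
```

A batch with no auto action stays free of auto work, which matches the rule that create and discover actions never imply it.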

Runtime notes:

low-level execution is still single-grid per Actions batch
agent deps now keep lazy per-grid runtimes internally, so one session can know multiple grids without eagerly loading all result state
SegmentingAgentDeps keeps auto_segment_queue_scheduler and auto_segment_queue_context on the deps root so explicit auto actions can queue backend worker jobs from the handler path
prompt context tables use aggregated cell summaries only: item count, cell_display title/description, and aligned mask labels

When a parent runtime wants human-readable stage updates during discovery or auto actions:

from keble_helpers import ProgressTask

segmenting_task = (
    request.resources.progress_task.new_subtask()
    if request.resources.progress_task is not None
    else None
)

results = await client.aapply_actions(
    payload=payload,
    db_deps=db_deps,
    language=language,
    progress_task=segmenting_task,
)

Progress-task contract:

pass the task or subtask explicitly through the public client argument; the built agent deps then expose it at deps.progress_task
keble-segmenting only calls set_message(...)
parent repos keep ownership of subtask creation and terminal success/failure
Redis SegmentingProgress stays numeric/interruption-focused and does not store these human messages
the human-readable message wording is intentionally varied; treat stage order and persisted state as the stable contract, not exact phrasing

Public direct client helpers map 1:1 to the canonical actions:

grid meta: aupdate_grid_meta(...)
dimensions: acreate_dimensions(...), aupdate_dimensions(...), adelete_dimensions(...)
dimension options: acreate_dimension_options(...), aupdate_dimension_options(...), adelete_dimension_options(...)
masks: acreate_masks(...), aupdate_masks(...), adelete_masks(...)
mask options: acreate_mask_options(...), aupdate_mask_options(...), adelete_mask_options(...)
direct rows: aupsert_results(...), aupsert_mask_results(...)
reorder: areorder_dimensions(...), areorder_masks(...), areorder_dimension_options(...), areorder_mask_options(...)
discovery: adiscover_dimensions(...), adiscover_masks(...)
auto: aauto_segment_and_mask(...)

Grouped positioning followers are parent-module metadata, not segmenting rows. When a parent loader skips follower keys to save tokens, keble-segmenting persists results only for the explicit item_keys it receives and must not copy main/group result rows to follower item keys. Worker dimension jobs also reject model-returned item keys outside the current batch, so model key typos cannot create orphan segmented rows.
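A minimal sketch of the batch-key guard described above. The function name and the row shape are hypothetical, and the real worker may raise instead of silently dropping rows; the point is only that keys outside the current batch never reach persistence.

```python
def split_model_rows(
    batch_item_keys: list[str],
    model_rows: dict[str, list[str]],
) -> tuple[dict[str, list[str]], list[str]]:
    """Separate model rows into accepted rows and rejected item keys.

    model_rows maps a model-returned item key to its selected option keys.
    """
    allowed = set(batch_item_keys)
    accepted = {key: opts for key, opts in model_rows.items() if key in allowed}
    rejected = sorted(key for key in model_rows if key not in allowed)
    return accepted, rejected


accepted, rejected = split_model_rows(
    batch_item_keys=["sku-1", "sku-2"],
    model_rows={
        "sku-1": ["budget"],
        "sku-l": ["premium"],  # typo for "sku-1": lowercase L instead of 1
    },
)
```

Note that sku-2 simply has no row here; follower keys that were never in the batch are likewise left without rows rather than receiving copies.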

Queued worker events are emitted as direct package events. Backend/SSE transports may fill root_id, object_id, and correlation_id, but they should not wrap or rename the package event.

event = AutoSegmentDimensionJobEvent(
    payload=AutoSegmentDimensionJobResult(
        dimension_key="price_tier",
        upserted_result_count=12,
    )
)

assert event.source == "keble-segmenting"
assert event.action_type == "SEGMENT_DIMENSION_JOB_COMPLETED"

Example:

from keble_segmenting.schemas import CreateDimensionInput, CreateDimensionOptionInput

result = await client.acreate_dimensions(
    db_deps=db_deps,
    language=language,
    grid_id=grid_id,
    dimensions=[
        CreateDimensionInput(
            name="Price Positioning",
            niche=False,
            options=[
                CreateDimensionOptionInput(
                    name="Budget",
                    description="Low-price entry option.",
                    niche=False,
                ),
                CreateDimensionOptionInput(
                    name="Premium",
                    description="Higher-price premium option.",
                    niche=False,
                ),
            ],
        )
    ],
)

Agent Tool Registration

The canonical agent tool surface is one singular mutation tool:

from typing import Any

from keble_segmenting.agent import SegmentingAgentDeps, register_mutation_tools

# Agent is the host agent framework's class (pydantic-ai is assumed here).
agent = Agent[SegmentingAgentDeps, Any](...)

register_mutation_tools(agent)

Registered tool:

mutate_segmenting
- takes payload: Actions
- delegates directly into the unified action runtime

Deps shape:

SegmentingAgentDeps inherits keble_db.AgentDbDeps; do not pass Mongo/Redis as separate tool args.
Segmenting runtime state is under ctx.deps.segmenting.
Composite parent deps should inherit SegmentingAgentDeps and provide the same .segmenting namespace instead of manually re-registering backend-owned copies of this tool.

Optional tool customization:

from keble_segmenting.agent import register_mutation_tools
from keble_helpers import AgentToolConfig
from keble_segmenting.agent.schemas import MutationToolsConfig

register_mutation_tools(
    agent,
    tools_config=MutationToolsConfig(
        mutate_segmenting=AgentToolConfig(
            name="mutate_segmenting",
            description="Apply one typed segmenting action batch.",
            requires_approval=True,
        ),
    ),
)

Grid CRUD, Cloning, and Aggregation

Grid shell APIs:

acreate_grid(...) creates an empty metadata-only grid
aget_grid(...), aget_grid_by_id(...), aget_grids(...) read grids
aget_grid_detail(...) loads the grid plus ordered dimensions and masks
adelete_grid(...) cascades through dimensions, masks, segmented results, and masked results

Grid cloning:

clone = await client.aclone_grid(
    amongo,
    grid_id=grid_id,
)

aclone_grid(...) clones the full grid and returns GridCloneResult with the new grid plus key remapping maps.
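Parent modules that stored old keys (for example a saved view) typically need to translate them after cloning. The sketch below assumes the remapping maps are plain old-key to new-key dictionaries; the actual field names and shapes on GridCloneResult are not shown in this README, so remap_keys and dimension_key_map are hypothetical.

```python
def remap_keys(old_keys: list[str], key_map: dict[str, str]) -> list[str]:
    """Translate stored keys into the cloned grid, failing loudly on gaps."""
    missing = [key for key in old_keys if key not in key_map]
    if missing:
        raise KeyError(f"keys not present in the cloned grid: {missing}")
    return [key_map[key] for key in old_keys]


# Hypothetical old -> new dimension key map taken from a clone result.
dimension_key_map = {"dim-old-1": "dim-new-1", "dim-old-2": "dim-new-2"}
saved_view = ["dim-old-2", "dim-old-1"]
cloned_view = remap_keys(saved_view, dimension_key_map)
```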

Aggregation:

cell_displays = await client.aget_aggregated_cell_displays(
    amongo,
    grid_detail=grid_detail,
    items=items,
    include_synthetic_status_cells=False,
)

AggregatedCellDisplay is the canonical aggregation output:

full ordered cell
aligned flattened masks_and_options
typed mask_reasonings for preset mask decisions
item_counts
cell_display
optional synthetic PROCESSING/EMPTY status cells when the caller passes explicit display dimensions and sets include_synthetic_status_cells=True

Guidance For Other Agents

When another agent or workflow consumes this package:

load fresh context from aget_agent_context_string(...) before reasoning about the current grid
build mutations with typed Actions, not ad hoc dicts
use stable keys for update, delete, reorder, and direct result actions
assume the README and the client context API should stay aligned; if one changes, update the other in the same pass

Runtime Notes

The current lazy multi-grid runtime is explicit per grid:

SegmentingGridRuntime owns the loaded grid, dimensions, masks, indexes, optional progress, and lazy item/result caches for exactly one grid
deps.segmenting.loaded_grid_runtimes is the only runtime map for multi-grid sessions
deps.aget_or_load_grid_runtime(grid_id=...) loads structural context without creating execution progress
deps.aensure_executing_grid_runtime(grid_id=...) returns the target runtime and creates Redis progress only for that executing grid
handler/helper code receives runtime explicitly instead of reading an implicit active grid from deps

The nested deps.segmenting namespace owns only shared session state:

client/config adapters
loaded_grid_runtimes
accumulated actioned_results

Removed runtime surfaces:

do not target deps.action_objs, deps.progress, or other active-grid proxy properties in downstream tests or helpers
do not restore active_grid_id, aactivate_grid_runtime(...), or require_active_runtime(...)
do not restore the removed ActionObjs model; item loading is owned by SegmentingAgentDeps.aensure_all_items_loaded(runtime=...)

Prompt tables also follow two important correctness rules:

table cells are indexed by stable option keys, not by human labels
when multiple hidden cells collapse into one projected cell, the prompt copy is marked as merged instead of reusing one hidden cell's title/description
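Both rules can be illustrated with a short projection sketch. HiddenCell and project_cells are hypothetical names, and summing item counts assumes no item sits in more than one hidden cell; the package's own projection may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class HiddenCell:
    """Hypothetical full cell: (dimension_key, option_key) pairs plus display."""

    cell: tuple[tuple[str, str], ...]
    title: str
    item_count: int


def project_cells(cells: list[HiddenCell], visible_dimension_keys: list[str]) -> dict:
    """Collapse full cells onto the visible dimensions, keyed by stable keys."""
    visible = set(visible_dimension_keys)
    groups: dict[tuple, list[HiddenCell]] = defaultdict(list)
    for hidden in cells:
        projected = tuple(pair for pair in hidden.cell if pair[0] in visible)
        groups[projected].append(hidden)

    table = {}
    for projected, members in groups.items():
        merged = len(members) > 1
        table[projected] = {
            # A merged cell never borrows one hidden cell's title.
            "title": f"(merged from {len(members)} cells)" if merged else members[0].title,
            "item_count": sum(member.item_count for member in members),
            "merged": merged,
        }
    return table


cells = [
    HiddenCell((("price", "budget"), ("use", "travel")), "Budget travel", 3),
    HiddenCell((("price", "budget"), ("use", "home")), "Budget home", 2),
    HiddenCell((("price", "premium"), ("use", "home")), "Premium home", 1),
]
table = project_cells(cells, ["price"])
```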

Latest Diagnostic Timing Note

The IRL client suite now includes one diagnostic-only timing probe:

tests/irl/agent/test_handler/test_stage_latency_breakdown.py

It measures:

adiscover_dimensions(...)
aauto_segment_and_mask(...)
discovered dimension and option counts
persisted segmented-result counts

This probe is intentionally diagnostic, not a hard latency budget gate. Use it to see whether current runtime cost is dominated by dimension discovery or by the downstream per-item segmentation pass.

Latest Discovery Name Validation Note

As of 0.1.25, AI-discovered dimensions and masks are normalized before persistence:

same-name discovered dimensions merge into one first-name-preserving dimension
same-name discovered masks merge into one first-name-preserving mask
duplicate options under the merged dimension or mask are deduplicated with the same first-occurrence-preserving rule
dimension niche flags are merged with any(...)
blank normalized dimension and mask names fail before persistence
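The merge rules above can be sketched as follows. The normalization used here (whitespace collapse plus case folding) and the treatment of duplicate options as keep-first are assumptions for illustration; DiscoveredDimension and merge_discovered are hypothetical names, not package APIs.

```python
from dataclasses import dataclass, field


@dataclass
class DiscoveredDimension:
    """Hypothetical discovery output for one dimension."""

    name: str
    niche: bool
    options: list[str] = field(default_factory=list)


def normalize_name(name: str) -> str:
    """Assumed normalization: collapse whitespace and ignore case."""
    return " ".join(name.split()).casefold()


def merge_discovered(dimensions: list[DiscoveredDimension]) -> list[DiscoveredDimension]:
    """Merge same-name dimensions, keeping the first spelling of each name."""
    merged: dict[str, DiscoveredDimension] = {}
    for dim in dimensions:
        key = normalize_name(dim.name)
        if not key:
            raise ValueError("blank normalized dimension name")
        if key not in merged:
            merged[key] = DiscoveredDimension(dim.name, dim.niche, [])
        target = merged[key]
        # Niche flags are merged with any(...).
        target.niche = any([target.niche, dim.niche])
        seen = {normalize_name(option) for option in target.options}
        for option in dim.options:
            if normalize_name(option) not in seen:
                target.options.append(option)
                seen.add(normalize_name(option))
    return list(merged.values())


result = merge_discovered(
    [
        DiscoveredDimension("Price Tier", False, ["Budget", "Premium"]),
        DiscoveredDimension(" price  tier", True, ["budget", "Luxury"]),
    ]
)
```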

Direct create/update handlers now preflight the final grid namespace before any write, so direct actions cannot leave a grid with duplicate normalized dimension or mask names, including blank normalized names. build_unique_name_map(...) remains strict and should still fail if corrupted persisted state already exists.

0.2.0 Preset Dimensions And Combination Masks

keble-segmenting 0.2.0 intentionally breaks the old single dimension_and_option mask-result contract. Mask results now persist one full cell plus a stable dimension_option_signature, so masks can classify two-dimensional preset cells such as pricing by use case or pricing by feature family.
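One way to picture a stable signature for a full cell is shown below. The "dimension=option" encoding joined by "|" is purely an assumption for illustration; the README does not specify the real dimension_option_signature format, and build_signature is a hypothetical helper.

```python
def build_signature(cell: list[tuple[str, str]]) -> str:
    """Build an order-independent signature from (dimension_key, option_key) pairs."""
    pairs = sorted(cell)
    dimension_keys = [dimension_key for dimension_key, _ in pairs]
    if len(set(dimension_keys)) != len(dimension_keys):
        raise ValueError("a cell holds at most one option per dimension")
    return "|".join(f"{dimension_key}={option_key}" for dimension_key, option_key in pairs)


signature_a = build_signature([("pricing", "budget"), ("use_case", "travel")])
signature_b = build_signature([("use_case", "travel"), ("pricing", "budget")])
```

Because the pairs are sorted before joining, the same cell yields the same signature regardless of the order in which its dimensions are listed.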

The package stays domain-generic. Parent services may inject typed SegmentingCellContextProvider context and preset-market scoped tool registrars, while this package owns only segmenting schemas, CRUD, aggregation, LLM prompts, and queue contracts. Preset mask roles cover market demand, opportunity highlight, and low feasibility, with enhanced reasoning stored in MaskedResultRelationalReasoningMongoObject and surfaced through aggregate cells.

0.2.1 Preset Mask Review Follow-Up

keble-segmenting 0.2.1 closes the first review gaps on top of the breaking combination-mask line:

Parent-owned SegmentingToolRegistrar hooks allow backend to provide generic tools such as market evidence search without importing backend code into this package.
SegmentingActionConfig.mask_generation_enabled controls whether package auto-segmenting plans generated mask work at all, not only whether parent cell context is available.
Preset dimension normalization still keeps first explicit AI tags, but it now infers obvious missing PRICING, SCENE_OR_USE_CASE, and FEATURES_OR_FUNCTIONALITY roles from generated names/descriptions so preset masks do not silently skip when the AI omitted a clear tag.

0.2.2 Preset Market Tool Scope

keble-segmenting 0.2.2 narrowed the parent-tool boundary introduced in the previous review pass. In that release, SegmentingClient accepted the old market-named registrar boundary, and the runtime passed those tools only into the MARKET_DEMAND preset mask prompt. As of 0.6.0, the current boundary is tool_registrars, which can supply bounded Exa tools to demand, low-feasibility, and opportunity preset-mask prompts while keeping dimension discovery, dimension merge, and normal/custom masks tool-free.

Preset reasoning persistence also separates fields more strictly: reasoning stores the AI explanation from the mask result, while evidence stores the parent-provided cell source digest and evidence lines. When no parent source digest exists, the row says so explicitly instead of duplicating the AI reasoning text.

0.3.0 Preset Mask Contract Hardening

keble-segmenting 0.3.0 renames the system-owned preset role contract from display semantics to PresetMaskType. AI/user-created masks no longer accept a preset role field; only MaskBase.build_preset_mask(...) can create masks tagged with preset_mask_type. Normal update/delete/option/reorder actions now reject structural mutations on preset masks, while the auto preset pipeline can still persist combination mask results.

The cartesian preset-cell builder now lives on DimensionAndOptionCombination.build_cartesian(...), and mask-result signatures accept generic Sequence[DimensionAndOption] inputs.

0.4.0 Empty Preset Mask Cells

keble-segmenting 0.4.0 keeps preset-mask omission neutral: when the model does not emit a row for a cell, that cell is completed for progress accounting but remains unknown/unclassified rather than failed, FALSE, or the lowest option. Aggregate reads now union item-backed cells with stored mask and preset reasoning signatures, so an empty analyzed cell can render mask overlays and reasoning without copying item rows into that cell.

0.4.1 Preset Mask Naming Cleanup

keble-segmenting 0.4.1 keeps the 0.4.0 empty analyzed-cell behavior and cleans the remaining display-worded preset terminology from the current schema descriptions and docs. Public code should refer to PresetMaskType and preset mask roles.

0.5.0 Terminal Segmentation Omissions

keble-segmenting 0.5.0 persisted AI omissions as terminal SegmentedResultStatus.OMITTED rows. As of 0.9.0, explicit in-scope unresolved AI matches use the canonical DimensionOptionType.OTHER bucket when the dimension owns one. Omitted rows now mean invalid model output, untouched batch items, out-of-scope/safety failures, or malformed legacy dimensions without OTHER; omitted rows still do not participate in option-key aggregation.

Matched rows required one selected option in 0.5.0; as of 0.6.0, matched rows store non-empty option_keys. Aggregated cell reads use matched rows to build placements and use all terminal rows to decide whether synthetic cells are still processing or settled empty. This prevents one omitted item from keeping a lazy axis in perpetual processing.
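The processing-versus-empty decision for a synthetic cell can be sketched like this. RowStatus and synthetic_cell_state are hypothetical names, and treating a missing row as "not yet terminal" is an assumption about how pending work is represented.

```python
from enum import Enum


class RowStatus(str, Enum):
    """Hypothetical terminal statuses for one segmented result row."""

    MATCHED = "MATCHED"
    OMITTED = "OMITTED"


def synthetic_cell_state(item_keys: list[str], rows: dict[str, RowStatus]) -> str:
    """Return EMPTY once every item has a terminal row, else PROCESSING.

    Applies to a cell that no matched row placed an item into.
    """
    terminal = {RowStatus.MATCHED, RowStatus.OMITTED}
    settled = all(rows.get(item_key) in terminal for item_key in item_keys)
    return "EMPTY" if settled else "PROCESSING"


items = ["sku-1", "sku-2"]
in_flight = synthetic_cell_state(items, {"sku-1": RowStatus.MATCHED})
settled = synthetic_cell_state(
    items, {"sku-1": RowStatus.MATCHED, "sku-2": RowStatus.OMITTED}
)
```

The second call shows why an omitted row counts as terminal: without it, sku-2 would hold the axis in PROCESSING indefinitely.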

0.6.0 Multi-Select Segmenting Results

keble-segmenting 0.6.0 hard-breaks the persisted result contract from singular option_key to option_keys.

DimensionSelectionMode.SINGLE_SELECT requires exactly one matched option key.
DimensionSelectionMode.MULTI_SELECT allows one item to belong to multiple options under the same dimension.
SegmentedResultStatus.OMITTED rows keep option_keys=[]; omitted remains terminal unknown/unclassified and is not Other/Unclear.
Aggregate reads expand multi-select display axes into cartesian cell placements while mask cell identity remains one DimensionAndOption per dimension.
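A short sketch of the two selection modes and the cartesian expansion, using plain strings for the mode names. validate_option_keys and expand_placements are hypothetical helpers written for illustration, not the package's validators.

```python
from itertools import product


def validate_option_keys(mode: str, option_keys: list[str]) -> None:
    """Check a matched row's option_keys against its dimension's selection mode."""
    if not option_keys:
        raise ValueError("matched rows need non-empty option_keys")
    if mode == "SINGLE_SELECT" and len(option_keys) != 1:
        raise ValueError("SINGLE_SELECT requires exactly one option key")


def expand_placements(item_options: dict[str, list[str]]) -> list[tuple[tuple[str, str], ...]]:
    """Expand one item's per-dimension option_keys into cell placements.

    Each placement holds exactly one (dimension_key, option_key) per dimension.
    """
    dimension_keys = list(item_options)
    combos = product(*(item_options[key] for key in dimension_keys))
    return [tuple(zip(dimension_keys, combo)) for combo in combos]


validate_option_keys("MULTI_SELECT", ["sleep", "focus"])
placements = expand_placements(
    {"benefit": ["sleep", "focus"], "price_tier": ["budget"]}
)
```

An item matched to two benefit options and one price tier therefore appears in two cells.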

0.6.1 Result Contract Cleanup

keble-segmenting 0.6.1 removes the unused untyped item/dimension-map iterator that duplicated aggregation behavior after the multi-select rewrite. Tests now exercise direct result writes through option_keys only; singular option_key remains valid only for dimension options, mask options, and mask result choices.

0.8.1 Opportunity Evidence Context

keble-segmenting 0.8.1 keeps the auto-segment queue context slim and carries only one new optional owner field:

AutoSegmentQueueContext.owner_type lets backend context providers load the correct user/org seller profile for preset masks.
Market-opportunity prompt guidance now treats an empty observed sample as missing direct sample evidence, not as proof that demand is absent.
When external tools are available, opportunity analysis should use comparable market and web evidence before final TRUE/FALSE decisions for empty or small-sample cells.

0.11.3 Positioning Evidence And Sample Metadata Correction

The current positioning correction keeps package version 0.11.3 but tightens the user-facing display contract:

CellImageDisplay may carry optional CellImageDisplayMetadata for public sample-image popovers: title, ASIN, brand, marketplace, price, rating, reviews, and monthly sold. Do not put item keys, stable keys, graph ids, or transport payloads in this metadata.
Evidence persisted on segmented results, mask results, metric ranges, and preset relational reasoning is sanitized before storage through sanitize_user_evidence(...).
Prompts should request JSON-only structured responses, but evidence string fields may use markdown-safe GFM prose. Do not emit markdown outside the JSON response boundary.

Project details

This version

0.11.12

May 22, 2026

Download files

Source Distribution

keble_segmenting-0.11.12.tar.gz (717.6 kB view details)

Uploaded May 22, 2026 Source

Built Distribution

keble_segmenting-0.11.12-py3-none-any.whl (196.3 kB view details)

Uploaded May 22, 2026 Python 3

File details

Details for the file keble_segmenting-0.11.12.tar.gz.

File metadata

Download URL: keble_segmenting-0.11.12.tar.gz
Upload date: May 22, 2026
Size: 717.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.9 {"installer":{"name":"uv","version":"0.10.9","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for keble_segmenting-0.11.12.tar.gz
Algorithm	Hash digest
SHA256	`b990bf8bed49189488cddd1f1bde78efd4268b3fbf5ca424b0318aa445800b93`
MD5	`7afbb424dbd536158760de617b22d9ec`
BLAKE2b-256	`83fc99a1ac04fa0a16ecaa87c54d01743ab7d166e216c3672a2df4a9a738cebe`
