Defines standardized interfaces and data structures for text processing plugins, enabling modular NLP operations like sentence splitting, tokenization, and chunking within the cjm-plugin-system ecosystem.
cjm-text-plugin-system
Install
pip install cjm_text_plugin_system
Project Structure
nbs/
├── core.ipynb # DTOs for text processing with character-level span tracking
└── plugin_interface.ipynb # Domain-specific plugin interface for text processing operations
Total: 2 notebooks
Module Dependencies
graph LR
core[core<br/>Core Data Structures]
plugin_interface[plugin_interface<br/>Text Processing Plugin Interface]
plugin_interface --> core
1 cross-module dependency detected
CLI Reference
No CLI commands found in this project.
Module Overview
Detailed documentation for each module in the project:
Core Data Structures (core.ipynb)
DTOs for text processing with character-level span tracking
Import
from cjm_text_plugin_system.core import (
    TextSpan,
    TextProcessResult
)
Classes
@dataclass
class TextSpan:
    "Represents a segment of text with its original character coordinates."
    text: str  # The text content of this span
    start_char: int  # 0-indexed start position in original string
    end_char: int  # 0-indexed end position (exclusive)
    label: str = 'sentence'  # Span type: 'sentence', 'token', 'paragraph', etc.
    metadata: Dict[str, Any] = field(...)  # Additional span metadata

    def to_dict(self) -> Dict[str, Any]:  # Dictionary representation
        "Convert span to dictionary for serialization."

@dataclass
class TextProcessResult:
    "Container for text processing results."
    spans: List[TextSpan]  # List of text spans from processing
    metadata: Dict[str, Any] = field(...)  # Processing metadata
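To illustrate how the exclusive `end_char` lines up with Python slicing, here is a minimal sketch. It does not import the package. Instead it defines local stand-ins that mirror the documented fields. The `default_factory=dict` default for `metadata` and the `asdict`-based `to_dict` body are assumptions, since the listing above elides both.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class TextSpan:
    """Local stand-in mirroring the documented TextSpan fields."""
    text: str
    start_char: int  # 0-indexed, inclusive
    end_char: int    # 0-indexed, exclusive
    label: str = "sentence"
    # Assumption: the elided default is a fresh empty dict per instance.
    metadata: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> Dict[str, Any]:
        # Assumption: a plain field-by-field dictionary.
        return asdict(self)


@dataclass
class TextProcessResult:
    """Local stand-in mirroring the documented TextProcessResult fields."""
    spans: List[TextSpan]
    metadata: Dict[str, Any] = field(default_factory=dict)


source = "Hello world. How are you?"
result = TextProcessResult(
    spans=[
        TextSpan(text="Hello world.", start_char=0, end_char=12),
        TextSpan(text="How are you?", start_char=13, end_char=25),
    ],
    metadata={"processor": "manual"},  # hypothetical metadata key
)

# Because end_char is exclusive, slicing the original string recovers each span.
for span in result.spans:
    assert source[span.start_char:span.end_char] == span.text

first_as_dict = result.spans[0].to_dict()
```

Keeping the offsets relative to the original string means a consumer can map any span back to its source position without re-running the splitter.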
Text Processing Plugin Interface (plugin_interface.ipynb)
Domain-specific plugin interface for text processing operations
Import
from cjm_text_plugin_system.plugin_interface import (
    TextProcessingPlugin
)
Classes
class TextProcessingPlugin(PluginInterface):
    """
    Abstract base class for plugins that perform NLP operations.

    Extends PluginInterface with text processing requirements:
    - `execute`: Dispatch method for different text operations
    - `split_sentences`: Split text into sentence spans with character positions
    """

    def execute(
        self,
        action: str = "split_sentences",  # Operation to perform: 'split_sentences', 'tokenize', etc.
        **kwargs
    ) -> Dict[str, Any]:  # JSON-serializable result
        "Execute a text processing operation."

    def split_sentences(
        self,
        text: str,  # Input text to split
        **kwargs
    ) -> TextProcessResult:  # Result with TextSpan objects containing character indices
        "Split text into sentence spans with accurate character positions."
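As a rough illustration of the two methods above, the following sketch implements them on a standalone class. `RegexSentenceSplitter` is a hypothetical name, and the class deliberately does not subclass `TextProcessingPlugin`, because the other requirements of `PluginInterface` are not shown here. The DTOs are local stand-ins, the regex is a naive splitter, and the shape of the dictionary returned by `execute` is an assumption. The source only states that the result is JSON-serializable.

```python
import json
import re
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class TextSpan:
    """Local stand-in for the documented DTO (defaults are assumptions)."""
    text: str
    start_char: int
    end_char: int
    label: str = "sentence"
    metadata: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)


@dataclass
class TextProcessResult:
    """Local stand-in for the documented result container."""
    spans: List[TextSpan]
    metadata: Dict[str, Any] = field(default_factory=dict)


class RegexSentenceSplitter:
    """Hypothetical plugin-like class; a real plugin would extend TextProcessingPlugin."""

    # Naive rule: a run of non-terminators followed by any trailing . ! or ?
    _SENTENCE = re.compile(r"[^.!?]+[.!?]*")

    def split_sentences(self, text: str, **kwargs) -> TextProcessResult:
        spans: List[TextSpan] = []
        for match in self._SENTENCE.finditer(text):
            raw = match.group()
            stripped = raw.strip()
            if not stripped:
                continue  # skip whitespace-only matches
            # Shift the start past leading whitespace so offsets match the stripped text.
            start = match.start() + (len(raw) - len(raw.lstrip()))
            spans.append(
                TextSpan(text=stripped, start_char=start, end_char=start + len(stripped))
            )
        return TextProcessResult(spans=spans, metadata={"processor": "regex"})

    def execute(self, action: str = "split_sentences", **kwargs) -> Dict[str, Any]:
        # Dispatch on the action name; only one action is sketched here.
        if action == "split_sentences":
            result = self.split_sentences(**kwargs)
            # Assumed output shape: plain dicts and lists only.
            return {
                "spans": [span.to_dict() for span in result.spans],
                "metadata": result.metadata,
            }
        raise ValueError(f"Unsupported action: {action!r}")


plugin = RegexSentenceSplitter()
payload = plugin.execute(action="split_sentences", text="Hello world. How are you?")
encoded = json.dumps(payload)  # succeeds because the payload holds only JSON types
```

Routing everything through `execute` lets a host call any plugin through one entry point, while `split_sentences` stays available as a typed method for direct use.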
File details
Details for the file cjm_text_plugin_system-0.0.1.tar.gz.
File metadata
- Download URL: cjm_text_plugin_system-0.0.1.tar.gz
- Upload date:
- Size: 9.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 93e0b1ea206de553d2084923477809f2b03e8474d271203cf68edd05cd9345f1 |
| MD5 | 52564999db6545780b79dcf50b8c7fce |
| BLAKE2b-256 | bf3a33f355b27938355dbfaf3c3a1b8fe614e271161a734d91d150f0a7ac4098 |
File details
Details for the file cjm_text_plugin_system-0.0.1-py3-none-any.whl.
File metadata
- Download URL: cjm_text_plugin_system-0.0.1-py3-none-any.whl
- Upload date:
- Size: 9.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a4dc1b7464fc663f63d4148dd32ea69354a95b212ee95358c8aa96d7e200d6fe |
| MD5 | 69ca0aa9f23b3663054fa19a63ef49e3 |
| BLAKE2b-256 | 31edc8a304eea35f892c6215b762e81ee7fd16c3d7a8c498f5c13c8d3e9ddf78 |