Defines standardized interfaces and data structures for text processing plugins, enabling modular NLP operations like sentence splitting, tokenization, and chunking within the cjm-plugin-system ecosystem.

cjm-text-plugin-system

Install

pip install cjm_text_plugin_system

Project Structure

nbs/
├── core.ipynb             # DTOs for text processing with character-level span tracking
└── plugin_interface.ipynb # Domain-specific plugin interface for text processing operations

Total: 2 notebooks

Module Dependencies

graph LR
    core[core<br/>Core Data Structures]
    plugin_interface[plugin_interface<br/>Text Processing Plugin Interface]

    plugin_interface --> core

1 cross-module dependency detected

CLI Reference

No CLI commands found in this project.

Module Overview

Detailed documentation for each module in the project:

Core Data Structures (core.ipynb)

DTOs for text processing with character-level span tracking

Import

from cjm_text_plugin_system.core import (
    TextSpan,
    TextProcessResult
)

Classes

@dataclass
class TextSpan:
    "Represents a segment of text with its original character coordinates."
    
    text: str  # The text content of this span
    start_char: int  # 0-indexed start position in original string
    end_char: int  # 0-indexed end position (exclusive)
    label: str = 'sentence'  # Span type: 'sentence', 'token', 'paragraph', etc.
    metadata: Dict[str, Any] = field(...)  # Additional span metadata
    
    def to_dict(self) -> Dict[str, Any]:  # Dictionary representation
        "Convert span to dictionary for serialization."

@dataclass
class TextProcessResult:
    "Container for text processing results."
    
    spans: List[TextSpan]  # List of text spans from processing
    metadata: Dict[str, Any] = field(...)  # Processing metadata
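
The span fields above imply a simple invariant: slicing the original string with `start_char` and `end_char` (exclusive) should give back `text`. The following is a minimal sketch of that check using local stand-in dataclasses that mirror the documented fields, so it runs without the package installed. The `default_factory=dict` defaults and the `asdict`-based `to_dict` body are assumptions, since the source elides the `field(...)` arguments and the method body.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class TextSpan:
    """Local stand-in mirroring the documented TextSpan fields."""
    text: str
    start_char: int  # 0-indexed start position in the original string
    end_char: int    # 0-indexed end position (exclusive)
    label: str = "sentence"
    # Assumption: the source elides the field(...) arguments.
    metadata: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> Dict[str, Any]:
        # Assumption: a plain field-by-field dictionary.
        return asdict(self)


@dataclass
class TextProcessResult:
    """Local stand-in mirroring the documented TextProcessResult fields."""
    spans: List[TextSpan]
    metadata: Dict[str, Any] = field(default_factory=dict)  # assumption, as above


original = "Hello world. How are you?"
result = TextProcessResult(
    spans=[
        TextSpan(text="Hello world.", start_char=0, end_char=12),
        TextSpan(text="How are you?", start_char=13, end_char=25),
    ],
    metadata={"source_length": len(original)},
)

# Each span should map back to the exact slice of the original string.
for span in result.spans:
    assert original[span.start_char:span.end_char] == span.text

# Spans serialize to plain dictionaries, so the result can be dumped as JSON.
payload = json.dumps(
    {"spans": [s.to_dict() for s in result.spans], "metadata": result.metadata}
)
```

With the real package, the two stand-in classes would be replaced by the import from `cjm_text_plugin_system.core` shown earlier; the slicing check stays the same.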

Text Processing Plugin Interface (plugin_interface.ipynb)

Domain-specific plugin interface for text processing operations

Import

from cjm_text_plugin_system.plugin_interface import (
    TextProcessingPlugin
)

Classes

class TextProcessingPlugin(PluginInterface):
    """
    Abstract base class for plugins that perform NLP operations.
    
    Extends PluginInterface with text processing requirements:
    - `execute`: Dispatch method for different text operations
    - `split_sentences`: Split text into sentence spans with character positions
    """
    
    def execute(
            self,
            action: str = "split_sentences",  # Operation to perform: 'split_sentences', 'tokenize', etc.
            **kwargs
        ) -> Dict[str, Any]:  # JSON-serializable result
        "Execute a text processing operation."
    
    def split_sentences(
            self,
            text: str,  # Input text to split
            **kwargs
        ) -> TextProcessResult:  # Result with TextSpan objects containing character indices
        "Split text into sentence spans with accurate character positions."
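
To illustrate how the two methods might fit together, here is a rough sketch of a splitter that follows the documented method signatures. It is not taken from the package: `RegexSentenceSplitter` is a hypothetical name, it deliberately does not subclass `PluginInterface` (whose other required methods are not shown here), the dataclasses are local stand-ins with assumed defaults, and the regex is a naive placeholder rather than a real sentence segmenter.

```python
import re
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class TextSpan:
    """Local stand-in for cjm_text_plugin_system.core.TextSpan."""
    text: str
    start_char: int
    end_char: int
    label: str = "sentence"
    metadata: Dict[str, Any] = field(default_factory=dict)  # assumed default

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)  # assumed implementation


@dataclass
class TextProcessResult:
    """Local stand-in for cjm_text_plugin_system.core.TextProcessResult."""
    spans: List[TextSpan]
    metadata: Dict[str, Any] = field(default_factory=dict)  # assumed default


class RegexSentenceSplitter:
    """Hypothetical splitter; a real plugin would subclass TextProcessingPlugin."""

    # Naive rule: a sentence starts at a non-space character and ends at
    # terminal punctuation followed by whitespace, or at the end of the text.
    _SENTENCE = re.compile(r"\S.*?(?:[.!?]+(?=\s|\Z)|(?=\s*\Z))", re.DOTALL)

    def split_sentences(self, text: str, **kwargs) -> TextProcessResult:
        # Match offsets come straight from the regex, so each span's text
        # equals text[start_char:end_char] by construction.
        spans = [
            TextSpan(text=m.group(), start_char=m.start(), end_char=m.end())
            for m in self._SENTENCE.finditer(text)
        ]
        return TextProcessResult(spans=spans, metadata={"splitter": "regex"})

    def execute(self, action: str = "split_sentences", **kwargs) -> Dict[str, Any]:
        # Dispatch on the action name and return a JSON-serializable dict.
        if action == "split_sentences":
            result = self.split_sentences(**kwargs)
            return {
                "spans": [s.to_dict() for s in result.spans],
                "metadata": result.metadata,
            }
        raise ValueError(f"Unsupported action: {action}")


text = "Hello world. How are you?"
output = RegexSentenceSplitter().execute(action="split_sentences", text=text)
```

Routing everything through `execute` keeps the return value a plain dictionary, while `split_sentences` can still be called directly when typed `TextSpan` objects are more convenient. Raising `ValueError` for unknown actions is one possible choice; the source does not say how unsupported actions are handled.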
