A Python library for semantic text chunking using Sentence Transformers.

Semantic Chunking

Semantic Chunking is a Python library for segmenting text into meaningful chunks using embeddings from Sentence Transformers.

Features

  • Splits text into semantic chunks using cosine similarity
  • Configurable chunk size and similarity thresholds
  • Based on sentence-transformers
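The features above describe the general approach without showing it. The sketch below illustrates one common way a chunker of this kind can work: embed each sentence, start a new chunk wherever the cosine similarity between neighbouring sentences falls below a threshold, and also cut when a chunk would exceed a size limit. It is an illustration under assumptions, not this library's implementation. The function name `chunk_by_similarity`, the regex sentence splitter, measuring size in words, and the pluggable `embed` callable are all hypothetical.

```python
import math
import re
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def chunk_by_similarity(
    text: str,
    embed: Callable[[List[str]], List[Vector]],
    max_chunk_size: int = 128,
    similarity_threshold: float = 0.3,
) -> List[str]:
    """Hypothetical greedy semantic chunker (not semantic_chunking's code).

    Assumptions: sentences end in '.', '!' or '?', and max_chunk_size
    counts words. A single sentence longer than the limit is kept whole.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    if not sentences:
        return []
    vectors = embed(sentences)

    chunks: List[str] = []
    current = [sentences[0]]
    current_words = len(sentences[0].split())
    # Compare each sentence with the one before it.
    for prev_vec, vec, sentence in zip(vectors, vectors[1:], sentences[1:]):
        words = len(sentence.split())
        topic_shift = cosine_similarity(prev_vec, vec) < similarity_threshold
        too_long = current_words + words > max_chunk_size
        if topic_shift or too_long:
            chunks.append(" ".join(current))
            current, current_words = [sentence], words
        else:
            current.append(sentence)
            current_words += words
    chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    # Toy 2-D embedder keyed on one word, so no model download is needed.
    def toy_embed(sentences: List[str]) -> List[Vector]:
        return [[1.0, 0.0] if "cat" in s.lower() else [0.0, 1.0] for s in sentences]

    text = "Cats purr. My cat sleeps a lot. Stocks fell today. Markets were volatile."
    print(chunk_by_similarity(text, toy_embed))
    # ['Cats purr. My cat sleeps a lot.', 'Stocks fell today. Markets were volatile.']
```

In real use, `embed` would wrap a Sentence Transformers model; the toy embedder only keeps the sketch runnable offline. Checking similarity only between adjacent sentences keeps the pass linear in the number of sentences, at the cost of ignoring topics that return later in the text.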

Installation

pip install semantic-chunking

Usage

from semantic_chunking import SemanticChunker

def test_semantic_chunker():
    text = """Machine learning is a subset of artificial intelligence that focuses on the use of data and algorithms to imitate the way that humans learn.
    Deep learning is a more specialized version of machine learning that uses neural networks with multiple layers.
    These advanced algorithms can process complex patterns and make intelligent decisions with minimal human intervention.
    The applications of machine learning are vast, ranging from image recognition to natural language processing."""

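    # model_name selects the Sentence Transformers model used for embeddings;
    # max_chunk_size and similarity_threshold are the configurable limits.
    chunker = SemanticChunker(model_name='all-MiniLM-L6-v2', max_chunk_size=128, similarity_threshold=0.3)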
    chunks = chunker.semantic_chunk(text)

    print("Semantic Chunks:")
    for i, chunk in enumerate(chunks, 1):
        print(f"Chunk {i}: {chunk}")
        print(f"Words: {len(chunk.split())}\n")

# Run the example only when this file is executed directly
if __name__ == "__main__":
    test_semantic_chunker()
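Choosing `similarity_threshold` is easier once you can see how similar each sentence is to the next. The helper below is a hypothetical sketch, not part of `semantic_chunking`: it computes those scores with the `sentence-transformers` package directly (via `SentenceTransformer.encode`), which this library is built on. It also accepts an optional `embed` callable so it can run without downloading a model. The name `adjacent_similarities` and its parameters are illustrative assumptions.

```python
import math
from typing import Callable, List, Optional, Sequence

Embedder = Callable[[List[str]], Sequence[Sequence[float]]]


def adjacent_similarities(
    sentences: List[str],
    embed: Optional[Embedder] = None,
    model_name: str = "all-MiniLM-L6-v2",
) -> List[float]:
    """Cosine similarity between each sentence and the next one.

    Hypothetical helper for picking similarity_threshold; not the library's API.
    """
    if len(sentences) < 2:
        return []
    if embed is None:
        # Imported lazily so a custom embedder works even when
        # sentence-transformers is not installed.
        from sentence_transformers import SentenceTransformer

        model = SentenceTransformer(model_name)

        def embed(batch: List[str]) -> List[List[float]]:
            # encode() returns a NumPy array for a list of strings.
            return model.encode(batch).tolist()

    vectors = embed(sentences)
    scores: List[float] = []
    for a, b in zip(vectors, vectors[1:]):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        scores.append(dot / norm if norm else 0.0)
    return scores
```

For the example above, where each sentence sits on its own line, `[line.strip() for line in text.splitlines() if line.strip()]` gives four sentences and therefore three scores. If the chunker starts a new chunk where similarity drops below `similarity_threshold` (an assumption here), a threshold just above the lowest score would cut the text at that point.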

Download files

Download the file for your platform.

Source Distribution

semantic_chunking-0.1.1.tar.gz (3.0 kB)

Built Distribution

semantic_chunking-0.1.1-py3-none-any.whl (3.8 kB)

File details

Details for the file semantic_chunking-0.1.1.tar.gz.

File metadata

  • Download URL: semantic_chunking-0.1.1.tar.gz
  • Upload date:
  • Size: 3.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.11.10

File hashes

Hashes for semantic_chunking-0.1.1.tar.gz
Algorithm    Hash digest
SHA256       d5b203bd891ecd4b9e8575ce27f7bc6d8897941e2bcd7676165819221db35305
MD5          b9e9f7965624e454a6741d178c2dad37
BLAKE2b-256  a6035b3025afd998d0f674b52b88e6ee72481118d146df328696152175f12827
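The published digests let you check that a downloaded file is intact. The sketch below uses only Python's standard `hashlib` to compute all three digests listed here, treating BLAKE2b-256 as BLAKE2b with a 32-byte digest. It then compares the SHA256 against the published value. The function names and the assumption that the archive sits in the current directory are illustrative.

```python
import hashlib
from pathlib import Path


def file_digests(path: str, block_size: int = 1 << 16) -> dict:
    """Return hex digests for a file: SHA256, MD5 and BLAKE2b-256."""
    hashers = {
        "sha256": hashlib.sha256(),
        "md5": hashlib.md5(),
        # "BLAKE2b-256" means BLAKE2b with a 32-byte (256-bit) digest.
        "blake2b_256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as f:
        # Read in blocks so large files are not loaded into memory at once.
        for block in iter(lambda: f.read(block_size), b""):
            for h in hashers.values():
                h.update(block)
    return {name: h.hexdigest() for name, h in hashers.items()}


def verify_sha256(path: str, expected: str) -> bool:
    """True if the file's SHA256 matches the expected hex digest."""
    return file_digests(path)["sha256"] == expected.lower()


if __name__ == "__main__":
    # SHA256 published above for semantic_chunking-0.1.1.tar.gz
    expected = "d5b203bd891ecd4b9e8575ce27f7bc6d8897941e2bcd7676165819221db35305"
    archive = Path("semantic_chunking-0.1.1.tar.gz")  # assumed location
    if archive.exists():
        print("OK" if verify_sha256(str(archive), expected) else "MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```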

File details

Details for the file semantic_chunking-0.1.1-py3-none-any.whl.

File hashes

Hashes for semantic_chunking-0.1.1-py3-none-any.whl
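Algorithm    Hash digest
SHA256       4b561181ec0165eb57f8cbea381dc404e240714399ba193b273d371cc8b41af1
MD5          b1f9aebef82b0e512578135a59939e0a
BLAKE2b-256  efaebc386a2d4bb25d193e2f5c6f75aa0e31b296ac64922bb5eb7589ccd9f644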
