Lightweight Python library that uses sentence embeddings to create naturally coherent segments of text akin to paragraphs.
Project description
cohesive
cohesive is a lightweight segmenter that uses sentence embeddings to split documents into naturally coherent segments akin to paragraphs.
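To make the idea concrete, here is a minimal sketch of embedding-based segmentation in general. It is not cohesive's actual algorithm: the function names, the fixed threshold, and the toy 2-D vectors are all assumptions made for illustration. It starts a new segment whenever the cosine similarity between neighbouring sentence vectors drops below a threshold.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def segment_by_similarity(sentences, embeddings, threshold=0.5):
    """Group consecutive sentences; start a new segment when the similarity
    between neighbouring sentence vectors falls below `threshold`."""
    if len(sentences) != len(embeddings):
        raise ValueError("sentences and embeddings must have the same length")
    if not sentences:
        return []
    segments = [[sentences[0]]]
    for i in range(1, len(sentences)):
        if cosine_similarity(embeddings[i - 1], embeddings[i]) < threshold:
            segments.append([sentences[i]])  # topic shift: open a new segment
        else:
            segments[-1].append(sentences[i])  # same topic: extend the current one
    return segments


# Toy 2-D vectors stand in for real sentence embeddings (hypothetical data).
toy_sentences = ["Cats purr.", "Cats nap a lot.", "Stocks fell today."]
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
toy_segments = segment_by_similarity(toy_sentences, toy_embeddings)
```

In this toy run the first two vectors point in nearly the same direction and the third does not, so the two cat sentences end up in one segment and the stock sentence in another.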
Installation
You can install 'cohesive' using pip:
pip install cohesive
Using cohesive
To start using cohesive, import CohesiveTextSegmenter and create a new instance:
from cohesive import CohesiveTextSegmenter
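# Instantiate the CohesiveTextSegmenter with the model that you want to use.
# By default, cohesive uses paraphrase-MiniLM-L6-v2, which has produced good results;
# this example passes all-MiniLM-L6-v2 explicitly.
cohesive = CohesiveTextSegmenter("all-MiniLM-L6-v2")
# Example input: an array of sentences in document order.
sentences = [
    "Cats purr when they are content.",
    "They also sleep for most of the day.",
    "Stock markets fell sharply this morning.",
]
# Then, all you need to do is call the generate_segments method and pass in the array of sentences.
cohesive.generate_segments(sentences)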
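Since generate_segments takes an array of sentences, raw text has to be split first. The following is a minimal sketch using only the standard library; the helper name split_sentences is hypothetical and not part of cohesive, and the naive regex will mis-split abbreviations such as "e.g." or "Dr.".

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


sentences = split_sentences("Cats purr. Do they nap a lot? Yes! Stocks fell today.")
```

The resulting list can then be passed to generate_segments as shown above. For real documents, a dedicated sentence tokenizer is likely to give cleaner input than this regex.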
Download files
Source Distribution
cohesive-0.1.1.tar.gz (8.1 kB)
Built Distribution
cohesive-0.1.1-py3-none-any.whl (8.8 kB)
File details
Details for the file cohesive-0.1.1.tar.gz.
File metadata
- Download URL: cohesive-0.1.1.tar.gz
- Size: 8.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.7.1 CPython/3.11.5 Windows/10
File hashes
Algorithm | Hash digest
---|---
SHA256 | deac35fc15a340807729ffeee0ed708e12acee2cf7e18b389de0e8cdd876c353
MD5 | c0c77350af130e767fbe30cd0a7a89ae
BLAKE2b-256 | 8fcd78824eaa502f2eb94cb2a6a2f9040602088a5f2e0e1dd603db452841f2aa
File details
Details for the file cohesive-0.1.1-py3-none-any.whl.
File metadata
- Download URL: cohesive-0.1.1-py3-none-any.whl
- Size: 8.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.7.1 CPython/3.11.5 Windows/10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6832e865f485b1e97d13dd7ab002c5d45a8c4ef08490309e31da9b238189b621
MD5 | 1e0ba0a1906dc35220f2624d9e76c2bb
BLAKE2b-256 | 8f4b1f004bc64f805161b5e6787de56f21aa61c97b8b71e207383ca56daf0de2
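The published digests can be used to check a downloaded file before installing it. Below is a minimal sketch using Python's hashlib; the function names sha256_of_file and matches_published_hash are hypothetical helpers, and the file is assumed to have been downloaded to a local path already.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published value (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, calling matches_published_hash with the local path of the downloaded archive and the SHA256 value listed for that file should return True if the download is intact.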