A Python package to convert strings to positions in a source text
Strings to Positions
This library takes a source text and a list of strings (or chunks) that occur in that text, and returns either a list of offsets or a list of positions. The returned list has the same length as the input list.
How to Use
Install
pip install strings-to-positions
Usage Example
from strings_to_positions import to_offsets, to_strings, offset_to_position
# The Source Document
source_document = """# Introduction to Markdown
Markdown is a _lightweight **markup language** with plain-text_ formatting syntax. Its design allows it to be converted to many output formats, but the original tool by the same name only supports HTML.
Its design allows it to be converted to many output formats, but the original tool by the same name only supports HTML.
## Common Syntax
One common syntax is the one used by GitHub, which is a superset of the original Markdown syntax. It includes features such as tables, strikethrough, and task lists.
"""
# A list of strings, typically the output of a text splitter function
chunks = ["""# Introduction to Markdown
Markdown is a _lightweight **markup language** with plain-text_ formatting syntax. Its design allows it to be converted to many output formats, but the original tool by the same name only supports HTML.
Its design allows it to be converted to many output formats,""",
"""but the original tool by the same name only supports HTML.
## Common Syntax
One common syntax is the one used by GitHub, which is a superset of the original Markdown syntax. It includes features such as tables, strikethrough, and task lists.
"""
]
# Calling to_offsets returns the start and end offsets of each chunk
offsets = to_offsets(source_document, chunks)
# offsets = [(0, 291), (292, 535)]
# Optionally, pass a third argument to configure the search.
# See "Input Options" below for details.
# option = {
#     "case_sensitive": True,
#     "allow_overlap": True,
#     "overlap_size": None
# }
# offsets = to_offsets(source_document, chunks, option)
# Calling offset_to_position in a loop gives you the list of positions
positionsList = []
for offset in offsets:
    if offset is None:
        # Keep the list aligned with the input when an offset is missing
        positionsList.append(None)
        continue
    position = offset_to_position(source_document, offset)
    # position = {
    #     "start": {"line": 1, "column": 1, "offset": 0},
    #     "end": {"line": 4, "column": 61, "offset": 291},
    # }
    positionsList.append(position)
# ... later, after the offsets have been changed...
new_offsets = [(0, 293), (294, 535)]
new_chunks = to_strings(source_document, new_offsets)
# new_chunks is a new list of strings that correspond to new_offsets
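To make the offset convention concrete, here is a minimal pure-Python sketch of the idea behind the example above. It is not the library's implementation: `find_offsets` and `slice_strings` are hypothetical names, and the sketch assumes chunks appear in document order and that offsets are 0-indexed with an exclusive end, which is what the example values suggest.

```python
from typing import List, Optional, Tuple

Offset = Tuple[int, int]  # (start, end), with end exclusive (assumed convention)


def find_offsets(source: str, chunks: List[str]) -> List[Optional[Offset]]:
    """Locate each chunk in order, searching forward from the previous match."""
    results: List[Optional[Offset]] = []
    cursor = 0  # earliest index where the next chunk may start
    for chunk in chunks:
        start = source.find(chunk, cursor)
        if start == -1:
            # Chunk not found: record None so the output stays aligned with the input
            results.append(None)
            continue
        end = start + len(chunk)
        results.append((start, end))
        cursor = end
    return results


def slice_strings(source: str, offsets: List[Optional[Offset]]) -> List[Optional[str]]:
    """Recover the text for each offset pair by slicing the source."""
    return [None if pair is None else source[pair[0]:pair[1]] for pair in offsets]


text = "alpha beta gamma"
found = find_offsets(text, ["alpha", "gamma", "delta"])
print(found)                       # [(0, 5), (11, 16), None]
print(slice_strings(text, found))  # ['alpha', 'gamma', None]
```

Because the output list always has one entry per input chunk, a caller can zip the chunks and offsets together and handle the `None` entries explicitly.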
Input Options
case_sensitive (True | False)
If True, the search is case-sensitive.
allow_overlap (True | False)
If True, the search for each chunk also covers the entire previous chunk, so consecutive chunks may overlap.
If overlap_size is set, the overlap is limited to that size.
overlap_size (int | None)
Ignored if allow_overlap is False.
Sets the maximum overlap size.
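One way to picture how these options might interact is the sketch below. It is an interpretation of the descriptions above, not the library's code: `find_offsets_with_options` is a hypothetical function, the overlap size is assumed to be counted in characters, and case-insensitive matching is approximated with `str.lower()`.

```python
from typing import List, Optional, Tuple


def find_offsets_with_options(
    source: str,
    chunks: List[str],
    case_sensitive: bool = True,
    allow_overlap: bool = True,
    overlap_size: Optional[int] = None,
) -> List[Optional[Tuple[int, int]]]:
    # Lowercasing both sides gives a case-insensitive match. This assumes
    # lower() preserves string length, which holds for ASCII text.
    haystack = source if case_sensitive else source.lower()
    results: List[Optional[Tuple[int, int]]] = []
    prev_start, prev_end = 0, 0
    for chunk in chunks:
        needle = chunk if case_sensitive else chunk.lower()
        if not allow_overlap:
            search_from = prev_end  # next chunk must start after the previous one
        elif overlap_size is None:
            search_from = prev_start  # the whole previous chunk is searchable
        else:
            # Only the last overlap_size characters of the previous chunk are searchable
            search_from = max(prev_start, prev_end - overlap_size)
        start = haystack.find(needle, search_from)
        if start == -1:
            results.append(None)
            continue
        prev_start, prev_end = start, start + len(chunk)
        results.append((prev_start, prev_end))
    return results


text = "one two three two four"
chunks = ["one two three", "Three two four"]
print(find_offsets_with_options(text, chunks, case_sensitive=False))
# [(0, 13), (8, 22)]
print(find_offsets_with_options(text, chunks, case_sensitive=False, allow_overlap=False))
# [(0, 13), None]
print(find_offsets_with_options(text, chunks))
# [(0, 13), None]
```

In this sketch the second chunk overlaps the first by five characters ("three"), so it is found only when overlap is allowed and the overlap window is at least that large.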
Output Data Structure
Offset
Tuple[startOffset, endOffset]
Position
{
    start: {
        line: "1-index int",
        column: "1-index int",
        offset: int
    },
    end: {
        line: "1-index int",
        column: "1-index int",
        offset: int
    }
}
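The relationship between an offset and a position can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the library's `offset_to_position`: the helper names `point_at` and `to_position` are hypothetical, lines are assumed to be separated by `\n`, and the end offset is treated as exclusive, so the end column points one past the last character.

```python
from typing import Dict, Tuple


def point_at(source: str, offset: int) -> Dict[str, int]:
    """Convert a 0-indexed character offset into a 1-indexed line and column."""
    line = source.count("\n", 0, offset) + 1
    last_newline = source.rfind("\n", 0, offset)  # -1 when the offset is on the first line
    column = offset - last_newline
    return {"line": line, "column": column, "offset": offset}


def to_position(source: str, offset_pair: Tuple[int, int]) -> Dict[str, Dict[str, int]]:
    """Build a start/end position structure from an (start, end) offset pair."""
    start, end = offset_pair
    return {"start": point_at(source, start), "end": point_at(source, end)}


text = "ab\ncde\nf"
print(to_position(text, (0, 6)))
# {'start': {'line': 1, 'column': 1, 'offset': 0}, 'end': {'line': 2, 'column': 4, 'offset': 6}}
```

Here the span covers "ab\ncde"; the end lands on line 2 at column 4, one column past the final "e", which mirrors the column 61 reported for a 60-character line in the usage example.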
File details
Details for the file strings_to_positions-1.1.1.tar.gz.
File metadata
- Download URL: strings_to_positions-1.1.1.tar.gz
- Upload date:
- Size: 13.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bb811aac06614d7aee263b7f2d93353f9fb3bfcc8c79772bc8141eb6bcfdd0ab |
| MD5 | 4df0a2075cf3a36bf7675cac479d9e38 |
| BLAKE2b-256 | 8ccae30107b3d3a4e7a796c0951b884cf383033ab00cf88d6da2c5b099399a66 |
Provenance
The following attestation bundles were made for strings_to_positions-1.1.1.tar.gz:
Publisher: publish-to-pypi.yml on howlowck/strings-to-positions
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: strings_to_positions-1.1.1.tar.gz
  - Subject digest: bb811aac06614d7aee263b7f2d93353f9fb3bfcc8c79772bc8141eb6bcfdd0ab
- Sigstore transparency entry: 178341293
- Sigstore integration time:
- Permalink: howlowck/strings-to-positions@c4ba1eb5ac7068ebabdaace83f90c0b6c5b44a00
- Branch / Tag: refs/tags/1.1.1
- Owner: https://github.com/howlowck
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-to-pypi.yml@c4ba1eb5ac7068ebabdaace83f90c0b6c5b44a00
- Trigger Event: push
File details
Details for the file strings_to_positions-1.1.1-py3-none-any.whl.
File metadata
- Download URL: strings_to_positions-1.1.1-py3-none-any.whl
- Upload date:
- Size: 4.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a42444cb85c21e3e1da108fa660a3f85462e5915396beca75c83626674343fe0 |
| MD5 | 57aa4d4bcdba6eaee326acb10eeedf74 |
| BLAKE2b-256 | 27c12e6f801117a6c807cc6674a462d4a92cfc9b3c49ac8976a1396650fb4cb7 |
Provenance
The following attestation bundles were made for strings_to_positions-1.1.1-py3-none-any.whl:
Publisher: publish-to-pypi.yml on howlowck/strings-to-positions
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: strings_to_positions-1.1.1-py3-none-any.whl
  - Subject digest: a42444cb85c21e3e1da108fa660a3f85462e5915396beca75c83626674343fe0
- Sigstore transparency entry: 178341294
- Sigstore integration time:
- Permalink: howlowck/strings-to-positions@c4ba1eb5ac7068ebabdaace83f90c0b6c5b44a00
- Branch / Tag: refs/tags/1.1.1
- Owner: https://github.com/howlowck
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-to-pypi.yml@c4ba1eb5ac7068ebabdaace83f90c0b6c5b44a00
- Trigger Event: push