A Python package to convert strings to positions in a source text

Strings to Positions

This library takes a source text and a list of strings (or chunks) that occur in that text, and returns either a list of offsets or a list of positions with the same length as the input list.

How to Use

Install

pip install strings-to-positions

Usage Example

from strings_to_positions import to_offsets, to_strings, offset_to_position

# The Source Document
source_document = """# Introduction to Markdown

Markdown is a _lightweight **markup language** with plain-text_ formatting syntax. Its design allows it to be converted to many output formats, but the original tool by the same name only supports HTML.
Its design allows it to be converted to many output formats, but the original tool by the same name only supports HTML.

## Common Syntax
One common syntax is the one used by GitHub, which is a superset of the original Markdown syntax. It includes features such as tables, strikethrough, and task lists.
"""

# A list of strings, typically the output of a text splitter function
chunks = ["""# Introduction to Markdown

Markdown is a _lightweight **markup language** with plain-text_ formatting syntax. Its design allows it to be converted to many output formats, but the original tool by the same name only supports HTML.
Its design allows it to be converted to many output formats,""",
"""but the original tool by the same name only supports HTML.

## Common Syntax
One common syntax is the one used by GitHub, which is a superset of the original Markdown syntax. It includes features such as tables, strikethrough, and task lists.
"""
]

# Running to_offsets returns the start and end offsets of each chunk
offsets = to_offsets(source_document, chunks)
# offsets = [(0, 291), (292, 535)]

# Optionally, pass a third argument to configure the search.
# See "Input Options" below for details.
# option = {
#   "case_sensitive": True,
#   "allow_overlap": True,
#   "overlap_size": None
# }
# offsets = to_offsets(source_document, chunks, option)


# Calling offset_to_position in a loop gives you the list of positions
positions_list = []
for offset in offsets:
    # Keep a None placeholder so the output stays aligned with the input list
    if offset is None:
        positions_list.append(None)
        continue
    position = offset_to_position(source_document, offset)
    # position = {
    #    "start": {"line": 1, "column": 1, "offset": 0},
    #    "end": {"line": 4, "column": 61, "offset": 291},
    # }
    positions_list.append(position)

# ... later, after the offsets have been changed...
new_offsets = [(0, 293), (294, 535)]

# to_strings returns a new list of strings that correspond to new_offsets
new_chunks = to_strings(source_document, new_offsets)

Input Options

case_sensitive (True | False)

If True, the search is case-sensitive.

allow_overlap (True | False)

If True, the search for each chunk also covers the entire previous chunk, so consecutive chunks may overlap. If overlap_size is set, the search reaches back into the previous chunk only by that size.

overlap_size (int | None)

Ignored if allow_overlap is False.
Sets the maximum overlap size.
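
To make the three options concrete, here is a minimal sketch of one way a sequential search could honor them. It is not the library's implementation: `naive_to_offsets` is a hypothetical function, and the reading of `allow_overlap` and `overlap_size` (how far back into the previous chunk the search may start) is an assumption based on the descriptions above.

```python
from typing import List, Optional, Tuple

Offset = Tuple[int, int]

def naive_to_offsets(
    source: str,
    chunks: List[str],
    case_sensitive: bool = True,
    allow_overlap: bool = True,
    overlap_size: Optional[int] = None,
) -> List[Optional[Offset]]:
    """Hypothetical sketch: locate each chunk in order, left to right."""
    # Assumption: lowercasing keeps string lengths unchanged (true for ASCII).
    haystack = source if case_sensitive else source.lower()
    offsets: List[Optional[Offset]] = []
    prev_start, prev_end = 0, 0
    for chunk in chunks:
        needle = chunk if case_sensitive else chunk.lower()
        if not allow_overlap:
            # Search only after the previous match.
            search_from = prev_end
        elif overlap_size is None:
            # The whole previous chunk is part of the searched text.
            search_from = prev_start
        else:
            # Reach back at most overlap_size characters.
            search_from = max(prev_start, prev_end - overlap_size)
        start = haystack.find(needle, search_from)
        if start == -1:
            # Keep the output the same length as the input.
            offsets.append(None)
            continue
        end = start + len(needle)
        offsets.append((start, end))
        prev_start, prev_end = start, end
    return offsets

text = "The quick brown fox"
parts = ["The quick", "quick brown", "FOX"]
print(naive_to_offsets(text, parts, case_sensitive=False))
# [(0, 9), (4, 15), (16, 19)]
print(naive_to_offsets(text, parts, allow_overlap=False))
# [(0, 9), None, None]
```

In the second call, the overlapping chunk is not found because the search starts after the previous match, and "FOX" is not found because the search is case-sensitive.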

Output Data Structure

Offset

Tuple[startOffset, endOffset]
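
The usage example above is consistent with reading an offset as a half-open range: the first chunk is 291 characters long and its offset is (0, 291). Under that assumption, a plain Python slice recovers the text an offset covers. The sketch below uses only built-in string operations; `slice_by_offset` is a hypothetical helper, not part of the library.

```python
from typing import Optional, Tuple

Offset = Tuple[int, int]

def slice_by_offset(source: str, offset: Optional[Offset]) -> Optional[str]:
    """Return the text covered by (start, end), assuming end is exclusive."""
    if offset is None:
        # A chunk that was not found has no text to return.
        return None
    start, end = offset
    return source[start:end]

sample = "alpha beta gamma"
print(slice_by_offset(sample, (0, 5)))   # alpha
print(slice_by_offset(sample, (6, 10)))  # beta
```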

Position

{
  start: {
    line: "1-index int",
    column: "1-index int",
    offset: int
  },
  end: {
    line: "1-index int",
    column: "1-index int",
    offset: int
  }
}
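
As an illustration of how the line and column fields relate to an offset, the following sketch derives a position by counting newlines. It assumes lines are separated by "\n" and that columns count characters; `point_at` and `position_for` are hypothetical names, not the library's functions.

```python
from typing import Dict, Tuple

def point_at(source: str, offset: int) -> Dict[str, int]:
    """Convert a 0-indexed offset into a 1-indexed line and column."""
    # Each newline before the offset moves the point down one line.
    line = source.count("\n", 0, offset) + 1
    # Index of the first character on that line (0 if no newline precedes it).
    line_start = source.rfind("\n", 0, offset) + 1
    column = offset - line_start + 1
    return {"line": line, "column": column, "offset": offset}

def position_for(source: str, offset: Tuple[int, int]) -> Dict[str, Dict[str, int]]:
    """Build a start/end structure shaped like the Position above."""
    start, end = offset
    return {"start": point_at(source, start), "end": point_at(source, end)}

text = "ab\ncde\nf"
print(position_for(text, (0, 6)))
# {'start': {'line': 1, 'column': 1, 'offset': 0}, 'end': {'line': 2, 'column': 4, 'offset': 6}}
```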
