textwrapre

A Python package that splits text into smaller blocks at positions matched by a regular expression.

It is useful when a large text must be broken into pieces that do not exceed a given size limit, and the breaks may only occur where a given regular expression matches.

pip install textwrapre

Usage

from textwrapre import wrapre
import regex

text = r"""
Python was created in the early 1990s by Guido van Rossum at Stichting Mathematisch 
Centrum (CWI, see https://www.cwi.nl/) in the Netherlands as a successor of a 
language called ABC. Guido remains Python’s principal author, although it includes
many contributions from others.

In 1995, Guido continued his work on Python at the Corporation for 
National Research Initiatives 
(CNRI, see https://www.cnri.reston.va.us/) in Reston, 
Virginia where he released several versions of the software.

""".strip()

# Split the text using the default regex separator
splitted = wrapre(text, blocksize=150, raisewhenlonger=True, removenewlines_from_result=True)

for s in splitted:
    print(len(s), s)

# Split the text using a custom regex separator
splitted = wrapre(text, blocksize=50, regexsep=r"[\r\n\s]+", raisewhenlonger=True, removenewlines_from_result=False, flags=regex.I)

for s in splitted:
    print(len(s), s)

# Split the text and raise an exception when the blocks are bigger than the limit
splitted = wrapre(text, blocksize=20, regexsep=r"[\r\n\s]+", raisewhenlonger=True, flags=regex.I)







85 Python was created in the early 1990s by Guido van Rossum at Stichting Mathematisch  
79 Centrum (CWI, see https://www.cwi.nl/) in the Netherlands as a successor of a  
115 language called ABC. Guido remains Pythons principal author, although it includes many contributions from others. 
99 In 1995, Guido continued his work on Python at the Corporation for  National Research Initiatives  
115 (CNRI, see https://www.cnri.reston.va.us/) in Reston,  Virginia where he released several versions of the software.
---------------------
47 Python was created in the early 1990s by Guido 
46 van Rossum at Stichting Mathematisch 
Centrum 
38 (CWI, see https://www.cwi.nl/) in the 
49 Netherlands as a successor of a 
language called 
46 ABC. Guido remains Pythons principal author, 
45 although it includes
many contributions from 
46 others.
In 1995, Guido continued his work on 
49 Python at the Corporation for 
National Research 
24 Initiatives 
(CNRI, see 
44 https://www.cnri.reston.va.us/) in Reston, 
47 Virginia where he released several versions of 
13 the software.
---------------------
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2022.3.3\plugins\python-ce\helpers\pydev\pydevconsole.py", line 364, in runcode
    coro = func()
  File "<input>", line 41, in <module>
  File "C:\ProgramData\anaconda3\envs\adda\textwrapre.py", line 73, in wrapre
    raise ValueError(
ValueError: Some blocks are bigger than the limit! Try again with another separator or a bigger limit!
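
The traceback above shows wrapre raising ValueError when blocksize=20 is too small for the chosen separator. One way to cope, sketched here under assumptions, is to retry with progressively finer separators. The helper name wrap_with_fallback and the separator list are hypothetical and not part of textwrapre; the splitter is passed in as an argument and is assumed to accept the blocksize, regexsep and raisewhenlonger keywords used in the calls above.

```python
from typing import Callable, List, Sequence


def wrap_with_fallback(
    splitter: Callable[..., List[str]],
    text: str,
    blocksize: int,
    separators: Sequence[str],
) -> List[str]:
    """Hypothetical helper: try each separator in order, return the first result that fits."""
    last_error = None
    for sep in separators:
        try:
            # Assumes the splitter raises ValueError when a block exceeds the limit,
            # as wrapre does in the traceback above.
            return splitter(text, blocksize=blocksize, regexsep=sep, raisewhenlonger=True)
        except ValueError as exc:
            last_error = exc
    raise ValueError(
        f"no separator in {list(separators)!r} produced blocks within {blocksize}"
    ) from last_error
```

With the package installed, a call might look like wrap_with_fallback(wrapre, text, 50, [r"[\r\n]+", r"\s+"]), which tries line breaks first and falls back to any whitespace.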

Parameters

def wrapre(
    text: Union[str, bytes],
    blocksize: int,
    regexsep: str = r"[\r\n]",
    raisewhenlonger: bool = True,
    removenewlines_from_result: bool = False,
    *args,
    **kwargs
) -> List[Union[str, bytes]]:
    """
    Split a text into blocks of a given size using a regular expression.

    :param text: the text to be split
    :param blocksize: the maximum size of each block
    :param regexsep: the regular expression used as separator (default: r"[\r\n]")
    :param raisewhenlonger: whether to raise an exception when a block is bigger than the limit (default: True)
    :param removenewlines_from_result: whether to remove newlines from the result (default: False)
    :param *args: additional positional arguments passed to regex.compile()
    :param **kwargs: additional keyword arguments passed to regex.compile()
    :return: a list of blocks
    """

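To make the parameters above concrete, here is a minimal sketch of one way such a splitter could work: cut the text after every separator match, then greedily pack the pieces into blocks no longer than blocksize. This is an illustration built on the standard-library re module, not the package's actual implementation; the name greedy_wrap is hypothetical, and it handles str input only.

```python
import re
from typing import List


def greedy_wrap(
    text: str,
    blocksize: int,
    regexsep: str = r"[\r\n]",
    raisewhenlonger: bool = True,
) -> List[str]:
    """Hypothetical stand-in for wrapre: pack separator-delimited pieces into blocks."""
    pattern = re.compile(regexsep)

    # Cut the text into pieces, each ending right after a separator match.
    pieces, start = [], 0
    for match in pattern.finditer(text):
        if match.end() > start:
            pieces.append(text[start:match.end()])
            start = match.end()
    if start < len(text):
        pieces.append(text[start:])

    # Greedily append pieces to the current block until the next one would overflow.
    blocks, current = [], ""
    for piece in pieces:
        if len(piece) > blocksize and raisewhenlonger:
            raise ValueError("a piece between separators exceeds blocksize")
        if current and len(current) + len(piece) > blocksize:
            blocks.append(current)
            current = piece
        else:
            current += piece
    if current:
        blocks.append(current)
    return blocks
```

In this sketch the separators stay attached to the end of each piece, so joining the returned blocks reproduces the input text. With raisewhenlonger=False, an oversized piece is kept as a block of its own instead of raising.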
Version 0.10
