Python package with CText C++ extension

CText

Advanced text processing library for C++ & Python

About

CText is a Modern C++ library that offers a wide range of text processing routines. It addresses many complex tasks that can be time-consuming in both C++ and Python. While features like line and word manipulation are readily available in higher-level languages such as C#, Java, and Python, they are often lacking in C++. CText fills this gap by providing those missing capabilities while preserving the low-level control that C++ offers. In addition to essential functions, it includes numerous optimized routines for efficient text handling. The library is highly flexible and scalable, making it easy to extend with custom processing routines. It’s well-suited for tackling preprocessing challenges in NLP and machine learning tasks, or simply for honing your Modern C++ skills.

Main Features

  • Modern C++ Template library: Simple to use, just include a single header file.
  • Unicode Support: Seamlessly handle both UNICODE and ANSI in the same project.
  • Extensive Text Processing Features: Includes hundreds of optimized methods for both standard and advanced operations, with many more planned.
  • Clean and Readable Codebase: Designed to help you build complex text-processing applications quickly, abstracting away low-level details and optimizations.
  • Cross-Platform Compatibility: Tested with Visual Studio and GCC 7.4, easily portable to other environments.
  • No External Dependencies: CText does not depend on any other libraries; the only requirements are C++11 and the STL.
  • Easily Extensible: Text routines are designed to be scalable and adaptable across character types and platforms.
  • Python Integration: Available from PyPI as the ctextlib package; prebuilt wheels are published for CPython 3.6 through 3.12 on 64-bit Windows, alongside a source distribution.

Have questions or suggestions? Feel free to reach out: email.

Python

To install CText:

pip install ctextlib

To test if CText is installed:

import ctextlib
a = ctextlib.Text("Hello World")
print(a)

Or:

from ctextlib import Text as text
a = text("Hello World")
print(a)

Python methods reference:

addToFileName

a = text("C:\\Temp\\Temp2\\File.bmp")
a.addToFileName("_mask")
print(a)
C:\Temp\Temp2\File_mask.bmp

append

a = text("Hello ")
a.append("World")
Hello World
a = text("123")
a.append('4',4)
1234444
a = text("")
a.append(['Hello', ' ', 'World'])
Hello World

appendRange

a = text()
a.appendRange('a','z').appendRange('0','9')
abcdefghijklmnopqrstuvwxyz0123456789

between

a = text('The quick brown fox jumps over the lazy dog')
a.between('q','d')
print(a)
uick brown fox jumps over the lazy
a = text('The quick brown fox jumps over the lazy dog')
a.between('quick','lazy')
print(a)
 brown fox jumps over the

contain

a = text('The quick brown fox jumps over the lazy dog')
if a.contain('quick') :
    print("contain 'quick'")
contain 'quick'

Case-insensitive

a = text('The quick brown fox jumps over the lazy dog')
if a.contain('Quick', False) :
    print("contain 'quick'")
contain 'quick'
a = text('The quick brown fox jumps over the lazy dog')
if a.contain(['slow','fast','quick']):
    print("contain 'quick'")
contain 'quick'

containAny

a = text('Hello World')
a.containAny('abcd')
True

containOnly

a = text('4365767')
a.containOnly('0123456789')
True

count

a = text('The quick brown fox jumps over the lazy dog')
a.count('the', False)
2

countWordFrequencies

from ctextlib import Text as text
a = text("The quick brown fox jumps over the lazy dog")
a.countWordFrequencies(False)
[(2, 'the'), (1, 'brown'), (1, 'dog'), (1, 'fox'), (1, 'jumps'), (1, 'lazy'), (1, 'over'), (1, 'quick')]
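
For comparison, the shape of this result can be reproduced in plain Python with the standard library. This is only a sketch of what the output above looks like, not CText's implementation: the helper name word_frequencies is hypothetical, and reading the False argument as "case-insensitive" is an assumption based on the example.

```python
from collections import Counter

def word_frequencies(s, case_sensitive=True):
    # Hypothetical helper: split on whitespace and count each word.
    words = s.split() if case_sensitive else s.lower().split()
    counts = Counter(words)
    # Most frequent first; ties are ordered alphabetically, as in the output above.
    return sorted(((n, w) for w, n in counts.items()), key=lambda p: (-p[0], p[1]))

print(word_frequencies("The quick brown fox jumps over the lazy dog", False))
```

The (count, word) tuple order mirrors the list printed by countWordFrequencies above.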

cutAfterFirst

a = text('The quick brown fox jumps over the lazy dog')
a.cutAfterFirst('o')
The quick br

cutAfterLast

a = text('The quick brown fox jumps over the lazy dog')
a.cutAfterLast('o')
The quick brown fox jumps over the lazy d

cutBeforeFirst

a = text('The quick brown fox jumps over the lazy dog')
a.cutBeforeFirst('o')
own fox jumps over the lazy dog

cutEnds

a = text('The quick brown fox jumps over the lazy dog')
a.cutEnds(4)
quick brown fox jumps over the lazy

cutLeft

s = text("Hello World")
s.cutLeft(6)
World

cutRight

s = text("Hello World")
s.cutRight(6)
Hello

enclose

a = text("Hello World")
a.enclose('<','>')
<Hello World>
a = text("Hello World")
a.enclose('"')
"Hello World"

endsWith

a = text("Hello World")
if a.endsWith('World'):
    print("ends with 'World'")
ends with 'World'

With case-insensitive search:

a = text("Hello World")
if a.endsWith('world', False):
    print("ends with 'world'")
ends with 'world'

endsWithAny

a = text('The quick brown fox jumps over the lazy dog')
if a.endsWithAny(['cat','dog']):
    print('ends with an animal...')
ends with an animal...

erase

a = text('The quick brown fox jumps over the lazy dog')
a.erase(8, 10)
print(a)
The quicx jumps over the lazy dog

equal

a = text()
a.equal('A',10)
AAAAAAAAAA

find

a = text('The quick brown fox jumps over the lazy dog')
a.find('brown')
'brown fox jumps over the lazy dog'

With case-insensitive search:

a = text('The quick brown fox jumps over the lazy dog')
a.find('Brown', False)
'brown fox jumps over the lazy dog'

fromArray

a = text()
a.fromArray([1,2,3,4])
print(a)
1 2 3 4
a = text()
a.fromArray([1,2,3,4], '|')
print(a)
1|2|3|4
a = text()
a.fromArray([1,2,3,4], '')
print(a)
1234

Array of floats

a = text()
a.fromArray([1.1,2.2,3.3,4.4])
print(a)
1.1 2.2 3.3 4.4

Array of strings

a = text()
a.fromArray(['hello','world'])
print(a)
hello world
import numpy as np
a = text()
a.fromArray(np.array(["hello","world"]))
print(a)
hello world

fromArrayAsHex

a = text()
a.fromArrayAsHex([10,20,30,40])
print(a)
0A 14 1E 28

Use without separator

a.fromArrayAsHex([10,20,30,40],2,'')
print(a)
0A141E28
a = text()
a.fromArrayAsHex([1000,2000,3000,4000])
print(a)
3E8 7D0 BB8 FA0
a = text()
a.fromArrayAsHex([1000,2000,3000,4000], 4, ',')
print(a)
03E8,07D0,0BB8,0FA0
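
The formatting shown in these examples can be cross-checked with Python's built-in format. The sketch below is not how CText does it; the helper name array_as_hex is hypothetical, and the default width of 2 and the space separator are assumptions read off the outputs above.

```python
def array_as_hex(values, width=2, sep=" "):
    # Hypothetical helper: upper-case hex, zero-padded to at least `width` digits.
    return sep.join(format(v, "0{}X".format(width)) for v in values)

print(array_as_hex([10, 20, 30, 40]))                  # 0A 14 1E 28
print(array_as_hex([10, 20, 30, 40], 2, ""))           # 0A141E28
print(array_as_hex([1000, 2000, 3000, 4000]))          # 3E8 7D0 BB8 FA0
print(array_as_hex([1000, 2000, 3000, 4000], 4, ","))  # 03E8,07D0,0BB8,0FA0
```

Note that the width is a minimum: values that need more digits, such as 1000 (3E8), are not truncated.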

fromBinary

a = text()
a.fromBinary(12345)
print(a)
00000000000000000011000000111001
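
As a quick sanity check of the output above, the same digits can be produced with plain Python. The helper name int_to_binary is hypothetical, and the fixed width of 32 digits is an assumption taken from the printed result.

```python
def int_to_binary(value, width=32):
    # Hypothetical helper; the 32-digit width is an assumption read off the output above.
    return format(value, "0{}b".format(width))

bits = int_to_binary(12345)
print(bits)          # 00000000000000000011000000111001
print(int(bits, 2))  # parse the digits back into an integer: 12345
```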

fromDouble

a = text()
a.fromDouble(3.333338478)
print(a)
a.fromDouble(3.33989, 4)
print(a)
a.fromDouble(3.333338478, 10)
print(a)
3.333338
3.3399
3.3333384780

fromHex

a = text()
a.fromHex(1234567)
a.fromHex('a')
a.fromHex("48 65 6C 6C 6F 20 57 6F 72 6C 64")
0012D687
61
Hello World

fromInteger

a = text()
a.fromInteger(358764)
print(a)
358764

fromMatrix

from ctextlib import Text as text
import numpy as np
x = np.array([[10, 20, 30], [40, 50, 60]])
a = text()
a.fromMatrix(x)
print(a)
10 20 30
40 50 60
from ctextlib import Text as text
import numpy as np
x = np.array([[10, 20, 30], [40, 50, 60]])
a = text()
a.fromMatrix(x, ',')
10,20,30
40,50,60
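
The text layout produced in the two examples above (one line per row, values joined by a separator) can be sketched without NumPy or CText. The helper name matrix_to_text is hypothetical and only illustrates the expected output format.

```python
def matrix_to_text(rows, sep=" "):
    # Hypothetical helper: one text line per matrix row, values joined by `sep`.
    return "\n".join(sep.join(str(v) for v in row) for row in rows)

x = [[10, 20, 30], [40, 50, 60]]
print(matrix_to_text(x))
print(matrix_to_text(x, ","))
```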

fromMatrixAsHex

from ctextlib import Text as text
import numpy as np
x = np.array([[10, 20, 30], [40, 50, 60]])
a = text()
a.fromMatrixAsHex(x)
print(a)
0A 14 1E
28 32 3C
from ctextlib import Text as text
import numpy as np
x = np.array([[1000, 2000, 3000], [4000, 5000, 6000]])
a = text()
a.fromMatrixAsHex(x,4)
print(a)
03E8 07D0 0BB8
0FA0 1388 1770

getDir

a = text("D:\\Folder\\SubFolder\\TEXT\\file.dat")
a.getDir()
D:\Folder\SubFolder\TEXT\

getExtension

a = text("D:\\Folder\\SubFolder\\TEXT\\file.dat")
a.getExtension()
'.dat'

getFileName

a = text("D:\\Folder\\SubFolder\\TEXT\\file.dat")
a.getFileName()
'file.dat'

hash

s.hash()
9257130453210036571

indexOf

a = text("The quick brown fox jumps over the lazy dog.")
a.indexOf("brown")
10

indexOfAny

a = text("The quick brown fox jumps over the lazy dog.")
a.indexOfAny(["fox", "dog"])
16

indexOfAny

a = text("The quick brown fox jumps over the lazy dog.")
a.indexOfAny("abc")
7

insert

a = text("abc")
a.insert(1,'d',2)
addbc
a = text("The quick jumps over the lazy dog.")
a.insert(10,"fox ")
The quick fox jumps over the lazy dog.

insertAtBegin
insertAtEnd

a = text("Hello")
a.insertAtBegin("<begin>")
a.insertAtEnd("</begin>")
<begin>Hello</begin>

isAlpha

a = text("Abcd")
a.isAlpha()
True

isBinary

a = text("01111011100001")
a.isBinary()
True

isEmpty

a = text()
a.isEmpty()
True

isHexNumber

a = text("12AB56FE")
a.isHexNumber()
True

isNumber

a = text("123456")
a.isNumber()
True

isLower

a = text("hello world")
a.isLower()
True

isUpper

a = text("HELLO WORLD")
a.isUpper()
True

isPalindrome

a = text("racecar")
a.isPalindrome()
True

keep

s = text("Hello World").keep(3,5)
lo Wo

keepLeft

a = text("The quick jumps over the lazy dog.")
a.keepLeft(10)
The quick

keepRight

a = text("The quick jumps over the lazy dog.")
a.keepRight(10)
 lazy dog.

lastIndexOf

s = text("Hello World")
s.lastIndexOf('l')
9

lines

a = text("L1\nL2\n\nL3\nL4\n  \n\nL5")
a.lines()
['L1', 'L2', 'L3', 'L4', 'L5']

linesCount

a = text("L1\nL2\n\nL3\nL4\n  \n\nL5")
a.linesCount()
7

linesRemoveEmpty

a = text("L1\nL2\n\nL3\nL4\n  \n\nL5")
a.linesRemoveEmpty()
print(a)
L1
L2
L3
L4
L5

Several per-line methods are also available:
linesAppend
linesInsertAtBegin
linesSort
linesPaddRight
linesTrim

Example of opening a text file, sorting all lines, and saving the result under another name:

from ctextlib import Text as text
s = text()
s.readFile('Unordered.txt')
s.linesSort()
s.writeFile('Sorted_python.txt')

limit

s = text("Hello World")
s.limit(6)
Hello

lower

s = text("Hello World")
s.lower()
hello world

makeUnique

a = text()
a.appendRange('a','z').appendRange('a','z')
abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz
a.makeUnique()
print(a)
abcdefghijklmnopqrstuvwxyz

mid

a = text("Hello World").mid(3,5)
lo Wo

nextLine

# Example of iterating all lines
from ctextlib import Text as text
a = text("Line1\nLine2\nLine3")
line = text()
pos = 0
while(pos >= 0):
    pos = a.nextLine(pos,line)
    print(line)
Line1
Line2
Line3

nextWord

# Example of iterating all words
from ctextlib import Text as text
a = text('The quick brown fox jumps over the lazy dog')
word = text()
pos = 0
while(pos >= 0):
    pos = a.nextWord(pos,word)
    print(word)
The
quick
brown
fox
jumps
over
the
lazy
dog

paddLeft

s = text("Abra")
s.paddLeft('.', 16)
............Abra

paddRight

s = text("Abra")
s.paddRight('.', 16)
Abra............

pathCombine

a = text("C:\\Temp")
a.pathCombine("..\\Folder")
C:\Folder

quote

a = text("Hello")
a.quote()
"Hello"

random

a = text()
a.random()
"P1kAlMiG2Kb7FzP5"
a.sort()
"1257AFGKMPPbiklz"
a.shuffle()
"k2lF7KAPG5M1Pzbi"
a.random(32)
P1kAlMiG2Kb7FzP5tM1QBI6DSS92c31A

randomAlpha

s = text()
s.randomAlpha()
IkEffmzNiMKKASVW

randomNumber

s = text()
s.randomNumber()
3892795431
s.randomNumber(32)
33341138742779319865028602486509

readFile

# demonstrates how to read a whole text file
from ctextlib import Text as text
a = text()
a.readFile('test.txt')
print(a)
Hello World

regexMatch

s = text("+336587890078")
if(s.regexMatch("(\\+|-)?[[:digit:]]+")):
    print("it is a number")
it is a number
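
The pattern above uses the POSIX character class [[:digit:]], which Python's own re module does not support. For readers who want to compare against the standard library, here is a rough equivalent. It assumes regexMatch requires the whole string to match (an assumption, not something the example confirms), and the helper name is_signed_integer is hypothetical.

```python
import re

def is_signed_integer(s):
    # Whole-string match: optional sign followed by one or more ASCII digits.
    return re.fullmatch(r"[+-]?[0-9]+", s) is not None

if is_signed_integer("+336587890078"):
    print("it is a number")
```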

regexLines

animals.txt
------------
Cat
Dog
Giraffe
Lion
Llama
Monkey
Mouse
Parrot
Poodle
Scorpion
Snake
Weasel
# collect all lines starting with given characters
from ctextlib import Text as text
a = text()
a.readFile("animals.txt")
a.regexLines("^[A-G][a-z]+")
['Cat', 'Dog', 'Giraffe']

regexReplace

from ctextlib import Text as text
a = text("there is sub-sequence in the sub-way string")
a.regexReplace("\\b(sub)([^ ]*)", "sub-$2")
there is sub--sequence in the sub--way string

regexSearch

# collect all words using regex
from ctextlib import Text as text
a = text("The quick brown fox jumps over the lazy dog")
a.regexSearch("\\w+")
['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']

regexWords

# collect all words starting with given characters
from ctextlib import Text as text
a = text("The quick brown fox jumps over the lazy dog")
a.regexWords("^[a-n][a-z]+")
['brown', 'fox', 'jumps', 'lazy', 'dog']   

remove

a = text('we few, we happy few, we band of brothers.')
a.remove('we')
a.reduceChain()
a.trim()
few, happy few, band of brothers.

removeAny

from ctextlib import Text as text
a = text('The quick brown fox jumps over the lazy dog')
a.removeAny(['brown','quick','lazy'])
a.reduceChain()
The fox jumps over the dog

removeExtension

a = text("D:\\Folder\\SubFolder\\TEXT\\File.dat")
a.removeExtension()
D:\Folder\SubFolder\TEXT\File

removeFileName

a = text("D:\\Folder\\SubFolder\\TEXT\\File.dat")
a.removeFileName()
D:\Folder\SubFolder\TEXT\

removeWhileBegins

a = text("Some text ending with something")
a.removeWhileBegins("Some text ")
print(a)
ending with something

removeWhileEnds

a = text("Some text ending with something")
a.removeWhileEnds(" something")
print(a)
Some text ending with

replace

a = text("The quick brown fox jumps over the lazy dog")
a.replace("fox", "cat")
print(a)
The quick brown cat jumps over the lazy dog
a = text("The quick brown fox jumps over the lazy dog")
a.replace(["fox", "cat","dog","quick"], "-")
The ----- brown --- jumps over the lazy ---

replaceAny

a = text("The quick brown fox jumps over the lazy dog")
a.replaceAny(["fox", "cat","dog"], "***")
print(a)
The quick brown *** jumps over the lazy ***
a = text("The quick brown fox jumps over the lazy dog")
a.replaceAny(["fox", "dog"], ["dog", "fox"])
The quick brown dog jumps over the lazy fox

reverse

a = text("Hello")
a.reverse()
olleH

right

a = text("Hello World")
a.right(5)
World

rotate

a = text("Hello World")
a.rotateLeft(2)
a.rotateRight(4)

Output

llo WorldHe
ldHello Wor

split

# by default split uses the standard separators (" \t\r\n")
a = text("The quick brown fox jumps over the lazy dog")
a.split()
['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
# split can be used with any list of separator characters
a = text("The quick, brown....fox,,, ,jumps over,the  lazy.dog")
a.split(",. ")
['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
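
The behaviour shown above, where every character in the separator string acts as a delimiter and empty tokens are dropped, can be sketched with the standard re module. This is an illustration under that reading of the examples, not CText's implementation; the helper name split_any is hypothetical.

```python
import re

def split_any(s, separators=" \t\r\n"):
    # Build a character class from the separators; runs of separators count as one break.
    pattern = "[" + re.escape(separators) + "]+"
    # Drop empty tokens that appear when the text starts or ends with a separator.
    return [tok for tok in re.split(pattern, s) if tok]

print(split_any("The quick brown fox jumps over the lazy dog"))
print(split_any("The quick, brown....fox,,, ,jumps over,the  lazy.dog", ",. "))
```

Both calls print the same nine-word list as the CText examples.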

toBinary

bOk = False
a = text("100001")
a.toBinary(bOk)
33

toHex

a = text("Hello World")
a.toHex()
print(a)
48 65 6C 6C 6F 20 57 6F 72 6C 64

Using separator character.

a = text("Hello World")
a.toHex(',')
print(a)
48,65,6C,6C,6F,20,57,6F,72,6C,64

toHex

bOk = False
a = text("1E1E")
a.toHex(bOk)
7710

trim

a = text(" \t\n   lazy dog  \t\n   ")
a.trim()
lazy dog
a = text("000000000000101")
a.trimLeft("0")
101
a = text("101000000000000")
a.trimRight('0')
101
a = text("0000000101000000000")
a.trim("0")
101

upper

s = text("Hello World")
s.upper()
HELLO WORLD

words

a = text("The quick brown fox jumps over the lazy dog")
a.words()
['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
a = text("The|quick|brown|fox|jumps|over|the|lazy|dog")
a.words('|')
['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']

wordsCapitalize

a = text("The quick brown fox jumps over the lazy dog")
a.wordsCapitalize()
The Quick Brown Fox Jumps Over The Lazy Dog

wordsCount

a = text('The quick brown fox jumps over the lazy dog')
a.wordsCount()
9

wordsEnclose

a = text("The quick brown fox jumps over the lazy dog")
a.wordsEnclose('[',']')
[The] [quick] [brown] [fox] [jumps] [over] [the] [lazy] [dog]

wordsReverse

a = text("The quick brown fox jumps over the lazy dog")
a.wordsReverse()
ehT kciuq nworb xof spmuj revo eht yzal god

wordsSort

a = text('The quick brown fox jumps over the lazy dog')
a.wordsSort()

Output

The brown dog fox jumps lazy over quick the

writeFile

# demonstrates how to write to a text file
from ctextlib import Text as text
a = text("Hello World")
a.writeFile('test.txt')
print(a)

Static methods

ReadFile

import ctextlib as CText
content = CText.ReadFile('test.txt')
print(content)

Output

The quick brown fox jumps over the lazy dog

Or import everything into the global namespace:

from ctextlib import *
content = ReadFile('test.txt')
print(content)

Output

The quick brown fox jumps over the lazy dog

ReadWords

import ctextlib as CText
words = CText.ReadWords('test.txt')
print(words)

Output

['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']

ReadLines

Lines.txt
---------
Line1
Line2
Line3
# demonstrates how to read all lines from a text file
import ctextlib as text
lines = text.ReadLines("Lines.txt")

Output

['Line1', 'Line2', 'Line3']

WriteFile

import ctextlib as CText
CText.WriteFile('test.txt','The quick brown fox jumps over the lazy dog')

Output

test.txt

UNICODE for Python

Python strings are Unicode. When processing text that contains non-ASCII characters, it is recommended to use the Unicode version of CText (TextU), as demonstrated below:

# demonstrate text processing of Swedish unicode text
from ctextlib import TextU as text
s = text('Den snabbbruna räven hoppar över den lata hunden')
s.cutBeforeFirst('ö')
över den lata hunden
# demonstrate text processing of Russian unicode text
from ctextlib import TextU as text
s = text('Быстрая коричневая лиса прыгает на ленивую собаку')
s.cutAfterLast('ы')
Быстрая коричневая лиса пр
# demonstrate text processing of Czech unicode text
from ctextlib import TextU as text
s = text('Rychlá hnědá liška skočí přes líného psa')
s.cutAfterFirst('á', True)
Rychlá
# demonstrate text processing of Greek unicode text
from ctextlib import TextU as text
s = text('Η γρήγορη καφέ αλεπού πηδάει πάνω από το τεμπέλικο σκυλί')
s.cutAfterFirst('έ', True)
Η γρήγορη καφέ
# demonstrate text processing of Armenian unicode text
from ctextlib import TextU as text
s = text('Արագ շագանակագույն աղվեսը ցատկում է ծույլ շան վրա')
s.cutBeforeFirst('է')
է ծույլ շան վրա
# demonstrate text processing of Georgian unicode text
from ctextlib import TextU as text
s = text('სწრაფი ყავისფერი მელა გადაბმულია ზარმაცი ძაღლი')
s.cutBeforeFirst('მ')
მელა გადაბმულია ზარმაცი ძაღლი
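
To see why a Unicode-aware class matters for such text, the plain-Python sketch below compares character positions with byte positions in the UTF-8 encoding. It says nothing about how CText stores text internally; it only illustrates that the two kinds of offsets diverge as soon as a non-ASCII character appears.

```python
s = "räven över"
needle = "ö"

# Position counted in characters (Unicode code points).
char_index = s.index(needle)
# Position counted in bytes of the UTF-8 encoding; 'ä' occupies two bytes.
byte_index = s.encode("utf-8").index(needle.encode("utf-8"))

print(char_index, byte_index)  # prints: 6 7
```

Cutting at a byte offset instead of a character offset could therefore land on the wrong character or split one in half.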

For the full info type help(text).

Build CText Unit Test and Demo projects


To build the UnitTest project and the demos with CMake and Visual Studio:
open a terminal in the \Apps folder and type
cmake .
Alternatively, in VS2017 or later, load \Apps\CMakeLists.txt via File->Open->CMake.. and, once cache generation has completed, choose CMake->Build All


To compile with GCC in Debug or Release:
cmake -D CMAKE_BUILD_TYPE=Release .
cmake -D CMAKE_BUILD_TYPE=Debug .

This will build a console application that runs the Unit Tests.

Also there is a Visual Studio solution (CText.sln) with all projects. Run UnitTests project first to see if all tests pass.


C++ Examples

For more examples of how to use CText, please see the Unit Test project.

Sort all lines in a text file

// this example reads a text file and sorts all lines in alphabetical order.
#include <iostream>
#include "../CTEXT/CText.h"
#include "tchar_utils.h"

int main()
{    
    const char* input_name = "/Unsorted.txt";
    const char* output_name = "/Sorted.txt";

    CText pathIn = getcwd(0, 0);
    CText pathOut = pathIn;
    pathIn += input_name;
    pathOut += output_name;
    
    CText str;
    if(!str.readFile(pathIn.str()))
    {
        std::cerr << "Error, can not open file: " << pathIn << std::endl;
        return 1;
    }
    str.linesSort();
    str.writeFile(pathOut.str(), CText::ENCODING_ASCII);

    return 0;
}

Replace words

    CText s = _T("The quick brown fox jumps over the lazy dog");
    s.replace(_T("brown"), _T("red"));
    cout << s << endl;

Output:

   The quick red fox jumps over the lazy dog 
    CText s = _T("The quick brown fox jumps over the lazy dog");
    const CText::Char* words[] = {_T("quick"), _T("fox"), _T("dog")};
    s.replaceAny(words, 3, _T('-'));
    cout << s << endl;

Output:

   The ----- brown --- jumps over the lazy ---     
    CText s = _T("The quick brown fox jumps over the lazy dog");
    s.replaceAny({_T("fox"), _T("dog")}, {_T("dog"), _T("fox")});
    cout << s << endl;
    CText s = _T("The quick brown Fox jumps over the lazy Dog");
    s.replaceAny({_T("fox"), _T("dog")}, {_T("dog"), _T("fox")}, false);
    cout << s << endl;

Output:

   The quick brown dog jumps over the lazy fox   
   CText s = _T("The quick brown fox jumps over the lazy dog");
   const CText::Char* words[] = {_T("quick"), _T("fox"), _T("dog")};
   s.replaceAny(words, 3, _T("****"));
   cout << s << endl;

Output:

   The **** brown **** jumps over the lazy ****  

Remove words, blocks and characters

   CText s = _T("This is a monkey job!");
   s.remove(_T("monkey"));
   s.reduceChain(' ');
   cout << s << endl;

Output:

   This is a job!
   CText s = _T("Text containing <several> [blocks] separated by {brackets}");
   s.removeBlocks(_T("<[{"), _T(">]}"));
   s.reduceChain(' ');
   s.trim();
   cout << s << endl;

Output:

   Text containing separated by
   s = _T("one and two or three and five");
   s.removeAny({_T("or"), _T("and")});
   s.reduceChain(' ');
   cout << s << endl;

Output:

   one two three five

File paths

CText filepath = _T("D:\\Folder\\SubFolder\\TEXT\\File.dat");
cout << filepath.getExtension() << endl;
cout << filepath.getFileName() << endl;
cout << filepath.getDir() << endl;
filepath.replaceExtension(_T(".bin"));
cout << filepath << endl;
filepath.removeExtension();
cout << filepath << endl;
filepath.replaceExtension(_T(".dat"));
cout << filepath << endl;
filepath.replaceFileName(_T("File2"));
cout << filepath << endl;
filepath.addToFileName(_T("_mask"));
cout << filepath << endl;
filepath.replaceLastFolder(_T("Temp"));
cout << filepath << endl;
filepath.removeAfterSlash();
cout << filepath << endl;

Output

.dat
File.dat
D:\Folder\SubFolder\TEXT\
D:\Folder\SubFolder\TEXT\File.bin
D:\Folder\SubFolder\TEXT\File
D:\Folder\SubFolder\TEXT\File.dat
D:\Folder\SubFolder\TEXT\File2.dat
D:\Folder\SubFolder\TEXT\File2_mask.dat
D:\Folder\SubFolder\Temp\File2_mask.dat
D:\Folder\SubFolder\Temp
CText path1(_T("C:\\Temp"));
CText path2(_T("..\\Folder"));
path1.pathCombine(path2.str());
cout << path1 << endl;

Output

C:\Folder

Split and collection routines

    CText s = _T("The quick  brown fox jumps  over the lazy dog");
    vector<CText> words;
    if(s.split(words) < 9)
        cout << "Error!" << endl ;
    for(auto& s : words)
        cout << s << endl;
   CText s = _T("The,quick,brown,fox,jumps,over,the,lazy,dog");
   vector<std::string> words;
   if(s.split(words,false,_T(",")) != 9)
      cout << "Error!" << endl ;
   for(auto& s : words)
      cout << s << endl;

Output:

The
quick
brown
fox
jumps
over
the
lazy
dog
    CText s = "Line 1\r\nLine 2\n\nLine 3\n";
    vector<std::string> lines;
    s.collectLines(lines);
    for(auto& s : lines)
      cout << s << endl;

Output:

Line 1
Line 2
Line 3

Read sentences from text file

#include <iostream>
#include "../CTEXT/CText.h"
#include "tchar_utils.h"

int main()
{    
    const char* input_name = "/Columbus.txt";
    const char* output_name = "/Columbus_Sentences.txt";

    CText pathIn = getcwd(0, 0);
    CText pathOut = pathIn;
    pathIn += input_name;
    pathOut += output_name;
    
    CText str;
    if(!str.readFile(pathIn.str()))
    {
        std::cerr << "Error, can not open file: " << pathIn << std::endl;
        return 1;
    }
    std::vector<CText> sentences;

    str.collectSentences(sentences);

    str.fromArray(sentences, _T("\n\n") );

    str.writeFile(pathOut.str(), CText::ENCODING_UTF8);

    return 0;
}

Count characters and words

CText s = _T("12345678909678543213");
map<CText::Char, int> freq;
s.countChars(freq);
CText s = _T("Nory was a Catholic because her mother was a Catholic, and Nory’s mother was a Catholic because her father was a Catholic, and her father was a Catholic because his mother was a Catholic, or had been.");
std::multimap<int, CText, std::greater<int> > freq;
s.countWordFrequencies(freq);
s.fromMap(freq);
cout << s;

Output:

Catholic 6
a 6
was 6
because 3
her 3
mother 3
and 2
father 2
Nory 1
Nory's 1
been 1
had 1
his 1
or 1

Conversion routines

CText s = _T("1 2 3 4 5 6 7 8 9");
vector<int> v;
s.toArray<int>(v);

Output:

{1,2,3,4,5,6,7,8,9}
CText s = _T("1,2,3,4,5,6,7,8,9");
vector<int> v;
s.toArray<int>(v, _T(','));

Output:

{1,2,3,4,5,6,7,8,9}
CText s = _T("1.1,2.2,3.3,4.4,5.5,6.6,7.7,8.8,9.9");
vector<double> v;
s.toArray<double>(v, _T(','));

Output:

{1.1,2.2,3.3,4.4,5.5,6.6,7.7,8.8,9.9}

From hexadecimal numbers array:

CText s = _T("0A 1E 2A 1B");
vector<int> v;
s.toArray<int>(v, _T(' '), true);

Output:

{10, 30, 42, 27}
CText s = _T("1a:2b:3c:4d:5e:6f");
vector<int> v;
s.toArray<int>(v, _T(':'), true);

Output:

{26, 43, 60, 77, 94, 111}

Without separator:

CText s = _T("0A1E2A1B");
vector<int> v;
s.toArray<int>(v, 0, true);

Output:

{10, 30, 42, 27}
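
The expected values in the hexadecimal examples above can be cross-checked in plain Python. The helper hex_to_array is hypothetical; for input without a separator it assumes each value is written with exactly two hex digits, which is what the last example suggests.

```python
def hex_to_array(s, sep=None):
    if sep:
        # Separated input: split on the separator and ignore empty pieces.
        tokens = [t for t in s.split(sep) if t]
    else:
        # No separator: assume two hex digits per value.
        tokens = [s[i:i + 2] for i in range(0, len(s), 2)]
    return [int(t, 16) for t in tokens]

print(hex_to_array("0A 1E 2A 1B", " "))        # [10, 30, 42, 27]
print(hex_to_array("1a:2b:3c:4d:5e:6f", ":"))  # [26, 43, 60, 77, 94, 111]
print(hex_to_array("0A1E2A1B"))                # [10, 30, 42, 27]
```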

Convert hex to character string:

CText s = _T("48 65 6C 6C 6F 20 57 6F 72 6C 64");
std::vector<int> bytes;
s.toChars<int>(bytes, true);
s.fromChars<int>(bytes);
cout << s << endl;

Output:

Hello World

Parse numerical matrix:

std::vector<std::vector<int>> m;
CText s = _T("1 2 3\n4 5 6\n7 8 9");
s.toMatrix<int>(m, _T(' '));

Output:

{
    {1, 2, 3},
    {4, 5, 6},
    {7, 8, 9},
};
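
For a quick comparison outside C++, the same parse can be sketched in Python: each non-empty line becomes a row and each separated token becomes an integer. The helper name text_to_matrix is hypothetical and is not part of ctextlib.

```python
def text_to_matrix(s, sep=" "):
    # Hypothetical helper: one row per non-empty line, one integer per token.
    return [
        [int(tok) for tok in line.split(sep) if tok]
        for line in s.splitlines()
        if line.strip()
    ]

print(text_to_matrix("1 2 3\n4 5 6\n7 8 9"))
```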

Highlight words

The following makes bold all words starting with "Col", "Spa" or "Isa", ending in "an" or "as", or containing "pe" or "sea":

vector<CText> start = {_T("Col"), _T("Spa"), _T("Isa")};
vector<CText> end = {_T("an"), _T("as")};
vector<CText> contain = {_T("pe"), _T("sea")};
str.wordsEnclose(_T("<b>"), _T("</b>"), &start, &end, &contain);

Portugal had been the main European power interested in pursuing trade routes overseas. Their next-door neighbors, Castile (predecessor of Spain) had been somewhat slower to begin exploring the Atlantic because of the bigger land area it had to re-conquer (the Reconquista) from the Moors. It was not until the late 15th century, following the dynastic union of the Crowns of Castile and Aragon and the completion of the Reconquista, that the unified crowns of what would become Spain (although countries still legally existing) emerged and became fully committed to looking for new trade routes and colonies overseas. In 1492 the joint rulers conquered the Moorish kingdom of Granada, which had been providing Castile with African goods through tribute. Columbus had previously failed to convince King John II of Portugal to fund his exploration of a western route, but the new king and queen of the re-conquered Spain decided to fund Columbus's expedition in hopes of bypassing Portugal's lock on Africa and the Indian Ocean, reaching Asia by traveling west Columbus was granted an audience with them; on May 1, 1489, he presented his plans to Queen Isabella, who referred them to a committee. They pronounced the idea impractical, and advised the monarchs not to support the proposed venture

TODO List

  • More methods for words, lines, sentences and complex expressions: Many more methods can be added to support different NLP and lexical tasks.
  • Further improve container abstraction: CText needs more conversion routines to and from STL and other containers and generic data structures.
  • Regular Expressions: Partial or full support for regular expressions.
  • Other char types: Character types such as char32_t could also be supported.
  • Mini Text Editor: A text editor based on CText that I plan to port to Modern C++.
  • Export to Python: I want to export the CText library to Python 3.
  • Performance Test: Add performance tests comparing CText with the STL string.
