4 projects
extractionstring
Basic tools to tokenize a string (i.e. to split it into atomic entities or sub-strings) for Natural Language Processing (NLP). Also useful for annotation, tree parsing, entity linking, and more generally anything that links a string or its sub-parts to another object. The key design goals are compatibility with other libraries and the freedom to define many concepts on top of a string.
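To illustrate the idea of tokens that stay linked to their parent string, here is a minimal sketch in plain Python. It does not use the extractionstring API: the `Span` class and the `whitespace_spans` function are hypothetical names introduced only for this example. Each token keeps its character offsets, so an annotation or any other object can be attached to an exact position in the original text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Span:
    """Hypothetical helper: a half-open [start, stop) slice of a parent string."""
    text: str
    start: int
    stop: int

    def __str__(self) -> str:
        return self.text[self.start:self.stop]


def whitespace_spans(text: str) -> List[Span]:
    """Return one Span per run of non-whitespace characters."""
    spans = []
    start = None  # index where the current token began, if any
    for i, ch in enumerate(text):
        if ch.isspace():
            if start is not None:
                spans.append(Span(text, start, i))
                start = None
        elif start is None:
            start = i
    if start is not None:  # close a token that runs to the end of the text
        spans.append(Span(text, start, len(text)))
    return spans


sentence = "Tokens link  back to text"
spans = whitespace_spans(sentence)

# Attach an arbitrary object (here an upper-cased label) to each position.
annotations = {(s.start, s.stop): str(s).upper() for s in spans}
```

Because a span stores offsets rather than a copy of the sub-string, the original string is never modified and several overlapping tokenizations can coexist on the same text.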
iamtokenizing
Simple tokenizers: n-gram and chargram splitting, whitespace splitting, splitting with a configurable regular expression, or tokenization by detection in context. Built on the ExtractionString object from the extractionstring package.
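The splitting strategies named above can be sketched with the Python standard library alone. This is not the iamtokenizing API: the function names `word_ngrams`, `chargrams`, and `regex_split` are assumptions made for illustration, and the default pattern `\W+` is just one possible choice of separator.

```python
import re
from typing import List


def word_ngrams(text: str, n: int) -> List[str]:
    """Sliding windows of n whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def chargrams(text: str, n: int) -> List[str]:
    """Sliding windows of n consecutive characters."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def regex_split(text: str, pattern: str = r"\W+") -> List[str]:
    """Split on a configurable regular expression, dropping empty pieces."""
    return [piece for piece in re.split(pattern, text) if piece]
```

For instance, `word_ngrams("a b c", 2)` yields the two bigrams `"a b"` and `"b c"`, and each function returns an empty list when the input is shorter than the requested window.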
tokenspan
Basic tools to tokenize a string (i.e. to split it into atomic entities or sub-strings) for Natural Language Processing (NLP). Also useful for annotation, tree parsing, entity linking, and more generally anything that links a string or its sub-parts to another object. The key design goals are compatibility with other libraries and the freedom to define many concepts on top of a string.
substitutionstring
Manipulate string substitutions, such as deletions and insertions, without loss of information, and perform some algebra on the underlying Substitution object. Can be useful for any kind of string manipulation, such as version control, natural language processing, or string comparison in a general sense. The simplest way to use this package is through the SubstitutionString object, which handles the machinery of applying a Substitution to a given string.
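A substitution "without loss of information" can be pictured as an edit that records both the text it removes and the text it inserts, so that it can always be undone. The sketch below is a simplified model under that assumption, not the substitutionstring API: the `Edit` class and its `apply` and `inverse` methods are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """Hypothetical reversible edit: replace `removed` by `inserted` at `start`."""
    start: int
    removed: str
    inserted: str

    def apply(self, text: str) -> str:
        stop = self.start + len(self.removed)
        # Refuse to apply the edit if the text does not match what was recorded.
        if text[self.start:stop] != self.removed:
            raise ValueError("text does not contain the expected sub-string")
        return text[:self.start] + self.inserted + text[stop:]

    def inverse(self) -> "Edit":
        """Swap the removed and inserted parts to undo the edit."""
        return Edit(self.start, self.inserted, self.removed)


original = "hello world"
edit = Edit(start=6, removed="world", inserted="there")
modified = edit.apply(original)
restored = edit.inverse().apply(modified)
```

A pure deletion is the case where `inserted` is empty, and a pure insertion is the case where `removed` is empty. Since every edit has an inverse, the original string can be recovered from the modified one.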