TextBase library to manipulate DBText style data files

Project description

textbase

A Python library to manipulate Inmagic/DBText style data files

This project is moving from https://github.com/epoz/textbase to the Brill GitLab space.

What are textbase files?

A simple format that separates data records with a single-character delimiter (all the files we use have a $ character on a line by itself). Within each record, the fieldname is the first word on the line, usually in upper case. Any text following the fieldname is the value for that field. Repeating values for a fieldname can be specified on consecutive lines, each starting with a semicolon. If the text value for a field is very long and needs to wrap, start the continuation line with one (or more) spaces.
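
To make these rules concrete, here is a minimal sketch of a parser written from the description above. It is not the code used by this library; the function name parse_textbase and the choice to join wrapped lines with a single space are assumptions made for illustration.

```python
def parse_textbase(text, delimiter="$"):
    """Parse DBText-style text into a list of {fieldname: [values]} dicts."""
    records = []
    current = {}
    field = None  # name of the field seen most recently in this record
    for line in text.splitlines():
        if line.strip() == delimiter:
            # A delimiter on a line by itself closes the current record.
            if current:
                records.append(current)
            current, field = {}, None
        elif not line.strip():
            continue  # blank lines carry no data
        elif line[0] == " " and field is not None:
            # Leading space: wrapped text, appended to the previous value
            # (joining with one space is an assumption).
            current[field][-1] += " " + line.strip()
        elif line[0] == ";" and field is not None:
            # Semicolon: one more value for the same fieldname.
            current[field].append(line[1:].strip())
        else:
            # The first word is the fieldname, the rest is its value.
            field, _, value = line.strip().partition(" ")
            current.setdefault(field, []).append(value.strip())
    if current:
        records.append(current)
    return records


sample = """TITLE A very long title
  that wraps onto a second line
AUTHOR First Author
; Second Author
$
TITLE Another record
"""

records = parse_textbase(sample)
print(records)
```

The TITLE and AUTHOR fieldnames in the sample are made up; any first word on a line is treated as a fieldname.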

Why did you re-invent the wheel?

We already have CSV files, or JSON files, or YAML, why did you make this? Well, I didn’t invent this. It is actually a format used by a suite of software from InMagic: http://www.inmagic.com/products/dbtext-library-suite/

We have used the dbText software to create a boatload of data files since the early eighties, which is a LONG time ago in Internet-land. Those exact same data files are still used to drive a lot of software, and have proven to be remarkably useful over the years. Think of it as Markdown vs HTML, or as a simpler data format with über-simple Key:Value records that are human readable.

Example File:

FOO A Foo field
BAR A Baz field with multiple entries
; Another
; and yet even more
$
FOO This is the FOO field for the next record
BAR Nothing

The main utility class is TextBase. It can be initialised with an open file or a string buffer, named sourcefile. The sourcefile is iterated over and its contents are split into chunks, one per record. Each chunk is parsed and added to an internal buffer list, which holds one dict per record. Each entry in the dict is keyed on the DBText fieldname, and the entry itself is a list of the values for that field.
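
As an illustration of that structure, the two-record example file above would be expected to end up looking roughly like the literal below. This is a hand-written sketch, not output captured from the library, so details such as whitespace trimming are assumptions. TITLE is a made-up fieldname that does not occur in the example.

```python
# Hypothetical parsed form of the two-record example file:
# one dict per record, fieldname -> list of values.
records = [
    {
        "FOO": ["A Foo field"],
        "BAR": ["A Baz field with multiple entries", "Another", "and yet even more"],
    },
    {
        "FOO": ["This is the FOO field for the next record"],
        "BAR": ["Nothing"],
    },
]

# Every value is a list, even when a field occurs only once.
first_foo = records[0]["FOO"][0]
bar_count = len(records[0]["BAR"])

# A field can be absent from a record, so .get() with a default avoids KeyError.
missing = records[1].get("TITLE", [])

print(first_foo, bar_count, missing)
```

Because single values are also wrapped in a list, code that reads a field never has to check whether it got a string or a list.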

The TextBase object can be used as a generator to iterate over the contents, or it can be index-addressed like a list.

Example Usage:

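import textbase

# somebuf is an open file or a string buffer holding DBText data
t = textbase.TextBase(somebuf)

print(len(t))

# Slicing works as on a list; each record is a dict
for x in t[10:20]:
    print(x.keys())

print(t[0])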

If you do not want the records parsed into Python dictionaries and just want to muck about with the records as text blobs, initialise like this:

t = textbase.TextBase(somebuf, parse=False)

Running with Docker

You can automatically convert all .xlsx files from a directory to .dmp files by running the following command:

docker run --rm -ti -v $(pwd):/data registry.gitlab.com/brillpublishers/code/textbase:latest

This will check the current directory (and all directories below it) for .xlsx files and convert each one to a .dmp file with the same filename. If a .dmp file with that name already exists, the .xlsx file is skipped.
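
The selection rule described here can be sketched in a few lines of Python. This is only an illustration of the "skip if a .dmp already exists" logic, not the code that runs inside the Docker image; the function name pending_conversions is an assumption.

```python
import tempfile
from pathlib import Path


def pending_conversions(root):
    """Return the .xlsx files under root that have no .dmp file of the same name."""
    pending = []
    for xlsx in sorted(Path(root).rglob("*.xlsx")):
        # with_suffix swaps the extension: report.xlsx -> report.dmp
        if not xlsx.with_suffix(".dmp").exists():
            pending.append(xlsx)
    return pending


# Try it out in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    for name in ("a.xlsx", "b.xlsx", "b.dmp", "sub/c.xlsx"):
        (root / name).touch()
    names = [p.name for p in pending_conversions(root)]

print(names)
```

Here b.xlsx is left out because b.dmp already exists, while the file in the subdirectory is picked up by the recursive search.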

The Excel file should conform to the following conventions:

  • The first row contains the fieldnames. Fieldnames are converted to uppercase in the textbase objects.

  • There MUST be a column named ID in the first row. All textbase records must have an ID. If a data row is encountered without an ID, it is skipped.

  • If there is no column called TYPE, the filename of the Excel file is automatically added as a type for those objects.
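
The three conventions above can be sketched on plain Python lists standing in for spreadsheet rows. This is a rough illustration only: the function name rows_to_textbase is hypothetical, the exact layout of the .dmp output is an assumption, and whether the real converter uses the filename with or without its extension as the type is not stated here, so the sketch uses whatever string is passed in.

```python
def rows_to_textbase(rows, filename):
    """Convert spreadsheet-like rows (first row = fieldnames) to DBText-style text."""
    # Fieldnames come from the first row and are uppercased.
    header = [str(name).strip().upper() for name in rows[0]]
    if "ID" not in header:
        raise ValueError("the first row must contain an ID column")
    # Without a TYPE column, the filename is used as the type (assumed as-is).
    default_type = None if "TYPE" in header else filename

    chunks = []
    for row in rows[1:]:
        record = dict(zip(header, row))
        if record.get("ID") in (None, ""):
            continue  # data rows without an ID are skipped
        lines = [
            f"{name} {value}"
            for name, value in record.items()
            if value not in (None, "")  # leave out empty cells
        ]
        if default_type is not None:
            lines.append(f"TYPE {default_type}")
        chunks.append("\n".join(lines))
    # Records are separated by a $ on a line by itself.
    return "\n$\n".join(chunks) + "\n"


rows = [
    ["id", "title"],
    ["1", "First"],
    ["", "Row without an ID"],
    ["2", "Second"],
]
text = rows_to_textbase(rows, "books")
print(text)
```

The sketch does not handle cell values that contain line breaks; those would need to be wrapped with leading spaces as described for long values.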


Source Distribution

TextBase-0.15.tar.gz (4.8 kB)


File hashes

Hashes for TextBase-0.15.tar.gz
Algorithm Hash digest
SHA256 fc853db3358b4a0eefeeef203887298f0467580c70e96185f266b3a449791efc
MD5 f25ca8cf35108c57f1762387b3076c57
BLAKE2b-256 590601ad7b7674252a1ce55360b828c9593b32b20d184955358453b983672b73
