
Simple SQL based tagging and the associated `sqltags` command line script, supporting both tagged named objects and tagged timestamped log entries.

Latest release 20230217: SQLTagsORM.search: previous changes seem to have dropped SQTCriterion support.

Compared to cs.fstags and its associated fstags command, this is oriented towards large numbers of items not naturally associated with filesystem objects.

My initial use case is an activity log (unnamed timestamped tag sets) but I'm also using it for ontologies (named tag sets containing metadata).

Many basic tasks can be performed with the sqltags command line utility, documented under the SQLTagsCommand class below.

See the SQLTagsORM documentation for details about how data are stored in the database. See the SQLTagSet documentation for details of how various tag value types are supported.

Class BaseSQLTagsCommand(cs.cmdutils.BaseCommand, cs.tagset.TagsCommandMixin)

Common features for commands oriented around an SQLTags database.

Command line usage:

Usage: basesqltags [-f db_url] subcommand [...]
  -f db_url SQLAlchemy database URL or filename.
            Default from $SQLTAGS_DBURL (default '~/var/sqltags.sqlite').
  Subcommands:
    dbshell
      Start an interactive database shell.
    edit criteria...
      Edit the entities specified by criteria.
    export [-F format] [{tag[=value]|-tag}...]
      Export entities matching all the constraints.
      -F format Specify the export format, either CSV or FSTAGS.
    find [-o output_format] {tag[=value]|-tag}...
      List entities matching all the constraints.
      -o output_format
                  Use output_format as a Python format string to lay out
                  the listing.
                  Default: {localtime} {headline}
    help [-l] [subcommand-names...]
      Print the full help for the named subcommands,
      or for all subcommands if no names are specified.
      -l  Long help even if no subcommand-names provided.
    import [{-u|--update}] {-|srcpath}...
      Import CSV data in the format emitted by "export".
      Each argument is a file path or "-", indicating standard input.
      -u, --update  If a named entity already exists then update its tags.
                    Otherwise this will be seen as a conflict
                    and the import aborted.
    init
      Initialise the database.
      This includes defining the schema and making the root metanode.
    log [-c category,...] [-d when] [-D strptime] {-|headline} [tags...]
      Record entries into the database.
      If headline is '-', read headlines from standard input.
      -c categories
        Specify the categories for this log entry.
        The default is to recognise a leading CAT,CAT,...: prefix.
      -d when
        Use when, an ISO8601 date, as the log entry timestamp.
      -D strptime
        Read the time from the start of the headline
        according to the provided strptime specification.
    orm define_schema
      Runs the ORM's `define_schema()` method, which creates missing tables
      and entity 0 if missing.
    shell
      Run an interactive Python prompt with some predefined local names.
    tag {-|entity-name} {tag[=value]|-tag}...
      Tag an entity with multiple tags.
      With the form "-tag", remove that tag from the direct tags.
      A entity-name named "-" indicates that entity-names should
      be read from the standard input.

Function glob2like(glob: str) -> str

Convert a filename glob to an SQL LIKE pattern.
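
As an illustration only, a simple glob could be mapped to a LIKE pattern along the following lines. The name `glob_to_like_sketch`, its `esc` parameter and the escaping rules are assumptions of this sketch and may not match what glob2like actually does; for example, character classes such as `[abc]` are not handled.

```python
def glob_to_like_sketch(glob: str, esc: str = "\\") -> str:
    """Hypothetical sketch: translate a simple filename glob to an SQL LIKE pattern."""
    parts = []
    for ch in glob:
        if ch == "*":
            parts.append("%")  # glob "any run of characters" becomes LIKE %
        elif ch == "?":
            parts.append("_")  # glob "any single character" becomes LIKE _
        elif ch in ("%", "_", esc):
            parts.append(esc + ch)  # literal LIKE metacharacters get escaped
        else:
            parts.append(ch)
    return "".join(parts)


print(glob_to_like_sketch("*.txt"))
print(glob_to_like_sketch("log_??.csv"))
```

A pattern produced this way would need to be used with a matching `ESCAPE` clause in the SQL query.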

Function main(argv=None)

Command line mode.

Class PolyValue(PolyValue, builtins.tuple)

A namedtuple for the polyvalues used in an SQLTagsORM.

We express various types in SQL as one of 3 columns:

  • float_value: for floats and ints which round trip with float
  • string_value: for str
  • structured_value: a JSON transcription of any other type

This allows SQL indexing of basic types.

Note that because str gets stored in string_value this leaves us free to use "bare string" JSON to serialise various nonJSONable types.

The SQLTagSet class has a to_polyvalue factory which produces a PolyValue suitable for the SQL rows. NonJSONable types such as datetime are converted to a str but stored in the structured_value column. Subclasses should override this factory as necessary.

On retrieval from the database the tag rows are converted to Python values by the SQLTagSet.from_polyvalue method, reversing the process above.
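
The column choice described above can be sketched in plain Python. This is not the library's code: `PolyValueSketch`, `to_polyvalue_sketch` and `from_polyvalue_sketch` are hypothetical names, the structured value is held as JSON text, bool is deliberately kept out of the float column, and integral floats are returned as ints on the way back, all of which are choices of this sketch. Type labels for nonJSONable types are not modelled here.

```python
import json
from collections import namedtuple

# Hypothetical stand-in for the PolyValue namedtuple described above.
PolyValueSketch = namedtuple(
    "PolyValueSketch", "float_value string_value structured_value"
)


def to_polyvalue_sketch(value):
    """Choose one of the 3 columns for a basic Python value."""
    if value is None:
        return PolyValueSketch(None, None, None)
    if isinstance(value, float):
        return PolyValueSketch(value, None, None)
    if isinstance(value, int) and not isinstance(value, bool):
        try:
            as_float = float(value)
        except OverflowError:
            as_float = None
        if as_float is not None and int(as_float) == value:
            # The int round trips with float: use the float column.
            return PolyValueSketch(as_float, None, None)
        # Otherwise fall through; the library uses a "bigint" type label
        # here, which this sketch does not model.
    if isinstance(value, str):
        return PolyValueSketch(None, value, None)
    # Anything else is transcribed to JSON for the third column.
    return PolyValueSketch(None, None, json.dumps(value))


def from_polyvalue_sketch(pv):
    """Reverse of to_polyvalue_sketch."""
    if pv.float_value is not None:
        f = pv.float_value
        # Sketch assumption: integral floats come back as ints.
        return int(f) if f.is_integer() else f
    if pv.string_value is not None:
        return pv.string_value
    if pv.structured_value is not None:
        return json.loads(pv.structured_value)
    return None


print(to_polyvalue_sketch("foo"))
print(from_polyvalue_sketch(to_polyvalue_sketch({"a": [1, 2]})))
```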

Class PolyValueColumnMixin

A mixin for classes with (float_value,string_value,structured_value) columns. This is used by the Tags and TagMultiValues relations inside SQLTagsORM.

Function prefix2like(prefix: str, esc='\\') -> str

Convert a prefix string to an SQL LIKE pattern.
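
A minimal sketch of the idea, using only the standard library: escape the LIKE metacharacters in the prefix, append `%`, and pass the same escape character to the query's `ESCAPE` clause. The function `prefix_to_like_sketch`, the table `t` and its rows are hypothetical and exist only for this demonstration; the real prefix2like may differ in detail.

```python
import sqlite3


def prefix_to_like_sketch(prefix: str, esc: str = "\\") -> str:
    """Hypothetical sketch: make a LIKE pattern matching strings starting with prefix."""
    escaped = (
        prefix.replace(esc, esc + esc)  # escape the escape character first
        .replace("%", esc + "%")
        .replace("_", esc + "_")
    )
    return escaped + "%"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?)", [("a_b.c",), ("axb.c",), ("a_bz",)]
)
pattern = prefix_to_like_sketch("a_b")
rows = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM t WHERE name LIKE ? ESCAPE ? ORDER BY name",
        (pattern, "\\"),
    )
]
# "axb.c" is not selected because the underscore was escaped.
print(pattern, rows)
```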

Class SQLParameters(SQLParameters, builtins.tuple)

The parameters required for constructing queries or extending queries with JOINs.

Attributes:

  • criterion: the source criterion, usually an instance of an SQTCriterion subclass
  • alias: an alias of the source table for use in queries
  • entity_id_column: the entity id column: alias.id if the alias is of entities, alias.entity_id if the alias is of tags
  • constraint: a filter query based on alias

Class SQLTagBasedTest(cs.tagset.TagBasedTest, cs.tagset.TagBasedTest, builtins.tuple, SQTCriterion, cs.tagset.TagSetCriterion, cs.deco.Promotable)

A cs.tagset.TagBasedTest extended with a .sql_parameters method.

Class SQLTagProxies

A proxy for the tags, supporting Python comparisons which produce SQLParameters.

Example:

sqltags.tags.dotted.name.here == 'foo'

Class SQLTagProxy

An object based on a Tag name which produces an SQLParameters when compared with some value.

Example:

>>> sqltags = SQLTags('sqlite://')
>>> sqltags.init()
>>> # make a SQLParameters for testing the tag 'name.thing'==5
>>> sqlp = sqltags.tags.name.thing == 5
>>> str(sqlp.constraint)
'tags_1.name = :name_1 AND tags_1.float_value = :float_value_1'
>>> sqlp = sqltags.tags.name.thing == 'foo'
>>> str(sqlp.constraint)
'tags_1.name = :name_1 AND tags_1.string_value = :string_value_1'
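
The proxy pattern itself can be sketched without SQLAlchemy. In the hypothetical `TagProxySketch` below, attribute access extends a dotted tag name and `==` returns an SQL fragment with its bound parameters, choosing the column from the type of the compared value. The class, the fragment text and the parameter names are assumptions of this sketch; the real SQLTagProxy returns an SQLParameters built from SQLAlchemy expressions.

```python
class TagProxySketch:
    """Hypothetical proxy for a dotted tag name."""

    def __init__(self, tag_name=""):
        self._tag_name = tag_name

    def __getattr__(self, attr):
        # Only called for attributes which are not otherwise found.
        if attr.startswith("_"):
            raise AttributeError(attr)
        dotted = f"{self._tag_name}.{attr}" if self._tag_name else attr
        return TagProxySketch(dotted)

    def __eq__(self, other):
        # Choose the column from the type of the compared value.
        if isinstance(other, str):
            column = "string_value"
        elif isinstance(other, (int, float)):
            column = "float_value"
        else:
            raise TypeError(f"unsupported comparison value: {other!r}")
        sql = f"tags.name = :name AND tags.{column} = :{column}"
        return sql, {"name": self._tag_name, column: other}


tags = TagProxySketch()
sql, params = tags.name.thing == 5
print(sql)
print(params)
```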

Class SQLTags(cs.tagset.BaseTagSets, cs.resources.MultiOpenMixin, cs.context.ContextManagerMixin, collections.abc.MutableMapping, collections.abc.Mapping, collections.abc.Collection, collections.abc.Sized, collections.abc.Iterable, collections.abc.Container, cs.deco.Promotable)

A class using an SQL database to store its TagSets.

Class SQLTagsCommand(BaseSQLTagsCommand, cs.cmdutils.BaseCommand, cs.tagset.TagsCommandMixin)

sqltags main command line utility.

Command line usage:

Usage: sqltags [-f db_url] subcommand [...]
  -f db_url SQLAlchemy database URL or filename.
            Default from $SQLTAGS_DBURL (default '~/var/sqltags.sqlite').
  Subcommands:
    dbshell
      Start an interactive database shell.
    edit criteria...
      Edit the entities specified by criteria.
    export [-F format] [{tag[=value]|-tag}...]
      Export entities matching all the constraints.
      -F format Specify the export format, either CSV or FSTAGS.
    find [-o output_format] {tag[=value]|-tag}...
      List entities matching all the constraints.
      -o output_format
                  Use output_format as a Python format string to lay out
                  the listing.
                  Default: {localtime} {headline}
    help [-l] [subcommand-names...]
      Print the full help for the named subcommands,
      or for all subcommands if no names are specified.
      -l  Long help even if no subcommand-names provided.
    import [{-u|--update}] {-|srcpath}...
      Import CSV data in the format emitted by "export".
      Each argument is a file path or "-", indicating standard input.
      -u, --update  If a named entity already exists then update its tags.
                    Otherwise this will be seen as a conflict
                    and the import aborted.
    init
      Initialise the database.
      This includes defining the schema and making the root metanode.
    list [entity-names...]
      List entities and their tags.
    log [-c category,...] [-d when] [-D strptime] {-|headline} [tags...]
      Record entries into the database.
      If headline is '-', read headlines from standard input.
      -c categories
        Specify the categories for this log entry.
        The default is to recognise a leading CAT,CAT,...: prefix.
      -d when
        Use when, an ISO8601 date, as the log entry timestamp.
      -D strptime
        Read the time from the start of the headline
        according to the provided strptime specification.
    ls [entity-names...]
      List entities and their tags.
    orm define_schema
      Runs the ORM's `define_schema()` method, which creates missing tables
      and entity 0 if missing.
    shell
      Run an interactive Python prompt with some predefined local names.
    tag {-|entity-name} {tag[=value]|-tag}...
      Tag an entity with multiple tags.
      With the form "-tag", remove that tag from the direct tags.
      A entity-name named "-" indicates that entity-names should
      be read from the standard input.
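
The log subcommand's default of recognising a leading CAT,CAT,...: prefix can be pictured with the sketch below. The regular expression and the function `split_categories_sketch` are assumptions made for illustration, not the command's actual parsing rules; the lowercasing follows the release log's description of parse_categories turning FOO,BAH into ['foo','bah'].

```python
import re

# Assumed shape of a category prefix: comma separated words, then a colon.
CATS_PREFIX_RE = re.compile(r"([A-Za-z][\w-]*(?:,[A-Za-z][\w-]*)*):\s*")


def split_categories_sketch(headline: str):
    """Return (categories, remaining_headline) for a log headline."""
    m = CATS_PREFIX_RE.match(headline)
    if not m:
        return [], headline
    categories = [cat.lower() for cat in m.group(1).split(",")]
    return categories, headline[m.end():]


print(split_categories_sketch("WORK,ADMIN: filed the report"))
print(split_categories_sketch("no prefix here"))
```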

Class SQLTagSet(cs.obj.SingletonMixin, cs.tagset.TagSet, builtins.dict, cs.dateutils.UNIXTimeMixin, cs.lex.FormatableMixin, cs.lex.FormatableFormatter, string.Formatter, cs.mappings.AttrableMappingMixin)

A singleton TagSet attached to an SQLTags instance.

As with the TagSet superclass, tag values can be any Python type. However, because we are storing these values in an SQL database it is necessary to provide a conversion facility to prepare those values for storage.

The database schema is described in the SQLTagsORM class; in short we directly support None, float and str, ints which round trip with float, and list, tuple and dict whose contents transcribe to JSON.

ints which are too large to round trip with float are treated as an extended "bigint" type using the scheme described below.
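
What "round trip with float" means can be checked directly in Python. The helper name `int_round_trips_sketch` is hypothetical; the library's own test may be written differently.

```python
def int_round_trips_sketch(i: int) -> bool:
    """True if converting i to float and back to int yields i again."""
    try:
        return int(float(i)) == i
    except OverflowError:
        # Too large to represent as a float at all.
        return False


print(int_round_trips_sketch(42))
print(int_round_trips_sketch(2**53))
print(int_round_trips_sketch(2**53 + 1))
print(int_round_trips_sketch(10**400))
```

Here 42 and 2**53 survive the conversion, while 2**53 + 1 and 10**400 do not, so values like the latter two would need the "bigint" treatment.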

Because the ORM has distinct float and str columns to support indexing, there will be no plain strings in the remaining JSON blob column. Therefore we support other types by providing functions to convert each type to a str and back, and an associated "type label" which will be prefixed to the string; the resulting string is stored in the JSON blob.

The default mechanism is based on the following class attributes and methods:

  • TYPE_JS_MAPPING: a mapping of a type label string to a 3 tuple of (type,to_str,from_str) being the extended type, a function to convert an instance to str and a function to convert a str to an instance of this type
  • to_js_str: a method accepting (tag_name,tag_value) and returning tag_value as a str; the default implementation looks up the type of tag_value in TYPE_JS_MAPPING to locate the corresponding to_str function
  • from_js_str: a method accepting (tag_name,js) which uses the leading type label prefix from the js to look up the corresponding from_str function from TYPE_JS_MAPPING and use it on the tail of js

The default TYPE_JS_MAPPING has mappings for:

  • "bigint": conversions for int
  • "date": conversions for datetime.date
  • "datetime": conversions for datetime.datetime

Subclasses wanting to augment the TYPE_JS_MAPPING should prepare their own with code such as:

class SubSQLTagSet(SQLTagSet,....):
    ....
    TYPE_JS_MAPPING=dict(SQLTagSet.TYPE_JS_MAPPING)
    TYPE_JS_MAPPING.update(
      typelabel=(type, to_str, from_str),
      ....
    )
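
To make the type label mechanism concrete, here is a standalone sketch of a mapping in the (type,to_str,from_str) shape described above, together with simplified encode and decode functions. `TYPE_JS_MAPPING_SKETCH`, `to_js_str_sketch` and `from_js_str_sketch` are hypothetical; unlike the methods described above they take no tag_name, and the particular conversion functions chosen (ISO formats, `str` and `int`) are assumptions. The "label:" prefix format follows the "datetime:" example given in the SQLTagsORM documentation.

```python
from datetime import date, datetime

# Hypothetical mapping: type label -> (type, to_str, from_str)
TYPE_JS_MAPPING_SKETCH = {
    "bigint": (int, str, int),
    "date": (date, date.isoformat, date.fromisoformat),
    "datetime": (datetime, datetime.isoformat, datetime.fromisoformat),
}


def to_js_str_sketch(tag_value) -> str:
    """Encode tag_value as "label:string" using the mapping."""
    for label, (type_, to_str, _) in TYPE_JS_MAPPING_SKETCH.items():
        # Exact type match, because datetime is a subclass of date.
        if type(tag_value) is type_:
            return f"{label}:{to_str(tag_value)}"
    raise TypeError(f"no type label for {type(tag_value).__name__}")


def from_js_str_sketch(js: str):
    """Decode a "label:string" value back to an instance of its type."""
    label, _, tail = js.partition(":")
    _, _, from_str = TYPE_JS_MAPPING_SKETCH[label]
    return from_str(tail)


when = datetime(2023, 2, 17, 12, 30)
encoded = to_js_str_sketch(when)
print(encoded)
print(from_js_str_sketch(encoded) == when)
```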

Class SQLTagsORM(cs.sqlalchemy_utils.ORM, cs.resources.MultiOpenMixin, cs.context.ContextManagerMixin, cs.dateutils.UNIXTimeMixin)

The ORM for an SQLTags.

The current implementation uses 3 tables:

  • entities: this has a NULLable name and a unixtime UNIX timestamp; the name is unique if it is not NULL
  • tags: this has an entity_id, name and a value stored in one of three columns: float_value, string_value and structured_value which is a JSON blob; this is unique per (entity_id,name)
  • tag_subvalues: this is a broken out version of tags when structured_value is a sequence or mapping, breaking out the values one per row; this exists to support "tag contains value" lookups
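
The three tables and the "tag contains value" lookup can be approximated with the standard sqlite3 module. Everything below is a sketch under assumptions: the real schema is defined through SQLAlchemy, and the id columns, the tag_id link in tag_subvalues, the column types and the sample rows are guesses made for illustration.

```python
import sqlite3

SCHEMA_SKETCH = """
CREATE TABLE entities (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE,          -- NULLable: log entries are unnamed
    unixtime REAL
);
CREATE TABLE tags (
    id INTEGER PRIMARY KEY,
    entity_id INTEGER NOT NULL REFERENCES entities(id),
    name TEXT NOT NULL,
    float_value REAL,
    string_value TEXT,
    structured_value TEXT,     -- JSON text in this sketch
    UNIQUE (entity_id, name)
);
CREATE TABLE tag_subvalues (
    id INTEGER PRIMARY KEY,
    tag_id INTEGER NOT NULL REFERENCES tags(id),
    float_value REAL,
    string_value TEXT,
    structured_value TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA_SKETCH)

# An unnamed, timestamped log entry with a "categories" tag.
cur = conn.execute(
    "INSERT INTO entities (name, unixtime) VALUES (NULL, 1676592000.0)"
)
entity_id = cur.lastrowid
cur = conn.execute(
    "INSERT INTO tags (entity_id, name, structured_value) VALUES (?, ?, ?)",
    (entity_id, "categories", '["work", "admin"]'),
)
tag_id = cur.lastrowid
# Break the sequence out, one value per row.
conn.executemany(
    "INSERT INTO tag_subvalues (tag_id, string_value) VALUES (?, ?)",
    [(tag_id, "work"), (tag_id, "admin")],
)

# "Tag contains value": entities whose categories include "admin".
rows = conn.execute(
    """
    SELECT tags.entity_id
    FROM tags JOIN tag_subvalues ON tag_subvalues.tag_id = tags.id
    WHERE tags.name = ? AND tag_subvalues.string_value = ?
    """,
    ("categories", "admin"),
).fetchall()
print(rows)
```

Breaking the values out this way lets the lookup use an ordinary equality test on string_value rather than parsing the JSON blob for every row.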

Tag values are stored as follows:

  • None: all 3 columns are set to NULL
  • float: stored in float_value
  • int: if the int round trips to float then it is stored in float_value, otherwise it is stored in structured_value with the type label "bigint"
  • str: stored in string_value
  • list, tuple, dict: stored in structured_value; if these containers contain unJSONable content there will be trouble
  • other types, such as datetime: these are converted to strings with identifying type label prefixes and stored in structured_value

The float_value and string_value columns allow us to provide indices for these kinds of tag values.

The type label scheme takes advantage of the fact that actual strs are stored in the string_value column. Because of this, there will be no actual strings in structured_value. Therefore, we can convert nonJSONable types to str and store them here.

The scheme used is to provide conversion functions to convert types to str and back, and an associated "type label" prefix. For example, we store a datetime as the ISO format of the datetime with "datetime:" prefixed to it.

The actual conversions are kept with the SQLTagSet class (or any subclass). This ORM receives the 3-tuples of SQL ready values from that class as the PolyValue namedtuple and does not perform any conversion itself. The conversion process is described in SQLTagSet.

Class SQTCriterion(cs.tagset.TagSetCriterion, cs.deco.Promotable)

Subclass of TagSetCriterion requiring an .sql_parameters method which returns an SQLParameters providing the information required to construct an sqlalchemy query. It also resets .CRITERION_PARSE_CLASSES, which will pick up the SQL capable criterion classes below.

Class SQTEntityIdTest(SQTCriterion, cs.tagset.TagSetCriterion, cs.deco.Promotable)

A test on entity.id.

Function verbose(msg, *a)

Emit message if in verbose mode.

Release Log

Release 20230217: SQLTagsORM.search: previous changes seem to have dropped SQTCriterion support.

Release 20230212.1: Mark SQLTags as promotable.

Release 20230212:

  • @promote support for SQLTags, promoting a filesystem path to a .sqlite db.
  • Simpler SQLTagsORM.search comparison implementation.
  • SQLTagSet: inherit format attributes from superclasses (TagSet).
  • New BaseSQLTagsCommand.cmd_shell method.
  • New BaseSQLTagsCommand.cmd_orm method with "define_schema" subcommand to update the db schema.
  • SQLTagsORM.__init__: drop case_sensitive, no longer supported?
  • SQLTagsORM.__init__: always call define_schema, it seems there are scenarios where this does some necessary sqlalchemy prep.

Release 20221228: SQLTagsCommand: update implementation of BaseCommand.run_context to use super().run_context().

Release 20220806:

  • Bugfix for SQLTagsORM.search(mode='entity').
  • SQLTags.find: new _without_tags=False parameter to allow fast searches omitting the entity tags.

Release 20220606:

  • New SQLTagsORM.Entities.add_new_tags method, use it in SQLTags.default_factory for bulk insert.
  • SQTCriterion: new .from_equality(tag_name,tag_value) factory to make an equality criterion.
  • SQLTags.find: accept criteria as positional parameters instead of a single iterable, accept new keyword parameters as equality criteria.
  • SQLTags.__getitem__: accept a slice to index the .unixtime tag.
  • SQLTagsORM: also turn on echo mode if "ECHO" in $SQLTAGS_MODES.

Release 20220311: Assorted updates.

Release 20211212:

  • Rename edit_many to edit_tagsets for clarity.
  • Small bugfixes.

Release 20210913:

  • SQLTagsCommand: rename cmd_ns to cmd_list,cmd_ls.
  • SQLTagsCommand.cmd_export: accept "-F export_format" for csv or fstags export, accept no criteria to mean all tagsets.
  • Encoding schema for nonJSONable types.
  • Rename the TagSets abstract base class to BaseTagSets.
  • BaseSQLTagsCommand.cmd_edit: implement rename.
  • Many other internal small changes.

Release 20210420:

  • New PolyValueMixin pulled out of Tags for common support of the (float_value,string_value,structured_value).
  • SQLTagsORM: new TagSubValues relation containing broken out values for values which are sequences, to support efficient lookup of sequence values such as log entry categories.
  • New BaseSQLTagsCommand.parse_categories static method to parse FOO,BAH into ['foo','bah'].
  • sqltags find: change default format to "{datetime} {headline}".
  • Assorted small changes.

Release 20210404:

  • SQLTags.__getitem__: when autocreating an entity, do it in a new session so that the entity is committed to the database before any further use.
  • SQLTagsCommand: new cmd_dbshell to drop you into the database.

Release 20210321: Drop logic now merged with cs.sqlalchemy_utils, use the new default session stuff.

Release 20210306.1: Docstring updates.

Release 20210306: Initial release.
