Simple SQL based tagging and the associated `sqltags` command line script, supporting both tagged named objects and tagged timestamped log entries.

Latest release 20240316: Fixed release upload artifacts.

Compared to cs.fstags and its associated fstags command, this is oriented towards large numbers of items not naturally associated with filesystem objects.

My initial use case is an activity log (unnamed timestamped tag sets) but I'm also using it for ontologies (named tag sets containing metadata).

Many basic tasks can be performed with the sqltags command line utility, documented under the SQLTagsCommand class below.

See the SQLTagsORM documentation for details about how data are stored in the database. See the SQLTagSet documentation for details of how various tag value types are supported.

Class BaseSQLTagsCommand(cs.cmdutils.BaseCommand, SQLTagsCommandsMixin, cs.tagset.TagsCommandMixin)

Common features for commands oriented around an SQLTags database.

Command line usage:

Usage: basesqltags [-f db_url] subcommand [...]
  -f db_url SQLAlchemy database URL or filename.
            Default from $SQLTAGS_DBURL (default '~/var/sqltags.sqlite').
  Subcommands:
    dbshell
      Start an interactive database shell.
    edit criteria...
      Edit the entities specified by criteria.
    export [-F format] [{tag[=value]|-tag}...]
      Export entities matching all the constraints.
      -F format Specify the export format, either CSV or FSTAGS.
    find [-o output_format] {tag[=value]|-tag}...
      List entities matching all the constraints.
      -o output_format
                  Use output_format as a Python format string to lay out
                  the listing.
                  Default: {localtime} {headline}
    help [-l] [subcommand-names...]
      Print help for subcommands.
      This outputs the full help for the named subcommands,
      or the short help for all subcommands if no names are specified.
      -l  Long help even if no subcommand-names provided.
    import [{-u|--update}] {-|srcpath}...
      Import CSV data in the format emitted by "export".
      Each argument is a file path or "-", indicating standard input.
      -u, --update  If a named entity already exists then update its tags.
                    Otherwise this will be seen as a conflict
                    and the import aborted.
    init
      Initialise the database.
      This includes defining the schema and making the root metanode.
    log [-c category,...] [-d when] [-D strptime] {-|headline} [tags...]
      Record entries into the database.
      If headline is '-', read headlines from standard input.
      -c categories
        Specify the categories for this log entry.
        The default is to recognise a leading CAT,CAT,...: prefix.
      -d when
        Use when, an ISO8601 date, as the log entry timestamp.
      -D strptime
        Read the time from the start of the headline
        according to the provided strptime specification.
    orm define_schema
      Runs the ORM's `define_schema()` method, which creates missing tables
      and entity 0 if missing.
    shell
      Run a command prompt via cmd.Cmd using this command's subcommands.
    tag {-|entity-name} {tag[=value]|-tag}...
      Tag an entity with multiple tags.
      With the form "-tag", remove that tag from the direct tags.
      An entity-name named "-" indicates that entity-names should
      be read from the standard input.
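
The `log` subcommand recognises a leading `CAT,CAT,...:` prefix on the headline by default. The following is a minimal sketch of how such a prefix could be split off, not the package's own parser: the function name `split_categories` and the regular expression are assumptions made for illustration, and the lowercasing follows the `FOO,BAH` into `['foo','bah']` behaviour mentioned in the release log.

```python
import re

# Hypothetical pattern: comma separated words followed by a colon.
# The real parser may accept a different character set.
_CATS_PREFIX_RE = re.compile(r'([A-Za-z][\w-]*(?:,[A-Za-z][\w-]*)*):\s*')


def split_categories(headline):
    """Return (categories, remaining_headline) for a log headline.

    If there is no leading category prefix the categories are empty
    and the headline is returned unchanged.
    """
    m = _CATS_PREFIX_RE.match(headline)
    if not m:
        return [], headline
    categories = [cat.lower() for cat in m.group(1).split(',')]
    return categories, headline[m.end():]


categories, text = split_categories('FOO,BAH: fixed the gate')
```

Here `categories` holds the lowercased category names and `text` holds the headline with the prefix removed.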

Function glob2like(glob: str) -> str

Convert a filename glob to an SQL LIKE pattern.
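
As an illustration of the idea, here is a small sketch of a glob to LIKE conversion. It is not the implementation of `glob2like`: the name `glob_to_like`, the `esc` parameter and the choice to escape literal `%` and `_` are assumptions, and it handles only `*` and `?`, not character classes.

```python
def glob_to_like(glob, esc='\\'):
    """Translate a simple filename glob into an SQL LIKE pattern.

    '*' becomes '%' and '?' becomes '_'.
    Literal '%', '_' and the escape character are escaped
    (an assumption; the real function may treat them differently).
    """
    like = []
    for ch in glob:
        if ch == '*':
            like.append('%')
        elif ch == '?':
            like.append('_')
        elif ch in ('%', '_', esc):
            like.append(esc + ch)
        else:
            like.append(ch)
    return ''.join(like)


pattern = glob_to_like('*.jp?g')
```

A pattern escaped this way needs a matching `ESCAPE` clause in the SQL which uses it.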

Function main(argv=None)

Command line mode.

Class PolyValue(PolyValue, builtins.tuple, PolyValued)

A namedtuple for the polyvalues used in an SQLTagsORM.

We express various types in SQL as one of 3 columns:

  • float_value: for floats and ints which round trip with float
  • string_value: for str
  • structured_value: a JSON transcription of any other type

This allows SQL indexing of basic types.

Note that because str gets stored in string_value this leaves us free to use "bare string" JSON to serialise various nonJSONable types.

The SQLTagSet class has a to_polyvalue factory which produces a PolyValue suitable for the SQL rows. NonJSONable types such as datetime are converted to a str but stored in the structured_value column. This should be overridden by subclasses as necessary.

On retrieval from the database the tag rows are converted to Python values by the SQLTagSet.from_polyvalue method, reversing the process above.
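
The column choice described above can be sketched as follows. This is an illustration under stated assumptions rather than the package's code: `PolyValueSketch` and `to_polyvalue_sketch` are hypothetical names, the structured value is assumed to be held as JSON text, and types which JSON cannot express (such as datetime) are left out here.

```python
import json
from collections import namedtuple

# Hypothetical stand-in for the PolyValue namedtuple.
PolyValueSketch = namedtuple(
    'PolyValueSketch', 'float_value string_value structured_value'
)


def to_polyvalue_sketch(value):
    """Choose one of the 3 columns for a Python value."""
    if value is None:
        # None: all 3 columns are NULL
        return PolyValueSketch(None, None, None)
    if isinstance(value, float):
        return PolyValueSketch(value, None, None)
    if isinstance(value, int):
        try:
            as_float = float(value)
        except OverflowError:
            as_float = None
        if as_float is not None and int(as_float) == value:
            # the int round trips with float: store it as a float
            return PolyValueSketch(as_float, None, None)
        # too large to round trip: a "bare string" in the JSON column
        return PolyValueSketch(None, None, json.dumps('bigint:' + str(value)))
    if isinstance(value, str):
        return PolyValueSketch(None, value, None)
    # list, tuple, dict: a JSON transcription
    return PolyValueSketch(None, None, json.dumps(value))


small = to_polyvalue_sketch(5)
big = to_polyvalue_sketch(2 ** 53 + 1)
```

In this sketch `small` lands in `float_value` while `big`, which does not survive a trip through float, lands in `structured_value` as a labelled string.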

Class PolyValueColumnMixin(PolyValued)

A mixin for classes with (float_value,string_value,structured_value) columns. This is used by the Tags and TagMultiValues relations inside SQLTagsORM.

Class PolyValued

A mixin for classes with (float_value,string_value,structured_value) columns.

Function prefix2like(prefix: str, esc='\\') -> str

Convert a prefix string to an SQL LIKE pattern.
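
A sketch of the technique, checked against an in-memory SQLite table. The helper `prefix_to_like` is a hypothetical reimplementation, not the package's `prefix2like`, and the table and its rows are invented for the demonstration; only the `esc` default mirrors the signature above.

```python
import sqlite3


def prefix_to_like(prefix, esc='\\'):
    """Escape LIKE metacharacters in prefix and append '%'."""
    # Escape the escape character first so that the escapes
    # added for '%' and '_' are not themselves doubled.
    for special in (esc, '%', '_'):
        prefix = prefix.replace(special, esc + special)
    return prefix + '%'


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE tags (name TEXT)')
conn.executemany(
    'INSERT INTO tags VALUES (?)',
    [('a_b.c',), ('a_b.d',), ('axb.c',)],
)
pattern = prefix_to_like('a_b.')
rows = conn.execute(
    "SELECT name FROM tags WHERE name LIKE ? ESCAPE '\\' ORDER BY name",
    (pattern,),
).fetchall()
names = [name for (name,) in rows]
```

Because the `_` in the prefix is escaped, the row `axb.c` is not selected even though an unescaped `_` would match the `x`.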

Class SQLParameters(SQLParameters, builtins.tuple)

The parameters required for constructing queries or extending queries with JOINs.

Attributes:

  • criterion: the source criterion, usually an SQTCriterion subinstance
  • alias: an alias of the source table for use in queries
  • entity_id_column: the entities id column, alias.id if the alias is of entities, alias.entity_id if the alias is of tags
  • constraint: a filter query based on alias

Class SQLTagBasedTest(cs.tagset.TagBasedTest, cs.tagset.TagBasedTest, builtins.tuple, SQTCriterion, cs.tagset.TagSetCriterion, cs.deco.Promotable)

A cs.tagset.TagBasedTest extended with a .sql_parameters method.

Class SQLTagProxies

A proxy for the tags, supporting Python comparisons which produce SQLParameters.

Example:

sqltags.tags.dotted.name.here == 'foo'

Class SQLTagProxy

An object based on a Tag name which produces an SQLParameters when compared with some value.

Example:

>>> sqltags = SQLTags('sqlite://')
>>> sqltags.init()
>>> # make a SQLParameters for testing the tag 'name.thing'==5
>>> sqlp = sqltags.tags.name.thing == 5
>>> str(sqlp.constraint)
'tags_1.name = :name_1 AND tags_1.float_value = :float_value_1'
>>> sqlp = sqltags.tags.name.thing == 'foo'
>>> str(sqlp.constraint)
'tags_1.name = :name_1 AND tags_1.string_value = :string_value_1'
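
The mechanism behind this kind of proxy can be sketched in plain Python. This is not how SQLTagProxy is written: the class `TagProxySketch` is hypothetical, and it returns a simple (SQL text, parameters) tuple where the real class returns an SQLParameters built with SQLAlchemy.

```python
class TagProxySketch:
    """Attribute access extends a dotted tag name;
    comparison with a value yields a constraint.
    """

    def __init__(self, tag_name=''):
        self._tag_name = tag_name

    def __getattr__(self, attr):
        # Only called for attributes not found normally.
        if attr.startswith('_'):
            raise AttributeError(attr)
        name = f'{self._tag_name}.{attr}' if self._tag_name else attr
        return TagProxySketch(name)

    def __eq__(self, other):
        # str values live in string_value, numbers in float_value
        column = 'string_value' if isinstance(other, str) else 'float_value'
        return (
            f'tags.name = ? AND tags.{column} = ?',
            (self._tag_name, other),
        )


tags = TagProxySketch()
num_test = tags.name.thing == 5
str_test = tags.name.thing == 'foo'
```

The type of the compared value selects the column to test, which mirrors the two constraints shown in the example above.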

Class SQLTags(cs.tagset.BaseTagSets, cs.resources.MultiOpenMixin, cs.context.ContextManagerMixin, collections.abc.MutableMapping, collections.abc.Mapping, collections.abc.Collection, collections.abc.Sized, collections.abc.Iterable, collections.abc.Container, cs.deco.Promotable)

A class using an SQL database to store its TagSets.

Class SQLTagsCommand(BaseSQLTagsCommand, cs.cmdutils.BaseCommand, SQLTagsCommandsMixin, cs.tagset.TagsCommandMixin)

sqltags main command line utility.

Command line usage:

Usage: sqltags [-f db_url] subcommand [...]
  -f db_url SQLAlchemy database URL or filename.
            Default from $SQLTAGS_DBURL (default '~/var/sqltags.sqlite').
  Subcommands:
    dbshell
      Start an interactive database shell.
    edit criteria...
      Edit the entities specified by criteria.
    export [-F format] [{tag[=value]|-tag}...]
      Export entities matching all the constraints.
      -F format Specify the export format, either CSV or FSTAGS.
    find [-o output_format] {tag[=value]|-tag}...
      List entities matching all the constraints.
      -o output_format
                  Use output_format as a Python format string to lay out
                  the listing.
                  Default: {localtime} {headline}
    help [-l] [subcommand-names...]
      Print help for subcommands.
      This outputs the full help for the named subcommands,
      or the short help for all subcommands if no names are specified.
      -l  Long help even if no subcommand-names provided.
    import [{-u|--update}] {-|srcpath}...
      Import CSV data in the format emitted by "export".
      Each argument is a file path or "-", indicating standard input.
      -u, --update  If a named entity already exists then update its tags.
                    Otherwise this will be seen as a conflict
                    and the import aborted.
    init
      Initialise the database.
      This includes defining the schema and making the root metanode.
    list [entity-names...]
      List entities and their tags.
    log [-c category,...] [-d when] [-D strptime] {-|headline} [tags...]
      Record entries into the database.
      If headline is '-', read headlines from standard input.
      -c categories
        Specify the categories for this log entry.
        The default is to recognise a leading CAT,CAT,...: prefix.
      -d when
        Use when, an ISO8601 date, as the log entry timestamp.
      -D strptime
        Read the time from the start of the headline
        according to the provided strptime specification.
    ls [entity-names...]
      List entities and their tags.
    orm define_schema
      Runs the ORM's `define_schema()` method, which creates missing tables
      and entity 0 if missing.
    shell
      Run a command prompt via cmd.Cmd using this command's subcommands.
    tag {-|entity-name} {tag[=value]|-tag}...
      Tag an entity with multiple tags.
      With the form "-tag", remove that tag from the direct tags.
      An entity-name named "-" indicates that entity-names should
      be read from the standard input.

Class SQLTagSet(cs.obj.SingletonMixin, cs.tagset.TagSet, builtins.dict, cs.dateutils.UNIXTimeMixin, cs.lex.FormatableMixin, cs.lex.FormatableFormatter, string.Formatter, cs.mappings.AttrableMappingMixin)

A singleton TagSet attached to an SQLTags instance.

As with the TagSet superclass, tag values can be any Python type. However, because we are storing these values in an SQL database it is necessary to provide a conversion facility to prepare those values for storage.

The database schema is described in the SQLTagsORM class; in short we directly support None, float and str, ints which round trip with float, and list, tuple and dict whose contents transcribe to JSON.

ints which are too large to round trip with float are treated as an extended "bigint" type using the scheme described below.

Because the ORM has distinct float and str columns to support indexing, there will be no plain strings in the remaining JSON blob column. Therefore we support other types by providing functions to convert each type to a str and back, and an associated "type label" which will be prefixed to the string; the resulting string is stored in the JSON blob.

The default mechanism is based on the following class attributes and methods:

  • TYPE_JS_MAPPING: a mapping of a type label string to a 3 tuple of (type,to_str,from_str) being the extended type, a function to convert an instance to str and a function to convert a str to an instance of this type
  • to_js_str: a method accepting (tag_name,tag_value) and returning tag_value as a str; the default implementation looks up the type of tag_value in TYPE_JS_MAPPING to locate the corresponding to_str function
  • from_js_str: a method accepting (tag_name,js) which uses the leading type label prefix from the js to look up the corresponding from_str function from TYPE_JS_MAPPING and use it on the tail of js

The default TYPE_JS_MAPPING has mappings for:

  • "bigint": conversions for int
  • "date": conversions for datetime.date
  • "datetime": conversions for datetime.datetime

Subclasses wanting to augment the TYPE_JS_MAPPING should prepare their own with code such as:

class SubSQLTagSet(SQLTagSet,....):
    ....
    TYPE_JS_MAPPING=dict(SQLTagSet.TYPE_JS_MAPPING)
    TYPE_JS_MAPPING.update(
      typelabel=(type, to_str, from_str),
      ....
    )
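
To make the type label scheme concrete, here is a self-contained sketch which does not import the package. The names `TYPE_JS_MAPPING_SKETCH`, `to_js_str_sketch` and `from_js_str_sketch` are hypothetical, the functions omit the tag_name parameter of the real methods, and the `label:text` layout follows the "datetime:" prefix example in the SQLTagsORM documentation.

```python
from datetime import date, datetime

# type label -> (type, to_str, from_str)
TYPE_JS_MAPPING_SKETCH = {
    'bigint': (int, str, int),
    'date': (date, date.isoformat, date.fromisoformat),
    'datetime': (datetime, datetime.isoformat, datetime.fromisoformat),
}


def to_js_str_sketch(tag_value, mapping=TYPE_JS_MAPPING_SKETCH):
    """Return tag_value as a type labelled str."""
    for label, (type_, to_str, _) in mapping.items():
        # exact type match, because datetime is a subclass of date
        if type(tag_value) is type_:
            return label + ':' + to_str(tag_value)
    raise TypeError('no type label for %r' % (type(tag_value),))


def from_js_str_sketch(js, mapping=TYPE_JS_MAPPING_SKETCH):
    """Use the leading type label to decode the tail of js."""
    label, _, tail = js.partition(':')
    _, _, from_str = mapping[label]
    return from_str(tail)


# Augmenting a copy of the mapping, in the manner of a subclass.
extended = dict(TYPE_JS_MAPPING_SKETCH)
extended.update(complex=(complex, str, complex))

when = datetime(2024, 3, 16, 12, 30)
js = to_js_str_sketch(when)
```

Copying the mapping before updating it, as the subclass example does, leaves the base mapping untouched.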

Class SQLTagsORM(cs.sqlalchemy_utils.ORM, cs.resources.MultiOpenMixin, cs.context.ContextManagerMixin, cs.dateutils.UNIXTimeMixin)

The ORM for an SQLTags.

The current implementation uses 3 tables:

  • entities: this has a NULLable name and unixtime UNIX timestamp; this is unique per name if the name is not NULL
  • tags: this has an entity_id, name and a value stored in one of three columns: float_value, string_value and structured_value which is a JSON blob; this is unique per (entity_id,name)
  • tag_subvalues: this is a broken out version of tags when structured_value is a sequence or mapping, breaking out the values one per row; this exists to support "tag contains value" lookups

Tag values are stored as follows:

  • None: all 3 columns are set to NULL
  • float: stored in float_value
  • int: if the int round trips to float then it is stored in float_value, otherwise it is stored in structured_value with the type label "bigint"
  • str: stored in string_value
  • list, tuple, dict: stored in structured_value; if these containers contain unJSONable content there will be trouble
  • other types, such as datetime: these are converted to strings with identifying type label prefixes and stored in structured_value

The float_value and string_value columns allow us to provide indices for these kinds of tag values.
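
The layout can be pictured with a small SQLite sketch. The DDL below is an assumption written for illustration, not the schema which the ORM generates: column types, index names and the sample rows are invented, and the tag_subvalues table is omitted.

```python
import sqlite3

SCHEMA_SKETCH = """
CREATE TABLE entities (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE,       -- NULLable; many rows may have a NULL name
    unixtime REAL
);
CREATE TABLE tags (
    entity_id INTEGER NOT NULL REFERENCES entities (id),
    name TEXT NOT NULL,
    float_value REAL,
    string_value TEXT,
    structured_value TEXT,  -- JSON text
    UNIQUE (entity_id, name)
);
CREATE INDEX tags_by_float ON tags (name, float_value);
CREATE INDEX tags_by_string ON tags (name, string_value);
"""

conn = sqlite3.connect(':memory:')
conn.executescript(SCHEMA_SKETCH)

# Two unnamed log entries and one named entity.
for name, when in [
    (None, 1710547200.0),
    (None, 1710547260.0),
    ('colours.blue', 1710547300.0),
]:
    conn.execute(
        'INSERT INTO entities (name, unixtime) VALUES (?, ?)', (name, when)
    )
conn.execute(
    "INSERT INTO tags (entity_id, name, string_value)"
    " VALUES (1, 'headline', 'fixed the gate')"
)
conn.execute(
    "INSERT INTO tags (entity_id, name, float_value)"
    " VALUES (3, 'wavelength_nm', 470.0)"
)

# Look up entities by a float valued tag.
found = conn.execute(
    'SELECT e.id, e.name FROM entities e'
    ' JOIN tags t ON t.entity_id = e.id'
    ' WHERE t.name = ? AND t.float_value = ?',
    ('wavelength_nm', 470.0),
).fetchall()

# A second entity with the same non-NULL name violates uniqueness.
try:
    conn.execute(
        'INSERT INTO entities (name, unixtime) VALUES (?, ?)',
        ('colours.blue', 0.0),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The sketch shows unnamed entities coexisting under the uniqueness rule for names, and a lookup which can be served by an index on a plain typed column.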

The type label scheme takes advantage of the fact that actual strs are stored in the string_value column. Because of this, there will be no actual strings in structured_value. Therefore, we can convert nonJSONable types to str and store them here.

The scheme used is to provide conversion functions to convert types to str and back, and an associated "type label" prefix. For example, we store a datetime as the ISO format of the datetime with "datetime:" prefixed to it.

The actual conversions are kept with the SQLTagSet class (or any subclass). This ORM receives the 3-tuples of SQL ready values from that class as the PolyValue namedtuple and does not perform any conversion itself. The conversion process is described in SQLTagSet.

Class SQTCriterion(cs.tagset.TagSetCriterion, cs.deco.Promotable)

Subclass of TagSetCriterion requiring an .sql_parameters method which returns an SQLParameters providing the information required to construct an sqlalchemy query. It also resets .CRITERION_PARSE_CLASSES, which will pick up the SQL capable criterion classes below.

Class SQTEntityIdTest(SQTCriterion, cs.tagset.TagSetCriterion, cs.deco.Promotable)

A test on entity.id.

Function verbose(msg, *a)

Emit message if in verbose mode.

Release Log

Release 20240316: Fixed release upload artifacts.

Release 20240305: SQLTags: new .from_str so that we can inherit Promotable.promote.

Release 20240201.1: Release with the "sqltags" script.

Release 20240201:

  • SQLTagSet.to_polyvalue: treat sets like lists.
  • SQLTags.default_factory: honour new skip_refresh parameter, apply any presupplied tags.
  • Pull the cmd_* methods from BaseSQLTagsCommand into new SQLTagsCommandsMixin for reuse.

Release 20230612:

  • SQLTagBasedTest.sql_parameters: fix general tag name.
  • SQLTagSet: new jsonable class method to produce a JSON serialisable object - converts sets and Sequences to flat lists, etc.

Release 20230217: SQLTagsORM.search: previous changes seem to have dropped SQTCriterion support.

Release 20230212.1: Mark SQLTags as promotable.

Release 20230212:

  • @promote support for SQLTags, promoting a filesystem path to a .sqlite db.
  • Simpler SQLTagsORM.search comparison implementation.
  • SQLTagSet: inherit format attributes from superclasses (TagSet).
  • New BaseSQLTagsCommand.cmd_shell method.
  • New BaseSQLTagsCommand.cmd_orm method with "define_schema" subcommand to update the db schema.
  • SQLTagsORM.__init__: drop case_sensitive, no longer supported?
  • SQLTagsORM.__init__: always call define_schema, it seems there are scenarios where this does some necessary sqlalchemy prep.

Release 20221228: SQLTagsCommand: update implementation of BaseCommand.run_context to use super().run_context().

Release 20220806:

  • Bugfix for SQLTagsORM.search(mode='entity').
  • SQLTags.find: new _without_tags=False parameter to allow fast searches omitting the entity tags.

Release 20220606:

  • New SQLTagsORM.Entities.add_new_tags method, use it in SQLTags.default_factory for bulk insert.
  • SQTCriterion: new .from_equality(tag_name,tag_value) factory to make an equality criterion.
  • SQLTags.find: accept criteria as positional parameters instead of a single iterable, accept new keyword parameters as equality criteria.
  • SQLTags.__getitem__: accept a slice to index the .unixtime tag.
  • SQLTagsORM: also turn on echo mode if "ECHO" in $SQLTAGS_MODES.

Release 20220311: Assorted updates.

Release 20211212:

  • Rename edit_many to edit_tagsets for clarity.
  • Small bugfixes.

Release 20210913:

  • SQLTagsCommand: rename cmd_ns to cmd_list,cmd_ls.
  • SQLTagsCommand.cmd_export: accept "-F export_format" for csv or fstags export, accept no criteria to mean all tagsets.
  • Encoding schema for nonJSONable types.
  • Rename the TagSets abstract base class to BaseTagSets.
  • BaseSQLTagsCommand.cmd_edit: implement rename.
  • Many other internal small changes.

Release 20210420:

  • New PolyValueMixin pulled out of Tags for common support of the (float_value,string_value,structured_value) columns.
  • SQLTagsORM: new TagSubValues relation containing broken out values for values which are sequences, to support efficient lookup of sequence values such as log entry categories.
  • New BaseSQLTagsCommand.parse_categories static method to parse FOO,BAH into ['foo','bah'].
  • sqltags find: change default format to "{datetime} {headline}".
  • Assorted small changes.

Release 20210404:

  • SQLTags.__getitem__: when autocreating an entity, do it in a new session so that the entity is committed to the database before any further use.
  • SQLTagsCommand: new cmd_dbshell to drop you into the database.

Release 20210321: Drop logic now merged with cs.sqlalchemy_utils, use the new default session stuff.

Release 20210306.1: Docstring updates.

Release 20210306: Initial release.
