Tags and sets of tags with __format__ support and optional ontology information.

Latest release 20210306:

  • ExtendedNamespace,TagSetNamespace: move the .[:alpha:]* attribute support from ExtendedNamespace to TagSetNamespace because it requires Tags.

  • TagSetNamespace.__getattr__: new _i, _s, _f suffixes to return int, str or float tag values (or None); fold _lc in with these.

  • Pull most of TaggedEntity out into TaggedEntityMixin for reuse by domain specific tagged entities.

  • TaggedEntity: new .set and .discard methods.

  • TaggedEntity: new as_editable_line, from_editable_line, edit and edit_entities methods to support editing entities using a text editor.

  • ontologies: type entries are now prefixed with "type." and metadata entries are prefixed with "meta."; provide a worked ontology example in the introduction and improve related docstrings.

  • TagsOntology: new .types(), .types_names(), .meta(type_name,value), .meta_names() methods.

  • TagsOntology.__getitem__: create missing TagSets on demand.

  • New TagsOntologyCommand, initially with a "type [type_name [{edit|list}]]" subcommand, ready for use as the cmd_ont subcommand of other tag related commands.

  • TagSet: support initialisation like a dict including keywords, and move the ontology parameter to _ontology.

  • TagSet: include AttrableMappingMixin to enable attribute access to values when there is no conflict with normal methods.

  • UUID encode/decode support.

  • Honour $TAGSET_EDITOR or $EDITOR as preferred interactive editor for tags.

  • New TagSet.subtags(prefix) to extract a subset of the tags.

  • TagsOntology.value_metadata: new optional convert parameter to override the default "convert human friendly name" algorithm, particularly to pass convert=str to things which are already the basic id.

  • Rename TaggedEntity to TagSet.

  • Rename TaggedEntities to TagSets.

  • TagSet: new csvrow and from_csvrow methods imported from obsolete TaggedEntityMixin class.

  • Move BaseTagFile from cs.fstags to TagFile in cs.tagset.

  • TagSet: support access to the tag "c.x" via attributes provided there is no "c" tag in the way.

  • TagSet.unixtime: implement the autoset-to-now semantics.

  • New as_timestamp(): convert date, datetime, int or float to a UNIX timestamp.

  • Assorted docstring updates and bugfixes.

    See cs.fstags for support for applying these to filesystem objects such as directories and files.

    See cs.sqltags for support for databases of entities with tags, not directly associated with filesystem objects. This is suited to both log entries (entities with no "name") and large collections of named entities; both accept Tags and can be searched on that basis.

    All of the available complexity is optional: you can use Tags without bothering with TagSets or TagsOntologys.

    This module contains the following main classes:

    • Tag: an object with a .name and optional .value (default None) and also an optional reference .ontology for associating semantics with tag values. The .value, if not None, will often be a string, but may be any Python object. If you're using these via cs.fstags, the object will need to be JSON transcribeable.
    • TagSet: a dict subclass representing a set of Tags to associate with something; it also has setlike .add and .discard methods. As such it only supports a single Tag for a given tag name, but that tag value can of course be a sequence or mapping for more elaborate tag values.
    • TagsOntology: a mapping of type names to TagSets defining the type. This mapping also contains entries for the metadata for specific type values.

    Here's a simple example with some Tags and a TagSet.

      >>> tags = TagSet()
      >>> # add a "bare" Tag named 'blue' with no value
      >>> tags.add('blue')
      >>> # add a "topic=tagging" Tag
      >>> tags.add('topic','tagging')
      >>> # make a "subtopic" Tag and add it
      >>> subtopic = Tag('subtopic', 'ontologies')
      >>> tags.add(subtopic)
      >>> # Tags have nice repr() and str()
      >>> subtopic
      Tag(name='subtopic',value='ontologies',ontology=None)
      >>> print(subtopic)
      subtopic=ontologies
      >>> # TagSets also have nice repr() and str()
      >>> tags
      TagSet:{'blue': None, 'topic': 'tagging', 'subtopic': 'ontologies'}
      >>> print(tags)
      blue subtopic=ontologies topic=tagging
      >>> tags2 = TagSet({'a': 1}, b=3, c=[1,2,3], d='dee')
      >>> tags2
      TagSet:{'a': 1, 'b': 3, 'c': [1, 2, 3], 'd': 'dee'}
      >>> print(tags2)
      a=1 b=3 c=[1,2,3] d=dee
      >>> # since you can print a TagSet to a file as a line of text
      >>> # you can get it back from a line of text
      >>> TagSet.from_line('a=1 b=3 c=[1,2,3] d=dee')
      TagSet:{'a': 1, 'b': 3, 'c': [1, 2, 3], 'd': 'dee'}
      >>> # because TagSets are dicts you can format strings with them
      >>> print('topic:{topic} subtopic:{subtopic}'.format_map(tags))
      topic:tagging subtopic:ontologies
      >>> # TagSets have convenient membership tests
      >>> # test for blueness
      >>> 'blue' in tags
      True
      >>> # test for redness
      >>> 'red' in tags
      False
      >>> # test for any "subtopic" tag
      >>> 'subtopic' in tags
      True
      >>> # test for subtopic=ontologies
      >>> subtopic in tags
      True
      >>> # test for subtopic=libraries
      >>> subtopic2 = Tag('subtopic', 'libraries')
      >>> subtopic2 in tags
      False
    

== Ontologies ==

Tags and TagSets suffice to apply simple annotations to things. However, an ontology brings meaning to those annotations.

See the TagsOntology class for implementation details, access methods and more examples.

Consider a record about a movie, with this TagSet:

title="Avengers Assemble"
series="Avengers (Marvel)"
cast={"Scarlett Johansson":"Black Widow (Marvel)"}

where we have the movie title, a name for the series in which it resides, and a cast as an association of actors with roles.

An ontology lets us associate implied types and metadata with these values.

Here's an example ontology supporting the above TagSet:

type.cast type=dict key_type=person member_type=character description="members of a production"
type.character description="an identified member of a story"
type.series type=str
meta.character.marvel.black_widow type=character names=["Natasha Romanov"]
meta.person.scarlett_johansson fullname="Scarlett Johansson" bio="Known for Black Widow in the Marvel stories."

The type information for a cast is defined by the ontology entry named type.cast, which tells us that a cast Tag is a dict, whose keys are of type person and whose values are of type character. (The default type is str.)

To find out the underlying type for a character we look that up in the ontology in turn; because it does not have a specified type Tag, it is taken to be a str.

Having the types for a cast, it is now possible to look up the metadata for the described cast members.

The key "Scarlett Johansson" is a person (from the type definition of cast). The ontology entry for her is named meta.person.scarlett_johansson which is computed as:

  • meta: the name prefix for metadata entries
  • person: the type name
  • scarlett_johansson: obtained by downcasing "Scarlett Johansson" and replacing whitespace with an underscore. The full conversion process is defined by the TagsOntology.value_to_tag_name function.

The key "Black Widow (Marvel)" is a character (again, from the type definition of cast). The ontology entry for her is named meta.character.marvel.black_widow which is computed as:

  • meta: the name prefix for metadata entries
  • character: the type name
  • marvel.black_widow: obtained by downcasing "Black Widow (Marvel)", replacing whitespace with an underscore, and moving a bracketed suffix to the front as an unbracketed prefix. The full conversion process is defined by the TagsOntology.value_to_tag_name function.
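
The naming steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the library's TagsOntology.value_to_tag_name: the helper names sketch_value_to_tag_name and sketch_meta_entry_name are hypothetical, the ontology is modelled as a plain dict, and only the rules described above are covered (runs of whitespace are collapsed to a single underscore, which is an assumption).

```python
import re

def sketch_value_to_tag_name(value: str) -> str:
    """Hypothetical helper: convert a human friendly value to a dotted name.

    Covers only the rules described above: downcase, replace whitespace
    with '_', and move a trailing "(suffix)" to the front as a prefix.
    """
    value = value.strip()
    prefix = None
    m = re.fullmatch(r'(.*?)\s*\(([^()]*)\)', value)
    if m:
        value, prefix = m.group(1), m.group(2).strip()
    name = re.sub(r'\s+', '_', value.lower())
    if prefix:
        name = re.sub(r'\s+', '_', prefix.lower()) + '.' + name
    return name

def sketch_meta_entry_name(type_name: str, value: str) -> str:
    """Hypothetical helper: compose the ontology entry name for a value."""
    return 'meta.' + type_name + '.' + sketch_value_to_tag_name(value)

# A plain dict standing in for the example ontology above.
ontology = {
    'type.cast': {'type': 'dict', 'key_type': 'person', 'member_type': 'character'},
    'meta.person.scarlett_johansson': {'fullname': 'Scarlett Johansson'},
    'meta.character.marvel.black_widow': {'names': ['Natasha Romanov']},
}
cast = {'Scarlett Johansson': 'Black Widow (Marvel)'}
typedata = ontology['type.cast']
for actor, role in cast.items():
    actor_entry = sketch_meta_entry_name(typedata['key_type'], actor)
    role_entry = sketch_meta_entry_name(typedata['member_type'], role)
    print(actor_entry, ontology[actor_entry])
    print(role_entry, ontology[role_entry])
```

The two entry names computed in the loop are the ones named in the text: meta.person.scarlett_johansson and meta.character.marvel.black_widow.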

== Format Strings ==

While you can just use str.format_map as shown above for the direct values in a TagSet (and some command line tools like fstags use this in output format specifications), you can also use TagSets in format strings.

There is a TagSet.ns() method which constructs an enhanced type of SimpleNamespace from the tags in the set which allows convenient dot notation use in format strings, for example:

  tags = TagSet(colour='blue', labels=['a','b','c'], size=9, _ontology=ont)
  ns = tags.ns()
  print(f'colour={ns.colour}, info URL={ns.colour._meta.url}')
  colour=blue, info URL=https://en.wikipedia.org/wiki/Blue

There is a detailed run down of this in the TagSetNamespace docstring below.

Function as_unixtime(*a, **kw)

Convert a tag value to a UNIX timestamp.

This accepts int or float (already a timestamp) and date or datetime. For a non-naive datetime it uses datetime.timestamp(); otherwise it uses time.mktime(tag_value.timetuple()), which assumes the local time zone.
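
The conversion rules can be sketched with the standard library alone. This is a hedged sketch, not the module's code: the name sketch_as_unixtime is hypothetical, returning a float for int input is an assumption, and the TypeError for other types is also an assumption.

```python
import time
from datetime import date, datetime, timezone

def sketch_as_unixtime(tag_value):
    """Hypothetical sketch of the conversion rules described above."""
    if isinstance(tag_value, (int, float)):
        # already a timestamp (returning a float is an assumption)
        return float(tag_value)
    if isinstance(tag_value, datetime) and tag_value.tzinfo is not None:
        # timezone aware datetime: unambiguous
        return tag_value.timestamp()
    if isinstance(tag_value, date):
        # naive datetime or plain date: interpreted in the local time zone
        return time.mktime(tag_value.timetuple())
    # assumption: anything else is rejected
    raise TypeError(f'unsupported value for a UNIX timestamp: {tag_value!r}')

print(sketch_as_unixtime(datetime(1970, 1, 2, tzinfo=timezone.utc)))  # 86400.0
```

Note that datetime is tested before date because datetime is a subclass of date.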

Class ExtendedNamespace(types.SimpleNamespace)

Subclass SimpleNamespace with inferred attributes intended primarily for use in format strings. As such it also presents attributes as [] elements via __getitem__.

Because [:alpha:]* attribute names are reserved for "public" keys/attributes, most methods commence with an underscore (_).

Method ExtendedNamespace.__format__(self, *a, **kw)

The default formatted form of this node. The value to format is '{type':'path'['public_keys']', where type, path and public_keys are placeholders.

Method ExtendedNamespace.__getattr__(self, attr)

Just a stub so that (a) subclasses can call super().__getattr__ and (b) a path-based AttributeError gets raised for better context.

Method ExtendedNamespace.__len__(self)

The number of public keys.

Method ExtendedNamespace.__str__(self)

Return a visible placeholder, supporting exposing this object in a format string so that the user knows there wasn't a value at this point in the dotted path.

Function main(_)

Test code.

Class RegexpTagRule

A regular expression based Tag rule.

This applies a regular expression to a string and returns inferred Tags.

Method RegexpTagRule.infer_tags(self, *a, **kw)

Apply the rule to the string s, return a list of Tags.

Class Tag(Tag,builtins.tuple)

A Tag has a .name (str) and a .value and an optional .ontology.

The name must be a dotted identifier.

Terminology:

  • A "bare" Tag has a value of None.
  • A "naive" Tag has an ontology of None.

The constructor for a Tag is unusual:

  • both the value and ontology are optional, defaulting to None
  • if name is a str then we always construct a new Tag with the supplied values
  • if name is not a str it should be a Taglike object to promote; it is an error if the value parameter is not None in this case

The promotion process is as follows:

  • if name is a Tag subinstance then if the supplied ontology is not None and is not the ontology associated with name then a new Tag is made, otherwise name is returned unchanged
  • otherwise a new Tag is made from name using its .value and overriding its .ontology if the ontology parameter is not None
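
The constructor and promotion rules above can be sketched as a plain function. This is only an illustration: SketchTag and sketch_promote are hypothetical names, and raising ValueError for the error case is an assumption, since the text only says it is an error.

```python
from collections import namedtuple
from types import SimpleNamespace

# Hypothetical stand-in for Tag: a tuple of (name, value, ontology).
SketchTag = namedtuple('SketchTag', 'name value ontology')

def sketch_promote(name, value=None, ontology=None):
    """Hypothetical sketch of the constructor rules described above."""
    if isinstance(name, str):
        # a str name always makes a new tag from the supplied values
        return SketchTag(name, value, ontology)
    if value is not None:
        # assumption: the error is reported as a ValueError
        raise ValueError('value must be None when promoting a Taglike object')
    if isinstance(name, SketchTag):
        if ontology is not None and ontology is not name.ontology:
            # a different ontology was requested: make a new tag
            return SketchTag(name.name, name.value, ontology)
        return name
    # some other Taglike object: copy it, optionally overriding the ontology
    if ontology is None:
        ontology = getattr(name, 'ontology', None)
    return SketchTag(name.name, name.value, ontology)

tag = sketch_promote('colour', 'blue')
print(sketch_promote(tag) is tag)
taglike = SimpleNamespace(name='size', value=9)
print(sketch_promote(taglike))
```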

Method Tag.__str__(self)

Encode name and value.

Property Tag.basetype

The base type name for this tag. Returns None if there is no ontology.

This calls TagsOntology.basetype(self.ontology,self.type).

Method Tag.from_str(s, offset=0, ontology=None)

Parse a Tag definition from s at offset (default 0).

Method Tag.is_valid_name(name)

Test whether a tag name is valid: a dotted identifier.

Method Tag.key_metadata(self, *a, **kw)

Return the metadata definition for key.

The metadata TagSet is obtained from the ontology entry 'meta.type.key_tag_name' where type is the Tag's key_type and key_tag_name is the key converted into a dotted identifier by TagsOntology.value_to_tag_name.

Property Tag.key_type

The type name for the keys of this tag.

This is required if .value is a mapping.

Property Tag.key_typedata

The typedata definition for this Tag's keys.

This is for Tags which store mappings, for example a movie cast, mapping actors to roles.

The name of the key type comes from the key_type entry from self.typedata. That name is then looked up in the ontology's types.

Method Tag.matches(self, name, value=None, *a, **kw)

Test whether this Tag matches (tag_name,value).

Method Tag.member_metadata(self, *a, **kw)

Return the metadata definition for self[member_key].

The metadata TagSet is obtained from the ontology entry 'meta.type.member_tag_name' where type is the Tag's member_type and member_tag_name is the member value converted into a dotted identifier by TagsOntology.value_to_tag_name.

Property Tag.member_type

The type name for members of this tag.

This is required if .value is a sequence or mapping.

Property Tag.member_typedata

The typedata definition for this Tag's members.

This is for Tags which store mappings or sequences, for example a movie cast, mapping actors to roles, or a list of scenes.

The name of the member type comes from the member_type entry from self.typedata. That name is then looked up in the ontology's types.

Property Tag.meta

The Tag metadata derived from the Tag's ontology.

Method Tag.metadata(self, ontology=None, convert=None)

Fetch the metadata information about this specific tag value, derived through the ontology from the tag name and value. The default ontology is self.ontology.

For a scalar type (int, float, str) this is the ontology TagSet for self.value.

For a sequence (list) this is a list of the metadata for each member.

For a mapping (dict) this is a mapping of key->value_metadata.

Method Tag.parse(s, offset=0, *, ontology)

Parse tag_name[=value], return (Tag,offset).

Method Tag.parse_name(s, offset=0)

Parse a tag name from s at offset: a dotted identifier.

Method Tag.parse_value(s, offset=0)

Parse a value from s at offset (default 0). Return the value, or None on no data.

Method Tag.transcribe_value(value)

Transcribe value for use in Tag transcription.

Property Tag.type

The type name for this Tag.

Unless the definition for self.name has a type tag, the type is self.ontology.value_to_tag_name(self.name).

For example, the tag series="Avengers (Marvel)" would look up the definition for series. If that had no type= tag, then the type would default to series which is what would be returned.

The corresponding metadata TagSet for that tag would have the name meta.series.marvel.avengers.

By contrast, the tag cast={"Scarlett Johansson":"Black Widow (Marvel)"} would look up the definition for cast which might look like this:

type.cast type=dict key_type=person member_type=character

That says that the type name is dict, which is what would be returned.

Because the type is dict the definition also has key_type and member_type tags identifying the type names for the keys and values of the cast= tag. As such, the corresponding metadata TagSets in this example would be named meta.person.scarlett_johansson and meta.character.marvel.black_widow respectively.

Property Tag.typedata

The defining TagSet for this tag's name.

This is how its type is defined, and is obtained from: self.ontology['type.'+self.name]

For example, a Tag colour=blue gets its type information from the type.colour entry in an ontology.

Method Tag.with_prefix(name, value, *, ontology=None, prefix)

Make a new Tag whose name is prefixed with prefix+'.'.

Function tag_or_tag_value(*da, **dkw)

A decorator for functions or methods which may be called as:

func(name, [value])

or as:

func(Tag, [None])

The optional decorator argument no_self (default False) should be supplied for plain functions as they have no leading self parameter to accommodate.

Example:

@tag_or_tag_value
def add(self, tag_name, value, *, verbose=None):

This defines a .add() method which can be called with a name and value or with a single Taglike object (something with .name and .value attributes), for example:

tags = TagSet()
....
tags.add('colour', 'blue')
....
tag = Tag('size', 9)
tags.add(tag)

Class TagBasedTest(TagBasedTest,builtins.tuple,TagSetCriterion)

A test based on a Tag.

Attributes:

  • spec: the source text from which this choice was parsed, possibly None
  • choice: the apply/reject flag
  • tag: the Tag representing the criterion
  • comparison: an indication of the test comparison

The following comparison values are recognised:

  • None: test for the presence of the Tag
  • '=': test that the tag value equals tag.value
  • '<': test that the tag value is less than tag.value
  • '<=': test that the tag value is less than or equal to tag.value
  • '>': test that the tag value is greater than tag.value
  • '>=': test that the tag value is greater than or equal to tag.value
  • '~/': test if the tag value as a regexp is present in tag.value
  • '~': test if a matching tag value is present in tag.value

Method TagBasedTest.by_tag_value(name, value=None, *a, **kw)

Return a TagBasedTest based on a Tag or tag_name,tag_value.

Method TagBasedTest.match_tagged_entity(self, te: 'TagSet') -> bool

Test against the Tags in tags.

Note: comparisons when self.tag.name is not in tags always return False (possibly inverted by self.choice).
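
The ordering comparisons and the handling of a missing tag can be sketched as follows. This is an illustrative sketch only: sketch_match and SKETCH_COMPARISONS are hypothetical names, the TagSet is modelled as a plain dict, and the '~/' and '~' forms are deliberately left out because their exact matching rules are not spelled out here.

```python
import operator

# Hypothetical table mapping comparison markers to operator functions.
SKETCH_COMPARISONS = {
    '=': operator.eq,
    '<': operator.lt,
    '<=': operator.le,
    '>': operator.gt,
    '>=': operator.ge,
}

def sketch_match(tags, name, value, comparison, choice=True):
    """Hypothetical sketch: test the mapping `tags` against one criterion."""
    if name not in tags:
        # a missing tag never matches
        result = False
    elif comparison is None:
        # presence test only
        result = True
    else:
        result = SKETCH_COMPARISONS[comparison](tags[name], value)
    # choice=False inverts the outcome, including for missing tags
    return result if choice else not result

tags = {'size': 9, 'colour': 'blue'}
print(sketch_match(tags, 'size', 5, '>'))
print(sketch_match(tags, 'weight', 1, '=', choice=False))
```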

Method TagBasedTest.parse(s, offset=0, delim=None)

Parse tag_name[{<|<=|'='|'>='|>|'~'}value] and return (dict,offset) where the dict contains the following keys and values:

  • tag: a Tag embodying the tag name and value
  • comparison: an indication of the test comparison

Class TagFile(cs.obj.SingletonMixin,TagSets,cs.resources.MultiOpenMixin)

A reference to a specific file containing tags.

This manages a mapping of name => TagSet, itself a mapping of tag name => tag value.

Method TagFile.__setitem__(self, name, te)

Set item name to te.

Method TagFile.get(self, name, default=None)

Get from the tagsets.

Method TagFile.items(self, prefix=None)

tagsets.items

If the optional prefix is supplied, yield only those items whose keys start with prefix.

Method TagFile.keys(self, prefix=None)

tagsets.keys

If the optional prefix is supplied, yield only those keys starting with prefix.

Method TagFile.load_tagsets(filepath, ontology)

Load filepath and return (tagsets,unparsed).

The returned tagsets are a mapping of name=>tag_name=>value. The returned unparsed is a list of (lineno,line) for lines which failed the parse (excluding the trailing newline).

Property TagFile.names

The names from this TagFile as a list.

Method TagFile.parse_tags_line(*a, **kw)

Parse a "name tags..." line as from a .fstags file, return (name,TagSet).

Method TagFile.save(self)

Save the tag map to the tag file.

Method TagFile.save_tagsets(*a, **kw)

Save tagsets and unparsed to filepath.

This method will create the required intermediate directories if missing.

Method TagFile.shutdown(self)

Save the tagsets if modified.

Method TagFile.startup(self)

No special startup.

Method TagFile.tags_line(name, tags)

Transcribe a name and its tags for use as a .fstags file line.

Property TagFile.tagsets

The tag map from the tag file, a mapping of name=>TagSet.

This is loaded on demand.

Method TagFile.update(self, name, tags, *, prefix=None, verbose=None)

Update the tags for name from the supplied tags as for TagSet.update.

Method TagFile.values(self, prefix=None)

tagsets.values

If the optional prefix is supplied, yield only those values whose keys start with prefix.

Class TagsCommandMixin

Utility methods for cs.cmdutils.BaseCommand classes working with tags.

Optional subclass attributes:

  • TAGSET_CRITERION_CLASS: a TagSetCriterion duck class, default TagSetCriterion. For example, cs.sqltags has a subclass with an .extend_query method for computing an SQL JOIN used in searching for tagged entities.

Method TagsCommandMixin.parse_tag_choices(argv)

Parse argv as an iterable of [!]tag_name[=tag_value] Tag additions/deletions.

Method TagsCommandMixin.parse_tagset_criteria(argv, tag_based_test_class=None)

Parse tag specifications from argv until an unparseable item is found. Return (criteria,argv) where criteria is a list of the parsed criteria and argv is the remaining unparsed items.

Each item is parsed via cls.parse_tagset_criterion(item,tag_based_test_class).

Method TagsCommandMixin.parse_tagset_criterion(arg, tag_based_test_class=None)

Parse arg as a tag specification and return a tag_based_test_class instance via its .from_str factory method. Raises ValueError on a misparse. The default tag_based_test_class comes from cls.TAGSET_CRITERION_CLASS, which itself defaults to class TagSetCriterion.

The default TagSetCriterion.from_str recognises:

  • -tag_name: a negative requirement for tag_name
  • tag_name[=value]: a positive requirement for a tag_name with optional value.
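
The two recognised forms, and the "parse until an unparseable item" behaviour of parse_tagset_criteria, can be sketched like this. The function names are hypothetical, criteria are returned as plain (choice, name, value) tuples rather than criterion objects, values are kept as raw strings, and a leading '-' is accepted before a name=value form as well; all of these are simplifying assumptions.

```python
def sketch_parse_criterion(arg):
    """Hypothetical sketch: return (choice, tag_name, value) from arg."""
    choice = True
    if arg.startswith('-'):
        # negative requirement
        choice = False
        arg = arg[1:]
    name, sep, value = arg.partition('=')
    # a tag name is a dotted identifier
    if not name or not all(part.isidentifier() for part in name.split('.')):
        raise ValueError(f'invalid tag name in {arg!r}')
    return choice, name, (value if sep else None)

def sketch_parse_criteria(argv):
    """Hypothetical sketch: parse leading criteria, return (criteria, rest)."""
    argv = list(argv)
    criteria = []
    while argv:
        try:
            criterion = sketch_parse_criterion(argv[0])
        except ValueError:
            # stop at the first unparseable item
            break
        criteria.append(criterion)
        argv.pop(0)
    return criteria, argv

print(sketch_parse_criteria(['blue', '-red', 'topic=tagging', '/some/path']))
```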

Class TagSet(builtins.dict,cs.lex.FormatableMixin,cs.mappings.AttrableMappingMixin)

A setlike class associating a set of tag names with values.

This actually subclasses dict, so a TagSet is a direct mapping of tag names to values. It accepts attribute access to simple tag values when they do not conflict with the class methods; the reliable method is normal item access.

NOTE: iteration yields Tags, not dict keys.

Also note that all the Tags from TagSet share its ontology.

Subclasses should override the set and discard methods; the dict and mapping methods are defined in terms of these two basic operations.

TagSets have a few special properties:

  • id: a domain specific identifier; this may reasonably be None for entities not associated with database rows; the cs.sqltags.SQLTags class associates this with the database row id.
  • name: the entity's name; a read only alias for the 'name' Tag. The cs.sqltags.SQLTags class defines "log entries" as TagSets with no name.
  • unixtime: a UNIX timestamp, a float holding seconds since the UNIX epoch (midnight, 1 January 1970 UTC). This is typically the row creation time for entities associated with database rows.

Because TagSet subclasses cs.mappings.AttrableMappingMixin you can also access tag values as attributes provided that they do not conflict with instance attributes or class methods or properties. The TagSet class defines the class attribute ATTRABLE_MAPPING_DEFAULT as None which causes attribute access to return None for missing tag names. This supports code like:

if tags.title:
    ...  # use the title in something
else:
    ...  # handle a missing title tag
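
The attribute access behaviour can be imitated with a small dict subclass. This is a sketch of the idea, not cs.mappings.AttrableMappingMixin itself: the class name SketchAttrableDict is hypothetical, and refusing names that start with an underscore is an added assumption to keep Python's special attribute lookups working.

```python
class SketchAttrableDict(dict):
    """Hypothetical dict with attribute access to its keys, default None."""

    def __getattr__(self, attr):
        # Only called when normal attribute lookup fails, so real methods
        # such as .keys always win over a tag of the same name.
        if attr.startswith('_'):
            raise AttributeError(attr)
        return self.get(attr)

tags = SketchAttrableDict(title='Avengers Assemble', keys='shadowed')
print(tags.title)
print(tags.subtitle)   # missing tag: None rather than AttributeError
print(tags['keys'])    # item access is the reliable form
```

The 'keys' tag shows why item access is the reliable method: tags.keys is still the dict method.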

Method TagSet.__init__(self, *a, **kw)

Initialise the TagSet.

Parameters:

  • positional parameters initialise the dict and are passed to dict.__init__
  • _id: optional identity value for databaselike implementations
  • _ontology: optional TagsOntology to use for this TagSet
  • other alphabetic keyword parameters are also used to initialise the dict and are passed to dict.__init__

Method TagSet.__getattr__(self, attr)

Support access to dotted name attributes if attr is not found via the superclass __getattr__.

This is done by returning the subtags of those tags commencing with attr+'.'.

Example:

>>> tags=TagSet(a=1,b=2)
>>> tags.a
1
>>> tags.c
>>> tags['c.z']=9
>>> tags['c.x']=8
>>> tags
TagSet:{'a': 1, 'b': 2, 'c.z': 9, 'c.x': 8}
>>> tags.c
TagSet:{'z': 9, 'x': 8}
>>> tags.c.z
9

However, this is not supported when there is a tag named 'c' because tags.c has to return the 'c' tag value:

>>> tags=TagSet(a=1,b=2,c=3)
>>> tags.a
1
>>> tags.c
3
>>> tags['c.z']=9
>>> tags.c.z
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'int' object has no attribute 'z'

Method TagSet.__iter__(self, prefix=None, ontology=None)

Yield the tag data as Tags.

Method TagSet.__setattr__(self, attr, value)

Attribute based Tag access.

If attr is in self.__dict__ then that is updated, supporting "normal" attributes set on the instance. Otherwise the Tag named attr is set to value.

The __init__ methods of subclasses should do something like this (from TagSet.__init__) to set up the ordinary instance attributes which are not to be treated as Tags:

self.__dict__.update(id=_id, ontology=_ontology, modified=False)
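
The split between ordinary instance attributes and Tags can be sketched with a dict subclass. SketchTagDict is a hypothetical class written for illustration; it follows the rule stated above but is not the TagSet implementation.

```python
class SketchTagDict(dict):
    """Hypothetical dict whose unknown attributes are stored as items."""

    def __init__(self, *a, _id=None, _ontology=None, **kw):
        super().__init__(*a, **kw)
        # ordinary instance attributes, set directly to bypass __setattr__
        self.__dict__.update(id=_id, ontology=_ontology, modified=False)

    def __setattr__(self, attr, value):
        if attr in self.__dict__:
            # a "normal" attribute already set on the instance
            self.__dict__[attr] = value
        else:
            # anything else becomes a tag
            self[attr] = value

tags = SketchTagDict(a=1, _id=7)
tags.id = 8            # updates the instance attribute
tags.colour = 'blue'   # stored as a tag
print(tags.id, dict(tags))
```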

Method TagSet.__str__(self)

The TagSet suitable for writing to a tag file.

Method TagSet.add(self, name, value=None, *a, **kw)

Set self[tag_name]=value. If verbose, emit an info message if this changes the previous value.

Method TagSet.as_dict(self)

Return a dict mapping tag name to value.

Method TagSet.as_tags(self, prefix=None, ontology=None)

Yield the tag data as Tags.

Property TagSet.csvrow

This TagSet as a list useful to a csv.writer. The inverse of from_csvrow.

Method TagSet.discard(self, name, value=None, *a, **kw)

Discard the tag matching (tag_name,value). Return a Tag with the old value, or None if there was no matching tag.

Note that if the supplied value is None then the tag is unconditionally discarded. Otherwise the tag is only discarded if its value matches.
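
The conditional discard can be sketched against a plain dict. The function name sketch_discard is hypothetical, and it returns a (name, old_value) tuple where the real method returns a Tag.

```python
def sketch_discard(tags, name, value=None):
    """Hypothetical sketch: discard tags[name], return (name, old_value) or None."""
    if name not in tags:
        return None
    old_value = tags[name]
    if value is not None and old_value != value:
        # a value was supplied and it does not match: keep the tag
        return None
    del tags[name]
    return name, old_value

tags = {'colour': 'blue', 'size': 9}
print(sketch_discard(tags, 'colour', 'red'))   # no match, nothing removed
print(sketch_discard(tags, 'colour'))          # unconditional discard
print(tags)
```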

Method TagSet.edit(self, editor=None, verbose=None)

Edit this TagSet.

Method TagSet.edit_many(*a, **kw)

Edit an iterable of TagSets. Return a list of (old_name,new_name,TagSet) for those which were modified.

This function supports modifying both name and Tags.

Method TagSet.format_kwargs(self, *a, **kw)

Return a TagSetNamespace for this TagSet.

This has many convenience facilities for use in format strings.

Method TagSet.from_csvrow(csvrow)

Construct a TagSet from a CSV row like that from TagSet.csvrow, being unixtime,id,name,tags....

Method TagSet.from_line(line, offset=0, *, ontology=None, verbose=None)

Create a new TagSet from a line of text.

Property TagSet.name

Read only name property, None if there is no 'name' tag.

Method TagSet.ns(self, *a, **kw)

Return a TagSetNamespace for this TagSet.

This has many convenience facilities for use in format strings.

Method TagSet.set(self, name, value=None, *a, **kw)

Set self[tag_name]=value. If verbose, emit an info message if this changes the previous value.

Method TagSet.set_from(self, other, verbose=None)

Completely replace the values in self with the values from other, a TagSet or any other name=>value dict.

This has the feature of logging changes by calling .set and .discard to effect the changes.

Method TagSet.subtags(self, prefix)

Return a new TagSet containing tags commencing with prefix+'.' with the key prefixes stripped off.

Example:

>>> tags = TagSet({'a.b':1, 'a.d':2, 'c.e':3})
>>> tags.subtags('a')
TagSet:{'b': 1, 'd': 2}
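
For readers who want to see the prefix stripping spelled out, here is a minimal sketch over a plain dict. The name sketch_subtags is hypothetical and the function returns a dict rather than a TagSet.

```python
def sketch_subtags(tags, prefix):
    """Hypothetical sketch: keep keys under prefix+'.', stripping the prefix."""
    dotted = prefix + '.'
    return {
        key[len(dotted):]: value
        for key, value in tags.items()
        if key.startswith(dotted)
    }

print(sketch_subtags({'a.b': 1, 'a.d': 2, 'c.e': 3}, 'a'))
```

Note that a key equal to the bare prefix (for example 'a' itself) is not included, because only keys commencing with prefix+'.' qualify.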

Method TagSet.tag(self, tag_name, prefix=None, ontology=None)

Return a Tag for tag_name, or None if missing.

Property TagSet.unixtime

unixtime property, autosets to time.time() if accessed.

Method TagSet.update(self, other, *, prefix=None, verbose=None)

Update this TagSet from other, a dict of {name:value} or an iterable of Taglike or (name,value) things.
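
The accepted forms of other can be sketched as below. This is an assumption-laden illustration: sketch_update is a hypothetical name, the target is a plain dict, and joining prefix to each name with a dot is assumed from the way Tag.with_prefix is described.

```python
from types import SimpleNamespace

def sketch_update(tags, other, prefix=None):
    """Hypothetical sketch: update the dict `tags` from `other`."""
    if hasattr(other, 'items'):
        # a mapping of name => value
        pairs = other.items()
    else:
        # an iterable of Taglike objects or (name, value) pairs
        pairs = (
            (item.name, item.value) if hasattr(item, 'name') else tuple(item)
            for item in other
        )
    for name, value in pairs:
        if prefix:
            # assumption: prefix is joined to the name with a dot
            name = prefix + '.' + name
        tags[name] = value

tags = {}
sketch_update(tags, {'colour': 'blue'})
sketch_update(tags, [SimpleNamespace(name='size', value=9), ('topic', 'tagging')])
sketch_update(tags, {'x': 8}, prefix='c')
print(tags)
```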

Class TagSetCriterion

A testable criterion for a TagSet.

TagSetCriterion.TAG_BASED_TEST_CLASS

A test based on a Tag. This is the TagBasedTest class, whose attributes and comparison values are listed above.

Method TagSetCriterion.from_any(*a, **kw)

Convert some suitable object o into a TagSetCriterion.

Various possibilities for o are:

  • TagSetCriterion: returned unchanged
  • str: a string tests for the presence of a tag with that name and optional value;
  • an object with a .choice attribute; this is taken to be a TagSetCriterion ducktype and returned unchanged
  • an object with .name and .value attributes; this is taken to be Tag-like and a positive test is constructed
  • Tag: an object with a .name and .value is equivalent to a positive equality TagBasedTest
  • (name,value): a 2 element sequence is equivalent to a positive equality TagBasedTest

Method TagSetCriterion.from_str(*a, **kw)

Prepare a TagSetCriterion from the string s.

Method TagSetCriterion.from_str2(s, offset=0, delim=None)

Parse a criterion from s at offset and return (TagSetCriterion,offset).

This method recognises an optional leading '!' or '-' indicating negation of the test, followed by a criterion recognised by the .parse method of one of the classes in cls.CRITERION_PARSE_CLASSES.

Method TagSetCriterion.match_tagged_entity(self, te: 'TagSet') -> bool

Apply this TagSetCriterion to a TagSet.

Class TagSetNamespace(ExtendedNamespace,types.SimpleNamespace)

A formattable nested namespace for a TagSet, subclassing ExtendedNamespace, providing attribute based access to tag data.

TagSets have a .ns() method which returns a TagSetNamespace derived from that TagSet.

This class exists particularly to help with format strings because tools like fstags and sqltags use these for their output formats. As such, I wanted to be able to put some expressive stuff in the format strings.

However, this also gets you attribute style access to various related values without mucking with format strings. For example for some TagSet tags with a colour=blue Tag, if I set ns=tags.ns():

  • ns.colour is itself a namespace based on the colour Tag
  • ns.colour_s is the string 'blue'
  • ns.colour._tag is the colour Tag itself

If the TagSet had an ontology:

  • ns.colour._meta is a namespace based on the metadata for the colour Tag

This provides an assortment of special names derived from the TagSet. See the docstring for __getattr__ for the special attributes provided beyond those already provided by ExtendedNamespace.__getattr__.

Example with a simple TagSet:

>>> tags = TagSet(colour='blue', labels=['a','b','c'], size=9)
>>> 'The colour is {colour}.'.format_map(tags)
'The colour is blue.'
>>> # the natural way to obtain a TagSetNamespace from a TagSet
>>> ns = tags.ns()  # returns TagSetNamespace.from_tagset(tags)
>>> # the ns object has additional computed attributes
>>> 'The colour tag is {colour._tag}.'.format_map(ns)
'The colour tag is colour=blue.'
>>> # also, the direct name for any Tag can be used
>>> # which returns its value
>>> 'The colour is {colour}.'.format_map(ns)
'The colour is blue.'
>>> 'The colours are {colours}. The labels are {labels}.'.format_map(ns)
"The colours are ['blue']. The labels are ['a', 'b', 'c']."
>>> 'The first label is {label}.'.format_map(ns)
'The first label is a.'

The same TagSet with an ontology:

>>> ont = TagsOntology({
...   'type.colour': TagSet(description="a colour, a hue", type="str"),
...   'meta.colour.blue': TagSet(
...     url='https://en.wikipedia.org/wiki/Blue',
...     wavelengths='450nm-495nm'),
... })
>>> tags = TagSet(colour='blue', labels=['a','b','c'], size=9, _ontology=ont)
>>> # the colour Tag
>>> tags.tag('colour')  # doctest: +ELLIPSIS
Tag(name='colour',value='blue',ontology=TagsOntology<...>)
>>> # type information about a colour
>>> tags.tag('colour').type
'str'
>>> tags.tag('colour').typedata
TagSet:{'description': 'a colour, a hue', 'type': 'str'}
>>> # metadata about this particular colour value
>>> tags.tag('colour').meta
TagSet:{'url': 'https://en.wikipedia.org/wiki/Blue', 'wavelengths': '450nm-495nm'}

Using a namespace view of the Tag, useful for format strings:

>>> # the TagSet as a namespace for use in format strings
>>> ns = tags.ns()
>>> # The namespace .colour node, which has the Tag attached.
>>> # When there is a Tag attached, the repr is that of the Tag value.
>>> ns.colour         # doctest: +ELLIPSIS
'blue'
>>> # The underlying colour Tag itself.
>>> ns.colour._tag    # doctest: +ELLIPSIS
Tag(name='colour',value='blue',ontology=TagsOntology<...>)
>>> # The str() of a namespace with a ._tag is the Tag value
>>> # making for easy use in a format string.
>>> f'{ns.colour}'
'blue'
>>> # the type information about the colour Tag
>>> ns.colour._tag.typedata
TagSet:{'description': 'a colour, a hue', 'type': 'str'}
>>> # The metadata: a TagSetNamespace for the metadata TagSet
>>> ns.colour._meta   # doctest: +ELLIPSIS
TagSetNamespace(_path='.', _pathnames=(), _ontology=None, wavelengths='450nm-495nm', url='https://en.wikipedia.org/wiki/Blue')
>>> # the _meta.url is itself a namespace with a ._tag for the URL
>>> ns.colour._meta.url   # doctest: +ELLIPSIS
'https://en.wikipedia.org/wiki/Blue'
>>> # but it formats nicely because it has a ._tag
>>> f'colour={ns.colour}, info URL={ns.colour._meta.url}'
'colour=blue, info URL=https://en.wikipedia.org/wiki/Blue'

Method TagSetNamespace.__bool__(self)

Truthiness: True unless the ._bool attribute overrides that.

Method TagSetNamespace.__format__(self, *a, **kw)

Format this node. If there's a Tag on the node, format its value. Otherwise use the superclass format.

Method TagSetNamespace.__getattr__(self, *a, **kw)

Look up an indirect node attribute, whose value is inferred from another.

The following attribute names and forms are supported:

  • _keys: the keys of the value for the Tag associated with this node; meaningful if self._tag.value has a keys method
  • _meta: a namespace containing the meta information for the Tag associated with this node: self._tag.meta.ns()
  • _type: a namespace containing the type definition for the Tag associated with this node: self._tag.typedata.ns()
  • _values: the values within the Tag.value for the Tag associated with this node
  • baseattr_lc: lowercase and titled forms. If baseattr exists, return its value lowercased via cs.lex.lc_(). Conversely, if baseattr is required and does not directly exist but its baseattr_lc form does, return the value of baseattr_lc titleified using cs.lex.titleify_lc().
  • baseattrs, baseattres: singular/plural. If baseattr exists return [self.baseattr]. Conversely, if baseattr does not exist but one of its plural attributes does, return the first element from the plural attribute.
  • [:alpha:]*: an identifierish name binds to a stub subnamespace so the {a.b.c.d} in a format string can be replaced with itself to present the undefined name in full.
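The singular/plural and _lc rules above can be sketched with a small stand-in class. This is a simplified illustration rather than the cs.tagset implementation: the class name InferringNamespace is hypothetical, plain str.lower() stands in for cs.lex.lc_(), and only the forward _lc direction is shown.

```python
from types import SimpleNamespace


class InferringNamespace(SimpleNamespace):
    """Hypothetical stand-in showing inferred attributes via __getattr__."""

    def __getattr__(self, attr):
        # __getattr__ is only called when normal lookup fails,
        # so directly defined attributes always win.
        d = self.__dict__
        # baseattr_lc: lowercase form of an existing baseattr
        if attr.endswith('_lc') and attr[:-3] in d:
            return str(d[attr[:-3]]).lower()
        # baseattrs / baseattres: wrap an existing singular in a list
        for suffix in ('s', 'es'):
            if attr.endswith(suffix) and attr[:-len(suffix)] in d:
                return [d[attr[:-len(suffix)]]]
        # singular requested: take the first element of an existing plural
        for suffix in ('s', 'es'):
            plural = d.get(attr + suffix)
            if plural:
                return plural[0]
        raise AttributeError(attr)


ns = InferringNamespace(colour='Blue', labels=['a', 'b', 'c'])
print(ns.colours)    # plural inferred from the singular
print(ns.label)      # singular inferred from the plural
print(ns.colour_lc)  # lowercased form
```

Because __getattr__ is a fallback hook, none of these inferred names can shadow a real attribute of the same name.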

Method TagSetNamespace.__getitem__(self, *a, **kw)

If this node has a ._tag then dereference its .value, otherwise fall through to the superclass __getitem__.

Method TagSetNamespace.__str__(self)

A TagSetNamespace with a ._tag renders str(_tag.value), otherwise ExtendedNamespace.__str__ is used.

Method TagSetNamespace.from_tagset(*a, **kw)

Compute and return a presentation of this TagSet as a nested TagSetNamespace.

Note that multiple dots in Tag names are collapsed; for example Tags named 'a.b', 'a..b', 'a.b.' and '..a.b' will all map to the namespace entry a.b.

Tags are processed in reverse lexical order by name, which dictates which of the conflicting multidot names takes effect in the namespace - the first found is used.
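A minimal sketch of the name handling described above, assuming only what the text states: empty path components are dropped, names are visited in reverse lexical order, and the first name to claim a path wins. The helper collapse_names is hypothetical and works on a plain dict rather than a TagSet.

```python
def collapse_names(tags):
    """Map dotted tag names to collapsed paths; the first claimant wins.

    `tags` is a plain dict of tag name -> value (a stand-in for a TagSet).
    """
    claimed = {}
    # reverse lexical order by name, as the text describes
    for name in sorted(tags, reverse=True):
        # drop empty components so 'a..b', 'a.b.' and '..a.b' all become 'a.b'
        path = '.'.join(part for part in name.split('.') if part)
        if path and path not in claimed:
            claimed[path] = (name, tags[name])
    return claimed


tags = {'a.b': 1, 'a..b': 2, 'a.b.': 3, '..a.b': 4, 'c': 5}
for path, (name, value) in collapse_names(tags).items():
    print(path, '<-', repr(name), value)
```

With these names, 'a.b.' sorts last of the four conflicting names, so it is visited first in reverse order and supplies the value for a.b.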

Property TagSetNamespace.key

The key.

Property TagSetNamespace.ontology

The reference ontology.

Property TagSetNamespace.value

The value.

Class TagSets(cs.resources.MultiOpenMixin)

Base class for collections of TagSet instances such as cs.fstags.FSTags and cs.sqltags.SQLTags.

Examples of this include:

  • cs.fstags.FSTags: a mapping of filesystem paths to their associated TagSets
  • cs.sqltags.SQLTags: a mapping of names to TagSets stored in an SQL database

Subclasses must implement:

  • default_factory(self,name,**kw): as with defaultdict this is called from __missing__ for missing names, and also from add. If set to None then __getitem__ will raise KeyError for missing names. Unlike defaultdict, the factory is called with the key name and any additional keyword parameters.
  • get(name,default=None): return the TagSet associated with name, or default.
  • __setitem__(name,tagset): associate a TagSet with the key name; this is called by the __missing__ method with a newly created TagSet.

Subclasses may reasonably want to define the following:

  • startup(self): allocate any needed resources such as database connections
  • shutdown(self): write pending changes to a backing store, release resources acquired during startup
  • keys(self): return an iterable of names
  • __len__(self): return the number of names

Method TagSets.__init__(self, *, ontology=None)

Initialise the collection.

TagSets.TagSetClass

A setlike class associating a set of tag names with values.

This actually subclasses dict, so a TagSet is a direct mapping of tag names to values. It accepts attribute access to simple tag values when they do not conflict with the class methods; the reliable method is normal item access.

NOTE: iteration yields Tags, not dict keys.

Also note that all the Tags from a TagSet share its ontology.

Subclasses should override the set and discard methods; the dict and mapping methods are defined in terms of these two basic operations.

TagSets have a few special properties:

  • id: a domain specific identifier; this may reasonably be None for entities not associated with database rows; the cs.sqltags.SQLTags class associates this with the database row id.
  • name: the entity's name; a read only alias for the 'name' Tag. The cs.sqltags.SQLTags class defines "log entries" as TagSets with no name.
  • unixtime: a UNIX timestamp, a float holding seconds since the UNIX epoch (midnight, 1 January 1970 UTC). This is typically the row creation time for entities associated with database rows.

Because TagSet subclasses cs.mappings.AttrableMappingMixin you can also access tag values as attributes provided that they do not conflict with instance attributes or class methods or properties. The TagSet class defines the class attribute ATTRABLE_MAPPING_DEFAULT as None which causes attribute access to return None for missing tag names. This supports code like:

if tags.title:
    # use the title in something
    print(tags.title)
else:
    # handle a missing title tag
    print("no title")

Method TagSets.__contains__(self, name: str)

Test whether name is present in self.te_mapping.

Method TagSets.__getitem__(self, name: str)

Obtain the TagSet associated with name.

If name is not presently mapped, return self.__missing__(name).

Method TagSets.__len__(self)

Return the length of self.te_mapping.

Method TagSets.__missing__(self, *a, **kw)

Like dict, the __missing__ method may autocreate a new TagSet.

This is called from __getitem__ if name is missing and uses the factory cls.default_factory. If that is None raise KeyError, otherwise call self.default_factory(name,**kw). If that returns None raise KeyError, otherwise save the entity under name and return the entity.
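The autocreation flow can be sketched as follows. AutoCreatingMapping and make_entity are hypothetical names, and a plain dict stands in for the backing store; the sketch follows the steps in the paragraph above rather than the actual TagSets code.

```python
class AutoCreatingMapping:
    """Hypothetical stand-in for a TagSets-like collection."""

    def __init__(self, default_factory):
        self._store = {}
        # may be None to disable autocreation
        self.default_factory = default_factory

    def __getitem__(self, name):
        try:
            return self._store[name]
        except KeyError:
            return self.__missing__(name)

    def __setitem__(self, name, entity):
        self._store[name] = entity

    def __missing__(self, name, **kw):
        factory = self.default_factory
        if factory is None:
            raise KeyError(name)
        # unlike defaultdict, the factory receives the key name
        entity = factory(name, **kw)
        if entity is None:
            raise KeyError(name)
        self[name] = entity
        return entity


def make_entity(name, **kw):
    """Hypothetical factory: a dict standing in for a new TagSet."""
    return {'name': name, **kw}


coll = AutoCreatingMapping(make_entity)
first = coll['fred']
print(first)
print(coll['fred'] is first)  # the created entity was saved under its name
```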

Method TagSets.__setitem__(self, name, te)

Save te in the backend under the key name.

Method TagSets.add(self, name: str, **kw)

Return a new TagSet associated with name, which should not already be in use.

Method TagSets.default_factory(self, name: str)

Create a new TagSet named name.

Method TagSets.get(self, name: str, default=None)

Return the TagSet associated with name, or default if there is no such entity.

Method TagSets.shutdown(self)

Write any pending changes to a backing store, release resources allocated during startup.

Method TagSets.startup(self)

Allocate any needed resources such as database connections.

Method TagSets.subdomain(self, subname)

Return a proxy for this TagSets for the names starting with subname+'.'.
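As a rough illustration of a prefixed view, the sketch below wraps a plain dict. PrefixedView is a hypothetical class, not cs.mappings.PrefixedMappingProxy, and presenting keys with the prefix stripped is an assumption.

```python
class PrefixedView:
    """Hypothetical view of a mapping limited to keys starting with a prefix."""

    def __init__(self, mapping, subname):
        self.mapping = mapping
        self.prefix = subname + '.'

    def __getitem__(self, key):
        return self.mapping[self.prefix + key]

    def __setitem__(self, key, value):
        self.mapping[self.prefix + key] = value

    def keys(self):
        n = len(self.prefix)
        return [k[n:] for k in self.mapping if k.startswith(self.prefix)]


store = {'type.colour': 'str', 'meta.colour.blue': 'a blue'}
type_view = PrefixedView(store, 'type')
type_view['size'] = 'int'   # stored in the parent as 'type.size'
print(type_view.keys())
print(store['type.size'])
```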

Class TagSetsSubdomain(cs.obj.SingletonMixin,cs.mappings.PrefixedMappingProxy)

A view into a TagSets for keys commencing with a prefix.

Property TagSetsSubdomain.TAGGED_ENTITY_FACTORY

The entity factory comes from the parent collection.

Class TagsOntology(cs.obj.SingletonMixin,TagSets,cs.resources.MultiOpenMixin)

An ontology for tag names.

This is based around a mapping of names to ontological information expressed as a TagSet.

A cs.fstags.FSTags uses ontologies initialised from TagFiles containing ontology mappings.

There are two main categories of entries in an ontology:

  • types: an entry named type.{typename} contains a TagSet defining the type named typename
  • metadata: an entry named meta.{typename}.{value_key} contains a TagSet holding metadata for a value of type {typename}

Types:

The type of a Tag is nothing more than its name.

The basic types have their Python names: int, float, str, list, dict, date, datetime. You can define subtypes of these for your own purposes, for example:

type.colour type=str description="A hue."

which subclasses str.

Subtypes of list include a member_type specifying the type for members of a Tag value:

type.scene type=list member_type=str description="A movie scene."

Subtypes of dict include a key_type and a member_type specifying the type for keys and members of a Tag value:

type.cast type=dict key_type=actor member_type=role description="Cast members and their roles."
type.actor type=person description="An actor's stage name."
type.person type=str description="A person."
type.role type=character description="A character role in a performance."
type.character type=str description="A person in a story."

Metadata:

Metadata are Tags describing particular values of a type. For example, the metadata for the Tag colour=blue:

meta.colour.blue url="https://en.wikipedia.org/wiki/Blue" wavelengths="450nm-495nm"
meta.actor.scarlett_johansson
meta.character.marvel.black_widow type=character names=["Natasha Romanov"]

Accessing type data and metadata:

A TagSet may hold a reference to a TagsOntology as .ontology, and so may any of its Tags.

Method TagsOntology.__bool__(self)

Support easy ontology or some_default tests, since ontologies are broadly optional.

Method TagsOntology.__setitem__(self, name, te)

Save te against the key name.

Method TagsOntology.basetype(self, typename)

Infer the base type name from a type name. The default type is 'str', but any type which resolves to one in self.BASE_TYPES may be returned.
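One way to picture this inference is a walk along the type references until a base type is reached. The sketch below uses plain dicts for the type definitions from the cast example above. The BASE_TYPES contents are assumed from the list of basic types given earlier, and the loop guard is an addition of this sketch.

```python
# assumed from the basic types listed in the text
BASE_TYPES = ('int', 'float', 'str', 'list', 'dict', 'date', 'datetime')

# type definitions from the cast example, as plain dicts
TYPES = {
    'cast': {'type': 'dict', 'key_type': 'actor', 'member_type': 'role'},
    'actor': {'type': 'person'},
    'person': {'type': 'str'},
    'role': {'type': 'character'},
    'character': {'type': 'str'},
}


def basetype(typename, types=TYPES, default='str'):
    """Follow `type` references until a base type is reached."""
    seen = set()
    while typename not in BASE_TYPES:
        if typename in seen:
            # a definition loop: give up and use the default
            return default
        seen.add(typename)
        typedef = types.get(typename)
        if typedef is None or 'type' not in typedef:
            # undefined type, or a definition with no supertype
            return default
        typename = typedef['type']
    return typename


print(basetype('actor'))    # actor -> person -> str
print(basetype('cast'))     # cast -> dict
print(basetype('unknown'))  # undefined types fall back to the default
```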

Method TagsOntology.convert_tag(self, tag)

Convert a Tag's value according to the ontology. Return a new Tag with the converted value or the original Tag unchanged.

This is primarily aimed at things like regexp based autotagging, where the matches are all strings but various fields have special types, commonly ints or dates.
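The following sketch shows the general shape of such a conversion for string values produced by a regexp match. Everything here is illustrative: the Tag namedtuple, the CONVERTERS table and the BASETYPES mapping of tag names to base types are hypothetical stand-ins, not the cs.tagset objects.

```python
from collections import namedtuple
from datetime import date

Tag = namedtuple('Tag', 'name value')  # hypothetical stand-in for a Tag

# hypothetical converters for a few base types
CONVERTERS = {'int': int, 'float': float, 'date': date.fromisoformat}

# hypothetical mapping of tag name -> base type, as an ontology might supply
BASETYPES = {'episode': 'int', 'aired': 'date'}


def convert_tag(tag, basetypes):
    """Return a new Tag with a converted value, or the original Tag unchanged."""
    converter = CONVERTERS.get(basetypes.get(tag.name, 'str'))
    if converter is None or not isinstance(tag.value, str):
        return tag
    try:
        return Tag(tag.name, converter(tag.value))
    except ValueError:
        # the string does not parse as the expected type: leave it alone
        return tag


print(convert_tag(Tag('episode', '3'), BASETYPES))
print(convert_tag(Tag('aired', '2021-03-06'), BASETYPES))
print(convert_tag(Tag('episode', 'three'), BASETYPES))
```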

Method TagsOntology.edit_indices(self, *a, **kw)

Edit the entries specified by indices. Return TagSets for the entries which were changed.

Method TagsOntology.get(self, name, default=None)

Proxy .get through to self.te_mapping.

Method TagsOntology.meta(self, type_name, value)

Return the metadata TagSet for (type_name,value).

Method TagsOntology.meta_index(type_name=None, value=None)

Return the entry index for the metadata for (type_name,value).

Method TagsOntology.meta_names(self, type_name=None)

Generator yielding defined metadata names.

If type_name is specified, yield only the value_names for that type_name.

For example, meta_names('character') on an ontology with a meta.character.marvel.black_widow would yield 'marvel.black_widow' i.e. only the suffix part for character metadata.
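The suffix extraction can be sketched over a plain list of entry names. This meta_names is a hypothetical free function, not the method itself, and yielding everything after "meta." when no type_name is given is an assumption.

```python
def meta_names(entry_names, type_name=None):
    """Yield metadata names from an iterable of ontology entry names."""
    prefix = 'meta.' if type_name is None else 'meta.' + type_name + '.'
    for name in entry_names:
        if name.startswith(prefix):
            # keep only the part after the prefix
            yield name[len(prefix):]


entries = [
    'type.colour',
    'meta.colour.blue',
    'meta.actor.scarlett_johansson',
    'meta.character.marvel.black_widow',
]
print(list(meta_names(entries, 'character')))
print(list(meta_names(entries)))
```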

Method TagsOntology.type(self, type_name)

Return the TagSet defining the type named type_name.

Method TagsOntology.type_index(type_name)

Return the entry index for the type type_name.

Method TagsOntology.type_names(self)

Generator yielding defined type names.

Method TagsOntology.types(self)

Generator yielding defined type names and their defining TagSet.

Method TagsOntology.value_metadata(self, *a, **kw)

Return a ValueMetadata for type_name and value. This provides the mapping between a type's value and its semantics.

For example, if a TagSet had a list of characters such as:

characters=["Captain America (Marvel)","Black Widow (Marvel)"]

then these values could be converted to the dotted identifiers characters.marvel.captain_america and characters.marvel.black_widow respectively, ready for lookup in the ontology to obtain the "metadata" TagSet for each specific value.

Method TagsOntology.value_to_tag_name(*a, **kw)

Convert a tag value to a tagnamelike dotted identifierish string for use in ontology lookup. Returns None for unconvertible values.

Nonnegative ints are converted to str.

Strings are converted as follows:

  • a trailing (.*) is turned into a prefix with a dot, for example "Captain America (Marvel)" becomes "Marvel.Captain America".
  • the string is split into words (nonwhitespace), lowercased and joined with underscores, for example "Marvel.Captain America" becomes "marvel.captain_america".
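A minimal sketch of these rules, written from the description above rather than from the library source. The regular expression and the decision to reject bool values are assumptions of this sketch.

```python
import re


def value_to_tag_name(value):
    """Convert a value to a dotted identifier-like string, or None."""
    # assumption: bools are not treated as ints here
    if isinstance(value, int) and not isinstance(value, bool):
        return str(value) if value >= 0 else None
    if isinstance(value, str):
        # move a trailing "(...)" to the front as a dotted prefix
        m = re.fullmatch(r'(.*\S)\s*\(([^()]*)\)\s*', value)
        if m:
            value = m.group(2).strip() + '.' + m.group(1)
        # split into words, lowercase, join with underscores
        return '_'.join(value.lower().split())
    return None


print(value_to_tag_name('Captain America (Marvel)'))
print(value_to_tag_name('Black Widow (Marvel)'))
print(value_to_tag_name(9))
print(value_to_tag_name(-1))
```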

Class TagsOntologyCommand(cs.cmdutils.BaseCommand)

A command line for working with ontology types.

Method TagsOntologyCommand.cmd_type(self, argv)

Usage:
    {cmd}
        With no arguments, list the defined types.
    {cmd} type_name
        With a type name, print its Tags.
    {cmd} type_name edit
        Edit the tags defining a type.
    {cmd} type_name edit meta_names_pattern...
        Edit the tags for the metadata names matching the
        meta_names_patterns.
    {cmd} type_name list
        List the metadata names for this type and their tags.

Class ValueMetadataNamespace(TagSetNamespace,ExtendedNamespace,types.SimpleNamespace)

A subclass of TagSetNamespace for a Tag's metadata.

The reference TagSet is the defining TagSet for the metadata of a particular Tag value as defined by a ValueMetadata (the return value of Tag.metadata).

Method ValueMetadataNamespace.__format__(self, *a, **kw)

Format this node. If there's a Tag on the node, format its value. Otherwise use the superclass format.

Method ValueMetadataNamespace.from_metadata(*a, **kw)

Construct a new ValueMetadataNamespace from meta (a ValueMetadata).

Release Log

Release 20210306:

  • ExtendedNamespace,TagSetNamespace: move the .[:alpha:]* attribute support from ExtendedNamespace to TagSetNamespace because it requires Tags.
  • TagSetNamespace.__getattr__: new _i, _s, _f suffixes to return int, str or float tag values (or None); fold _lc in with these.
  • Pull most of TaggedEntity out into TaggedEntityMixin for reuse by domain specific tagged entities.
  • TaggedEntity: new .set and .discard methods.
  • TaggedEntity: new as_editable_line, from_editable_line, edit and edit_entities methods to support editing entities using a text editor.
  • ontologies: type entries are now prefixed with "type." and metadata entries are prefixed with "meta."; provide a worked ontology example in the introduction and improve related docstrings.
  • TagsOntology: new .types(), .type_names(), .meta(type_name,value), .meta_names() methods.
  • TagsOntology.__getitem__: create missing TagSets on demand.
  • New TagsOntologyCommand, initially with a "type [type_name [{edit|list}]]" subcommand, ready for use as the cmd_ont subcommand of other tag related commands.
  • TagSet: support initialisation like a dict including keywords, and move the ontology parameter to _ontology.
  • TagSet: include AttrableMappingMixin to enable attribute access to values when there is no conflict with normal methods.
  • UUID encode/decode support.
  • Honour $TAGSET_EDITOR or $EDITOR as preferred interactive editor for tags.
  • New TagSet.subtags(prefix) to extract a subset of the tags.
  • TagsOntology.value_metadata: new optional convert parameter to override the default "convert human friendly name" algorithm, particularly to pass convert=str to things which are already the basic id.
  • Rename TaggedEntity to TagSet.
  • Rename TaggedEntities to TagSets.
  • TagSet: new csvrow and from_csvrow methods imported from obsolete TaggedEntityMixin class.
  • Move BaseTagFile from cs.fstags to TagFile in cs.tagset.
  • TagSet: support access to the tag "c.x" via attributes provided there is no "c" tag in the way.
  • TagSet.unixtime: implement the autoset-to-now semantics.
  • New as_timestamp(): convert date, datetime, int or float to a UNIX timestamp.
  • Assorted docstring updates and bugfixes.

Release 20200716:

  • Update for changed cs.obj.SingletonMixin API.
  • Pull in TaggedEntity from cs.sqltags and add the .csvrow property and the .from_csvrow factory.

Release 20200521.1: Fix DISTINFO.install_requires, drop debug import.

Release 20200521:

  • New ValueDetail and KeyValueDetail classes for returning ontology information; TagInfo.detail now returns a ValueDetail for scalar types, a list of ValueDetails for sequence types and a list of KeyValueDetails for mapping types; drop various TagInfo mapping/iterable style methods, too confusing to use.
  • Plumb ontology parameter throughout, always optional.
  • Drop TypedTag, Tags now use ontologies for this.
  • New TagsCommandMixin to support BaseCommands which manipulate Tags.
  • Many improvements and bugfixes.

Release 20200318:

  • Note that the TagsOntology stuff is in flux and totally alpha.
  • Tag.prefix_name factory returning a new tag if prefix is not empty, otherwise self.
  • TagSet.update: accept an optional prefix for inserting "foreign" tags with a distinguishing name prefix.
  • Tag.as_json: turn sets and tuples into lists for encoding.
  • Backport for Python < 3.7 (no fromisoformat functions).
  • TagSet: drop unused and ill-placed .titleify, .episode_title and .title methods.
  • TagSet: remove "defaults", unused.
  • Make TagSet a direct subclass of dict, adjust uses of .update etc.
  • New ExtendedNamespace class which is a SimpleNamespace with some inferred attributes and a partial mapping API (keys and getitem).
  • New TagSet.ns() returning the Tags as an ExtendedNamespace, which doubles as a mapping for str.format_map; TagSet.format_kwargs is now an alias for this.
  • New Tag.from_string factory to parse a str into a Tag.
  • New TagsOntology and TypedTag classes to provide type and value-detail information; very very alpha and subject to change.

Release 20200229.1: Initial release: pull TagSet, Tag, TagChoice from cs.fstags for independent use.
