Tags and sets of tags with __format__ support and optional ontology information.
Latest release 20210913:
- TagSet.get_value: raise KeyError in strict mode, leave placeholder otherwise.
- Other small changes.
See cs.fstags for support for applying these to filesystem objects such as directories and files.
See cs.sqltags for support for databases of entities with tags, not directly associated with filesystem objects. This is suited to both log entries (entities with no "name") and large collections of named entities; both accept Tags and can be searched on that basis.
All of the available complexity is optional: you can use Tags without bothering with TagSets or TagsOntologys.
This module contains the following main classes:
- Tag: an object with a .name and optional .value (default None) and also an optional reference .ontology for associating semantics with tag values. The .value, if not None, will often be a string, but may be any Python object. If you're using these via cs.fstags, the object will need to be JSON transcribeable.
- TagSet: a dict subclass representing a set of Tags to associate with something; it also has setlike .add and .discard methods. As such it only supports a single Tag for a given tag name, but that tag value can of course be a sequence or mapping for more elaborate tag values.
- TagsOntology: a mapping of type names to TagSets defining the type and also to entries for the metadata for specific per-type values.
Here's a simple example with some Tags and a TagSet.
>>> tags = TagSet()
>>> # add a "bare" Tag named 'blue' with no value
>>> tags.add('blue')
>>> # add a "topic=tagging" Tag
>>> tags.set('topic', 'tagging')
>>> # make a "subtopic" Tag and add it
>>> subtopic = Tag('subtopic', 'ontologies')
>>> tags.add(subtopic)
>>> # Tags have nice repr() and str()
>>> subtopic
Tag(name='subtopic',value='ontologies',ontology=None)
>>> print(subtopic)
subtopic=ontologies
>>> # a TagSet also has a nice repr() and str()
>>> tags
TagSet:{'blue': None, 'topic': 'tagging', 'subtopic': 'ontologies'}
>>> print(tags)
blue subtopic=ontologies topic=tagging
>>> tags2 = TagSet({'a': 1}, b=3, c=[1,2,3], d='dee')
>>> tags2
TagSet:{'a': 1, 'b': 3, 'c': [1, 2, 3], 'd': 'dee'}
>>> print(tags2)
a=1 b=3 c=[1,2,3] d=dee
>>> # since you can print a TagSet to a file as a line of text
>>> # you can get it back from a line of text
>>> TagSet.from_line('a=1 b=3 c=[1,2,3] d=dee')
TagSet:{'a': 1, 'b': 3, 'c': [1, 2, 3], 'd': 'dee'}
>>> # because TagSets are dicts you can format strings with them
>>> print('topic:{topic} subtopic:{subtopic}'.format_map(tags))
topic:tagging subtopic:ontologies
>>> # TagSets have convenient membership tests
>>> # test for blueness
>>> 'blue' in tags
True
>>> # test for redness
>>> 'red' in tags
False
>>> # test for any "subtopic" tag
>>> 'subtopic' in tags
True
>>> # test for subtopic=ontologies
>>> print(subtopic)
subtopic=ontologies
>>> subtopic in tags
True
>>> # test for subtopic=libraries
>>> subtopic2 = Tag('subtopic', 'libraries')
>>> subtopic2 in tags
False
== Ontologies ==
Tags and TagSets suffice to apply simple annotations to things.
However, an ontology brings meaning to those annotations.
See the TagsOntology class for implementation details,
access methods and more examples.
Consider a record about a movie, with these tags (a TagSet):
title="Avengers Assemble"
series="Avengers (Marvel)"
cast={"Scarlett Johansson":"Black Widow (Marvel)"}
where we have the movie title, a name for the series in which it resides, and a cast as an association of actors with roles.
An ontology lets us associate implied types and metadata with these values.
Here's an example ontology supporting the above TagSet:
type.cast type=dict key_type=person member_type=character description="members of a production"
type.character description="an identified member of a story"
type.series type=str
character.marvel.black_widow type=character names=["Natasha Romanov"]
person.scarlett_johansson fullname="Scarlett Johansson" bio="Known for Black Widow in the Marvel stories."
The type information for a cast
is defined by the ontology entry named type.cast,
which tells us that a cast Tag is a dict,
whose keys are of type person
and whose values are of type character.
(The default type is str.)
To find out the underlying type for a character
we look that up in the ontology in turn;
because it does not have a specified type Tag, it is taken to be a str.
Having the types for a cast,
it is now possible to look up the metadata for the described cast members.
The key "Scarlett Johansson" is a person
(from the type definition of cast).
The ontology entry for her is named person.scarlett_johansson
which is computed as:
- person: the type name
- scarlett_johansson: obtained by downcasing "Scarlett Johansson" and replacing whitespace with an underscore. The full conversion process is defined by the TagsOntology.value_to_tag_name function.
The key "Black Widow (Marvel)" is a character
(again, from the type definition of cast).
The ontology entry for her is named character.marvel.black_widow
which is computed as:
- character: the type name
- marvel.black_widow: obtained by downcasing "Black Widow (Marvel)", replacing whitespace with an underscore, and moving a bracketed suffix to the front as an unbracketed prefix. The full conversion process is defined by the TagsOntology.value_to_tag_name function.
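The two conversions described above can be approximated in a few lines. This is only a rough sketch of the stated rules (downcase, whitespace to underscore, bracketed suffix moved to the front); the function name sketch_value_to_tag_name is hypothetical and this is not the implementation of TagsOntology.value_to_tag_name, which may handle more cases.

```python
import re


def sketch_value_to_tag_name(value: str) -> str:
    """Hypothetical approximation of the conversion described in the text."""

    def normalise(s: str) -> str:
        # downcase and replace runs of whitespace with a single underscore
        return re.sub(r'\s+', '_', s.strip().lower())

    value = value.strip()
    # look for a trailing bracketed suffix such as "(Marvel)"
    m = re.fullmatch(r'(.*\S)\s*\(([^()]+)\)', value)
    if m:
        base, suffix = m.group(1), m.group(2)
        # the suffix becomes an unbracketed dotted prefix
        return normalise(suffix) + '.' + normalise(base)
    return normalise(value)


print(sketch_value_to_tag_name("Scarlett Johansson"))
print(sketch_value_to_tag_name("Black Widow (Marvel)"))
```

Prefixing the result with the type name and a dot gives the ontology entry names used above, such as person.scarlett_johansson.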
== Format Strings ==
You can just use str.format_map as shown above
for the direct values in a TagSet,
since it subclasses dict.
However, TagSets also subclass cs.lex.FormatableMixin
and therefore have a richer format_as method which has an extended syntax
for the format component.
Command line tools like fstags use this for output format specifications.
An example:
>>> # an ontology specifying the type for a colour
>>> # and some information about the colour "blue"
>>> ont = TagsOntology(
... {
... 'type.colour':
... TagSet(description="a colour, a hue", type="str"),
... 'colour.blue':
... TagSet(
... url='https://en.wikipedia.org/wiki/Blue',
... wavelengths='450nm-495nm'
... ),
... }
... )
>>> # tag set with a "blue" tag, using the ontology above
>>> tags = TagSet(colour='blue', labels=['a', 'b', 'c'], size=9, _ontology=ont)
>>> tags.format_as('The colour is {colour}.')
'The colour is blue.'
>>> # format a string about the tags showing some metadata about the colour
>>> tags.format_as('Information about the colour may be found here: {colour:metadata.url}')
'Information about the colour may be found here: https://en.wikipedia.org/wiki/Blue'
Function as_unixtime(*a, **kw)
Convert a tag value to a UNIX timestamp.
This accepts int, float (already a timestamp)
and date or datetime
(use datetime.timestamp() for a nonnaive datetime,
otherwise time.mktime(tag_value.timetuple()),
which assumes the local time zone).
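As a hedged illustration of the rules just described, here is a standalone sketch using only the standard library. The name sketch_as_unixtime is hypothetical, and raising ValueError for unsupported values is an assumption; the real as_unixtime may behave differently in the details.

```python
import time
from datetime import date, datetime, timezone


def sketch_as_unixtime(tag_value):
    """Hypothetical stand-in following the conversion rules in the text."""
    if isinstance(tag_value, (int, float)):
        # already a UNIX timestamp
        return tag_value
    if isinstance(tag_value, datetime) and tag_value.tzinfo is not None:
        # nonnaive datetime: the timezone is known
        return tag_value.timestamp()
    if isinstance(tag_value, date):
        # naive datetime or plain date: interpreted in the local time zone
        return time.mktime(tag_value.timetuple())
    # assumption: anything else is rejected
    raise ValueError("cannot convert %r to a UNIX timestamp" % (tag_value,))


aware = datetime(1970, 1, 2, tzinfo=timezone.utc)
print(sketch_as_unixtime(aware))
```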
Class BaseTagSets(cs.resources.MultiOpenMixin,collections.abc.MutableMapping,collections.abc.Mapping,collections.abc.Collection,collections.abc.Sized,collections.abc.Iterable,collections.abc.Container)
Base class for collections of TagSet instances
such as cs.fstags.FSTags and cs.sqltags.SQLTags.
Examples of this include:
- cs.fstags.FSTags: a mapping of filesystem paths to their associated TagSet
- cs.sqltags.SQLTags: a mapping of names to TagSets stored in an SQL database
Subclasses must implement:
- get(name,default=None): return the TagSet associated with name, or default.
- __setitem__(name,tagset): associate a TagSet with the key name; this is called by the __missing__ method with a newly created TagSet.
- keys(self): return an iterable of names
Subclasses may reasonably want to override the following:
- startup_shutdown(self): context manager to allocate and release any needed resources such as database connections
Subclasses may implement:
- __len__(self): return the number of names
The TagSet factory used to fetch or create a TagSet is
named TagSetClass. The default implementation honours two
class attributes:
- TAGSETCLASS_DEFAULT: initially TagSet
- TAGSETCLASS_PREFIX_MAPPING: a mapping of type names to TagSet subclasses
The type name of a TagSet name is the first dotted component.
For example, artist.nick_cave has the type name artist.
A subclass of BaseTagSets could utilise an ArtistTagSet subclass of TagSet
and provide:
TAGSETCLASS_PREFIX_MAPPING = {
    'artist': ArtistTagSet,
}
in its class definition. Accesses to artist.* entities would
result in ArtistTagSet instances and access to other entities
would result in ordinary TagSet instances.
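The class selection described above can be pictured with the following self-contained sketch. The TagSet and ArtistTagSet classes here are minimal stand-ins defined only for the illustration, and tagset_class_for is a hypothetical helper; the real TagSetClass factory may differ in its details.

```python
class TagSet(dict):
    """Stand-in for cs.tagset.TagSet, just enough for this sketch."""


class ArtistTagSet(TagSet):
    """Hypothetical TagSet subclass for artist.* entities."""


# the two class attributes named in the text, as module level stand-ins
TAGSETCLASS_DEFAULT = TagSet
TAGSETCLASS_PREFIX_MAPPING = {
    'artist': ArtistTagSet,
}


def tagset_class_for(name: str):
    """Choose a TagSet class from the first dotted component of name."""
    type_name = name.split('.', 1)[0]
    return TAGSETCLASS_PREFIX_MAPPING.get(type_name, TAGSETCLASS_DEFAULT)


print(tagset_class_for('artist.nick_cave').__name__)
print(tagset_class_for('album.the_boatmans_call').__name__)
```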
Method BaseTagSets.__init__(self, *, ontology=None)
Initialise the collection.
BaseTagSets.TAGSETCLASS_DEFAULT
Method BaseTagSets.TagSetClass(self, *a, **kw)
Factory to create a new TagSet from name.
Method BaseTagSets.__contains__(self, name: str)
Test whether name is present in the underlying mapping.
Method BaseTagSets.__getitem__(self, name: str)
Obtain the TagSet associated with name.
If name is not presently mapped,
return self.__missing__(name).
Method BaseTagSets.__iter__(self)
Iteration returns the keys.
Method BaseTagSets.__len__(self)
Return the length of the underlying mapping.
Method BaseTagSets.__missing__(self, *a, **kw)
Like dict, the __missing__ method may autocreate a new TagSet.
This is called from __getitem__ if name is missing
and uses the factory cls.default_factory.
If that is None raise KeyError,
otherwise call self.default_factory(name,**kw).
If that returns None raise KeyError,
otherwise save the entity under name and return the entity.
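The control flow described for __missing__ can be sketched on a plain dict subclass. The class names AutoCreateMapping and NamedSets are hypothetical and the stored entities are ordinary dicts; this mirrors the steps in the text rather than the BaseTagSets code itself.

```python
class AutoCreateMapping(dict):
    """Hypothetical mapping which may autocreate missing entries."""

    # no factory by default: missing names raise KeyError
    default_factory = None

    def __missing__(self, name):
        # dict.__getitem__ calls this when name is not present
        factory = self.default_factory
        if factory is None:
            raise KeyError(name)
        entity = factory(name)
        if entity is None:
            raise KeyError(name)
        # save the new entity under name and return it
        self[name] = entity
        return entity


class NamedSets(AutoCreateMapping):
    """Subclass supplying a factory, so lookups autocreate entries."""

    def default_factory(self, name):
        return {'name': name}


sets = NamedSets()
print(sets['foo'])
print('foo' in sets)
```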
Method BaseTagSets.__setitem__(self, name, te)
Save te in the backend under the key name.
Method BaseTagSets.add(self, name: str, **kw)
Return a new TagSet associated with name,
which should not already be in use.
Method BaseTagSets.default_factory(self, name: str)
Create a new TagSet named name.
Method BaseTagSets.edit(self, *, select_tagset=None, **kw)
Edit the TagSets.
Parameters:
- select_tagset: optional callable accepting a TagSet which tests whether it should be included in the TagSets to be edited
Other keyword arguments are passed to Tag.edit_many.
Method BaseTagSets.get(self, name: str, default=None)
Return the TagSet associated with name,
or default if there is no such entity.
Method BaseTagSets.items(self, *, prefix=None)
Generator yielding (key,value) pairs,
optionally constrained to keys starting with prefix+'.'.
Method BaseTagSets.keys(self, *, prefix=None)
Return the keys starting with prefix+'.'
or all keys if prefix is None.
Method BaseTagSets.subdomain(self, subname: str)
Return a proxy for this BaseTagSets for the names
starting with subname+'.'.
Method BaseTagSets.values(self, *, prefix=None)
Generator yielding the mapping values (TagSets),
optionally constrained to keys starting with prefix+'.'.
Class MappingTagSets(BaseTagSets,cs.resources.MultiOpenMixin,collections.abc.MutableMapping,collections.abc.Mapping,collections.abc.Collection,collections.abc.Sized,collections.abc.Iterable,collections.abc.Container)
A BaseTagSets subclass using an arbitrary mapping.
If no mapping is supplied, a dict is created for the purpose.
Example:
>>> tagsets = MappingTagSets()
>>> list(tagsets.keys())
[]
>>> tagsets.get('foo')
>>> tagsets['foo'] = TagSet(bah=1, zot=2)
>>> list(tagsets.keys())
['foo']
>>> tagsets.get('foo')
TagSet:{'bah': 1, 'zot': 2}
>>> list(tagsets.keys(prefix='foo'))
['foo']
>>> list(tagsets.keys(prefix='bah'))
[]
Method MappingTagSets.__delitem__(self, name)
Delete the TagSet named name.
Method MappingTagSets.__setitem__(self, name, te)
Save te in the backend under the key name.
Method MappingTagSets.keys(self, *, prefix: Optional[str] = None)
Return an iterable of the keys commencing with prefix
or all keys if prefix is None.
Class RegexpTagRule
A regular expression based Tag rule.
This applies a regular expression to a string
and returns inferred Tags.
Method RegexpTagRule.infer_tags(self, *a, **kw)
Apply the rule to the string s, return a list of Tags.
Function selftest(argv)
Run some ad hoc self tests.
Class Tag(Tag,builtins.tuple,cs.lex.FormatableMixin,cs.lex.FormatableFormatter,string.Formatter)
A Tag has a .name (str) and a .value
and an optional .ontology.
The name must be a dotted identifier.
Terminology:
- A "bare" Tag has a value of None.
- A "naive" Tag has an ontology of None.
The constructor for a Tag is unusual:
- both the value and ontology are optional, defaulting to None
- if name is a str then we always construct a new Tag with the supplied values
- if name is not a str it should be a Taglike object to promote; it is an error if the value parameter is not None in this case
- an optional prefix may be supplied which is prepended to name with a dot ('.') if not empty
The promotion process is as follows:
- if name is a Tag subinstance then if the supplied ontology is not None and is not the ontology associated with name then a new Tag is made, otherwise the original Tag is returned unchanged
- otherwise a new Tag is made from name using its .value and overriding its .ontology if the ontology parameter is not None
Examples:
>>> ont = TagsOntology({'colour.blue': TagSet(wavelengths='450nm-495nm')})
>>> tag0 = Tag('colour', 'blue')
>>> tag0
Tag(name='colour',value='blue',ontology=None)
>>> tag = Tag(tag0)
>>> tag
Tag(name='colour',value='blue',ontology=None)
>>> tag is tag0
True
>>> tag = Tag(tag0, ontology=ont)
>>> tag # doctest: +ELLIPSIS
Tag(name='colour',value='blue',ontology=...)
>>> tag is tag0
False
>>> tag = Tag(tag0, prefix='surface')
>>> tag
Tag(name='surface.colour',value='blue',ontology=None)
>>> tag is tag0
False
Method Tag.__init__(self, *a, **kw)
Dummy __init__ to avoid FormatableMixin.__init__
because we subclass namedtuple which has no __init__.
Tag.__hash__
Method Tag.__str__(self)
Encode name and value.
Property Tag.basetype
The base type name for this tag.
Returns None if there is no ontology.
This calls self.ontology.basetype(self.name).
The basetype is the endpoint of a cascade down the defined types.
For example, this might tell us that a Tag role="Fred"
has a basetype "str"
by cascading through a hypothetical chain role->character->str:
type.role type=character
type.character type=str
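The cascade can be sketched over a plain dict standing in for the ontology. The helper sketch_basetype and the BASE_TYPES tuple are assumptions made for this illustration (the text does not list the recognised base type names), and the default of str for an entry without a type follows the earlier statement that the default type is str.

```python
# assumption: the names treated as endpoints of the cascade
BASE_TYPES = ('str', 'int', 'float', 'list', 'dict')


def sketch_basetype(ontology, typename):
    """Follow type.* entries until a base type name is reached."""
    seen = set()
    while typename not in BASE_TYPES:
        if typename in seen:
            raise ValueError('type definition loop at %r' % (typename,))
        seen.add(typename)
        # a missing entry, or one with no type, defaults to str
        typedef = ontology.get('type.' + typename, {})
        typename = typedef.get('type', 'str')
    return typename


ontology = {
    'type.role': {'type': 'character'},
    'type.character': {'type': 'str'},
}
print(sketch_basetype(ontology, 'role'))
```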
Method Tag.from_str(s, offset=0, ontology=None)
Parse a Tag definition from s at offset (default 0).
Method Tag.from_str2(s, offset=0, *, ontology, extra_types=None)
Parse tag_name[=value], return (Tag,offset).
Method Tag.is_valid_name(name)
Test whether a tag name is valid: a dotted identifier.
Method Tag.key_metadata(self, *a, **kw)
Return the metadata definition for key.
The metadata TagSet is obtained from the ontology entry
type.key_tag_name
where type is the Tag's key_type
and key_tag_name is the key converted
into a dotted identifier by TagsOntology.value_to_tag_name.
Property Tag.key_type
The type name for keys of this tag.
This is required if .value is a mapping.
Property Tag.key_typedef
The typedata definition for this Tag's keys.
This is for Tags which store mappings,
for example a movie cast, mapping actors to roles.
The name of the key type comes from
the key_type entry from self.typedata.
That name is then looked up in the ontology's types.
Method Tag.matches(self, name, value=None, *a, **kw)
Test whether this Tag matches (tag_name,value).
Method Tag.member_metadata(self, *a, **kw)
Return the metadata definition for self[member_key].
The metadata TagSet is obtained from the ontology entry
type.member_tag_name
where type is the Tag's member_type
and member_tag_name is the member value converted
into a dotted identifier by TagsOntology.value_to_tag_name.
Property Tag.member_type
The type name for members of this tag.
This is required if .value is a sequence or mapping.
Property Tag.member_typedef
The typedata definition for this Tag's members.
This is for Tags which store mappings or sequences,
for example a movie cast, mapping actors to roles,
or a list of scenes.
The name of the member type comes from
the member_type entry from self.typedata.
That name is then looked up in the ontology's types.
Property Tag.meta
Shortcut property for the metadata TagSet.
Method Tag.metadata(self, *, ontology=None, convert=None)
Fetch the metadata information about this specific tag value,
derived through the ontology from the tag name and value.
The default ontology is self.ontology.
For a scalar type (int, float, str) this is the ontology TagSet
for self.value.
For a sequence (list) this is a list of the metadata
for each member.
For a mapping (dict) this is a mapping of key->metadata.
Method Tag.parse_name(s, offset=0)
Parse a tag name from s at offset: a dotted identifier.
Method Tag.parse_value(s, offset=0, extra_types=None)
Parse a value from s at offset (default 0).
Return the value, or None on no data.
The optional extra_types parameter may be an iterable of
(type,from_str,to_str) tuples where from_str is a
function which takes a string and returns a Python object
(expected to be an instance of type).
The default comes from cls.EXTRA_TYPES.
This supports storage of nonJSONable values in text form.
The core syntax for values is JSON;
value text commencing with any of '"', '[' or '{'
is treated as JSON and decoded directly,
leaving the offset at the end of the JSON parse.
Otherwise all the nonwhitespace at this point is collected
as the value text,
leaving the offset at the next whitespace character
or the end of the string.
The text so collected is then tried against the from_str
function of each extra_types;
the first successful parse is accepted as the value.
If no extra type matches,
the text is tried against int() and float();
if one of these parses the text and str() of the result round trips
to the original text
then that value is used.
Otherwise the text itself is kept as the value.
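The parsing order described above can be sketched with the standard json module. The name sketch_parse_value is hypothetical, returning a (value, offset) pair is an assumption made so that the final offset is visible, and treating ValueError as the signal for a failed from_str parse is also an assumption.

```python
import json


def sketch_parse_value(s, offset=0, extra_types=()):
    """Hypothetical sketch: return (value, offset) following the text."""
    if offset >= len(s) or s[offset].isspace():
        # no data at this point
        return None, offset
    if s[offset] in '"[{':
        # JSON syntax: decode directly, offset ends after the JSON text
        return json.JSONDecoder().raw_decode(s, offset)
    # otherwise collect the nonwhitespace text
    end = offset
    while end < len(s) and not s[end].isspace():
        end += 1
    text = s[offset:end]
    # try each extra type's from_str function in turn
    for _type, from_str, _to_str in extra_types:
        try:
            return from_str(text), end
        except ValueError:
            pass
    # accept int or float only if str() round trips to the original text
    for convert in (int, float):
        try:
            value = convert(text)
        except ValueError:
            continue
        if str(value) == text:
            return value, end
    # fall back to the text itself
    return text, end


print(sketch_parse_value('[1,2,3] d=dee'))
print(sketch_parse_value('007'))
```

The round trip check is what keeps text such as 007 or 1e3 as a string instead of silently turning it into a number which would transcribe differently.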
Method Tag.transcribe_value(value, extra_types=None)
Transcribe value for use in Tag transcription.
The optional extra_types parameter may be an iterable of
(type,from_str,to_str) tuples where to_str is a
function which takes a Python object
(expected to be an instance of type) and returns a string.
The default comes from cls.EXTRA_TYPES.
If value is an instance of type
then the to_str function is used to transcribe the value
as a str, which should not include any whitespace
(because of the implementation of parse_value).
If there is no matching to_str function,
cls.JSON_ENCODER.encode is used to transcribe value.
This supports storage of nonJSONable values in text form.
Property Tag.typedef
The defining TagSet for this tag's name.
This is how its type is defined,
and is obtained from:
self.ontology['type.'+self.name]
Basic Tags often do not need a type definition;
these are only needed for structured tag values
(example: a mapping of cast members)
or when a Tag name is an alias for another type
(example: a cast member name might be an actor
which in turn might be a person).
For example, a Tag colour=blue
gets its type information from the type.colour entry in an ontology;
that entry is just a TagSet with relevant information.
Function tag_or_tag_value(*da, **dkw)
A decorator for functions or methods which may be called as:
func(name, [value])
or as:
func(Tag, [None])
The optional decorator argument no_self (default False)
should be supplied for plain functions
as they have no leading self parameter to accommodate.
Example:
@tag_or_tag_value
def add(self, tag_name, value, *, verbose=None):
This defines a .add() method
which can be called with name and value
or with a single Taglike object
(something with .name and .value attributes),
for example:
tags = TagSet()
....
tags.add('colour', 'blue')
....
tag = Tag('size', 9)
tags.add(tag)
Class TagBasedTest(TagBasedTest,builtins.tuple,TagSetCriterion)
A test based on a Tag.
Attributes:
- spec: the source text from which this choice was parsed, possibly None
- choice: the apply/reject flag
- tag: the Tag representing the criterion
- comparison: an indication of the test comparison
The following comparison values are recognised:
- None: test for the presence of the Tag
- '=': test that the tag value equals tag.value
- '<': test that the tag value is less than tag.value
- '<=': test that the tag value is less than or equal to tag.value
- '>': test that the tag value is greater than tag.value
- '>=': test that the tag value is greater than or equal to tag.value
- '~/': test if the tag value as a regexp is present in tag.value
- '~': test if a matching tag value is present in tag.value
Method TagBasedTest.by_tag_value(name, value=None, *a, **kw)
Return a TagBasedTest based on a Tag or tag_name,tag_value.
Method TagBasedTest.match_tagged_entity(self, te: 'TagSet') -> bool
Test against the Tags in tags.
Note: comparisons when self.tag.name is not in tags
always return False (possibly inverted by self.choice).
Method TagBasedTest.parse(s, offset=0, delim=None)
Parse tag_name[{<|<=|'='|'>='|>|'~'}value]
and return (dict,offset)
where the dict contains the following keys and values:
- tag: a Tag embodying the tag name and value
- comparison: an indication of the test comparison
Class TagFile(cs.obj.SingletonMixin,BaseTagSets,cs.resources.MultiOpenMixin,collections.abc.MutableMapping,collections.abc.Mapping,collections.abc.Collection,collections.abc.Sized,collections.abc.Iterable,collections.abc.Container)
A reference to a specific file containing tags.
This manages a mapping of name => TagSet,
itself a mapping of tag name => tag value.
Method TagFile.__setitem__(self, name, te)
Set item name to te.
Method TagFile.get(self, name, default=None)
Get from the tagsets.
Method TagFile.is_modified(self)
Test whether this TagSet has been modified.
Method TagFile.keys(self, *, prefix=None)
tagsets.keys
If the optional prefix is supplied,
yield only those keys starting with prefix.
Method TagFile.load_tagsets(filepath, ontology, extra_types=None)
Load filepath and return (tagsets,unparsed).
The returned tagsets are a mapping of name=>tag_name=>value.
The returned unparsed is a list of (lineno,line)
for lines which failed the parse (excluding the trailing newline).
Property TagFile.names
The names from this FSTagsTagFile as a list.
Method TagFile.parse_tags_line(*a, **kw)
Parse a "name tags..." line as from a .fstags file,
return (name,TagSet).
Method TagFile.save(self, extra_types=None)
Save the tag map to the tag file if modified.
Method TagFile.save_tagsets(*a, **kw)
Save tagsets and unparsed to filepath.
This method will create the required intermediate directories if missing.
This method does not clear the .modified attribute of the TagSets
because it does not know it is saving to the TagSet's primary location.
Method TagFile.shutdown(self)
Save the tagsets if modified.
Method TagFile.startup(self)
No special startup.
Method TagFile.tags_line(name, tags, extra_types=None)
Transcribe a name and its tags for use as a .fstags file line.
Property TagFile.tagsets
The tag map from the tag file,
a mapping of name=>TagSet.
This is loaded on demand.
Method TagFile.update(self, name, tags, *, prefix=None, verbose=None)
Update the tags for name from the supplied tags
as for TagSet.update.
Class TagsCommandMixin
Utility methods for cs.cmdutils.BaseCommand classes working with tags.
Optional subclass attributes:
- TAGSET_CRITERION_CLASS: a TagSetCriterion duck class, default TagSetCriterion. For example, cs.sqltags has a subclass with an .extend_query method for computing an SQL JOIN used in searching for tagged entities.
Method TagsCommandMixin.parse_tag_choices(argv)
Parse argv as an iterable of [!]tag_name[=tag_value] Tag
additions/deletions.
Method TagsCommandMixin.parse_tagset_criteria(argv, tag_based_test_class=None)
Parse tag specifications from argv until an unparseable item is found.
Return (criteria,argv)
where criteria is a list of the parsed criteria
and argv is the remaining unparsed items.
Each item is parsed via
cls.parse_tagset_criterion(item,tag_based_test_class).
Method TagsCommandMixin.parse_tagset_criterion(arg, tag_based_test_class=None)
Parse arg as a tag specification
and return a tag_based_test_class instance
via its .from_str factory method.
Raises ValueError on a misparse.
The default tag_based_test_class
comes from cls.TAGSET_CRITERION_CLASS,
which itself defaults to class TagSetCriterion.
The default TagSetCriterion.from_str recognises:
- -tag_name: a negative requirement for tag_name
- tag_name[=value]: a positive requirement for a tag_name with optional value.
Class TagSet(builtins.dict,cs.dateutils.UNIXTimeMixin,cs.lex.FormatableMixin,cs.lex.FormatableFormatter,string.Formatter,cs.mappings.AttrableMappingMixin)
A setlike class associating a set of tag names with values.
This actually subclasses dict, so a TagSet is a direct
mapping of tag names to values.
It accepts attribute access to simple tag values when they
do not conflict with the class methods;
the reliable method is normal item access.
NOTE: iteration yields Tags, not dict keys.
Also note that all the Tags from TagSet
share its ontology.
Subclasses should override the set and discard methods;
the dict and mapping methods
are defined in terms of these two basic operations.
TagSets have a few special properties:
- id: a domain specific identifier; this may reasonably be None for entities not associated with database rows; the cs.sqltags.SQLTags class associates this with the database row id.
- name: the entity's name; a read only alias for the 'name' Tag. The cs.sqltags.SQLTags class defines "log entries" as TagSets with no name.
- unixtime: a UNIX timestamp, a float holding seconds since the UNIX epoch (midnight, 1 January 1970 UTC). This is typically the row creation time for entities associated with database rows.
Because TagSet subclasses cs.mappings.AttrableMappingMixin
you can also access tag values as attributes
provided that they do not conflict with instance attributes
or class methods or properties.
The TagSet class defines the class attribute ATTRABLE_MAPPING_DEFAULT
as None which causes attribute access to return None
for missing tag names.
This supports code like:
if tags.title:
    # use the title in something
    ...
else:
    # handle a missing title tag
    ...
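The effect of a None default on attribute access can be reproduced with a small dict subclass. The class AttrableDict below is a hypothetical stand-in written for this illustration; it is not cs.mappings.AttrableMappingMixin, though it borrows the ATTRABLE_MAPPING_DEFAULT attribute name from the text.

```python
class AttrableDict(dict):
    """Hypothetical dict whose keys are readable as attributes."""

    # value returned for attribute access to a missing key
    ATTRABLE_MAPPING_DEFAULT = None

    def __getattr__(self, attr):
        # only called when normal attribute lookup fails,
        # so methods such as .keys are never shadowed by tag names
        try:
            return self[attr]
        except KeyError:
            return self.ATTRABLE_MAPPING_DEFAULT


tags = AttrableDict(title='Avengers Assemble', keys='not reachable')
print(tags.title)
print(tags.series)
print(callable(tags.keys))
```

This also shows why item access is the reliable method: a tag named keys is still present as tags['keys'] but tags.keys remains the dict method.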
Method TagSet.__init__(self, *a, **kw)
Initialise the TagSet.
Parameters:
- positional parameters initialise the dict and are passed to dict.__init__
- _id: optional identity value for databaselike implementations
- _ontology: optional TagsOntology to use for this TagSet
- other alphabetic keyword parameters are also used to initialise the dict and are passed to dict.__init__
Method TagSet.__contains__(self, tag)
Test for a tag being in this TagSet.
If the supplied tag is a str then this test
is for the presence of tag in the keys.
Otherwise,
for each tag T in the tagset
test T.matches(tag) and return True on success.
The default Tag.matches method compares the tag name
and if the same,
returns true if tag.value is None (basic "is the tag present" test)
and otherwise true if tag.value==T.value (basic "tag value equality" test).
Otherwise return False.
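The membership rules can be sketched against a plain dict. SketchTag and sketch_contains are hypothetical names used only here; the real method delegates to Tag.matches, which subclasses may override.

```python
from collections import namedtuple

# hypothetical minimal Tag-like object with .name and .value
SketchTag = namedtuple('SketchTag', 'name value')


def sketch_contains(tagset, tag):
    """Apply the membership rules from the text to a plain dict."""
    if isinstance(tag, str):
        # a str tests for the presence of the tag name
        return tag in tagset
    for name, value in tagset.items():
        if name != tag.name:
            continue
        # a bare tag matches any value, otherwise the values must be equal
        if tag.value is None or tag.value == value:
            return True
    return False


tags = {'blue': None, 'topic': 'tagging', 'subtopic': 'ontologies'}
print(sketch_contains(tags, 'blue'))
print(sketch_contains(tags, SketchTag('subtopic', 'libraries')))
```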
Method TagSet.__getattr__(self, attr)
Support access to dotted name attributes.
The following attribute accesses are supported:
If attr is a key, return self[attr].
If self.auto_infer(attr) does not raise ValueError,
return that value.
If this TagSet has an ontology
and attr looks like typename_fieldname and typename is a key, look up the metadata for the Tag value
and return the metadata's fieldname key.
This also works for plural values.
For example if a TagSet has the tag artists=["fred","joe"]
and attr is artist_names
then the metadata entries for "fred" and "joe" are looked up
and their artist_name tags are returned,
perhaps resulting in the list
["Fred Thing","Joe Thang"].
If there are keys commencing with attr+'.'
then this returns a view of those keys
so that a subsequent attribute access can access one of those keys.
Otherwise, a superclass attribute access is performed.
Example:
>>> tags=TagSet(a=1,b=2)
>>> tags.a
1
>>> tags.c
>>> tags['c.z']=9
>>> tags['c.x']=8
>>> tags
TagSet:{'a': 1, 'b': 2, 'c.z': 9, 'c.x': 8}
>>> tags.c
TagSetPrefixView:c.{'z': 9, 'x': 8}
>>> tags.c.z
9
However, this is not supported when there is a tag named 'c'
because tags.c has to return the 'c' tag value:
>>> tags=TagSet(a=1,b=2,c=3)
>>> tags.a
1
>>> tags.c
3
>>> tags['c.z']=9
>>> tags.c.z
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'int' object has no attribute 'z'
Method TagSet.__iter__(self, prefix=None, ontology=None)
Yield the tag data as Tags.
Method TagSet.__setattr__(self, attr, value)
Attribute based Tag access.
If attr is in self.__dict__ then that is updated,
supporting "normal" attributes set on the instance.
Otherwise the Tag named attr is set to value.
The __init__ methods of subclasses should do something like this
(from TagSet.__init__)
to set up the ordinary instance attributes
which are not to be treated as Tags:
self.__dict__.update(id=_id, ontology=_ontology, modified=False)
Method TagSet.__str__(self)
The TagSet suitable for writing to a tag file.
Method TagSet.add(self, name, value=None, *a, **kw)
Adding a Tag calls the class set() method.
Method TagSet.as_dict(self)
Return a dict mapping tag name to value.
Method TagSet.as_tags(self, prefix=None, ontology=None)
Yield the tag data as Tags.
Method TagSet.auto_infer(self, *a, **kw)
The default inference implementation.
This should return a value if attr is inferrable
and raise ValueError if not.
The default implementation returns the direct tag value for attr
if present.
Property TagSet.csvrow
This TagSet as a list useful to a csv.writer.
The inverse of from_csvrow.
Method TagSet.discard(self, name, value=None, *a, **kw)
Discard the tag matching (tag_name,value).
Return a Tag with the old value,
or None if there was no matching tag.
Note that if the tag value is None
then the tag is unconditionally discarded.
Otherwise the tag is only discarded
if its value matches.
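The conditional discard can be sketched on a plain dict. The helper sketch_discard is hypothetical and returns a (name, old_value) tuple where the real method returns a Tag; that substitution is an assumption made to keep the example self-contained.

```python
def sketch_discard(tags, name, value=None):
    """Discard name from tags if present and, when value is given, equal."""
    if name not in tags:
        return None
    if value is not None and tags[name] != value:
        # a value was supplied and it does not match: leave the tag alone
        return None
    # value is None (unconditional) or matches: remove and report the old value
    return name, tags.pop(name)


tags = {'colour': 'blue', 'size': 9}
print(sketch_discard(tags, 'colour', 'red'))
print(sketch_discard(tags, 'colour', 'blue'))
print(sketch_discard(tags, 'size'))
print(tags)
```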
Method TagSet.edit(self, editor=None, verbose=None)
Edit this TagSet.
Method TagSet.edit_many(*a, **kw)
Edit a collection of TagSets.
Return a list of (old_name,new_name,TagSet) for those which were modified.
This function supports modifying both name and Tags.
The Tags are updated directly.
The changed names are returned in the old_name,new_name above.
The collection tes may be either a mapping of name/key
to TagSet or an iterable of TagSets. If the latter, a
mapping is made based on te.name or te.id for each item
te in the iterable.
Method TagSet.from_csvrow(csvrow)
Construct a TagSet from a CSV row like that from
TagSet.csvrow, being unixtime,id,name,tags....
Method TagSet.from_line(line, offset=0, *, ontology=None, extra_types=None, verbose=None)
Create a new TagSet from a line of text.
Method TagSet.get_arg_name(self, field_name)
Leading dotted identifiers represent tags or tag prefixes.
Property TagSet.name
Read only name property, None if there is no 'name' tag.
Method TagSet.set(self, name, value=None, *a, **kw)
Set self[tag_name]=value.
If verbose, emit an info message if this changes the previous value.
Method TagSet.set_from(self, other, verbose=None)
Completely replace the values in self
with the values from other,
a TagSet or any other name=>value dict.
This has the feature of logging changes
by calling .set and .discard to effect the changes.
Method TagSet.subtags(self, prefix, as_tagset=False)
Return a TagSetPrefixView of the tags commencing with prefix+'.'
with the key prefixes stripped off.
If as_tagset is true (default False)
return a new standalone TagSet containing the prefixed keys.
Example:
>>> tags = TagSet({'a.b':1, 'a.d':2, 'c.e':3})
>>> tags.subtags('a')
TagSetPrefixView:a.{'b': 1, 'd': 2}
>>> tags.subtags('a', as_tagset=True)
TagSet:{'b': 1, 'd': 2}
Method TagSet.tag(self, tag_name, prefix=None, ontology=None)
Return a Tag for tag_name, or None if missing.
Method TagSet.tag_metadata(self, tag_name, prefix=None, ontology=None, convert=None)
Return a list of the metadata for the Tag named tag_name,
or an empty list if the Tag is missing.
Property TagSet.unixtime
unixtime property, autosets to time.time() if accessed.
Method TagSet.update(self, other=None, *, prefix=None, verbose=None, **kw)
Update this TagSet from other,
a dict of {name:value}
or an iterable of Taglike or (name,value) things.
Class TagSetCriterion
A testable criterion for a TagSet.
TagSetCriterion.TAG_BASED_TEST_CLASS
Method TagSetCriterion.from_any(*a, **kw)
Convert some suitable object o into a TagSetCriterion.
Various possibilities for o are:
- TagSetCriterion: returned unchanged
- str: a string tests for the presence of a tag with that name and optional value
- an object with a .choice attribute; this is taken to be a TagSetCriterion ducktype and returned unchanged
- an object with .name and .value attributes; this is taken to be Tag-like and a positive test is constructed
- Tag: an object with a .name and .value is equivalent to a positive equality TagBasedTest
- (name,value): a 2 element sequence is equivalent to a positive equality TagBasedTest
Method TagSetCriterion.from_str(*a, **kw)
Prepare a TagSetCriterion from the string s.
Method TagSetCriterion.from_str2(s, offset=0, delim=None)
Parse a criterion from s at offset and return (TagSetCriterion,offset).
This method recognises an optional leading '!' or '-'
indicating negation of the test,
followed by a criterion recognised by the .parse method
of one of the classes in cls.CRITERION_PARSE_CLASSES.
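As a rough illustration of the negation prefix and the (criterion, offset) return convention, here is a stand-alone sketch. It does not use cs.tagset: the function `parse_simple_criterion` and its `(negated, name, value)` result are hypothetical, and it only understands a bare name or name=value running up to the next whitespace.

```python
def parse_simple_criterion(s, offset=0):
    """Parse an optional '!' or '-' then name[=value] from s at offset.

    Hypothetical helper, not part of cs.tagset.
    Return ((negated, name, value), new_offset).
    """
    negated = False
    if s.startswith(('!', '-'), offset):
        negated = True
        offset += 1
    # the criterion text runs to the next whitespace or the end of s
    end = offset
    while end < len(s) and not s[end].isspace():
        end += 1
    word = s[offset:end]
    if not word:
        raise ValueError("no criterion at offset %d" % offset)
    name, sep, value = word.partition('=')
    return (negated, name, value if sep else None), end
```

Returning the new offset lets a caller keep parsing the rest of the string from where this criterion ended.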
Method TagSetCriterion.match_tagged_entity(self, te: 'TagSet') -> bool
Apply this TagSetCriterion to a TagSet.
Class TagSetPrefixView(cs.lex.FormatableMixin,cs.lex.FormatableFormatter,string.Formatter)
A view of a TagSet via a prefix.
Access to a key k accesses the TagSet
with the key prefix+'.'+k.
This is a kind of funny hybrid of a Tag and a TagSet
in that some things such as __format__
will format the Tag named prefix if it exists
in preference to the subtags.
Example:
>>> tags = TagSet(a=1, b=2)
>>> tags
TagSet:{'a': 1, 'b': 2}
>>> tags['sub.x'] = 3
>>> tags['sub.y'] = 4
>>> tags
TagSet:{'a': 1, 'b': 2, 'sub.x': 3, 'sub.y': 4}
>>> sub = tags.sub
>>> sub
TagSetPrefixView:sub.{'x': 3, 'y': 4}
>>> sub.z = 5
>>> sub
TagSetPrefixView:sub.{'x': 3, 'y': 4, 'z': 5}
>>> tags
TagSet:{'a': 1, 'b': 2, 'sub.x': 3, 'sub.y': 4, 'sub.z': 5}
Method TagSetPrefixView.__getattr__(self, attr)
Proxy other attributes through to the TagSet.
Method TagSetPrefixView.__setattr__(self, attr, value)
Attribute based Tag access.
If attr is in self.__dict__ then that is updated,
supporting "normal" attributes set on the instance.
Otherwise the Tag named attr is set to value.
The __init__ methods of subclasses should do something like this
(from TagSet.__init__)
to set up the ordinary instance attributes
which are not to be treated as Tags:
self.__dict__.update(id=_id, ontology=_ontology, modified=False)
Method TagSetPrefixView.get_format_attribute(self, attr)
Fetch a formatting attribute from the proxied object.
Method TagSetPrefixView.items(self)
Return an iterable of the items (Tag name, Tag).
Method TagSetPrefixView.keys(self)
The keys of the subtags.
Property TagSetPrefixView.ontology
The ontology of the referenced TagSet.
Method TagSetPrefixView.subtags(self, subprefix)
Return a deeper view of the TagSet.
Property TagSetPrefixView.tag
The Tag for the prefix, or None if there is no such Tag.
Property TagSetPrefixView.value
Return the Tag value for the prefix, or None if there is no such Tag.
Method TagSetPrefixView.values(self)
Return an iterable of the values (Tags).
Class TagSetsSubdomain(cs.obj.SingletonMixin,cs.mappings.PrefixedMappingProxy,cs.mappings.RemappedMappingProxy)
A view into a BaseTagSets for keys commencing with a prefix.
Property TagSetsSubdomain.TAGGED_ENTITY_FACTORY
The entity factory comes from the parent collection.
Class TagsOntology(cs.obj.SingletonMixin,BaseTagSets,cs.resources.MultiOpenMixin,collections.abc.MutableMapping,collections.abc.Mapping,collections.abc.Collection,collections.abc.Sized,collections.abc.Iterable,collections.abc.Container)
An ontology for tag names.
This is based around a mapping of names
to ontological information expressed as a TagSet.
Normally an object's tags are not a self contained repository of all the information; instead a tag just names some information.
As an example, consider the tag colour=blue.
Meta information about blue is obtained via the ontology,
which has an entry for the colour blue.
We adopt the convention that the type is just the tag name,
so we obtain the metadata by calling ontology.metadata(tag)
or alternatively ontology.metadata(tag.name,tag.value)
being the type name and value respectively.
The ontology itself is based around TagSets and effectively the call
ontology.metadata('colour','blue')
would look up the TagSet named colour.blue in the underlying TagSets.
For a self contained dataset this means that it can be its own ontology.
For tags associated with arbitrary objects
such as the filesystem tags maintained by cs.fstags
the ontology would be a separate tags collection stored in a central place.
There are two main categories of entries in an ontology:
- metadata: other entries named typename.value_key contain a TagSet holding metadata for a value of type typename whose value is mapped to value_key
- types: an optional entry named type.typename contains a TagSet describing the type named typename; really this is just more metadata where the "type name" is type
Metadata are TagSet instances describing particular values of a type.
For example, some metadata for the Tag colour="blue":
colour.blue url="https://en.wikipedia.org/wiki/Blue" wavelengths="450nm-495nm"
Some metadata associated with the Tag actor="Scarlett Johansson":
actor.scarlett_johansson role=["Black Widow (Marvel)"]
character.marvel.black_widow fullname=["Natasha Romanov"]
The tag values are lists above because an actor might play many roles, etc.
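The entry naming scheme can be pictured with plain dicts standing in for the ontology's TagSets. This is a minimal sketch under that assumption; `entries` and `sketch_metadata` are hypothetical names, and the value keys are supplied already converted.

```python
# Plain dicts standing in for the ontology's TagSets (an assumption for
# illustration; the real entries are TagSet instances).
entries = {
    'colour.blue': {
        'url': 'https://en.wikipedia.org/wiki/Blue',
        'wavelengths': '450nm-495nm',
    },
    'actor.scarlett_johansson': {'role': ['Black Widow (Marvel)']},
    'character.marvel.black_widow': {'fullname': ['Natasha Romanov']},
}

def sketch_metadata(entries, type_name, value_key):
    """Return the entry named type_name.value_key, or an empty dict."""
    return entries.get(type_name + '.' + value_key, {})

blue = sketch_metadata(entries, 'colour', 'blue')
widow = sketch_metadata(entries, 'character', 'marvel.black_widow')
```

Because the type name and the value key are simply joined with a dot, one flat mapping can hold the metadata for every type.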
There's a convention for converting human descriptions
such as the role string "Black Widow (Marvel)" to its metadata.
- the value "Black Widow (Marvel)" is converted to a key by the ontology method value_to_tag_name; it moves a bracket suffix such as (Marvel) to the front as a prefix marvel. and downcases the rest of the string and turns spaces into underscores. This yields the value key marvel.black_widow.
- the type is role, so the ontology entry for the metadata is role.marvel.black_widow
this requires type information about a role.
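The conversion convention just described can be approximated in a few lines. This is a sketch of the documented behaviour only, not the library's value_to_tag_name: the name `sketch_value_to_tag_name` is hypothetical and the exact regular expression is an assumption.

```python
import re

def sketch_value_to_tag_name(value):
    """Approximate the documented conversion (hypothetical helper)."""
    if isinstance(value, int):
        # ints are simply transcribed
        return str(value)
    if not isinstance(value, str):
        raise ValueError("cannot convert %r" % (value,))
    # move a trailing "(suffix)" to the front as "suffix."
    m = re.fullmatch(r'(.*?)\s*\(([^()]*)\)\s*', value)
    if m:
        value = m.group(2).strip() + '.' + m.group(1).strip()
    # split into words, lowercase, join with underscores
    return '_'.join(value.lower().split())

role_key = sketch_value_to_tag_name("Black Widow (Marvel)")
entry_name = 'role.' + role_key
```

With the role string above this gives the value key marvel.black_widow and therefore the entry name role.marvel.black_widow.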
Here are some type definitions supporting the above metadata:
type.person type=str description="A person."
type.actor type=person description="An actor's stage name."
type.character type=str description="A person in a story."
type.role type_name=character description="A character role in a performance."
type.cast type=dict key_type=actor member_type=role description="Cast members and their roles."
The basic types have their Python names: int, float, str, list,
dict, date, datetime.
You can define subtypes of these for your own purposes
as illustrated above.
For example:
type.colour type=str description="A hue."
which subclasses str.
Subtypes of list include a member_type
specifying the type for members of a Tag value:
type.scene type=list member_type=str description="A movie scene."
Subtypes of dict include a key_type and a member_type
specifying the type for keys and members of a Tag value,
as in the type.cast definition above.
Accessing type data and metadata:
A TagSet may have a reference to a TagsOntology as .ontology
and so also do any of its Tags.
Method TagsOntology.__bool__(self)
Support easy ontology or some_default tests,
since ontologies are broadly optional.
Method TagsOntology.__delitem__(self, name)
Delete the entity named name.
Method TagsOntology.__setitem__(self, name, tags)
Apply tags to the entity named name.
Method TagsOntology.add_tagsets(self, *a, **kw)
Insert a _TagsOntology_SubTagSets at index
in the list of _TagsOntology_SubTagSetses.
The new _TagsOntology_SubTagSets instance is initialised
from the supplied tagsets, match, unmatch parameters.
Method TagsOntology.as_dict(self)
Return a dict containing a mapping of entry names to their TagSets.
Method TagsOntology.basetype(self, typename)
Infer the base type name from a type name.
The default type is 'str',
but any type which resolves to one in self.BASE_TYPES
may be returned.
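One way to picture the resolution is to follow each type definition's type tag until a base type is reached. This is a sketch under stated assumptions: plain dicts stand in for the type TagSets, `BASE_TYPES` is filled in from the base type names listed earlier, and `sketch_basetype` is a hypothetical name rather than the library method.

```python
# Base type names as listed in the class description (assumed contents).
BASE_TYPES = ('int', 'float', 'str', 'list', 'dict', 'date', 'datetime')

def sketch_basetype(typedefs, typename):
    """Follow 'type' links to a base type name, defaulting to 'str'."""
    seen = set()
    while typename not in BASE_TYPES:
        if typename in seen:
            # a loop in the definitions: fall back to the default
            return 'str'
        seen.add(typename)
        typedef = typedefs.get('type.' + typename)
        if typedef is None or 'type' not in typedef:
            # undefined type, or no supertype recorded
            return 'str'
        typename = typedef['type']
    return typename

typedefs = {
    'type.person': {'type': 'str'},
    'type.actor': {'type': 'person'},
    'type.cast': {'type': 'dict', 'key_type': 'actor', 'member_type': 'role'},
}
```

Here actor resolves through person to str, while cast resolves directly to dict.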
Method TagsOntology.by_type(self, type_name, with_tagsets=False)
Yield keys or (key,tagset) of type type_name
i.e. all keys commencing with type_name followed by a dot.
Method TagsOntology.convert_tag(self, tag)
Convert a Tag's value according to the ontology.
Return a new Tag with the converted value
or the original Tag unchanged.
This is primarily aimed at things like regexp based autotagging,
where the matches are all strings
but various fields have special types,
commonly ints or dates.
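The idea can be sketched on bare values rather than Tags. This is an illustration only: `CONVERTERS` and `sketch_convert_value` are hypothetical names, the ISO 8601 date format is an assumption, and the real method returns a Tag rather than a value.

```python
from datetime import date, datetime

# Converters for the base types whose values usually arrive as strings.
CONVERTERS = {
    'int': int,
    'float': float,
    'date': date.fromisoformat,
    'datetime': datetime.fromisoformat,
}

def sketch_convert_value(value, basetype):
    """Convert a str value for the base type, else return it unchanged."""
    if not isinstance(value, str):
        # already converted, or not something we can parse
        return value
    convert = CONVERTERS.get(basetype)
    if convert is None:
        return value
    try:
        return convert(value)
    except ValueError:
        # leave unparseable text as it was
        return value
```

A regexp match such as "2021" for an int-typed field would come back as the integer 2021, while text which does not parse is passed through untouched.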
Method TagsOntology.edit_indices(self, *a, **kw)
Edit the entries specified by indices.
Return TagSets for the entries which were changed.
Method TagsOntology.from_match(*a, **kw)
Initialise a SubTagSets from tagsets, match and optional unmatch.
Parameters:
- tagsets: a TagSets holding ontology information
- match: a match function used to choose entries based on a type name
- unmatch: an optional reverse for match, accepting a subtype name and returning its public name
If match is None
then tagsets will always be chosen if no prior entry matched.
Otherwise, match is resolved to a function match_func(type_name)
which returns a subtype name on a match and a false value on no match.
If match is a callable it is used as match_func directly.
If match is a list, tuple or set
then this method calls itself with (tagsets,submatch)
for each member submatch of match.
If match is a str,
if it ends in a dot '.', dash '-' or underscore '_'
then it is considered a prefix of type_name and the returned
subtype name is the text from type_name after the prefix
otherwise it is considered a full match for the type_name
and the returned subtype name is type_name unchanged.
The match string is a simplistic shell style glob
supporting * but not ? or [seq].
The value of unmatch is constrained by match.
If match is None, unmatch must also be None;
the type name is used unchanged.
If match is callable, unmatch must also be callable; it is expected to reverse match.
Examples:
>>> from cs.sqltags import SQLTags
>>> from os.path import expanduser as u
>>> # an initial empty ontology with a default in memory mapping
>>> ont = TagsOntology()
>>> # divert the types actor, role and series to my media ontology
>>> ont.add_tagsets(
... SQLTags(u('~/var/media-ontology.sqlite')),
... ['actor', 'role', 'series'])
>>> # divert type "musicbrainz.recording" to mbdb.sqlite
>>> # mapping to the type "recording"
>>> ont.add_tagsets(SQLTags(u('~/.cache/mbdb.sqlite')), 'musicbrainz.')
>>> # divert type "tvdb.actor" to tvdb.sqlite
>>> # mapping to the type "actor"
>>> ont.add_tagsets(SQLTags(u('~/.cache/tvdb.sqlite')), 'tvdb.')
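The str form of match described above can be sketched as a small factory which builds match_func(type_name). This is not the library's code: `sketch_match_func` is a hypothetical name, and treating `*` as a non-greedy wildcard is an assumption.

```python
import re

def sketch_match_func(match):
    """Build match_func(type_name) from a str match (hypothetical helper)."""
    # only '*' is special in the simplistic glob; everything else is literal
    regexp = re.compile('.*?'.join(re.escape(part) for part in match.split('*')))
    if match.endswith(('.', '-', '_')):
        # prefix form: the subtype name is the text after the prefix
        def match_func(type_name):
            m = regexp.match(type_name)
            return type_name[m.end():] if m else None
    else:
        # full match form: the subtype name is the type name unchanged
        def match_func(type_name):
            return type_name if regexp.fullmatch(type_name) else None
    return match_func

tvdb = sketch_match_func('tvdb.')
actor = sketch_match_func('actor')
```

With these, `tvdb('tvdb.actor')` yields the subtype name actor, mirroring the diversion of tvdb.actor in the examples, and `actor('actors')` yields a false value because the full match form must cover the whole type name.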
Method TagsOntology.get(self, name, default=None)
Fetch the entity named name or default.
Method TagsOntology.items(self)
Yield (entity_name,tags) for all the items in each subtagsets.
Method TagsOntology.keys(self)
Yield entity names for all the entities.
Method TagsOntology.metadata(self, name, value=None, *a, **kw)
Return the metadata TagSet for type_name and value.
This implements the mapping between a type's value and its semantics.
The optional parameter convert
may specify a function to use to convert value to a tag name component
to be used in place of self.value_to_tag_name (the default).
For example, if a TagSet had a list of characters such as:
character=["Captain America (Marvel)","Black Widow (Marvel)"]
then these values could be converted to the dotted identifiers
character.marvel.captain_america
and character.marvel.black_widow respectively,
ready for lookup in the ontology
to obtain the "metadata" TagSet for each specific value.
Method TagsOntology.startup_shutdown(self)
Open all the subTagSets and close on exit.
Method TagsOntology.subtype_name(self, type_name)
Return the type name for use within self.tagsets from type_name.
Returns None if this is not a supported type_name.
Method TagsOntology.type_name(self, subtype_name)
Return the external type name from the internal subtype_name
which is used within self.tagsets.
Method TagsOntology.type_names(self)
Return the defined type names, i.e. all entries starting with "type.".
Method TagsOntology.typedef(self, type_name)
Return the TagSet defining the type named type_name.
Method TagsOntology.types(self)
Generator yielding defined type names and their defining TagSet.
Method TagsOntology.value_to_tag_name(*a, **kw)
Convert a tag value to a tagnamelike dotted identifierish string
for use in ontology lookup.
Raises ValueError for unconvertable values.
We are allowing dashes in the result (UUIDs, MusicBrainz discids, etc).
ints are converted to str.
Strings are converted as follows:
- a trailing (.*) is turned into a prefix with a dot, for example "Captain America (Marvel)" becomes "Marvel.Captain America".
- the string is split into words (nonwhitespace), lowercased and joined with underscores, for example "Marvel.Captain America" becomes "marvel.captain_america".
Class TagsOntologyCommand(cs.cmdutils.BaseCommand)
A command line for working with ontology types.
Command line usage:
Usage: tagsontology subcommand [...]
Subcommands:
edit [{/name-regexp | entity-name}]
Edit entities.
With no arguments, edit all the entities.
With an argument starting with a slash, edit the entities
whose names match the regexp.
Otherwise the argument is expected to be an entity name;
edit the tags of that entity.
help [subcommand-names...]
Print the help for the named subcommands,
or for all subcommands if no names are specified.
meta tag=value
type
With no arguments, list the defined types.
type type_name
With a type name, print its `Tag`s.
type type_name edit
Edit the tags defining a type.
type type_name edit meta_names_pattern...
Edit the tags for the metadata names matching the
meta_names_patterns.
type type_name list
type type_name ls
List the metadata names for this type and their tags.
type type_name + entity_name [tags...]
Create type_name.entity_name and apply the tags.
Method TagsOntologyCommand.cmd_edit(self, argv)
Usage: {cmd} [{{/name-regexp | entity-name}}]
Edit entities.
With no arguments, edit all the entities.
With an argument starting with a slash, edit the entities
whose names match the regexp.
Otherwise the argument is expected to be an entity name;
edit the tags of that entity.
Method TagsOntologyCommand.cmd_meta(self, argv)
Usage: {cmd} tag=value
Method TagsOntologyCommand.cmd_type(self, argv)
Usage:
{cmd}
With no arguments, list the defined types.
{cmd} type_name
With a type name, print its Tags.
{cmd} type_name edit
Edit the tags defining a type.
{cmd} type_name edit meta_names_pattern...
Edit the tags for the metadata names matching the
meta_names_patterns.
{cmd} type_name list
{cmd} type_name ls
List the metadata names for this type and their tags.
{cmd} type_name + entity_name [tags...]
Create type_name.entity_name and apply the tags.
Release Log
Release 20210913:
- TagSet.get_value: raise KeyError in strict mode, leave placeholder otherwise.
- Other small changes.
Release 20210906: Many many updates; some semantics have changed.
Release 20210428: Bugfix TagSet.set: internal in place changes to a complex tag value were not noticed, causing TagFile to not update on shutdown.
Release 20210420:
- TagSet: also subclass cs.dateutils.UNIXTimeMixin.
- Various TagSetNamespace updates and bugfixes.
Release 20210404: Bugfix TagBasedTest.COMPARISON_FUNCS["="]: if cmp_value is None, return true (the tag is present).
Release 20210306:
- ExtendedNamespace,TagSetNamespace: move the .[:alpha:]* attribute support from ExtendedNamespace to TagSetNamespace because it requires Tags.
- TagSetNamespace.__getattr__: new _i, _s, _f suffixes to return int, str or float tag values (or None); fold _lc in with these.
- Pull most of TaggedEntity out into TaggedEntityMixin for reuse by domain specific tagged entities.
- TaggedEntity: new .set and .discard methods.
- TaggedEntity: new as_editable_line, from_editable_line, edit and edit_entities methods to support editing entities using a text editor.
- ontologies: type entries are now prefixed with "type." and metadata entries are prefixed with "meta."; provide a worked ontology example in the introduction and improve related docstrings.
- TagsOntology: new .types(), .types_names(), .meta(type_name,value), .meta_names() methods.
- TagsOntology.__getitem__: create missing TagSets on demand.
- New TagsOntologyCommand, initially with a "type [type_name [{edit|list}]]" subcommand, ready for use as the cmd_ont subcommand of other tag related commands.
- TagSet: support initialisation like a dict including keywords, and move the ontology parameter to _ontology.
- TagSet: include AttrableMappingMixin to enable attribute access to values when there is no conflict with normal methods.
- UUID encode/decode support.
- Honour $TAGSET_EDITOR or $EDITOR as preferred interactive editor for tags.
- New TagSet.subtags(prefix) to extract a subset of the tags.
- TagsOntology.value_metadata: new optional convert parameter to override the default "convert human friendly name" algorithm, particularly to pass convert=str to things which are already the basic id.
- Rename TaggedEntity to TagSet.
- Rename TaggedEntities to TagSets.
- TagSet: new csvrow and from_csvrow methods imported from obsolete TaggedEntityMixin class.
- Move BaseTagFile from cs.fstags to TagFile in cs.tagset.
- TagSet: support access to the tag "c.x" via attributes provided there is no "c" tag in the way.
- TagSet.unixtime: implement the autoset-to-now semantics.
- New as_timestamp(): convert date, datetime, int or float to a UNIX timestamp.
- Assorted docstring updates and bugfixes.
Release 20200716:
- Update for changed cs.obj.SingletonMixin API.
- Pull in TaggedEntity from cs.sqltags and add the .csvrow property and the .from_csvrow factory.
Release 20200521.1: Fix DISTINFO.install_requires, drop debug import.
Release 20200521:
- New ValueDetail and KeyValueDetail classes for returning ontology information; TagInfo.detail now returns a ValueDetail for scalar types, a list of ValueDetails for sequence types and a list of KeyValueDetails for mapping types; drop various TagInfo mapping/iterable style methods, too confusing to use.
- Plumb ontology parameter throughout, always optional.
- Drop TypedTag, Tags now use ontologies for this.
- New TagsCommandMixin to support BaseCommands which manipulate Tags.
- Many improvements and bugfixes.
Release 20200318:
- Note that the TagsOntology stuff is in flux and totally alpha.
- Tag.prefix_name factory returning a new tag if prefix is not empty, otherwise self.
- TagSet.update: accept an optional prefix for inserting "foreign" tags with a distinguishing name prefix.
- Tag.as_json: turn sets and tuples into lists for encoding.
- Backport for Python < 3.7 (no fromisoformat functions).
- TagSet: drop unused and ill-placed .titleify, .episode_title and .title methods.
- TagSet: remove "defaults", unused.
- Make TagSet a direct subclass of dict, adjust uses of .update etc.
- New ExtendedNamespace class which is a SimpleNamespace with some inferred attributes and a partial mapping API (keys and __getitem__).
- New TagSet.ns() returning the Tags as an ExtendedNamespace, which doubles as a mapping for str.format_map; TagSet.format_kwargs is now an alias for this.
- New Tag.from_string factory to parse a str into a Tag.
- New TagsOntology and TypedTag classes to provide type and value-detail information; very very alpha and subject to change.
Release 20200229.1: Initial release: pull TagSet, Tag, TagChoice from cs.fstags for independent use.