Extensions to the Zope 3 Catalog
zc.catalog is a package of extensions to the Zope 3 catalog, Zope 3’s indexing and search facility. It provides several new indexes, improved globbing and stemming support, and an alternative catalog implementation.
Detailed Documentation
Value Index
The valueindex is an index similar to, but more flexible than, a standard Zope field index. The index allows searches for documents that contain any of a set of values; documents whose value falls between a minimum and a maximum; documents with any (non-None) value; and documents with no value at all.
Additionally, the index supports an interface that allows examination of the indexed values.
It is as policy-free as possible, and is intended to be the engine for indexes with more policy, as well as being useful itself.
On creation, the index has no wordCount, no documentCount, and is, as expected, fairly empty.
>>> from zc.catalog.index import ValueIndex
>>> index = ValueIndex()
>>> index.documentCount()
0
>>> index.wordCount()
0
>>> index.maxValue() # doctest: +ELLIPSIS
Traceback (most recent call last):
...
ValueError:...
>>> index.minValue() # doctest: +ELLIPSIS
Traceback (most recent call last):
...
ValueError:...
>>> list(index.values())
[]
>>> len(index.apply({'any_of': (5,)}))
0
The index supports indexing any value. All values within a given index must sort consistently across Python versions.
>>> data = {1: 'a',
...         2: 'b',
...         3: 'a',
...         4: 'c',
...         5: 'd',
...         6: 'c',
...         7: 'c',
...         8: 'b',
...         9: 'c',
...         }
>>> for k, v in data.items():
...     index.index_doc(k, v)
...
After indexing, the statistics and values match the newly entered content.
>>> list(index.values())
['a', 'b', 'c', 'd']
>>> index.documentCount()
9
>>> index.wordCount()
4
>>> index.maxValue()
'd'
>>> index.minValue()
'a'
>>> list(index.ids())
[1, 2, 3, 4, 5, 6, 7, 8, 9]
The index supports four types of query. The first is ‘any_of’. It takes an iterable of values, and returns an iterable of document ids that contain any of the values. The results are not weighted.
>>> list(index.apply({'any_of': ('b', 'c')}))
[2, 4, 6, 7, 8, 9]
>>> list(index.apply({'any_of': ('b',)}))
[2, 8]
>>> list(index.apply({'any_of': ('d',)}))
[5]
>>> list(index.apply({'any_of': (42,)}))
[]
Another query is ‘any’. If the key is None, all indexed document ids with any values are returned. If the key is an extent, the intersection of the extent and all document ids with any values is returned.
>>> list(index.apply({'any': None}))
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> from zc.catalog.extentcatalog import FilterExtent
>>> extent = FilterExtent(lambda extent, uid, obj: True)
>>> for i in range(15):
...     extent.add(i, i)
...
>>> list(index.apply({'any': extent}))
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> limited_extent = FilterExtent(lambda extent, uid, obj: True)
>>> for i in range(5):
...     limited_extent.add(i, i)
...
>>> list(index.apply({'any': limited_extent}))
[1, 2, 3, 4]
The ‘between’ argument takes one to four values. The first is the minimum, and defaults to None, indicating no minimum; the second is the maximum, and defaults to None, indicating no maximum; the third is a boolean indicating whether the minimum value should be excluded, and defaults to False; and the fourth is a boolean indicating whether the maximum value should be excluded, and also defaults to False. The results are not weighted.
>>> list(index.apply({'between': ('b', 'd')}))
[2, 4, 5, 6, 7, 8, 9]
>>> list(index.apply({'between': ('c', None)}))
[4, 5, 6, 7, 9]
>>> list(index.apply({'between': ('c',)}))
[4, 5, 6, 7, 9]
>>> list(index.apply({'between': ('b', 'd', True, True)}))
[4, 6, 7, 9]
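The semantics of ‘any_of’ and ‘between’ can be modelled with a short plain-Python sketch. This is an illustration only, not how zc.catalog’s BTree-based implementation works, and the function names here are made up for the example:

```python
# Model of the ValueIndex 'any_of' and 'between' queries over a plain
# {doc_id: value} mapping.  Illustrative only; the real index uses BTrees.
def any_of(data, values):
    # every doc whose value is one of the requested values
    wanted = set(values)
    return sorted(d for d, v in data.items() if v in wanted)

def between(data, minimum=None, maximum=None,
            exclude_min=False, exclude_max=False):
    # every doc whose value lies in the (possibly open-ended) range
    def matches(v):
        if minimum is not None and (
                v < minimum or (exclude_min and v == minimum)):
            return False
        if maximum is not None and (
                v > maximum or (exclude_max and v == maximum)):
            return False
        return True
    return sorted(d for d, v in data.items() if matches(v))

data = {1: 'a', 2: 'b', 3: 'a', 4: 'c', 5: 'd',
        6: 'c', 7: 'c', 8: 'b', 9: 'c'}
```

With the sample data used above, these reproduce the doctest results: `any_of(data, ('b', 'c'))` gives `[2, 4, 6, 7, 8, 9]` and `between(data, 'b', 'd', True, True)` gives `[4, 6, 7, 9]`.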
The ‘none’ argument takes an extent and returns the ids in the extent that are not indexed; it is intended to be used to return docids that have no (or empty) values.
>>> list(index.apply({'none': extent}))
[0, 10, 11, 12, 13, 14]
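In plain-Python terms, ‘none’ is the set difference between the extent’s ids and the ids that have an indexed value (again a model sketch, not the BTree implementation):

```python
# Model of the 'none' query: ids in the extent minus ids with a value.
def none_query(extent_ids, data):
    return sorted(set(extent_ids) - set(data))

data = {1: 'a', 2: 'b', 3: 'a', 4: 'c', 5: 'd',
        6: 'c', 7: 'c', 8: 'b', 9: 'c'}
```

With an extent covering ids 0 through 14, `none_query(range(15), data)` gives `[0, 10, 11, 12, 13, 14]`, matching the doctest above.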
Trying to use more than one of these at a time generates an error.
>>> index.apply({'between': (5,), 'any_of': (3,)})
... # doctest: +ELLIPSIS
Traceback (most recent call last):
...
ValueError:...
Using none of them simply returns None.
>>> index.apply({}) # returns None
Invalid query names cause ValueErrors.
>>> index.apply({'foo': ()})
... # doctest: +ELLIPSIS
Traceback (most recent call last):
...
ValueError:...
When you unindex a document, the searches and statistics should be updated.
>>> index.unindex_doc(5)
>>> len(index.apply({'any_of': ('d',)}))
0
>>> index.documentCount()
8
>>> index.wordCount()
3
>>> list(index.values())
['a', 'b', 'c']
>>> list(index.ids())
[1, 2, 3, 4, 6, 7, 8, 9]
Reindexing a document whose value has changed is also reflected in subsequent searches and statistics.
>>> list(index.apply({'any_of': ('b',)}))
[2, 8]
>>> data[8] = 'e'
>>> index.index_doc(8, data[8])
>>> index.documentCount()
8
>>> index.wordCount()
4
>>> list(index.apply({'any_of': ('e',)}))
[8]
>>> list(index.apply({'any_of': ('b',)}))
[2]
>>> data[2] = 'e'
>>> index.index_doc(2, data[2])
>>> index.documentCount()
8
>>> index.wordCount()
3
>>> list(index.apply({'any_of': ('e',)}))
[2, 8]
>>> list(index.apply({'any_of': ('b',)}))
[]
Reindexing a document for which the value is now None causes it to be removed from the statistics.
>>> data[3] = None
>>> index.index_doc(3, data[3])
>>> index.documentCount()
7
>>> index.wordCount()
3
>>> list(index.ids())
[1, 2, 4, 6, 7, 8, 9]
This affects both ways of determining the ids that are and are not in the index (that do and do not have values).
>>> list(index.apply({'any': None}))
[1, 2, 4, 6, 7, 8, 9]
>>> list(index.apply({'any': extent}))
[1, 2, 4, 6, 7, 8, 9]
>>> list(index.apply({'none': extent}))
[0, 3, 5, 10, 11, 12, 13, 14]
The values method can be used to examine the indexed values for a given document id. For a valueindex, the “values” for a given doc_id will always have a length of 0 or 1.
>>> index.values(doc_id=8)
('e',)
And the containsValue method provides a way of determining membership in the values.
>>> index.containsValue('a')
True
>>> index.containsValue('q')
False
Set Index
The setindex is an index similar to, but more general than, a traditional keyword index. The values indexed are expected to be iterables; the index allows searches for documents that contain any of a set of values; all of a set of values; or values between a minimum and a maximum.
Additionally, the index supports an interface that allows examination of the indexed values.
It is as policy-free as possible, and is intended to be the engine for indexes with more policy, as well as being useful itself.
On creation, the index has no wordCount, no documentCount, and is, as expected, fairly empty.
>>> from zc.catalog.index import SetIndex
>>> index = SetIndex()
>>> index.documentCount()
0
>>> index.wordCount()
0
>>> index.maxValue() # doctest: +ELLIPSIS
Traceback (most recent call last):
...
ValueError:...
>>> index.minValue() # doctest: +ELLIPSIS
Traceback (most recent call last):
...
ValueError:...
>>> list(index.values())
[]
>>> len(index.apply({'any_of': (5,)}))
0
The index supports indexing any value. All values within a given index must sort consistently across Python versions. In our example, we hope that strings and integers will sort consistently; this may not be a reasonable hope.
>>> data = {1: ['a', 1],
...         2: ['b', 'a', 3, 4, 7],
...         3: [1],
...         4: [1, 4, 'c'],
...         5: [7],
...         6: [5, 6, 7],
...         7: ['c'],
...         8: [1, 6],
...         9: ['a', 'c', 2, 3, 4, 6,],
...         }
>>> for k, v in data.items():
...     index.index_doc(k, v)
...
After indexing, the statistics and values match the newly entered content.
>>> list(index.values())
[1, 2, 3, 4, 5, 6, 7, 'a', 'b', 'c']
>>> index.documentCount()
9
>>> index.wordCount()
10
>>> index.maxValue()
'c'
>>> index.minValue()
1
>>> list(index.ids())
[1, 2, 3, 4, 5, 6, 7, 8, 9]
The index supports five types of query. The first is ‘any_of’. It takes an iterable of values, and returns an iterable of document ids that contain any of the values. The results are weighted.
>>> list(index.apply({'any_of': ('b', 1, 5)}))
[1, 2, 3, 4, 6, 8]
>>> list(index.apply({'any_of': (42,)}))
[]
>>> index.apply({'any_of': ('a', 3, 7)}) # doctest: +ELLIPSIS
BTrees...FBucket([(1, 1.0), (2, 3.0), (5, 1.0), (6, 1.0), (9, 2.0)])
Another query is ‘any’. If the key is None, all indexed document ids with any values are returned. If the key is an extent, the intersection of the extent and all document ids with any values is returned.
>>> list(index.apply({'any': None}))
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> from zc.catalog.extentcatalog import FilterExtent
>>> extent = FilterExtent(lambda extent, uid, obj: True)
>>> for i in range(15):
...     extent.add(i, i)
...
>>> list(index.apply({'any': extent}))
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> limited_extent = FilterExtent(lambda extent, uid, obj: True)
>>> for i in range(5):
...     limited_extent.add(i, i)
...
>>> list(index.apply({'any': limited_extent}))
[1, 2, 3, 4]
The ‘all_of’ argument also takes an iterable of values, but returns an iterable of document ids that contain all of the values. The results are not weighted [1].
>>> list(index.apply({'all_of': ('a',)}))
[1, 2, 9]
>>> list(index.apply({'all_of': (3, 4)}))
[2, 9]
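The weighted ‘any_of’ and the unweighted ‘all_of’ queries can be modelled over a plain mapping, with each weight being the number of query values a document contains (a sketch for illustration, not zc.catalog’s implementation):

```python
# Model of the SetIndex 'any_of' (weighted) and 'all_of' queries over a
# plain {doc_id: values} mapping.  Illustrative only.
def any_of_weighted(data, values):
    # weight = how many of the requested values the doc contains
    wanted = set(values)
    return {d: float(len(wanted & set(vs)))
            for d, vs in data.items() if wanted & set(vs)}

def all_of(data, values):
    # docs whose value sets are supersets of the requested values
    wanted = set(values)
    return sorted(d for d, vs in data.items() if wanted <= set(vs))

data = {1: ['a', 1], 2: ['b', 'a', 3, 4, 7], 3: [1], 4: [1, 4, 'c'],
        5: [7], 6: [5, 6, 7], 7: ['c'], 8: [1, 6],
        9: ['a', 'c', 2, 3, 4, 6]}
```

With the sample data used above, `any_of_weighted(data, ('a', 3, 7))` yields the same id/weight pairs as the FBucket shown earlier, and `all_of(data, (3, 4))` gives `[2, 9]`.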
The ‘between’ argument takes one to four values. The first is the minimum, and defaults to None, indicating no minimum; the second is the maximum, and defaults to None, indicating no maximum; the third is a boolean indicating whether the minimum value should be excluded, and defaults to False; and the fourth is a boolean indicating whether the maximum value should be excluded, and also defaults to False. The results are weighted.
>>> list(index.apply({'between': (1, 7)}))
[1, 2, 3, 4, 5, 6, 8, 9]
>>> list(index.apply({'between': ('b', None)}))
[2, 4, 7, 9]
>>> list(index.apply({'between': ('b',)}))
[2, 4, 7, 9]
>>> list(index.apply({'between': (1, 7, True, True)}))
[2, 4, 6, 8, 9]
>>> index.apply({'between': (2, 6)}) # doctest: +ELLIPSIS
BTrees...FBucket([(2, 2.0), (4, 1.0), (6, 2.0), (8, 1.0), (9, 4.0)])
The ‘none’ argument takes an extent and returns the ids in the extent that are not indexed; it is intended to be used to return docids that have no (or empty) values.
>>> list(index.apply({'none': extent}))
[0, 10, 11, 12, 13, 14]
Trying to use more than one of these at a time generates an error.
>>> index.apply({'all_of': (5,), 'any_of': (3,)})
... # doctest: +ELLIPSIS
Traceback (most recent call last):
...
ValueError:...
Using none of them simply returns None.
>>> index.apply({}) # returns None
Invalid query names cause ValueErrors.
>>> index.apply({'foo': ()})
... # doctest: +ELLIPSIS
Traceback (most recent call last):
...
ValueError:...
When you unindex a document, the searches and statistics should be updated.
>>> index.unindex_doc(6)
>>> len(index.apply({'any_of': (5,)}))
0
>>> index.documentCount()
8
>>> index.wordCount()
9
>>> list(index.values())
[1, 2, 3, 4, 6, 7, 'a', 'b', 'c']
>>> list(index.ids())
[1, 2, 3, 4, 5, 7, 8, 9]
Reindexing a document that has gained additional values is also reflected in subsequent searches and statistics.
>>> data[8].extend([5, 'c'])
>>> index.index_doc(8, data[8])
>>> index.documentCount()
8
>>> index.wordCount()
10
>>> list(index.apply({'any_of': (5,)}))
[8]
>>> list(index.apply({'any_of': ('c',)}))
[4, 7, 8, 9]
The same is true for reindexing a document with both additions and removals.
>>> 2 in set(index.apply({'any_of': (7,)}))
True
>>> 2 in set(index.apply({'any_of': (2,)}))
False
>>> data[2].pop()
7
>>> data[2].append(2)
>>> index.index_doc(2, data[2])
>>> 2 in set(index.apply({'any_of': (7,)}))
False
>>> 2 in set(index.apply({'any_of': (2,)}))
True
Reindexing a document that no longer has any values causes it to be removed from the statistics.
>>> del data[2][:]
>>> index.index_doc(2, data[2])
>>> index.documentCount()
7
>>> index.wordCount()
9
>>> list(index.ids())
[1, 3, 4, 5, 7, 8, 9]
This affects both ways of determining the ids that are and are not in the index (that do and do not have values).
>>> list(index.apply({'any': None}))
[1, 3, 4, 5, 7, 8, 9]
>>> list(index.apply({'none': extent}))
[0, 2, 6, 10, 11, 12, 13, 14]
The values method can be used to examine the indexed values for a given document id.
>>> set(index.values(doc_id=8)) == set([1, 5, 6, 'c'])
True
And the containsValue method provides a way of determining membership in the values.
>>> index.containsValue(5)
True
>>> index.containsValue(20)
False
Normalized Index
The index module provides a normalizing wrapper, a DateTime normalizer, and a set index and a value index normalized with the DateTime normalizer.
The normalizing wrapper implements a full complement of index interfaces (zope.index.interfaces.IInjection, zope.index.interfaces.IIndexSearch, zope.index.interfaces.IStatistics, and zc.catalog.interfaces.IIndexValues) and delegates all of the behavior to the wrapped index, normalizing values using the normalizer before the index sees them.
The normalizing wrapper currently only supports queries offered by zc.catalog.interfaces.ISetIndex and zc.catalog.interfaces.IValueIndex.
The normalizer interface requires the following methods, as defined in the interface:
- def value(value):
      """Normalize or check constraints for an input value; raise an
      error or return the value to be indexed."""

- def any(value, index):
      """Normalize a query value for an 'any_of' search; return a
      sequence of values."""

- def all(value, index):
      """Normalize a query value for an 'all_of' search; return the
      value for the query."""

- def minimum(value, index):
      """Normalize a query value for the minimum of a range; return the
      value for the query."""

- def maximum(value, index):
      """Normalize a query value for the maximum of a range; return the
      value for the query."""
The DateTime normalizer performs the following normalizations and validations. Whenever a timezone is needed, it tries to get a request from the current interaction and adapt it to zope.interface.common.idatetime.ITZInfo; failing that (no request or no adapter) it uses the system local timezone.
- Input values must be datetimes with a timezone. They are normalized to the resolution specified when the normalizer is created: a resolution of 0 normalizes values to days; a resolution of 1 to hours; 2 to minutes; 3 to seconds; and 4 to microseconds.

- ‘any’ values may be timezone-aware datetimes, timezone-naive datetimes, or dates. Dates are converted to all values from the start to the end of the given date in the found timezone, as described above. Timezone-naive datetimes get the found timezone.

- ‘all’ values may be timezone-aware datetimes or timezone-naive datetimes. Timezone-naive datetimes get the found timezone.

- ‘minimum’ values may be timezone-aware datetimes, timezone-naive datetimes, or dates. Dates are converted to the start of the given date in the found timezone, as described above. Timezone-naive datetimes get the found timezone.

- ‘maximum’ values may be timezone-aware datetimes, timezone-naive datetimes, or dates. Dates are converted to the end of the given date in the found timezone, as described above. Timezone-naive datetimes get the found timezone.
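The rules above can be sketched with a couple of hypothetical helpers (the real logic lives in zc.catalog.index.DateTimeNormalizer; timezone lookup is omitted here for brevity):

```python
from datetime import datetime, timedelta

# Hypothetical helpers mirroring the normalizations described above.
_FIELDS = ('hour', 'minute', 'second', 'microsecond')

def truncate(dt, resolution):
    # resolution: 0=days, 1=hours, 2=minutes, 3=seconds, 4=microseconds;
    # zero out every field finer than the requested resolution
    return dt.replace(**dict.fromkeys(_FIELDS[resolution:], 0))

def day_bounds(day, tzinfo=None):
    # the 'minimum'/'maximum' treatment of a date: the start and end of
    # that day (in the found timezone, passed in here explicitly)
    start = datetime(day.year, day.month, day.day, tzinfo=tzinfo)
    end = start + timedelta(days=1) - timedelta(microseconds=1)
    return start, end
```

For example, `truncate(datetime(2005, 7, 15, 11, 21, 32, 104), 2)` yields the minute-resolution value `datetime(2005, 7, 15, 11, 21)`, matching the normalizer doctests below.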
Let’s look at the DateTime normalizer first, and then an integration of it with the normalizing wrapper and the value and set indexes.
The indexed values are parsed with ‘value’.
>>> from zc.catalog.index import DateTimeNormalizer
>>> n = DateTimeNormalizer() # defaults to minutes
>>> import datetime
>>> import pytz
>>> naive_datetime = datetime.datetime(2005, 7, 15, 11, 21, 32, 104)
>>> date = naive_datetime.date()
>>> aware_datetime = naive_datetime.replace(
...     tzinfo=pytz.timezone('US/Eastern'))
>>> n.value(naive_datetime)
Traceback (most recent call last):
...
ValueError: This index only indexes timezone-aware datetimes.
>>> n.value(date)
Traceback (most recent call last):
...
ValueError: This index only indexes timezone-aware datetimes.
>>> n.value(aware_datetime) # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 15, 11, 21, tzinfo=<DstTzInfo 'US/Eastern'...>)
If we specify a different resolution, the results are different.
>>> another = DateTimeNormalizer(1) # hours
>>> another.value(aware_datetime) # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 15, 11, 0, tzinfo=<DstTzInfo 'US/Eastern'...>)
Note that changing the resolution of an indexed value may create surprising results, because queries do not change their resolution. Therefore, if you index something with a datetime with a finer resolution than the normalizer’s, then searching for that datetime will not find the doc_id.
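A minimal illustration of the pitfall, using plain datetimes and ignoring timezones for brevity:

```python
from datetime import datetime

# The index stores values truncated to the normalizer's resolution
# (minutes here), but an exact-value query keeps its finer resolution,
# so it cannot equal the stored value.
stored = datetime(2005, 7, 15, 11, 21)            # normalized to minutes
queried = datetime(2005, 7, 15, 11, 21, 32, 104)  # seconds kept by the query
assert stored != queried  # an exact lookup for `queried` finds nothing
```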
Values in an ‘any_of’ query are parsed with ‘any’. ‘any’ should return a sequence of values. It requires an index, which we will mock up here.
>>> class DummyIndex(object):
...     def values(self, start, stop, exclude_start, exclude_stop):
...         assert not exclude_start and exclude_stop
...         six_hours = datetime.timedelta(hours=6)
...         res = []
...         dt = start
...         while dt < stop:
...             res.append(dt)
...             dt += six_hours
...         return res
...
>>> index = DummyIndex()
>>> tuple(n.any(naive_datetime, index)) # doctest: +ELLIPSIS
(datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Local...>),)
>>> tuple(n.any(aware_datetime, index)) # doctest: +ELLIPSIS
(datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Eastern...>),)
>>> tuple(n.any(date, index)) # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS
(datetime.datetime(2005, 7, 15, 0, 0, tzinfo=<...Local...>),
 datetime.datetime(2005, 7, 15, 6, 0, tzinfo=<...Local...>),
 datetime.datetime(2005, 7, 15, 12, 0, tzinfo=<...Local...>),
 datetime.datetime(2005, 7, 15, 18, 0, tzinfo=<...Local...>))
Values in an ‘all_of’ query are parsed with ‘all’.
>>> n.all(naive_datetime, index) # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Local...>)
>>> n.all(aware_datetime, index) # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Eastern...>)
>>> n.all(date, index) # doctest: +ELLIPSIS
Traceback (most recent call last):
...
ValueError: ...
Minimum values in a ‘between’ query as well as those in other methods are parsed with ‘minimum’. They also take an optional exclude boolean, which indicates whether the minimum is to be excluded. For datetimes, it only makes a difference if you pass in a date.
>>> n.minimum(naive_datetime, index) # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Local...>)
>>> n.minimum(naive_datetime, index, exclude=True) # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Local...>)
>>> n.minimum(aware_datetime, index) # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Eastern...>)
>>> n.minimum(aware_datetime, index, True) # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Eastern...>)
>>> n.minimum(date, index) # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 15, 0, 0, tzinfo=<...Local...>)
>>> n.minimum(date, index, True) # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 15, 23, 59, 59, 999999, tzinfo=<...Local...>)
Maximum values in a ‘between’ query as well as those in other methods are parsed with ‘maximum’. They also take an optional exclude boolean, which indicates whether the maximum is to be excluded. For datetimes, it only makes a difference if you pass in a date.
>>> n.maximum(naive_datetime, index) # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Local...>)
>>> n.maximum(naive_datetime, index, exclude=True) # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Local...>)
>>> n.maximum(aware_datetime, index) # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Eastern...>)
>>> n.maximum(aware_datetime, index, True) # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Eastern...>)
>>> n.maximum(date, index) # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 15, 23, 59, 59, 999999, tzinfo=<...Local...>)
>>> n.maximum(date, index, True) # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 15, 0, 0, tzinfo=<...Local...>)
Now let’s examine these normalizers in the context of a real index.
>>> from zc.catalog.index import DateTimeValueIndex, DateTimeSetIndex
>>> setindex = DateTimeSetIndex() # minutes resolution
>>> data = [] # generate some data
>>> def date_gen(
...     start=aware_datetime,
...     count=12,
...     period=datetime.timedelta(hours=10)):
...     dt = start
...     ix = 0
...     while ix < count:
...         yield dt
...         dt += period
...         ix += 1
...
>>> gen = date_gen()
>>> count = 0
>>> while True:
...     try:
...         next = [gen.next() for i in range(6)]
...     except StopIteration:
...         break
...     data.append((count, next[0:1]))
...     count += 1
...     data.append((count, next[1:3]))
...     count += 1
...     data.append((count, next[3:6]))
...     count += 1
...
>>> print data # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE
[(0, [datetime.datetime(2005, 7, 15, 11, 21, 32, 104, ...<...Eastern...>)]),
 (1, [datetime.datetime(2005, 7, 15, 21, 21, 32, 104, ...<...Eastern...>),
      datetime.datetime(2005, 7, 16, 7, 21, 32, 104, ...<...Eastern...>)]),
 (2, [datetime.datetime(2005, 7, 16, 17, 21, 32, 104, ...<...Eastern...>),
      datetime.datetime(2005, 7, 17, 3, 21, 32, 104, ...<...Eastern...>),
      datetime.datetime(2005, 7, 17, 13, 21, 32, 104, ...<...Eastern...>)]),
 (3, [datetime.datetime(2005, 7, 17, 23, 21, 32, 104, ...<...Eastern...>)]),
 (4, [datetime.datetime(2005, 7, 18, 9, 21, 32, 104, ...<...Eastern...>),
      datetime.datetime(2005, 7, 18, 19, 21, 32, 104, ...<...Eastern...>)]),
 (5, [datetime.datetime(2005, 7, 19, 5, 21, 32, 104, ...<...Eastern...>),
      datetime.datetime(2005, 7, 19, 15, 21, 32, 104, ...<...Eastern...>),
      datetime.datetime(2005, 7, 20, 1, 21, 32, 104, ...<...Eastern...>)])]
>>> data_dict = dict(data)
>>> for doc_id, value in data:
...     setindex.index_doc(doc_id, value)
...
>>> list(setindex.ids())
[0, 1, 2, 3, 4, 5]
>>> set(setindex.values()) == set(
...     setindex.normalizer.value(v) for v in date_gen())
True
For the searches, we will actually use a request and interaction, with an adapter that returns the Eastern timezone. This makes the examples less dependent on the machine that they use.
>>> import zope.security.management
>>> import zope.publisher.browser
>>> import zope.interface.common.idatetime
>>> import zope.publisher.interfaces
>>> request = zope.publisher.browser.TestRequest()
>>> zope.security.management.newInteraction(request)
>>> from zope import interface, component
>>> @interface.implementer(zope.interface.common.idatetime.ITZInfo)
... @component.adapter(zope.publisher.interfaces.IRequest)
... def tzinfo(req):
...     return pytz.timezone('US/Eastern')
...
>>> component.provideAdapter(tzinfo)
>>> n.all(naive_datetime, index).tzinfo is pytz.timezone('US/Eastern')
True
>>> set(setindex.apply({'any_of': (datetime.date(2005, 7, 17),
...                                datetime.date(2005, 7, 20),
...                                datetime.date(2005, 12, 31))})) == set(
...     (2, 3, 5))
True
Note that this search is using the normalized values.
>>> set(setindex.apply({'all_of': (
...     datetime.datetime(
...         2005, 7, 16, 7, 21, tzinfo=pytz.timezone('US/Eastern')),
...     datetime.datetime(
...         2005, 7, 15, 21, 21, tzinfo=pytz.timezone('US/Eastern')),)})
...     ) == set((1,))
True
>>> list(setindex.apply({'any': None}))
[0, 1, 2, 3, 4, 5]
>>> set(setindex.apply({'between': (
...     datetime.datetime(2005, 4, 1, 12), datetime.datetime(2006, 5, 1))})
...     ) == set((0, 1, 2, 3, 4, 5))
True
>>> set(setindex.apply({'between': (
...     datetime.datetime(2005, 4, 1, 12), datetime.datetime(2006, 5, 1),
...     True, True)})
...     ) == set((0, 1, 2, 3, 4, 5))
True
‘between’ searches should deal with dates well.
>>> set(setindex.apply({'between': (
...     datetime.date(2005, 7, 16), datetime.date(2005, 7, 17))})
...     ) == set((1, 2, 3))
True
>>> len(setindex.apply({'between': (
...     datetime.date(2005, 7, 16), datetime.date(2005, 7, 17))})
...     ) == len(setindex.apply({'between': (
...     datetime.date(2005, 7, 15), datetime.date(2005, 7, 18),
...     True, True)})
...     )
True
Removing docs works as usual.
>>> setindex.unindex_doc(1)
>>> list(setindex.ids())
[0, 2, 3, 4, 5]
The values, minValue, and maxValue methods can take timezone-naive datetimes and dates.
>>> setindex.minValue() # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 15, 11, 21, ...<...Eastern...>)
>>> setindex.minValue(datetime.date(2005, 7, 17)) # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 17, 3, 21, ...<...Eastern...>)
>>> setindex.maxValue() # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 20, 1, 21, ...<...Eastern...>)
>>> setindex.maxValue(datetime.date(2005, 7, 17)) # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 17, 23, 21, ...<...Eastern...>)
>>> list(setindex.values(
...     datetime.date(2005, 7, 17), datetime.date(2005, 7, 17)))
... # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE
[datetime.datetime(2005, 7, 17, 3, 21, ...<...Eastern...>),
 datetime.datetime(2005, 7, 17, 13, 21, ...<...Eastern...>),
 datetime.datetime(2005, 7, 17, 23, 21, ...<...Eastern...>)]
>>> zope.security.management.endInteraction() # TODO put in tests tearDown
Extent Catalog
An extent catalog is very similar to a normal catalog except that it only indexes items addable to its extent. The extent is both a filter and a set that may be merged with other result sets. The filtering is an additional feature we will discuss below; we’ll begin with a simple “do nothing” extent that only supports the second use case.
To show the extent catalog at work, we need an intid utility, an index, and some items to index. We’ll do this within a real ZODB and a real intid utility [2].
>>> import zc.catalog
>>> import zc.catalog.interfaces
>>> from zc.catalog import interfaces, extentcatalog
>>> from zope import interface, component
>>> from zope.interface import verify
>>> import persistent
>>> import BTrees.IFBTree
>>> root = makeRoot()
>>> intid = zope.component.getUtility(
...     zope.app.intid.interfaces.IIntIds, context=root)
>>> TreeSet = btrees_family.IF.TreeSet
>>> from zope.app.container.interfaces import IContained
>>> class DummyIndex(persistent.Persistent):
...     interface.implements(IContained)
...     __parent__ = __name__ = None
...     def __init__(self):
...         self.uids = TreeSet()
...     def unindex_doc(self, uid):
...         if uid in self.uids:
...             self.uids.remove(uid)
...     def index_doc(self, uid, obj):
...         self.uids.insert(uid)
...     def clear(self):
...         self.uids.clear()
...
>>> class DummyContent(persistent.Persistent):
...     def __init__(self, name, parent):
...         self.id = name
...         self.__parent__ = parent
...
>>> extent = extentcatalog.Extent(family=btrees_family)
>>> verify.verifyObject(interfaces.IExtent, extent)
True
>>> root['catalog'] = catalog = extentcatalog.Catalog(extent)
>>> verify.verifyObject(interfaces.IExtentCatalog, catalog)
True
>>> index = DummyIndex()
>>> catalog['index'] = index
>>> transaction.commit()
Now we have a catalog set up with an index and an extent. We can add some data to the extent:
>>> matches = []
>>> for i in range(100):
...     c = DummyContent(i, root)
...     root[i] = c
...     doc_id = intid.register(c)
...     catalog.index_doc(doc_id, c)
...     matches.append(doc_id)
>>> matches.sort()
>>> sorted(extent) == sorted(index.uids) == matches
True
We can get the size of the extent.
>>> len(extent)
100
Unindexing an object that is in the catalog should simply remove it from the catalog and index as usual.
>>> matches[0] in catalog.extent
True
>>> matches[0] in catalog['index'].uids
True
>>> catalog.unindex_doc(matches[0])
>>> matches[0] in catalog.extent
False
>>> matches[0] in catalog['index'].uids
False
>>> doc_id = matches.pop(0)
>>> sorted(extent) == sorted(index.uids) == matches
True
Clearing the catalog clears both the extent and the contained indexes.
>>> catalog.clear()
>>> list(catalog.extent) == list(catalog['index'].uids) == []
True
Updating all indexes and an individual index both also update the extent.
>>> catalog.updateIndexes()
>>> matches.insert(0, doc_id)
>>> sorted(extent) == sorted(index.uids) == matches
True
>>> index2 = DummyIndex()
>>> catalog['index2'] = index2
>>> index2.__parent__ == catalog
True
>>> index.uids.remove(matches[0]) # to confirm that only index2 is touched
>>> catalog.updateIndex(index2)
>>> sorted(extent) == sorted(index2.uids) == matches
True
>>> matches[0] in index.uids
False
>>> matches[0] in index2.uids
True
>>> res = index.uids.insert(matches[0])
So why have an extent in the first place? It gives the indexes a reliable record of the full set of indexed documents, and therefore allows the indexes in zc.catalog to perform NOT operations.
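The NOT operation itself is simple once the extent exists; a plain-Python sketch (sets standing in for the BTree structures):

```python
# Sketch: with the extent as the universe of cataloged ids, NOT is just
# the set difference between the extent and a query's result set.
extent_ids = set(range(10))   # stand-in for the extent's ids
query_result = {2, 5, 7}      # ids matched by some index query
not_result = sorted(extent_ids - query_result)
```

Without the extent, an index cannot answer “which documents do NOT match”, because it has no record of documents that were never given a value for that index.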
The extent itself provides a number of merging features to allow its values to be merged with other BTrees.IFBTree data structures. These include intersection, union, difference, and reverse difference. Given an extent named ‘extent’ and another IFBTree data structure named ‘data’, intersections can be spelled “extent & data” or “data & extent”; unions can be spelled “extent | data” or “data | extent”; differences can be spelled “extent - data”; and reverse differences can be spelled “data - extent”. Unions and intersections are weighted.
>>> extent = extentcatalog.Extent(family=btrees_family)
>>> for i in range(1, 100, 2):
...     extent.add(i, None)
...
>>> alt_set = TreeSet()
>>> alt_set.update(range(0, 166, 33)) # return value is unimportant here
6
>>> sorted(alt_set)
[0, 33, 66, 99, 132, 165]
>>> sorted(extent & alt_set)
[33, 99]
>>> sorted(alt_set & extent)
[33, 99]
>>> sorted(extent.intersection(alt_set))
[33, 99]
>>> original = set(extent)
>>> union_matches = original.copy()
>>> union_matches.update(alt_set)
>>> union_matches = sorted(union_matches)
>>> sorted(alt_set | extent) == union_matches
True
>>> sorted(extent | alt_set) == union_matches
True
>>> sorted(extent.union(alt_set)) == union_matches
True
>>> sorted(alt_set - extent)
[0, 66, 132, 165]
>>> sorted(extent.rdifference(alt_set))
[0, 66, 132, 165]
>>> original.remove(33)
>>> original.remove(99)
>>> set(extent - alt_set) == original
True
>>> set(extent.difference(alt_set)) == original
True
Catalog with a filter extent
As discussed at the beginning of this document, extents can not only help with index operations, but also act as a filter, so that a given catalog can answer questions about a subset of the objects contained in the intids.
The filter extent only stores objects that match a given filter.
>>> def filter(extent, uid, ob):
...     assert interfaces.IFilterExtent.providedBy(extent)
...     # This is an extent of objects with odd-numbered uids without a
...     # True ignore attribute
...     return uid % 2 and not getattr(ob, 'ignore', False)
...
>>> extent = extentcatalog.FilterExtent(filter, family=btrees_family)
>>> verify.verifyObject(interfaces.IFilterExtent, extent)
True
>>> root['catalog1'] = catalog = extentcatalog.Catalog(extent)
>>> verify.verifyObject(interfaces.IExtentCatalog, catalog)
True
>>> index = DummyIndex()
>>> catalog['index'] = index
>>> transaction.commit()
Now we have a catalog set up with an index and an extent. If we create some content and ask the catalog to index it, only the ones that match the filter will be in the extent and in the index.
>>> matches = []
>>> fails = []
>>> i = 0
>>> while True:
...     c = DummyContent(i, root)
...     root[i] = c
...     doc_id = intid.register(c)
...     catalog.index_doc(doc_id, c)
...     if filter(extent, doc_id, c):
...         matches.append(doc_id)
...     else:
...         fails.append(doc_id)
...     i += 1
...     if i > 99 and len(matches) > 4:
...         break
...
>>> matches.sort()
>>> sorted(extent) == sorted(index.uids) == matches
True
If a content object is indexed that used to match the filter but no longer does, it should be removed from the extent and indexes.
>>> matches[0] in catalog.extent
True
>>> obj = intid.getObject(matches[0])
>>> obj.ignore = True
>>> filter(extent, matches[0], obj)
False
>>> catalog.index_doc(matches[0], obj)
>>> doc_id = matches.pop(0)
>>> doc_id in catalog.extent
False
>>> sorted(extent) == sorted(index.uids) == matches
True
Unindexing an object that is not in the catalog should be a no-op.
>>> fails[0] in catalog.extent
False
>>> catalog.unindex_doc(fails[0])
>>> fails[0] in catalog.extent
False
>>> sorted(extent) == sorted(index.uids) == matches
True
Updating all indexes, or a single index, also updates the extent.
>>> index2 = DummyIndex()
>>> catalog['index2'] = index2
>>> index2.__parent__ == catalog
True
>>> index.uids.remove(matches[0])  # to confirm that only index 2 is touched
>>> catalog.updateIndex(index2)
>>> sorted(extent) == sorted(index2.uids)
True
>>> matches[0] in index.uids
False
>>> matches[0] in index2.uids
True
>>> res = index.uids.insert(matches[0])
If you update a single index and an object is no longer a member of the extent, it is removed from all indexes.
>>> matches[0] in catalog.extent
True
>>> matches[0] in index.uids
True
>>> matches[0] in index2.uids
True
>>> obj = intid.getObject(matches[0])
>>> obj.ignore = True
>>> catalog.updateIndex(index2)
>>> matches[0] in catalog.extent
False
>>> matches[0] in index.uids
False
>>> matches[0] in index2.uids
False
>>> doc_id = matches.pop(0)
>>> (matches == sorted(catalog.extent) == sorted(index.uids)
...     == sorted(index2.uids))
True
Self-populating extents
An extent may know how to populate itself; this is especially useful if the catalog can be initialized with fewer items than those available in the IIntIds utility that are also within the nearest Zope 3 site (the policy coded in the basic Zope 3 catalog).
Such an extent must implement the ISelfPopulatingExtent interface, which requires two attributes. Let’s use the FilterExtent class as a base for implementing such an extent, with a method that selects content item 0 (created and registered above):
>>> class PopulatingExtent(
...     extentcatalog.FilterExtent,
...     extentcatalog.NonPopulatingExtent):
...
...     def populate(self):
...         if self.populated:
...             return
...         self.add(intid.getId(root[0]), root[0])
...         super(PopulatingExtent, self).populate()
Creating a catalog based on this extent ignores objects in the database already:
>>> def accept_any(extent, uid, ob):
...     return True
>>> extent = PopulatingExtent(accept_any, family=btrees_family)
>>> catalog = extentcatalog.Catalog(extent)
>>> index = DummyIndex()
>>> catalog['index'] = index
>>> root['catalog2'] = catalog
>>> transaction.commit()
At this point, our extent remains unpopulated:
>>> extent.populated
False
Iterating over the extent does not cause it to be automatically populated:
>>> list(extent)
[]
Causing our new index to be filled will cause the populate() method to be called, setting the populated flag as a side-effect:
>>> catalog.updateIndex(index)
>>> extent.populated
True
>>> list(extent) == [intid.getId(root[0])]
True
The index has been updated with the documents identified by the extent:
>>> list(index.uids) == [intid.getId(root[0])]
True
Updating the same index repeatedly will continue to use the extent as the source of documents to include:
>>> catalog.updateIndex(index)
>>> list(extent) == [intid.getId(root[0])]
True
>>> list(index.uids) == [intid.getId(root[0])]
True
The updateIndexes() method has a similar behavior. If we add an additional index to the catalog, we see that it indexes only those objects from the extent:
>>> index2 = DummyIndex()
>>> catalog['index2'] = index2
>>> catalog.updateIndexes()
>>> list(extent) == [intid.getId(root[0])]
True
>>> list(index.uids) == [intid.getId(root[0])]
True
>>> list(index2.uids) == [intid.getId(root[0])]
True
When we have a fresh catalog and extent (not yet populated), we see that updateIndexes() will cause the extent to be populated:
>>> extent = PopulatingExtent(accept_any, family=btrees_family)
>>> root['catalog3'] = catalog = extentcatalog.Catalog(extent)
>>> index1 = DummyIndex()
>>> index2 = DummyIndex()
>>> catalog['index1'] = index1
>>> catalog['index2'] = index2
>>> transaction.commit()
>>> extent.populated
False
>>> catalog.updateIndexes()
>>> extent.populated
True
>>> list(extent) == [intid.getId(root[0])]
True
>>> list(index1.uids) == [intid.getId(root[0])]
True
>>> list(index2.uids) == [intid.getId(root[0])]
True
We’ll make sure everything can be safely committed.
>>> transaction.commit()
>>> setSiteManager(None)
We create the state that the text needs here.
>>> import zope.app.keyreference.persistent
>>> import zope.component
>>> import zope.app.intid
>>> import zope.component.interfaces
>>> import zope.component.persistentregistry
>>> from ZODB.tests.util import DB
>>> import transaction
>>> zope.component.provideAdapter(
...     zope.app.keyreference.persistent.KeyReferenceToPersistent,
...     adapts=(zope.interface.Interface,))
>>> zope.component.provideAdapter(
...     zope.app.keyreference.persistent.connectionOfPersistent,
...     adapts=(zope.interface.Interface,))
>>> site_manager = None
>>> def getSiteManager(context=None):
...     if context is None:
...         if site_manager is None:
...             return zope.component.getGlobalSiteManager()
...         else:
...             return site_manager
...     else:
...         try:
...             return zope.component.interfaces.IComponentLookup(context)
...         except TypeError, error:
...             raise zope.component.ComponentLookupError(*error.args)
...
>>> def setSiteManager(sm):
...     global site_manager
...     site_manager = sm
...     if sm is None:
...         zope.component.getSiteManager.reset()
...     else:
...         zope.component.getSiteManager.sethook(getSiteManager)
...
>>> def makeRoot():
...     db = DB()
...     conn = db.open()
...     root = conn.root()
...     site_manager = root['components'] = (
...         zope.component.persistentregistry.PersistentComponents())
...     site_manager.__bases__ = (zope.component.getGlobalSiteManager(),)
...     site_manager.registerUtility(
...         zope.app.intid.IntIds(family=btrees_family),
...         provided=zope.app.intid.interfaces.IIntIds)
...     setSiteManager(site_manager)
...     transaction.commit()
...     return root
...
>>> @zope.component.adapter(zope.interface.Interface)
... @zope.interface.implementer(zope.component.interfaces.IComponentLookup)
... def getComponentLookup(obj):
...     return obj._p_jar.root()['components']
...
>>> zope.component.provideAdapter(getComponentLookup)
Unregister the objects of the previous tests from intid utility:
>>> intid = zope.component.getUtility(
...     zope.app.intid.interfaces.IIntIds, context=root)
>>> for doc_id in matches:
...     intid.unregister(intid.queryObject(doc_id))
Stemmer
The stemmer uses Andreas Jung’s stemmer code, which is a Python wrapper of M. F. Porter’s Snowball project (http://snowball.tartarus.org/index.php). It is designed to be used as part of a pipeline in a zope/index/text/ lexicon, after a splitter. This enables getting the relevance ranking of the zope/index/text code with the splitting functionality of TextIndexNG 3.x.
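A lexicon pipeline element such as the stemmer implements a small interface: a process() method that takes a sequence of terms and returns the transformed sequence. Here is a minimal sketch of such an element, using a toy suffix-stripper rather than the real Porter/Snowball algorithm (the class name and suffix list are illustrative, not part of zc.catalog):

```python
class ToySuffixStripper(object):
    """Toy pipeline element in the zope.index.text style.

    process() maps a sequence of terms to a sequence of (possibly
    rewritten) terms.  This is NOT the Porter/Snowball stemmer; it
    only illustrates where a stemmer sits in the pipeline.
    """

    _suffixes = ('ing', 'ed', 's')

    def process(self, terms):
        result = []
        for term in terms:
            for suffix in self._suffixes:
                # Only strip when a reasonable stem remains.
                if term.endswith(suffix) and len(term) > len(suffix) + 2:
                    term = term[:-len(suffix)]
                    break
            result.append(term)
        return result
```

With this element appended to a lexicon's pipeline after the splitter, terms like 'knocking' and 'knees' would be indexed as 'knock' and 'knee'.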
It requires that the TextIndexNG extensions (specifically txngstemmer) have been compiled and installed in your Python installation. Inclusion of the textindexng package itself is not necessary.
As of this writing (Jan 3, 2007), installing the necessary extensions can be done with the following steps:
svn co https://svn.sourceforge.net/svnroot/textindexng/extension_modules/trunk ext_mod
cd ext_mod
(using the python you use for Zope) python setup.py install
Another approach is to simply install TextIndexNG (see http://opensource.zopyx.com/software/textindexng3)
The stemmer must be instantiated with the language for which stemming is desired. It defaults to ‘english’. Other languages supported as of this writing, using the strings the stemmer expects, are: ‘danish’, ‘dutch’, ‘english’, ‘finnish’, ‘french’, ‘german’, ‘italian’, ‘norwegian’, ‘portuguese’, ‘russian’, ‘spanish’, and ‘swedish’.
For instance, let’s build an index with an english stemmer.
>>> from zope.index.text import textindex, lexicon
>>> import zc.catalog.stemmer
>>> lex = lexicon.Lexicon(
...     lexicon.Splitter(), lexicon.CaseNormalizer(),
...     lexicon.StopWordRemover(), zc.catalog.stemmer.Stemmer('english'))
>>> ix = textindex.TextIndex(lex)
>>> data = [
...     (0, 'consigned consistency consoles the constables'),
...     (1, 'knaves kneeled and knocked knees, knowing no knights')]
>>> for doc_id, text in data:
...     ix.index_doc(doc_id, text)
...
>>> list(ix.apply('consoling a constable'))
[0]
>>> list(ix.apply('knightly kneel'))
[1]
Note that query terms with globbing characters are not stemmed.
>>> list(ix.apply('constables*'))
[]
Support for legacy data
Prior to the introduction of btree “families” and the BTrees.Interfaces.IBTreeFamily interface, the indexes defined by the zc.catalog.index module used the instance attributes btreemodule and IOBTree, initialized in the constructor, and the BTreeAPI property. These are replaced by the family attribute in the current implementation.
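The lazy-migration pattern described above can be sketched in plain Python. This is an illustrative stand-in, not the actual zc.catalog.index code; FAMILY32 and FAMILY64 are placeholders for the BTrees.family32/family64 singletons, and _p_changed is modelled as a plain flag:

```python
# Hypothetical sketch of lazy legacy-attribute migration.
FAMILY32 = object()  # placeholder for BTrees.family32
FAMILY64 = object()  # placeholder for BTrees.family64

class LegacyIndexSketch(object):
    def _get_family(self):
        d = self.__dict__
        if 'family' in d:
            # The family was explicitly assigned at some point.
            return d['family']
        # Infer the family from the legacy module name, then drop the
        # obsolete keys straight out of __dict__.  On a Persistent
        # object, direct __dict__ manipulation bypasses the _p_changed
        # machinery, so a read-only transaction stays read-only.
        legacy = d.pop('btreemodule', 'BTrees.IFBTree')
        d.pop('IOBTree', None)
        d.pop('BTreeAPI', None)
        return FAMILY64 if legacy == 'BTrees.LFBTree' else FAMILY32

    def _set_family(self, value):
        # Explicit assignment clears the legacy keys for good, stores
        # the family, and marks the object as changed.
        for key in ('btreemodule', 'IOBTree', 'BTreeAPI'):
            self.__dict__.pop(key, None)
        self.__dict__['family'] = value
        self._p_changed = True

    family = property(_get_family, _set_family)
```

The key design point is the asymmetry: merely reading family cleans the instance dictionary without writing to the database, while assigning family commits to the new layout and flags the object as changed.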
This is a white-box test that verifies that the supported values in existing data structures (loaded from pickles) can be used effectively with the current implementation.
There are two supported sets of values; one for 32-bit btrees:
>>> import BTrees.IOBTree
>>> legacy32 = {
...     "btreemodule": "BTrees.IFBTree",
...     "IOBTree": BTrees.IOBTree.IOBTree,
...     }
and another for 64-bit btrees:
>>> import BTrees.LOBTree
>>> legacy64 = {
...     "btreemodule": "BTrees.LFBTree",
...     "IOBTree": BTrees.LOBTree.LOBTree,
...     }
In each case, actual legacy structures will also include index structures that match the right integer size:
>>> import BTrees.OOBTree
>>> import BTrees.Length
>>> legacy32["values_to_documents"] = BTrees.OOBTree.OOBTree()
>>> legacy32["documents_to_values"] = BTrees.IOBTree.IOBTree()
>>> legacy32["documentCount"] = BTrees.Length.Length(0)
>>> legacy32["wordCount"] = BTrees.Length.Length(0)
>>> legacy64["values_to_documents"] = BTrees.OOBTree.OOBTree()
>>> legacy64["documents_to_values"] = BTrees.LOBTree.LOBTree()
>>> legacy64["documentCount"] = BTrees.Length.Length(0)
>>> legacy64["wordCount"] = BTrees.Length.Length(0)
What we want to do is verify that the family attribute is properly computed for instances loaded from legacy data, and ensure that the structure is updated cleanly without providing cause for a read-only transaction to become a write-transaction. We’ll need to create instances that conform to the old data structures, pickle them, and show that unpickling them produces instances that use the correct families.
Let’s create new instances, and force the internal data to match the old structures:
>>> import pickle
>>> import zc.catalog.index
>>> vi32 = zc.catalog.index.ValueIndex()
>>> vi32.__dict__ = legacy32.copy()
>>> legacy32_pickle = pickle.dumps(vi32)
>>> vi64 = zc.catalog.index.ValueIndex()
>>> vi64.__dict__ = legacy64.copy()
>>> legacy64_pickle = pickle.dumps(vi64)
Now, let’s unpickle these structures and verify the structures. We’ll start with the 32-bit variety:
>>> vi32 = pickle.loads(legacy32_pickle)
>>> vi32.__dict__["btreemodule"]
'BTrees.IFBTree'
>>> vi32.__dict__["IOBTree"]
<type 'BTrees.IOBTree.IOBTree'>
>>> "family" in vi32.__dict__
False
>>> vi32._p_changed
False
The family property returns the BTrees.family32 singleton:
>>> vi32.family is BTrees.family32
True
Once accessed, the legacy values have been cleaned out from the instance dictionary:
>>> "btreemodule" in vi32.__dict__
False
>>> "IOBTree" in vi32.__dict__
False
>>> "BTreeAPI" in vi32.__dict__
False
Accessing these attributes as attributes provides the proper values anyway:
>>> vi32.btreemodule
'BTrees.IFBTree'
>>> vi32.IOBTree
<type 'BTrees.IOBTree.IOBTree'>
>>> vi32.BTreeAPI
<module 'BTrees.IFBTree' from ...>
Even though the instance dictionary has been cleaned up, the change flag hasn’t been set. This is handled this way to avoid turning a read-only transaction into a write-transaction:
>>> vi32._p_changed
False
The 64-bit variation provides equivalent behavior:
>>> vi64 = pickle.loads(legacy64_pickle)
>>> vi64.__dict__["btreemodule"]
'BTrees.LFBTree'
>>> vi64.__dict__["IOBTree"]
<type 'BTrees.LOBTree.LOBTree'>
>>> "family" in vi64.__dict__
False
>>> vi64._p_changed
False
>>> vi64.family is BTrees.family64
True
>>> "btreemodule" in vi64.__dict__
False
>>> "IOBTree" in vi64.__dict__
False
>>> "BTreeAPI" in vi64.__dict__
False
>>> vi64.btreemodule
'BTrees.LFBTree'
>>> vi64.IOBTree
<type 'BTrees.LOBTree.LOBTree'>
>>> vi64.BTreeAPI
<module 'BTrees.LFBTree' from ...>
>>> vi64._p_changed
False
Now, if we have a legacy structure and explicitly set the family attribute, the old data structures will be cleared and replaced with the new structure. If the object is associated with a data manager, the changed flag will be set as well:
>>> class DataManager(object):
...     def register(self, ob):
...         pass
>>> vi64 = pickle.loads(legacy64_pickle)
>>> vi64._p_jar = DataManager()
>>> vi64.family = BTrees.family64
>>> vi64._p_changed
True
>>> "btreemodule" in vi64.__dict__
False
>>> "IOBTree" in vi64.__dict__
False
>>> "BTreeAPI" in vi64.__dict__
False
>>> "family" in vi64.__dict__
True
>>> vi64.family is BTrees.family64
True
>>> vi64.btreemodule
'BTrees.LFBTree'
>>> vi64.IOBTree
<type 'BTrees.LOBTree.LOBTree'>
>>> vi64.BTreeAPI
<module 'BTrees.LFBTree' from ...>
Globber
The globber takes a query and makes any term that isn’t already a glob into something that ends in a star. It was originally envisioned as a very low- rent stemming hack. The author now questions its value, and hopes that the new stemming pipeline option can be used instead. Nonetheless, here is an example of it at work.
>>> from zope.index.text import textindex
>>> index = textindex.TextIndex()
>>> lex = index.lexicon
>>> from zc.catalog import globber
>>> globber.glob('foo bar and baz or (b?ng not boo)', lex)
'(((foo* and bar*) and baz*) or (b?ng and not boo*))'
Callable Wrapper
If we want to index some value that is easily derivable from a document, we have to define an interface with this value as an attribute, and create an adapter that calculates this value and implements this interface. All this is too much hassle if we want to store a single easily derivable value. CallableWrapper solves this problem by converting the document to the indexed value with a callable converter.
Here’s a contrived example. Suppose we have cars that know their mileage expressed in miles per gallon, but we want to index their economy in litres per 100 km.
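The arithmetic behind the conversion: one mile is 1.609344 km and one US gallon is 3.7854118 litres, so l/100km works out to roughly 235.215 / mpg. A quick standalone check (the function and constant names are ours for illustration, not part of zc.catalog):

```python
# Standalone check of the mpg -> l/100km conversion.
KM_PER_MILE = 1.609344
LITRES_PER_US_GALLON = 3.7854118

def mpg_to_l_per_100km(mpg):
    # Distance covered per litre, then litres needed for 100 km.
    km_per_litre = mpg * KM_PER_MILE / LITRES_PER_US_GALLON
    return 100.0 / km_per_litre

# Equivalently, l/100km ~= 235.215 / mpg: a 10 mpg car burns about
# 23.52 l/100km, while a 45 mpg car burns about 5.23 l/100km.
```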
>>> class Car(object):
...     def __init__(self, mpg):
...         self.mpg = mpg

>>> def mpg2lp100(car):
...     return 100.0 / (1.609344 / 3.7854118 * car.mpg)
Let’s create an index that would index cars’ l/100 km rating.
>>> from zc.catalog import index, catalogindex
>>> idx = catalogindex.CallableWrapper(index.ValueIndex(), mpg2lp100)
Let’s add a couple of cars to the index!
>>> hummer = Car(10.0)
>>> beamer = Car(22.0)
>>> civic = Car(45.0)

>>> idx.index_doc(1, hummer)
>>> idx.index_doc(2, beamer)
>>> idx.index_doc(3, civic)
The indexed values should be the converted l/100 km ratings:
>>> list(idx.values())
[5.2269907628339389, 10.691572014887601, 23.521458432752723]
We can query for cars that consume fuel in some range:
>>> list(idx.apply({'between': (5.0, 7.0)}))
[3]
zc.catalog Browser Support
The zc.catalog.browser package adds simple TTW addition/inspection for SetIndex and ValueIndex.
First, we need a browser so we can test the web UI.
>>> from zope.testbrowser.testing import Browser
>>> browser = Browser()
>>> browser.addHeader('Authorization', 'Basic mgr:mgrpw')
>>> browser.addHeader('Accept-Language', 'en-US')
>>> browser.open('http://localhost')
Now we need to add the catalog that these indexes are going to reside within.
>>> browser.open('/++etc++site/default/@@contents.html')
>>> browser.getLink('Add').click()
>>> browser.getControl('Catalog').click()
>>> browser.getControl(name='id').value = 'catalog'
>>> browser.getControl('Add').click()
SetIndex
Add the SetIndex to the catalog.
>>> browser.getLink('Add').click()
>>> browser.getControl('Set Index').click()
>>> browser.getControl(name='id').value = 'set_index'
>>> browser.getControl('Add').click()
The add form needs values for what interface to adapt candidate objects to, what field name to use, and whether or not that field is a callable. (We'll use a simple interface for demonstration purposes; it's not really significant.)
>>> browser.getControl('Interface', index=0).displayValue = [
...     'zope.size.interfaces.ISized']
>>> browser.getControl('Field Name').value = 'sizeForSorting'
>>> browser.getControl('Field Callable').click()
>>> browser.getControl(name='add_input_name').value = 'set_index'
>>> browser.getControl('Add').click()
Now we can look at the index and see how it is configured.
>>> browser.getLink('set_index').click()
>>> print browser.contents
<...
...Interface...zope.size.interfaces.ISized...
...Field Name...sizeForSorting...
...Field Callable...True...
We need to go back to the catalog so we can add a different index.
>>> browser.goBack()
ValueIndex
Add the ValueIndex to the catalog.
>>> browser.getLink('Add').click()
>>> browser.getControl('Value Index').click()
>>> browser.getControl(name='id').value = 'value_index'
>>> browser.getControl('Add').click()
The add form needs values for what interface to adapt candidate objects to, what field name to use, and whether or not that field is a callable. (We'll use a simple interface for demonstration purposes; it's not really significant.)
>>> browser.getControl('Interface', index=0).displayValue = [
...     'zope.size.interfaces.ISized']
>>> browser.getControl('Field Name').value = 'sizeForSorting'
>>> browser.getControl('Field Callable').click()
>>> browser.getControl(name='add_input_name').value = 'value_index'
>>> browser.getControl('Add').click()
Now we can look at the index and see how it is configured.
>>> browser.getLink('value_index').click()
>>> print browser.contents
<...
...Interface...zope.size.interfaces.ISized...
...Field Name...sizeForSorting...
...Field Callable...True...
CHANGES
The 1.2 line supports Zope 3.4/ZODB 3.8. The 1.1 line supports Zope 3.3/ZODB 3.7.
1.2.0 (2007-11-03)
Features added
Updated package meta-data.
zc.catalog now can use 64-bit BTrees (“L”) as provided by ZODB 3.8.
Albertas Agejavas (alga@pov.lt) included the new CallableWrapper, for when the typical Zope 3 index-by-adapter story (zope.app.catalog.attribute) is unnecessary trouble, and you just want to use a callable. See callablewrapper.txt. This can also be used for other indexes based on the zope.index interfaces.
Extents now have a __len__. The current implementation defers to the standard BTree len implementation, and shares its performance characteristics: it needs to wake up all of the buckets, but if all of the buckets are awake it is a fairly quick operation.
A simple ISelfPopulatingExtent was added to the extentcatalog module for which populating is a no-op. This is directly useful for catalogs that are used as implementation details of a component, in which objects are indexed explicitly by your own calls rather than by the usual subscribers. It is also potentially slightly useful as a base for other self-populating extents.
1.1.1 (2007-03-17)
Bugs fixed
‘all_of’ would return all results when one of the values had no results. Reported, with test and fix provided, by Nando Quintana.
1.1 (2007-01-06)
Features removed
The queueing of events in the extent catalog has been entirely removed. Subtransactions caused significant problems to the code introduced in 1.0. Other solutions also have significant problems, and the win of this kind of queueing is questionable. Here is a rundown of the approaches rejected for getting the queueing to work:
_p_invalidate (used in 1.0). Not really designed for use within a transaction, and reverts to last savepoint, rather than the beginning of the transaction. Could monkeypatch savepoints to iterate over precommit transaction hooks but that just smells too bad.
_p_resolveConflict. Requires application software to exist in ZEO and even ZRS installations, which is counter to our software deployment goals. Also causes useless repeated writes of empty queue to database, but that’s not the showstopper.
vague hand-wavy ideas for separate storages or transaction managers for the queue. Never panned out in discussion.
1.0 (2007-01-05)
Bugs fixed
adjusted extentcatalog tests to trigger (and discuss and test) the queueing behavior.
fixed problem with excessive conflict errors due to queueing code.
updated stemming to work with newest version of TextIndexNG’s extensions.
omitted stemming test when TextIndexNG’s extensions are unavailable, so tests pass without it. Since TextIndexNG’s extensions are optional, this seems reasonable.
removed use of zapi in extentcatalog.
0.2 (2006-11-22)
Features added
First release on Cheeseshop.