Feature collections for DossierStack
`dossier.fc` is a Python package that provides an implementation of feature
collections. Abstractly, a feature collection is a map from feature name to
a feature. While any type of feature can be supported, the core focus of this
package is on *multisets* or "bags of words" (BOW). In this package, the
default multiset implementation is a `StringCounter`, which maps Unicode
strings to counts.
This package also includes utilities for serializing and deserializing
collections of feature collections to and from CBOR (Concise Binary Object
Representation).
### Installation
`dossier.fc` is on PyPI and can be installed with `pip`:
```bash
pip install dossier.fc
```
Currently, `dossier.fc` requires Python 2.7. It is not yet Python 3 compatible.
### Documentation
API documentation with examples is available as part of the Dossier Stack
documentation:
[http://dossier-stack.readthedocs.org](http://dossier-stack.readthedocs.org#module-dossier.fc)
### Examples and basic usage
By default, feature collections use `StringCounter` for feature representation:
```python
from dossier.fc import FeatureCollection
fc = FeatureCollection()
fc['NAME']['alice'] += 1
fc['NAME']['bob'] = 5
print type(fc['NAME'])
# output: StringCounter
```
As the name suggests, a `StringCounter` is a subclass of the `Counter` class
found in the Python standard library `collections` module. As such, it supports
binary operations like addition, subtraction, equality testing, etc.
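To make these operations concrete, here is a minimal sketch that uses the standard library `Counter` directly as a stand-in for `StringCounter`. It assumes `StringCounter` keeps the inherited `Counter` semantics for these operators, which this README does not spell out, and it runs without `dossier.fc` installed:

```python
from collections import Counter

# Plain Counter objects stand in for StringCounter here (an assumption:
# StringCounter is taken to inherit these operators unchanged).
a = Counter({'alice': 2, 'bob': 1})
b = Counter({'alice': 1, 'carol': 3})

# Addition sums the counts key by key.
total = a + b

# Subtraction subtracts counts and drops any key whose result is not positive.
diff = a - b

# Equality compares the stored counts, so addition order does not matter.
same = (a + b) == (b + a)
```

Here `total` holds `alice: 3, bob: 1, carol: 3`, and `diff` holds `alice: 1, bob: 1` because the negative count for `carol` is discarded.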
The `FeatureCollection` class also provides these binary operations, which are
applied to corresponding features, that is, features with the same name. For
example:
```python
from dossier.fc import FeatureCollection
fc1 = FeatureCollection({'NAME': {'alice': 1, 'bob': 1}})
fc2 = FeatureCollection({'NAME': {'alice': 1}})
fc3 = fc1 + fc2
assert fc3 == FeatureCollection({'NAME': {'alice': 2, 'bob': 1}})
```
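The feature-wise addition shown above can be pictured with plain dictionaries of `Counter` objects. The sketch below is not the package's implementation; `add_feature_maps` is a hypothetical helper written only for illustration, and it assumes that a feature missing from one side is treated as empty:

```python
from collections import Counter


def add_feature_maps(left, right):
    """Add two {feature name: Counter} maps feature by feature.

    Hypothetical helper for illustration; not part of dossier.fc.
    """
    result = {}
    # Visit every feature name that appears on either side.
    for name in set(left) | set(right):
        # Assumption: a feature missing from one side counts as empty.
        result[name] = left.get(name, Counter()) + right.get(name, Counter())
    return result


fc1 = {'NAME': Counter({'alice': 1, 'bob': 1})}
fc2 = {'NAME': Counter({'alice': 1})}
fc3 = add_feature_maps(fc1, fc2)
```

With these inputs, `fc3['NAME']` holds `alice: 2, bob: 1`, matching the `FeatureCollection` result in the example above.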