STAM is a library for dealing with standoff annotations on text
STAM Python binding
STAM is a data model for stand-off text annotation, described in detail in the STAM specification. This is a Python library (more specifically, a Python binding written in Rust) for working with the model.
This library offers a higher-level interface than the underlying Rust library. Implementation is currently in a preliminary stage. We aim to implement the full model and most extensions.
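To make the stand-off idea concrete independently of this library, here is a minimal pure-Python sketch. It does not use the STAM API: the `Annotation` dataclass and the `text_of` helper are hypothetical names introduced only for illustration, and the end offset is assumed to be exclusive, as in Python slicing. The text is stored once and never modified, while annotations merely point at it with character offsets.

```python
from dataclasses import dataclass


# Hypothetical illustration only; these names are not part of the stam library.
@dataclass(frozen=True)
class Annotation:
    id: str
    begin: int  # inclusive character offset into the text
    end: int    # exclusive character offset into the text
    key: str
    value: str


def text_of(text: str, annotation: Annotation) -> str:
    """Resolve the annotation's offsets against the untouched source text."""
    return text[annotation.begin:annotation.end]


text = "Hello world"
a1 = Annotation(id="A1", begin=6, end=11, key="pos", value="noun")

# The annotation carries no text of its own; it is looked up on demand.
print(a1.id, a1.key, a1.value, text_of(text, a1))
```

Because the annotation holds only offsets, any number of annotations (even overlapping ones) can target the same text without altering it.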
Installation
$ pip install stam
Or, if you feel adventurous and have the necessary build-time dependencies (Rust) installed, you can try the latest development release from GitHub:
$ pip install git+https://github.com/annotation/stam-python
Documentation
- STAM Specification - the STAM specification itself
- API Reference
- STAM Tutorial: Standoff Text Annotation for Pythonistas - An extensive tutorial showing how to work with this Python library, in the form of a Jupyter Notebook. Recommended!
Usage
Import the library
import stam
Loading a STAM JSON (or CSV) file containing an annotation store:
store = stam.AnnotationStore(file="example.stam.json")
The annotation store is your workspace: it holds all resources, annotation sets (i.e. keys and annotation data) and of course the actual annotations. It is a memory-based store and you can put as much as you like into it (as long as it fits in memory).
You can optionally pass configuration parameters upon loading a store, as follows:
store = stam.AnnotationStore(file="example.stam.json", config={"debug": True})
Once loaded, you can retrieve anything by its public ID:
annotation = store.annotation("my-annotation")
resource = store.resource("my-resource")
annotationset = store.annotationset("my-annotationset")
key = annotationset.key("my-key")
data = annotationset.annotationdata("my-data")
You can also iterate through all annotations in the store and output a simple tab-separated format:
for annotation in store.annotations():
    # get the text to which this annotation refers (if any)
    try:
        text = str(annotation)
    except stam.StamError:
        text = "n/a"
    for data in annotation:
        print("\t".join((annotation.id(), data.key().id(), str(data.value()), text)))
Adding a resource:
resource = store.add_resource(filename="my-text.txt")
Create a store and annotations from scratch:
from stam import AnnotationStore, Selector, Offset, AnnotationDataBuilder
store = AnnotationStore(id="test")
resource = store.add_resource(id="testres", text="Hello world")
store.annotate(id="A1",
               target=Selector.textselector(resource, Offset.simple(6,11)),
               data={"id": "D1", "key": "pos", "value": "noun", "set": "testdataset"})
In the above example, the AnnotationDataSet, DataKey and AnnotationData are created on-the-fly. You can also create them explicitly within the set first, as shown in the next snippet, resulting in the exact same store:
store = AnnotationStore(id="test")
resource = store.add_resource(id="testres", text="Hello world")
annotationset = store.add_annotationset(id="testdataset")
annotationset.add_key("pos")
data = annotationset.add_data("pos","noun","D1")
store.annotate(id="A1",
               target=Selector.textselector(resource, Offset.simple(6,11)),
               data=data)
Providing the full data dictionary as in the earlier example would also have worked fine, with the same end result, but would be less performant than passing an AnnotationData instance directly.
The implementation will always ensure that any already existing AnnotationData is reused if possible, as not duplicating data is one of the core characteristics of the STAM model.
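The reuse described above can be pictured with a small pure-Python sketch. This is not how the library is implemented, and `TinyDataSet` and its `add_data` method are hypothetical names for illustration only; the point is simply that identical key/value content is stored once and handed back on every later request.

```python
# Hypothetical sketch of data reuse; not the stam implementation or its API.
class TinyDataSet:
    def __init__(self):
        self._by_content = {}  # maps (key, value) -> data id
        self._counter = 0

    def add_data(self, key, value):
        """Return the id of existing data for (key, value); create it only if absent."""
        content = (key, value)
        if content not in self._by_content:
            self._counter += 1
            self._by_content[content] = f"D{self._counter}"
        return self._by_content[content]

    def __len__(self):
        return len(self._by_content)


dataset = TinyDataSet()
first = dataset.add_data("pos", "noun")
second = dataset.add_data("pos", "noun")  # same key and value: the entry is reused
third = dataset.add_data("pos", "verb")   # different value: a new entry is created

print(first, second, third, len(dataset))
```

Many annotations can then share a single data entry, which keeps memory use down when the same value (such as a part-of-speech tag) occurs thousands of times.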
You can serialize the entire annotation store (including all sets and annotations) to a STAM JSON file:
store.set_filename("example.stam.json")
store.save()
For more documentation, please read: STAM Tutorial: Standoff Text Annotation for Pythonistas.
Differences between the Rust library and the Python library, and performance considerations
Although this Python binding builds on the Rust library, the API it exposes differs in certain aspects to make it more pythonic and easier to work with. This results in a higher-level API that hides some of the lower-level details that are present in the Rust library. This approach does come at the cost of causing some additional runtime overhead.
The Rust methods often return iterators, references or handles whenever they can, and they do so safely. The Python API is often forced to make a local copy instead. For iterators, we often let the entire underlying Rust iterator run its course and then return the result as a whole, as a tuple, rather than returning a Python generator. Here you gain some speed at the cost of some memory.
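The difference between a lazily evaluated generator and an eagerly built tuple can be illustrated in plain Python. This is only an analogy, assuming nothing about the binding's internals: `squares_lazy` and `squares_eager` are hypothetical functions, and the sketch shows the behavioural difference rather than measuring speed.

```python
def squares_lazy(n):
    """Generator: produces one item at a time and can be consumed only once."""
    for i in range(n):
        yield i * i


def squares_eager(n):
    """Eager: runs the whole iteration up front and returns all results as a tuple."""
    return tuple(i * i for i in range(n))


lazy = squares_lazy(5)
eager = squares_eager(5)

first_pass = list(lazy)
second_pass = list(lazy)  # the generator is exhausted, so nothing is left

# The tuple holds every result in memory, but supports len(), indexing
# and repeated iteration.
print(first_pass, second_pass)
print(len(eager), eager[3], list(eager) == list(eager))
```

A tuple costs memory proportional to the number of results, which is the trade-off the paragraph above describes.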
Needless to say, using Rust directly will always be more performant than using this Python binding. However, using this binding should still be far more performant than if the whole thing were implemented in native Python. The trick is to let the binding do as much of the work as possible: use higher-level methods whenever they are available rather than implementing your logic in Python.
Acknowledgements
This work is conducted at the KNAW Humanities Cluster's Digital Infrastructure department, and funded by the CLARIAH project (CLARIAH-PLUS, NWO grant 184.034.023) as part of the FAIR Annotations track.