
A tool to work with any format for annotating vocalizations


crowsetta is a tool to work with any format for annotating vocalizations: speech, birdsong, mouse ultrasonic calls (insert your favorite animal vocalization here). The goal of crowsetta is to make sure that your ability to work with a dataset of vocalizations does not depend on your ability to work with any given format for annotating that dataset. What crowsetta gives you is not yet another format for annotation (I promise!); instead you get some nice data types that make it easy to work with any format: namely, Sequences made up of Segments.

    >>> from crowsetta import Segment, Sequence
    >>> a_segment = Segment.from_keyword(
    ...     label='a',
    ...     onset_ind=16000,
    ...     offset_ind=32000,
    ...     file='bird21.wav'
    ...     )
    >>> list_of_segments = [a_segment] * 3
    >>> seq = Sequence(segments=list_of_segments)
    >>> print(seq)
    Sequence(segments=[Segment(label='a', onset_s=None, offset_s=None, onset_ind=16000,
    offset_ind=32000, file='bird21.wav'), Segment(label='a', onset_s=None, offset_s=None,
    onset_ind=16000, offset_ind=32000, file='bird21.wav'), Segment(label='a', onset_s=None,
    offset_s=None, onset_ind=16000, offset_ind=32000, file='bird21.wav')])

You can load annotation from your format of choice into Sequences of Segments (most conveniently with the Transcriber, as explained below) and then use the Sequences however you need to in your program.

For example, if you want to loop through the Segments of each Sequence to pull syllables out of a spectrogram, you can do something like this, very Pythonically:

    >>> syllables_from_sequences = []
    >>> for a_seq in sequences:  # sequences: a list of Sequence objects
    ...     seq_dict = a_seq.to_dict()  # convert to dict
    ...     spect = some_spectrogram_making_function(seq_dict['file'])
    ...     syllables = []
    ...     for seg in a_seq.segments:
    ...         # spectrogram is a 2d numpy array; this assumes onset_ind and
    ...         # offset_ind index its columns
    ...         syllable = spect[:, seg.onset_ind:seg.offset_ind]
    ...         syllables.append(syllable)
    ...     syllables_from_sequences.append(syllables)
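
The loop above slices a spectrogram by segment boundaries. When onsets and offsets are stored as audio sample indices (as `onset_ind` and `offset_ind` are in the example Segment), they usually need converting to spectrogram column indices first. Here is a minimal sketch of that step, assuming a spectrogram computed with a fixed hop of `hop_length` samples between columns. `Seg`, `sample_to_column`, and `slice_syllables` are hypothetical names used for illustration, not part of crowsetta, and a plain list of rows stands in for the 2d array:

```python
from dataclasses import dataclass


@dataclass
class Seg:
    """Stand-in for a segment; NOT crowsetta's Segment class."""
    label: str
    onset_ind: int   # onset, in audio samples
    offset_ind: int  # offset, in audio samples


def sample_to_column(sample_ind, hop_length):
    """Map an audio sample index to a spectrogram column (assumes a fixed hop)."""
    return sample_ind // hop_length


def slice_syllables(spect, segments, hop_length):
    """Return one sub-spectrogram (a list of rows) per segment."""
    syllables = []
    for seg in segments:
        start = sample_to_column(seg.onset_ind, hop_length)
        stop = sample_to_column(seg.offset_ind, hop_length)
        # take the same column range from every frequency row
        syllables.append([row[start:stop] for row in spect])
    return syllables


# toy spectrogram: 2 frequency bins x 10 time columns
spect = [list(range(10)), list(range(10, 20))]
segs = [Seg('a', 0, 512), Seg('b', 512, 1024)]
syllables = slice_syllables(spect, segs, hop_length=128)
```

With a real spectrogram the hop length depends on how the spectrogram was computed, so the conversion should use whatever value your spectrogram function was given.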

As mentioned above, crowsetta provides you with a Transcriber that comes equipped with convenience functions to do the work of converting for you.

    from crowsetta import Transcriber
    scribe = Transcriber()
    seq = scribe.to_seq(file=notmat_files, format='notmat')

You can even easily adapt the Transcriber to use your own in-house format, like so:

    from crowsetta import Transcriber
    scribe = Transcriber(user_config=your_config)
    scribe.to_csv(file='your_annotation_file.mat',
                  csv_filename='your_annotation.csv')

Features

  • convert annotation formats to Sequence objects that can be easily used in a Python program
  • convert Sequence objects to comma-separated value text files that can be read on any system
  • load comma-separated values files back into Python and convert to other formats
  • easily use with your own annotation format
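
The CSV round trip described in the list above can be illustrated without crowsetta itself. The following is a minimal sketch using only the standard library. The column names and the `segments_to_csv` and `csv_to_segments` helpers are assumptions made for illustration; they are not crowsetta's actual CSV schema or API:

```python
import csv
import io

# assumed column names, chosen to mirror the Segment fields shown earlier
FIELDS = ['label', 'onset_ind', 'offset_ind', 'file']


def segments_to_csv(segments, fileobj):
    """Write a list of segment dicts as comma-separated values."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(segments)


def csv_to_segments(fileobj):
    """Read segment dicts back, restoring integer sample indices."""
    segments = []
    for row in csv.DictReader(fileobj):
        row['onset_ind'] = int(row['onset_ind'])
        row['offset_ind'] = int(row['offset_ind'])
        segments.append(row)
    return segments


segments = [
    {'label': 'a', 'onset_ind': 16000, 'offset_ind': 32000, 'file': 'bird21.wav'},
    {'label': 'b', 'onset_ind': 32000, 'offset_ind': 48000, 'file': 'bird21.wav'},
]

buffer = io.StringIO()  # in-memory file, so the sketch writes nothing to disk
segments_to_csv(segments, buffer)
buffer.seek(0)
round_tripped = csv_to_segments(buffer)
```

Because the intermediate file is plain text, anyone can open it without installing the application that produced the original annotations.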

You might find it useful in any situation where you want to share audio files of song and some associated annotations, but you don't want to require the user to install a large application in order to work with the annotation files.

Getting Started

Installation

with pip

$ pip install crowsetta

with conda

$ conda install crowsetta -c conda-forge

Usage

To learn how to use crowsetta, please see the documentation at:
https://crowsetta.readthedocs.io/en/latest/index.html

Development Installation

Currently crowsetta is developed with conda. To set up a development environment:

$ conda create -n crowsetta-dev python=3.6 numpy scipy attrs
$ conda activate crowsetta-dev
$ pip install evfuncs koumura
$ git clone https://github.com/NickleDave/crowsetta.git
$ cd crowsetta
$ pip install -e .

Project Information

Background

crowsetta was developed for two libraries:

Testing relies on the Vocalization Annotation Formats Dataset, which you may find useful if you need small samples of different audio files and associated annotation formats.

Support

If you are having issues, please let us know.

Contribute

CHANGELOG

You can see project history and work in progress in the CHANGELOG

License

The project is licensed under the BSD license.

Citation

If you use crowsetta, please cite the DOI: DOI
