This is a pre-production deployment of Warehouse, however changes made here WILL affect the production instance of PyPI.
Latest Version Dependencies status unknown Test status unknown Test coverage unknown
Project Description

What is Caterpillar?

Caterpillar is a pure python text indexing and analytics library. Some features include:

  • pluggable key/value object store for storage (currently only implementation is SQLite)
  • transaction layer for reading/writing (along with associated locking semantics)
  • supports searching indexes with some built in scoring algorithm implementations (including TF/IDF)
  • stores additional data structures for analytics above and beyond traditional information retrieval data structures
  • has a plugin architecture for quickly accessing the data structures and performing custom analytics
  • has 100% test coverage

Quick Example

Quick example of using caterpillar below:

import os
import tempfile

from caterpillar.processing.index import IndexWriter, IndexConfig
from caterpillar.processing.schema import TEXT, Schema, NUMERIC
from caterpillar.storage.sqlite import SqliteStorage

index_dir = os.path.join(tempfile.mkdtemp(), "examples")
with open('caterpillar/test_resources/moby.txt', 'r') as f:
    data = f.read()
    with IndexWriter(index_dir, IndexConfig(SqliteStorage, Schema(text=TEXT, some_number=NUMERIC))) as writer:
        writer.add_document(text=data, some_number=1)

Installation

pip install caterpillar

Documentation

The documentation can be found here.

Roadmap

We are working on porting our issues from our internal issue tracker over to a more visible system. But, for the time being, here is a general roadmap:

  • Move to (possibly only) Python 3 (see below).
  • Revamp schema and field design.
  • Add a memory storage implementation.
  • Revamp query design.
  • Remove the NLTK dependency (great library, but only used for tokenisation).
  • Switch index structures over to a more efficient data structure (possibly numpy arrays or similar).

The current plan is to move to using GitHub issues with HuBoard, but stay tuned.

Python Version

Currently Python 2.7+ only. Work is underway to support Python 3+. WARNING: Caterpillar might become Python 3+ only in the future. Stay tuned.

Contributors

Anyone who is willing! In other words none yet, but we are more then accepting of contributions.

Contributing

Not code will be merged unless it has 100% test coverage and passes pep8. We code with a line length of 120 characters (see tox.ini [pep8] section) and we use py.test for testing. Tests are in a test sub-folder in each package. We generally run coverage as follows:

coverage erase; coverage run --source caterpillar -m py.test -v caterpillar; coverage report
Release History

Release History

1.0.0.dev12

This version

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

1.0.0.dev11

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

1.0.0.dev10

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

1.0.0.dev9

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

1.0.0.dev8

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

1.0.0.dev7

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

1.0.0.dev6

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

1.0.0.dev5

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

1.0.0.dev4

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

1.0.0.dev3

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

1.0.0.dev2

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

1.0.0.dev1

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

Download Files

Download Files

TODO: Brief introduction on what you do with files - including link to relevant help section.

File Name & Checksum SHA256 Checksum Help Version File Type Upload Date
caterpillar-1.0.0.dev12.linux-x86_64.tar.gz (89.8 kB) Copy SHA256 Checksum SHA256 Source Nov 30, 2016
caterpillar-1.0.0.dev12-py2-none-any.whl (58.0 kB) Copy SHA256 Checksum SHA256 py2 Wheel Nov 30, 2016
caterpillar-1.0.0.dev12.tar.gz (46.0 kB) Copy SHA256 Checksum SHA256 Source Nov 30, 2016

Supported By

WebFaction WebFaction Technical Writing Elastic Elastic Search Pingdom Pingdom Monitoring Dyn Dyn DNS HPE HPE Development Sentry Sentry Error Logging CloudAMQP CloudAMQP RabbitMQ Heroku Heroku PaaS Kabu Creative Kabu Creative UX & Design Fastly Fastly CDN DigiCert DigiCert EV Certificate Rackspace Rackspace Cloud Servers DreamHost DreamHost Log Hosting