Serverless full text search in Python

Locasticsearch

⚠️ alpha status: 🚧 Come back in a couple weekends 🚧

Locasticsearch provides serverless full text search powered by SQLite's full text search capabilities, while trying to stay compatible with (a subset of) the Elasticsearch API.

That way you can comfortably develop your text search application without setting up any services, and later transition smoothly to Elasticsearch for scale or more features without changing your code.

That said, if you only need basic search operations within the subset supported by this library, and you don't have so many documents (on the order of millions) that a cluster deployment is justified, Locasticsearch can be a faster alternative to Elasticsearch.

Getting started

from locasticsearch import Locasticsearch
from datetime import datetime

es = Locasticsearch()

doc = {
    "author": "kimchy",
    "text": "Elasticsearch: cool. bonsai cool.",
    "timestamp": datetime(2010, 10, 10, 10, 10, 10),
}
res = es.index(index="test-index", doc_type="tweet", id=1, body=doc)

res = es.get(index="test-index", doc_type="tweet", id=1)
print(res["_source"])

# this will get ignored in Locasticsearch
es.indices.refresh(index="test-index")

res = es.search(index="test-index", body={"query": {"match_all": {}}})
print("Got %d Hits:" % res["hits"]["total"]["value"])
for hit in res["hits"]["hits"]:
    print("%(timestamp)s %(author)s: %(text)s" % hit["_source"])
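Because the calls above mirror the official Elasticsearch client, switching backends could come down to constructing a different client object. The following is only a sketch of that idea: the `ES_URL` environment variable and the `pick_backend` and `make_client` helpers are hypothetical names invented for this illustration, and the `elasticsearch` package has to be installed separately.

```python
import os


def pick_backend(env=None):
    """Return "elasticsearch" when ES_URL (a hypothetical variable) is set, else "locasticsearch"."""
    env = os.environ if env is None else env
    return "elasticsearch" if env.get("ES_URL") else "locasticsearch"


def make_client(env=None):
    """Build a search client; imports are deferred so only the chosen package is required."""
    env = os.environ if env is None else env
    if pick_backend(env) == "elasticsearch":
        # Official client, installed separately with pip.
        from elasticsearch import Elasticsearch
        return Elasticsearch(env["ES_URL"])
    from locasticsearch import Locasticsearch
    return Locasticsearch()
```

With a helper like this, `es = make_client()` would replace `es = Locasticsearch()` and the rest of the example would stay unchanged, as long as it only uses calls that both clients support.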

We are also adding a simplified API that can be converted to Elasticsearch.

Features

  • 💯% local, no server management
  • ✨ Lightweight pure Python, no external dependencies
  • ⚡ Super fast searches thanks to SQLite full text search capabilities
  • 🔗 No lock-in. Thanks to the API compatibility with the official client, you can smoothly transition to Elasticsearch for scale or more features without changing your code.
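To give a feel for the SQLite full text search capabilities mentioned above, here is a minimal sketch that uses the standard `sqlite3` module directly. It is not Locasticsearch's internal code. It assumes your Python build ships SQLite with the FTS5 extension enabled (most do), and the `tweets` table and its rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# FTS5 virtual table: every declared column is indexed for full text search.
conn.execute("CREATE VIRTUAL TABLE tweets USING fts5(author, text)")
conn.executemany(
    "INSERT INTO tweets (author, text) VALUES (?, ?)",
    [
        ("kimchy", "Elasticsearch: cool. bonsai cool."),
        ("someone", "SQLite ships with Python."),
    ],
)

# MATCH runs the full text query; bm25() orders rows by relevance.
rows = conn.execute(
    "SELECT author, text FROM tweets WHERE tweets MATCH ? ORDER BY bm25(tweets)",
    ("bonsai",),
).fetchall()
print(rows)
```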

Install

pip install locasticsearch

To use or not to use

You should NOT use Locasticsearch if:

  • You are deploying a security-sensitive application. Locasticsearch code is very prone to SQL injection attacks. This should improve in future releases.
  • Your searches are more complicated than what you would find in a 5 min Elasticsearch tutorial. Elasticsearch has a huge API and it is very unlikely that we can support even a sizable portion of that.
  • You hate buggy libraries. Locasticsearch is a very young project so bugs are guaranteed. You can check the tests to see if your needs are covered.
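For readers unfamiliar with the SQL injection risk mentioned in the list above, the sketch below shows the general problem with plain `sqlite3`. It does not reproduce Locasticsearch's own queries. The `docs` table and the input string are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO docs (text) VALUES (?)",
    [("public note",), ("private note",)],
)

user_input = "nothing' OR '1'='1"

# Unsafe: the input is pasted into the SQL string and rewrites the WHERE clause.
unsafe_sql = "SELECT text FROM docs WHERE text = '%s'" % user_input
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the ? placeholder passes the input as data, never as SQL.
safe_rows = conn.execute(
    "SELECT text FROM docs WHERE text = ?", (user_input,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

The interpolated query returns every row, while the parameterized one returns none, which is why untrusted input should not reach a library that builds SQL by string formatting.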

You should use Locasticsearch if:

  • You don't want a Docker container or an Elasticsearch service using precious resources on your laptop
  • You only need basic text search and Elasticsearch would be overkill
  • You want very easy deployments that only involve pip installs
  • Using Java from a Python program makes you feel dirty

Next steps

  • [] Add real query DSL parsing
  • [] Add a simplified, non-ES-compatible interface for easy JSON ingestion and querying
  • [] Document supported vs unsupported query types

Comparison to similar libraries

whoosh

The most full-featured pure Python text search library by far:

  • 👍 Supports highlight, analyzers, query expansion, several ranking functions, ...
  • 👎 Unmaintained for a long time, though it might see a revival at https://github.com/whoosh-community/whoosh
  • 👎 Pure Python, so it doesn't scale as well (still fast enough for small to medium datasets)

elasticsearch

The big champion of full text search. This is what you should be using in production:

  • 👍 Lots of features to accommodate any use case
  • 👍 Battle tested, scalable, performant
  • 👎 Not Python native: more complex to deploy and integrate with a Python project for simple use cases

django haystack

Django Haystack provides a unified API that allows you to plug in different search backends (such as Solr, Elasticsearch, Whoosh, Xapian, etc.) without having to modify your code:

  • 👍 Many features, boosting, highlight, autocomplete (some backend dependent though)
  • 👍 Possibility to switch backends
  • 👎 Library lock-in.
  • 👎 Despite supporting several backends, Whoosh is the only one that is Python native.

xapian

  • 👍 Very fast and full featured (C++)
  • 👎 Not pip installable (needs system-level compilation)
  • 👎 The Python bindings and the documentation are not that user friendly

gensim

While gensim focuses on topic modeling, you can use TfidfModel and SparseMatrixSimilarity for text search. That said, this approach doesn't use an inverted index (it is a linear search), so it has limited scalability.

  • 👍 Unique features such as approximate search
  • 👎 Focus is on topic modeling, so no intuitive APIs for full text ingestion/search
  • 👎 Doesn't support inverted index search (mostly full scan and approximate)
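The scalability point above comes down to the difference between scanning every document and looking a term up in an inverted index. The toy sketch below illustrates that difference in plain Python. It does not use gensim, and the documents and function names are made up for the example.

```python
from collections import defaultdict

docs = {
    1: "elasticsearch is cool",
    2: "bonsai trees are cool",
    3: "sqlite ships with python",
}


def linear_search(term):
    """Full scan: tokenizes every document on every query."""
    return sorted(doc_id for doc_id, text in docs.items() if term in text.split())


# Inverted index: built once, maps each term to the ids of documents containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for token in text.split():
        index[token].add(doc_id)


def indexed_search(term):
    """Single dictionary lookup instead of a pass over all documents."""
    return sorted(index.get(term, set()))


print(linear_search("cool"), indexed_search("cool"))
```

Both functions return the same ids, but the linear version does work proportional to the size of the whole collection on each query, which is the limitation noted for the gensim approach.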

peewee

Peewee is actually a more general ORM, but it offers abstractions for using full text search on SQLite.

  • 👍 Support for full text search using several SQL backends (no Elasticsearch though)
  • 👍 Custom ranking and analyzer functions
  • 👎 No Elasticsearch-compatible API
