Serverless full text search in Python

Locasticsearch

⚠️ alpha status: 🚧 Come back in a couple weekends 🚧

Locasticsearch provides serverless full text search powered by SQLite's full text search capabilities, while aiming to be compatible with (a subset of) the Elasticsearch API.

That way you can comfortably develop your text search application without setting up any services, and later transition smoothly to Elasticsearch for scale or more features without changing your code.

That said, if you only need basic search operations within the subset supported by this library, and you don't have so many documents (~million) that a cluster deployment would be justified, Locasticsearch can be a faster alternative to Elasticsearch.
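To give a feel for the SQLite full text search that the library builds on, here is a minimal sketch using only the standard library `sqlite3` module. It is not Locasticsearch's own code: the table name `tweets`, the helper names `build_index` and `search`, and the choice of the FTS5 extension are assumptions made for illustration, and it requires a SQLite build with FTS5 enabled.

```python
import sqlite3


def build_index(docs):
    """Create an in-memory FTS5 table and load (author, text) pairs into it."""
    conn = sqlite3.connect(":memory:")
    # FTS5 virtual table: every declared column is indexed for full text search.
    # Assumption: the SQLite bundled with your Python was compiled with FTS5.
    conn.execute("CREATE VIRTUAL TABLE tweets USING fts5(author, text)")
    conn.executemany("INSERT INTO tweets (author, text) VALUES (?, ?)", docs)
    return conn


def search(conn, query):
    """Run a full text query and return matching rows, best match first."""
    return conn.execute(
        "SELECT author, text FROM tweets WHERE tweets MATCH ? ORDER BY rank",
        (query,),
    ).fetchall()


conn = build_index(
    [
        ("kimchy", "Elasticsearch: cool. bonsai cool."),
        ("alice", "SQLite ships with the Python standard library."),
    ]
)

for author, text in search(conn, "bonsai"):
    print(f"{author}: {text}")
```

Everything lives in a single in-process database, which is what makes the "no services to set up" workflow possible.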

Getting started

from locasticsearch import Locasticsearch
from datetime import datetime

es = Locasticsearch()

doc = {
    "author": "kimchy",
    "text": "Elasticsearch: cool. bonsai cool.",
    "timestamp": datetime(2010, 10, 10, 10, 10, 10),
}
res = es.index(index="test-index", doc_type="tweet", id=1, body=doc)

res = es.get(index="test-index", doc_type="tweet", id=1)
print(res["_source"])

# this will get ignored in Locasticsearch
es.indices.refresh(index="test-index")

res = es.search(index="test-index", body={"query": {"match_all": {}}})
print("Got %d Hits:" % res["hits"]["total"]["value"])
for hit in res["hits"]["hits"]:
    print("%(timestamp)s %(author)s: %(text)s" % hit["_source"])

We are also adding a simplified API that can be converted to Elasticsearch.

Features

  • 💯% local, no server management
  • ✨ Lightweight pure Python, no external dependencies
  • ⚡ Super fast searches thanks to SQLite full text search capabilities
  • 🔗 No lock-in. Thanks to the API compatibility with the official client, you can smoothly transition to Elasticsearch for scale or more features without changing your code.
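One way to exploit the API compatibility is to pick the client in a single place and leave the rest of the application untouched. The following is only a sketch: the function name `make_search_client`, the `backend` argument and the `SEARCH_BACKEND` environment variable are hypothetical names invented for this example, and the connection arguments that the official Elasticsearch client needs depend on its version and on your cluster.

```python
import importlib
import os


def make_search_client(backend=None, **es_kwargs):
    """Return a client exposing the Elasticsearch-style API.

    `backend` and SEARCH_BACKEND are hypothetical names for this sketch;
    they are not part of either library.
    """
    backend = backend or os.environ.get("SEARCH_BACKEND", "locasticsearch")
    if backend == "elasticsearch":
        # Imported lazily so the package is only required when selected.
        module = importlib.import_module("elasticsearch")
        # Connection settings (hosts, credentials, ...) depend on your cluster.
        return module.Elasticsearch(**es_kwargs)
    if backend == "locasticsearch":
        module = importlib.import_module("locasticsearch")
        # Constructed without arguments, as in the getting started example.
        return module.Locasticsearch()
    raise ValueError(f"unknown search backend: {backend!r}")
```

With this in place, `es = make_search_client()` gives a local client during development, and setting `SEARCH_BACKEND=elasticsearch` switches to a real cluster, provided the calls you make stay within the subset both clients support.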

Install

pip install locasticsearch

To use or not to use

You should NOT use Locasticsearch if:

  • You are deploying a security-sensitive application. Locasticsearch code is very prone to SQL injection attacks. This should improve in future releases.
  • Your searches are more complicated than what you would find in a 5-minute Elasticsearch tutorial. Elasticsearch has a huge API and it is very unlikely that we can support even a sizable portion of it.
  • You hate buggy libraries. Locasticsearch is a very young project so bugs are guaranteed. You can check the tests to see if your needs are covered.
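For readers unfamiliar with the SQL injection risk mentioned in the first bullet, the sketch below shows the general problem with plain `sqlite3`. It is a generic illustration, not code taken from Locasticsearch; the `docs` table and the `find_unsafe` and `find_safe` helpers are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, author TEXT)")
conn.executemany("INSERT INTO docs (author) VALUES (?)", [("kimchy",), ("alice",)])


def find_unsafe(author):
    # Vulnerable: user input is pasted directly into the SQL text.
    sql = f"SELECT id FROM docs WHERE author = '{author}'"
    return conn.execute(sql).fetchall()


def find_safe(author):
    # Parameterized: the value is passed separately and never parsed as SQL.
    return conn.execute("SELECT id FROM docs WHERE author = ?", (author,)).fetchall()


malicious = "nobody' OR '1'='1"
print(find_unsafe(malicious))  # the injected OR clause matches every row
print(find_safe(malicious))    # treated as a literal author name, matches nothing
```

Until the library handles this itself, avoid passing untrusted input to it.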

You should use Locasticsearch if:

  • you don't want Docker or an Elasticsearch service using precious resources on your laptop
  • you only need basic text search and Elasticsearch would be overkill
  • you want very easy deployments that only involve pip installs
  • using Java from a Python program makes you feel dirty

Next steps

  • [] Add real query DSL parsing
  • [] Add a simplified, non-ES-compatible interface for easy JSON ingestion and querying
  • [] Document supported vs unsupported query types

Comparison to similar libraries

whoosh

The most full featured pure Python text search library by far:

  • 👍 Supports highlight, analyzers, query expansion, several ranking functions, ...
  • 👎 Unmaintained for a long time; it might see a revival at https://github.com/whoosh-community/whoosh
  • 👎 Pure Python, so it doesn't scale as well (still fast enough for small to medium datasets)

elasticsearch

The big champion of full text search. This is what you should be using in production:

  • 👍 Lots of features to accommodate any use case
  • 👍 Battle tested, scalable, performant
  • 👎 Not Python native: more complex to deploy and integrate with a Python project for simple use cases

django haystack

Django Haystack provides a unified API that allows you to plug in different search backends (such as Solr, Elasticsearch, Whoosh, Xapian, etc.) without having to modify your code:

  • 👍 Many features, boosting, highlight, autocomplete (some backend dependent though)
  • 👍 Possibility to switch backends
  • 👎 Library lock-in.
  • 👎 Despite supporting several backends, Whoosh is the only one that is Python native.

xapian

  • 👍 Very fast and full featured (C++)
  • 👎 Not pip installable (needs system level compilation)
  • 👎 The Python bindings and the documentation are not that user friendly

gensim

While gensim focuses on topic modeling, you can use TfidfModel and SparseMatrixSimilarity for text search. That said, this approach doesn't use an inverted index (it is a linear search), so it has limited scalability.

  • 👍 Unique features such as approximate search
  • 👎 Focus is on topic modeling, so no intuitive APIs for full text ingestion/search
  • 👎 Doesn't support inverted index search (mostly full scan and approximate)
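The difference between a linear scan and an inverted index can be sketched in a few lines of plain Python. This is a toy illustration with a deliberately naive tokenizer; it uses neither gensim nor SQLite, and the sample documents and function names are made up for the example.

```python
from collections import defaultdict

docs = {
    1: "Elasticsearch: cool. bonsai cool.",
    2: "SQLite has full text search built in.",
    3: "Bonsai trees need patience.",
}


def tokenize(text):
    # Naive tokenizer for illustration: lowercase and strip basic punctuation.
    return [word.strip(".,:;!?").lower() for word in text.split()]


def linear_search(term):
    # Full scan: every document is tokenized and checked on every query.
    return [doc_id for doc_id, text in docs.items() if term in tokenize(text)]


def build_inverted_index():
    # Built once: maps each term to the set of documents that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


index = build_inverted_index()


def indexed_search(term):
    # One dictionary lookup, regardless of how many documents are stored.
    return sorted(index.get(term, set()))


print(linear_search("bonsai"))
print(indexed_search("bonsai"))
```

Both functions return the same documents; the difference is that the scan's cost grows with the size of the collection on every query, while the index pays that cost once at build time.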

peewee

Peewee is actually a more general ORM, but it offers abstractions for using full text search on SQLite.

  • 👍 Support for full text search using several SQL backends (no Elasticsearch though)
  • 👍 Custom ranking and analyzer functions
  • 👎 No Elasticsearch compatible API
