Colav Similarity Search Engine

Mohan

Colav Similarity using Elasticsearch / Pijao - Mohán, spirit of water.

Description

This package performs similarity searches using the Colav similarity algorithms and Elasticsearch.

Installation

Dependencies

Docker and docker-compose are required.

apt install docker-compose

or

pip install docker-compose
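
Before using the package it can help to confirm that an Elasticsearch server is actually answering. The sketch below is one way to do that with the Python standard library only. It assumes that the Docker setup exposes Elasticsearch over HTTP with basic authentication, as the URI and credentials in the usage example suggest; `basic_auth_header` and `es_is_reachable` are hypothetical helper names and are not part of mohan.

```python
import base64
import urllib.error
import urllib.request


def basic_auth_header(user, password):
    """Build the value of an HTTP Basic `Authorization` header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def es_is_reachable(uri, auth, timeout=2.0):
    """Return True if an HTTP GET on `uri` answers with status 200.

    Assumption: the server accepts HTTP basic authentication.
    """
    request = urllib.request.Request(uri)
    request.add_header("Authorization", basic_auth_header(*auth))
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error such as 401.
        return False
```

For the values used in the usage example, the call would be es_is_reachable("http://localhost:9200", ("elastic", "colav")).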

Package

pip install mohan

Usage

This package is designed to be used as a library: import the Similarity class to create an index, insert documents (works), and perform searches.

The following example uses OpenAlex, but the package can be used with any dataset.

from mohan.Similarity import Similarity
from pymongo import MongoClient

es_index = "openalex_index"

# Create the instance.
s = Similarity(es_index, es_uri="http://localhost:9200",
               es_auth=("elastic", "colav"))

# Take the OpenAlex works stored in MongoDB as the example dataset.
openalex = list(MongoClient()["openalexco"]["works"].find())

# Insert the documents into the Elasticsearch index in bulk.
bulk_size = 100

es_entries = []
for i in openalex:
    work = {}
    work["title"] = i["title"]
    # The source can be missing in OpenAlex; fall back to an empty string.
    if i.get("primary_location") and i["primary_location"]["source"]:
        work["source"] = i["primary_location"]["source"]["display_name"]
    else:
        work["source"] = ""
    work["year"] = i["publication_year"]
    work["volume"] = i["biblio"]["volume"]
    work["issue"] = i["biblio"]["issue"]
    # Use the field names that the search expects (page_start, page_end).
    work["page_start"] = i["biblio"]["first_page"]
    work["page_end"] = i["biblio"]["last_page"]
    authors = []
    for author in i["authorships"]:
        if "display_name" in author["author"].keys():
            authors.append(author["author"]["display_name"])
    work["authors"] = authors

    entry = {"_index": es_index,
             "_id": str(i["_id"]),
             "_source": work}
    es_entries.append(entry)
    if len(es_entries) == bulk_size:
        s.insert_bulk(es_entries)
        es_entries = []

# Insert the entries left over from the last, incomplete batch.
if es_entries:
    s.insert_bulk(es_entries)
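
The field mapping and the batching in the loop above can also be factored into two small functions, which makes them easier to test without a running database. This is only a sketch: `openalex_to_work` and `batches` are hypothetical names, not part of mohan, and the mapping assumes the OpenAlex document layout used in the example (primary_location, biblio, authorships).

```python
from itertools import islice


def openalex_to_work(doc):
    """Map one OpenAlex work (a dict) to the flat fields used for matching."""
    location = doc.get("primary_location") or {}
    source = location.get("source") or {}
    biblio = doc.get("biblio") or {}
    authors = []
    for authorship in doc.get("authorships") or []:
        name = (authorship.get("author") or {}).get("display_name")
        if name:
            authors.append(name)
    return {
        "title": doc.get("title"),
        "source": source.get("display_name") or "",
        "year": doc.get("publication_year"),
        "authors": authors,
        "volume": biblio.get("volume"),
        "issue": biblio.get("issue"),
        "page_start": biblio.get("first_page"),
        "page_end": biblio.get("last_page"),
    }


def batches(iterable, size):
    """Yield lists of at most `size` items, including the last partial one."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

With these helpers, each batch of entries built from openalex_to_work(i) can be passed to s.insert_bulk, and the final partial batch is not lost.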

Example inserting one document from OpenAlex:

# i, authors and work come from the last iteration of the loop above.
work = {"title": i["title"],
        "source": work["source"],
        "year": i["publication_year"],
        "authors": authors,
        "volume": i["biblio"]["volume"],
        "issue": i["biblio"]["issue"],
        "page_start": i["biblio"]["first_page"],
        "page_end": i["biblio"]["last_page"]}
res = s.insert_work(_id=str(i["_id"]), work=work)

Example performing a search:

# Search with the same fields that were used for the insert.
res = s.search_work(title=i["title"], source=work["source"],
                    year=i["publication_year"], authors=authors,
                    volume=i["biblio"]["volume"], issue=i["biblio"]["issue"],
                    page_start=i["biblio"]["first_page"],
                    page_end=i["biblio"]["last_page"])

NOTES:

  • The search is performed using the same fields as the insert_work method.
  • The title field can be an array when inserting documents, but it will be used as a string when searching documents.
  • The authors field must be an array when inserting or searching documents.
  • Extra fields can be added to the insert methods, but they will not be used for the search. Only the fields (title, source, year, authors, volume, issue, page_start, page_end) will be used for the search.
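
The notes above can be turned into a small helper that prepares the arguments for a search from a stored work. This is a minimal sketch under stated assumptions: `SEARCH_FIELDS` and `to_search_kwargs` are hypothetical names, not part of mohan, and taking the first element of a title array is an illustrative choice, since the notes only say that the title is used as a string when searching.

```python
# The eight fields that, according to the notes, take part in the search.
SEARCH_FIELDS = ("title", "source", "year", "authors",
                 "volume", "issue", "page_start", "page_end")


def to_search_kwargs(work):
    """Keep only the search fields and coerce title and authors."""
    kwargs = {name: work.get(name) for name in SEARCH_FIELDS}

    # Assumption: when the title is an array, search with its first element.
    title = kwargs["title"]
    if isinstance(title, (list, tuple)):
        kwargs["title"] = title[0] if title else ""

    # The authors field has to be an array.
    authors = kwargs["authors"]
    if authors is None:
        kwargs["authors"] = []
    elif isinstance(authors, str):
        kwargs["authors"] = [authors]
    else:
        kwargs["authors"] = list(authors)
    return kwargs
```

A search could then be written as res = s.search_work(**to_search_kwargs(work)), which passes the same keyword arguments as the search example.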

License

BSD-3-Clause License

Links

http://colav.udea.edu.co/

Download files

Source Distribution

mohan-0.0.6a0.tar.gz (6.7 kB view details)

Uploaded Source

Built Distribution

Mohan-0.0.6a0-py3-none-any.whl (6.4 kB view details)

Uploaded Python 3

File details

Details for the file mohan-0.0.6a0.tar.gz.

File metadata

  • Download URL: mohan-0.0.6a0.tar.gz
  • Upload date:
  • Size: 6.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.4

File hashes

Hashes for mohan-0.0.6a0.tar.gz
Algorithm Hash digest
SHA256 84ef028f7acdb387255fefeaf8b587c3dd3fa5d8a680f3e986aa43fb94115ca0
MD5 5343c58db0c685adcd09e7eba472edf0
BLAKE2b-256 eab52e3712ed87ce7c8eebcd8f1a32aee775d77b6e70d9632b6ed1fe68d72c10
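
A downloaded file can be checked against the published SHA256 digest with the Python standard library. The sketch below is one possible way to do it; `sha256_of` and `matches_published` are hypothetical helper names, and the path passed to them is wherever the file was saved locally.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex):
    """Compare the file digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches_published("mohan-0.0.6a0.tar.gz", "84ef028f7acdb387255fefeaf8b587c3dd3fa5d8a680f3e986aa43fb94115ca0") should return True for an intact copy of the source distribution.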

File details

Details for the file Mohan-0.0.6a0-py3-none-any.whl.

File metadata

  • Download URL: Mohan-0.0.6a0-py3-none-any.whl
  • Upload date:
  • Size: 6.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.4

File hashes

Hashes for Mohan-0.0.6a0-py3-none-any.whl
Algorithm Hash digest
SHA256 aa7c0dff647771f0607465f379966f74d2abbf7120acf754327cbd4cf9952f9f
MD5 23919823c7855ca7900342b5015cf18f
BLAKE2b-256 13bcef493ec2b7dff1bb6cc33308db861d4b7a84154a9d46ff6b42f19ed93c9e
