Colav Similarity Search Engine

Mohan

Colav Similarity using Elastic Search / Pijao - Mohán spirit of water.

Description

This package performs similarity searches using Colav similarity algorithms and Elasticsearch.

Installation

Dependencies

Docker and docker-compose are required.

apt install docker-compose

or

pip install docker-compose

Package

pip install mohan
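
Before creating an index, it can help to confirm that Elasticsearch is reachable. The following is a minimal sketch using only the Python standard library; it is not part of mohan. It assumes Elasticsearch is served over HTTP with basic authentication, and the helper names `build_es_request` and `es_is_reachable` are hypothetical.

```python
import base64
import urllib.request


def build_es_request(es_uri, es_auth):
    """Build a GET request for the Elasticsearch root endpoint with basic auth."""
    user, password = es_auth
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(es_uri, headers={"Authorization": f"Basic {token}"})


def es_is_reachable(es_uri, es_auth, timeout=5):
    """Return True if the server answers with HTTP 200, False on any error."""
    try:
        request = build_es_request(es_uri, es_auth)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError are subclasses of OSError, so refused
        # connections, timeouts and 401 responses all end up here.
        return False
```

A call such as es_is_reachable("http://localhost:9200", ("elastic", "colav")) uses the same URI and credentials as the usage example in this document; adjust them to your own deployment.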

Usage

This package is designed to be used as a library. Import the class Similarity to create an index, insert documents (works) and perform searches.

The following example uses OpenAlex, but the package can be used with any dataset.

from mohan.Similarity import Similarity
from pymongo import MongoClient

es_index = "openalex_index"

# Create the instance.
s = Similarity(es_index, es_uri="http://localhost:9200",
               es_auth=('elastic', 'colav'))

# Take OpenAlex as an example.
openalex = list(MongoClient()["openalexco"]["works"].find())

# Example: insert documents into the Elasticsearch index in bulk.
bulk_size = 100

es_entries = []
for i in openalex:
    work = {}
    work["title"] = i["title"]
    # Use the source name when it is available, otherwise an empty string.
    if "primary_location" in i.keys() and i["primary_location"]:
        if i["primary_location"]["source"]:
            work["source"] = i["primary_location"]["source"]["display_name"]
        else:
            work["source"] = ""
    else:
        work["source"] = ""
    work["year"] = i["publication_year"]
    work["volume"] = i["biblio"]["volume"]
    work["issue"] = i["biblio"]["issue"]
    # Store pages under page_start/page_end, the names the search uses (see NOTES).
    work["page_start"] = i["biblio"]["first_page"]
    work["page_end"] = i["biblio"]["last_page"]
    authors = []
    for author in i['authorships']:
        if "display_name" in author["author"].keys():
            authors.append(author["author"]["display_name"])
    work["authors"] = authors
    
    entry = {"_index": es_index,
                "_id": str(i["_id"]),
                "_source": work}
    es_entries.append(entry)
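    if len(es_entries) == bulk_size:
        s.insert_bulk(es_entries)
        es_entries = []

# Insert the remaining entries that did not fill a complete batch.
if es_entries:
    s.insert_bulk(es_entries)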
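
The record mapping and the batching in the loop above can also be factored into two small helpers. This is a sketch, not part of mohan: the names `openalex_to_work` and `batched` are hypothetical, missing keys are tolerated with `.get` as an added assumption about the data, and page numbers are stored under `page_start` and `page_end`, the field names listed in the NOTES section.

```python
from itertools import islice


def openalex_to_work(record):
    """Map one OpenAlex work record to the flat dict that gets indexed."""
    location = record.get("primary_location") or {}
    source = location.get("source") or {}
    biblio = record.get("biblio") or {}
    authors = []
    for authorship in record.get("authorships") or []:
        name = (authorship.get("author") or {}).get("display_name")
        if name:
            authors.append(name)
    return {
        "title": record.get("title"),
        "source": source.get("display_name") or "",
        "year": record.get("publication_year"),
        "authors": authors,
        "volume": biblio.get("volume"),
        "issue": biblio.get("issue"),
        "page_start": biblio.get("first_page"),
        "page_end": biblio.get("last_page"),
    }


def batched(iterable, size):
    """Yield lists of at most `size` items, including a final partial one."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Possible usage with the objects from the example above (not executed here):
# for records in batched(openalex, bulk_size):
#     s.insert_bulk([{"_index": es_index, "_id": str(r["_id"]),
#                     "_source": openalex_to_work(r)} for r in records])
```

Yielding the final partial batch means no documents are left uninserted when the number of records is not a multiple of the bulk size.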

Example: inserting one document from OpenAlex (here i and authors are the last record and author list from the loop above)

work = {"title": i["title"],
        "source": i["host_venue"]["display_name"],
        "year": i["publication_year"],
        "authors": authors,
        "volume": i["biblio"]["volume"],
        "issue": i["biblio"]["issue"],
        "page_start": i["biblio"]["first_page"],
        "page_end": i["biblio"]["last_page"]}
res = s.insert_work(_id=str(i["_id"]), work=work)

Example: performing a search

res = s.search_work(title=i["title"], source = i["host_venue"]["display_name"], year = i["publication_year"],
                    authors = authors, volume = i["biblio"]["volume"], issue = i["biblio"]["issue"], 
                    page_start = i["biblio"]["first_page"], page_end = i["biblio"]["last_page"])

NOTES:

  • The search is performed using the same fields as the insert_work method.
  • The title field can be an array when inserting documents, but it will be used as a string when searching documents.
  • The authors field must be an array when inserting or searching documents.
  • Extra fields can be added to the insert methods, but they will not be used for the search. Only the fields (title, source, year, authors, volume, issue, page_start, page_end) will be used for the search.
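
The notes above can be applied on the caller side before searching. The sketch below is not part of mohan, and `to_search_kwargs` is a hypothetical helper name. It keeps only the eight search fields, turns an array title into a string, and makes sure authors is a list. Taking the first element of an array title is an assumption; the package may treat such titles differently internally.

```python
SEARCH_FIELDS = ("title", "source", "year", "authors",
                 "volume", "issue", "page_start", "page_end")


def to_search_kwargs(work):
    """Reduce a stored work dict to the keyword arguments used for searching."""
    # Extra fields are dropped; missing search fields become None.
    kwargs = {field: work.get(field) for field in SEARCH_FIELDS}

    # The title may be stored as an array but is searched as a string.
    title = kwargs["title"]
    if isinstance(title, (list, tuple)):
        kwargs["title"] = title[0] if title else ""  # assumption: use the first title

    # Authors must always be an array.
    authors = kwargs["authors"]
    if authors is None:
        kwargs["authors"] = []
    elif isinstance(authors, str):
        kwargs["authors"] = [authors]
    else:
        kwargs["authors"] = list(authors)
    return kwargs


# Possible usage with the Similarity instance from the example above:
# res = s.search_work(**to_search_kwargs(work))
```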

License

BSD-3-Clause License

Links

http://colav.udea.edu.co/
