vectorhub-nightly

One-liner to encode data into vectors with state-of-the-art models using TensorFlow, PyTorch and other open-source libraries: Word2Vec, Image2Vec, BERT, etc.

Vector Hub is a library for the publication, discovery, and consumption of state-of-the-art models that turn data into vectors (text2vec, image2vec, video2vec, graph2vec, BERT, Inception, etc.).

There are many ways to extract vectors from data. This library aims to bring state-of-the-art models together behind a simple interface so that you can vectorise your data easily.

Vector Hub provides:

A low barrier to entry for practitioners (using common methods)
Vectorisation of rich and complex data types such as text, image, and audio in 3 lines of code
Retrieval of information about a model
An easy way to handle dependencies for different models

Quickstart:

New to Vectors

Full list of models

Google Colab Quickstart

Documentation

Why Vector Hub?

There are thousands of _____2Vec models across different use cases/domains. We wanted to create a hub that allowed people to aggregate their work and share it with the community.

Think Transformers for NLP, scikit-learn for data scientists.

Installation:

To get started quickly, install vectorhub:

pip install vectorhub

Alternatively, if you need more up-to-date models and features and can accept some instability, install the nightly version of VectorHub:

pip install vectorhub-nightly

After this, our built-in dependency manager will tell you what to install when you instantiate a model. The main types of installation options can be found here: https://hub.getvectorai.com/
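
To make the idea of a dependency check concrete, here is a minimal sketch of how a library could detect missing optional packages before a model is instantiated. It is not VectorHub's actual implementation: the `REQUIRED_MODULES` mapping, the `stdlib-demo` entry, and both function names are hypothetical and exist only for illustration.

```python
import importlib.util

# Hypothetical mapping from an install option to the modules it needs.
# These entries are illustrative and are not taken from VectorHub itself.
REQUIRED_MODULES = {
    "text-encoder-transformers": ["torch", "transformers"],
    "stdlib-demo": ["json", "math"],
}


def missing_modules(option):
    """Return the modules required by `option` that cannot be imported."""
    return [
        name
        for name in REQUIRED_MODULES[option]
        if importlib.util.find_spec(name) is None
    ]


def install_hint(option):
    """Build a human-readable hint, or return None if nothing is missing."""
    missing = missing_modules(option)
    if not missing:
        return None
    names = ", ".join(missing)
    return f"Missing {names}; try: pip install 'vectorhub[{option}]'"


print(install_hint("stdlib-demo"))  # nothing is missing, so this prints None
print(install_hint("text-encoder-transformers"))
```

The check only looks up whether a module can be found; it does not import it, so it stays cheap even for heavy frameworks.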

To install different types of models:

# To install transformer requirements (the quotes stop shells such as zsh from expanding the brackets)
pip install "vectorhub[text-encoder-transformers]"

To install all models at once:

pip install "vectorhub[all]"

We recommend activating a new virtual environment and then installing using the following:

python3 -m pip install virtualenv
python3 -m virtualenv env
source env/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install "vectorhub[all]"

Instantiate our AutoEncoder class as follows and use any of the models:

from vectorhub.auto_encoder import AutoEncoder
encoder = AutoEncoder.from_model('text/bert')
encoder.encode("Hello vectorhub!")
# [0.47, 0.83, 0.148, ...]

You can choose from our list of models:

['text/albert', 'text/bert', 'text/labse', 'text/use', 'text/use-multi', 'text/use-lite', 'text/legal-bert', 'audio/fairseq', 'audio/speech-embedding', 'audio/trill', 'audio/trill-distilled', 'audio/vggish', 'audio/yamnet', 'audio/wav2vec', 'image/bit', 'image/bit-medium', 'image/inception', 'image/inception-v2', 'image/inception-v3', 'image/inception-resnet', 'image/mobilenet', 'image/mobilenet-v2', 'image/resnet', 'image/resnet-v2', 'text_text/use-multi-qa', 'text_text/use-qa', 'text_text/dpr', 'text_text/lareqa-qa']
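
The identifiers above appear to follow a `modality/model` pattern. As a small sketch, assuming that pattern holds, the list can be grouped by modality to see what is available for each data type. The helper name `group_by_modality` is hypothetical, and only a subset of the list is copied here.

```python
from collections import defaultdict

# A few identifiers copied from the list above; the full list works the same way.
MODEL_NAMES = [
    "text/albert", "text/bert", "audio/vggish", "audio/yamnet",
    "image/bit", "image/resnet", "text_text/use-qa", "text_text/dpr",
]


def group_by_modality(names):
    """Split each 'modality/model' identifier and group models by modality."""
    groups = defaultdict(list)
    for name in names:
        modality, model = name.split("/", 1)
        groups[modality].append(model)
    return dict(groups)


for modality, models in group_by_modality(MODEL_NAMES).items():
    print(f"{modality}: {', '.join(models)}")
```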

Leverage Google TensorFlow Hub's powerful models to create vectors

Vectorise your image in 3 lines of code using Google's Big Transfer (BiT) model:

from vectorhub.encoders.image.tfhub import BitSmall2Vec
image_encoder = BitSmall2Vec()
image_encoder.encode('https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_92x30dp.png')
# [0.47, 0.83, 0.148, ...]

Vectorise your text in 3 lines of code using Google's BERT model:

from vectorhub.encoders.text.tfhub import Bert2Vec
text_encoder = Bert2Vec()
text_encoder.encode('This is sparta!')
# [0.47, 0.83, 0.148, ...]

Vectorise your question and answer in 3 lines of code using Google's USE QA model:

from vectorhub.bi_encoders.text.tfhub import UseQA2Vec
text_encoder = UseQA2Vec()
text_encoder.encode_question('Who is sparta!')
# [0.47, 0.83, 0.148, ...]
text_encoder.encode_answer('Sparta!')
# [0.47, 0.83, 0.148, ...]

Leverage Hugging Face Transformers' ALBERT

from vectorhub.encoders.text import Transformer2Vec
text_encoder = Transformer2Vec('albert-base-v2')
text_encoder.encode('This is sparta!')
# [0.47, 0.83, 0.148, ...]

Leverage Facebook's Dense Passage Retrieval

from vectorhub.bi_encoders.text_text.torch_transformers import DPR2Vec
text_encoder = DPR2Vec()
text_encoder.encode_question('Who is sparta!')
# [0.47, 0.83, 0.148, ...]
text_encoder.encode_answer('Sparta!')
# [0.47, 0.83, 0.148, ...]

Easily access information about your model!

# If you want additional information about the model, you can access the attributes below:
text_encoder.definition.repo
text_encoder.definition.description
# If you want all the information in a dictionary, you can call:
text_encoder.definition.create_dict() # returns a dictionary with model id, description, paper, etc.

Upload vectors alongside documents easily with Vector AI

from vectorhub.encoders.text import Transformer2Vec
encoder = Transformer2Vec('bert-base-uncased')

from vectorai import ViClient
# username and api_key are your Vector AI credentials; define them before this call
vi_client = ViClient(username, api_key)
docs = vi_client.create_sample_documents(10)
vi_client.insert_documents('collection_name_here', docs, models={'color': encoder.encode})

# Now we can search through our collection
vi_client.search('collection_name_here', field='color_vector_', vector=encoder.encode('purple'))

What are Vectors?

Common terminology when working with vectors:

Vectors (aka embeddings, encodings, neural representations) ~ A vector is a list of numbers that represents a piece of data. E.g. the vector for the word "king" using a Word2Vec model is [0.47, 0.83, 0.148, ...]
____2Vec (aka models, encoders, embedders) ~ Turns data into vectors, e.g. Word2Vec turns words into vectors
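
To illustrate the two terms, here is a deliberately simple toy encoder. `LetterCount2Vec` is hypothetical and not part of VectorHub; it only mirrors the `encode` method used in the examples above and turns a string into a vector of letter frequencies, whereas real models learn their representations from data.

```python
import string


class LetterCount2Vec:
    """Toy encoder (hypothetical, not part of VectorHub): text -> 26 letter frequencies."""

    def encode(self, text):
        letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
        total = len(letters) or 1  # avoid dividing by zero for empty input
        return [letters.count(ch) / total for ch in string.ascii_lowercase]


encoder = LetterCount2Vec()
vector = encoder.encode("king")
print(len(vector))  # one number per letter of the alphabet
print(vector[:12])
```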

How can I use vectors?

Vectors have a broad range of applications. The most common use cases are semantic vector search and the analysis of topics/clusters using vector analytics.

If you are interested in these applications, take a look at Vector AI.
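
As a rough sketch of what semantic vector search does, the example below ranks a handful of stored vectors by cosine similarity to a query vector. The three-dimensional vectors are made up for illustration, and the `search` helper is hypothetical; a vector database would typically use approximate nearest-neighbour indexes rather than this brute-force scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vector, documents, top_k=2):
    """Rank (name, vector) pairs by cosine similarity to the query, best first."""
    scored = [(cosine_similarity(query_vector, vec), name) for name, vec in documents]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


# Made-up 3-dimensional vectors; real encoders return hundreds of dimensions.
DOCUMENTS = [
    ("purple", [0.9, 0.1, 0.8]),
    ("violet", [0.8, 0.2, 0.9]),
    ("green", [0.1, 0.9, 0.1]),
]

print(search([0.9, 0.1, 0.8], DOCUMENTS))  # ['purple', 'violet']
```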

How can I obtain vectors?

Taking the outputs of layers from deep learning models
Data cleaning, such as one-hot encoding labels
Converting graph representations to vectors
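
Of the options above, one-hot encoding is the simplest to show in full. The sketch below uses only the standard library; the function name `one_hot_encode` is an illustrative choice, not a VectorHub API.

```python
def one_hot_encode(labels):
    """Map each label to a vector with a single 1.0 at the index of its category."""
    categories = sorted(set(labels))
    index = {category: i for i, category in enumerate(categories)}
    vectors = []
    for label in labels:
        vector = [0.0] * len(categories)
        vector[index[label]] = 1.0
        vectors.append(vector)
    return categories, vectors


categories, vectors = one_hot_encode(["cat", "dog", "cat", "bird"])
print(categories)  # ['bird', 'cat', 'dog']
print(vectors[0])  # [0.0, 1.0, 0.0]
```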

How To Upload Your 2Vec Model

Read here if you would like to contribute your model!

Philosophy

The goal of VectorHub is to provide a flexible yet comprehensive framework that lets people easily turn their data into vectors, whatever form the data takes. While our focus is largely on simplicity, customisation should always be an option, and the level of abstraction is always up to the model uploader as long as the reason is justified. For example, with text, we chose to keep the encoding at the text level as opposed to the token level, because selection of text should not be applied at the token level; this way practitioners are aware of what texts go into the actual vectors (i.e. instead of ignoring '[next][SEP][wo][##rd]', we choose to ignore 'next word' explicitly). We think this will allow practitioners to focus better on what should matter when it comes to encoding.

Similarly, when we turn data into vectors, we convert them to native Python objects. The aim is to remove as many dependencies as possible once the vectors are created, specifically those on deep learning frameworks such as TensorFlow/PyTorch. This allows other frameworks to be built on top of VectorHub.
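
One practical consequence of returning native Python objects is that vectors can be stored and reloaded with nothing but the standard library. The sketch below uses a hard-coded list as a stand-in for an encoder's output; the field name `text_vector_` is only an illustrative choice.

```python
import json

# Stand-in for the output of an encoder: a plain list of Python floats.
vector = [0.47, 0.83, 0.148]

# No TensorFlow or PyTorch is needed to store or reload the vector.
payload = json.dumps({"text": "Hello vectorhub!", "text_vector_": vector})
restored = json.loads(payload)

print(type(restored["text_vector_"]).__name__)  # list
print(restored["text_vector_"] == vector)       # True
```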

Credit:

This library wouldn't exist if it weren't for the following libraries and the incredible machine learning community that releases their state-of-the-art models:

https://github.com/huggingface/transformers
https://github.com/tensorflow/hub
https://github.com/pytorch/pytorch
Word2Vec image - Alammar, Jay (2018). The Illustrated Transformer [Blog post]. Retrieved from https://jalammar.github.io/illustrated-transformer/
https://github.com/UKPLab/sentence-transformers

Download files

Source Distribution

vectorhub-nightly-1.0.8.2020.12.29.tar.gz (55.4 kB)

Uploaded Dec 29, 2020 Source

Built Distribution

vectorhub_nightly-1.0.8.2020.12.29-py3-none-any.whl (99.1 kB)

Uploaded Dec 29, 2020 Python 3

File details

Details for the file vectorhub-nightly-1.0.8.2020.12.29.tar.gz.

File metadata

Download URL: vectorhub-nightly-1.0.8.2020.12.29.tar.gz
Upload date: Dec 29, 2020
Size: 55.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.0 CPython/3.9.1

File hashes

Hashes for vectorhub-nightly-1.0.8.2020.12.29.tar.gz
Algorithm	Hash digest
SHA256	`453901ac1faefc2463bd24a558a2077bf4dd93247e8a7bcea8787ad762112789`
MD5	`0aedc9a31f6c9f1de2c92cb0c9b642eb`
BLAKE2b-256	`b5f4889b4842ce76d06da6dc3b286ff2dd5a27bf727e946f67a7ab34ba73209c`

File details

Details for the file vectorhub_nightly-1.0.8.2020.12.29-py3-none-any.whl.

File metadata

Download URL: vectorhub_nightly-1.0.8.2020.12.29-py3-none-any.whl
Upload date: Dec 29, 2020
Size: 99.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.0 CPython/3.9.1

File hashes

Hashes for vectorhub_nightly-1.0.8.2020.12.29-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d579fc2c595be5aaba2b393dfa249b83c2f69c45b2ac90c9e5655a70810eb9f5`
MD5	`694f92cd5df77f1e56635d3f0294dde2`
BLAKE2b-256	`0af7bd05d54a0342d8ae7e44e54cc38a70318083a56aa19a67942f48344b6b86`
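
The published digests can be used to check a downloaded file before installing it. Below is a minimal sketch using Python's `hashlib`; the helper names are hypothetical, and the example call assumes the wheel has already been downloaded to the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True when the file's digest equals the published one (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Published SHA256 for the wheel, copied from the table above.
EXPECTED_WHEEL_SHA256 = "d579fc2c595be5aaba2b393dfa249b83c2f69c45b2ac90c9e5655a70810eb9f5"

# Example call, assuming the wheel sits in the current directory:
# matches_published_hash("vectorhub_nightly-1.0.8.2020.12.29-py3-none-any.whl", EXPECTED_WHEEL_SHA256)
```

Reading in chunks keeps memory use constant regardless of file size, which matters little for a 99.1 kB wheel but makes the helper reusable for larger model files.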