Finalfusion in Python

finalfusion-python

Introduction

finalfusion is a Python package for reading, writing and using finalfusion embeddings. It also supports other commonly used embedding formats such as fastText, GloVe and word2vec.

The Python package supports the same types of embeddings as the finalfusion-rust crate:

  • Vocabulary:
    • No subwords
    • Subwords
  • Embedding matrix:
    • Array
    • Memory-mapped
    • Quantized
  • Norms
  • Metadata
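
To make the role of the Norms entry concrete, here is a minimal sketch in plain Python. It assumes that the stored vectors are l2-normalized and that the norms hold the length each vector had before normalization; the function names `normalize` and `restore` are hypothetical and not part of the finalfusion API.

```python
import math


def normalize(rows):
    """L2-normalize each row; return the unit rows and the original norms.

    Assumes no row is the zero vector.
    """
    norms = [math.sqrt(sum(x * x for x in row)) for row in rows]
    unit_rows = [[x / norm for x in row] for row, norm in zip(rows, norms)]
    return unit_rows, norms


def restore(unit_rows, norms):
    """Scale unit rows back to their original length using the norms."""
    return [[x * norm for x in row] for row, norm in zip(unit_rows, norms)]


# Hypothetical toy matrix with two 2-dimensional embeddings.
unit_rows, norms = normalize([[3.0, 4.0], [0.0, 2.0]])
original = restore(unit_rows, norms)
```

Keeping the norms next to the normalized matrix means the original magnitudes stay recoverable, while similarity computations can work directly on unit vectors.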

Installation

The finalfusion module is available on PyPI for Linux, macOS and Windows. You can use pip to install the module:

$ pip install --upgrade finalfusion

Installing from source

Building from source depends on Cython. If you install the package using pip, you don't need to explicitly install the dependency since it is specified in pyproject.toml.

$ git clone https://github.com/finalfusion/finalfusion-python
$ cd finalfusion-python
$ pip install .

If you want to build wheels from source, the wheel package needs to be installed. Wheels can then be built with:

$ python setup.py bdist_wheel

The built wheels are placed in the dist directory.

Package Usage

Basic usage

import finalfusion
# loading from different formats
w2v_embeds = finalfusion.load_word2vec("/path/to/w2v.bin")
text_embeds = finalfusion.load_text("/path/to/embeds.txt")
text_dims_embeds = finalfusion.load_text_dims("/path/to/embeds.dims.txt")
fasttext_embeds = finalfusion.load_fasttext("/path/to/fasttext.bin")
fifu_embeds = finalfusion.load_finalfusion("/path/to/embeddings.fifu")

# serialization to formats works similarly
finalfusion.compat.write_word2vec("to_word2vec.bin", fifu_embeds)

# embedding lookup
embedding = fifu_embeds["Test"]

# reading an embedding into a buffer
import numpy as np
buffer = np.zeros(fifu_embeds.storage.shape[1], dtype=np.float32)
fifu_embeds.embedding("Test", out=buffer)

# similarity and analogy query
sim_query = fifu_embeds.word_similarity("Test")
analogy_query = fifu_embeds.analogy("A", "B", "C")

# accessing the vocab and printing the first 10 words
vocab = fifu_embeds.vocab
print(vocab.words[:10])

# SubwordVocabs give access to the subword indexer:
subword_indexer = vocab.subword_indexer
print(subword_indexer.subword_indices("Test", with_ngrams=True))

# accessing the storage and calculating its dot product with an embedding;
# the storage matrix goes on the left so that the shapes line up
res = fifu_embeds.storage.dot(embedding)

# printing metadata
print(fifu_embeds.metadata)
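
As a rough illustration of what similarity and analogy queries like the ones above compute, the following plain-Python sketch ranks words by cosine similarity and answers an analogy with the vector offset b - a + c. It is not the library's implementation: the helper names and the toy vectors are made up for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(vectors, query, k=5, skip=()):
    """Return up to k words whose vectors are closest to `query`."""
    scored = [(cosine(vec, query), word)
              for word, vec in vectors.items() if word not in skip]
    scored.sort(reverse=True)
    return [word for _, word in scored[:k]]


def analogy(vectors, a, b, c, k=5):
    """Answer "a is to b as c is to ?" using the offset b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # The three query words are excluded from the candidates.
    return most_similar(vectors, target, k=k, skip={a, b, c})


# Hypothetical 2-dimensional toy embeddings, made up for illustration.
toy = {
    "king": [0.9, 0.8],
    "queen": [0.9, 0.2],
    "man": [0.1, 0.8],
    "woman": [0.1, 0.2],
}
answer = analogy(toy, "man", "woman", "king", k=1)
```

With real embeddings the candidate set is the whole vocabulary, so excluding the query words matters: otherwise one of them is often the nearest neighbour of the offset vector.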

Beyond Embeddings

# load only a vocab from a finalfusion file
from finalfusion import load_vocab
vocab = load_vocab("/path/to/finalfusion_file.fifu")

# serialize vocab to single file
vocab.write("/path/to/vocab_file.fifu.voc")

# more specific loading functions exist
from finalfusion.vocab import load_finalfusion_bucket_vocab
fifu_bucket_vocab = load_finalfusion_bucket_vocab("/path/to/vocab_file.fifu.voc")

The package supports loading and writing all finalfusion chunks this way. Such single-chunk files are only supported by the Python package; reading them with other implementations, e.g. finalfusion-rust, will fail.

Scripts

finalfusion also includes a conversion script ffp-convert to convert between the supported formats.

# convert from fastText format to finalfusion
$ ffp-convert -f fasttext fasttext.bin -t finalfusion embeddings.fifu

ffp-bucket-to-explicit can be used to convert bucket embeddings to embeddings with an explicit ngram lookup.

# convert finalfusion bucket embeddings to explicit
$ ffp-bucket-to-explicit -f finalfusion embeddings.fifu explicit.fifu
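
The difference between the two lookup styles can be sketched in plain Python. This is only an illustration under stated assumptions: the n-gram lengths (3 to 6), the `<`/`>` boundary markers and the FNV-1a hash are choices made for this example, and all function names are hypothetical rather than taken from finalfusion.

```python
def ngrams(word, min_n=3, max_n=6):
    """Character n-grams of the word wrapped in boundary markers."""
    bracketed = f"<{word}>"
    return [bracketed[i:i + n]
            for n in range(min_n, max_n + 1)
            for i in range(len(bracketed) - n + 1)]


def fnv1a_32(data):
    """32-bit FNV-1a hash of a bytes object (illustrative hash choice)."""
    h = 0x811C9DC5
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h


def bucket_index(ngram, n_buckets):
    """Bucket lookup: hash the n-gram into a fixed number of buckets."""
    return fnv1a_32(ngram.encode("utf-8")) % n_buckets


def explicit_lookup(words):
    """Explicit lookup: every distinct n-gram gets its own index."""
    table = {}
    for word in words:
        for gram in ngrams(word):
            table.setdefault(gram, len(table))
    return table


grams = ngrams("Test")
bucket_indices = [bucket_index(g, 2 ** 10) for g in grams]
table = explicit_lookup(["Test"])
```

A bucket lookup needs no stored n-gram table, but distinct n-grams can collide in the same bucket. An explicit lookup stores each n-gram and never collides, at the cost of covering only the n-grams it was built from.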

Finally, the package comes with ffp-similar and ffp-analogy for similarity and analogy queries.

# get the 5 nearest neighbours of "Tübingen"
$ echo Tübingen | ffp-similar embeddings.fifu
# get the top 5 answers for "Tübingen" is to "Stuttgart" as "Heidelberg" is to...
$ echo Tübingen Stuttgart Heidelberg | ffp-analogy embeddings.fifu

Download files

Source Distribution

finalfusion-0.7.1.tar.gz (227.8 kB view details)

Uploaded Source

Built Distributions

finalfusion-0.7.1-cp38-cp38m-win_amd64.whl (303.7 kB view details)

Uploaded CPython 3.8m Windows x86-64

finalfusion-0.7.1-cp38-cp38-manylinux2014_x86_64.whl (780.2 kB view details)

Uploaded CPython 3.8

finalfusion-0.7.1-cp38-cp38-macosx_10_14_x86_64.whl (298.0 kB view details)

Uploaded CPython 3.8 macOS 10.14+ x86-64

finalfusion-0.7.1-cp37-cp37m-win_amd64.whl (302.3 kB view details)

Uploaded CPython 3.7m Windows x86-64

finalfusion-0.7.1-cp37-cp37m-manylinux2014_x86_64.whl (689.1 kB view details)

Uploaded CPython 3.7m

finalfusion-0.7.1-cp37-cp37m-macosx_10_14_x86_64.whl (296.7 kB view details)

Uploaded CPython 3.7m macOS 10.14+ x86-64

finalfusion-0.7.1-cp36-cp36m-win_amd64.whl (302.5 kB view details)

Uploaded CPython 3.6m Windows x86-64

finalfusion-0.7.1-cp36-cp36m-manylinux2014_x86_64.whl (688.6 kB view details)

Uploaded CPython 3.6m

finalfusion-0.7.1-cp36-cp36m-macosx_10_14_x86_64.whl (297.6 kB view details)

Uploaded CPython 3.6m macOS 10.14+ x86-64
