Finalfusion in Python
finalfusion-python
Introduction
finalfusion is a Python package for reading, writing and using finalfusion embeddings. It also supports other commonly used embedding formats such as fastText, GloVe and word2vec.
The Python package supports the same types of embeddings as the finalfusion-rust crate:
- Vocabulary:
  - No subwords
  - Subwords
- Embedding matrix:
  - Array
  - Memory-mapped
  - Quantized
- Norms
- Metadata
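The Norms chunk in the list above pairs with the embedding matrix. This sketch assumes a common design: every embedding is stored at unit length, so similarity queries reduce to dot products, and each vector's original l2 norm is kept in the norms chunk. The helper names are hypothetical and plain Python lists stand in for the storage. This is not the package's own code.

```python
import math
from typing import List, Sequence, Tuple


def split_norms(vectors: Sequence[Sequence[float]]) -> Tuple[List[List[float]], List[float]]:
    """Return (unit-length vectors, original l2 norms)."""
    unit_vectors: List[List[float]] = []
    norms: List[float] = []
    for vec in vectors:
        norm = math.sqrt(sum(x * x for x in vec))
        norms.append(norm)
        # All-zero vectors are kept as-is to avoid dividing by zero.
        unit_vectors.append([x / norm for x in vec] if norm > 0 else list(vec))
    return unit_vectors, norms


def restore(unit_vector: Sequence[float], norm: float) -> List[float]:
    """Recover the original vector by scaling the unit vector with its stored norm."""
    return [x * norm for x in unit_vector]


storage, norms = split_norms([[3.0, 4.0], [0.0, 2.0], [0.0, 0.0]])
print(storage)
print(norms)
print(restore(storage[0], norms[0]))
```

Keeping the norms separate means the matrix can serve cosine-similarity queries directly while the unnormalized vectors remain recoverable.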
Installation
The finalfusion module is available on PyPI for Linux, Mac and Windows. You can use pip to install the module:
$ pip install --upgrade finalfusion
Installing from source
Building from source depends on Cython. If you install the package using pip, you don't need to install this dependency explicitly, since it is specified in pyproject.toml.
$ git clone https://github.com/finalfusion/finalfusion-python
$ cd finalfusion-python
$ pip install .
If you want to build wheels from source, wheel needs to be installed. Wheels can then be built with:
$ python setup.py bdist_wheel
The wheels can be found in dist.
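Wheel file names encode which Python version, ABI and platform a wheel was built for. This helps when picking a file from dist. Below is a minimal sketch that splits a name according to the wheel naming convention of PEP 427: `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`. `parse_wheel_name` is a hypothetical helper, not part of finalfusion.

```python
from pathlib import Path
from typing import Dict


def parse_wheel_name(filename: str) -> Dict[str, str]:
    """Split a wheel file name into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    # Hyphens inside the name/version are escaped to underscores, so '-' separates fields.
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        name, version, build, python_tag, abi_tag, platform_tag = parts
    elif len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = ""
    else:
        raise ValueError(f"unexpected wheel name: {filename}")
    return {"name": name, "version": version, "build": build,
            "python": python_tag, "abi": abi_tag, "platform": platform_tag}


# List the wheels produced by `python setup.py bdist_wheel`, if any exist.
dist = Path("dist")
if dist.is_dir():
    for wheel in sorted(dist.glob("*.whl")):
        print(parse_wheel_name(wheel.name))
```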
Package Usage
Basic usage
import finalfusion
# loading from different formats
w2v_embeds = finalfusion.load_word2vec("/path/to/w2v.bin")
text_embeds = finalfusion.load_text("/path/to/embeds.txt")
text_dims_embeds = finalfusion.load_text_dims("/path/to/embeds.dims.txt")
fasttext_embeds = finalfusion.load_fasttext("/path/to/fasttext.bin")
fifu_embeds = finalfusion.load_finalfusion("/path/to/embeddings.fifu")
# serialization to formats works similarly
finalfusion.compat.write_word2vec("to_word2vec.bin", fifu_embeds)
# embedding lookup
embedding = fifu_embeds["Test"]
# reading an embedding into a buffer
import numpy as np
buffer = np.zeros(fifu_embeds.storage.shape[1], dtype=np.float32)
fifu_embeds.embedding("Test", out=buffer)
# similarity and analogy query
sim_query = fifu_embeds.word_similarity("Test")
analogy_query = fifu_embeds.analogy("A", "B", "C")
# accessing the vocab and printing the first 10 words
vocab = fifu_embeds.vocab
print(vocab.words[:10])
# SubwordVocabs give access to the subword indexer:
subword_indexer = vocab.subword_indexer
print(subword_indexer.subword_indices("Test", with_ngrams=True))
# accessing the storage and calculating its dot product with an embedding
# (one score per storage row)
res = fifu_embeds.storage.dot(embedding)
# printing metadata
print(fifu_embeds.metadata)
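For subword vocabularies, the subword indexer used above is what allows lookups of words that are not in the vocabulary. The word is split into character n-grams, and the embeddings of those n-grams are combined. The following is a minimal, self-contained sketch of the fastText-style scheme. It assumes the word is wrapped in `<` and `>`, that n-grams of length 3 to 6 are used, and that the n-gram vectors are averaged. The n-gram table is a hypothetical toy dictionary, and finalfusion's indexers and combination step may differ in detail.

```python
from typing import Dict, List, Optional, Sequence


def char_ngrams(word: str, min_n: int = 3, max_n: int = 6) -> List[str]:
    """Return all character n-grams of the bracketed word, shortest first."""
    bracketed = f"<{word}>"
    ngrams: List[str] = []
    for n in range(min_n, max_n + 1):
        for start in range(len(bracketed) - n + 1):
            ngrams.append(bracketed[start:start + n])
    return ngrams


def oov_embedding(word: str, ngram_table: Dict[str, Sequence[float]]) -> Optional[List[float]]:
    """Average the vectors of all known n-grams of `word` (None if none are known)."""
    known = [ngram_table[g] for g in char_ngrams(word) if g in ngram_table]
    if not known:
        return None
    dims = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dims)]


# Toy n-gram table (hypothetical values).
table = {"<te": [1.0, 0.0], "est": [0.0, 1.0], "st>": [1.0, 1.0]}
print(char_ngrams("Test", 3, 3))
print(oov_embedding("test", table))
```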
Beyond Embeddings
# load only a vocab from a finalfusion file
from finalfusion import load_vocab
vocab = load_vocab("/path/to/finalfusion_file.fifu")
# serialize vocab to single file
vocab.write("/path/to/vocab_file.fifu.voc")
# more specific loading functions exist
from finalfusion.vocab import load_finalfusion_bucket_vocab
fifu_bucket_vocab = load_finalfusion_bucket_vocab("/path/to/vocab_file.fifu.voc")
The package supports loading and writing all finalfusion chunks this way. Writing individual chunks to standalone files is only supported by the Python package; reading such files will fail with other implementations, e.g. finalfusion-rust.
Scripts
finalfusion also includes a conversion script, ffp-convert, to convert between the supported formats.
# convert from fastText format to finalfusion
$ ffp-convert -f fasttext fasttext.bin -t finalfusion embeddings.fifu
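At the format level, a conversion like the one above reads embeddings in one layout and writes them in another. Below is a minimal sketch for the two plain-text layouts read by `load_text` and `load_text_dims`. It assumes *text* means one `word v1 ... vN` line per word, and *text with dims* adds a leading `rows dims` header line. The function names are hypothetical. Binary formats (word2vec, fastText, finalfusion) are out of scope.

```python
import io
from typing import List, TextIO, Tuple


def read_text_dims(stream: TextIO) -> Tuple[List[str], List[List[float]]]:
    """Read a `rows dims` header line, then one `word v1 ... vN` line per word."""
    rows, dims = (int(field) for field in stream.readline().split())
    words: List[str] = []
    vectors: List[List[float]] = []
    for _ in range(rows):
        line = stream.readline()
        if not line:
            raise ValueError(f"expected {rows} rows, got {len(words)}")
        word, *values = line.split()
        if len(values) != dims:
            raise ValueError(f"{word!r}: expected {dims} values, got {len(values)}")
        words.append(word)
        vectors.append([float(v) for v in values])
    return words, vectors


def write_text(stream: TextIO, words: List[str], vectors: List[List[float]]) -> None:
    """Write embeddings in the header-less `word v1 ... vN` layout."""
    for word, vector in zip(words, vectors):
        stream.write(" ".join([word, *(str(v) for v in vector)]) + "\n")


# Convert an in-memory example; real files would be opened with encoding="utf-8".
source = io.StringIO("2 3\nTübingen 0.1 0.2 0.3\nStuttgart 0.4 0.5 0.6\n")
words, vectors = read_text_dims(source)
target = io.StringIO()
write_text(target, words, vectors)
print(target.getvalue())
```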
ffp-bucket-to-explicit can be used to convert bucket embeddings to embeddings with an explicit ngram lookup.
# convert finalfusion bucket embeddings to explicit
$ ffp-bucket-to-explicit -f finalfusion embeddings.fifu explicit.fifu
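Conceptually, the bucket-to-explicit conversion replaces a hash-based n-gram lookup with a table that lists every n-gram occurring in the vocabulary. Below is a minimal sketch of that idea. It makes three assumptions. First, `bucket_of` uses CRC-32 from `zlib` as a stand-in hash; this is not the hash function finalfusion uses. Second, n-grams come from bracketed words with lengths 3 to 6. Third, n-grams that fall into the same bucket share one row of the new matrix. All names are hypothetical.

```python
import zlib
from typing import Callable, Dict, List, Sequence, Tuple


def char_ngrams(word: str, min_n: int = 3, max_n: int = 6) -> List[str]:
    """All character n-grams of `<word>` with lengths min_n..max_n."""
    bracketed = f"<{word}>"
    return [bracketed[i:i + n]
            for n in range(min_n, max_n + 1)
            for i in range(len(bracketed) - n + 1)]


def bucket_of(ngram: str, buckets_exp: int = 4) -> int:
    """Stand-in hash: CRC-32 of the UTF-8 bytes, masked to 2**buckets_exp buckets."""
    return zlib.crc32(ngram.encode("utf-8")) & ((1 << buckets_exp) - 1)


def bucket_to_explicit(
    words: Sequence[str],
    bucket_rows: Sequence[Sequence[float]],
    bucket_fn: Callable[[str], int] = bucket_of,
) -> Tuple[Dict[str, int], List[List[float]]]:
    """Map every n-gram occurring in `words` to a row of a new, smaller matrix."""
    ngram_index: Dict[str, int] = {}
    bucket_to_row: Dict[int, int] = {}
    explicit_rows: List[List[float]] = []
    for word in words:
        for ngram in char_ngrams(word):
            if ngram in ngram_index:
                continue
            bucket = bucket_fn(ngram)
            if bucket not in bucket_to_row:
                # Copy each used bucket row once; unused buckets are dropped.
                bucket_to_row[bucket] = len(explicit_rows)
                explicit_rows.append(list(bucket_rows[bucket]))
            ngram_index[ngram] = bucket_to_row[bucket]
    return ngram_index, explicit_rows


buckets = [[float(b), 0.0] for b in range(16)]  # 2**4 toy bucket rows
index, rows = bucket_to_explicit(["Test", "Tests"], buckets)
print(len(index), len(rows))
```

In this sketch, dropping unused buckets is what can make the explicit matrix smaller than the bucket matrix. The trade-off is that n-grams not seen during conversion no longer have an embedding.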
Finally, the package comes with ffp-similar and ffp-analogy to run similarity and analogy queries.
# get the 5 nearest neighbours of "Tübingen"
$ echo Tübingen | ffp-similar embeddings.fifu
# get the 5 top answers for "Tübingen" is to "Stuttgart" as "Heidelberg" is to ...
$ echo Tübingen Stuttgart Heidelberg | ffp-analogy embeddings.fifu
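Both kinds of query reduce to ranking words by cosine similarity. Below is a minimal sketch with plain Python lists and a toy vocabulary. It assumes unit-length vectors, so the dot product equals cosine similarity. For analogies it assumes the common 3CosAdd formulation: the answer to "a is to b as c is to ?" is the word closest to b - a + c, with the three query words excluded. The vectors and function names are hypothetical. This is not necessarily how `word_similarity`, `analogy`, ffp-similar or ffp-analogy are implemented.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple

Embeds = Dict[str, List[float]]


def unit(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def nearest(query: Sequence[float], embeds: Embeds, k: int,
            skip: Iterable[str] = ()) -> List[Tuple[str, float]]:
    """Rank words by cosine similarity to `query` (vectors in `embeds` are unit length)."""
    q = unit(query)
    skipped = set(skip)
    scores = [(word, sum(a * b for a, b in zip(q, vec)))
              for word, vec in embeds.items() if word not in skipped]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


def similar(word: str, embeds: Embeds, k: int = 5) -> List[Tuple[str, float]]:
    """Nearest neighbours of `word`, excluding the word itself."""
    return nearest(embeds[word], embeds, k, skip=[word])


def analogy(a: str, b: str, c: str, embeds: Embeds, k: int = 5) -> List[Tuple[str, float]]:
    """'a is to b as c is to ?' via the 3CosAdd target b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(embeds[a], embeds[b], embeds[c])]
    return nearest(target, embeds, k, skip=[a, b, c])


# Toy three-dimensional vectors (hypothetical values), normalized to unit length.
raw = {
    "Tübingen": [1.0, 0.1, 0.0],
    "Reutlingen": [0.9, 0.2, 0.0],
    "Stuttgart": [1.0, 0.1, 1.0],
    "Heidelberg": [0.1, 1.0, 0.0],
    "Mannheim": [0.1, 1.0, 1.0],
}
embeds = {word: unit(vec) for word, vec in raw.items()}
print(similar("Tübingen", embeds, k=2))
print(analogy("Tübingen", "Stuttgart", "Heidelberg", embeds, k=1))
```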
Where to go from here
Download files
File details
Details for the file finalfusion-0.7.1.tar.gz.
File metadata
- Download URL: finalfusion-0.7.1.tar.gz
- Size: 227.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.6.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | f6a5f7b123fa573ad0a418261ef30e4516083bd4afc8db66ead322055d281f9f
MD5 | 057d744a35fcfcbd2e020a3938cd86e7
BLAKE2b-256 | c668544cbd6e9b80fa70d7d181d0f615b396b34c6c5a5dc1b3c7d5bd64f4b028
File details
Details for the file finalfusion-0.7.1-cp38-cp38m-win_amd64.whl.
File metadata
- Download URL: finalfusion-0.7.1-cp38-cp38m-win_amd64.whl
- Size: 303.7 kB
- Tags: CPython 3.8m, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.8.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | e6115a8a4e1e17280e781cbabea032233018c38cae9a610784a082b6a143dc67
MD5 | ca1d37ccc7aca022ee8b6b32fd0db45a
BLAKE2b-256 | 1ff5cb2b6608f8a468e64350be52205b34f6b29cf8e7c5a467ceb2d69da6b70c
File details
Details for the file finalfusion-0.7.1-cp38-cp38-manylinux2014_x86_64.whl.
File metadata
- Download URL: finalfusion-0.7.1-cp38-cp38-manylinux2014_x86_64.whl
- Size: 780.2 kB
- Tags: CPython 3.8
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.6.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | bc9adbfa634f7f269ea880df1bf1f6f071c59ba378a66df499c813505bcc324e
MD5 | 1796d066708dbb4dbb6f3d154a2541d7
BLAKE2b-256 | 5d458b04d1a4efcac4611a78fe46f180a9748cbcb07c7c2fa1be24674cc35a14
File details
Details for the file finalfusion-0.7.1-cp38-cp38-macosx_10_14_x86_64.whl.
File metadata
- Download URL: finalfusion-0.7.1-cp38-cp38-macosx_10_14_x86_64.whl
- Size: 298.0 kB
- Tags: CPython 3.8, macOS 10.14+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.8.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 65cae920e09a23633f8921520307dcbffeb3f618fc95198415f607fd45695c99
MD5 | c52e6e4a984c6d2ef5757675aa32611a
BLAKE2b-256 | 308702c8da9d67791b8ca4f21af73c53d3930c10a1d7160339fc5d23925f3e0a
File details
Details for the file finalfusion-0.7.1-cp37-cp37m-win_amd64.whl.
File metadata
- Download URL: finalfusion-0.7.1-cp37-cp37m-win_amd64.whl
- Size: 302.3 kB
- Tags: CPython 3.7m, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.7.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1479200d70e3588d01104d2b6d5986609a92aabcc7df8c66f95a37b36d199cad
MD5 | f1051d17c02fc592e9883411664ea4f9
BLAKE2b-256 | dbc3e316e9a3487e8d94a2f0b69847daf28af3fc25560f3af68e5e4bb1c8be33
File details
Details for the file finalfusion-0.7.1-cp37-cp37m-manylinux2014_x86_64.whl.
File metadata
- Download URL: finalfusion-0.7.1-cp37-cp37m-manylinux2014_x86_64.whl
- Size: 689.1 kB
- Tags: CPython 3.7m
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.6.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | c0224bfd3c40e42a34c9f2c1620d994da2452cc974130f4589414fd60760f164
MD5 | e8f340356034743a85dba54da1b41d9f
BLAKE2b-256 | d8e85a0b6f17bf23d0ff3faee78420089d799219f56b5f2f0e56c41cc4dea307
File details
Details for the file finalfusion-0.7.1-cp37-cp37m-macosx_10_14_x86_64.whl.
File metadata
- Download URL: finalfusion-0.7.1-cp37-cp37m-macosx_10_14_x86_64.whl
- Size: 296.7 kB
- Tags: CPython 3.7m, macOS 10.14+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.7.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | b183d60c4ab02eb8610fb739df9da1579e6c393f9bb81e5116f651634821c502
MD5 | af38b66ac0fbd3bfc89ddcf789600069
BLAKE2b-256 | 6a68a99546cfc4a4489ab314f00cb86b5db2a1d415b81dd6f3d81fa1e8c9830e
File details
Details for the file finalfusion-0.7.1-cp36-cp36m-win_amd64.whl.
File metadata
- Download URL: finalfusion-0.7.1-cp36-cp36m-win_amd64.whl
- Size: 302.5 kB
- Tags: CPython 3.6m, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.6.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | 140c602f518881a4e7a36ebc01a2d027954e8ead57d210285a3c1b22883cdb83
MD5 | 0d285c808747c27c221092d67d74eb09
BLAKE2b-256 | 9067a6a10ca4a7be988568fe0e4e10faa642f01d96e1ffba5210238da31ab462
File details
Details for the file finalfusion-0.7.1-cp36-cp36m-manylinux2014_x86_64.whl.
File metadata
- Download URL: finalfusion-0.7.1-cp36-cp36m-manylinux2014_x86_64.whl
- Size: 688.6 kB
- Tags: CPython 3.6m
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.6.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | dc3aff6798728ae684ef7ee7d3ecf7534ab594ebdb2a4b5a676d7559937d3d64
MD5 | dba298c7f12796cdb0318b2c95723f56
BLAKE2b-256 | 7a2bcef007359f5ade9e8eb67f19159ff42d7008f35c0959344f6c75f8063499
File details
Details for the file finalfusion-0.7.1-cp36-cp36m-macosx_10_14_x86_64.whl.
File metadata
- Download URL: finalfusion-0.7.1-cp36-cp36m-macosx_10_14_x86_64.whl
- Size: 297.6 kB
- Tags: CPython 3.6m, macOS 10.14+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.6.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 28f491d8f1bcd58ed0a077cce84dd04965c37c71529cf3f64838b9a08301e7e8
MD5 | 7c8c5576d7ad8fc72e843f5b9d4b4ff7
BLAKE2b-256 | b9a316017cdc236a709cf1ce344672c002fa6d583befacbace8b69b51182be1b
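The SHA256 digests listed above can be checked with Python's standard `hashlib` module. Below is a minimal sketch. `sha256_of` and `verify` are hypothetical helper names. The file is hashed in chunks so that it never has to be loaded into memory at once.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """Compare a file's digest with a published one (case-insensitive)."""
    return sha256_of(path) == expected_sha256.lower()


# Example: compare the source distribution against its published digest.
sdist = Path("finalfusion-0.7.1.tar.gz")
if sdist.exists():
    print(verify(sdist, "f6a5f7b123fa573ad0a418261ef30e4516083bd4afc8db66ead322055d281f9f"))
```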