A portable document embedding using SWEM.
SWEM
Implementation of SWEM (Simple Word-Embedding-based Models), as described in:
Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms (ACL 2018)
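As a rough illustration of the pooling mechanisms the paper describes, the sketch below reduces a matrix of word vectors to one document vector with plain NumPy. It is not this package's implementation: the function name `swem_pool`, the method string `"aver"`, and the average-then-max order used for `"concat"` are assumptions made for the example.

```python
import numpy as np


def swem_pool(word_vectors: np.ndarray, method: str = "concat") -> np.ndarray:
    """Pool a (num_tokens, dim) matrix of word vectors into one document vector.

    Hypothetical helper for illustration; not part of the swem package.
    """
    if word_vectors.ndim != 2 or word_vectors.shape[0] == 0:
        raise ValueError("expected a non-empty (num_tokens, dim) array")
    if method == "aver":
        # Average pooling: element-wise mean over all tokens.
        return word_vectors.mean(axis=0)
    if method == "max":
        # Max pooling: element-wise maximum over all tokens.
        return word_vectors.max(axis=0)
    if method == "concat":
        # Concatenation of both pooled vectors, so the output has 2 * dim entries.
        # The order (average first, then max) is an assumption.
        return np.concatenate(
            [word_vectors.mean(axis=0), word_vectors.max(axis=0)]
        )
    raise ValueError(f"unknown method: {method}")


# Three tokens with two-dimensional embeddings.
vectors = np.array([[1.0, -2.0], [3.0, 0.0], [-1.0, 4.0]])
print(swem_pool(vectors, "aver"))
print(swem_pool(vectors, "max"))
print(swem_pool(vectors, "concat").shape)  # (4,)
```

Under this sketch, max pooling keeps the output dimension equal to the embedding dimension, while concatenation doubles it.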
Installation
pip install swem
Example
Examples are available in the examples directory.
Functional API
from typing import List
import numpy as np
import swem
from gensim.models import KeyedVectors
if __name__ == '__main__':
    kv: KeyedVectors = KeyedVectors(vector_size=200)
    tokens: List[str] = ['I', 'have', 'a', 'pen']
    embed: np.ndarray = swem.infer_vector(
        tokens=tokens, kv=kv, method='concat'
    )
    print(embed.shape)
Japanese
from typing import List
import swem
from gensim.models import KeyedVectors
if __name__ == '__main__':
    kv: KeyedVectors = KeyedVectors(vector_size=200)
    swem_embed = swem.SWEM(kv)
    tokens: List[str] = ['すもも', 'も', 'もも', 'も', 'もも', 'の', 'うち']
    embed = swem_embed.infer_vector(tokens, method='max')
    print(embed.shape)
Results
(200,)
English
from typing import List
import swem
from gensim.models import KeyedVectors
if __name__ == '__main__':
    kv: KeyedVectors = KeyedVectors(vector_size=200)
    swem_embed = swem.SWEM(kv)
    tokens: List[str] = ['This', 'is', 'an', 'implementation', 'of', 'SWEM']
    embed = swem_embed.infer_vector(tokens, method='max')
    print(embed.shape)
Results
(200,)
Set random seed
SWEM generates a random vector when a given token is out of vocabulary. To reproduce the embeddings of such tokens, set NumPy's random seed before inferring vectors.
from typing import List
import numpy as np
import swem
from gensim.models import KeyedVectors
if __name__ == '__main__':
    np.random.seed(0)
    kv: KeyedVectors = KeyedVectors(vector_size=200)
    tokens: List[str] = ['I', 'have', 'a', 'pen']
    embed: np.ndarray = swem.infer_vector(
        tokens=tokens, kv=kv, method='concat'
    )
    print(embed.shape)
Download a pretrained w2v model and use it.
import swem
swem.download_w2v(lang='ja')
kv = swem.load_w2v(lang='ja')
Downloading w2v file to /Users/<username>/.swem/ja.zip
Extract zipfile into /Users/<username>/.swem/ja
Success to extract files.
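The log above suggests that extracted files end up in a `.swem/<lang>` directory under the user's home directory. Assuming that layout holds, a small check like the following could tell whether a language has already been downloaded. The helper names `w2v_cache_dir` and `is_cached` are hypothetical and not part of the package.

```python
from pathlib import Path
from typing import Optional


def w2v_cache_dir(lang: str, root: Optional[Path] = None) -> Path:
    """Return the directory where vectors for `lang` are assumed to be extracted.

    Assumption: the layout is <home>/.swem/<lang>, as inferred from the log output.
    """
    if root is None:
        root = Path.home() / ".swem"
    return root / lang


def is_cached(lang: str, root: Optional[Path] = None) -> bool:
    """Report whether the assumed cache directory for `lang` already exists."""
    return w2v_cache_dir(lang, root).is_dir()


# Demonstration with an explicit root so the example does not depend on HOME.
print(w2v_cache_dir("ja", Path("/tmp/swem-demo")))
```

A caller might use `is_cached('ja')` to decide whether calling `swem.download_w2v(lang='ja')` again is necessary before `swem.load_w2v(lang='ja')`.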