A portable document embedding using SWEM.
SWEM
An implementation of SWEM (Simple Word-Embedding-based Models), as described in:
Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms (ACL 2018)
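As a rough illustration of the pooling mechanisms the paper's title refers to, the sketch below pools a matrix of word vectors into a single document vector by averaging, by element-wise max, or by concatenating both. It is a minimal NumPy sketch and not this package's code: the function `swem_pool` and the method name `'avg'` are hypothetical, and the package's own `'concat'` may be implemented differently.

```python
import numpy as np


def swem_pool(vectors: np.ndarray, method: str = "concat") -> np.ndarray:
    """Pool a (num_tokens, dim) matrix of word vectors into one vector.

    Hypothetical helper for illustration only; not part of the swem package.
    """
    if vectors.ndim != 2 or vectors.shape[0] == 0:
        raise ValueError("expected a non-empty (num_tokens, dim) array")
    if method == "avg":
        # Average pooling: mean of each dimension over all tokens.
        return vectors.mean(axis=0)
    if method == "max":
        # Max pooling: largest value of each dimension over all tokens.
        return vectors.max(axis=0)
    if method == "concat":
        # Concatenate the average-pooled and max-pooled vectors (length 2 * dim).
        return np.concatenate([vectors.mean(axis=0), vectors.max(axis=0)])
    raise ValueError(f"unknown method: {method}")


# Two toy 3-dimensional word vectors standing in for real embeddings.
word_vectors = np.array([[1.0, -2.0, 0.5],
                         [3.0, 0.0, -0.5]])

print(swem_pool(word_vectors, "avg"))
print(swem_pool(word_vectors, "max"))
print(swem_pool(word_vectors, "concat").shape)  # (6,)
```

Under these assumptions, average and max pooling keep the embedding dimension, while concatenation doubles it.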
Installation
pip install swem
Example
Examples are available in the examples directory.
Functional API
from typing import List

import numpy as np
import swem
from gensim.models import KeyedVectors

if __name__ == '__main__':
    # Empty KeyedVectors with 200-dimensional vectors, used here as a placeholder.
    kv: KeyedVectors = KeyedVectors(vector_size=200)
    tokens: List[str] = ['I', 'have', 'a', 'pen']
    # Pool the token vectors into a single document embedding.
    embed: np.ndarray = swem.infer_vector(
        tokens=tokens, kv=kv, method='concat'
    )
    print(embed.shape)
Japanese
from typing import List

import swem
from gensim.models import KeyedVectors

if __name__ == '__main__':
    kv: KeyedVectors = KeyedVectors(vector_size=200)
    swem_embed = swem.SWEM(kv)
    # Pre-tokenized Japanese sentence.
    tokens: List[str] = ['すもも', 'も', 'もも', 'も', 'もも', 'の', 'うち']
    embed = swem_embed.infer_vector(tokens, method='max')
    print(embed.shape)
Results
(200,)
English
from typing import List

import swem
from gensim.models import KeyedVectors

if __name__ == '__main__':
    kv: KeyedVectors = KeyedVectors(vector_size=200)
    swem_embed = swem.SWEM(kv)
    # Pre-tokenized English sentence.
    tokens: List[str] = ['This', 'is', 'an', 'implementation', 'of', 'SWEM']
    embed = swem_embed.infer_vector(tokens, method='max')
    print(embed.shape)
Results
(200,)
Set random seed
SWEM generates a random vector when a given token is out of vocabulary. To reproduce the embeddings of such tokens, set NumPy's random seed before inferring vectors.
from typing import List

import numpy as np
import swem
from gensim.models import KeyedVectors

if __name__ == '__main__':
    # Fix NumPy's seed so out-of-vocabulary tokens get the same random vectors on every run.
    np.random.seed(0)
    kv: KeyedVectors = KeyedVectors(vector_size=200)
    tokens: List[str] = ['I', 'have', 'a', 'pen']
    embed: np.ndarray = swem.infer_vector(
        tokens=tokens, kv=kv, method='concat'
    )
    print(embed.shape)
Download pretrained w2v and use it
import swem
swem.download_w2v(lang='ja')
kv = swem.load_w2v(lang='ja')
Downloading w2v file to /Users/<username>/.swem/ja.zip
Extract zipfile into /Users/<username>/.swem/ja
Success to extract files.