
A portable document embedding using SWEM.


SWEM


Implementation of SWEM (Simple Word-Embedding-based Models), as described in
"Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms" (ACL 2018).
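
The pooling operations described in the paper can be sketched in a few lines of NumPy. This is a minimal illustration and not the package's internal code: the function name `swem_pool` and the `'avg'` method name are assumptions made for this example, while `'max'` and `'concat'` mirror the `method` values used in the examples in this README. Concatenation here means the average-pooled vector followed by the max-pooled vector, so the result has twice the word-vector dimension.

```python
import numpy as np


def swem_pool(vectors: np.ndarray, method: str = "max") -> np.ndarray:
    """Pool a (num_tokens, dim) matrix of word vectors into one document vector.

    Hypothetical helper for illustration; not part of the swem package.
    """
    if vectors.ndim != 2 or vectors.shape[0] == 0:
        raise ValueError("expected a non-empty (num_tokens, dim) array")
    if method == "avg":
        # Element-wise mean over all tokens.
        return vectors.mean(axis=0)
    if method == "max":
        # Element-wise maximum over all tokens.
        return vectors.max(axis=0)
    if method == "concat":
        # Average-pooled and max-pooled vectors side by side: shape (2 * dim,).
        return np.concatenate([vectors.mean(axis=0), vectors.max(axis=0)])
    raise ValueError(f"unknown method: {method}")


if __name__ == "__main__":
    # Two tokens with 3-dimensional toy vectors.
    word_vectors = np.array([[1.0, -2.0, 0.5],
                             [3.0, 0.0, -0.5]])
    print(swem_pool(word_vectors, "avg"))
    print(swem_pool(word_vectors, "max"))
    print(swem_pool(word_vectors, "concat").shape)
```

Because each operation reduces over the token axis, the output size depends only on the word-vector dimension and not on the number of tokens.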

Installation

pip install swem

Example

Examples are available in the examples directory.

Functional API

from typing import List

import numpy as np
import swem
from gensim.models import KeyedVectors

if __name__ == '__main__':
    # An empty KeyedVectors holds no words, so every token below is out of vocabulary.
    kv: KeyedVectors = KeyedVectors(vector_size=200)
    tokens: List[str] = ['I', 'have', 'a', 'pen']

    embed: np.ndarray = swem.infer_vector(
        tokens=tokens, kv=kv, method='concat'
    )
    print(embed.shape)

Japanese

from typing import List

import swem
from gensim.models import KeyedVectors


if __name__ == '__main__':
    kv: KeyedVectors = KeyedVectors(vector_size=200)
    swem_embed = swem.SWEM(kv)

    tokens: List[str] = ['すもも', 'も', 'もも', 'も', 'もも', 'の', 'うち']
    embed = swem_embed.infer_vector(tokens, method='max')
    print(embed.shape)

Results

(200,)

English

from typing import List

import swem
from gensim.models import KeyedVectors


if __name__ == '__main__':
    kv: KeyedVectors = KeyedVectors(vector_size=200)
    swem_embed = swem.SWEM(kv)

    tokens: List[str] = ['This', 'is', 'an', 'implementation', 'of', 'SWEM']
    embed = swem_embed.infer_vector(tokens, method='max')
    print(embed.shape)

Results

(200,)

Set random seed

SWEM generates a random vector when a given token is out of vocabulary. To reproduce a token's embedding across runs, set NumPy's random seed before inferring vectors.

from typing import List

import numpy as np
import swem
from gensim.models import KeyedVectors

if __name__ == '__main__':
    np.random.seed(0)
    kv: KeyedVectors = KeyedVectors(vector_size=200)
    tokens: List[str] = ['I', 'have', 'a', 'pen']

    embed: np.ndarray = swem.infer_vector(
        tokens=tokens, kv=kv, method='concat'
    )
    print(embed.shape)
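
Why the seed matters can be shown with a small stand-alone sketch. It assumes that out-of-vocabulary vectors are drawn from NumPy's global random generator; the helper names `lookup` and `embed`, the plain-dict vocabulary, and the uniform range are hypothetical choices for this illustration and may differ from what the package does internally.

```python
from typing import Dict, List

import numpy as np


def lookup(token: str, vocab: Dict[str, np.ndarray], dim: int) -> np.ndarray:
    """Return the stored vector, or a random one for an out-of-vocabulary token."""
    if token in vocab:
        return vocab[token]
    # Assumption: OOV vectors come from NumPy's global RNG; the range is illustrative.
    return np.random.uniform(-0.01, 0.01, size=dim)


def embed(tokens: List[str], vocab: Dict[str, np.ndarray], dim: int) -> np.ndarray:
    """Average-pool the token vectors into a single document vector."""
    return np.stack([lookup(t, vocab, dim) for t in tokens]).mean(axis=0)


if __name__ == "__main__":
    vocab = {"pen": np.ones(4)}           # only "pen" is in vocabulary
    tokens = ["I", "have", "a", "pen"]    # the other three tokens are OOV

    np.random.seed(0)
    first = embed(tokens, vocab, dim=4)

    np.random.seed(0)                     # reseeding replays the same random draws
    second = embed(tokens, vocab, dim=4)

    third = embed(tokens, vocab, dim=4)   # no reseed: fresh random draws

    print(np.array_equal(first, second))
    print(np.array_equal(first, third))
```

Under these assumptions, two runs that start from the same seed draw identical vectors for the out-of-vocabulary tokens, whereas a run that continues without reseeding does not.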

Download pretrained w2v and use it

import swem
swem.download_w2v(lang='ja')
kv = swem.load_w2v(lang='ja')

Results

Downloading w2v file to /Users/<username>/.swem/ja.zip
Extract zipfile into /Users/<username>/.swem/ja
Success to extract files.
