A portable document embedding using SWEM.

SWEM

Implementation of SWEM (Simple Word-Embedding-based Models), introduced in the paper
Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms (ACL 2018).
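
For orientation, the pooling mechanisms named in the paper title can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not this package's implementation: the helper name swem_pool, its window parameter, and the method labels 'avg' and 'hierarchical' are hypothetical ('max' and 'concat' match the method names used in the examples on this page).

```python
import numpy as np


def swem_pool(vectors: np.ndarray, method: str = "avg", window: int = 3) -> np.ndarray:
    """Pool a (num_tokens, dim) matrix of word vectors into one document vector.

    Hypothetical helper for illustration only; not part of the swem package.
    """
    if vectors.ndim != 2 or vectors.shape[0] == 0:
        raise ValueError("expected a non-empty (num_tokens, dim) array")
    if method == "avg":
        # Element-wise mean over all tokens.
        return vectors.mean(axis=0)
    if method == "max":
        # Element-wise maximum over all tokens.
        return vectors.max(axis=0)
    if method == "concat":
        # Mean-pooled and max-pooled vectors side by side (2 * dim values).
        return np.concatenate([vectors.mean(axis=0), vectors.max(axis=0)])
    if method == "hierarchical":
        # Average each sliding window of n consecutive tokens,
        # then take the element-wise maximum over the window averages.
        n = min(window, vectors.shape[0])
        averaged = np.stack(
            [vectors[i:i + n].mean(axis=0) for i in range(vectors.shape[0] - n + 1)]
        )
        return averaged.max(axis=0)
    raise ValueError(f"unknown method: {method!r}")


# Three toy 2-dimensional "word vectors".
word_vectors = np.array([[1.0, 4.0], [3.0, 0.0], [2.0, 2.0]])

print(swem_pool(word_vectors, "avg"))     # [2. 2.]
print(swem_pool(word_vectors, "max"))     # [3. 4.]
print(swem_pool(word_vectors, "concat"))  # [2. 2. 3. 4.]
```

Note that the concatenated variant doubles the output dimension, while the other three keep the dimension of the input word vectors.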

Installation

pip install swem

Example

Examples are available in the examples directory.

Functional API

from typing import List

import numpy as np
import swem
from gensim.models import KeyedVectors

if __name__ == '__main__':
    # An empty KeyedVectors (no vocabulary) holding 200-dimensional vectors.
    kv: KeyedVectors = KeyedVectors(vector_size=200)
    tokens: List[str] = ['I', 'have', 'a', 'pen']

    embed: np.ndarray = swem.infer_vector(
        tokens=tokens, kv=kv, method='concat'
    )
    print(embed.shape)

Japanese

from typing import List

import swem
from gensim.models import KeyedVectors


if __name__ == '__main__':
    kv: KeyedVectors = KeyedVectors(vector_size=200)
    swem_embed = swem.SWEM(kv)

    tokens: List[str] = ['すもも', 'も', 'もも', 'も', 'もも', 'の', 'うち']
    embed = swem_embed.infer_vector(tokens, method='max')
    print(embed.shape)

Results

(200,)

English

from typing import List

import swem
from gensim.models import KeyedVectors


if __name__ == '__main__':
    kv: KeyedVectors = KeyedVectors(vector_size=200)
    swem_embed = swem.SWEM(kv)

    tokens: List[str] = ['This', 'is', 'an', 'implementation', 'of', 'SWEM']
    embed = swem_embed.infer_vector(tokens, method='max')
    print(embed.shape)

Results

(200,)

Set random seed

SWEM generates a random vector when a given token is out of vocabulary. To reproduce the resulting embeddings, set the NumPy random seed before inferring vectors.

from typing import List

import numpy as np
import swem
from gensim.models import KeyedVectors

if __name__ == '__main__':
    # Fix NumPy's global seed so the random vectors for out-of-vocabulary tokens are reproducible.
    np.random.seed(0)
    kv: KeyedVectors = KeyedVectors(vector_size=200)
    tokens: List[str] = ['I', 'have', 'a', 'pen']

    embed: np.ndarray = swem.infer_vector(
        tokens=tokens, kv=kv, method='concat'
    )
    print(embed.shape)
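
The effect of seeding can be checked in isolation with NumPy alone. The sketch below is an assumption-laden illustration: random_oov_vector is a hypothetical stand-in for whatever the library does for an out-of-vocabulary token, and the uniform range is chosen arbitrarily.

```python
import numpy as np


def random_oov_vector(dim: int) -> np.ndarray:
    # Hypothetical stand-in for an out-of-vocabulary token vector.
    # The distribution and range are assumptions, not the library's behavior.
    return np.random.uniform(-0.01, 0.01, size=dim)


np.random.seed(0)
first = random_oov_vector(200)

# Re-seeding with the same value restarts the global random stream.
np.random.seed(0)
second = random_oov_vector(200)

# Without re-seeding, the stream continues and yields different values.
unseeded = random_oov_vector(200)

print(np.array_equal(first, second))    # True
print(np.array_equal(first, unseeded))  # False
```

Because the seed is global, any other NumPy random call made between seeding and inference shifts the stream, so the seed is best set immediately before the vectors are inferred.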

Download pretrained w2v and use it.

import swem
swem.download_w2v(lang='ja')
kv = swem.load_w2v(lang='ja')

Downloading w2v file to /Users/<username>/.swem/ja.zip
Extract zipfile into /Users/<username>/.swem/ja
Success to extract files.

Release history

Version	Release date
0.4.1 (this version)	May 29, 2021
0.4.0	Nov 20, 2020
0.3.2	Aug 25, 2020
0.3.1	Jul 30, 2020
0.3.0	Jul 15, 2020
0.2.0	Jun 25, 2020
0.1.5	May 28, 2020
0.1.4	May 28, 2020
0.1.3	Mar 31, 2020
0.1.2	Mar 23, 2020
0.1.1	Mar 15, 2020
0.1.0	Mar 9, 2020
0.0.1	Mar 9, 2020

Download files

Source Distribution

swem-0.4.1.tar.gz (7.3 kB)

Uploaded May 29, 2021 Source

Built Distribution

swem-0.4.1-py3-none-any.whl (6.2 kB)

Uploaded May 29, 2021 Python 3

File details

Details for the file swem-0.4.1.tar.gz.

File metadata

Download URL: swem-0.4.1.tar.gz
Upload date: May 29, 2021
Size: 7.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.1 importlib_metadata/4.3.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.7.10

File hashes

Hashes for swem-0.4.1.tar.gz
Algorithm	Hash digest
SHA256	`81d3ef079f37734d844febe5cdada2560e6e572bc6224b239c8a19d8c7c5ca4b`
MD5	`a52a30bf8f0f00a3737cfdbcc11e0b52`
BLAKE2b-256	`f028632747ec794634dbd7045f523c8d09e92a94b90cc5bd2797f29556764952`

File details

Details for the file swem-0.4.1-py3-none-any.whl.

File metadata

Download URL: swem-0.4.1-py3-none-any.whl
Upload date: May 29, 2021
Size: 6.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.1 importlib_metadata/4.3.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.7.10

File hashes

Hashes for swem-0.4.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0eb50b3a18488f3dd35b2169d48c8337417b01dbbb35cb05fa72f0564be56b22`
MD5	`4e85de3695e6ad133200d133a8ea87b0`
BLAKE2b-256	`f09791d9c5379c6576862d680fffb44e8bd68ce3adab66dd30084ba94499d274`
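
A downloaded file can be checked against the SHA256 digests listed above using only the Python standard library. The following is a minimal sketch; the helper names sha256_of and matches_published_hash are hypothetical, and the demo hashes a small temporary file rather than a real distribution.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a published hex digest, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()


# Demo on a throwaway file so the sketch runs without any download.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"abc")
    expected = hashlib.sha256(b"abc").hexdigest()
    print(matches_published_hash(sample, expected))  # True
```

For a real check, pass the path of the downloaded swem-0.4.1.tar.gz or swem-0.4.1-py3-none-any.whl together with the corresponding SHA256 value from the tables above.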
