Embeddings for software modeling
WordE4MDE
Installation 🛠
With conda (recommended)
This repo is written in Python and Java, so I recommend using conda. To set up the conda environment, run:
sudo apt install build-essential libpoppler-cpp-dev pkg-config python3-dev
conda env create --file=conda_venv.yml
conda activate word2vec-mde
python -m nltk.downloader all
After that, download the ModelSet dataset, since all the experiments were run on it.
python -m modelset.downloader
Without conda
You need to install:
- Python 3.8.X
- OpenJDK 1.8
- Maven 3.8.6
Generate a virtual environment and then install the requirements.
sudo apt install build-essential libpoppler-cpp-dev pkg-config python3-dev
python3.8 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python -m nltk.downloader all
After that, download the ModelSet dataset.
python -m modelset.downloader
Download trained embeddings 🚀
To download the WordE4MDE embeddings, run:
./scripts/download_embeddings.sh
Exploring embeddings 📋
Let us consider the following list of words:
['state', 'atl', 'dsl', 'grammar',
'petri', 'statechart', 'ecore', 'epsilon',
'qvt', 'transformation']
Each command below takes one embedding model and prints the 10 most similar words for every word in the list above:
python main.py --test_similarity --model glove-mde
python main.py --test_similarity --model skip_gram-mde
python main.py --test_similarity --model glove-wiki-gigaword-300
python main.py --test_similarity --model word2vec-google-news-300
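As a rough illustration of what "most similar words" means here, the sketch below ranks words by cosine similarity over a tiny hand-written embedding table. The vectors, the TOY_VECTORS name, and the most_similar helper are invented for this example; this is not the code behind main.py, and the real embeddings have far more dimensions.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
TOY_VECTORS = {
    "state": [0.9, 0.1, 0.0],
    "statechart": [0.8, 0.2, 0.1],
    "petri": [0.7, 0.1, 0.3],
    "grammar": [0.1, 0.9, 0.2],
    "dsl": [0.2, 0.8, 0.3],
    "atl": [0.1, 0.2, 0.9],
    "qvt": [0.0, 0.3, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, topn=10):
    """Return up to topn (word, score) pairs, most similar first."""
    query = vectors[word]
    scored = [
        (other, cosine(query, vec))
        for other, vec in vectors.items()
        if other != word  # a word is not its own neighbour
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


if __name__ == "__main__":
    for neighbour, score in most_similar("state", TOY_VECTORS, topn=3):
        print(f"{neighbour}\t{score:.3f}")
```

With these toy vectors, the nearest neighbours of "state" are the other behaviour-related words, which is the kind of grouping the similarity test is meant to expose in the trained models.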
Using the embeddings for meta-model classification, clustering and recommendation 📋
Meta-model classification task:
python main.py --evaluation_metamodel_classification --remove_duplicates
Meta-model clustering task:
python main.py --evaluation_metamodel_clustering --remove_duplicates
Meta-model concepts task (the Java parser is first applied to the ModelSet dataset, and then the recommendation systems are trained and evaluated):
cd java/parser
mvn compile
mvn exec:java
cd ../..
python main.py --evaluation_metamodel_concepts --remove_duplicates --device cpu --context_type EEnum
python main.py --evaluation_metamodel_concepts --remove_duplicates --device cpu --context_type EPackage
python main.py --evaluation_metamodel_concepts --remove_duplicates --device cpu --context_type EClass
Example of recommendations (pass one of EClass, EPackage, or EEnum as --context_type):
python main.py --example_recommendation --model glove-mde --context_type {EClass, EPackage, EEnum} --remove_duplicates
python main.py --example_recommendation --model skip_gram-mde --context_type {EClass, EPackage, EEnum} --remove_duplicates
python main.py --example_recommendation --model glove-wiki-gigaword-300 --context_type {EClass, EPackage, EEnum} --remove_duplicates
python main.py --example_recommendation --model word2vec-google-news-300 --context_type {EClass, EPackage, EEnum} --remove_duplicates
Playground
If you want to test the word embeddings without installation, you can use our Playground.
Download files
File details
Details for the file worde4mde-1.1.tar.gz.
File metadata
- Download URL: worde4mde-1.1.tar.gz
- Upload date:
- Size: 4.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.8.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e36f5d020b8242293b28c2ea22a82e09a11f1e1919f040714607caee73d9192e |
| MD5 | adb5616e486cca720f3b77a7fbaf36f0 |
| BLAKE2b-256 | 7884979a05d8694e8faca299c23b20da9145f8726ac97b218e51fc871c13e3b6 |
File details
Details for the file worde4mde-1.1-py3-none-any.whl.
File metadata
- Download URL: worde4mde-1.1-py3-none-any.whl
- Upload date:
- Size: 4.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.8.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b14a4bac423bc92dc4fb78da9277219768b332f60f0cacffda34321638f55ced |
| MD5 | ed3e1651bde96f8203e49950f7158d2b |
| BLAKE2b-256 | dee7938ab99db3fb408186ab83a88626fdf2f523d9b006f7b0b6ac20176e7b32 |