AuDoLab

With AuDoLab you can perform Latent Dirichlet Allocation (LDA) on highly imbalanced datasets.

Development Status
- 2 - Pre-Alpha
Intended Audience
- Developers
License
- OSI Approved :: GNU General Public License v3 (GPLv3)
Natural Language
- English
Programming Language
- Python :: 3.8

Project description

Installation

Stable release

To install AuDoLab, run this command in your terminal:

$ pip install AuDoLab

This is the preferred method to install AuDoLab, as it will always install the most recent stable release.

If you don’t have pip installed, the Python installation guide can walk you through the process.

From sources

The sources for AuDoLab can be downloaded from the GitHub repo.

You can either clone the public repository:

$ git clone https://github.com/ArneTillmann/AuDoLab.git

Or download the tarball:

$ curl -OJL https://github.com/ArneTillmann/AuDoLab/tarball/master

Once you have a copy of the source, you can install it with:

$ python setup.py install

Usage

Before first use, download the nltk stopwords by running the following inside a Python console:

import nltk
nltk.download('stopwords')

To use AuDoLab in a project:

from AuDoLab import AuDoLab
import asyncio
import nest_asyncio

# Patch asyncio so that run_until_complete() also works when an event loop
# is already running, e.g. inside a Jupyter notebook.
nest_asyncio.apply()

Then create an instance of the AuDoLab class:

audo = AuDoLab.AuDoLab()

This example uses the publicly available Reuters corpus from the nltk package (download it once with nltk.download('reuters')):

from nltk.corpus import reuters
import numpy as np
import pandas as pd

data = []

for fileid in reuters.fileids():
    # File ids look like "training/9865" or "test/14826"
    tag, filename = fileid.split("/")
    data.append(
        (filename, ", ".join(reuters.categories(fileid)), reuters.raw(fileid))
    )

data = pd.DataFrame(data, columns=["filename", "categories", "text"])
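
Since AuDoLab targets imbalanced data, it can help to check how rare the category of interest is before going further. The following is a minimal sketch that works on toy rows shaped like the DataFrame above; the rows and the helper name category_counts are made up for illustration and are not part of AuDoLab.

```python
from collections import Counter

# Toy rows shaped like the DataFrame above: (filename, categories, text).
# The values are invented for illustration only.
rows = [
    ("1", "grain, wheat", "..."),
    ("2", "earn", "..."),
    ("3", "earn", "..."),
    ("4", "cotton", "..."),
    ("5", "earn", "..."),
]


def category_counts(rows):
    """Count how many documents carry each category label."""
    counts = Counter()
    for _filename, categories, _text in rows:
        # Categories were joined with ", " when the rows were built
        counts.update(c for c in categories.split(", ") if c)
    return counts


counts = category_counts(rows)
for category, n in counts.most_common():
    print(f"{category}: {n} of {len(rows)} documents")
```

A category that covers only a small share of the documents is the kind of target class the scraping and classification steps are meant to recover.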

Next, scrape abstracts, e.g. from IEEE, with the abstract scraper:

async def scrape():
    return await audo.scrape_abstracts(
        url=None, keywords=["cotton"], in_data="all_meta", pages=5
    )

scraped_documents = asyncio.get_event_loop().run_until_complete(scrape())
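
The scraper is a coroutine, so it has to be driven by an event loop. The sketch below shows the same pattern in isolation with a hypothetical stand-in coroutine, fake_scrape, that performs no network requests; its name, arguments, and return value are assumptions made for illustration and do not reflect what scrape_abstracts returns.

```python
import asyncio


async def fake_scrape(keywords, pages):
    """Hypothetical stand-in for a scraper coroutine; does no network I/O."""
    await asyncio.sleep(0)  # hand control back to the event loop once
    return [
        f"abstract about {kw}, page {p}"
        for kw in keywords
        for p in range(1, pages + 1)
    ]


def run_sync(coro):
    """Run a coroutine to completion from ordinary synchronous code."""
    return asyncio.run(coro)


documents = run_sync(fake_scrape(["cotton"], pages=2))
print(documents)
```

In a plain script asyncio.run is sufficient. It raises a RuntimeError when an event loop is already running in the same thread, as in a Jupyter notebook, which is the situation nest_asyncio.apply() is meant to handle.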

Both the target data and the scraped papers need to be preprocessed and converted to tf-idf features before they are passed to the classifier:

preprocessed_target = audo.preprocessing(data=data, column="text")

preprocessed_paper = audo.preprocessing(
    data=scraped_documents, column="text")

target_tfidf, training_tfidf = audo.tf_idf(
    data=preprocessed_target,
    papers=preprocessed_paper,
    data_column="lemma",
    papers_column="lemma",
    features=100000,
)
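
To make the tf-idf step more concrete, here is a minimal pure-Python sketch. It assumes that the vocabulary and document frequencies are learned from the scraped papers and then applied to the target data; whether audo.tf_idf works this way internally is an assumption, and the helper names fit_idf and transform are hypothetical.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Learn smoothed inverse document frequencies from tokenised documents."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {
        term: math.log((1 + n_docs) / (1 + df)) + 1.0
        for term, df in doc_freq.items()
    }


def transform(doc, idf):
    """Map one tokenised document to {term: tf * idf}, dropping unseen terms."""
    tf = Counter(term for term in doc if term in idf)
    return {term: count * idf[term] for term, count in tf.items()}


# Toy lemma lists standing in for the "lemma" columns (invented data)
papers = [
    ["cotton", "yield", "fibre"],
    ["cotton", "price"],
    ["cotton", "harvest", "yield"],
]
target = [["cotton", "price", "rise"], ["bank", "rate", "cut"]]

idf = fit_idf(papers)
training_vectors = [transform(doc, idf) for doc in papers]
target_vectors = [transform(doc, idf) for doc in target]
print(target_vectors)
```

Under this assumption, a target document that shares no terms with the papers ends up with an empty vector, which is what lets a classifier trained only on the papers tell the two kinds of documents apart.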

Afterwards, train the one-class SVM classifiers over a range of nu values and choose the desired one:

classifier = audo.one_class_svm(
    training=training_tfidf,
    predicting=target_tfidf,
    nus=np.round(np.arange(0.01, 0.5, 0.01), 7),
    quality_train=0.9,
    min_pred=0.001,
    max_pred=0.05,
)
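
The sketch below illustrates what sweeping nu does, using scikit-learn's OneClassSVM directly on synthetic data rather than AuDoLab's wrapper. In scikit-learn, nu is an upper bound on the fraction of training errors, so larger values reject more of the training set. The parameter names quality_train, min_pred and max_pred suggest thresholds on the two rates printed here, but that reading is an assumption, not something confirmed by the source.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Synthetic stand-ins for the tf-idf matrices (invented data): the training
# set is one cluster; the target set has a few similar rows and many others.
X_train = rng.normal(0.0, 1.0, size=(200, 5))
X_target = np.vstack([
    rng.normal(0.0, 1.0, size=(10, 5)),
    rng.normal(6.0, 1.0, size=(190, 5)),
])

results = []
for nu in [0.05, 0.2, 0.35]:
    clf = OneClassSVM(kernel="rbf", gamma="scale", nu=nu).fit(X_train)
    # predict() returns +1 for inliers and -1 for outliers
    train_rate = float(np.mean(clf.predict(X_train) == 1))
    target_rate = float(np.mean(clf.predict(X_target) == 1))
    results.append((nu, train_rate, target_rate))
    print(f"nu={nu:.2f}  train inliers={train_rate:.2f}  target inliers={target_rate:.2f}")
```

Comparing the share of training rows kept with the share of target rows accepted is one way to pick a nu that keeps most of the training documents while labelling only a small part of the target data as relevant.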

df_data = audo.choose_classifier(preprocessed_target, classifier, 2)

Finally, estimate the topics of the selected data and save the visualization as an HTML file:

audo.lda_modeling(df_data, num_topics=2)

a = audo.lda_visualize_topics()
html = a.data
with open('html_file.html', 'w') as f:
    f.write(html)

Free software: GNU General Public License v3
Documentation: https://AuDoLab.readthedocs.io

Release history

- 1.0.16 (Oct 19, 2021)
- 1.0.15 (Oct 15, 2021)
- 1.0.14 (Oct 15, 2021)
- 1.0.13 (Oct 13, 2021)
- 1.0.7 (Aug 27, 2021)
- 1.0.6 (Aug 26, 2021)
- 1.0.3 (Aug 19, 2021)
- 1.0.2 (Aug 19, 2021)
- 1.0.1 (Aug 19, 2021)
- 1.0.0 (Aug 15, 2021)
- 0.1.19 (Jul 25, 2021)
- 0.1.15 (Apr 30, 2021), this version
- 0.1.14 (Apr 30, 2021)
- 0.1.13 (Apr 30, 2021)
- 0.1.12 (Apr 30, 2021)
- 0.1.10 (Apr 24, 2021)
- 0.1.9 (Apr 19, 2021)
- 0.1.8 (Apr 19, 2021)
- 0.1.7 (Apr 13, 2021)
- 0.1.6 (Apr 13, 2021)
- 0.1.5 (Apr 13, 2021)
- 0.1.3 (Apr 13, 2021)
- 0.1.2 (Apr 13, 2021)
- 0.1.1 (Apr 13, 2021)
- 0.1.0 (Apr 13, 2021)
- 0.0.40 (Apr 13, 2021)
- 0.0.39 (Apr 13, 2021)
- 0.0.38 (Apr 13, 2021)
- 0.0.37 (Apr 13, 2021)
- 0.0.36 (Apr 7, 2021)
- 0.0.35 (Apr 7, 2021)
- 0.0.34 (Apr 7, 2021)
- 0.0.33 (Apr 7, 2021)
- 0.0.32 (Apr 7, 2021)

Download files

Download the file for your platform.

Source Distribution

AuDoLab-0.1.15.tar.gz (5.2 MB)

Uploaded Apr 30, 2021 Source

Built Distribution

AuDoLab-0.1.15-py2.py3-none-any.whl (14.8 kB)

Uploaded Apr 30, 2021 Python 2 Python 3

Hashes for AuDoLab-0.1.15.tar.gz
Algorithm	Hash digest
SHA256	`5808eaf5bc6d143aa3215f5c3113f4a81ef434289de6bd11d0d202fce6bff65d`
MD5	`479ab86ea4111807a0b1ffb7e3253e1d`
BLAKE2b-256	`fd7d63527bddcf4b3906fa0aebcb247d0e9d91e24f3e2be896b1b6c58991e90f`

Hashes for AuDoLab-0.1.15-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`b6c12383ab63d9c2502c2a3267f318b8494afc7c67a136d8e8f634f8603de018`
MD5	`9c93102bd8c615283b86d0fe1f6ac7b4`
BLAKE2b-256	`d66e653ab797e7962bc3dc47f0585d6818396716a0b6ea0626e68db9ae283cf4`