Downloading all files of a language from the OSCAR (Open Super-large Crawled Aggregated coRpus)
Project description
Automates downloading language data from OSCAR.
Features
- Adds dodc and dodg as command line tools
- dodc: command line variant, takes arguments to download data
- dodg: GUI variant to download data, arguments are entered into input fields
Usage
dodc
To get help with the command line tool, run dodc -h from a shell.
The command line tool needs to be supplied with multiple arguments:
- user: The user used to login to the site providing OSCAR.
- password: The password used to login to the site providing OSCAR.
- base_url: The URL where the language files are hosted.
- out: The folder where files should be downloaded to.
- chunk_size (optional): The size of the chunks in which files are downloaded. Defaults to 4096.
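To illustrate what chunk_size controls, here is a minimal sketch of writing a download to disk in fixed-size chunks. It is not taken from dodc itself: the helper names iter_chunks and save_chunks are hypothetical, and an in-memory stream stands in for the remote file so the example runs without network access. With Requests, the chunk source would typically be response.iter_content(chunk_size=chunk_size) on a streamed response.

```python
import io
import tempfile
from pathlib import Path


def iter_chunks(stream, chunk_size=4096):
    """Yield successive blocks of at most chunk_size bytes from a binary stream."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty bytes object signals the end of the stream
            return
        yield chunk


def save_chunks(chunks, target):
    """Write every chunk to target and return the number of bytes written."""
    written = 0
    with open(target, "wb") as handle:
        for chunk in chunks:
            handle.write(chunk)
            written += len(chunk)
    return written


# Stand-in for a remote file (assumption: real data would come from the server).
payload = io.BytesIO(b"x" * 10000)
with tempfile.TemporaryDirectory() as tmp:
    total = save_chunks(iter_chunks(payload, chunk_size=4096), Path(tmp) / "example.bin")
    print(total)  # prints 10000
```

A larger chunk_size means fewer write calls per file, while a smaller one keeps less data in memory at a time.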
dodg
The GUI tool internally calls the command line tool dodc.
Instead of providing arguments on the command line, you can enter them into input fields, and they will be passed on to the command line tool.
Installation
Simple Installation
pip install download-oscar
will install the requirements and the tool with one command.
Installing from source
Requirements
- Requires Python 3
- Requires Requests
- Requires html5lib
- Requires BeautifulSoup
- Requires PySimpleGUI
- Requires tqdm
Building
- install Python
git clone https://github.com/xamm/download_oscar.git
cd download_oscar
- (optional) create a virtual environment
pip install -r requirements.txt
pip install -e .
will install the tool in development mode.
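After either installation route, both command line tools should be available on your PATH. The following is a small sketch for checking that from Python. The helper name missing_tools is hypothetical, and the check assumes the environment you installed into (for example your virtual environment) is currently active.

```python
import shutil


def missing_tools(names=("dodc", "dodg")):
    """Return the names that cannot be found as executables on PATH."""
    return [name for name in names if shutil.which(name) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("dodc and dodg are available")
```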
Release a new version
- All pushed git commits and pull requests on the main branch trigger an automatic build and packaging for PyPI
  - commits without a tag only trigger packaging for TestPyPI
  - commits with a tag will also push to PyPI
- A new version number must be specified in setup.py in order for publishing to work
  - publishing is triggered on creation of a tag on the main branch
  - e.g. git tag -a v0.0.1 -m 'Release 0.1' and git push origin v0.0.1
- easiest procedure:
  - work on your code
  - add & commit changes
  - push changes
  - create tag
  - push tag
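Because publishing depends on the version in setup.py being new, a quick check before creating the tag can catch a forgotten version bump. The sketch below is not part of this project: the helper names are hypothetical, it assumes setup.py contains a plain version="..." string literal, and it assumes tags are named "v" followed by that version, as in the v0.0.1 example.

```python
import re


def read_version(setup_py_text):
    """Return the first version="..." literal in the setup.py text, or None."""
    match = re.search(r"version\s*=\s*['\"]([^'\"]+)['\"]", setup_py_text)
    return match.group(1) if match else None


def tag_matches_version(tag, setup_py_text):
    """True if the tag equals "v" + the version found in setup.py (assumed convention)."""
    version = read_version(setup_py_text)
    return version is not None and tag == "v" + version


example_setup = 'setup(name="download-oscar", version="0.0.1")'
print(tag_matches_version("v0.0.1", example_setup))  # prints True
```

To use it against the real file, read setup.py with open("setup.py").read() and pass the tag you are about to create.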
Licence
Download files
File details
Details for the file download_oscar-2.1.tar.gz.
File metadata
- Download URL: download_oscar-2.1.tar.gz
- Upload date:
- Size: 8.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.9.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | dd433a92c3cf0f1fb65de5138254ef7177fd0564b4b55883af90a289063e9b74
MD5 | 9a2f0b5bbf2edbb9923c2c9f3cbedb83
BLAKE2b-256 | cfe2562f0357bad02cbfc9f9bbd5fb642db867eaaa49623959c47cbbed8f7da0
File details
Details for the file download_oscar-2.1-py3-none-any.whl.
File metadata
- Download URL: download_oscar-2.1-py3-none-any.whl
- Upload date:
- Size: 8.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.9.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | d6fae13e5c659ba74f060c43b9d98cd24e7dc6ea996cd58b56768fee79ca61a5
MD5 | a1b2cbd990d4ae9941f0d711ee5966c1
BLAKE2b-256 | 1476ca5fa57e559d5b6207509573e37251dc5b0e71017943cc5dd57ef8b6debf
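If you download one of the files manually, you can compare it against the SHA256 digest listed above. The following sketch uses only Python's standard hashlib module. The helper names are hypothetical, and the file path assumes the source distribution was saved to the current directory under its original name.

```python
import hashlib
import os


def sha256_of(path, block_size=65536):
    """Return the hex SHA256 digest of a file, reading it block by block."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """True if the file's SHA256 digest equals the published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Digest copied from the table for download_oscar-2.1.tar.gz.
EXPECTED = "dd433a92c3cf0f1fb65de5138254ef7177fd0564b4b55883af90a289063e9b74"
archive = "download_oscar-2.1.tar.gz"  # assumed local path
if os.path.exists(archive):
    print(matches_published_hash(archive, EXPECTED))
```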