Downloading all files of a language from the OSCAR (Open Super-large Crawled Aggregated coRpus)
Automated downloading of language data from OSCAR.
Features
- Adds dodc and dodg as command line tools
  - dodc: command line variant; the download is configured through command line arguments
  - dodg: GUI variant; the download is configured through input fields
Usage
dodc
To get help with the command line tool, run dodc -h from a shell.
The command line tool needs to be supplied with multiple arguments:
- user: The user used to login to the site providing OSCAR.
- password: The password used to login to the site providing OSCAR.
- base_url: The URL where the language files are hosted.
- out: The folder where files should be downloaded to.
- chunk_size (optional): The size of the chunks in which files are downloaded. Defaults to 4096.
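To illustrate what a chunked download with the arguments above can look like, here is a minimal sketch. It is not taken from the download_oscar source: `download_file` is a hypothetical helper, and `session` is assumed to behave like a `requests.Session` (the README lists Requests as a dependency, but does not say how it is used).

```python
from pathlib import Path


def download_file(session, url, out_dir, chunk_size=4096):
    """Stream `url` into `out_dir`, reading `chunk_size` bytes at a time.

    Hypothetical helper for illustration only; `session` is assumed to
    offer the same `get(url, stream=True)` call as requests.Session.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Name the local file after the last path segment of the URL.
    target = out_dir / url.rstrip("/").rsplit("/", 1)[-1]

    # stream=True defers fetching the body until iter_content is consumed,
    # so a large file never has to fit into memory at once.
    with session.get(url, stream=True) as response:
        response.raise_for_status()
        with open(target, "wb") as handle:
            for chunk in response.iter_content(chunk_size=chunk_size):
                handle.write(chunk)
    return target
```

With Requests installed, a session could be created with `requests.Session()` and given credentials via `session.auth = (user, password)`. That assumes the hosting site accepts HTTP basic authentication, which the README does not confirm. A larger chunk_size means fewer writes per file at the cost of more memory per read.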
dodg
The GUI tool internally calls the command line tool dodc.
Instead of providing arguments on the command line, you can enter them into input fields, and they will be passed on to the command line tool.
Installation
- Requires Python 3
- Requires Requests
- Requires html5lib
- Requires BeautifulSoup
- Requires PySimpleGUI
- Requires tqdm
Building from source
- install Python
- git clone https://github.com/xamm/download_oscar.git
- cd download_oscar
- (optional) create a virtual environment
- pip install -r requirements.txt
- python setup.py sdist
Release new version
- All pushed git commits and pull requests on the main branch trigger an automatic build and packaging for PyPI
  - commits without a tag only trigger packaging for TestPyPI
  - commits with a tag will also push to PyPI
- A new version number must be specified in setup.py in order for publishing to work
  - publishing is triggered on creation of a tag on the main branch
  - e.g. git tag -a v0.0.1 -m 'Release 0.1' and git push origin v0.0.1
- easiest procedure:
  - work on your code
  - add & commit changes
  - push changes
  - create tag
  - push tag
Licence
Download files
Source Distribution
download_oscar-1.0.tar.gz (8.3 kB)
Built Distribution
download_oscar-1.0-py3-none-any.whl
Hashes for download_oscar-1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 2cd1749d97c884995e503955d679c54829aa3e7b65cc74b74c22729544ca4ec6
MD5 | 9b76cf0702c8464f07ef3ff881f57833
BLAKE2b-256 | b5305e0df61f51b165fc4a323a20f62be04444238d2f81343a1c891426bde414