Data extraction and formatting for Scribe applications
This repository contains the scripts for extracting and formatting data from Wikidata for Scribe applications. To update the language keyboard and interface data, use scribe_data/load/update_data.py.
Process
scribe_data/load/update_data.py currently updates all data for Scribe-iOS. This functionality will be extended to Scribe-Android and Scribe-Desktop once they are active. The long-term goal is for this repository to house language packs that are periodically updated with new Wikidata lexicographical data and that users of Scribe applications can download.
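As a rough illustration of the formatting step, the sketch below turns a Wikidata query response in the standard SPARQL JSON results layout into a noun dictionary. The function name format_nouns, the variable names singular, plural and gender, and the sample rows are all assumptions made for this example; they are not taken from update_data.py or from the queries in this repository.

```python
import json


def format_nouns(sparql_json):
    """Map each singular noun to its plural and gender.

    Expects the standard SPARQL JSON results layout. The variable names
    "singular", "plural" and "gender" are hypothetical.
    """
    formatted = {}
    for row in sparql_json["results"]["bindings"]:
        # Each binding maps a variable name to {"type": ..., "value": ...}.
        singular = row.get("singular", {}).get("value")
        if not singular:
            continue  # skip rows without a singular form
        formatted[singular] = {
            "plural": row.get("plural", {}).get("value", ""),
            "gender": row.get("gender", {}).get("value", ""),
        }
    # Sort keys so repeated updates produce stable, diff-friendly files.
    return dict(sorted(formatted.items()))


# Hand-written sample response (not real query output).
sample = {
    "head": {"vars": ["singular", "plural", "gender"]},
    "results": {
        "bindings": [
            {
                "singular": {"type": "literal", "value": "Haus"},
                "plural": {"type": "literal", "value": "Häuser"},
                "gender": {"type": "literal", "value": "neuter"},
            },
            {
                "singular": {"type": "literal", "value": "Buch"},
                "plural": {"type": "literal", "value": "Bücher"},
            },
        ]
    },
}

nouns = format_nouns(sample)
print(json.dumps(nouns, ensure_ascii=False, indent=2))
```

Missing optional values become empty strings here so that every entry has the same keys; the real scripts may handle gaps differently.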
Supported Languages
Scribe's goal is functional, feature-rich keyboards and interfaces for all languages. Check the data directory for queries for currently supported languages and those that have substantial data on Wikidata.
The following table shows the supported languages and the amount of data available for each on Wikidata:
Languages | Nouns | Verbs | Translations* | Adjectives† | Prepositions‡ |
---|---|---|---|---|---|
French | 15,788 | 1,246 | 67,652 | - | - |
German | 28,089 | 3,130 | 67,652 | - | 187 |
Italian | 783 | 71 | 67,652 | - | - |
Portuguese | 4,662 | 189 | 67,652 | - | - |
Russian | 194,394 | 11 | 67,652 | - | 12 |
Spanish | 9,452 | 2,062 | 67,652 | - | - |
Swedish | 41,187 | 4,138 | 67,652 | - | - |
* Given the current beta status where words are machine translated.
† Adjective-preposition support is in progress (see issue).
‡ Only for languages for which preposition annotation is needed.
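Counts like those in the table could be regenerated by tallying the entries in the formatted JSON files. The sketch below assumes a hypothetical layout of one directory per language containing one JSON file per word type (for example German/nouns.json); the actual layout of this repository may differ, and the demo files are made up.

```python
import json
import tempfile
from pathlib import Path


def count_entries(data_dir):
    """Return {language: {word_type: entry_count}} for every JSON file found.

    Assumes a hypothetical <data_dir>/<Language>/<word_type>.json layout
    in which each file holds a single JSON object keyed by word.
    """
    counts = {}
    for json_file in sorted(Path(data_dir).glob("*/*.json")):
        language = json_file.parent.name
        word_type = json_file.stem
        with json_file.open(encoding="utf-8") as f:
            counts.setdefault(language, {})[word_type] = len(json.load(f))
    return counts


# Build a tiny throwaway data directory to demonstrate the function.
with tempfile.TemporaryDirectory() as tmp:
    german = Path(tmp) / "German"
    german.mkdir()
    (german / "nouns.json").write_text(
        json.dumps({"Haus": {}, "Buch": {}}), encoding="utf-8"
    )
    (german / "verbs.json").write_text(
        json.dumps({"gehen": {}}), encoding="utf-8"
    )
    counts = count_entries(tmp)

print(counts)
```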
Contributing
Work that is in progress or could be implemented is tracked in the Issues. Please see the contribution guidelines if you are interested in contributing to Scribe-Data. Also check the -priority- labels in the Issues for those that are most important, as well as those marked good first issue, which are tailored for first-time contributors.
Ways to Help
- Join us in the Discussions 👋
- Report bugs as they're found
- Work on new features
- Write documentation for onboarding and project cohesion
- Add language data to Scribe-Data via Wikidata!
Data Edits
Scribe does not accept direct edits to the grammar JSON files, as they are sourced from Wikidata. Edits can be discussed, and the queries themselves will be changed and run before an update. If there is a problem with one of the files, the fix should be made on Wikidata and not on Scribe. Feel free to let us know that edits have been made by opening a data issue, and we'll be happy to integrate them!