Skip to main content

Groningen Translation Environment

Project description

GroTE: Groningen Translation Environment 🐮

Demo example

An online GroTE demo is available at https://gsarti-grote.hf.space. You can use admin as a login code, and upload one of the files in assets/examples for the editing. The demo will log events to the repository grote/grote-logs.

https://github.com/user-attachments/assets/e31d0841-a480-4013-9f9f-2ee8f3885fc5

Running GroTE locally

  1. Install requirements: pip install -r requirements.txt.
  2. Make sure you have a local npm installation available to run the front-end.
  3. Edit the GroTE config to set your custom login_codes and event_logs_hf_dataset_id. By default, you will be able to access the demo using the admin code, and logs will be written to a local logs directory, and synchronized with a private grote-logs dataset on your user profile in the Hugging Face Hub.
  4. Run grote in your command line to start the server. You will need a Hugging Face token with Write permissions to log edits.
  5. Visit http://127.0.0.1:7860 to access the demo.
  6. Enter your login code and load an example document from assets/examples.
  7. Press "📝 Start" to begin editing the document.

Setting up a new GroTE instance on HF Spaces

  1. Use the "Duplicate this space" option from the original GroTE demo to create a copy in your user/organization profile.
  2. In Settings > Variables and secrets, change the default value of EVENT_LOGS_HF_DATASET_ID, HF_TOKEN and LOGIN_CODES to your desired values (see GroTE config for more details).
  3. Upon running the app and starting the editing, you should see the logs being written to the dataset having the id is specified in EVENT_LOGS_HF_DATASET_ID.

Use or modify the following code to create multiple copies of the app programmatically:

from huggingface_hub import duplicate_space, SpaceHardware

NUM_TRANSLATORS = 5
USER_OR_ORG = "<your_username_or_organization>"
YOUR_HF_TOKEN = "hf_<your_token>"

names = [f"translator-{idx}" for idx in range(1, NUM_TRANSLATORS + 1)]

for name in names:
    duplicate_space(
        from_id="gsarti/grote",
        to_id=f"{USER_OR_ORG}/grote-{name}",
        private=False,
        token=YOUR_HF_TOKEN,
        hardware=SpaceHardware.CPU_BASIC,
        secrets=[
            {
                "key": "HF_TOKEN",
                "value": YOUR_HF_TOKEN,
                "description": " Hugging Face token for logging purposes",
            },
            {
                "key": "LOGIN_CODES",
                "value": f"{name.lower()},admin",
                "description": "List of login codes for the users",
            },
        ],
        variables=[
            {
                "key": "MAX_NUM_SENTENCES",
                "value": "50",
            },
            {
                "key": "EVENT_LOGS_SAVE_FREQUENCY",
                "value": "50",
            },
            {
                "key": "EVENT_LOGS_HF_DATASET_ID",
                "value": f"{USER_OR_ORG}/grote-{name}",
            },
            {
                "key": "EVENT_LOGS_LOCAL_DIR",
                "value": "logs",
            },
            {
                "key": "ALLOWED_TAGS",
                "value": "minor,major",
            },
            {
                "key": "TAG_LABLES",
                "value": "Minor,Major",
            },
            {
                "key": "TAG_COLORS",
                "value": "#ffedd5,#fcd29a",
            }
        ]
    )
for name in names:
    print(f"URL: https://{USER_OR_ORG}-grote-{name}.hf.space\nLogin code: {name.lower()}")

Editing flow with GroTE

  1. Open the webpage of the GroTE interface
  2. Insert the provided login code
  3. Load one of the provided files
  4. Press “📝 Start”
  5. Perform the editing. If needed, use green checkmarks to remove highlights from a segment.
  6. When all segments for the file are finished, click “✅ Done”
  7. A message “Saving trial information. Don't close the tab until the download button is available!” will appear. Do not close the tab.
  8. When the message “Saving complete! Download the output file by clicking the 'Download translations' button below.” appears, click “📥 Download translations” to download the edited files. The file will have the name <LOGIN CODE>_<FILENAME>_output.txt
  9. Click “⬅️ Back to data loading” to return to the file loading page.
  10. If needed, pause and take a break

Steps 2-9 are repeated for each file, which represents a standalone document with ordered segments.

Future developments

While the current version of GroTE is functional, there are several improvements that could be made to enhance the user experience and functionality. I am unlikely to implement these changes in the near future, but I am happy to provide guidance and support to anyone interested in contributing to the project.

  • Separate rendering logic for loading/editing tabs (see ICLR 2024 Papers interface for an example)
  • Use latest Gradio version to integrate features like multi-page structure, client-side functions, and dynamic rendering of components.
  • Enable restoring the previous state of edited sentences if matching filename and user are found in the logs in the past 24 hours (with a modal to enable starting from scratch).
  • Possibly rethink logging format to reduce redundancy and improve readability.
  • Add optional tab to visualize the editing process (e.g., Highlighted diffs between original and edited sentences, replay of editing process by looping .then with time.sleep, download scoped logs for single text).
  • Change saving logic to use BackgroundScheduler
  • Change transition from editing to loading to preserve login code and possibly allow the pre-loading of several files for editing (would require a custom FileExplorer component to mark done documents).

Questions and feedback

If you have any questions or feedback, please feel free to reach out to me at gabriele.sarti996@gmail.com.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

grote-0.1.11.tar.gz (21.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

grote-0.1.11-py3-none-any.whl (23.7 kB view details)

Uploaded Python 3

File details

Details for the file grote-0.1.11.tar.gz.

File metadata

  • Download URL: grote-0.1.11.tar.gz
  • Upload date:
  • Size: 21.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.1 CPython/3.11.12 Darwin/24.5.0

File hashes

Hashes for grote-0.1.11.tar.gz
Algorithm Hash digest
SHA256 1cfbb8c6750600f92bc37741ad07efced0820ef65d8b8687c68e309be9bc88bf
MD5 0d5ade50a208de09788bc9c86c8f34e8
BLAKE2b-256 c929ce44ae20a8f063f108cd8b07750e274ea49efbb00fb84f6dd2b976ecb9f0

See more details on using hashes here.

File details

Details for the file grote-0.1.11-py3-none-any.whl.

File metadata

  • Download URL: grote-0.1.11-py3-none-any.whl
  • Upload date:
  • Size: 23.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.1 CPython/3.11.12 Darwin/24.5.0

File hashes

Hashes for grote-0.1.11-py3-none-any.whl
Algorithm Hash digest
SHA256 d843cb16c8ab8cd206bc914f7cb3e2b621c8ccdc8711c7a990425ee5121c8731
MD5 b2bcf75a8e0f57e739566678d581eb06
BLAKE2b-256 0f9b923187be86e5fb8055d3899846551242f9e3810e7e882912834454d54921

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page