Skip to main content

Simple utility tools for translating fastapi documentation

Project description

Translation Toolkit

A collection of scripts to streamline the translation of markdown files using vector stores and deep learning.

Overview

This toolkit provides a set of Python scripts designed to simplify the translation process for markdown files. The scripts leverage embedding models to enhance the accuracy of document retrieval and improve the overall translation workflow.

Scripts

Initialization Script

search_word.py

The search_word.py script initializes a search engine for retrieving relevant documents based on embeddings. It is designed to work with markdown files in multiple languages. The CLI app is based on Typer.

Usage

  1. Installation:

    • Install the module itself:
      • install the toolset: pip install fatush
      • then run the script: fatush run
    • Install the whole project:
      • Clone this repository: git clone https://github.com/alperiox/fatush.git
      • Change the directory to the project folder: cd fatush
      • Install the required dependencies using pip or poetry: pip install -r requirements.txt or poetry install
  2. Configuration:

    • Run the initialization script: python fatush/search_word.py run
    • Follow the prompts to set up the initial configuration.
    • If the config.yaml file is not found, the script will fetch documents from the FastAPI repo and create the necessary configuration file.
  3. Processing Documents:

    • The script will process the documents based on the provided configuration.
    • It'll split the documents line by line and then calculate their embeddings to set up a FAISS vector store.
  4. Loading Embedding Model:

    • The used embedding model is all-MiniLM-L6-v2, which is rather popular for vectorstores.
  5. Vector Store:

    • If the vector store path is not found in the configuration, it will be constructed and loaded. It'll be loaded automatically otherwise.
    • Currently, the only implemented vector store is FAISS.
  6. Search Engine Initialization:

    • The search engine is initialized with the loaded embedding model and vector store.
    • The script will start the inference process to provide relevant search results.
  7. TODOs:

    • Integration tests
    • Hopefully, a web application based on FastAPI
    • Yet another tool for automatically suggesting initial translations for the given text
    • Currently, the scripts don't cover all exceptions. This may require you to reconfigure the script by deleting the config file and the downloaded repository if you do not run the script directly without configuring any options.

Configuration Options for search_word.py

  • source_lang: Source language code (e.g., 'en').
  • translation_lang: Translation language code (e.g., 'tr').
  • docs_path: Path to the documents (default is the current working directory).
  • vectorstore_path: Path to the vector store (default is the current working directory).

Note

Since the project is built on my experience with translating FastAPI documentation, a nicer abstraction is a must for a more generally usable toolset. That is because there are several hard-coded variables at the moment, like fetching the documentation from the FastAPI repository.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fatush-0.1.0.tar.gz (7.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

fatush-0.1.0-py3-none-any.whl (8.5 kB view details)

Uploaded Python 3

File details

Details for the file fatush-0.1.0.tar.gz.

File metadata

  • Download URL: fatush-0.1.0.tar.gz
  • Upload date:
  • Size: 7.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.6.1 CPython/3.11.5 Darwin/23.2.0

File hashes

Hashes for fatush-0.1.0.tar.gz
Algorithm Hash digest
SHA256 dd3613694d49ac63f1060333b1005d227fc2e036a594ceeb5a5b5284c88f0c49
MD5 8deb3176309052bb7ab9def1da44253f
BLAKE2b-256 c3c4d55699952096b2d9b6067253518a1212917d41acd6b527a32395420aab73

See more details on using hashes here.

File details

Details for the file fatush-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: fatush-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 8.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.6.1 CPython/3.11.5 Darwin/23.2.0

File hashes

Hashes for fatush-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 15e6a7e224cb1d373c7acbe5766315d11f4b866197bc278caed6f8fb3326a15b
MD5 96264d6a9c7b5183d9f5dad50f1b786d
BLAKE2b-256 43b72b0736ddd0cec8082e8de0fb83e630ceac98be50b388141ebd02ad088e9d

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page