Gemma Template is a lightweight Python library for generating templates to fine-tune models like Gemma, LLaMA, and others. It supports multilingual frameworks, offers advanced customization, and ensures precise, dynamic template creation.

Gemma Template

This library was developed for the Kaggle challenge: Google - Unlocking Global Communication with Gemma, sponsored by Google.

Credit Requirement

Important: If you are a participant in the competition and wish to use this source code in your submission, you must clearly credit the original author before the competition's end date, January 14, 2025.

Please include the following information in your submission:

Author: Tu Pham
Kaggle Username: [bigfishdev](https://www.kaggle.com/bigfishdev)
GitHub: [https://github.com/thewebscraping/gemma-template/](https://github.com/thewebscraping/gemma-template)
LinkedIn: [https://www.linkedin.com/in/thetwofarm](https://www.linkedin.com/in/thetwofarm)

Overview

Gemma Template is a lightweight and efficient Python library for generating templates to fine-tune models and craft prompts. Designed for flexibility, it seamlessly supports Gemma, LLaMA, and other language frameworks, offering fast, user-friendly customization. With multilingual capabilities and advanced configuration options, it ensures precise, professional, and dynamic template creation.

Learning Process and Acknowledgements

As a newbie, I created Gemma Template based on what I read and learned from the following sources:

Google Cookbook: Advanced Prompting Techniques
Google Cookbook: Finetune_with_LLaMA_Factory
Google Cookbook: Finetuning Gemma for Function Calling
Alpaca: Alpaca LoRA Documentation
Unsloth: Finetune Llama 3.2, Mistral, Phi-3.5, Qwen 2.5 & Gemma 2-5x faster with 80% less memory!

Gemma Template supports exporting dataset files in three formats: Text, Alpaca, and GPT conversations.
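To make the three formats concrete, here is a minimal sketch of what one record might look like in each. The field names follow the widely used Alpaca (`instruction`/`input`/`output`) and ShareGPT-style (`conversations`) conventions; the helper names `to_text`, `to_alpaca`, and `to_gpt` are hypothetical, and the exact keys Gemma Template emits are an assumption, not taken from its source.

```python
import json


def to_text(prompt: str, response: str) -> dict:
    """One training string wrapped in Gemma turn markers (simplified layout)."""
    text = (
        f"<start_of_turn>user\n{prompt}<end_of_turn>\n"
        f"<start_of_turn>model\n{response}<end_of_turn>"
    )
    return {"text": text}


def to_alpaca(instruction: str, input_text: str, response: str) -> dict:
    """Alpaca-style record: instruction, optional input, expected output."""
    return {"instruction": instruction, "input": input_text, "output": response}


def to_gpt(prompt: str, response: str) -> dict:
    """ShareGPT-style record: a list of alternating human/gpt messages."""
    return {
        "conversations": [
            {"from": "human", "value": prompt},
            {"from": "gpt", "value": response},
        ]
    }


record = to_gpt("Rewrite this title.", "Gemma open models")
print(json.dumps(record, indent=2))
```

Which format to pick usually depends on the fine-tuning tool: some trainers read a single text field, while others expect instruction or conversation records.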

Multilingual Content Writing Assistant

This writing assistant is a multilingual professional writer specializing in crafting structured, engaging, and SEO-optimized content. It enhances text readability, aligns with linguistic nuances, and preserves original context across various languages.

Key Features:

1. Creative and Engaging Rewrites

Transforms input text into captivating and reader-friendly content.
Utilizes vivid imagery and descriptive language to enhance engagement.

2. Advanced Text Analysis

Processes text with unigrams, bigrams, and trigrams to understand linguistic patterns.
Ensures language-specific nuances and cultural integrity are preserved.

3. SEO-Optimized Responses

Incorporates keywords naturally to improve search engine visibility.
Aligns rewritten content with SEO best practices for discoverability.

4. Professional and Multilingual Expertise

Full support for creating templates in local languages.
Supports multiple languages with advanced prompting techniques.
Enhances vocabulary and grammar with a unigram, bigram, and trigram instruction template.
Supports masking (hiding) words in the input text.
Adapts tone and style to maintain professionalism and clarity.
Full documentation with easy-to-configure prompts and examples.

5. Customize Advanced Response Structure and Dataset Format

Supports advanced customization of the response structure format.
Compatible with other models such as LLaMA.
Enhances dynamic prompts using Round-Robin loops.
Outputs multiple formats such as Alpaca, GPT, and SFT text.
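The source does not describe how the Round-Robin loops are implemented, so the following is only a sketch of the general idea: cycling through several label variants so that consecutive prompts do not all use the same wording. The label lists and the `next_structure` helper are hypothetical.

```python
from itertools import cycle

# Hypothetical label variants; each cycle restarts once its list is exhausted.
title_labels = cycle(["Title", "Headline", "Heading"])
tag_labels = cycle(["Tags", "Keywords"])


def next_structure() -> dict:
    """Return the next label combination, wrapping around at the end of each list."""
    return {"title": next(title_labels), "tags": next(tag_labels)}


# Four consecutive prompts get four different label combinations.
structures = [next_structure() for _ in range(4)]
for structure in structures:
    print(structure)
```

Because the two lists have different lengths, the combinations only repeat after six prompts, which adds variety without any randomness.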

Installation

To install the library, you can choose between two methods:

1. Install via PyPI:

pip install gemma-template

2. Install via GitHub Repository:

pip install git+https://github.com/thewebscraping/gemma-template.git

Quick Start

Start using Gemma Template with just a few lines of code:

from gemma_template.models import StructureField, Template

# Customize the labels used in the response structure.
template_instance = Template(
    structure_field=StructureField(
        title=["Custom Title"],
        description=["Custom Description"],
        document=["Custom Article"],
        main_points=["Custom Main Points"],
        categories=["Custom Categories"],
        tags=["Custom Tags"],
    ),
)

response = template_instance.template(
    title="Gemma open models",
    description="Gemma: Introducing new state-of-the-art open models.",
    document="Gemma open models are built from the same research and technology as Gemini models. Gemma 2 comes in 2B, 9B and 27B and Gemma 1 comes in 2B and 7B sizes.",
    main_points=["Main point 1", "Main point 2"],
    categories=["Artificial Intelligence", "Gemma"],
    tags=["AI", "LLM", "Google"],
    output="A new family of open language models demonstrating strong performance across academic benchmarks for language understanding, reasoning, and safety.",
    max_hidden_words=.1,  # Set to 0 if you don't want to hide words.
    min_chars_length=2,  # Minimum number of characters in a word used to build unigrams, bigrams, and trigrams. Default is 2.
    max_chars_length=0,  # Maximum number of characters in a word used to build unigrams, bigrams, and trigrams. Default is 0.
)  # Remove any kwargs you don't need.
print(response)

Output:

<start_of_turn>user

You are a multilingual professional writer.

Rewrite the text with a more engaging and creative tone. Use vivid imagery, descriptive language, and a conversational style to captivate the reader.

# Role:
You are a highly skilled professional content writer, linguistic analyst, and multilingual expert specializing in structured writing and advanced text processing.

# Task:
Your primary objectives are:
1. Your primary task is to rewrite the provided content into a more structured, professional format that maintains its original intent and meaning.
2. Enhance vocabulary comprehension by analyzing text with unigrams (single words), bigrams (two words), and trigrams (three words).
3. Ensure your response adheres strictly to the prescribed structure format.
4. Respond in the primary language of the input text unless alternative instructions are explicitly given.

# Additional Expectations:
1. Provide a rewritten, enhanced version of the input text, ensuring professionalism, clarity, and improved structure.
2. Focus on multilingual proficiency, using complex vocabulary, grammar to improve your responses.
3. Preserve the context and cultural nuances of the original text when rewriting.

Topics: Artificial Intelligence, Gemma
Keywords: AI, LLM, Google

# Text Analysis:
Example 1: Unigrams (single words)
and => English
built => English
from => English
the => English
research => English
Text Analysis 3: These are common English words, indicating the text is in English.

Example 2: Bigrams (two words)
technology as => English
Text Analysis 2: Frequent bigrams in Vietnamese confirm the language context.

Example 3: Trigrams (three words)
technology as Gemini => English
Text Analysis 3: Trigrams further validate the linguistic analysis and the necessity to respond in English.

# Conclusion of Text Analysis:
The linguistic analysis confirms the text is predominantly in English. Consequently, the response should be structured and written in English to align with the original text and context.

# Response Structure Format:
You must follow the response structure:
**Custom Title (Title):** Rewrite the title to make it concise, memorable, and optimized for SEO.
**Custom Description (Description):** Write description of the article in one or two sentences while focusing on reader benefits and engage curiosity.
**Custom Article (Article):** Rewrite this content to be SEO-friendly. Include relevant tags, optimize the title and subheadings, and ensure the text flows naturally for search engines and readers.
**Custom Main Points (Main Points):** Simplify the original key points to make them clearer and more reader-friendly.
**Custom Categories (Categories):** Assign appropriate categories to the article based text or target audience.
**Custom Tags (Tags):** Create tags to include relevant keywords. Ensure the tags align with popular search queries.

By adhering to this format, the response will maintain linguistic integrity while enhancing professionalism, structure and alignment with user expectations.

# Text:
Gemma open models are built from _____ same research _____ technology as Gemini models. Gemma 2 comes in 2B, 9B _____ 27B and Gemma 1 comes in 2B and 7B sizes.

<end_of_turn>
<start_of_turn>model

## **Custom Title**:
### Gemma open models

## **Custom Description**:
Gemma: Introducing new state-of-the-art open models.

## **Custom Article**:
A new family of open language models demonstrating strong performance across academic benchmarks for language understanding, reasoning, and safety.

## **Custom Main Points**:
- Main point 1
- Main point 2

## **Custom Categories**:
- Artificial Intelligence
- Gemma

## **Custom Tags**:
- AI
- LLM
- Google<end_of_turn>
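The "# Text Analysis" part of the prompt above lists unigrams, bigrams, and trigrams taken from the input document. As a rough illustration of how such n-grams can be extracted, here is a minimal sketch. The `ngrams` helper is hypothetical, and treating `max_chars_length=0` as "no upper limit" is an assumption based on the parameter comments, not on the library's code.

```python
from collections import Counter


def ngrams(text: str, n: int, min_chars_length: int = 2, max_chars_length: int = 0) -> Counter:
    """Count n-grams of whitespace-separated words.

    Words shorter than min_chars_length are skipped.
    max_chars_length=0 is treated as "no upper limit" (an assumption).
    """
    words = [w.strip(".,;:!?") for w in text.split()]
    words = [
        w for w in words
        if len(w) >= min_chars_length
        and (max_chars_length == 0 or len(w) <= max_chars_length)
    ]
    # Slide a window of size n over the filtered word list.
    return Counter(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))


text = "Gemma open models are built from the same research and technology as Gemini models."
unigrams = ngrams(text, 1)
bigrams = ngrams(text, 2)
trigrams = ngrams(text, 3)
print(unigrams["models"], bigrams["technology as"], trigrams["technology as Gemini"])
```

Filtering by word length before building the n-grams keeps single-character tokens out of the analysis.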

Load Dataset

`load_dataset` returns a Hugging Face Dataset or DatasetDict object containing the processed prompts.

Load Dataset from local file path

from gemma_template.models import Template

prompt_instance = Template()
data_dict = [
    {
        "id": "JnZJolR76_u2",
        "title": "Sample title",
        "description": "Sample description",
        "document": "Sample document",
        "categories": ["Topic 1", "Topic 2"],
        "tags": ["Tag 1", "Tag 2"],
        "output": "Sample output",
        "main_points": ["Main point 1", "Main point 2"],
    }
]
dataset = prompt_instance.load_dataset(data_dict, output_format='text')   # enum: text, gpt, alpaca
print(dataset['text'][0])

Load Dataset from HuggingFace

# `gemma_template` is not defined in this snippet; it is assumed here to be a Template
# instance (like `prompt_instance` above). INSTRUCTION_TEMPLATE and STRUCTURE_TEMPLATE
# are assumed to be template strings you have defined.
dataset = gemma_template.load_dataset(
    "your_huggingface_dataset",
    # One of `text`, `alpaca`, or `gpt`.
    output_format='text',
    # Template for the instruction part of the user prompt.
    instruction_template=INSTRUCTION_TEMPLATE,
    # Template for structuring the user prompt.
    structure_template=STRUCTURE_TEMPLATE,
    # Fraction of documents to apply word masking to.
    # Min: 0, Max: 1. Default: 0.
    max_hidden_ratio=.1,
    # Replace 10% of the words in the input document with '_____'.
    # Use an int to mask an exact number of words. `max_hidden_ratio` must be greater than 0.
    max_hidden_words=.1,
    # Minimum number of characters in a word used to build unigrams, bigrams, and trigrams. Default is 2.
    min_chars_length=2,
    # Maximum number of characters in a word used to build unigrams, bigrams, and trigrams. Default is 0.
    max_chars_length=8,
)
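The `max_hidden_words` parameter accepts either a fraction or an exact count. The sketch below shows one way such masking could work, assuming words are picked uniformly at random. The `mask_words` helper and its `seed` argument are hypothetical, and the library's actual word-selection strategy is not documented here.

```python
import random


def mask_words(text: str, max_hidden_words=0.1, mask: str = "_____", seed=None) -> str:
    """Replace some words in text with a mask token.

    A float is treated as a fraction of the words; an int as an exact count.
    """
    words = text.split()
    if isinstance(max_hidden_words, float):
        count = int(len(words) * max_hidden_words)
    else:
        count = int(max_hidden_words)
    # Clamp so we never try to hide more words than the text contains.
    count = max(0, min(count, len(words)))
    rng = random.Random(seed)
    hidden = set(rng.sample(range(len(words)), count))
    return " ".join(mask if i in hidden else w for i, w in enumerate(words))


sample = "one two three four five six seven eight nine ten"
masked_fraction = mask_words(sample, max_hidden_words=0.2, seed=0)
masked_count = mask_words(sample, max_hidden_words=3, seed=0)
print(masked_fraction)
print(masked_count)
```

Passing a fixed `seed` makes the masking reproducible, which helps when comparing dataset builds.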

Release history

1.0.1: Jan 11, 2025
1.0.0: Jan 3, 2025
0.1.8: Jan 1, 2025
0.1.7: Jan 1, 2025
0.1.6: Dec 30, 2024
0.1.5: Dec 30, 2024 (this version)
0.1.4: Dec 29, 2024
0.1.3: Dec 29, 2024
0.1.2: Dec 29, 2024
0.1.1: Dec 29, 2024
0.1.0: Dec 28, 2024

Download files

Source Distribution

gemma_template-0.1.5.tar.gz (32.9 kB), uploaded Dec 30, 2024, Source

Built Distribution

gemma_template-0.1.5-py3-none-any.whl (31.4 kB), uploaded Dec 30, 2024, Python 3

File details

Details for the file gemma_template-0.1.5.tar.gz.

File metadata

Download URL: gemma_template-0.1.5.tar.gz
Upload date: Dec 30, 2024
Size: 32.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.9.20

File hashes

Hashes for gemma_template-0.1.5.tar.gz
Algorithm	Hash digest
SHA256	`4ce946e0cdb81cbfe5f5ffee995d332c8db0f85824eae01b0a2eb8603c7b75b6`
MD5	`5819c13c961372645e5d618be1e15afb`
BLAKE2b-256	`c637ff8d27a82023db77d336ab9e5314a7d47fa1bb3ae6b8c52c2aafdd5a0ecf`

File details

Details for the file gemma_template-0.1.5-py3-none-any.whl.

File metadata

Download URL: gemma_template-0.1.5-py3-none-any.whl
Upload date: Dec 30, 2024
Size: 31.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.9.20

File hashes

Hashes for gemma_template-0.1.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e893433a02ba76be5b76723cda87c018ed647187c32a224b36c39ae2ca17d2cf`
MD5	`1b3ad8edb88ca7eea41092d84d615dc8`
BLAKE2b-256	`786a06432378aea5bf1ac8de44892a704f2251c88180600596f10bde0ce35cc5`
