A text processing package with various scenarios and checkers.
Project description
CrafText
CrafText is an extension of the Craftax environment (https://github.com/MichaelTMatthews/Craftax). It makes the environment goal-oriented: the agent's objectives are defined by natural language instructions. The extension includes:
- A set of scenarios: These represent the possible goals an agent might have.
- A set of instructions: These are descriptions of the goals. Each goal can have multiple descriptive variants.
- A set of scenario completion checks: Code corresponding to a specific scenario that takes the agent's state as input and returns a boolean value indicating whether the agent has successfully achieved the goal.
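To make the three components concrete, here is a minimal sketch of how a scenario, its instruction variants, and its completion check could fit together. The `AgentState` class, its `inventory` field, the `Scenario` container, and the `collect_wood` goal are all hypothetical stand-ins invented for illustration; the actual CrafText checkers operate on the Craftax environment state and may be structured differently.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AgentState:
    """Hypothetical, simplified stand-in for the environment state."""
    inventory: Dict[str, int] = field(default_factory=dict)


@dataclass
class Scenario:
    """One goal: its instruction variants plus a completion check."""
    name: str
    instructions: List[str]
    check: Callable[[AgentState], bool]


# Illustrative scenario: the goal is reached once the agent holds 3 wood.
collect_wood = Scenario(
    name="collect_wood",
    instructions=[
        "Collect three pieces of wood.",
        "Gather 3 logs from nearby trees.",
    ],
    check=lambda state: state.inventory.get("wood", 0) >= 3,
)

# The check maps a state to a boolean, as described in the list above.
not_done = collect_wood.check(AgentState(inventory={"wood": 1}))
done = collect_wood.check(AgentState(inventory={"wood": 3}))
print(not_done, done)
```

Keeping the check as a plain function of the state means the same scenario can be paired with any number of instruction variants without changing the verification code.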
Installation
- Clone the repository.
- Create a virtual environment and install the dependencies from requirements.txt:
    conda create --name craftext python=3.9
    conda activate craftext
    pip install -r requirements.txt
- Navigate to the repository and install the dataset:
    cd CrafText
    pip install -e .
Run the PPO Baseline
- Navigate to the baselines directory:
    cd baselines
- Run the ppo_with_instruction.py script:
    python ppo_with_instruction.py
- To choose which instructions are used for training, set the --craftext_settings flag. You can specify your own configuration or choose one from the ./craftext/configs directory:
    python ppo_with_instruction.py --craftext_settings simple_build
- Important: The training environment must match the one defined in your dataset configuration. For example, if your configuration file specifies base_environment: Craftax-Classic-Pixels-v1-Text, pass the same name through the --env_name argument:
    python ppo_with_instruction.py --craftext_settings simple_build --env_name "Craftax-Classic-Pixels-v1-Text"
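One way to avoid a mismatch is to derive both arguments from the same configuration values. The sketch below is only an illustration: `build_ppo_command` is a hypothetical helper, not part of CrafText, and it does nothing more than assemble the command line from the two flags mentioned above.

```python
import shlex
from typing import Dict, List


def build_ppo_command(settings_name: str, config: Dict[str, str]) -> List[str]:
    """Assemble the baseline command so --env_name equals base_environment.

    Hypothetical helper: `config` is the already-parsed dataset configuration.
    """
    env_name = config["base_environment"]
    return [
        "python", "ppo_with_instruction.py",
        "--craftext_settings", settings_name,
        "--env_name", env_name,
    ]


config = {"base_environment": "Craftax-Classic-Pixels-v1-Text"}
command = build_ppo_command("simple_build", config)
# shlex.join quotes arguments only when the shell would need it.
command_line = shlex.join(command)
print(command_line)
```

The resulting list can be passed to `subprocess.run` from the baselines directory, or the joined string can be pasted into a shell.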
CrafText dataset configuration file
You can train on a subset of the CrafText dataset by specifying different scenario settings. Examples of predefined configurations can be found in the craftext/configs folder. To create your own configuration, define four fields in a YAML file:
- dataset_key: Specifies the scenario name. This can be the full name of a specific scenario, such as build_square, which will load all tasks involving square structures. Alternatively, you can use broader names like build to load all tasks where the agent is required to build something.
- subset_key: Defines the complexity of the instructions. Available options include:
  - ONE: Simple one-step tasks.
  - EASY: Relatively simple instructions.
  - MEDIUM: Tasks with moderate complexity.
  Choose the appropriate subset based on the training difficulty you want.
- base_environment: The environment to use during training. For example:
  - Craftax-Classic-Pixels-v1-Text: the classic Craftax environment with pixel-based visuals and text instructions.
  Ensure that this matches the environment your task is designed for.
- use_paraphrases: A boolean field (True or False). Set this to True if you want to include paraphrased instructions in your training process, or False if you prefer using only the original instructions.
Example YAML configuration
dataset_key: build_square
subset_key: EASY
base_environment: Craftax-Classic-Pixels-v1-Text
use_paraphrases: True
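As a rough illustration of the four fields, the following sketch validates a configuration that has already been loaded into a dictionary. The `validate_config` function and its error messages are hypothetical. The allowed `subset_key` values are taken from the list above; CrafText itself may validate configurations differently or accept other values.

```python
from typing import Any, Dict

REQUIRED_FIELDS = ("dataset_key", "subset_key", "base_environment", "use_paraphrases")
# Subset names listed in this document; the real package may accept others.
KNOWN_SUBSETS = {"ONE", "EASY", "MEDIUM"}


def validate_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Check that the four documented fields are present and well-typed."""
    missing = [name for name in REQUIRED_FIELDS if name not in config]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    if config["subset_key"] not in KNOWN_SUBSETS:
        raise ValueError(f"unknown subset_key: {config['subset_key']!r}")
    if not isinstance(config["use_paraphrases"], bool):
        raise ValueError("use_paraphrases must be True or False")
    return config


# Mirrors the example YAML configuration after parsing.
example = validate_config({
    "dataset_key": "build_square",
    "subset_key": "EASY",
    "base_environment": "Craftax-Classic-Pixels-v1-Text",
    "use_paraphrases": True,
})
print(example["dataset_key"])
```

Failing early on a missing or misspelled field is usually cheaper than discovering the problem after a training run has started.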
Alternative Configuration Using Environment Variables
Instead of specifying the configuration in a YAML file, you can use the CRAFTEXT_SETTINGS environment variable for simpler setups. The format for this variable is as follows:
<scenario> && <instruction_type> && <subset>
Where:
- <scenario>: The specific scenario or task type to use (e.g., build_line, collect_items).
- <instruction_type>: Choose between pure_instruction for the original set of instructions or instruction_with_paraphrases for instructions with variations.
- <subset>: Select the subset to train on, such as small_train for simpler instructions, or another custom subset.
Example usage
#!/bin/bash
export CRAFTEXT_SETTINGS="build_line&&pure_instruction&&small_train"
export CRAFTEXT_SETTINGS="build_square&&instruction_with_paraphrases&&medium"
In these examples (use one export at a time; if both lines run, the second value overrides the first):
- The first setting configures training to use tasks related to building lines with original instructions from the small_train subset.
- The second setting loads square building tasks, using paraphrased instructions from the medium subset.
This method provides flexible control over the dataset, allowing you to adjust scenarios, instruction types, and subsets on the fly without needing to modify YAML files.
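The three-part format can be split with a few lines of Python. This is a minimal sketch, not the package's own parser: `CraftextSettings` and `parse_settings` are hypothetical names, and stripping whitespace around `&&` is an assumption made so that both the spaced form of the format line and the unspaced form of the examples are accepted.

```python
import os
from typing import NamedTuple


class CraftextSettings(NamedTuple):
    scenario: str
    instruction_type: str
    subset: str


def parse_settings(raw: str) -> CraftextSettings:
    """Split '<scenario> && <instruction_type> && <subset>' into its parts."""
    parts = [part.strip() for part in raw.split("&&")]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected three '&&'-separated fields, got: {raw!r}")
    return CraftextSettings(*parts)


# Read the variable if set; the fallback is one of the examples above.
raw_value = os.environ.get(
    "CRAFTEXT_SETTINGS", "build_line&&pure_instruction&&small_train"
)
settings = parse_settings(raw_value)
print(settings)
```

Returning a named tuple keeps the three fields addressable by name (`settings.scenario`, `settings.subset`) while still comparing equal to a plain tuple.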
Dataset Generation Details
Instruction and Checker Generation Pipeline
- Come up with the scenario.
- Use the standard checker functions and scenario format to write the code for verifying the scenario. See the examples (https://github.com/ZoyaV/CrafText/blob/main/checkers/scenarius.py).
- Use the Instruction Generation Prompt and AskTheCode (ChatGPT 4o) to create examples of scenario instructions.
Instruction Generation Prompt
The code for verifying played scenarios can be found at the following repository link:
https://github.com/ZoyaV/CrafText/blob/main/checkers/scenarius.py
A scenario consists of instructions provided by Player 1 to Player 2. Player 2 follows these instructions, which are then validated by a corresponding function. For the scenario.py function, please provide realistic examples of instructions that Player 1 might give, along with 5 paraphrases for each.
Requirements:
- When specifying target objects (objects with which the player will interact), use different synonyms in the paraphrases to assess Player 2's vocabulary range.
- Present the target objects in varying orders to evaluate how well Player 2 understands different language structures.
- Sort the paraphrases for each instruction from simplest to most complex language.
- Ensure the instructions are as varied as possible, utilizing a broad vocabulary.
Format your answer as a Python dictionary with the following structure:
instructions = {
    'instruction_id': {
        'instruction': "Example instruction here",
        'instruction_paraphrases': [
            "Paraphrase 1 here",
            "Paraphrase 2 here",
            "Paraphrase 3 here",
            "Paraphrase 4 here",
            "Paraphrase 5 here"
        ],
        'check_lambda': lambda ...: scenario_function(...): ... # Example usage of the function
    }
}
Replace instruction_id with a unique identifier for each instruction, and complete the check_lambda to demonstrate how you would verify the given instruction using the function.
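For reference, a filled-in entry in this format might look like the sketch below. Everything specific in it is an assumption made for illustration: `build_square_done`, the `squares_built` state key, the `build_square_2` identifier, and the instruction texts are hypothetical and are not taken from the CrafText dataset or its checkers. The `validate_entry` helper is likewise hypothetical and only checks the structure the prompt asks for.

```python
from typing import Any, Dict


def build_square_done(state: Dict[str, Any], size: int) -> bool:
    """Hypothetical checker: True once a square of the given size exists."""
    return size in state.get("squares_built", [])


instructions = {
    "build_square_2": {
        "instruction": "Build a 2x2 square out of stone.",
        # Ordered from simplest to most complex wording, with varied synonyms.
        "instruction_paraphrases": [
            "Make a 2x2 stone square.",
            "Place stone blocks in a 2 by 2 square.",
            "Using rocks, lay out a square two blocks wide.",
            "Arrange four stone blocks into a compact square.",
            "Construct a two-by-two quadrilateral formation from cobblestone.",
        ],
        "check_lambda": lambda state: build_square_done(state, size=2),
    }
}


def validate_entry(entry: Dict[str, Any]) -> bool:
    """Check the requested structure: text, five paraphrases, callable check."""
    return (
        isinstance(entry.get("instruction"), str)
        and len(entry.get("instruction_paraphrases", [])) == 5
        and callable(entry.get("check_lambda"))
    )


all_valid = all(validate_entry(entry) for entry in instructions.values())
solved = instructions["build_square_2"]["check_lambda"]({"squares_built": [2]})
print(all_valid, solved)
```

A structural check like this can be run over generated output before it is added to the dataset, since a language model may return too few paraphrases or omit the checker.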
File details
Details for the file craftext-0.1.4.tar.gz.
File metadata
- Download URL: craftext-0.1.4.tar.gz
- Upload date:
- Size: 5.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3c6a7ab54bdb576996b09f1238c8e4bccf323b7c23eecb60fda309304417cdd0 |
| MD5 | db244108942605add3a51b3fff547234 |
| BLAKE2b-256 | 72f5f6bbb88f9c0f96dd236b885b1d1194a57f10943c40af49e9f42eb3760a1a |
File details
Details for the file craftext-0.1.4-py3-none-any.whl.
File metadata
- Download URL: craftext-0.1.4-py3-none-any.whl
- Upload date:
- Size: 4.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e4bf81a7a121c7e1e1f1ae2f2850192c819c4ba8ad6eacba2df35c928cc76179 |
| MD5 | d31299fee4efff11d717b89c24e9293a |
| BLAKE2b-256 | 713968213d15096fc4a318520332eac235fb94226e72b2cfb4d998d409e2fed3 |