Open world survival game for reinforcement learning.

Status: Stable release

Crafter

Open world survival environment for reinforcement learning.

If you find this code useful, please cite it in your paper:

@misc{hafner2021crafter,
  title = {Crafter: An Open World Survival Benchmark},
  author = {Danijar Hafner},
  year = {2021},
  howpublished = {\url{https://github.com/danijar/crafter}},
}

Overview

Crafter is an open world survival game with visual inputs that evaluates a variety of general abilities within a single environment. It features randomly generated open-ended worlds where the player discovers resources and builds tools, all while struggling for survival by finding water, foraging for food, and building shelter to sleep. Crafter aims to be a fruitful benchmark for reinforcement learning research through these features:

  • Research challenges: Procedural generation tests strong generalization, the technology tree tests wide and deep exploration, image observations test representation learning, and hierarchical reasoning can help with repeated subtasks and long-term credit assignment.

  • Meaningful evaluation: Agents are evaluated by a list of achievements they can unlock in each episode. The achievements correspond to meaningful milestones in behavior, allowing insights into the spectrum of abilities, both for agents with rewards and unsupervised agents.

  • Computational savings: Crafter evaluates a variety of agent abilities within a single environment, vastly reducing the computational requirements of benchmark suites that require training on many different environments, while making it more likely that the measured performance is representative of new domains.

Play Yourself

You can play the game yourself with an interactive window and keyboard input. The mapping from keys to actions, health level, and inventory state are printed to the terminal.

pip3 install crafter        # Install Crafter
pip3 install pygame         # Needed for human interface
python3 -m crafter.run_gui  # Start the game

Additional command line options are available for recording the games (--record directory), changing the window resolution (--window 600 600), and pausing the game between key presses (--wait True).

Training Agents

Installation: pip3 install -U crafter

The environment follows the OpenAI Gym interface:

import crafter

env = crafter.Env(seed=0)
obs = env.reset()
assert obs.shape == (64, 64, 3)

done = False
while not done:
  action = env.action_space.sample()
  obs, reward, done, info = env.step(action)
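
The loop above can be wrapped in a small helper that also accumulates the episode return. This is a minimal sketch that assumes only the four-value `step` signature shown above; `run_episode`, `policy`, and `max_steps` are hypothetical names and not part of Crafter.

```python
def run_episode(env, policy=None, max_steps=None):
  """Roll out one episode and return (total_reward, num_steps, last_info).

  Assumes a Gym-style environment whose step() returns
  (obs, reward, done, info). `policy` is a hypothetical callable that maps
  an observation to an action; when omitted, actions are sampled from
  env.action_space.
  """
  obs = env.reset()
  total_reward, num_steps, info = 0.0, 0, {}
  done = False
  while not done:
    if policy is None:
      action = env.action_space.sample()
    else:
      action = policy(obs)
    obs, reward, done, info = env.step(action)
    total_reward += reward
    num_steps += 1
    # Optional early stop, independent of the environment's own time limit.
    if max_steps is not None and num_steps >= max_steps:
      break
  return total_reward, num_steps, info
```

With Crafter installed, a random rollout would then be `total, steps, info = run_episode(crafter.Env(seed=0))`.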

Environment Details

Constructor

To ensure comparability across research papers, we recommend using the environment in its default configuration. Nonetheless, the environment can be configured via its constructor:

crafter.Env(area=(64, 64), view=(9, 9), size=(64, 64), length=10000, seed=None)

Parameter  Default   Description
area       (64, 64)  Size of the world in grid cells.
view       (9, 9)    Layout size in cells; determines view distance.
size       (64, 64)  Render size of the images in pixels.
length     10000     Time limit for the episode, can be None.
seed       None      Integer that determines world generation and creatures.
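
One way to keep non-default configurations explicit is to start from the documented defaults and override only what an experiment needs. The sketch below is illustrative: `DEFAULTS` mirrors the constructor signature above, while `env_kwargs` is a hypothetical helper, not something Crafter provides.

```python
# Defaults copied from the constructor signature documented above.
DEFAULTS = {
  'area': (64, 64),
  'view': (9, 9),
  'size': (64, 64),
  'length': 10000,
  'seed': None,
}


def env_kwargs(**overrides):
  """Return constructor keyword arguments, rejecting undocumented names."""
  unknown = sorted(set(overrides) - set(DEFAULTS))
  if unknown:
    raise TypeError(f'Unknown Crafter option(s): {unknown}')
  return {**DEFAULTS, **overrides}


# Example: larger rendered images and a fixed seed, everything else default.
kwargs = env_kwargs(size=(128, 128), seed=42)
# With Crafter installed: env = crafter.Env(**kwargs)
```

Results obtained with overrides like these are not directly comparable to runs in the default configuration, which remains the recommended setup.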

Reward

The reward can either be given to the agent or used as a proxy metric for evaluating unsupervised agents.

The reward is +1 when the agent unlocks a new achievement, -0.1 when its health level decreases, +0.1 when it increases, and 0 for all other time steps. The achievements are as follows:

  • collect_coal
  • collect_diamond
  • collect_drink
  • collect_iron
  • collect_sapling
  • collect_stone
  • collect_wood
  • defeat_skeleton
  • defeat_zombie
  • eat_cow
  • eat_plant
  • make_iron_pickaxe
  • make_iron_sword
  • make_stone_pickaxe
  • make_stone_sword
  • make_wood_pickaxe
  • make_wood_sword
  • place_furnace
  • place_plant
  • place_stone
  • place_table
  • wake_up

The sum of rewards per episode can range from -0.9 (losing all health without any achievements) to 22 (unlocking all achievements and keeping or restoring all health until the time limit is reached). A score of 21.1 or higher means that all achievements have been unlocked.
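
The bounds above follow from simple arithmetic, sketched here. Two assumptions are not stated in the text: that the achievement and health terms add up, and that the agent starts at its maximum of 9 health points (inferred from the stated minimum of -0.9). `episode_return` is a hypothetical helper for illustration only.

```python
NUM_ACHIEVEMENTS = 22     # Length of the achievement list above.
ACHIEVEMENT_REWARD = 1.0  # Reward for each newly unlocked achievement.
HEALTH_REWARD = 0.1       # Magnitude per health increase or decrease.
START_HEALTH = 9          # Assumption, inferred from the -0.9 minimum.


def episode_return(num_unlocked, health_ups, health_downs):
  """Sum of rewards for an episode, assuming the reward terms are additive."""
  total = ACHIEVEMENT_REWARD * num_unlocked
  total += HEALTH_REWARD * (health_ups - health_downs)
  return round(total, 1)


# No achievements and all health lost.
worst = episode_return(0, 0, START_HEALTH)
# Every achievement unlocked and health back at its starting level.
best = episode_return(NUM_ACHIEVEMENTS, 0, 0)
# Every achievement unlocked, then all health lost.
all_unlocked_then_died = episode_return(NUM_ACHIEVEMENTS, 0, START_HEALTH)
# One achievement missing, with no net health loss.
most_without_all = episode_return(NUM_ACHIEVEMENTS - 1, 0, 0)
```

Under these assumptions, an episode with at most 21 achievements cannot exceed a return of 21, while an episode with all 22 cannot fall below 21.1, which is why 21.1 works as the threshold.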

Termination

The episode terminates when the health points of the agent reach zero. Episodes also end when reaching a time limit, which is 10000 steps by default.

Observation Space

Each observation is an RGB image that shows a local view of the world around the player, as well as the life statistics and inventory state of the agent.

Action Space

The action space is categorical. Each action is an integer index representing one of the possible actions:

Integer  Name                Requirement
0        noop                Always applicable.
1        move_left           Flat ground to the left of the agent.
2        move_right          Flat ground to the right of the agent.
3        move_up             Flat ground above the agent.
4        move_down           Flat ground below the agent.
5        do                  Facing creature or material and have necessary tool.
6        sleep               Energy level is below maximum.
7        place_stone         Stone in inventory.
8        place_table         Wood in inventory.
9        place_furnace       Stone in inventory.
10       place_plant         Sapling in inventory.
11       make_wood_pickaxe   Nearby table. Wood in inventory.
12       make_stone_pickaxe  Nearby table. Wood, stone in inventory.
13       make_iron_pickaxe   Nearby table, furnace. Wood, coal, iron in inventory.
14       make_wood_sword     Nearby table. Wood in inventory.
15       make_stone_sword    Nearby table. Wood, stone in inventory.
16       make_iron_sword     Nearby table, furnace. Wood, coal, iron in inventory.
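
For scripted policies or debugging, it can help to refer to actions by name rather than by integer. The lookup below is a sketch built only from the table above; `ACTION_NAMES`, `ACTION_INDEX`, and `action_id` are hypothetical names defined here, and no claim is made about what the environment itself exposes.

```python
# Action names in index order, copied from the table above.
ACTION_NAMES = [
  'noop', 'move_left', 'move_right', 'move_up', 'move_down', 'do', 'sleep',
  'place_stone', 'place_table', 'place_furnace', 'place_plant',
  'make_wood_pickaxe', 'make_stone_pickaxe', 'make_iron_pickaxe',
  'make_wood_sword', 'make_stone_sword', 'make_iron_sword',
]
ACTION_INDEX = {name: index for index, name in enumerate(ACTION_NAMES)}


def action_id(name):
  """Translate an action name into the integer passed to env.step()."""
  return ACTION_INDEX[name]
```

For example, `env.step(action_id('place_table'))` would attempt to place a table, which only has an effect when the requirement in the table is met.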

Info Dictionary

The step function returns an info dictionary with additional information about the environment state. It can be used for evaluation and debugging but should not be provided to the agent. The following entries are available:

Key           Type      Description
inventory     dict      Mapping from item names to inventory counts.
achievements  dict      Mapping from achievement names to their counts.
discount      float     1 during the episode and 0 at the last step.
semantic      np.array  Categorical representation of the world.
player_pos    tuple     X and Y position of the player in the world.
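
As an illustration of using the `achievements` entry for evaluation, the sketch below summarizes which achievements were unlocked across several episodes. It assumes only the documented mapping from names to counts; `unlocked` and `success_rates` are hypothetical helpers, and the sample dictionaries are made up.

```python
from collections import Counter


def unlocked(info):
  """Names of achievements with a positive count in one info dictionary."""
  return {name for name, count in info['achievements'].items() if count > 0}


def success_rates(final_infos):
  """Percentage of episodes in which each achievement was unlocked."""
  totals = Counter()
  for info in final_infos:
    totals.update(unlocked(info))  # Counts each name once per episode.
  episodes = len(final_infos)
  return {name: 100.0 * n / episodes for name, n in totals.items()}


# Hypothetical final-step info dictionaries from two episodes.
infos = [
  {'achievements': {'collect_wood': 3, 'place_table': 1}},
  {'achievements': {'collect_wood': 1, 'place_table': 0}},
]
rates = success_rates(infos)
```

Here `rates` maps `collect_wood` to 100.0 and `place_table` to 50.0. Achievements that were never unlocked are simply absent from the result.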

Baselines

Crafter is designed to be challenging for current learning algorithms but not completely out of reach. To verify how challenging the environment is, we trained the DreamerV2 agent on Crafter with rewards. We recommend training for 5M environment steps and reporting the mean score. During this time, the agent makes consistent learning progress.

Even when trained for over 10 times longer, the agent only rarely unlocks all achievements within a single episode, including finding a diamond. The open research challenge ahead of us is to drastically accelerate exploration and learning progress and to increase the average score by consistently unlocking all achievements.

Questions

Please open an issue on GitHub.

