fly-craft

An efficient goal-conditioned reinforcement learning environment for fixed-wing UAV velocity vector control based on Gymnasium.

Demos

The demo policies are trained with "Iterative Regularized Policy Optimization with Imperfect Demonstrations" (ICML 2024).

Target velocity vector (v, $\mu$, $\chi$) from (200, 0, 0) to (140, -40, -165)

Target velocity vector (v, $\mu$, $\chi$) from (200, 0, 0) to (120, 50, 170)

Installation

Using PyPI

pip install flycraft

From source

git clone https://github.com/GongXudong/fly-craft.git
pip install -e fly-craft

Usage

Basic usage

import gymnasium as gym
import flycraft

env = gym.make('FlyCraft-v0')  # use default configurations
observation, info = env.reset()

for _ in range(500):
    action = env.action_space.sample() # random action
    observation, reward, terminated, truncated, info = env.step(action)

    if terminated or truncated:
        observation, info = env.reset()

env.close()
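
Since FlyCraft is goal-conditioned, it helps to inspect the observation and action spaces before training. A minimal sketch, assuming only the default configuration; goal-conditioned Gymnasium environments typically expose a Dict observation space with observation / achieved_goal / desired_goal keys, but check the printed spaces rather than relying on that convention:

import gymnasium as gym
import flycraft

env = gym.make('FlyCraft-v0')

# Print the spaces; the exact shapes depend on the active configuration.
print(env.observation_space)
print(env.action_space)

observation, info = env.reset(seed=0)  # Gymnasium-style seeding via reset()
env.close()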

Four ways to initialize the environment

# 1. Use default configurations
env = gym.make('FlyCraft-v0')

# 2. Pass configurations through config_file (Path or str);
#    here PROJECT_ROOT_DIR is a placeholder for your project's root directory
env = gym.make('FlyCraft-v0', config_file=PROJECT_ROOT_DIR / "configs" / "NMR.json")

# 3. Pass configurations through custom_config (dict): the default
#    configuration is loaded from the default path, then updated with custom_config
env = gym.make(
    'FlyCraft-v0', 
    custom_config={
        "task": {
            "control_mode": "end_to_end_mode",
        }
    }
)

# 4. Pass configurations through both config_file and custom_config: FlyCraft
#    loads the config from config_file first, then updates it with custom_config
env = gym.make(
    'FlyCraft-v0',
    config_file=PROJECT_ROOT_DIR / "configs" / "NMR.json",
    custom_config={
        "task": {
            "control_mode": "end_to_end_mode",
        }
    }
)
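
In methods 3 and 4 the update is a nested, per-key merge: keys present in custom_config override the corresponding entries of the loaded configuration, while all other entries keep their loaded values. A minimal sketch of these semantics (illustrative only; FlyCraft's actual merge code may differ, and the example values are placeholders):

def deep_update(base: dict, overrides: dict) -> dict:
    """Recursively update nested dicts; values in overrides win."""
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            deep_update(base[key], value)
        else:
            base[key] = value
    return base

default_config = {"task": {"control_mode": "guidance_law_mode", "step_frequence": 10}}
custom_config = {"task": {"control_mode": "end_to_end_mode"}}

print(deep_update(default_config, custom_config))
# -> {'task': {'control_mode': 'end_to_end_mode', 'step_frequence': 10}}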

Configuration Details

The configuration consists of four blocks: task, goal, rewards, and terminations.

task

Configuration of the task and the simulator (see the sketch after this list), including:

  • control_mode Str: which model to train; guidance_law_mode for a guidance-law model, end_to_end_mode for an end-to-end model.
  • step_frequence Int (Hz): simulation frequency.
  • max_simulate_time Int (s): maximum simulation time; max_simulate_time * step_frequence equals the maximum episode length.
  • h0 Int (m): initial altitude of the aircraft.
  • v0 Int (m/s): initial true air speed of the aircraft.
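
For example, a custom_config that overrides only the task block might look like the following sketch; the numeric values are placeholders, not recommended settings:

import gymnasium as gym
import flycraft

# Sketch of a task-block override; the values below are placeholders.
env = gym.make(
    'FlyCraft-v0',
    custom_config={
        "task": {
            "control_mode": "guidance_law_mode",
            "step_frequence": 10,     # Hz
            "max_simulate_time": 40,  # s -> at most 400 steps per episode
            "h0": 5000,               # m, initial altitude
            "v0": 200,                # m/s, initial true air speed
        }
    }
)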

goal

Configuration of the definition and sampling of the desired goal (see the sketch after this list), including:

  • use_fixed_goal Boolean: whether to use a fixed desired goal.
  • goal_v Float (m/s): the true air speed of the fixed desired goal.
  • goal_mu Float (deg): the flight path elevation angle of the fixed desired goal.
  • goal_chi Float (deg): the flight path azimuth angle of the fixed desired goal.
  • sample_random Boolean: when not using a fixed desired goal, whether to sample the desired goal randomly from ([v_min, v_max], [mu_min, mu_max], [chi_min, chi_max]).
  • v_min Float (m/s): the minimum true air speed of the desired goal.
  • v_max Float (m/s): the maximum true air speed of the desired goal.
  • mu_min Float (deg): the minimum flight path elevation angle of the desired goal.
  • mu_max Float (deg): the maximum flight path elevation angle of the desired goal.
  • chi_min Float (deg): the minimum flight path azimuth angle of the desired goal.
  • chi_max Float (deg): the maximum flight path azimuth angle of the desired goal.
  • available_goals_file Str: path to a file of available desired goals. When neither using a fixed goal nor sampling randomly, the desired goal is sampled from this file, a .csv with at least four columns: v, mu, chi, length. The length column indicates whether the goal in that row can be achieved by an expert: the number of steps the expert needs if it can, and 0 if it cannot.
  • sample_reachable_goal Boolean: when sampling desired goals from available_goals_file, whether to sample only goals with length > 0.
  • sample_goal_noise_std Tuple[Float]: a tuple of three floats, the standard deviations of the Gaussian noise added to the true air speed, flight path elevation angle, and flight path azimuth angle of the sampled desired goal.
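
Two hedged sketches of the goal block, assuming the block name matches the heading above: a fixed goal (values taken from the first demo) and random sampling within ranges (the bounds are placeholders):

import gymnasium as gym
import flycraft

# Sketch 1: a fixed desired goal (values match the first demo above).
env = gym.make(
    'FlyCraft-v0',
    custom_config={
        "goal": {
            "use_fixed_goal": True,
            "goal_v": 140.0,    # m/s
            "goal_mu": -40.0,   # deg
            "goal_chi": -165.0, # deg
        }
    }
)

# Sketch 2: sample the desired goal randomly from the given ranges
# (the range bounds below are placeholders).
env = gym.make(
    'FlyCraft-v0',
    custom_config={
        "goal": {
            "use_fixed_goal": False,
            "sample_random": True,
            "v_min": 100.0, "v_max": 300.0,      # m/s
            "mu_min": -85.0, "mu_max": 85.0,     # deg
            "chi_min": -180.0, "chi_max": 180.0, # deg
        }
    }
)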

rewards

Configuration of the rewards (see the sketch after this list), including:

  • dense Dict: The configuration of the dense reward
    • use Boolean: whether to use this reward;
    • b Float: the exponent applied to each reward component;
    • angle_weight Float [0.0, 1.0]: the coefficient of the angle-error component of the reward;
    • angle_scale Float (deg): the scale used to normalize the error in the direction of the velocity vector;
    • velocity_scale Float (m/s): the scale used to normalize the error in true air speed.
  • sparse Dict: The configuration of the sparse reward
    • use Boolean: whether to use this reward;
    • reward_constant Float: the reward received on achieving the desired goal.
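
A sketch of a rewards-block override, assuming the block name matches the heading above; all coefficients are placeholders:

import gymnasium as gym
import flycraft

# Sketch of a rewards-block override; coefficients are placeholders.
env = gym.make(
    'FlyCraft-v0',
    custom_config={
        "rewards": {
            "dense": {
                "use": True,
                "b": 1.0,                # exponent on each reward component
                "angle_weight": 0.5,     # weight of the angle-error component
                "angle_scale": 180.0,    # deg, scale for the direction error
                "velocity_scale": 100.0, # m/s, scale for the speed error
            },
            "sparse": {
                "use": False,
                "reward_constant": 10.0, # reward on reaching the goal
            },
        }
    }
)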

terminations

Configuration of the termination conditions (see the sketch after this list), including:

  • RT Dict: The configuration of the Reach Target termination (used with the non-Markovian reward)
    • use Boolean: whether to use this termination;
    • integral_time_length Integer (s): the number of consecutive seconds during which the accuracy requirements must hold for the goal to count as achieved;
    • v_threshold Float (m/s): the error band used to decide whether the true air speed meets the requirement;
    • angle_threshold Float (deg): the error band used to decide whether the direction of the velocity vector meets the requirement;
    • termination_reward Float: the reward the agent receives when triggering RT.
  • RT_SINGLE_STEP Dict: The configuration of the Reach Target termination (used with the Markovian reward)
    • use Boolean: whether to use this termination;
    • v_threshold Float (m/s): the error band used to decide whether the true air speed meets the requirement;
    • angle_threshold Float (deg): the error band used to decide whether the direction of the velocity vector meets the requirement;
    • termination_reward Float: the reward the agent receives when triggering RT_SINGLE_STEP.
  • C Dict: The configuration of the Crash termination
    • use Boolean: whether to use this termination;
    • h0 Float (m): the altitude threshold below which this termination triggers;
    • is_termination_reward_based_on_steps_left Boolean: whether to compute the reward (penalty) from max_episode_step and the current step;
    • termination_reward Float: the reward when this termination triggers and is_termination_reward_based_on_steps_left is False.
  • ES Dict: The configuration of the Extreme State termination
    • use Boolean: whether to use this termination;
    • v_max Float (m/s): the maximum true air speed; when exceeded, this termination triggers;
    • p_max Float (deg/s): the maximum roll angular speed; when exceeded, this termination triggers;
    • is_termination_reward_based_on_steps_left Boolean: whether to compute the reward (penalty) from max_episode_step and the current step;
    • termination_reward Float: the reward when this termination triggers and is_termination_reward_based_on_steps_left is False.
  • T Dict: The configuration of the Timeout termination
    • use Boolean: whether to use this termination;
    • termination_reward Float: the reward when this termination triggers.
  • CMA Dict: The configuration of the Continuously Move Away termination
    • use Boolean: whether to use this termination;
    • time_window Integer (s): the time window over which this termination condition is evaluated;
    • ignore_mu_error Float (deg): once the flight path elevation angle error falls below this value, this termination is no longer considered;
    • ignore_chi_error Float (deg): once the flight path azimuth angle error falls below this value, this termination is no longer considered;
    • is_termination_reward_based_on_steps_left Boolean: whether to compute the reward (penalty) from max_episode_step and the current step;
    • termination_reward Float: the reward when this termination triggers and is_termination_reward_based_on_steps_left is False.
  • CR Dict: The configuration of the Continuously Roll termination
    • use Boolean: whether to use this termination;
    • continuousely_roll_threshold Float (deg): when the accumulated continuous roll angle exceeds this value, this termination triggers;
    • is_termination_reward_based_on_steps_left Boolean: whether to compute the reward (penalty) from max_episode_step and the current step;
    • termination_reward Float: the reward when this termination triggers and is_termination_reward_based_on_steps_left is False.
  • NOBR Dict: The configuration of the Negative Overload and Big Roll termination
    • use Boolean: whether to use this termination;
    • time_window Integer (s): the time window over which this termination condition is evaluated;
    • negative_overload_threshold Float: when the negative overload exceeds this value for at least time_window seconds, this termination triggers;
    • big_phi_threshold Float (deg): when the roll angle exceeds this value for at least time_window seconds, this termination triggers;
    • is_termination_reward_based_on_steps_left Boolean: whether to compute the reward (penalty) from max_episode_step and the current step;
    • termination_reward Float: the reward when this termination triggers and is_termination_reward_based_on_steps_left is False.
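
A sketch enabling a few of the terminations above, assuming the block name matches the heading; all thresholds and rewards are placeholders:

import gymnasium as gym
import flycraft

# Sketch of a terminations-block override; values are placeholders.
env = gym.make(
    'FlyCraft-v0',
    custom_config={
        "terminations": {
            "RT": {
                "use": True,
                "integral_time_length": 1, # s
                "v_threshold": 10.0,       # m/s
                "angle_threshold": 3.0,    # deg
                "termination_reward": 0.0,
            },
            "C": {
                "use": True,
                "h0": 0.0,  # m, crash when altitude drops below this
                "is_termination_reward_based_on_steps_left": False,
                "termination_reward": -1.0,
            },
            "T": {
                "use": True,
                "termination_reward": -1.0,
            },
        }
    }
)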

Applications

Examples

  1. Examples based on Stable-Baselines3 and the imitation library: https://github.com/GongXudong/fly-craft-examples

Research on FlyCraft

  1. Gong, Xudong, et al. "Iterative Regularized Policy Optimization with Imperfect Demonstrations." Forty-first International Conference on Machine Learning. 2024.

  2. Gong, Xudong, et al. "Goal-Conditioned On-Policy Reinforcement Learning." Advances in Neural Information Processing Systems. 2024.

  3. Feng, Dawei, et al. "Think Before Acting: The Necessity of Endowing Robot Terminals With the Ability to Fine-Tune Reinforcement Learning Policies." IEEE International Symposium on Parallel and Distributed Processing with Applications. 2024.

Citation

Cite as

@article{gong2024flycraft,
  title        = {FlyCraft: An Efficient Goal-Conditioned Reinforcement Learning Environment for Fixed-Wing UAV Velocity Vector Control},
  author       = {Gong, Xudong and Wang, Hao and Feng, Dawei and Wang, Weijia},
  year         = 2024,
  journal      = {},
}

