
An environment to train drones to search and find a shipwrecked person lost in the ocean using reinforcement learning.


Drone Swarm Search

Quick Start

Install

pip install DSSE

Use

from DSSE.env import DroneSwarmSearch

PyPi Package Page

https://pypi.org/project/DSSE/

About

The Drone Swarm Search project is an environment, based on PettingZoo, to be used in conjunction with multi-agent (or single-agent) reinforcement learning algorithms. It is an environment in which the agents (drones) have to find the targets (shipwrecked people). The agents do not know the position of the target and do not receive rewards related to their own distance to the target(s). However, the agents receive the probabilities of the target(s) being in a certain cell of the map. The aim of this project is to aid the study of reinforcement learning algorithms that require dynamic probabilities as inputs. A visual representation of the environment is displayed below. To test the environment (without an algorithm), run basic_env.py.

Outcome

[Animations of two example episodes: one in which the person is found and one in which the person is not found.]

Basic Usage

from DSSE.env import DroneSwarmSearch

env = DroneSwarmSearch(
    grid_size=50,
    render_mode="human",
    render_grid=True,
    render_gradient=True,
    n_drones=4,
    vector=[0.2, 0.2],
    person_initial_position=[8, 8],
    disperse_constant=3,
)

def policy(obs, agents):
    # Fixed example policy: every drone always takes the same action,
    # regardless of the observation.
    actions = {
        "drone0": 1,  # Right
        "drone1": 3,  # Down
        "drone2": 0,  # Left
        "drone3": 2,  # Up
    }
    return actions


observations = env.reset(drones_positions=[[5, 5], [43, 5], [43, 43], [5, 43]])
rewards = 0
done = False

while not done:
    actions = policy(observations, env.get_agents())
    observations, reward, terminations, truncations, info = env.step(actions)
    rewards += reward["total_reward"]
    # Stop once any drone terminates or the episode is truncated.
    done = any(terminations.values()) or any(truncations.values())

print(rewards)

Installing Dependencies

Python 3.10.5 or newer is required.

To use the environment, install the dependencies with pip install -r requirements.txt.

General Info

Import             from DSSE.env import DroneSwarmSearch
Action Space       Discrete(5)
Action Values      [0, 1, 2, 3, 4]
Agents             N
Observation Space  {droneN: {observation: ((x, y), probability_matrix)}}

Action Space

Value  Meaning
0      Move Left
1      Move Right
2      Move Up
3      Move Down
4      Search Cell
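
For readability, the action values can be bound to named constants; the names below are only illustrative and are not part of the DSSE API:

# Illustrative constants for the action values above (not part of the DSSE API).
LEFT, RIGHT, UP, DOWN, SEARCH = 0, 1, 2, 3, 4

# Example: drone0 searches its current cell while the other drones move right.
actions = {"drone0": SEARCH, "drone1": RIGHT, "drone2": RIGHT, "drone3": RIGHT}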

Inputs

Inputs                   Possible Values        Default Values
grid_size                int(N)                 7
render_mode              "ansi" or "human"      "ansi"
render_grid              bool                   False
render_gradient          bool                   True
n_drones                 int(N)                 1
vector                   [float(x), float(y)]   (-0.5, -0.5)
person_initial_position  [int(x), int(y)]       [0, 0]
disperse_constant        float                  10
timestep_limit           int                    100

grid_size:

The grid size defines the area in which the search will happen. It should always be an integer greater than one.

render_mode:

There are two available render modes, ansi and human.

Ansi: This mode presents no visualization and is intended for training the reinforcement learning algorithm.

Human: This mode presents a visualization of the drones actively searching for the target, as well as of the person moving according to the input vector.

render_grid:

The render_grid variable is a boolean. If it is set to True along with render_mode = "human", the visualization is rendered with a grid; if it is set to False, no grid is drawn.

render_gradient:

The render_gradient variable is a boolean. If it is set to True along with render_mode = "human", the colors in the visualization are interpolated according to the probability of each cell. Otherwise each cell gets a solid color based on its normalized value (between 0 and 1): green if 1 > value >= 0.75, yellow if 0.75 > value >= 0.25, and red if value < 0.25.
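
As an illustrative sketch, the solid-color rule above could be written as follows (this helper is not part of the environment's API; the mapping happens internally when render_gradient is False):

def solid_cell_color(value):
    # Map a normalized cell value (0 to 1) to the solid colors described above.
    # Illustrative only; not part of the DSSE API.
    if value >= 0.75:
        return "green"
    if value >= 0.25:
        return "yellow"
    return "red"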

n_drones:

The n_drones input defines the number of drones that will be involved in the search. It must be an integer greater than or equal to one.

vector:

The vector is a list with two values that defines the direction in which the person drifts over time: the first value is the displacement along the x axis and the second value is the displacement along the y axis. A positive x value results in a displacement to the right (and a negative value to the left), while a positive y value results in a displacement downward. A value of 1 results in a displacement of 1 cell per timestep, a value of 0.5 in 1 cell every 2 timesteps, and so on.
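
For intuition, the fractional drift can be thought of as accumulating each timestep until it amounts to a whole cell. The sketch below illustrates that reading of the behaviour; it is not the environment's internal code:

# Rough sketch: a vector of (0.5, 0.0) moves the person one cell to the
# right every 2 timesteps. Illustrative only, not the internal implementation.
vector = (0.5, 0.0)
x, y = 8, 8                # starting cell
acc_x, acc_y = 0.0, 0.0    # accumulated fractional displacement

for timestep in range(4):
    acc_x += vector[0]
    acc_y += vector[1]
    if abs(acc_x) >= 1:    # a whole cell of displacement has accumulated
        x += int(acc_x)
        acc_x -= int(acc_x)
    if abs(acc_y) >= 1:
        y += int(acc_y)
        acc_y -= int(acc_y)
    print(timestep, (x, y))   # the person moves right on timesteps 1 and 3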

person_initial_position:

The person_initial_position defines the starting point of the target. It should be a list with two integer values, where the first component is the x coordinate and the second is the y coordinate; the y axis points downward.

disperse_constant:

The disperse_constant is a float that defines how quickly the probability matrix disperses: the greater the value, the faster the dispersion.

timestep_limit:

The timestep_limit is an integer that defines the length of an episode, i.e. the number of steps that can be taken before the environment ends or must be reset.

Built in Functions:

env.reset:

env.reset() will reset the environment to its initial state. If you wish to choose the initial positions of the drones, an argument can be passed to the method using the following syntax: env.reset(drones_positions=[[5, 5], [25, 5], [45, 5], [5, 15], [25, 15], [45, 15], [10, 35], [30, 35], [45, 25], [33, 45]])

Each value of the list represents the [x, y] initial position of each drone. Make sure that the list has the same number of positions as the number of drones defined in the environment.

Additionally, to change the vector, a tuple (representing the vector) can be sent as an argument. This can be done using the following syntax: env.reset(vector=(0.3, 0.3)). This way, the person's movement will change according to the new vector.

If no argument is given, env.reset() simply places the drones from left to right, each in the next adjacent cell; once there are no more available cells in a row, it moves to the next row and continues from left to right. The vector also remains unchanged when reset is called without arguments.

The method also returns an observation dictionary with the observations of all drones.
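
For example, the two optional arguments described above could be used as follows (the positions are arbitrary example values for a 4-drone, 50x50 environment):

# Start an episode with custom drone positions (example values for a 50x50 grid).
observations = env.reset(drones_positions=[[5, 5], [43, 5], [43, 43], [5, 43]])

# Start another episode with a new drift vector for the person.
observations = env.reset(vector=(0.3, 0.3))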

env.step:

The env.step() method defines each drone's next movement. When called, the method receives a dictionary with all the drone names as keys and their actions as values. For example, in an environment initialized with 10 drones: env.step({'drone0': 2, 'drone1': 3, 'drone2': 2, 'drone3': 4, 'drone4': 1, 'drone5': 0, 'drone6': 2, 'drone7': 4, 'drone8': 0, 'drone9': 1}). Every drone must appear in the dictionary with an action value in every step, otherwise an error will be raised.

The method returns the observation, the reward, the termination state, the truncation state and the info dictionary, in that order.
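
A minimal episode loop built around this return order might look like the following sketch, reusing the env and policy objects from the Basic Usage example above:

# Sketch of an episode loop using the five return values of env.step().
observations = env.reset()
done = False

while not done:
    actions = policy(observations, env.get_agents())
    observations, reward, terminations, truncations, info = env.step(actions)
    # The episode ends when any drone terminates (person found, collision,
    # drone left the grid) or when the timestep_limit truncates it.
    done = any(terminations.values()) or any(truncations.values())

print("Person found:", info["Found"])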

Person movement:

The person's movement is driven by the probability matrix and the vector. The vector shifts the probabilities, which in turn defines the position of the person: the chance of the person being in a given cell is determined by that cell's probability. Moreover, the person moves at most one cell at a time, so in every step the person can only move to a cell adjacent to its current one. This was done in order to create a more realistic movement for the shipwrecked person.

Observation:

The observation is a dictionary with the drone names as keys. Each drone maps to another dictionary with "observation" as its key and a tuple as its value. The tuple follows the pattern ((x_position, y_position), probability_matrix). An example output is shown below.

{
    'drone0': 
        {'observation': ((5, 5), array([[0., 0., 0., ..., 0., 0., 0.],
                                        [0., 0., 0., ..., 0., 0., 0.],
                                        [0., 0., 0., ..., 0., 0., 0.],
                                        ...,
                                        [0., 0., 0., ..., 0., 0., 0.],
                                        [0., 0., 0., ..., 0., 0., 0.],
                                        [0., 0., 0., ..., 0., 0., 0.]]))
        }, 
    'drone1': 
        {'observation': ((25, 5), array([[0., 0., 0., ..., 0., 0., 0.],
                                        [0., 0., 0., ..., 0., 0., 0.],
                                        [0., 0., 0., ..., 0., 0., 0.],
                                        ...,
                                        [0., 0., 0., ..., 0., 0., 0.],
                                        [0., 0., 0., ..., 0., 0., 0.],
                                        [0., 0., 0., ..., 0., 0., 0.]]))
        }, 
    'drone2': 
        {'observation': ((45, 5), array([[0., 0., 0., ..., 0., 0., 0.],
                                        [0., 0., 0., ..., 0., 0., 0.],
                                        [0., 0., 0., ..., 0., 0., 0.],
                                        ...,
                                        [0., 0., 0., ..., 0., 0., 0.],
                                        [0., 0., 0., ..., 0., 0., 0.],
                                        [0., 0., 0., ..., 0., 0., 0.]]))
       }, 
       
       .................................
       
    'drone9': 
        {'observation': ((33, 45), array([[0., 0., 0., ..., 0., 0., 0.],
                                        [0., 0., 0., ..., 0., 0., 0.],
                                        [0., 0., 0., ..., 0., 0., 0.],
                                        ...,
                                        [0., 0., 0., ..., 0., 0., 0.],
                                        [0., 0., 0., ..., 0., 0., 0.],
                                        [0., 0., 0., ..., 0., 0., 0.]]))
        }
}
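
A common way to use this structure is to unpack a drone's position and the probability matrix, for instance to look up the most probable cell. The sketch below assumes numpy is available (the matrix is a numpy array); which matrix index corresponds to x and which to y is not spelled out here:

import numpy as np

# Unpack one drone's observation: its (x, y) position and the probability matrix.
(x, y), prob_matrix = observations["drone0"]["observation"]

# (row, column) index of the most probable cell, e.g. to steer the drone towards it.
best_row, best_col = np.unravel_index(np.argmax(prob_matrix), prob_matrix.shape)
print("drone0 at", (x, y), "- most probable cell at index", (best_row, best_col))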

Reward:

The reward is a dictionary with the drone names as keys and their respective rewards as values, plus a total_reward entry which is the sum of all the agents' rewards. For example: {'drone0': 1, 'drone1': 89.0, 'drone2': 1, 'total_reward': 91.0}

The reward values are as follows:

  • 1 for every action, by default
  • -100000 if the drone leaves the grid
  • (sum_of_rewards * -1) - 100000 if the person is not found once timestep exceeds timestep_limit
  • -100000 if the drones collide
  • (probability of cell * 10000) if (probability of cell * 100 > 1), else -100, for searching a cell
  • 10000 + 10000 * (1 - timestep / timestep_limit) if the drone searches the cell in which the person is located
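
As a worked example of the search reward above: searching a cell with probability 0.05 yields 0.05 * 10000 = 500 (since 0.05 * 100 = 5 > 1), while searching a cell with probability 0.005 yields -100 (since 0.005 * 100 = 0.5 is not greater than 1). A small illustrative sketch of that rule, not the environment's internal code:

def search_reward(cell_probability):
    # Reward for searching a cell that does not contain the person (sketch only).
    if cell_probability * 100 > 1:
        return cell_probability * 10000
    return -100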

Termination & Truncation:

The termination and truncation variables return a dictionary with all drones as keys and booleans as values, for example {'drone0': False, 'drone1': False, 'drone2': False}. The booleans are False by default and turn True when any of the conditions below occurs:

  • If two or more drones collide
  • If one of the drones leaves the grid
  • If timestep exceeds timestep_limit
  • If a drone searches the cell in which the person is located

Info:

Info is a dictionary that contains a key called "Found" with a boolean value. The value starts as False and changes to True once any drone finds the shipwrecked person, so it can be used as an indicator of whether the person has been found. For example, before the shipwrecked person is found the dictionary is {"Found": False}; once the person is found it becomes {"Found": True}.

env.get_agents:

env.get_agents() returns a list of all the agents initialized in the scene; you can use it to confirm that all the drones exist in the environment. For example, in an environment with 10 drones it returns ['drone0', 'drone1', 'drone2', 'drone3', 'drone4', 'drone5', 'drone6', 'drone7', 'drone8', 'drone9'].
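
Because it returns every agent name, env.get_agents() is also convenient for building the action dictionary that env.step() expects, for example:

# Build an action for every drone returned by env.get_agents(),
# e.g. make every drone search its current cell (action 4).
actions = {agent: 4 for agent in env.get_agents()}
observations, reward, terminations, truncations, info = env.step(actions)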

env.close:

env.close() simply closes the render window. It is not required, but may be used.

How to cite this work

If you use this package, please consider citing it with this piece of BibTeX:

@misc{castanares2023dsse,
      title={DSSE: a drone swarm search environment}, 
      author={Manuel Castanares and Luis F. S. Carrete and Enrico F. Damiani and Leonardo D. M. de Abreu and José Fernando B. Brancalion and Fabrício J. Barth},
      year={2023},
      eprint={2307.06240},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      doi={https://doi.org/10.48550/arXiv.2307.06240}
}

