A re-implementation of NACE as a PyPI package, with a cleaner, more general interface.
An observational learner that builds a model of the world from successive observations, resolves
conflicting information, and plans many steps ahead, in an extremely sample-efficient manner.
Background
This project builds upon an implementation of X's NACE observational learner (paper under review), which in turn was based on Berick Cook's AIRIS, adding support for partial observability, the ability to handle non-deterministic and non-stationary environments, and changes external to the agent. X achieved this by incorporating relevant components of Non-Axiomatic Logic (NAL).
The aim of this project is to convert the above work into a foundation on which further experiments can be performed.
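Installation
The package can be installed from PyPI in the usual way (the distribution name nace is assumed here from the package metadata):

pip install nace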
Examples
import sys

import nace

print("Welcome to NACE!")

# This example uses the code from the original nace.world_module, which hard codes
# the effects of actions on the 'world'. This complicates the example code, but
# ensures that the use of global variables does not let the planning code 'cheat'.

if __name__ == "__main__":
    # Configure hypotheses to use Euclidean space properties if desired.
    nace.hypothesis.Hypothesis_UseMovementOpAssumptions(
        nace.world_module.left,
        nace.world_module.right,
        nace.world_module.up,
        nace.world_module.down,
        nace.world_module.drop,
        "DisableOpSymmetryAssumption" in sys.argv,
    )

    nace.world_module.set_traversable_board_value(' ')

    # Set the mapping of the movement actions; the rest are expected to be learnt.
    # (These could be learnt from watching gym actions together with the current
    # and previous worlds.)
    nace.world_module.set_full_action_list(
        [nace.world_module.up, nace.world_module.right, nace.world_module.down,
         nace.world_module.left])

    view_dist_x = 3
    view_dist_y = 2
    num_time_steps = 300

    print(
        """
        (1) Food collecting +1 for food (f)
        (2) cup on table challenge
        (3) doors and keys +1 for battery (b) max score==2
        (4) food collecting with moving object
        (5) pong
        (6) bring eggs to chicken
        (7) soccer +1 per goal
        (8) shock world
        (9) interactive world """)
    _challenge = input()

    if _challenge == "1":
        view_dist_x = 3
        view_dist_y = 2
    if _challenge == "2":
        nace.world_module.World_objective = nace.world_module.World_CupIsOnTable
        num_time_steps = 1000
    if _challenge == "6":
        nace.world_module.set_full_action_list(
            [nace.world_module.up, nace.world_module.right, nace.world_module.down,
             nace.world_module.left, nace.world_module.pick,
             ])

    external_world_nace_format, _, __, ___ = nace.world_module.build_initial_world_object(
        _challenge=_challenge,
        unobserved_code="."
    )
    external_npworld = nace.world_module_numpy.NPWorld(
        with_observed_time=False,
        name="external_npworld",
        view_dist_x=100,
        view_dist_y=100)

    agent_xy_loc, modified_count, _pre_action_world = external_npworld.update_world_from_ground_truth_nace_format(
        external_world_nace_format[nace.world_module.BOARD])  # pass in only the board
    external_npworld.multiworld_print([{"World": external_npworld}])

    global_agent = nace.agent_module.Agent(agent_xy_loc, 0, [])
    stepper = nace.stepper_v4.StepperV4()
    status = {"score": {"v": 0}}
    last_score = 0.0
    print_workings = True

    for time_counter in range(num_time_steps):
        action, behaviour = stepper.get_next_action(
            None,
            agent_xy_loc,
            print_debug_info=print_workings,
            available_actions=nace.world_module.get_full_action_list(),
            view_dist_x=view_dist_x,
            view_dist_y=view_dist_y
        )
        print("About to enact action ", action, behaviour)

        agent_xy_loc, external_world_nace_format, _ = nace.world_module._act(
            agent_xy_loc,
            external_world_nace_format,
            action,
            inject_key=None,
            external_reward_for_last_action=None)

        # Copy state from the nace format into the NPWorld format.
        new_xy_loc, ____, _____ = external_npworld.update_world_from_ground_truth_nace_format(
            external_world_nace_format[nace.world_module.BOARD])  # pass in only the board

        # Let the stepper update its internal world state.
        stepper.set_world_ground_truth_state(external_npworld, new_xy_loc, time_counter)

        # Let the stepper pick up the latest agent state.
        status = stepper.set_agent_ground_truth_state(
            xy_loc=agent_xy_loc,
            score=external_world_nace_format[nace.world_module.VALUES][0],
            values_exc_score=external_world_nace_format[nace.world_module.VALUES][1:]
        )

        if status["score"]["v"] > last_score:
            print("Status:", status, "on task", _challenge, "time", time_counter)
        last_score = status["score"]["v"]  # place a breakpoint here to observe when the score increases

        stepper.predict_and_observe(print_out_world_and_plan=print_workings)

    print("Status:", status, "on task", _challenge, "time", time_counter)
Data Structures
Rule Object:
A rule is a pair: (Action_Value_Precondition, Prediction).

Action_Value_Precondition: (action, values_excl_score, precondition0, precondition1, ...)
    action            - the action taken
    values_excl_score - the agent's state values, excluding the score
    preconditionN     - a (y, x, board_value) cell of the old world that must match,
                        relative to the focus point
Prediction: (y, x, board_value, state_value_deltas)
    y, x               - where the predicted board value will appear, relative to the focus point
    board_value        - the board value predicted to appear there
    state_value_deltas - deltas to the state values, e.g. (score_delta, key_delta)

For example:
((left, (0,), (0, 0, ' '), (0, 1, 'x'), (0, 2, 'u')), (0, 0, 'x', (0, 0))),
((right, (0,), (0, -1, 'x'), (0, 0, 'o')), (0, 0, 'o', (0, 0))),
The following Action_Value_Precondition:
(right, (0,), (0, -1, 'x'), (0, 0, 'o'))
can be read as: match if there is an 'o' at the focus point, an 'x' one cell to the left of it, and the action is right.
The following Action_Value_Precondition, Prediction pair:
((left, (0,), (0, 0, ' '), (0, 1, 'x'), (0, 2, 'u')), (0, 0, 'x', (0, 0))),
can be read as: match if there is a ' ' at the focus point,
an 'x' to the right of it,
a 'u' to the right of the 'x',
and the action is left.
The prediction after the action is:
the 'x' will appear at (0, 0) relative to the focus point,
and there is no change to the score.
The following Action_Value_Precondition, Prediction pair:
((right, (0,), (0, -1, 'x'), (0, 0, 'f')), (0, 0, 'x', (1, 0))),
can be read as: match if there is an 'f' at the focus point,
an 'x' to the left of it,
and the action is right.
The prediction after the action is:
the 'x' will appear at (0, 0) relative to the focus point,
the first state value delta (score) will be +1,
and the second state value delta (key) will be +0.
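To make the matching semantics concrete, here is a minimal sketch of how such a rule could be matched against a board and its prediction applied. The helper names (match_rule, apply_rule) and the board representation (a dict from (y, x) to a character) are illustrative assumptions, not the package's actual API:

# Hypothetical sketch only: match_rule/apply_rule and the dict-based board are
# illustrative, not part of the nace package API.
def match_rule(rule, action, board, focus_yx):
    """Return True if the rule's preconditions hold at focus_yx for this action."""
    (rule_action, _values_excl_score, *preconditions), _prediction = rule
    if action != rule_action:
        return False
    fy, fx = focus_yx
    # Each (y, x, board_value) precondition is an offset relative to the focus point.
    return all(board.get((fy + dy, fx + dx)) == value
               for (dy, dx, value) in preconditions)

def apply_rule(rule, board, focus_yx, state_values):
    """Write the predicted board value and apply the state value deltas."""
    _precondition, (dy, dx, value, deltas) = rule
    fy, fx = focus_yx
    board[(fy + dy, fx + dx)] = value
    return tuple(v + d for v, d in zip(state_values, deltas))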
Rule_Evidence Object Dictionary:
Maps each rule to a tuple of (positive evidence counter, negative evidence counter):

{ ((right, ... )): (1, 0) }
{ ((left, (), (0, 0, ' '), (0, 1, 'x')), (0, 0, 'x', (0,))): (1, 0) }
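A sketch of how these counters might be maintained as predictions are confirmed or refuted (update_evidence is an illustrative name, not the package's API):

def update_evidence(rule_evidence, rule, prediction_was_correct):
    """Increment the positive or negative evidence counter for a rule."""
    positive, negative = rule_evidence.get(rule, (0, 0))
    if prediction_was_correct:
        positive += 1
    else:
        negative += 1
    rule_evidence[rule] = (positive, negative)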
The positive and negative evidence counts can be used to calculate:
Frequency = positive_count / (positive_count + negative_count)
Confidence = (positive_count + negative_count) / (positive_count + negative_count + 1)
Truth_expectation = confidence * (frequency - 0.5) + 0.5
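As a runnable illustration of these formulas (the function name and the zero-evidence convention are assumptions; the formulas themselves are as given above):

def truth_values(evidence):
    """Compute frequency, confidence and truth expectation from a (positive, negative) tuple."""
    positive, negative = evidence
    total = positive + negative
    if total == 0:
        return 0.5, 0.0, 0.5  # assumed convention: no evidence means maximal uncertainty
    frequency = positive / total
    confidence = total / (total + 1)
    truth_expectation = confidence * (frequency - 0.5) + 0.5
    return frequency, confidence, truth_expectation

# For example, with one positive and zero negative observations:
# truth_values((1, 0)) -> (1.0, 0.5, 0.75)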
Location:
xy_loc is an (x, y) tuple; note that (0, 0) is the top left.
State Values:
A tuple of agent values whose first element is the score; the remaining elements (e.g. a key count) are passed as values_exc_score to set_agent_ground_truth_state in the example above.