Another dot in the blogosphere?

CrashCourse AI episode 9

Posted on: October 14, 2019

Video source

This week’s episode on artificial intelligence (AI) focused on reinforcement learning, which reminded me of the very old school of behaviourism. In reinforcement learning, an AI is “rewarded” for learning how to do something on its own.

The example in the video was learning how to walk. Instead of being given exact instructions on limb angles, speeds, forces, etc., a robot learns to walk by trial and error. If it stays up longer and moves further, it gets simple rewards equivalent to “good job” and “do that again”.

The episode introduced several new concepts: agent, environment, state, value, policy, and action.

If an AI like a robot plays a game, the robot is the agent and the game space is its environment. The AI’s state might include its location and what it senses. Values are attached to the AI’s iterations of trial and error: higher values for good attempts, lower values for bad ones.
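A minimal Python sketch may make these terms concrete. It is not from the video: the `Corridor` game, the `run_episode` helper, and the `step_size` and `discount` numbers are all made up for illustration. It also assumes that values are stored per state (one common way to do it) and nudged towards the reward that follows, which is a simple form of temporal-difference learning.

```python
import random


class Corridor:
    """Hypothetical environment: cells 0..length-1 in a row; the last cell is the goal."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0  # the state here is just the agent's location

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # An action is -1 (step left) or +1 (step right); walls clamp the move.
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0  # the "good job" signal
        return self.state, reward, done


def run_episode(env, values, rng, step_size=0.1, discount=0.9, max_steps=100):
    """One round of trial and error with random actions, updating the value of each state visited."""
    state = env.reset()
    for _ in range(max_steps):
        action = rng.choice([-1, 1])
        next_state, reward, done = env.step(action)
        # Move this state's value a little towards "reward now + discounted value of what comes next".
        target = reward + (0.0 if done else discount * values[next_state])
        values[state] += step_size * (target - values[state])
        state = next_state
        if done:
            break
    return values


rng = random.Random(0)      # fixed seed so repeated runs behave the same
env = Corridor()
values = [0.0] * env.length
for _ in range(200):
    run_episode(env, values, rng)
print([round(v, 2) for v in values])
```

In this sketch the cells nearer the goal should end up with higher values than the starting cell, because a reward tends to follow them sooner. The goal cell itself stays at zero since the game ends there and nothing is learned after it.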

A policy seems like an overall strategy that the AI uses to get a reward efficiently. It might rely on different actions to do this. It might exploit an existing successful strategy or it might explore a new one.
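The exploit-or-explore choice can be sketched in a few lines of Python. This is one common approach, usually called epsilon-greedy, and not necessarily what the video had in mind. The function name `epsilon_greedy`, the action names, and their values are all hypothetical.

```python
import random


def epsilon_greedy(action_values, epsilon, rng):
    """Pick an action: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        # Explore: try any action at random, even one that looks poor so far.
        return rng.choice(list(action_values))
    # Exploit: repeat the action with the highest value so far.
    return max(action_values, key=action_values.get)


rng = random.Random(0)  # fixed seed so repeated runs behave the same

# Made-up values a walking robot might have learned for three ways of moving.
action_values = {"short_steps": 0.2, "long_strides": 0.7, "hop": 0.1}

always_exploit = [epsilon_greedy(action_values, 0.0, rng) for _ in range(10)]
mostly_exploit = [epsilon_greedy(action_values, 0.2, rng) for _ in range(1000)]

print(set(always_exploit))
print({name: mostly_exploit.count(name) for name in action_values})
```

With `epsilon` at 0 the policy only ever repeats its best-known action. With `epsilon` at 0.2 it still favours that action but occasionally tries the others, which is how it could discover that one of them is better than first thought.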

