Another dot in the blogosphere?

CrashCourse AI episode 9

Posted on: October 14, 2019



This week’s episode on artificial intelligence (AI) focused on reinforcement learning, which reminded me of the very old school of behaviourism. In reinforcement learning, an AI is “rewarded” for learning how to do something on its own.

The example in the video was learning how to walk. Instead of being given exact instructions on limb angles, speeds, forces, etc., a robot learns to walk by trial and error. If it stays up longer and moves further, it gets simple rewards equivalent to “good job” and “do that again”.

The episode introduced several new concepts: agent, environment, state, value, policy, and action.

If an AI such as a robot plays a game, the robot is the agent and the game space is its environment. The AI’s state might include its location and what it senses. Values are attached to the AI’s iterations of trial and error: higher values for good attempts, lower values for bad ones.

A policy seems like an overall strategy that the AI uses to get a reward efficiently. It might rely on different actions to do this. It might exploit an existing successful strategy or it might explore a new one.
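To make these terms concrete for myself, here is a minimal sketch of how they might fit together in code. It is not from the video: the five-cell corridor, the reward of 1 at the last cell, the names (step, train, greedy_policy) and the learning settings are all my own assumptions, and the update rule is a basic form of tabular Q-learning, which is only one of many reinforcement learning methods.

```python
import random

# Hypothetical toy environment (my assumption, not from the episode):
# a corridor of 5 cells. The agent starts in cell 0 and is rewarded
# only when it reaches cell 4.
N_STATES = 5
ACTIONS = (-1, +1)  # step left, step right


def step(state, action):
    """Environment: apply an action, return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Agent: learn a value for every (state, action) pair by trial and error."""
    rng = random.Random(seed)
    # values[state][i] estimates how good ACTIONS[i] is in that state.
    values = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Explore: try a random action.
                a = rng.randrange(len(ACTIONS))
            else:
                # Exploit: pick the best-valued action (ties broken at random).
                best = max(values[state])
                a = rng.choice(
                    [i for i, v in enumerate(values[state]) if v == best]
                )
            next_state, reward, done = step(state, ACTIONS[a])
            # Nudge the value towards the reward plus discounted future value.
            future = 0.0 if done else gamma * max(values[next_state])
            values[state][a] += alpha * (reward + future - values[state][a])
            state = next_state
    return values


def greedy_policy(values):
    """Policy: the highest-valued action in each non-terminal state."""
    return [
        ACTIONS[max(range(len(ACTIONS)), key=lambda i: row[i])]
        for row in values[:-1]
    ]


values = train()
policy = greedy_policy(values)
print(policy)
```

With these settings the learned policy ends up choosing "step right" in every cell, which is the quickest route to the reward. The epsilon setting controls how often the agent explores instead of exploiting what it already values.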
