From the amazing results on Atari games to DeepMind's victory with AlphaGo, from stunning breakthroughs in robotic arm manipulation to beating professional players at Dota, the field of reinforcement learning has exploded in recent years. Let's understand how reinforcement learning differs from the other paradigms of machine learning.
Reinforcement learning is a goal-oriented, sequential decision-making process in which the learner's actions affect later decisions. The learner is not told which actions to take, but must instead discover, through trial and error, which actions yield the maximum reward. For example, a robot left in an isolated room must figure out how to walk properly without hitting the walls.
Let’s understand the basic terminology of reinforcement learning:
- Agent: An agent is the entity that takes actions. The brain in the above diagram represents the agent; the robot that must figure out how to walk is an agent. A reinforcement learning problem tries to build a decision-making algorithm for the agent.
- Action: An action is a choice the agent makes; the action space is the set of all actions the agent can take. The directions in which the robot can move form its set of actions.
- State: A state is the situation in which the agent finds itself at a given moment; it summarizes the history of the system that determines its future evolution. The position of the robot in the room is the state it is in.
- Environment: The environment is the space in which the agent moves. The isolated room is the environment in which the robot must figure out how to walk. The environment takes the agent's current state and action as input and returns the reward for that action and the agent's next state as output.
- Reward: A reward is a numerical feedback signal by which we measure the success or failure of an agent’s action. It evaluates the agent’s action. If the robot walks properly without hitting the wall, then a positive reward is given to that action. On the other hand, if the robot hits the wall, a negative reward is given.
So, at each time step, the agent observes the environment and chooses an action, for which the environment returns a reward and the next observation. Our goal is to develop an algorithm that lets the agent make decisions that maximize the overall reward in the end.
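This interaction loop can be sketched in a few lines of Python. The `GridRoom` environment below is a made-up, one-dimensional stand-in for the "robot in a room" example (its name, dynamics, and rewards are illustrative assumptions, not part of any real library): the environment takes an action, and returns the next state and a reward.

```python
import random

class GridRoom:
    """A toy environment: a 1-D corridor of `size` cells.

    Hypothetical stand-in for the 'robot in a room' example;
    names and dynamics are illustrative only.
    """

    def __init__(self, size=5):
        self.size = size
        self.state = 0  # robot starts at the left wall

    def step(self, action):
        """Apply an action ('left' or 'right'); return (next_state, reward)."""
        if action == "right":
            if self.state < self.size - 1:
                self.state += 1
                reward = +1   # moved without hitting a wall
            else:
                reward = -1   # bumped into the right wall
        else:
            if self.state > 0:
                self.state -= 1
                reward = +1   # moved without hitting a wall
            else:
                reward = -1   # bumped into the left wall
        return self.state, reward

# An untrained agent picking random actions for 10 time steps:
env = GridRoom()
total_reward = 0
for t in range(10):
    action = random.choice(["left", "right"])
    state, reward = env.step(action)
    total_reward += reward
print(total_reward)
```

A learning algorithm would replace the `random.choice` line with a policy that improves from the reward feedback it receives.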
Let's see how this differs from supervised and unsupervised machine learning algorithms.
Reinforcement learning differs from the supervised paradigm, in which the system learns from a training set of labelled examples provided by a knowledgeable external supervisor. The aim of supervised learning is for the system to extrapolate its responses so that it acts correctly in situations not present in the training set. In both supervised and reinforcement learning, inputs are mapped to outputs; but in reinforcement learning, a reward function acts as feedback to the agent, in contrast to the labelled targets of supervised learning.
In interactive problems, it is often not feasible to obtain examples of desired behaviour that are both correct and representative of all the situations in which the agent must act. So, learning from the agent's own experience, by trial and error, is more beneficial.
Reinforcement learning is also different from unsupervised learning, which focuses on extracting patterns and useful information hidden in unlabeled data. Unsupervised learning uncovers hidden structure without any feedback signal, while reinforcement learning tries to maximize the agent's cumulative reward.
Examples of scenarios where these different types of machine learning are used:
| Unsupervised Learning | Reinforcement Learning |
| --- | --- |
| Customer segmentation to analyze specific patterns of customers | Playing board games like chess |
| Recommending movies based on the watch history | Learning driving |
| Anomaly detection | Investment portfolio |
Features that make reinforcement learning distinct from other paradigms of machine learning:
- Importance of Time: Reinforcement learning is a sequential decision process: at every step, the agent makes a decision, observes how much reward it receives, and adjusts its future decisions to obtain the best possible outcome.
- Concept of Delayed Rewards: The goal of reinforcement learning is to achieve the maximum cumulative reward. To model long-run optimality, the agent considers not only the immediate reward but also the next state. The agent may have to learn from a long sequence of actions and observations of the environment before finally reaching a high-reward state, so it must be able to learn which of its actions are desirable based on rewards that can arrive arbitrarily far in the future.
- Agent's actions affect its next input: In reinforcement learning, the agent's actions determine the subsequent data it receives. The agent can influence its environment only through its actions.
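The cumulative, delayed-reward idea above is commonly formalized as a discounted return: each future reward is weighted by a power of a discount factor gamma between 0 and 1, so rewards arriving later count for less. A minimal sketch (the reward sequence below is made up purely for illustration):

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards weighted by gamma**t for t = 0, 1, 2, ..."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# A long stretch of zero rewards followed by one big delayed reward:
print(discounted_return([0, 0, 0, 10], gamma=0.9))  # ~7.29, i.e. 10 * 0.9**3
```

The closer gamma is to 1, the more far-sighted the agent is; with gamma = 0 it cares only about the immediate reward.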
Hope this article was helpful in giving a fair idea of how reinforcement learning differs from the other paradigms of machine learning.
In the next article, we will talk about the different types of algorithms developed for the agent to achieve its goal of maximum cumulative reward.
Any questions, feedback, suggestions for improvement are most welcome. 🙂