Design and Analysis of a Self-Learning Adaptive Ramp Metering on a Test Case in Toronto

Problem Definition

Freeway congestion forms when demand exceeds freeway capacity. While congestion results in increased travel time, the decrease in throughput caused by congestion has a more significant effect on freeway performance. Ramp metering is considered the most effective traffic control measure and has the potential to prevent congestion by limiting the inflow to the freeway. Carrying around 420,000 vehicles a day at its busiest segment, Highway 401 is one of the busiest freeways in North America. A section of the eastbound collector of Highway 401 at the merging point of the on-ramps from Keele Street was selected for this study. There are no bottlenecks immediately downstream of the on-ramps, meaning that freeway breakdown at this section is due to excessive demand from the on-ramps, which makes this section a suitable choice for this study.
Reinforcement Learning (RL), in which agents continuously learn from their interaction with the environment, has attracted significant attention in the transportation field. In RL, an autonomous agent learns the optimal actions to achieve its long-term goal through trial and error. The major advantage of RL is that it does not require an explicit mathematical model of the network. At each time step, the agent perceives the state of the environment and takes an action, which moves the environment into a new state, and the agent receives a scalar reward signal that evaluates the quality of this transition. The mapping from the environment’s state to the agent’s action is known as the agent’s policy, which defines the agent’s behavior. The RL agent’s goal is to find the policy that maximizes the reward received over time. The objective of this project is to develop a design guideline for applying RL to the ramp metering problem.
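The state, action, reward, and policy loop described above can be illustrated with a minimal sketch. It uses tabular Q-learning, one common RL algorithm, chosen here only for illustration; the source does not say which algorithm the study used. The class name, state labels, action labels, and parameter values are all hypothetical.

```python
import random
from collections import defaultdict


class QLearningAgent:
    """Tabular Q-learning agent (illustrative only, not the study's controller)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha = alpha        # learning rate (assumed value)
        self.gamma = gamma        # discount factor (assumed value)
        self.epsilon = epsilon    # exploration probability (assumed value)
        self.q = defaultdict(float)  # (state, action) -> estimated long-term reward
        self.rng = random.Random(seed)

    def policy(self, state):
        """Greedy policy: the action with the highest estimated value."""
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def act(self, state):
        """Epsilon-greedy choice: explore occasionally, otherwise follow the policy."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.policy(state)

    def learn(self, state, action, reward, next_state):
        """Update the value estimate from one observed transition."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


# One interaction step with hypothetical state and action labels.
agent = QLearningAgent(actions=["low_rate", "high_rate"], alpha=0.5, epsilon=0.0)
agent.learn("light_traffic", "high_rate", reward=1.0, next_state="light_traffic")
print(agent.policy("light_traffic"))
```

In a ramp metering setting, the environment would be the simulated freeway, and each call to `learn` would follow one control interval.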

Aerial photo and Paramics network of the study area. The photo shows the eastbound Highway 401 collector at the merging point of Keele St. The markers on the aerial photo show the positions of the physical loop detectors available for data collection.

Approach and Impact

The project went through three steps, as follows:

Step 1: Development of a Microscopic Simulation Testbed
A microscopic simulation model was built using Paramics (a microscopic simulation platform) to replicate real traffic behaviour. This network is used for both training and evaluation of the different controllers. The network was built using satellite images and GIS maps. The model was thoroughly calibrated using detailed loop detector data (vehicle counts and average speeds at 20-second intervals) to reflect the dynamic behaviour of drivers and traffic flow.
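As a rough illustration of how 20-second detector records relate to the traffic variables used for calibration and control, the sketch below converts an interval count to an hourly flow and estimates density from the fundamental relation flow = density × speed. The function names are hypothetical, and the density is only an approximation because loop detectors report time-mean rather than space-mean speed.

```python
def flow_veh_per_hour(count, interval_s=20.0):
    """Scale a vehicle count over one detector interval to an hourly flow."""
    return count * 3600.0 / interval_s


def density_veh_per_km(flow_vph, speed_kmh):
    """Approximate density from flow and average speed (q = k * v).

    Returns None when the speed is not positive (e.g. no vehicles detected).
    """
    if speed_kmh <= 0:
        return None
    return flow_vph / speed_kmh


# Example: 10 vehicles counted in 20 seconds at an average speed of 90 km/h.
flow = flow_veh_per_hour(10)
print(flow, density_veh_per_km(flow, 90.0))
```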

Step 2: Design Guidelines for RL-based Ramp Metering
Different strategies and choices for state, action, and reward were implemented and evaluated. By comparing the results, a guideline was developed for designing RL-based ramp metering that achieves fast and reliable training with plausible controller performance.

Step 3: Controller Evaluation and Comparison with Benchmark Algorithms
Using the calibrated Paramics network, three scenarios were simulated and compared: 1) the network with a controller designed based on the proposed guidelines, 2) the base case network without ramp metering, and 3) the network with a benchmark ramp metering controller that is proven to have reasonable performance (in this case ALINEA).
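For readers unfamiliar with the benchmark, ALINEA is an integral feedback law that adjusts the metering rate in proportion to the gap between a target occupancy and the occupancy measured downstream of the ramp: r(k) = r(k-1) + K_R (o_target - o_out(k)). The sketch below is a minimal version of that law; the gain, rate bounds, and target occupancy are assumed values for illustration, not the settings used in the study.

```python
def alinea_rate(prev_rate, occupancy, target_occupancy,
                gain=70.0, min_rate=200.0, max_rate=1800.0):
    """One ALINEA update of the ramp metering rate.

    prev_rate        -- metering rate in the previous interval (veh/h)
    occupancy        -- measured downstream occupancy (%)
    target_occupancy -- desired downstream occupancy (%)
    gain             -- regulator gain K_R (veh/h per % occupancy), assumed
    min_rate/max_rate -- assumed bounds on the metering rate (veh/h)
    """
    rate = prev_rate + gain * (target_occupancy - occupancy)
    return max(min_rate, min(max_rate, rate))


# Occupancy above target lowers the rate; occupancy below target raises it.
print(alinea_rate(1000.0, occupancy=25.0, target_occupancy=20.0))
print(alinea_rate(1000.0, occupancy=15.0, target_occupancy=20.0))
```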

Findings and Conclusion

Using traffic throughput as the reward was observed to minimize the total travel time (TTT) of the freeway network. Furthermore, the traffic density downstream of the on-ramp and the density of on-ramp traffic were found to be sufficient to model traffic dynamics for an RL-based ramp metering controller. Simulation results show that adding a penalty to the reward for severe congestion guides the agent to learn traffic dynamics much faster. The table below shows the performance of the RL-based ramp metering controller compared to the ALINEA controller (a well-known ramp metering controller) and the base case with no ramp metering.

Control Method: No ramp metering, RL-based ramp metering
Performance Measures: TTT (veh.hr), TTT savings, Mainline TTT (veh.hr), Average on-ramp waiting time (min)
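The state and reward choices reported in the findings (downstream density and on-ramp density as the state, throughput as the reward, and a penalty for severe congestion) could be encoded along the following lines. This is a sketch under stated assumptions: the bin width, number of bins, congestion threshold, and penalty size are hypothetical placeholders, not values from the study.

```python
def discretize(value, bin_width, n_bins):
    """Map a non-negative measurement to a bin index, capping at the last bin."""
    return min(int(max(value, 0.0) // bin_width), n_bins - 1)


def make_state(downstream_density, ramp_density, bin_width=10.0, n_bins=8):
    """State = (downstream density bin, on-ramp density bin); densities in veh/km."""
    return (discretize(downstream_density, bin_width, n_bins),
            discretize(ramp_density, bin_width, n_bins))


def reward(throughput, downstream_density,
           severe_density=60.0, penalty=100.0):
    """Throughput-based reward with a penalty for severe congestion.

    throughput         -- vehicles served during the control interval
    downstream_density -- density downstream of the on-ramp (veh/km)
    severe_density     -- assumed threshold for "severe" congestion
    penalty            -- assumed penalty subtracted above the threshold
    """
    r = float(throughput)
    if downstream_density > severe_density:
        r -= penalty
    return r


print(make_state(25.0, 95.0))
print(reward(30, 40.0), reward(30, 70.0))
```

The penalty term gives the agent a strong, immediate signal when the freeway breaks down, which is consistent with the faster learning reported above.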


The video below shows the effectiveness of ramp metering compared to the base case model.



Freeway traffic changes under different controller operations: (a) density variation downstream of the on-ramp, (b) on-ramp queue length

Network Characteristics
  • No. of Trips: 30,000
  • Period of Analysis: AM Peak Period (6:00 – 10:00)
  • No. of Nodes: 23
  • No. of Links: 25
  • No. of Zones: 7
  • Length of Roads: 4 km

  • PARAMICS: Modeller, Processor, Programmer
  • MATLAB - Parallel Processing Toolbox

Publications / Reports
  • Published Paper: Rezaee, K., Abdulhai, B., and Abdelgawad, H., “Self-Learning Adaptive Ramp Metering: Analysis of Design Parameters on a Test Case in Toronto”, to be published in the Transportation Research Record (TRR), 2013.
  • Conference Paper: Rezaee, K., Abdulhai, B., and Abdelgawad, H., “Application of Reinforcement Learning with Continuous State Space to Ramp Metering in Real-world Conditions”, in Proceedings of the IEEE ITS Conference, Anchorage, September 2012.

