A Novel Self-Learning Intelligent Traffic Signal Control System for Congested Urban Areas

Problem Definition

The population is steadily increasing worldwide; consequently, the demand for mobility is increasing, especially in times of economic growth. When the growth in social and economic activities outpaces the growth of transportation infrastructure, congestion is inevitable. Severe congestion and long commutes plague many large urban areas around the world, and the Greater Toronto Area (GTA) is no exception. Congestion wastes time, hampers social and economic activity and harms the environment, all of which degrade the quality of our lives. Traffic congestion has a direct impact on the national GDP and costs the GTA $6B a year according to 2008 statistics (Metrolinx, 2008). Until relatively recently, infrastructure improvements were the primary means of alleviating traffic congestion. However, tight constraints on financial resources and physical space, as well as environmental considerations, have accentuated the need for alternative options to mitigate traffic congestion.

Therefore, the emphasis has shifted towards improving the existing infrastructure by optimising the utilisation of the available capacity. Intelligent Transportation Systems (ITS) achieve efficient operation of the transportation system – using telecommunication, information technology, and advanced control techniques – without expanding the existing infrastructure or building more roads. In urban areas, advancements in ITS and traffic signal control have the potential to substantially alleviate traffic congestion and long queues at intersections.

Pre-timed and actuated traffic signal control systems are the most commonly used control systems. Pre-timed signal control implements optimised but fixed timing plans; therefore it is not designed to adapt to rapid fluctuations in traffic flow. Although simple to operate without skilled staff, the old practice of pre-timing traffic signals is laborious, time-consuming and tedious. Moreover, signal timing plans are known to age with time, i.e. many traffic lights operate with timing plans that were designed months or even years ago. Actuated signal control, on the other hand, reacts to changes in the demand patterns by implementing a window of green time (minimum green to maximum green) as opposed to the fixed green time in pre-timed signal control. Although proven to perform better than pre-timed signal control in most cases, actuated signal control does not offer any real-time optimisation of right-of-way allocation. It is therefore not truly adaptive to traffic fluctuations and might result in very long queues in grid-like networks.
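
The green-time window of actuated control described above can be illustrated with a minimal sketch. It assumes a single phase with one detector, and the function name, the 10 s minimum green, the 40 s maximum green and the 3 s extension per detected vehicle are all hypothetical values chosen for illustration, not settings taken from any controller discussed here.

```python
def actuated_green(arrivals, min_green=10.0, max_green=40.0, extension=3.0):
    """Return the green duration (s) for one phase under simple actuated logic.

    arrivals: detector actuation times, in seconds since the start of green.
    All default timings are illustrative assumptions.
    """
    green_end = min_green                       # green always lasts at least min_green
    for t in sorted(arrivals):
        if t > green_end:                       # gap-out: nobody arrived before green expired
            break
        # Each vehicle detected during green extends it, but never beyond max_green.
        green_end = min(max(green_end, t + extension), max_green)
        if green_end >= max_green:              # max-out: window fully used
            break
    return green_end


print(actuated_green([]))                       # no demand: minimum green
print(actuated_green([2, 9, 11.5, 14, 30]))     # moderate demand: somewhere in the window
print(actuated_green(range(0, 100, 2)))         # heavy demand: capped at maximum green
```

The logic only stretches or truncates the current green within fixed bounds; it never reconsiders how right-of-way should be split between phases, which is the limitation the paragraph above points out.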

Traffic Congestion at Signalized Intersections

Approach and Impact

Adaptive Traffic Signal Control (ATSC) has the potential to efficiently alleviate traffic congestion by adjusting the signal timing parameters in response to traffic fluctuations to achieve a certain objective (e.g. to minimise delay); therefore it has a great potential to outperform both pre-timed and actuated controls (McShane et al., 1998). Several ATSC systems have been implemented worldwide. ATSC, in general, evolved through 4 generations of research and development.

This research presents the development and testing of a novel system of Multi-Agent Reinforcement Learning for an Integrated Network of Adaptive Traffic Signal Controllers (MARLIN-ATSC). The MARLIN-ATSC control system is developed to provide self-learning ATSC using a synergetic combination of Reinforcement Learning (RL) approaches and Game Theory (GT) concepts (from Artificial Intelligence) that enables traffic lights to time themselves in a manner that minimises motorist delay and other externalities.

MARLIN-ATSC is a 4th generation ATSC system that addresses the limitations of previous generations by employing the latest methods from Artificial Intelligence and Game Theory. MARLIN-ATSC is a software program that offers the following innovative features:

1) Distributed design and operation - the typical capital cost of centralized ATSC is reported to range from $40,000 to $80,000 per intersection (as high as $150K in some cases), while decentralized ATSC costs in the range of $10,000 to $30,000 per intersection due to the much lower communication requirements;

2) Scalable to accommodate a gradual expansion of the size of network of intersections/controllers as municipalities expand their ATSC coverage;

3) Robust – has no single point of failure as in centralized systems;

4) Model-free - does not require a sophisticated model of the traffic system;

5) Self-learning - reduces human intervention (the most costly component of operating ATSCs in the market);

6) Coordinated – coordinates the operation of intersections in two-dimensional road networks, another new feature that no existing technology offers.

MARLIN-ATSC was developed in two stages. The first stage is the generic development of the software control logic of Multi-Agent Reinforcement Learning for Integrated Network (MARLIN) for any distributed controllers. The second stage is the design of the input/output parameters of MARLIN for the ATSC-specific problem (MARLIN-ATSC). The basic concept of MARLIN is that each controller (at each signalized intersection) is represented by an intelligent software agent. Each agent interacts with its environment (e.g., the traffic network) in a closed-loop system. The agent iteratively observes the state of the environment (e.g., queue lengths), takes an action accordingly (e.g., switch to another phase), and receives a feedback reward (e.g., delay reduction) for the action taken. The agent adjusts its policy until it converges to the optimal mapping from states to actions (the optimal policy), which maximizes the cumulative reward (e.g., minimizes total delay). MARLIN works in two possible modes: 1) independent mode, in which each controller has a smart agent working independently of other agents; and 2) integrated mode, in which each controller observes the states of the neighbouring agents and coordinates its signal control actions with theirs.
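
The observe, act and reward loop described above can be sketched as follows. This is only an illustration of the independent mode under stated assumptions: it uses plain tabular Q-learning with epsilon-greedy exploration, which is one common RL technique and is not confirmed here as the MARLIN algorithm. The `ToyIntersection` environment, its arrival probabilities, the discharge rate, and the learning parameters are all hypothetical.

```python
import random
from collections import defaultdict


class QLearningAgent:
    """Tabular Q-learning agent, one per signalised intersection (illustrative)."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)              # (state, action) -> estimated value
        self.rng = random.Random(seed)

    def act(self, state):
        # Epsilon-greedy: mostly exploit the current estimates, occasionally explore.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def learn(self, state, action, reward, next_state):
        # Move Q(s, a) toward reward + discounted best value of the next state.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


class ToyIntersection:
    """Hypothetical two-phase intersection; state = (green phase, NS queue, EW queue)."""

    MAX_QUEUE = 10

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.phase, self.queues = 0, [0, 0]

    def state(self):
        return (self.phase, self.queues[0], self.queues[1])

    def step(self, action):
        if action == 1:                          # 1 = switch phase (no discharge this step)
            self.phase = 1 - self.phase
        else:                                    # 0 = extend green; served approach discharges
            self.queues[self.phase] = max(0, self.queues[self.phase] - 2)
        for i, p in enumerate((0.5, 0.3)):       # assumed arrival probability per approach
            if self.rng.random() < p:
                self.queues[i] = min(self.MAX_QUEUE, self.queues[i] + 1)
        return self.state(), -sum(self.queues)   # reward: shorter queues are better


def train(steps=20000, seed=0):
    env, agent = ToyIntersection(seed), QLearningAgent(actions=(0, 1), seed=seed)
    state = env.state()
    for _ in range(steps):
        action = agent.act(state)                        # observe state, choose action
        next_state, reward = env.step(action)            # environment responds
        agent.learn(state, action, reward, next_state)   # update policy estimate
        state = next_state
    return agent
```

Calling `train()` returns an agent whose `q` table maps each visited state and action to a learned value; the greedy action in each state is the agent's current policy. In the integrated mode the state and action would additionally involve the neighbouring agents, which this sketch does not attempt.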

Achieving coordination between agents in Game Theory (GT) has proven infeasible for a large number of players (agents) because the state and action spaces grow exponentially and the learning speed decreases dramatically with the number of agents, a problem known as the curse of dimensionality. MARLIN attains the challenging compromise of coordination-based decentralized control without suffering from the curse of dimensionality, which represents the theoretical contribution of this technology. In the integrated mode, each agent plays a game with the adjacent intersections in its neighbourhood, in which the agent not only learns its own optimal control policy but also considers the policies of its neighbours and acts accordingly. A backup independent-mode operation is always in place in case communication between neighbouring intersections is lost.
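
To give a feel for the scale of the curse of dimensionality, the following sketch compares the size of one fully joint state-action table with the combined size of per-neighbour tables. The pairwise-table reading of "a game with the adjacent intersections" is an assumption made for illustration, as are the figures of 20 local states, 4 actions and 4 neighbours per agent; only the count of 59 intersections comes from the Toronto network reported later in this article.

```python
def joint_table_size(n_agents, n_states, n_actions):
    """Entries in one table over the joint state-action space of all agents."""
    return (n_states * n_actions) ** n_agents


def neighbourhood_table_size(n_agents, n_neighbours, n_states, n_actions):
    """Total entries if each agent keeps one two-player table per neighbour (assumed design)."""
    per_pair = (n_states * n_actions) ** 2
    return n_agents * n_neighbours * per_pair


# Hypothetical sizes: 20 local states, 4 actions, 4 neighbours per intersection.
full = joint_table_size(59, 20, 4)
local = neighbourhood_table_size(59, 4, 20, 4)
print(f"fully joint table: about 10^{len(str(full)) - 1} entries")
print(f"neighbour tables:  {local:,} entries in total")
```

Under these assumptions the joint table grows exponentially with the number of intersections, while the neighbourhood-based total grows only linearly, which is the kind of trade-off the paragraph above describes.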

The system was tested on three networks (small, medium, and large-scale) to ensure the generality of the system design and results.


MARLIN-ATSC: (a) Independent Mode, (b) Integrated Mode (r=reward; s=state; a=action)

Literature and Background

In general, ATSC evolved through 4 generations of research and development, with the level of intelligence improving in each generation. The first generation adopted centralized systems with pre-optimized (off-line) signal timing plans (e.g., SCATS, which is currently installed in more than 50 cities worldwide), while the second generation employed centralized systems with on-line optimization (e.g., SCOOT, which is currently installed in more than 170 cities worldwide). The third generation employed model-based decentralized systems that require a pre-defined model of the traffic environment (e.g., OPAC and PRODYN, which have been installed and tested in more than 5 cities in the USA). Self-learning fourth-generation control is seen as the next generation of ATSC, as it maintains the advantages of the 3rd generation while addressing the issues faced in its development. The fourth-generation systems employ self-learning techniques that are based on direct interaction with the traffic environment, while keeping the system complexity and computational effort reasonable enough to implement in real time.


The existing ATSC systems (from generation 1 to 3) suffer from the following limitations, which motivated the development of our intelligent software solution: 1) employing centralized control systems that are expensive and neither scalable nor robust, especially in cases of communication failures; 2) relying on an accurate traffic modelling framework, the accuracy of which is questionable; 3) limiting the coordination to intersections along important arterials in a primitive time-lag or offset-based manner without considering the two-dimensional network-wide effect; 4) increasing the complexity of the system exponentially with the increase in the number of controlled intersections; 5) requiring highly skilled labour to operate due to their system complexity.


ATSC Evolution and Future Trend

Values for Customers/Users

The intelligent MARLIN-ATSC integrated system will offer the following values and benefits to end customers/users:

Value Proposition for Urban Traffic Control Management Departments (Customers)

1. Lower Capital Cost: MARLIN-ATSC requirements can be satisfied using inexpensive communications networks, such as wireless networks between intersections, which significantly reduces the capital cost of the MARLIN-ATSC system. Our main competitor, SCOOT, requires wired communication networks (e.g. fibre optic).

2. Lower Operation Cost: MARLIN-ATSC has three features that help reduce the operating cost: Scalability, Robustness, and Reduced Human Intervention.

3. Short Payback Period: From a return-on-investment perspective for municipalities, based on travel time savings alone, the MARLIN system can pay for itself in less than one month according to our testing on a computer model of downtown Toronto. MARLIN-ATSC was compared against the pre-timed and actuated timing plans provided by the City of Toronto for 60 intersections. The daily economic benefits (i.e., travel time savings) were estimated to be around $53,000/day. MARLIN-ATSC would cost approximately $1.2M to implement across a network of 60 intersections; consequently the payback period is about 23 days ($1.2M / $53,000 per day ≈ 22.6 days).
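
The payback figure quoted above can be reproduced with a few lines of arithmetic. The sketch uses only the numbers stated in this section ($1.2M capital cost, $53,000/day in travel time savings, 60 intersections); the function name is a hypothetical helper, and the calculation ignores operating costs and discounting, which are not reported here.

```python
import math


def payback_days(capital_cost, daily_benefit):
    """Simple payback period in days: capital cost divided by daily benefit."""
    return capital_cost / daily_benefit


capital_cost = 1_200_000      # approximate implementation cost, from the text
daily_benefit = 53_000        # estimated travel time savings per day, from the text
intersections = 60

days = payback_days(capital_cost, daily_benefit)
print(f"payback: {days:.1f} days, i.e. paid back during day {math.ceil(days)}")
print(f"cost per intersection: ${capital_cost / intersections:,.0f}")
```

The implied cost of $20,000 per intersection falls inside the $10,000 to $30,000 range quoted earlier for decentralized ATSC.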

Benefits for Travellers (End Users)

In addition to overall average travel time savings and delay reduction, MARLIN-ATSC is intelligent enough to automatically protect critical intersections by "holding back" or "metering" approaching traffic at upstream intersections. Travellers and their communities benefit from: 1) delay savings, 2) travel time savings, 3) savings in vehicle operating cost and fuel consumption, 4) increased mobility, 5) reduction in emissions and 6) overall improvement in mobility, mobility reliability (lower risk of getting stuck in traffic) and quality of life.

Summary of Results and Findings

The MARLIN-ATSC large-scale application was conducted on the lower downtown Toronto network of 59 signalised intersections. The results were reported for the Base Case (BC) control systems from the field (simulated using signal timing sheets provided by the City of Toronto), MARL-I (MARLIN-ATSC Independent Mode, with no communication between agents), and MARLIN-C (MARLIN-ATSC Integrated Mode, with coordination between agents). The analysis of these experiments led to the following findings:

· MARLIN-ATSC algorithms resulted in lower average delay, queue length, and stop time, and in higher throughput, compared to the BC. The most notable improvements over the BC were reductions in average delay (38%), standard deviation of average queue length (31%), and CO2 emissions (30%);

· MARL-I outperformed the BC in all the measures of effectiveness (MOEs); most notable were the reductions in the average intersection delay (27%) and the CO2 emissions (28%). However, in comparing MARLIN-C to MARL-I it was found that the latter experienced relatively higher delays because in MARL-I the actions are only based on local states, thereby resulting in more vehicles being retained in the network at the end of the simulation (6% throughput improvement in MARLIN-C vs 2.8% throughput improvement in MARL-I);

· It was found that MARLIN-ATSC Integrated Mode is essential in the cases of oversaturation when spillback occurs from one intersection to the upstream intersections;

· Agents implementing MARLIN-ATSC Independent Mode struggled to converge under oversaturated conditions because the multi-agent learning problem violates the stationarity property of the environment;

· In oversaturated conditions, MARLIN-ATSC Integrated Mode was found to be successful in “metering” traffic to critical intersections. This metering effect resulted in less green time being allocated to some phases of the upstream intersections (18% in MARLIN-C vs 26% in MARL-I, i.e. MARLIN-C allows 8 percentage points less green time for upstream intersections), but interestingly with higher overall vehicle throughput (2573 in MARLIN-C vs 2379 in MARL-I, i.e. MARLIN-C allows about 8% more vehicles to leave the network). Metering is an established practice in signal control but is often conducted manually by an experienced operator. MARLIN-C not only automates this metering process but also achieves optimality;

· The effect of coordination in MARLIN-C was more noticeable in intersections with more than two phases (e.g. with advanced protected left-turn (LT) phases) compared to the typical two-phase intersections (e.g. North/South, East/West) because agents are optimising both the phasing sequence and the split times for each phase.
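
The metering figures in the findings above mix two kinds of comparison: a difference between two percentages (green time share) and a relative change between two counts (throughput). A short sketch makes the distinction explicit, using only the numbers reported above; the helper name is hypothetical.

```python
def relative_change(new, old):
    """Relative change of new with respect to old, as a fraction."""
    return (new - old) / old


green_c, green_i = 18.0, 26.0          # upstream green share (%), MARLIN-C vs MARL-I
veh_c, veh_i = 2573, 2379              # vehicles leaving the network, MARLIN-C vs MARL-I

point_diff = green_c - green_i                           # difference in percentage points
green_rel = relative_change(green_c, green_i) * 100      # relative change in green share (%)
throughput_rel = relative_change(veh_c, veh_i) * 100     # relative change in throughput (%)

print(f"green share: {point_diff:+.0f} percentage points ({green_rel:+.1f}% relative)")
print(f"throughput:  {throughput_rel:+.1f}% relative")
```

Read this way, the upstream green share drops by 8 percentage points while throughput rises by roughly 8% in relative terms; the two "8" values measure different things.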


Spatial Distribution of Average Delay Improvements

Path to Market: Existing Opportunity

The University of Toronto has jointly agreed with CIMA+, a leading Canadian engineering firm, to work with the City of Burlington on a Pilot Study to evaluate the performance of MARLIN-ATSC at two of the busiest areas of the City. This is a strategic opportunity to prove the potential of the system in a real-life environment and will drive additional adoption of our technology with other customers. The sales cycle for this type of product can be long and requires significant validation from other customers; we hope that the City of Burlington will be that first validating customer.

Our path to market therefore capitalizes on a strong partnership between the University, the industry, and municipalities. The early results from our pilot study at the City of Burlington's busiest intersection have been successful. MARLIN outperformed the existing actuated controller by an average of 76% in travel time, 93% in average speed and 13% in CO2 emissions.

The City of Burlington is currently considering significant infrastructure expansion for that intersection at a cost exceeding $8M. As a result, our team advocated evaluating this infrastructure expansion in the simulation environment and comparing its results to MARLIN. The experiments found that MARLIN outperformed the costly infrastructure expansion by an average of 3% in travel time, 3% in average travel speed, and 4% in CO2 emission factors. The estimated cost of MARLIN is less than $200K, i.e. less than 3% of the expansion cost. At the moment, the City of Burlington is willing to undertake field operation testing at the same intersections, which is a golden opportunity for us. Finding the first customer is often the greatest challenge that start-ups face.

These results were encouraging and led to a hardware-in-the-loop pilot study that the Toronto ITS Centre and Testbed is currently conducting jointly with the MaRS Innovation Program at the University of Toronto.
