PREMISE
“Detecting vehicles within a road corridor using computer vision techniques for the purposes of automated traffic counts.”
SYNOPSIS
Town planning is a profession that dramatically influences the way in which people live their lives. However, through my practical experience within the field I have become increasingly aware of the shortcomings of the profession, particularly in relation to decision-making processes. Unlike in adjacent professions such as engineering, town planning decisions are often based on qualitative assessments of public spaces, with only limited consideration given to quantitative measures. This is largely because quantitative data is laborious and expensive to collect, analyse and use. Artificial intelligence, and specifically computer vision, has the potential to massively reduce the cost and difficulty of working with quantitative data. I aimed to use this technology to conduct one of the most basic forms of public infrastructure analysis: traffic counts.
SUBSTANTIATION
Cities around the world are prioritised and planned in different ways, in keeping with local culture and contextual opportunities and limitations. One element that is critical in all of these cities is their transportation networks. The intrinsic complexity of planning transport networks often lies in the lack of fundamental rights and wrongs in decision-making processes. To justify their decisions, planners are often reliant on modelling and assessment offered by external contractors. These solutions are often tightly kept secrets, both in terms of how they operate and the metrics that inform the models. Providing planners with the opportunity to collect, model and use their own data would dramatically change the way in which decisions are made.
Currently, multiple computer vision techniques exist for object detection, classification and tracking. One of the most commonly used forms utilises background and foreground separation to detect movement in a video frame (Ismail, 2010). This technique is extremely accurate at detecting objects and excels at tracking them as they move (Ismail, 2010). Unfortunately, it currently lacks the ability to effectively classify objects without considerable refinement and fine-tuning (Ismail, 2010).
Interestingly, despite a general lack of practical uptake, computer vision is identified by AustRoads, the authority on road planning and design in Australia, as having considerable theoretical potential for assessing road corridors (Green, Lewis, Head, Ward, & Munro, 2020). The demonstrated ability of computer vision to classify and track vehicles suggests it is only a matter of time before it forms the basis for traffic monitoring (Ismail, 2010). Considering that current traffic monitoring technologies have limited capabilities and are often both costly and disruptive to install, computer vision presents a way to redefine the space (Setchell, 1997). Computer vision could also significantly reduce manual workloads by creating the potential for automated, proactive decisions (Mandal, Mussah, Jin, & Adu-Gyamfi, 2020). An additional advantage is that, given considered set-up and calibration, these systems can collect data effectively across different weather and lighting conditions (Mandal, Mussah, Jin, & Adu-Gyamfi, 2020).
PREPARATION
Before diving headfirst into attempting to create a prototype, I conducted some research to see if I could find similar applications of computer vision. Through my research I was able to find multiple examples on YouTube of vehicle tracking, with varying approaches and methodologies for the collection of data. From my investigation I found that in most scenarios the roads being monitored were filmed from a bridge crossing a highway. This meant that vehicles moved in relatively simple ways, generally in straight lines. While useful, I determined that for the practical use of this technology I was more interested in using footage that was less clear, in order to test the limits of what was possible with the technology. As such, I recorded a roundabout intersection from a two-storey apartment in my hometown. The camera was also positioned at a slight angle, so as to really test the detection and tracking abilities of the technology. An image of the footage used is viewable below.
[aesop_image img=”https://cmdstudio.nl/mddd/wp-content/uploads/sites/58/2021/03/car.png” panorama=”off” align=”center” lightbox=”on” captionsrc=”custom” caption=”This is the intersection of interest and the frame in which will be used for the prototype. This image depicts the complexity of movement through the intersection, both in terms of mode and direction.” captionposition=”center” revealfx=”off” overlay_revealfx=”off”]
PROTOTYPE ITERATION 1 – DETECT AND TRACK
The primary goal of the first prototype was to develop a method for detecting and tracking objects as they moved through the frame. As mentioned in the substantiation for this project, I identified background subtraction as one of the most effective ways of tracking objects. Using OpenCV (cv2) functions, I was able to separate moving objects from the background and analyse them as they passed through the frame.
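As a rough illustration of this step, the sketch below shows how background subtraction and contour extraction can be set up with OpenCV. The video filename and the MOG2 parameters are placeholder assumptions rather than the exact values used in the prototype.

```python
import cv2

# Placeholder filename for the recorded footage; the MOG2 parameters below
# are illustrative rather than the exact values used in the prototype.
cap = cv2.VideoCapture("roundabout_footage.mp4")
subtractor = cv2.createBackgroundSubtractorMOG2(history=100, varThreshold=40)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # White pixels mark movement against the (black) static background.
    mask = subtractor.apply(frame)

    # Contours of the white regions become candidate moving objects.
    contours, _ = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

    cv2.imshow("Mask", mask)
    if cv2.waitKey(30) == 27:  # Esc to stop playback
        break

cap.release()
cv2.destroyAllWindows()
```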
[aesop_image img=”https://cmdstudio.nl/mddd/wp-content/uploads/sites/58/2021/03/MASK_FRAME.png” panorama=”off” align=”center” lightbox=”on” captionsrc=”custom” caption=”Here is the intersection again with background subtraction applied. Moving objects were detected as contours (in white) in contrast to the static background (black).” captionposition=”center” revealfx=”off” overlay_revealfx=”off”]
With the objects now separated from the background, I went about applying a methodology to detect and track these objects as they moved through the frame. Using cv2 I was able to add a bounding box and a centre point to each detection. This effectively allowed me to track all moving objects within the frame.
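A minimal sketch of this drawing step is shown below. It assumes the contours produced by the background subtraction above and is illustrative rather than the exact code used in the prototype.

```python
import cv2

def draw_detections(frame, contours):
    """Draw a bounding box and centre point for each detected contour."""
    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)
        cx, cy = x + w // 2, y + h // 2   # centre of the bounding box

        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)  # box
        cv2.circle(frame, (cx, cy), 4, (0, 0, 255), -1)               # centre point
    return frame
```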
[aesop_image img=”https://cmdstudio.nl/mddd/wp-content/uploads/sites/58/2021/03/BBOX_V1.png” panorama=”off” align=”center” lightbox=”on” captionsrc=”custom” caption=”The first attempt at adding bounding boxes to detections had a few flaws. Bounding boxes and centre points were added to ALL contours, even ones too small to be a car, or to every single contour within a single car. As such, further calibration was required.” captionposition=”center” revealfx=”off” overlay_revealfx=”off”]
This iteration of the prototype was effective at extracting the background and detecting moving objects. It did, however, detect things in a visually awkward way. In addition, detections were not solely of cars but also of pedestrians, which wasn't intended for this prototype. As such, parameters to filter the detected contours were added. This included filtering out smaller contours and restricting detection to a smaller region of interest (ROI).
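The sketch below illustrates how such filtering might look. The ROI coordinates and minimum contour area are illustrative assumptions, not the values used in the prototype.

```python
import cv2

MIN_CONTOUR_AREA = 400  # illustrative threshold; anything smaller is ignored

def detect_vehicles(frame, subtractor, roi_bounds=(250, 620, 150, 900)):
    """Return bounding boxes of moving objects inside the region of interest.

    roi_bounds is (y1, y2, x1, x2) in pixels; the values here are placeholders.
    """
    y1, y2, x1, x2 = roi_bounds
    roi = frame[y1:y2, x1:x2]                 # crop to the intersection only
    mask = subtractor.apply(roi)
    contours, _ = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

    detections = []
    for contour in contours:
        # Contours too small to be a vehicle (noise, pedestrians) are skipped.
        if cv2.contourArea(contour) < MIN_CONTOUR_AREA:
            continue
        detections.append(cv2.boundingRect(contour))
    return detections
```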
[aesop_image img=”https://cmdstudio.nl/mddd/wp-content/uploads/sites/58/2021/03/BBOX_V2.png” panorama=”off” align=”center” lightbox=”on” captionsrc=”custom” caption=”The introduction of a size filter and a filter relating to where detections would occur improved the accuracy of bounding boxes and their centre points.” captionposition=”left” revealfx=”off” overlay_revealfx=”off”]
With this the first stage of the prototyping ended, with the output being a model able to detect and track objects as they moved through the frame.
PROTOTYPE ITERATION 2 – COUNTING OBJECTS
With the prototype now detecting and tracking objects relatively effectively, I moved on to attempting to get the model to also count vehicles. To do this, I implemented a tracking algorithm that worked by comparing detections frame to frame and determining whether they were the same object based on the distance between them. Ideally, this would allow detections to be tracked through frames with a unique ID attached to each one. Unfortunately, despite efforts to troubleshoot, the ID system was not flawless, as detections were not always perfect. This led to the same vehicle sometimes being detected as different objects, leading to errors in counting.
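The sketch below shows the general idea behind this kind of distance-based ID tracking; the distance threshold is an illustrative assumption, and the class is a simplified stand-in for the prototype's tracker.

```python
import math

class CentroidTracker:
    """Simple distance-based tracker: a detection in the current frame is treated
    as the same object as an earlier one if their centre points are within
    max_distance pixels of each other."""

    def __init__(self, max_distance=35):
        self.max_distance = max_distance
        self.next_id = 0
        self.centres = {}  # object ID -> last known centre point

    def update(self, detections):
        """detections is a list of (x, y, w, h) boxes; returns ID -> box."""
        tracked = {}
        for (x, y, w, h) in detections:
            cx, cy = x + w // 2, y + h // 2
            matched_id = None
            for obj_id, (px, py) in self.centres.items():
                if math.hypot(cx - px, cy - py) < self.max_distance:
                    matched_id = obj_id
                    break
            if matched_id is None:        # nothing nearby: treat as a new object
                matched_id = self.next_id
                self.next_id += 1
            tracked[matched_id] = (x, y, w, h)
            self.centres[matched_id] = (cx, cy)
        return tracked
```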
[aesop_image img=”https://cmdstudio.nl/mddd/wp-content/uploads/sites/58/2021/03/CAR_ID0.png” panorama=”off” align=”center” lightbox=”on” captionsrc=”custom” caption=”This image depicts how the first car is initially identified as ID:0.” captionposition=”left” revealfx=”off” overlay_revealfx=”off”]
[aesop_image img=”https://cmdstudio.nl/mddd/wp-content/uploads/sites/58/2021/03/CAR_ID5.png” panorama=”off” align=”center” lightbox=”on” captionsrc=”custom” caption=”This image depicts how the first car, initially identified as ID:0, is now identified as ID:5 as it moves further down the road. This indicates that the model has ‘detected’ a ‘new vehicle’ 6 times, when the vehicle should have been detected and identified only once.” captionposition=”right” revealfx=”off” overlay_revealfx=”off”]
In order to solve this, a new methodology was implemented. Using a line of interest, cars were counted as they passed over it (or at least as they passed the coordinates associated with it). To do this, the previously recorded IDs were collected as they passed a specific coordinate range. Using this method, the model was able to count vehicles with 86% accuracy. The model was also 100% accurate in excluding both pedestrians and vehicles that were within the region of interest but did not cross the line of interest. At this point, text was added to visualise the count and the area at which the count occurs. The output for this is visible below.
[aesop_video src=”youtube” id=”TZjLPrGPXqQ” align=”center” caption=”The output of prototype 2.” disable_for_mobile=”on” loop=”on” controls=”on” mute=”off” autoplay=”off” viewstart=”on” viewend=”off” show_subtitles=”off” revealfx=”off” overlay_revealfx=”off”]
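As a rough sketch, the line-of-interest count could be implemented along the following lines, assuming the ID-to-box mapping produced by a tracker like the one above. The line position and tolerance band are illustrative values, not those used in the prototype.

```python
import cv2

LINE_Y = 520            # vertical position of the counting line (placeholder)
OFFSET = 8              # tolerance band around the line, in pixels

counted_ids = set()     # IDs that have already been counted

def update_count(frame, tracked, count):
    """Count each tracked ID once, when its centre point enters the band around the line."""
    for obj_id, (x, y, w, h) in tracked.items():
        cy = y + h // 2
        if LINE_Y - OFFSET < cy < LINE_Y + OFFSET and obj_id not in counted_ids:
            counted_ids.add(obj_id)
            count += 1
    # Visualise the counting line and the running total on the frame.
    cv2.line(frame, (0, LINE_Y), (frame.shape[1], LINE_Y), (255, 0, 0), 2)
    cv2.putText(frame, f"Vehicle count: {count}", (20, 40),
                cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 255), 2)
    return count
```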
REFLECTION
In learning about and using computer vision for this project I have learnt a lot about the opportunities and limitations of the technology. Personally, the largest opportunity I see in the use of computer vision for transport planning lies in the enormous potential for new uses. While this prototype focused on applying computer vision to detect, track and count traffic, further potential uses include counting different modes of transport and deriving metrics such as speed. I have already begun experimenting with classifying different modes of transport using YOLOv3, a small example of which can be seen below.
[aesop_video src=”youtube” id=”zIf5tKAkWLM” align=”center” caption=”Example of YOLOv3 mode detection.” disable_for_mobile=”on” loop=”on” controls=”on” mute=”off” autoplay=”off” viewstart=”off” viewend=”off” show_subtitles=”off” revealfx=”off” overlay_revealfx=”off”]
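For anyone curious, the sketch below shows one way YOLOv3-based mode detection can be wired up using OpenCV's dnn module. The weight, config and class-name files referenced are the standard Darknet releases and are assumptions rather than the exact setup used in my experiment.

```python
import cv2

# Standard Darknet YOLOv3 files; these are assumptions, not the exact setup
# used in my experiment.
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(416, 416), scale=1 / 255, swapRB=True)

with open("coco.names") as f:
    class_names = [line.strip() for line in f]

frame = cv2.imread("intersection_frame.png")   # a single frame, for brevity
class_ids, scores, boxes = model.detect(frame, confThreshold=0.5, nmsThreshold=0.4)

for class_id, score, box in zip(class_ids, scores, boxes):
    label = class_names[int(class_id)]          # e.g. car, bus, truck, bicycle
    if label in ("car", "bus", "truck", "motorbike", "bicycle", "person"):
        x, y, w, h = box
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, f"{label} {score:.2f}", (x, y - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)

cv2.imwrite("classified_frame.png", frame)
```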
One limitation that cannot be overlooked is the difficulty of capturing the complexity of human behaviour. While the prototype functions well in situations of relative normality, the model struggles to deal with humans behaving in unusual ways, such as stopping in the middle of an intersection or tailgating one another (both of which are viewable in the output of the final prototype). For me this highlighted that while the system might work in 95% of situations, there will always be a breakable or exploitable 5% that can cause issues if attempts are not made to mitigate them. Personally, I think this points towards the importance of collecting large amounts of data, so that you can plan for as many unusual instances as possible and work them into your solution.
LAST THOUGHTS
Prior to starting this project, the idea of tackling a computer vision project was daunting to say the least. As someone who comes from a relatively non-technical background in town planning, I was genuinely concerned about my ability to get even remotely close to an output that worked or that I was proud of. However, by spending time learning, trying things, failing and then trying again, I was able to develop the skills to create something of real value. This project has inspired me to learn more about computer vision, and to find more sophisticated and extensive ways of applying it to improve town planning as a profession.
REFERENCES