In Fall 2018, I took a Computer Vision course that required us to come up with a unique application of Computer Vision and implement a prototype of it. I was fortunate to find amazing teammates, Dipak Narayan, Akanksha Sunalkar and Rajith Weerasinghe, with whom I completed this project. Following is a brief description of our project:
Our professor, Prof. Jason Corso, mentioned in lecture that he was interested in developing an app that would automatically show the locations of available parking spots near the user's current location. It is an ambitious idea, since such a live map would need to be updated continuously in some automated way.
We reasoned that before such a project can be executed, we first need a way to automatically detect where a parking spot is and whether it is vacant. Clearly, installing cameras near every parking spot isn't an economical solution. So we asked: why not use cars and the dashboard cameras mounted on them to detect these spaces? Granted, the cameras would also need to relay this information to the internet somehow, but that is a separate question. Our goal, then, was to automatically detect parking spaces from dashboard camera footage.
Canny Edge Detection
To find parking regions, we first need to locate the painted parking lines in an image. We start by running a Canny edge detector over the frame to extract edge pixels.
Unfortunately, this method also picks up other blemishes and markings on the road.
Next, we apply the Hough transform to the edge image to find straight lines. The output includes many lines that are irrelevant to the task, so we filter it by parameters such as location and angle.
The issue with the lines given by the Hough transform is that they are usually broken into segments. Also, if a parking line is wide enough, the transform can return two Hough lines for it, one along each painted edge. Our goal is to detect each parking line as a single line. We take sample points from the segments generated by the Hough transform and plot them as the scatter plot below, where the broken segments and the doubled lines are clearly visible.
Now we group these points so that all points in a group belong to a single parking line. This is done with a classical machine learning technique: hierarchical (agglomerative) clustering. We choose the number of clusters at the steepest drop-off in the dendrogram, and then treat each cluster as one parking line.
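The grouping step can be sketched with SciPy. The sample points below are synthetic (two well-separated line-shaped point groups), and picking the cut from the largest jump in merge distance is one concrete reading of the "steepest drop-off" rule:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical sample points from two parking lines (broken / doubled
# segments collapse into two elongated point groups).
rng = np.random.default_rng(0)
line_a = np.column_stack([np.linspace(0, 10, 20), np.linspace(0, 30, 20)])
line_b = line_a + np.array([40.0, 0.0])
points = np.vstack([line_a, line_b]) + rng.normal(scale=0.3, size=(40, 2))

# Agglomerative clustering; cut the dendrogram where the merge
# distance jumps most sharply.
Z = linkage(points, method="single")
gaps = np.diff(Z[:, 2])                      # jumps between merges
n_clusters = len(points) - (np.argmax(gaps) + 1)
labels = fcluster(Z, t=n_clusters, criterion="maxclust")
print(n_clusters, "parking lines found")
```

Each resulting cluster is then fit with a single line, giving one clean line per painted parking line.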
We should also make sure that while detecting a parking spot we can identify the road, so that lines appearing anywhere else in the image are excluded. Isolating the road requires some form of segmentation. This is a challenging problem, but state-of-the-art models provide strong results. A number of networks tackle this task; for this project, we chose an implementation of DeepLabV3 pretrained on the CityScapes dataset. It performs semantic segmentation, automatically labeling each pixel as belonging to objects such as trees, roads and vehicles.
After getting a label for every pixel, we postprocess the result to better capture the road. First, since we only care about the road label, all other labels are discarded. Next, we keep only the largest connected component among the road pixels; this prevents misclassified pixels from causing problems later in the pipeline. Finally, we dilate that component to fill in small gaps. The resulting binary image is used as a mask during the parking region identification step, ensuring that the detected parking regions actually lie on the road and not, for example, on a building.
To locate the vehicles in an image, we use a deep network with the Mask R-CNN architecture trained on Microsoft COCO, a dataset of common objects. We chose this architecture for its ability to separate different instances of the same object class: if there are two cars in an image, the network can differentiate between them. This is in contrast to networks like DeepLabV3, which only label a pixel as "car" without being able to tell how many cars are in the image.
Classifying Parking Spots as Occupied or Vacant
From the instance segmentation we get the coordinates of each vehicle's bounding box and take the midpoint of the box's base. This point is chosen because it lies between the wheels of the car and gives accurate results. We then check whether this point lies within any parking region; if it does, we mark that parking spot as filled. A parking spot containing no such point is marked empty.
Despite some misclassifications and missed detections, the prototype demonstrates that this is a viable product which, given more development time, could perform its task effectively.