Vehicle Detection and Distance Estimation
Udacity SDCND
When driving a car, two very important things to do are to a) stay in your lane and b) avoid other cars. To do so we need to know where the road lane is and where the other cars are. The same story holds for self-driving cars, which is why the folks at Udacity made these the topics of the last two projects of the Self-Driving Car Nanodegree. In my previous post I talked about how to detect lane lines and make the detection robust against lighting changes and noise. This post tackles the problem of finding vehicles in an image and estimating their distance from our car.
The goals/steps of this project are the following:
- Extract the features used for classification
- Build and train the classifier
- Slide the window and identify cars in an image
- Filter out the false positives
- Estimate the distance
- Run the pipeline on the video
Some parts from the advanced lane finding project are used here, so if you are interested in a more in-depth description you can read my previous post here.
All the code and training data can be found at this Github repository
The Car Classifier
The first thing we need to do is create a classifier that distinguishes cars from non-cars. To do so a dataset is needed, and I have used the dataset provided by Udacity (Download: vehicles, non-vehicles). The dataset is a combination of the KITTI vision benchmark suite and the GTI vehicle image database. GTI car images are grouped into far, left, right, and middle close. Examples of cars and non-cars follow:
To build a classifier, first, the features have to be identified. The features used are a mixture of color histograms, subsampled full images, and HOGs.
Extracting features
Color space
Color space is related to the representation of images in the sense of color encodings. There are encodings that are more suitable for one purpose but bad for others. For example, RGB is good from the hardware standpoint, since it is how pixels are captured and displayed (the Bayer filter is one good example), but it does not capture the way humans perceive colors, which is important for classification tasks. For the task of classifying cars, I am sure there is no prescribed color space that works best, so it has to be chosen by trial and error. What I have done is build the classifier based on HOG, color histograms, and the full image, and then change the color space until I got the best classification result on a test set. Maybe I am describing the problem from the top down, but the color space is quite important for explaining and visualizing the features. After some trials, I found that the LUV color space works best. It has the luminance component L, as well as two chromaticity components (u and v). That color space consistently gave better classification results.
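As a minimal sketch of this conversion step (assuming OpenCV and BGR frames as read by `cv2.imread`; the function name is my own):

```python
import cv2

def to_luv_channels(bgr_image):
    """Convert a BGR frame to LUV and return the L, u and v channels separately."""
    luv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2LUV)
    return luv[:, :, 0], luv[:, :, 1], luv[:, :, 2]
```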
Subsampled and normalized image as a feature
The first and most elementary feature is the subsampled image. Again, by trying different sizes and checking the classification result, the size of the subsampled image was chosen to be 20x20. Also, the image is gamma-normalized. That came as an idea while watching this YouTube video explaining HOG. It was stated that taking the square root of the image normalizes it and gives uniform brightness, thus reducing the effect of shadows. I gave it a try and it produced a minute improvement in classification. Since it is quite a simple operation it stayed in my code, as it provides additional robustness. After normalization and subsampling, the image is reshaped into a vector instead of a matrix. The original image, normalized and converted to LUV, looks like this:
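A minimal sketch of this feature, assuming a LUV image as input (the function name is my own; the 20x20 size and square-root normalization are from the text):

```python
import cv2
import numpy as np

def bin_spatial(luv_image, size=(20, 20)):
    """Gamma-normalize the image, subsample it to `size` and flatten it."""
    # Square-root ("gamma") normalization evens out brightness and reduces shadows.
    normalized = np.sqrt(luv_image.astype(np.float32) / 255.0)
    # Subsample to a small fixed size and unroll into a 1-D feature vector.
    resized = cv2.resize(normalized, size)
    return resized.ravel()
```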
Histogram of colors
The second group of features is color histograms. Initially, I tried to use only the histogram of the luminance channel L. Cars can have different colors, so omitting the chromaticity channels looked like a natural choice to me. After some testing, I found out that including a histogram of all three color channels improved the test accuracy by a couple of percent, which can make a lot of difference. The number of bins in a histogram was selected based on the test accuracy, and 128 bins produced the best result. Here are sample histograms for the image previously shown:
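A sketch of the histogram feature, assuming an 8-bit LUV image (the function name and the (0, 256) bin range are assumptions):

```python
import numpy as np

def color_histograms(luv_image, nbins=128, bins_range=(0, 256)):
    """Histogram each of the three LUV channels and concatenate the counts."""
    channel_hists = [
        np.histogram(luv_image[:, :, ch], bins=nbins, range=bins_range)[0]
        for ch in range(3)
    ]
    return np.concatenate(channel_hists)
```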
HOG
The last, but probably the most important feature is the histogram of oriented gradients - HOG. The main idea behind HOG is that a histogram is built from the gradient orientations at each pixel, computed with an edge detector. Pixels that lie on edges contribute much more to the histogram than pixels that don't. The image is divided into a number of cells and for each cell the orientations are binned, so the HOG basically shows the dominant orientation per cell. You can find detailed information about HOG in this YouTube video. The image on which the HOG is calculated is of size 64x64. The number of pixels per cell is 8, while the number of cells per block is 1. The number of orientations is 12. The HOG is calculated on all three channels of the normalized image. I have tested these parameters a lot and finally found that this was the optimal choice. What I considered in the selection was the number of features generated this way and the accuracy obtained on a test set. When this set of parameters is used, a total of 768 features per channel is created. If the number of cells per block is increased to 2, the number of features blows up to 2352 per channel. The increase in classification accuracy when using 2 cells per block wasn't substantial, so I chose to use one cell per block. Also, I tried a higher number of pixels per cell, in which case a lot of information is lost and accuracy drops, while lowering the number of pixels per cell increases the number of features. Images visualizing the HOG for each channel are:
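A sketch of the HOG extraction with the parameters stated above, using `skimage.feature.hog` (the function name is mine; for a 64x64 channel this yields 8x8 cells × 12 orientations = 768 features per channel, matching the text):

```python
import numpy as np
from skimage.feature import hog

def hog_features(luv_image):
    """Compute HOG features on all three channels: 12 orientations,
    8x8 pixels per cell, 1 cell per block."""
    features = []
    for ch in range(3):
        features.append(
            hog(luv_image[:, :, ch],
                orientations=12,
                pixels_per_cell=(8, 8),
                cells_per_block=(1, 1),
                feature_vector=True)
        )
    return np.concatenate(features)
```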
Training the classifier
The classifier used is a linear support vector classifier. The dataset is obtained by looping through images of vehicles and non-vehicles and calculating features for those images. The next step is to scale the features, which is quite important for any machine learning algorithm. After that, the dataset is split into a training and a test set, where the test set is 10% of all the data. The classifier is trained with C=1e-4, where this parameter was selected based on the accuracies on the training and test sets. If the difference between the two accuracies is high, the training is overfitting the data, so C was lowered. When the test accuracy is low but the same as the training accuracy, underfitting has occurred, so the value of C was increased. The final accuracy obtained on the test set was 99.55%. After the training, the classifier and scaler were pickled and saved, so that they can be reused when processing images coming from the camera.
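A minimal training sketch with scikit-learn, assuming `features` and `labels` arrays built by looping over the dataset with the feature functions above (the variable names and the "classifier.p" file name are assumptions; the scaling-then-splitting order follows the text):

```python
import pickle
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Scale the features, then split off 10% as a test set.
scaler = StandardScaler().fit(features)
X = scaler.transform(features)
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.1)

# Linear SVC with a low C to avoid overfitting.
classifier = LinearSVC(C=1e-4)
classifier.fit(X_train, y_train)
print("Test accuracy:", classifier.score(X_test, y_test))

# Save both so the image/video pipeline can reuse them.
with open("classifier.p", "wb") as f:
    pickle.dump({"classifier": classifier, "scaler": scaler}, f)
```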
Finding cars on images/videos
The pipeline for finding cars in images and videos is very similar. In fact, finding cars in videos follows the same pipeline as finding cars in still images, with some additional features. For that reason, the pipeline for a single image will be described first.
Sliding the window
The first thing to do is to slide a window across the image and try to identify the areas which produce a positive hit from the trained classifier. The window that gets slid is always of size 64x64 with an overlap of 75%. In some cases the car can be bigger than 64x64 pixels, so to cover those cases the whole image is downscaled. As a consequence, the car is searched for in the original and five downscaled images, selected so that the cars on the original image would be of sizes 80x80, 96x96, 112x112, 128x128 and 160x160. The HOG is calculated only once per downscaled image and a subregion of it is used when each of the windows is tested for the presence of a car. After the window is slid, the whole batch of features calculated for each window gets classified. Here is an example of an original image, two regions that get searched for cars, and the regions with detected cars:
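A sketch of the multi-scale search geometry (the function names are my own, and for brevity it omits the once-per-scale HOG subsampling described above; the scales follow from mapping each target car size onto the 64x64 window, and 75% overlap gives a 16-pixel step):

```python
import cv2

WINDOW, STEP = 64, 16          # 64x64 window, 75% overlap -> 16 px step
TARGET_SIZES = [80, 96, 112, 128, 160]   # car sizes in the original image

def search_scales(image):
    """Yield (scale, resized_image) pairs for the multi-scale window search."""
    yield 1.0, image           # original resolution catches 64x64 cars
    for size in TARGET_SIZES:
        scale = size / 64.0    # downscale so that a `size` car fills 64x64
        new_w = int(image.shape[1] / scale)
        new_h = int(image.shape[0] / scale)
        yield scale, cv2.resize(image, (new_w, new_h))

def window_positions(resized_shape):
    """Top-left corners of all 64x64 windows with 75% overlap."""
    height, width = resized_shape[:2]
    for y in range(0, height - WINDOW + 1, STEP):
        for x in range(0, width - WINDOW + 1, STEP):
            yield x, y
```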
Calculating the heatmap and identifying cars
Since there are multiple detections of the same car, the windows have to be grouped somehow. For that purpose, a heatmap is used. Each pixel in the heatmap holds the number of windows with identified cars which contain that pixel. The higher the value of a pixel in the heatmap, the more likely it is part of a car. The heatmap is thresholded with a threshold of 1, which removes possible false car detections. After that, the connected components get labeled and a bounding box is calculated for each one. The resulting images are:
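A sketch of the heatmap and labeling step using `scipy.ndimage.label` (the function name is mine, and whether the threshold comparison is strict is an assumption):

```python
import numpy as np
from scipy.ndimage import label

def heatmap_boxes(image_shape, hot_windows, threshold=1):
    """Build a heatmap from positive windows, threshold it and return one
    bounding box per connected component."""
    heatmap = np.zeros(image_shape[:2], dtype=np.float32)
    for (x1, y1), (x2, y2) in hot_windows:
        heatmap[y1:y2, x1:x2] += 1          # each window votes for its pixels
    heatmap[heatmap <= threshold] = 0       # drop weakly supported detections

    labels, n_cars = label(heatmap)
    boxes = []
    for car in range(1, n_cars + 1):
        ys, xs = np.nonzero(labels == car)
        boxes.append(((xs.min(), ys.min()), (xs.max(), ys.max())))
    return boxes
```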
Removal of the road
The reason for this step is that we want to estimate how far ahead of us the identified car is. In the previous project a perspective transform was found which maps the road surface onto the image and enables us to measure distances. The perspective transform assumes that the transformed object is planar, so to measure distance accurately we need a point which lies on the road surface. To measure the distance, the midpoint of the lower edge of a bounding box is used. The road surface is removed so that the measurement is performed to the back wheels of the identified car. To do so, the median color of the last eight rows of the bounding box is found. The first row from the bottom in which more than 20% of the points are far from the median color is regarded as the new bottom edge of the bounding box. Points 'far' in color are shown in purple in the figure below.
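A sketch of this road-trimming step, assuming a LUV image and a bounding box as corner points; the exact color-distance tolerance is not stated in the post, so `color_tol` is an assumed value:

```python
import numpy as np

def trim_road(luv_image, box, n_rows=8, far_fraction=0.2, color_tol=20.0):
    """Move the bottom edge of `box` up to the first row that no longer
    matches the median road color of the box's last `n_rows` rows."""
    (x1, y1), (x2, y2) = box
    n_rows = min(n_rows, y2 - y1)           # guard against very small boxes
    patch = luv_image[y2 - n_rows:y2, x1:x2].astype(np.float32)
    median_color = np.median(patch.reshape(-1, 3), axis=0)

    for row in range(y2 - 1, y1, -1):
        pixels = luv_image[row, x1:x2].astype(np.float32)
        distances = np.linalg.norm(pixels - median_color, axis=1)
        if np.mean(distances > color_tol) > far_fraction:
            return (x1, y1), (x2, row)      # new bottom edge of the bounding box
    return box
```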
Estimating the distance
Before the rectangles around the detected cars are drawn, the lane lines are identified. Also, we'll try to estimate the distance to the car. Once we have the bounding box and the midpoint of its bottom edge, using the perspective transform we can calculate its position on the warped image from the Advanced Lane Finding project. We do not need to warp the whole image, just recalculate the position of a single point on the warped image. On that image, there is a direct correlation between the pixel position and the distance in meters, so the distance between the calculated position of the midpoint and the bottom of the image, multiplied by the number of meters per pixel, represents the distance between our car and the car we have detected. By looking at how that distance changes from frame to frame, we can calculate the car's relative speed, by multiplying the difference between two frames by the frames per second and by 3.6 to convert it to kilometers per hour instead of meters per second. The image with the detected car and the warped image where the distance is measured are:
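A minimal sketch of warping a single point, assuming the perspective matrix `M` from the lane-finding project; the warped image height and meters-per-pixel value below are example numbers, not taken from the post:

```python
import cv2
import numpy as np

YM_PER_PIX = 30 / 720.0        # meters per pixel in the warped image (assumed)
WARPED_HEIGHT = 720            # warped image height in pixels (assumed)

def distance_to_car(M, bottom_midpoint):
    """Warp only the bounding-box bottom midpoint and convert the pixel
    offset from the bottom of the warped image into meters."""
    point = np.array([[bottom_midpoint]], dtype=np.float32)   # shape (1, 1, 2)
    warped = cv2.perspectiveTransform(point, M)[0, 0]
    return (WARPED_HEIGHT - warped[1]) * YM_PER_PIX

def relative_speed_kmh(prev_distance, distance, fps):
    """Relative speed from the frame-to-frame distance change, in km/h."""
    return (distance - prev_distance) * fps * 3.6
```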
For more information about the perspective transform, read my previous post.
The final step is just to draw everything on a single image. Here is how the final result looks:
Finding cars in videos
For the videos, the pipeline follows the basic pipeline applied to single images. Additionally, because of the temporal dimension, some extra filtering is applied (a minimal tracking sketch follows the list). Here is what is done:
- The bounding boxes of all already detected cars are used when the heatmap is calculated. Those bounding boxes are treated as if a car had been identified at that spot. That helps avoid flicker and losing already identified cars.
- The bounding box is averaged over the last 21 frames
- If a car is not found in 5 consecutive frames it is considered to have disappeared. New cars need to be found in 5 consecutive frames to be drawn and considered as existing.
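A minimal sketch of this per-car bookkeeping, with class and attribute names of my own choosing (the post does not show its tracking code), keeping the 21-frame average and the 5-frame appear/disappear rule:

```python
from collections import deque
import numpy as np

class TrackedCar:
    """Bookkeeping for one detected car across video frames."""
    def __init__(self):
        self.boxes = deque(maxlen=21)   # boxes from the last 21 frames
        self.consecutive_hits = 0
        self.consecutive_misses = 0
        self.confirmed = False          # drawn only once confirmed

    def update(self, box):
        if box is None:
            self.consecutive_misses += 1
            self.consecutive_hits = 0
        else:
            self.boxes.append(box)
            self.consecutive_hits += 1
            self.consecutive_misses = 0
            if self.consecutive_hits >= 5:
                self.confirmed = True   # new cars need 5 consecutive detections

    @property
    def lost(self):
        return self.consecutive_misses >= 5   # drop after 5 missed frames

    @property
    def smoothed_box(self):
        # Bounding box averaged over the last (up to) 21 frames.
        return np.mean(np.array(self.boxes), axis=0).astype(int)
```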
The pipeline is run on both provided videos and works great. No false detections occur and no existing cars are missed. The video results are:
Discussion
The described pipeline works great for the provided videos, but it needs to be thoroughly tested on more videos with changing lighting conditions. What I found interesting is that there is a part of the project video where two cars are classified as one. The first car partially occludes the second one, but it still gets classified as a car. The occluded car didn't disappear, it is just hidden. A more robust procedure for this case has to be found.
Calculating the distance to the car works quite nicely, even better than I had expected. However, there are still some issues: when the color of the road surface changes, the removal of the road from the bottom of the identified bounding box gives false readings. Also, the speed estimate is quite jumpy, so it too has to be filtered, but even in this form it can tell whether the detected car is closing in or moving away from us.
The final thing is that this procedure is very time-consuming. On average it takes about 1.4 seconds per iteration (Ubuntu 16.06, 2x Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz, 8GB DDR3) to find cars and lanes. It is far from real-time, so it cannot yet be employed in a real self-driving car. By profiling, I noted that about 50% of the time goes to calculating histograms. This code needs to be optimized and maybe rewritten in C/C++.
Source: https://towardsdatascience.com/vehicle-detection-and-distance-estimation-7acde48256e1