Training code and pre-trained models are available at: *
Most current detectors, including Faster R-CNN and YOLO, rely on post-processing steps such as removing near-duplicate predictions with NMS (non-maximum suppression).
They also typically depend on a pre-computed set of anchor boxes.
DETR proposes an alternative object detection pipeline that removes the need for both anchor boxes and non-maximum suppression.
Learning Objective:
Here N denotes the fixed number of predictions the model makes per image. It is chosen to be noticeably larger than the typical number of objects in an image (for example, N = 100).
Summary of pipeline:
The Hungarian algorithm solves the assignment problem. Suppose there are 3 potholes and 3 workers, and we know the cost for each worker to reach each pothole. Which one-to-one assignment of workers to potholes minimizes the total cost? The Hungarian algorithm finds it. In DETR, the predictions play the role of the workers and the ground-truth objects play the role of the potholes.
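To make the worker/pothole example concrete, here is a minimal sketch of the assignment problem itself. Note that it is not the Hungarian algorithm: it simply tries every permutation, which reaches the same optimum but in factorial time, so it is only sensible for tiny inputs. The cost matrix and the function name `best_assignment` are made up for illustration. In practice, an optimized solver such as SciPy's `scipy.optimize.linear_sum_assignment` would typically be used instead.

```python
from itertools import permutations


def best_assignment(cost):
    """Brute-force solver for the square assignment problem.

    cost[w][p] is the cost for worker w to reach pothole p.
    Returns (assignment, total), where assignment[w] is the pothole
    given to worker w. This checks all n! permutations, so it is an
    illustration of the problem, not of the Hungarian algorithm.
    """
    n = len(cost)
    best_perm, best_total = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[w][perm[w]] for w in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    return list(best_perm), best_total


# Hypothetical costs: rows are workers, columns are potholes.
cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]

assignment, total = best_assignment(cost)
print(assignment, total)  # [1, 0, 2] 5
```

Here worker 0 goes to pothole 1, worker 1 to pothole 0, and worker 2 to pothole 2, for a total cost of 5. Note that the greedy choice (worker 1 taking the zero-cost pothole 1) is not part of the optimum, which is why a global matching is needed.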
where L_box is:
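In the DETR paper, the box loss is a weighted sum of a generalized IoU (GIoU) term and an L1 term between the ground-truth box and its matched prediction: L_box = λ_iou · L_iou + λ_L1 · ‖b − b̂‖₁, with L_iou = 1 − GIoU. The following is a minimal sketch for a single pair of boxes, not the reference implementation. It assumes boxes in normalized `(x1, y1, x2, y2)` corner format with positive area, applies the L1 term to those same corner coordinates for simplicity, and uses weights of 2 and 5 as illustrative defaults. The function names `giou` and `box_loss` are hypothetical.

```python
def giou(a, b):
    """Generalized IoU of two boxes given as (x1, y1, x2, y2).

    Assumes both boxes have positive area.
    """
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter

    # Smallest axis-aligned box enclosing both inputs.
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))

    return inter / union - (enclose - union) / enclose


def box_loss(pred, target, lambda_iou=2.0, lambda_l1=5.0):
    """Weighted GIoU loss plus L1 distance for one box pair.

    The default weights are assumptions chosen for illustration.
    """
    l1 = sum(abs(p - t) for p, t in zip(pred, target))
    return lambda_iou * (1.0 - giou(pred, target)) + lambda_l1 * l1


perfect = box_loss((0.0, 0.0, 1.0, 1.0), (0.0, 0.0, 1.0, 1.0))
disjoint = box_loss((0.0, 0.0, 0.5, 0.5), (0.5, 0.5, 1.0, 1.0))
print(perfect, disjoint)
```

A perfect prediction gives a loss of zero. For the two disjoint boxes, plain IoU would be zero regardless of how far apart they are, whereas the GIoU term still changes with the distance between them, which is one reason it is combined with the L1 term.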
DETR performs worse than Faster R-CNN on small objects, but better on large objects.
It takes longer to train and is computationally heavy.
Attention in an object detection pipeline can improve performance, as the Non-local Neural Networks work illustrates.
There are plenty of other drawbacks and a lot of room for improvement.