Abstract: Photometric loss plays an important role in training video-based unsupervised monocular depth estimation models. However, it produces large errors in special areas such as texture-less regions and edge regions. To address this problem, a more robust photometric loss function is proposed. The photometric loss is computed on the image gradient to eliminate the unreasonable supervision caused by local brightness changes. In addition, the difference between successive pixels is used to identify blurry pixels, and a binary mask then removes the false supervision caused by blurred pixels in the target frame and the reconstructed target frame. On the KITTI dataset, multiple metrics improve, including the average relative error, the squared relative error, and the root mean square error; the average relative error and the squared relative error are reduced to 0.075 and 0.548, respectively. The experimental results show that the proposed method further improves the performance of existing models.
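
The two ideas in the abstract, a photometric loss computed on image gradients and a binary mask that drops blurry pixels, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the forward-difference gradient, the threshold `tau`, and the rule that a pixel counts as blurry when its gradient magnitude falls below `tau` are all assumptions made for illustration, since the abstract does not give the exact definitions.

```python
import numpy as np


def image_gradients(img):
    """Forward differences between successive pixels along x and y.

    The last column of gx and the last row of gy are left at zero.
    """
    img = np.asarray(img, dtype=float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, :-1] = img[:, 1:] - img[:, :-1]
    gy[:-1, :] = img[1:, :] - img[:-1, :]
    return gx, gy


def sharp_mask(img, tau):
    """Binary mask: 1.0 where a pixel is treated as sharp, 0.0 where blurry.

    ASSUMPTION: a pixel is 'blurry' when |gx| + |gy| < tau. The abstract
    only says the difference between successive pixels is used.
    """
    gx, gy = image_gradients(img)
    return ((np.abs(gx) + np.abs(gy)) >= tau).astype(float)


def gradient_photometric_loss(target, recon, tau=0.05):
    """Masked L1 distance between the gradients of target and reconstruction.

    Pixels flagged as blurry in either frame are excluded (hypothetical rule).
    """
    gx_t, gy_t = image_gradients(target)
    gx_r, gy_r = image_gradients(recon)
    per_pixel = np.abs(gx_t - gx_r) + np.abs(gy_t - gy_r)
    mask = sharp_mask(target, tau) * sharp_mask(recon, tau)
    n_valid = mask.sum()
    if n_valid == 0:
        return 0.0  # no pixel supervises; avoid division by zero
    return float((per_pixel * mask).sum() / n_valid)


# Usage: a uniform brightness shift changes the intensity loss but
# leaves the gradient-based loss essentially unchanged.
rng = np.random.default_rng(0)
target = rng.random((8, 8))
recon = target + 0.2  # same scene, brighter
intensity_l1 = float(np.abs(target - recon).mean())
gradient_l1 = gradient_photometric_loss(target, recon, tau=0.0)
print(intensity_l1, gradient_l1)
```

In the usage lines, the plain intensity loss penalizes the brightness offset even though the scene content is identical, while the gradient-based loss does not. This is the kind of unreasonable supervision the abstract refers to. In a real training pipeline the same computation would be written with differentiable tensor operations and applied per color channel.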