Preset method
AdaBins
Introduction
This method applies to backbones with a multi-level structure, such as CNNs and ViTs. It fuses the feature maps that the backbone extracts at different scales to capture the spatial and geometric information of the scene. From this information it adaptively computes a bin-width vector, which discretizes the depth range, and a per-pixel probability distribution over the resulting bins. The final depth map is a linear combination of the bin centers, derived from the width vector, weighted by these probabilities.
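The final step described above, combining the width vector with the per-pixel probabilities, can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `depth_from_adaptive_bins`, the `d_min`/`d_max` defaults, and the use of bin centers as the per-bin depth values are choices made here for the example.

```python
import numpy as np

def depth_from_adaptive_bins(widths, probs, d_min=1e-3, d_max=10.0):
    """Hypothetical helper: turn bin widths and per-pixel bin probabilities into depth.

    widths: (N,) non-negative bin widths (any scale; normalized below).
    probs:  (N, H, W) per-pixel probabilities that sum to 1 over the N bins.
    d_min, d_max: assumed depth range covered by the bins.
    """
    widths = np.asarray(widths, dtype=np.float64)
    probs = np.asarray(probs, dtype=np.float64)

    # Normalize the widths so the bins exactly partition [d_min, d_max].
    widths = widths / widths.sum()
    edges = d_min + (d_max - d_min) * np.concatenate(([0.0], np.cumsum(widths)))

    # Represent each bin by its center.
    centers = 0.5 * (edges[:-1] + edges[1:])

    # Per-pixel depth: probability-weighted sum of bin centers, shape (H, W).
    return np.tensordot(centers, probs, axes=(0, 0))


# Two equal bins over [0, 10] have centers 2.5 and 7.5.
probs = np.zeros((2, 1, 2))
probs[0, 0, 0] = 1.0   # first pixel: all mass on the first bin
probs[:, 0, 1] = 0.5   # second pixel: mass split evenly
print(depth_from_adaptive_bins([1.0, 1.0], probs, d_min=0.0, d_max=10.0))
```

Because the depth is a weighted average rather than the center of the most likely bin, the output varies smoothly even though the depth range is discretized.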
Citation
@inproceedings{bhat2021adabins,
title={{AdaBins}: Depth estimation using adaptive bins},
author={Bhat, Shariq Farooq and Alhashim, Ibraheem and Wonka, Peter},
booktitle={Computer Vision and Pattern Recognition},
pages={4009--4018},
year={2021}
}
NeWCRFs
Introduction
This method applies to backbones with a multi-level structure, such as CNNs and ViTs. The backbone extracts multi-scale features, and neural window fully-connected conditional random fields (CRFs) then model the dependencies among these features to produce an accurate depth map.
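The "window" part of the idea can be illustrated with a small NumPy sketch: the feature map is split into non-overlapping windows, and pairwise weights are computed only between positions inside the same window, which keeps the fully-connected dependency model affordable. This is a simplified illustration, not the authors' implementation: the function names, the single-head scaled dot-product scoring, and the requirement that the height and width be multiples of the window size are assumptions made for the example.

```python
import numpy as np

def window_partition(feat, win):
    """Split an (H, W, C) feature map into (num_windows, win*win, C).

    Assumes H and W are multiples of win.
    """
    H, W, C = feat.shape
    x = feat.reshape(H // win, win, W // win, win, C)
    x = x.transpose(0, 2, 1, 3, 4)  # group the two window-grid axes together
    return x.reshape(-1, win * win, C)

def window_pairwise_weights(query, key, win):
    """Hypothetical helper: softmax-normalized pairwise weights inside each window.

    query, key: (H, W, C) feature maps.
    Returns an array of shape (num_windows, win*win, win*win).
    """
    q = window_partition(query, win)
    k = window_partition(key, win)
    # Scaled dot-product score between every pair of positions in a window.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    # Numerically stable softmax over the key positions.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 4, 8))
weights = window_pairwise_weights(feat, feat, win=2)
print(weights.shape)  # (4, 4, 4): 4 windows, each with 4x4 pairwise weights
```

Restricting the pairwise terms to a window reduces the cost from quadratic in the number of pixels to quadratic in the window size only.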
Citation
@inproceedings{yuan2022neural,
title={Neural window fully-connected {CRFs} for monocular depth estimation},
author={Yuan, Weihao and Gu, Xiaodong and Dai, Zuozhuo and Zhu, Siyu and Tan, Ping},
booktitle={Computer Vision and Pattern Recognition},
pages={3916--3925},
year={2022}
}