Dog Face Detection Using YOLO Network
Abstract
This work presents a real-world application of object detection, one of the current research lines in computer vision. Researchers commonly focus on human face detection. In contrast, this paper addresses the challenging task of detecting dog faces, objects with extensive variability in appearance. The system uses the YOLO network, a deep convolutional neural network, to predict bounding boxes and class confidences simultaneously. The paper documents an extensive dataset of dog faces gathered from two different sources and the training procedure of the detector. The proposed system was designed to run on mobile hardware. The resulting application, Doggie Smile, helps to take a snapshot of dogs at the moment when they face the camera. It can simultaneously evaluate the gaze directions of three dogs in a scene more than 13 times per second, measured on an iPhone XR. The average precision of the dog face detection system is 0.92.
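To make the reported metric concrete, the following minimal Python sketch shows one common way an average precision value can be computed for a single-class detector from scored bounding boxes. It is an illustration only: the abstract does not state the evaluation protocol, so the IoU threshold of 0.5, the non-interpolated precision-recall summation, the greedy matching rule, and the names `iou` and `average_precision` are all assumptions rather than the authors' method.

```python
from typing import List, Tuple

# Hypothetical box format for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    detections: List[Tuple[float, Box]],
    truths: List[Box],
    iou_threshold: float = 0.5,  # assumed threshold, not taken from the paper
) -> float:
    """Non-interpolated area under the precision-recall curve.

    Detections are (confidence, box) pairs. Each ground-truth box can be
    matched at most once; unmatched detections count as false positives.
    """
    if not truths:
        return 0.0
    matched = [False] * len(truths)
    true_pos = 0
    false_pos = 0
    ap = 0.0
    # Visit detections from most to least confident.
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_index, best_iou = -1, 0.0
        for j, truth in enumerate(truths):
            if matched[j]:
                continue
            overlap = iou(box, truth)
            if overlap > best_iou:
                best_index, best_iou = j, overlap
        if best_index >= 0 and best_iou >= iou_threshold:
            matched[best_index] = True
            true_pos += 1
            # Each true positive raises recall by 1 / len(truths);
            # weight that step by the precision reached so far.
            ap += (true_pos / (true_pos + false_pos)) / len(truths)
        else:
            false_pos += 1
    return ap
```

For example, with two ground-truth faces and three detections where the second most confident one is a false alarm, the sketch accumulates a precision of 1 at the first match and 2/3 at the second, each weighted by a recall step of 1/2.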
Copyright (c) 2020 MENDEL
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.