Convolutional neural networks (CNNs) are widely recognized as the foundation of machine vision systems. Conventionally, CNNs are taught to understand images by training on human-annotated labels, without any additional instruction. In this article, we explore a new direction: using guidance from text during neural network training. We present two versions of attention mechanisms that facilitate interaction between visual and semantic information and encourage CNNs to distill visual features effectively by leveraging semantic features. In contrast to dedicated text-image joint embedding methods, our method supports asynchronous training and inference: a trained model can classify images regardless of whether text is available. This characteristic substantially improves the model's scalability to multiple (multimodal) vision tasks. We also apply the proposed method to medical imaging, where it learns from richer clinical knowledge and achieves attention-based, interpretable decision-making. Through comprehensive validation on two natural and two medical datasets, we demonstrate that our method effectively makes use of semantic knowledge to improve CNN performance. It yields substantial improvements on medical image datasets, achieves promising performance on multi-label image classification and caption-image retrieval, and attains excellent performance on phrase-based and multi-object localization on public benchmarks.
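The abstract does not specify the exact attention formulation, so the following is only a generic sketch of the idea it describes: semantic (text) features producing attention weights over spatial visual features, so that text guides which visual features are distilled. All names and the dot-product scoring are assumptions, not the paper's method.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def text_guided_attention(visual, semantic):
    """Sketch of text-guided attention over visual features.

    visual:   (N, D) array of N spatial feature vectors (e.g. a flattened
              7x7 CNN feature map with D channels).
    semantic: (D,) text embedding in the same feature space (an assumption;
              a real model would learn a projection to align the spaces).
    Returns the attention-pooled visual feature (D,) and the weights (N,).
    """
    # Scaled dot-product scores between each spatial location and the text.
    scores = visual @ semantic / np.sqrt(visual.shape[1])
    weights = softmax(scores)          # (N,) attention over spatial locations
    pooled = weights @ visual          # (D,) text-guided visual summary
    return pooled, weights

rng = np.random.default_rng(0)
visual = rng.normal(size=(49, 64))     # 7x7 feature map, 64 channels
semantic = rng.normal(size=64)         # matching-dimension text embedding
pooled, weights = text_guided_attention(visual, semantic)
```

At inference time such a model can simply skip this branch and classify from the visual features alone, which is consistent with the asynchronous train/inference behavior the abstract claims; the attention weights also provide the kind of spatial interpretability it mentions.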
DOI: http://dx.doi.org/10.1109/TPAMI.2019.2955476