Graphic association learning: Multimodal feature extraction and fusion of image and text using artificial intelligence techniques.

Guangyun Lu, Zhiping Ni, Ling Wei, Junwei Cheng, Wei Huang

Heliyon

College of Automotive Engineering, Liuzhou Institute of Technology, 545616, Liuzhou, Guangxi, China.

Published: September 2024

As technology has advanced in recent years, artificial intelligence has found increasingly broad application in real life. Graphic recognition is an active topic in current research on related technologies. It involves machines extracting key information from pictures and combining it with natural language processing for in-depth understanding. Existing methods still have obvious deficiencies in fine-grained recognition and in deep understanding of context. Addressing these issues to achieve high-quality image-text recognition is crucial for application scenarios such as accessibility technologies, content creation, and virtual assistants. To tackle this challenge, a novel approach is proposed that combines the Mask R-CNN, DCGAN, and ALBERT models. Specifically, Mask R-CNN handles high-precision image recognition and segmentation, DCGAN captures and generates nuanced features from images, and ALBERT is responsible for deep natural language processing and semantic understanding of this visual information. Experimental results validate the superiority of this method. Compared with traditional image-text recognition techniques, recognition accuracy improves from 85.3% to 92.5% (a gain of 7.2 percentage points), and performance in contextual and situational understanding is enhanced. The advancement of this technology has far-reaching implications for research in machine vision and natural language processing and opens new possibilities for practical applications.
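The abstract names three components but does not state how their outputs are combined. As a rough illustration only, the sketch below assumes a simple late-fusion scheme in which each component yields a feature vector, each vector is L2-normalized, and the results are concatenated. The function names (`l2_normalize`, `fuse_features`, `fuse_image_text`) and the toy encoders are hypothetical stand-ins, not the authors' implementation and not the real Mask R-CNN, DCGAN, or ALBERT models.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def l2_normalize(vec: Sequence[float]) -> Vector:
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_features(parts: Sequence[Sequence[float]]) -> Vector:
    """Assumed fusion rule: normalize each modality's vector, then concatenate."""
    fused: Vector = []
    for part in parts:
        fused.extend(l2_normalize(part))
    return fused


def fuse_image_text(
    image: object,
    text: str,
    region_encoder: Callable[[object], Vector],  # stand-in for Mask R-CNN features
    detail_encoder: Callable[[object], Vector],  # stand-in for DCGAN-derived features
    text_encoder: Callable[[str], Vector],       # stand-in for ALBERT embeddings
) -> Vector:
    """Run the three (hypothetical) encoders and fuse their outputs."""
    return fuse_features(
        [region_encoder(image), detail_encoder(image), text_encoder(text)]
    )


# Toy encoders so the sketch runs without any model weights.
def toy_region(image: object) -> Vector:
    return [3.0, 4.0]


def toy_detail(image: object) -> Vector:
    return [0.0, 2.0, 0.0]


def toy_text(text: str) -> Vector:
    return [float(len(text))]


fused = fuse_image_text(None, "a red car", toy_region, toy_detail, toy_text)
print(len(fused), fused)
```

Normalizing before concatenation keeps any one component from dominating the fused vector purely because of its scale. A real system would replace the toy encoders with trained models and would likely learn the fusion step rather than fix it by hand.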

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11417159
DOI: http://dx.doi.org/10.1016/j.heliyon.2024.e37167

Publication Analysis

Top Keywords

natural language

language processing

artificial intelligence

advancement technology

image-text recognition

mask r-cnn

recognition

graphic association

association learning

learning multimodal
