Ki-Cook: clustering multimodal cooking representations through knowledge-infused learning.

Front Big Data

Department of Computer Science, Artificial Intelligence Research Institute, University of South Carolina, Columbia, SC, United States.

Published: July 2023


Article Abstract

Cross-modal recipe retrieval has gained prominence due to its ability to retrieve a text representation given an image representation and vice versa. Clustering these recipe representations by similarity is essential for retrieving relevant information about unknown food images. Existing studies cluster similar recipe representations in the latent space based on class names. Because of inter-class similarity and intra-class variation, associating a recipe with a class name does not provide sufficient knowledge about the recipe to determine similarity. In contrast, the recipe title, ingredients, and cooking actions provide detailed knowledge about recipes and are a better determinant of recipe similarity. In this study, we utilized this additional knowledge, such as ingredients and recipe title, to identify similar recipes, paying particular attention to rare ingredients. To incorporate this knowledge, we propose Ki-Cook, a knowledge-infused multimodal cooking representation learning network built on the procedural attribute of the cooking process. To the best of our knowledge, this is the first study to adopt a comprehensive recipe similarity determinant to identify and cluster similar recipe representations. The proposed network also incorporates ingredient images to learn multimodal cooking representations. Since the motivation for clustering similar recipes is to retrieve relevant information for an unknown food image, we evaluated the model on an ingredient retrieval task. Our empirical analysis shows that the proposed model improves Coverage of Ground Truth by 12% and Intersection over Union by 10% compared with the baseline models. On average, the representations learned by our model contain an additional 15.33% of rare ingredients compared with the baseline models.
Owing to this difference, our qualitative evaluation shows a 39% improvement in clustering similar recipes in the latent space compared with the baseline models, with an inter-annotator agreement (Fleiss' kappa) of 0.35.
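
The abstract reports ingredient-retrieval gains in Coverage of Ground Truth and Intersection over Union (IoU) but does not define either metric. A minimal sketch, assuming both are computed over sets of ingredient names per recipe: IoU as |P ∩ G| / |P ∪ G|, and Coverage of Ground Truth as the fraction of ground-truth ingredients that were retrieved. The function names and example ingredient sets are hypothetical and are not taken from the paper.

```python
from typing import Set


def intersection_over_union(predicted: Set[str], ground_truth: Set[str]) -> float:
    """Set IoU: |P & G| / |P | G|. Returns 0.0 when both sets are empty."""
    union = predicted | ground_truth
    if not union:
        return 0.0
    return len(predicted & ground_truth) / len(union)


def coverage_of_ground_truth(predicted: Set[str], ground_truth: Set[str]) -> float:
    """Assumed definition: share of ground-truth ingredients that were retrieved.

    Returns 0.0 when there are no ground-truth ingredients (a design choice).
    """
    if not ground_truth:
        return 0.0
    return len(predicted & ground_truth) / len(ground_truth)


if __name__ == "__main__":
    # Hypothetical retrieval output and ground truth for a single recipe
    predicted = {"flour", "sugar", "butter", "salt"}
    ground_truth = {"flour", "sugar", "egg", "butter", "vanilla"}
    print(f"IoU:      {intersection_over_union(predicted, ground_truth):.2f}")
    print(f"Coverage: {coverage_of_ground_truth(predicted, ground_truth):.2f}")
```

For the example sets, three of the six distinct ingredients are shared (IoU 0.50) and three of the five ground-truth ingredients are retrieved (coverage 0.60). Scores like these would typically be averaged over a test set; whether the paper averages per recipe or pools counts is not stated in the abstract.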

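The qualitative evaluation reports inter-annotator agreement as a Fleiss' kappa of 0.35. For reference, the standard Fleiss' kappa computation is sketched here; the ratings matrix is a hypothetical example (three annotators labeling four recipe pairs as similar or not similar), not the study's actual annotation data.

```python
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a count matrix.

    ratings[i][j] is the number of annotators who assigned item i to category j.
    Every row must sum to the same number of annotators n >= 2.
    """
    num_items = len(ratings)
    if num_items == 0:
        raise ValueError("need at least one item")
    n = sum(ratings[0])
    if n < 2 or any(sum(row) != n for row in ratings):
        raise ValueError("each item needs the same number (>= 2) of annotators")
    num_categories = len(ratings[0])

    # p_j: share of all assignments that went to category j
    p = [sum(row[j] for row in ratings) / (num_items * n) for j in range(num_categories)]

    # P_i: observed agreement among annotator pairs on item i
    per_item = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings]

    p_bar = sum(per_item) / num_items  # mean observed agreement
    p_e = sum(pj * pj for pj in p)     # agreement expected by chance
    if p_e == 1.0:
        raise ValueError("all ratings fall in one category; kappa is undefined")
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical: 4 recipe pairs, 3 annotators, columns = (similar, not similar)
    toy = [[3, 0], [2, 1], [0, 3], [1, 2]]
    print(f"Fleiss' kappa: {fleiss_kappa(toy):.3f}")
```

On this toy matrix both category proportions are 0.5, so chance agreement is 0.5; mean observed agreement is 2/3, giving kappa = (2/3 - 0.5) / (1 - 0.5) ≈ 0.333.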

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10406211 (PMC)
http://dx.doi.org/10.3389/fdata.2023.1200840 (DOI)

Similar Publications

Development of oat-derived biomimetic macrocapsules via hierarchically crosslinked polysaccharide matrix.

Int J Biol Macromol

September 2025

State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi 214122, China. Electronic address:

The rapid digestion of starch can cause blood sugar spikes, contributing to health issues like diabetes. Encapsulating starch to control its digestibility is a promising strategy in functional food development. A hierarchical core-shell microarchitecture was designed through sequential encapsulation, co-encapsulating oat starch and protein within a nutrient-dense core, followed by the assembly of tunable polysaccharide shells.

Noodle quality is intricately regulated through mechanisms during multi-stage resting. Insufficient mechanistic understanding often leads to simplified resting stages in industrial production, resulting in elevated cooking loss and deteriorated textural properties. Consequently, a systematic elucidation of multi-stage resting mechanisms and strategically optimizing resting stages are imperative for quality enhancement and industrial advancement.

Objective: This study investigated the impact of nutritional interventions on glycemic and lipid profile factors among workers.

Design & Participants: This prospective before-after study was conducted on 1097 employees of Arfa Iron and Steel Company, Yazd, Iran.

Setting: At baseline, anthropometric indices, and laboratory parameters including lipid profiles, liver enzymes, glucose factors, and blood pressure were measured for all participants.

Cardiometabolic comorbidities are common in multiple sclerosis (MS), and lifestyle interventions are effective in managing these conditions in the general population, though evidence in the MS patient population is limited. The aim was to evaluate the effect of a multimodal lifestyle intervention on serum apolipoproteins (Apo), creatine kinase (CK), glucose, and insulin in people with progressive MS (PwPMS). This study included n = 19 PwPMS who participated in a 12-month multimodal lifestyle intervention (including a modified Paleolithic diet, exercise, neuromuscular electrical stimulation, supplements, and stress reduction).

Enhancing robotic skill acquisition with multimodal sensory data: A novel dataset for kitchen tasks.

Sci Data

March 2025

National Key Laboratory of Autonomous Intelligent Unmanned Systems, Shanghai, 201109, China.

The advent of large language models has transformed human-robot interaction by enabling robots to execute tasks via natural language commands. However, these models primarily depend on unimodal data, which limits their ability to integrate diverse and essential environmental, physiological, and physical data. To address the limitations of current unimodal datasets, this paper investigates novel, comprehensive multimodal data collection methodologies that can fully capture the complexity of human interaction in real-world kitchen environments.