Contrastive language and vision learning of general fashion concepts.

Patrick John Chia, Giuseppe Attanasio, Federico Bianchi, Silvia Terragni, Ana Rita Magalhães, Diogo Goncalves, Ciro Greco, Jacopo Tagliabue

Sci Rep

South Park Commons, New York, USA.

Published: November 2022

The steady rise of online shopping goes hand in hand with the development of increasingly complex ML and NLP models. While most use cases are cast as specialized supervised learning problems, we argue that practitioners would greatly benefit from general and transferable representations of products. In this work, we build on recent developments in contrastive learning to train FashionCLIP, a CLIP-like model adapted for the fashion industry. We demonstrate the effectiveness of the representations learned by FashionCLIP with extensive tests across a variety of tasks, datasets and generalization probes. We argue that adaptations of large pre-trained models such as CLIP offer new perspectives in terms of scalability and sustainability for certain types of players in the industry. Finally, we detail the costs and environmental impact of training, and release the model weights and code as an open-source contribution to the community.
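
For readers unfamiliar with the contrastive objective the abstract refers to, the following is a minimal NumPy sketch of the symmetric image-text loss generally associated with CLIP-like models. It is not taken from the FashionCLIP code release: the function names, the temperature value of 0.07, and the random stand-in embeddings are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def clip_style_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    Row i of image_emb and row i of text_emb are assumed to describe the
    same product; every other row in the batch serves as a negative.
    The temperature is a hypothetical fixed value chosen for this sketch.
    """
    img = l2_normalize(np.asarray(image_emb, dtype=float))
    txt = l2_normalize(np.asarray(text_emb, dtype=float))
    logits = img @ txt.T / temperature          # shape: (batch, batch)
    diag = np.arange(logits.shape[0])
    # Image-to-text: each image should select its own caption (row-wise softmax).
    loss_i2t = -log_softmax(logits, axis=1)[diag, diag].mean()
    # Text-to-image: each caption should select its own image (column-wise softmax).
    loss_t2i = -log_softmax(logits, axis=0)[diag, diag].mean()
    return 0.5 * (loss_i2t + loss_t2i)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = rng.normal(size=(4, 8))            # stand-in image embeddings
    captions = images + 0.01 * rng.normal(size=(4, 8))  # nearly aligned text embeddings
    print("aligned pairs:   ", clip_style_loss(images, captions))
    print("mismatched pairs:", clip_style_loss(images, np.roll(captions, 1, axis=0)))
```

In this sketch, correctly paired embeddings yield a lower loss than mismatched ones, which is the behavior a contrastive objective of this kind is meant to reward during training.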

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9643437
DOI: http://dx.doi.org/10.1038/s41598-022-23052-9

Similar Publications

Right hemisphere language network plasticity in aphasia.

Brain

September 2025

Center for Brain Plasticity and Recovery, Center for Aphasia Research and Rehabilitation, Departments of Neurology and Rehabilitation Medicine, Georgetown University Medical Center, Washington, DC, 20057 USA.

Peter E Turkeltaub, Kelly C Martin, Alycia B Laks, Andrew T DeMarco

The role of the right hemisphere in aphasia recovery has been controversial since the 19th century. Imaging studies have sometimes found increased activation in right hemisphere regions homotopic to canonical left hemisphere language regions, but these results have been questioned due to small sample sizes, unreliable imaging tasks, and task performance confounds that affect right hemisphere activation levels even in neurologically healthy adults. Several principles of right hemisphere language recruitment in aphasia have been proposed based on these studies: that the right hemisphere is recruited primarily by individuals with severe left hemisphere damage, that transcallosal disinhibition results in recruitment of right hemisphere regions homotopic to the lesion, and that increased right hemisphere activation diminishes to baseline levels over time.

Influence of second dialect use on the production of first dialect lexical tones.

J Acoust Soc Am

September 2025

Department of Linguistics, University of Iowa, Iowa City, Iowa 52242, USA.

Wenqi Zeng, Christine Shea

This study focuses on suprasegmental features and investigates how the use of a second tonal dialect influences the production of tones in the first dialect among bidialectal speakers of Chengdu Mandarin (CM) and Standard Mandarin (SM). Using a word-naming task, this study analyzed the acoustic differences between tones in SM and CM that share similar pitch contours and assessed the impact of SM use on CM tone production. How bidialectal listeners perceptually map SM tones onto CM categories was further evaluated using a dissimilarity rating task.

Medical SAM-Clip Grafting for brain tumor segmentation.

Comput Biol Med

August 2025

The First People Hospital of Foshan, Foshan City CN, China.

Xinjun Yu, Zhoushan Feng, Xiaohong Wu, Jianqiu Chen, Weidong Chen

Brain Tumor Segmentation (BTS) is crucial for accurate diagnosis and treatment planning, but existing CNN and Transformer-based methods often struggle with feature fusion and limited training data. While recent large-scale vision models like Segment Anything Model (SAM) and CLIP offer potential, SAM is trained on natural images, lacking medical domain knowledge, and its decoder struggles with accurate tumor segmentation. To address these challenges, we propose the Medical SAM-Clip Grafting Network (MSCG), which introduces a novel SC-grafting module.

Assessing the diagnostic and treatment accuracy of Large Language Models (LLMs) in Peri-Implant Diseases: a clinical experimental study.

J Dent

September 2025

Dental Clinic Post-Graduate Program, University Center of State of Pará, Belém, Pará, Brazil.

Igor Amador Barbosa, Mauro Sergio Almeida Alves, Paloma Rayse Zagalo de Almeida, Patricia de Almeida Rodrigues, Roberta Pimentel de Oliveira

Objective: This study evaluated the coherence, consistency, and diagnostic accuracy of eight AI-based chatbots in clinical scenarios related to dental implants.

Methods: A double-blind clinical experimental study was carried out between February and March 2025 to evaluate eight AI-based chatbots using six fictional cases simulating peri-implant mucositis and peri-implantitis. Each chatbot answered five standardized clinical questions across three independent runs per case, generating 720 binary outputs.

Do large language models learn like humans: Interleaved and spaced practice in morphological learning.

Acta Psychol (Amst)

September 2025

Shanghai Jiao Tong University, China.

Ying Xiong, Shiyu Wu

This study investigates fundamental differences in the acquisition of morphological patterns by humans and large language models (LLMs) within an artificial language learning paradigm. Specifically, it compares how each system responds to variations in input structure (blocked versus interleaved sequences, and juxtaposed versus spaced presentation) across verb classification and inflection tasks. While LLMs (GPT4mini, DeepSeek_V3, Llama3.
