Exploring whether GPT-4o Mini and Gemini 2.0 Flash can predict fine-grained fashion product attributes in a zero-shot setting, reshaping e-commerce catalogs
Fashion e-commerce thrives on detailed product attribution, but manual labeling is unsustainable at scale. This article explores a zero-shot study comparing GPT-4o Mini and Gemini 2.0 Flash across 18 fashion attribute categories using the DeepFashion-MultiModal dataset. Results show Gemini 2.0 Flash leading with a macro F1 score of 56.79%, while GPT-4o Mini trails at 43.28%. Through error analysis, case studies, and industry insights, we highlight how LLMs are reshaping cataloging, personalization, and future retail AI strategies.
In the digital-first retail world, product attribution—the labeling of characteristics like fabric type, neckline, sleeve length, or style—is what powers product discovery. Without accurate attributes, customers searching for “silk red maxi dress” may face irrelevant or empty results, leading to frustration and abandoned carts.
For e-commerce giants like Amazon, ASOS, or Zalando, accurate attribution means the difference between frictionless browsing and catalog chaos. But with millions of new SKUs every season, manual labeling is impossible. Traditional computer vision models helped, but they struggled with fine-grained differences like “midi vs. maxi dress” or “crew neck vs. round neck.”
This is where multimodal large language models (LLMs) like GPT-4o Mini and Gemini 2.0 Flash step in.
LLMs were first praised for text-based tasks, but recent generations integrate vision + language understanding, enabling models to “see” an image and describe it in natural language. For fashion, this translates into:
The challenge? Can they achieve this in a zero-shot setting—without fine-tuning—when presented with raw fashion images?
Zero-shot learning (ZSL) is the holy grail for scaling fashion AI. Instead of retraining models for every new collection or brand, ZSL allows models to generalize from prior knowledge and classify unseen categories.
Imagine:
This is the promise tested in the comparison of GPT-4o Mini and Gemini 2.0 Flash.
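In practice, a zero-shot setup comes down to giving the model the image plus the list of allowed labels for each attribute, with no labeled examples, and then checking that the answer stays inside those labels. Here is a minimal sketch of the prompt-building and answer-validation steps. The attribute schema, the JSON reply format, and the function names are assumptions made for illustration; the study's actual prompts and its 18 categories are not reproduced here, and the model call itself is left out.

```python
import json

# Hypothetical attribute schema for illustration; not the study's label set.
ATTRIBUTE_SCHEMA = {
    "neckline": ["crew", "round", "v-neck", "square"],
    "sleeve_length": ["sleeveless", "short", "long"],
    "pattern": ["solid", "floral", "striped", "polka dot"],
}


def build_zero_shot_prompt(schema):
    """Build an instruction that lists allowed labels and gives no examples."""
    lines = [
        "Look at the attached product image.",
        "For each attribute, choose exactly one label from its list.",
        "Reply with a single JSON object mapping attribute to label.",
        "",
    ]
    for attribute, labels in schema.items():
        lines.append(f"- {attribute}: {', '.join(labels)}")
    return "\n".join(lines)


def parse_prediction(raw_reply, schema):
    """Keep answers that are valid labels; anything else becomes None."""
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    result = {}
    for attribute, labels in schema.items():
        value = data.get(attribute)
        normalized = value.strip().lower() if isinstance(value, str) else None
        result[attribute] = normalized if normalized in labels else None
    return result


# Example with a made-up model reply: "plaid" is not an allowed pattern label.
reply = '{"neckline": "V-Neck", "sleeve_length": "short", "pattern": "plaid"}'
prediction = parse_prediction(reply, ATTRIBUTE_SCHEMA)
```

Mapping out-of-vocabulary answers to None rather than guessing a nearest label keeps invalid outputs visible, so they can be counted as errors during evaluation.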
To evaluate, researchers used the DeepFashion-MultiModal dataset, one of the most robust fashion AI datasets.
Key Features:
This constraint mirrored real-world retail pipelines, where images often arrive before structured metadata.
| Model | Macro F1 Score | Strengths | Weaknesses |
| --- | --- | --- | --- |
| GPT-4o Mini | 43.28% | Fast, cost-effective | Confuses similar categories (e.g., crew vs. round neck) |
| Gemini 2.0 Flash | 56.79% | Strong attribute recognition, robust to patterns & materials | Higher cost, slower at scale |
Takeaway: Gemini 2.0 Flash outperformed GPT-4o Mini by 13.51 percentage points of macro F1 (56.79% vs. 43.28%), especially in pattern recognition (floral, striped, polka dots), fabric classification (denim, silk, cotton), and shape-related features (V-neck vs. square neck).
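For readers unfamiliar with the metric, macro F1 computes an F1 score per class and then takes the unweighted mean, so rare labels count as much as common ones. The sketch below shows that calculation for a single attribute, using made-up labels. It averages over every class seen in either the ground truth or the predictions, which is one common convention; the article does not say how the study aggregated scores across its 18 attribute categories, so treat this as an illustration rather than the study's evaluation code.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over classes seen in either list."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fn[truth] += 1  # the true class was missed
            fp[pred] += 1   # the predicted class was wrongly assigned
    classes = set(y_true) | set(y_pred)
    scores = []
    for label in classes:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up neckline labels: one "crew" item is mistaken for "round".
y_true = ["crew", "crew", "round", "v-neck"]
y_pred = ["crew", "round", "round", "v-neck"]
score = macro_f1(y_true, y_pred)  # (2/3 + 2/3 + 1) / 3 = 7/9, about 0.778
```

A single crew/round confusion lowers the F1 of both classes at once, which is why mix-ups between near-identical categories weigh heavily on this metric.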
Zara’s fast fashion model requires weekly catalog updates. Automating attributes with Gemini could cut turnaround time by 50%, keeping pace with trends.
Amazon lists millions of items daily. A two-tiered pipeline could use GPT-4o Mini for bulk processing and Gemini for refinement in high-value categories.
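The two-tiered idea can be sketched as a simple routing rule: every item gets a first pass from the cheaper model, and only items in designated categories get a second pass whose answers override the first. In the sketch below, the category set, the function names, and the stub taggers are all hypothetical stand-ins; real model calls would replace the stubs, and a production pipeline might also route on confidence or price rather than category alone.

```python
from typing import Callable, Dict

# A tagger takes an image path and returns attribute -> label.
Tagger = Callable[[str], Dict[str, str]]

# Hypothetical set of categories that justify a second, costlier pass.
HIGH_VALUE_CATEGORIES = {"dresses", "outerwear"}


def tag_product(image_path: str, category: str,
                bulk_tagger: Tagger, refine_tagger: Tagger) -> dict:
    """Tag with the cheaper model first; re-tag high-value items."""
    attributes = dict(bulk_tagger(image_path))
    tier = "bulk"
    if category in HIGH_VALUE_CATEGORIES:
        # Refinement answers override bulk answers attribute by attribute.
        attributes.update(refine_tagger(image_path))
        tier = "refined"
    return {"attributes": attributes, "tier": tier}


# Stub taggers standing in for real model calls.
def fake_bulk(image_path: str) -> Dict[str, str]:
    return {"neckline": "round", "pattern": "floral"}


def fake_refine(image_path: str) -> Dict[str, str]:
    return {"neckline": "crew"}


dress = tag_product("dress.jpg", "dresses", fake_bulk, fake_refine)
socks = tag_product("socks.jpg", "socks", fake_bulk, fake_refine)
```

Recording the tier alongside the attributes makes it possible to audit later how much of the catalog went through the more expensive pass.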
Fine-grained product attribution is the invisible engine of fashion e-commerce. While GPT-4o Mini offers speed and efficiency, Gemini 2.0 Flash currently outperforms in accuracy and robustness.
For businesses, the optimal strategy is hybrid pipelines—low-cost bulk tagging with GPT-4o Mini, refined by Gemini for critical categories.
This study is not the finish line but the starting point: with domain fine-tuning, multimodal fusion, and expanded datasets, AI can truly revolutionize the fashion discovery experience.
Digital Kulture Team is a passionate group of digital marketing and web strategy experts dedicated to helping businesses thrive online. With a focus on website development, SEO, social media, and content marketing, the team creates actionable insights and solutions that drive growth and engagement.