🍽️ From Menu Text to Mouthwatering Food Pics: Building an AI-Powered Food Visualization Service in One Weekend

September 19, 2025

Introduction: Why Visualizing Food Matters

Imagine you’re browsing a restaurant menu in a new city. The names sound intriguing but vague: “Lobster Thermidor Crumpet” or “Piri Piri Chicken Skewers.” You don’t know what these actually look like.

Now imagine opening the same menu—but every dish is paired with a beautifully styled food photo that makes your mouth water instantly. Suddenly, decision-making is effortless.

Food visualization is a game changer:

  • Restaurants and ghost kitchens can add polished visuals without expensive photoshoots.
  • Travelers navigating foreign-language menus can see dishes instead of guessing.
  • Food tech startups can prototype quickly with visual assets instead of waiting weeks for professional photography.

With today’s AI stack—OCR, LLMs, and generative image models—we can turn this vision into reality. And the best part? I built a working prototype in a single weekend.

This post walks through how.

🥘 The Idea

Input:
Grilled salmon with mango salsa and jasmine rice

Output:
A photorealistic dish image, styled like a professional menu photo.

Use Cases

  1. Restaurants & Ghost Kitchens
    No need to hire photographers every time the menu changes.
  2. Travelers & Foodies Abroad
    Take a picture of a local menu → get dish translations and visuals.
  3. Food Tech Startups
    Rapidly generate food images for apps, websites, or product pitches.

🔧 The Stack

This project uses a modern AI + web development stack, with simple, modular components.

Frontend:

  • Lovable.dev – landing page builder for quick marketing copy
  • React (TypeScript) – frontend framework
  • Vite – blazing-fast build tool
  • Tesseract.js – OCR engine in the browser
  • Axios – HTTP client for API calls
  • Tailwind CSS – utility-first styling

Backend:

  • Flask (Python) – lightweight backend
  • OpenAI API – GPT model for menu parsing
  • Replicate API – Stable Diffusion for image generation
  • Flask-CORS – for cross-origin requests

⚡ Landing Page in Minutes

Using Lovable.dev, I spun up a simple landing page in under 15 minutes.

CTA example:

“Visualize your dishes before they hit the table. Bring your menus to life with AI.”

Why it matters:

  • Establishes credibility quickly
  • Provides a shareable demo link
  • Helps validate the idea with early testers

🧠 Core Logic: The Three-Step Pipeline

The entire system can be broken into three simple steps:

  1. Extract menu text (OCR)
  2. Structure menu data (LLM)
  3. Generate food images (Gen AI)
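As a rough sketch of how the three steps might be chained, the function below takes each step as an injected function. The name visualizeMenu and the shape of the steps object are assumptions made for illustration, not part of the original prototype; the point is only that the pipeline is a straight line from image to text to dishes to pictures.

```javascript
// Hypothetical orchestrator: each step is passed in as a function, so the
// pipeline can be exercised with stubs before any external API is wired up.
const visualizeMenu = async (imagePath, steps) => {
  const { extractText, structureMenu, generateImage } = steps;

  const rawText = await extractText(imagePath); // step 1: OCR
  const dishes = await structureMenu(rawText); // step 2: LLM -> [{ name, description }]

  const results = [];
  for (const dish of dishes) {
    // step 3: one image per dish, generated sequentially to keep the sketch simple
    const imageUrl = await generateImage(dish);
    results.push({ ...dish, imageUrl });
  }
  return results;
};
```

Injecting the steps keeps the orchestration independent of any particular OCR engine or model provider, which also makes it easy to swap one out later.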

Let’s dive deeper.

1. 📸 Text Extraction with Tesseract.js

When a user uploads a menu photo, we first run OCR to pull out the raw text, which includes the dish names and descriptions.

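// Assumes tesseract.js v5 or later, where createWorker accepts a language code directly.
const { createWorker } = require("tesseract.js");

// Run OCR on a menu photo and return the raw recognized text.
const extractText = async (imagePath) => {
 const worker = await createWorker("eng");
 try {
   const {
     data: { text },
   } = await worker.recognize(imagePath);
   return text;
 } finally {
   // Always release the worker, even if recognition fails.
   await worker.terminate();
 }
};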

Output:

BAR SNACKS
Lobster thermidor crumpet, pink grapefruit salad 7.7
Buttermilk fried chicken, Korean BBQ sauce (to share) (v) 5.3
Teriyaki chicken skewers
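Before handing this text to the LLM, a light cleanup pass can help. The sketch below is an assumption about what such a pass might look like, not something from the original build: parseOcrLines is a hypothetical helper that normalizes whitespace, drops empty lines, and splits a trailing price off each line.

```javascript
// Hypothetical helper: tidy raw OCR text into one entry per non-empty line,
// separating a trailing numeric price (if any) from the rest of the line.
const parseOcrLines = (text) =>
  text
    .split(/\r?\n/)
    .map((line) => line.replace(/\s+/g, " ").trim())
    .filter((line) => line.length > 0)
    .map((line) => {
      // Matches "<anything> <number>" where the number ends the line.
      const match = line.match(/^(.*?)\s+(\d+(?:\.\d+)?)$/);
      return match
        ? { text: match[1], price: Number(match[2]) }
        : { text: line, price: null };
    });
```

Lines without a trailing number, such as section headings, come back with a null price, so they can be filtered or treated as categories downstream.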

2. 🧠 Structuring Text with OpenAI

The raw OCR text is messy. GPT helps turn it into clean, structured JSON.

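const OpenAI = require("openai");
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const response = await openai.chat.completions.create({
 model: "gpt-4",
 messages: [
   {
     role: "system",
     content: `You are a helpful assistant that extracts dish names
               and short descriptions from menus. Output JSON only.`,
   },
   // ocrText is the string returned by extractText in step 1;
   // without it the model has no menu to parse.
   { role: "user", content: ocrText },
 ],
});

// JSON.parse throws if the model returns anything other than valid JSON.
const structuredMenu = JSON.parse(response.choices[0].message.content);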

Example Output:

[
 {
   "name": "Lobster Thermidor Crumpet",
   "description": "Delicious lobster on a crumpet with a refreshing pink grapefruit salad"
 },
 {
   "name": "Buttermilk Fried Chicken",
   "description": "Crispy fried chicken with flavorful Korean BBQ sauce, perfect for sharing"
 }
]
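Model output is not always this clean, so it can be worth parsing it defensively and checking it against the OCR text. The two helpers below are a sketch under assumptions: parseDishes and flagUngrounded are hypothetical names, and the substring check is only a crude first filter for invented dishes, not a replacement for human review.

```javascript
// Hypothetical helper: pull the JSON array out of the model's reply, even if
// the reply wraps it in extra text, and keep only well-formed dish entries.
const parseDishes = (llmText) => {
  const start = llmText.indexOf("[");
  const end = llmText.lastIndexOf("]");
  if (start === -1 || end <= start) {
    throw new Error("No JSON array found in model output");
  }
  const parsed = JSON.parse(llmText.slice(start, end + 1)); // throws on malformed JSON
  return parsed.filter(
    (d) => d && typeof d.name === "string" && typeof d.description === "string"
  );
};

// Hypothetical helper: mark dishes whose name never appears in the OCR text,
// so a person can review them before any image is generated.
const flagUngrounded = (dishes, ocrText) => {
  const haystack = ocrText.toLowerCase();
  return dishes.map((dish) => ({
    ...dish,
    needsReview: !haystack.includes(dish.name.toLowerCase()),
  }));
};
```

A dish flagged with needsReview is not necessarily wrong (the model may have reasonably shortened or recased a name), which is why the sketch flags rather than drops.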

3. 🖼️ Generating Images with Replicate

Finally, we send each dish description to Stable Diffusion XL (via Replicate).

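const Replicate = require("replicate");
const replicate = new Replicate({ auth: process.env.REPLICATE_API_TOKEN });

// `dish` is one entry from the structured menu produced in step 2.
// Replicate pins models by version ID rather than a "latest" tag; copy the
// current ID from the model page in place of <version-id>.
const output = await replicate.run("stability-ai/sdxl:<version-id>", {
 input: {
   prompt: `A photorealistic image of ${dish.name}. ${dish.description}.
            Restaurant menu style, professional food photography,
            shallow depth of field, vibrant colors, 4k.`,
 },
});

// Older client versions return URL strings here; newer ones may return file objects.
const imageUrl = output[0];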

Boom! Now we have a realistic image ready for display.

🔐 Secure Environment & Docker Setup

To keep things portable and secure, I:

  • Containerized the app with Docker
  • Used .env files for API keys (kept out of the Git repository via .gitignore)
  • Ensured backend services run consistently in any environment

This setup lets you deploy to Render, Railway, Fly.io, or a plain cloud server.
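One way to make missing keys fail loudly at startup, rather than on the first API call, is a small config check. This is a sketch, not the original setup: loadConfig is a hypothetical helper, REPLICATE_API_TOKEN is the variable used in the snippet above, and OPENAI_API_KEY is the variable the OpenAI client reads by default.

```javascript
// Hypothetical startup check: verify that the API keys the pipeline needs are
// present in the environment (for example, loaded from a .env file or set by
// the container runtime) before serving any requests.
const loadConfig = (env = process.env) => {
  const required = ["OPENAI_API_KEY", "REPLICATE_API_TOKEN"];
  const missing = required.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    openaiApiKey: env.OPENAI_API_KEY,
    replicateApiToken: env.REPLICATE_API_TOKEN,
  };
};
```

Taking env as a parameter keeps the check easy to test and means the same function works whether the values come from a local .env file or from a hosting platform's settings.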

🚀 Deployment

Frontend

Options:

  • Lovable.dev → easiest for no-code
  • Vercel → best for React/Next.js
  • Netlify → flexible for static sites
  • Cloudflare Pages → free, fast

Backend

Options:

  • Render – fast Node/Python hosting
  • Railway – easy Docker + env management
  • Fly.io – global edge deployment
  • Serverless – Supabase/Firebase functions

Architecture flow:

[Frontend (Vercel)] → [Backend (Render)] → [OpenAI + Replicate APIs]

📊 Real-World Use Cases

  1. Restaurants
    • Generate full menus with visuals overnight.
    • Save thousands on photoshoots.
  2. Delivery Apps
    • Instantly visualize new dishes added by restaurants.
    • Increase user conversion with visuals.
  3. Travel Tools
    • Translate and visualize foreign menus in real time.
  4. Food Startups
    • Prototype apps faster with AI-generated assets.

⚠️ Challenges & Lessons Learned

  • OCR isn’t perfect → poorly lit menu photos produce noisy text that needs cleanup before parsing.
  • Hallucinations → GPT sometimes invents dish names, so human-in-the-loop (HITL) validation is needed.
  • Image consistency → Stable Diffusion output varies in style (plate shapes, colors) from dish to dish.
  • Compute cost → Generating hundreds of dish images adds up quickly, so batching and caching are needed.
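The batching and caching idea from the last point might look something like the sketch below. Both helper names are hypothetical, the in-memory Map stands in for whatever persistent cache a real deployment would use, and generateImage is assumed to be any async function that turns a prompt into an image URL.

```javascript
const crypto = require("node:crypto");

// Hypothetical wrapper: cache results by a hash of the prompt so the same
// dish is never generated (and billed) twice.
const createCachedGenerator = (generateImage, cache = new Map()) => async (prompt) => {
  const key = crypto.createHash("sha256").update(prompt).digest("hex");
  if (!cache.has(key)) {
    // Store the promise itself so concurrent requests share one call;
    // drop it again on failure so a later retry is possible.
    const pending = generateImage(prompt).catch((err) => {
      cache.delete(key);
      throw err;
    });
    cache.set(key, pending);
  }
  return cache.get(key);
};

// Hypothetical helper: process prompts in fixed-size batches so only
// `batchSize` generation requests are in flight at once.
const generateInBatches = async (prompts, generate, batchSize = 4) => {
  const urls = [];
  for (let i = 0; i < prompts.length; i += batchSize) {
    const batch = prompts.slice(i, i + batchSize);
    urls.push(...(await Promise.all(batch.map((p) => generate(p)))));
  }
  return urls;
};
```

Keying the cache on the full prompt means a change of style or wording produces a fresh image, while unchanged menu items are served from the cache.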

🌍 Future Extensions

  • Style Customization → e.g., rustic plates vs fine dining photography.
  • Localization → Translate dish descriptions for tourists.
  • Fleet Mode → Deploy across multiple restaurants.
  • Digital Twin Menus → Entire restaurants visualized virtually.
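Style customization, the first item above, could start as nothing more than swappable prompt suffixes. The sketch below is an assumption about how that might be structured: the "menu" preset reuses the prompt wording from step 3, while the preset names, the "rustic" and "fineDining" wording, and buildPrompt itself are made up for illustration.

```javascript
// Hypothetical style presets: each one is a prompt suffix appended after the
// dish name and description.
const STYLE_PRESETS = {
  menu: "Restaurant menu style, professional food photography, shallow depth of field, vibrant colors, 4k.",
  rustic: "Rustic wooden table, handmade ceramic plate, warm natural light.",
  fineDining: "Fine dining plating, white porcelain, dark moody background.",
};

// Hypothetical helper: build the image prompt for one dish in a given style.
const buildPrompt = (dish, style = "menu") => {
  if (!Object.prototype.hasOwnProperty.call(STYLE_PRESETS, style)) {
    throw new Error(`Unknown style: ${style}`);
  }
  // Drop trailing periods so the description does not end in "..".
  const description = dish.description.trim().replace(/\.+$/, "");
  return `A photorealistic image of ${dish.name}. ${description}. ${STYLE_PRESETS[style]}`;
};
```

Using one preset for every dish on a menu may also help a little with the image consistency problem mentioned earlier, though that is untested here.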

Conclusion

In just one weekend, I proved that AI can bridge the gap between text menus and visual menus. By combining OCR, LLMs, and image generation, restaurants and startups can now bring menus to life instantly.

AI won’t replace the artistry of professional food photography, but for speed, scale, and accessibility, it’s a game-changer.

Digital Kulture

Digital Kulture Team is a passionate group of digital marketing and web strategy experts dedicated to helping businesses thrive online. With a focus on website development, SEO, social media, and content marketing, the team creates actionable insights and solutions that drive growth and engagement.