Skip to content

Visual Ordering

In this level, you’ll enable image-based ordering.
Users upload a photo of a pizza, your agent analyzes it (base + toppings), maps it to a menu item, and creates the order—then confirms. 📸🍕

📋 Tasks

  • [ ] Add the capability to accept an image as input (Playground upload or app UI).
  • [ ] Use the image to infer the pizza and create an order (“Order a pizza like this image”).
  • [ ] Extract toppings and base from the image, match the closest menu pizza, and add extra toppings.

✅ Pass Criteria

  • The agent identifies the pizza from the image and creates the order.
  • The user receives a confirmation (text and/or voice) with order details.

🛠️ Hints & Tips

  • Deploy a multimodal model that supports image input in Microsoft Foundry.
  • Use the Agent Playground to test image uploads quickly.
  • Design the pipeline: Image → Vision analysis (base + toppings) → Menu match → Confirmation → Order.
  • If confidence is low, ask clarifying questions (“Is that pepperoni or spicy salami?”).

📚 Resources