Visual Ordering
In this level, you’ll enable image-based ordering.
Users upload a photo of a pizza, your agent analyzes it (base + toppings), maps it to a menu item, and creates the order—then confirms. 📸🍕
📋 Tasks
- [ ] Add the capability to accept an image as input (Playground upload or app UI).
- [ ] Use the image to infer the pizza and create an order (“Order a pizza like this image”).
- [ ] Extract toppings and base from the image, match the closest menu pizza, and add extra toppings.
✅ Pass Criteria
- The agent identifies the pizza from the image and creates the order.
- The user receives a confirmation (text and/or voice) with order details.
🛠️ Hints & Tips
- Deploy a multimodal model that supports image input in Microsoft Foundry.
- Use the Agent Playground to test image uploads quickly.
- Design the pipeline: Image → Vision analysis (base + toppings) → Menu match → Confirmation → Order.
- If confidence is low, ask clarifying questions (“Is that pepperoni or spicy salami?”).