What is a Visual Chatbot and How to Create It in 2024?

By Joren Wouters Updated on

What is a Visual Chatbot and How to Create It in 2024?

In this visual chatbot tutorial, I will explain everything you need to know about image recognition chatbots in 2024.

We will cover:

  • What Visual Chatbots are (+ examples)
  • Use cases for Visual Chatbots
  • And how to create one yourself (using GPT-4 Vision)

Let’s get started!

What is a Visual Chatbot?

A visual chatbot is an automated conversation partner that understands images and videos.

With many chatbots, it is possible to send images and videos, but almost all of them cannot understand the image or video itself.

This is where visual chatbots come in, that can understand the content of the image/video and automatically reply to it.

Example of a Visual Chatbot

An example of a visual chatbot is the Visual Dialog chatbot created by scientists of Virginia Tech.

With the Visual Dialog chatbot, you can upload any image and the chatbot can reply to questions you ask about the image:


You can try the Visual Dialog chatbot yourself, using this link.

Although there are many examples of image recognition technology in apps, there aren’t many real-life examples of visual chatbots.

An example of an app using image recognition technology is Vivino. With this app, you can scan the label of a wine bottle and the app will automatically give you the latest information about that wine:

Source: Apple

A visual chatbot is an automated conversation partner that understands images and videos. Visual chatbot meaning

Use case examples for visual chatbots

So, now we’ve covered what visual chatbots are and provided an example, how can we use them in our business?

Let’s consider some use cases for visual chatbots:

Car insurance

Let’s say you just have been part of an accident with another car, leading your car to look a bit like this:

Photo by Michael Jin on Unsplash

Normally, you would need to call up the insurance company, where a human creates your claim manually and probably will come to your house to have a look at the car. But this could also be handled with a visual chatbot…

Instead of all this manual handling, you could just take some photos with your mobile phone and upload them to the visual chatbot.

Then, the visual chatbot makes an estimate of the costs of repair, and you can decide whether you want to pursue the claim or handle the repair yourself:


And all of that done just in a matter of minutes.


Now, let’s look at a more entertaining example.

Ever seen people around a museum wearing headsets and listening to an audio course of the museum?

Yeah, me too. But these audio courses are often delivered in a specific order you need to follow on a separate device (you got from the museum).

Wouldn’t it be much more entertaining with a visual chatbot?

Instead of following a specific order, you can just go to any piece of art and create a picture of it with your phone. Like this:

Photo by Ståle Grut on Unsplash

Then, the chatbot will automatically tell you what the piece of art is about and provide more information about it:


Way better, right?

How To Create a Visual Chatbot with GPT-4 Vision

Now you know what a visual chatbot is, let’s get started with creating one.

And we are going to create this by using GPT-4 Vision.

I’ll even share a template with you at the end of the article that you can use so you don’t have to build anything on your own.

What is GPT-4 Vision?

GPT-4 Vision is one of the new models of OpenAI that understands images.

You can basically give it a prompt and an image, and GPT-4 Vision can automatically answer questions about this image.

And this is great, because it can actually save businesses a lot of time and money.

Because employees don’t have to look at all the images anymore, AI can do the work for you.

With that being said, here’s what we are going to build today.

What We Will Build 

Today, we’re going to make a WhatsApp chatbot for a car insurance company.

What normally happens with a car insurance company is that a car gets broken, the customer needs to file a claim saying that the car has damages, and then a car insurance agent needs to look at those damages and needs to determine what they are going to do with it.

We can actually make this process way better by using a Visual chatbot chatbot on WhatsApp.

So instead of filling in a boring form, customers of the car insurance company can send an image of the broken car directly in WhatsApp.

Then, GPT-4 Vision analyzes the image and determines if the car is damaged. And if the car is damaged, it can list all the damages.

And this has three key benefits:

  1. First, talking with a WhatsApp chatbot is way less boring than filling in a boring form
  2. Second, the chatbot can list all the damages and return it back to the customer. Then, we can ask if the chatbot missed anything. If it did, the customer can add additional information to their claim. So, this improves the process
  3. Third, it can save a lot of time for the insurance company, because the AI chatbot has already listed all the damages and verified these with the customer.

What We Need to Create the Visual Chatbot

To create our Visual Chatbot, we need three things:

  1. Access to the OpenAI to use the GPT-4 Vision model
  2. Manychat to send and receive all messages on WhatsApp
  3. Make.com to connect Manychat to OpenAI

What we need to make a Visual Chatbot with GPT-4 Vision

Step 1: Create Your Chatbot

The first thing you should do is create your own WhatsApp chatbot. And I’ve already created a tutorial on this by using Manychat:

And the setup we will create today can also be used on any other channel. Such if you prefer to have a Facebook Chatbot or an Instagram chatbot, you can checkout these posts:

So that’s the first step that you need to do you need to create your own WhatsApp chatbot with Manychat. Once you have done that, we can get started with creating our WhatsApp automation.

Step 2: Create The WhatsApp Automation

So, for this automation, we actually need to store some data in Manychat Fields. To do that just go to Settings and then click Fields.

ManyChat settings for Fields Option
We will create three user fields for the car’s images, the analysis, and any extra information. We can name these fields:

  1. File Claim > Car Image – used for the image of the car.
  2. File Claim > Car Image Analysis – used for the car image analysis by GPT-4 Vision
  3. File Claim > Extra info – used to store any extra information provided by the customer

To create each of these fields, click on New User Field:

New User Fields For Manychat

And enter the name of the fields and click Create:

New User Fields For ManyChat

When you have created all three fields, you can just go to Automation and click on New Automation:

Add a New Automation in Manychat

In Manychat, each automation starts with a trigger. In this case, we want to start the automation when somebody says something related to filing a claim. To create the automation, we need to click on Add Trigger:

ManyChat Add Trigger button

Now, click on WhatsApp and choose the User sends a message:

User Sends Message on ManyChat

Then click on the Detect specific word in a message:

Detect Specific Word in a Message of Manychat

Now, we can choose keywords that will trigger the automation. To do that, click + Keyword and then enter “File” then click the + Message Condition option and enter “Claim”, and click Create:

Whatsapp Keywords Preview in Manychat

Now, we need to create our first message. To do that, click on WhatsApp:

Choose first step WhatsApp option for ManyChat automation

For our first message, we’ll just say something like:

Then we need to collect some information from the user. To do that, we can add a User Input, and then we’ll set the Reply Type to Email, and then it will automatically be saved to the Email System Field in Manychat:

Now, when someone gives their email, we can also send them a second message with a User Input that says:

“Could you also send an image of the damaged car?”

For this User Input, we need to set the Reply Type to Image and enter the Custom Field “File Claim > Car Image” we created earlier.

Once we have collected the information, we need to send this data to OpenAI. And in order to do that, we will use Make.

Make is an integration platform you can use to connect applications to each other without using any code.

And in this case, we will use it to connect Manychat to OpenAI.

If you haven’t used Make before, you can just click the button below to create an account and get 30 days of the Pro plan for Free:

Create Free Make Account

Once you’ve created your Make account, click Scenarios and then click Create a New Scenario:

A scenario in Make is a connection between two applications. In the scenario builder, the first thing you need to do is choose Manychat and then choose Watch Incoming Data as the trigger:

Now, you need to click on Create a webhook. I already have my Manychat account connected to Make, but if you don’t, you can just click the Add button and fill in your Manychat account name. Then you need to add the Access Token which is the API key you can get from your Manychat account’s settings:

Once the two accounts are connected, we can go back to our Manychat automation and add an Action:

Now, click the +Action button:

After that, we need to click Make and the Trigger Make:

Once you have done that, you need to choose the File Claim with Open AI option (or the name you gave your webhook):

So now our WhatsApp automation should look like this and you can click on Set Live to set it live:

Step 3: Creat the integration with GPT-4 Vision in Make

The next step is to create the integration between Manychat and OpenAI via Make.

So whenever someone sends an image to your Manychat chatbot, it will be sent to OpenAI to analyze.

To create this integration, you need to have an OpenAI account with some money in it. Once you’ve created your account, you can go to Make and connect it using your OpenAI API key.

In my video, I explain the integration more in detail, but here’s what it looks like:

It works in a few simple steps:

  1. The automation starts when we send an image from Manychat
  2. Then, OpenAI looks at the image and lists all the damages
  3. This information will then be sent back to Manychat, where we can update the custom field containing the data from Open AI
  4. And then we start a Manychat automation to send back the damages to the customer

And the prompt that we are using for OpenAI will be part of my free template as well! (that you can get at the end of this tutorial)

Step 4:  Build The Response Automation In Manychat

Lastly, we need to create the automation that we send back to the customer. For this automation, there are three things that can happen:

  1. The car can be damaged
  2. It can not be damaged
  3. Or the GPT-4 Vision cannot determine whether it’s damaged or not

To build this automation, the first thing we’ll need to do is add a condition:

Now, we can filter the car image analysis. To do that click the +Condition button and choose the Car Image Analysis custom field:

Now, click contains and add a filter based on one the possible outcomes mentioned above. So based on this one, we’ll add the filter of the car not being damaged.

Now, the OpenAI model will think that the car is not damaged. So, here, we can add a WhatsApp message that provides information on the analysis and asks the user whether they want to proceed or not:

Then, we will create another message if the user wishes to proceed:

And another one if the user does not want to proceed:

We also have the possibility of the model not being able to determine if the car is damaged. To add that in our automation, we’re going add another condition which will send the customer directly to an agent:

The third option we have is that it can actually do an analysis. In that case, we’re going to send another WhatsApp message that says:

After this, we’ll send them another message containing the Car Image Analysis custom field. To do that, we need to add a new message, click on the bracket, and choose the car image analysis custom user field:

This will provide the complete analysis that OpenAI provided. Now, we can ask the user if the analysis contains all the information by sending another message that says:

If they say it does, we can send them a message to thank them:

However, if they say it doesn’t, we can send them another message using the User Input and save it to the custom field Extra info:

Once they have provided that information, we can send them the same thank you message:

Now, we need to forward this conversation to a human agent. To do that, we need to add an action to our flow:

So you can just click on + Action, select Live Chat, and then click on Mark conversation as Open:

Add the action Mark Conversation as Open in Manychat

After that, you can just add another action, select Live Chat and click on Assign Conversation:

Add the action Assign Conversation in Manychat
And then select the team member you want to assign the conversation to:

Assign the conversation to a specific team member or group

And lastly, we will notify the person that is assigned to this conversation. Just add another action, choose Live Chat, and click on Notify Assignee:

Add the Notify Assignee action in Manychat

After that, you can create a notification text that provides all the details to the agent.

And that’s it! Now, if you click on Set Live and your visual chatbot for automatically handling car insurance claims is working.

Get my Free Visual Chatbot Template

To get started quickly with creating your visual chatbot using Manychat and OpenAI, you can just download my free template by filling in the form below 👇

When you download my FREE template, you will get the following:

  • Free Manychat Template
  • Make Integration Template
  • OpenAI Prompt

What do you think of visual chatbots?

Now, let me ask you a question: What do you think of visual chatbots?

Have you used a visual chatbot before?

Or do you know a company that already uses a visual chatbot?

Let me know by leaving a comment below!

Comments (2)

  1. Asit

    Very good idea. Starting with a defined/narrow set will help make the chat deliver more value. So museum artifacts are defined although in 1000s , so are wines. So product discovery can be a monetisable use case.

    1. Joren Wouters

      Definitely agree, product discovery is a very good use case for visual chatbots.

Leave a Reply