What is a Visual Chatbot?

By Joren Wouters


A visual chatbot is an automated conversation partner that understands images and videos.

Many chatbots let you send images and videos, but almost none of them can understand the image or video itself.

This is where visual chatbots come in: they can understand the content of the image or video and automatically reply to it.

In this post, I will discuss examples of visual chatbots, possible use cases and how they work!

Example of a visual chatbot

An example of a visual chatbot is the Visual Dialog chatbot created by researchers at Virginia Tech.

With the Visual Dialog chatbot, you can upload any image and the chatbot can reply to questions you ask about the image:


You can try the Visual Dialog chatbot yourself, using this link.

By the way, don’t be surprised if the answers aren’t completely accurate 😉

Although there are many examples of image recognition technology in apps, there aren’t many real-life examples of visual chatbots.

An example of an app using image recognition technology is Vivino. With this app, you can scan the label of a wine bottle and the app will automatically give you the latest information about that wine:

Source: Apple


Use case examples for visual chatbots

So, now that we’ve covered what visual chatbots are and looked at an example, how can we use them in our business?

Let’s consider some use cases for visual chatbots:

Car insurance

Let’s say you have just been in an accident with another car, leaving your car looking a bit like this:

Photo by Michael Jin on Unsplash

Normally, you would need to call the insurance company, where a person creates your claim manually and will probably come to your house to have a look at the car. But this could also be handled by a visual chatbot…

Instead of all this manual handling, you could just take some photos with your mobile phone and upload them to the visual chatbot.

Then, the visual chatbot makes an estimate of the costs of repair, and you can decide whether you want to pursue the claim or handle the repair yourself:


And all of that is done in a matter of minutes.


Now, let’s look at a more entertaining example.

Ever seen people walking around a museum wearing headsets and listening to an audio tour?

Yeah, me too. But these audio tours usually have to be followed in a specific order, on a separate device you get from the museum.

Wouldn’t it be much more entertaining with a visual chatbot?

Instead of following a specific order, you can just walk up to any piece of art and take a picture of it with your phone. Like this:

Photo by Ståle Grut on Unsplash

Then, the chatbot will automatically tell you what the piece of art is about and provide more information about it:


Way better, right?
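To make the museum idea a bit more concrete, here is a minimal sketch in Python of how such a chatbot could turn a photo into a reply. Everything in it is an assumption for illustration: the `CATALOGUE` entries, the `fake_recognizer` function (a stand-in that only looks at the file name instead of the actual image) and the confidence threshold are all made up, and a real museum chatbot would plug in a trained image recognition model instead.

```python
# Hypothetical catalogue; a real museum would load this from its own database.
CATALOGUE = {
    "night_watch": "This is The Night Watch, painted by Rembrandt in 1642.",
    "milkmaid": "This is The Milkmaid, painted by Johannes Vermeer.",
}


def fake_recognizer(photo_name):
    """Stand-in for an image recognition model: returns (label, confidence).

    It only inspects the file name so the example runs without a real model.
    """
    for label in CATALOGUE:
        if label in photo_name:
            return label, 0.95
    return "unknown", 0.10


def museum_reply(photo_name, recognizer=fake_recognizer, threshold=0.8):
    """Turn a visitor's photo into a chat reply, with a fallback when unsure."""
    label, confidence = recognizer(photo_name)
    if confidence < threshold or label not in CATALOGUE:
        return "Sorry, I don't recognise this piece. Could you try another photo?"
    return CATALOGUE[label]


print(museum_reply("IMG_night_watch.jpg"))
print(museum_reply("IMG_0042.jpg"))
```

The fallback matters here: when the recognizer is not confident enough, it is probably better for the chatbot to ask for another photo than to describe the wrong painting.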

What are the benefits of a visual chatbot?

In addition to the ‘standard’ benefits of chatbots, visual chatbots have two main benefits: cost reduction and faster turnaround of cases.

Let’s take car insurance as an example again. Usually, when handling an insurance claim, someone from the insurance company comes to have a look at your car and makes an estimate for repairs.

Now, this doesn’t have to be done anymore, because the visual chatbot can do this automatically based on photos of the car.

Because of this, a person doesn’t need to look at the car anymore (lower costs), and cases are also handled faster. Normally, you would have to book an appointment with the insurance company, but that is no longer necessary.

How can a visual chatbot understand images and video?

To understand images and videos, a chatbot needs to use an algorithm.

And I know what you’re thinking: Algorithms… Yikes!

Well, algorithms are actually quite easy to understand. Let me explain.

Let’s imagine that you need to look at 5,000 pictures of wolves and 5,000 pictures of dogs.

After seeing those 10,000 pictures, you would probably have a good idea of what a wolf and a dog look like, right?

That’s basically what an algorithm does. It just looks at a lot of different pictures and tries to understand what it sees.

By the way, this is what is called the “training” of an algorithm.

After that, the only thing the visual chatbot has to do is ask the algorithm: “Hey algorithm, does this picture look like a wolf or a dog?”
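For the curious, the wolf-versus-dog idea can be sketched in a few lines of Python. This is a toy illustration under heavy assumptions, not how a real visual chatbot is built: each "picture" is reduced to two made-up numbers (imagine snout length and ear pointiness on a scale from 0 to 1), and the "algorithm" is a simple nearest-average classifier. Real systems learn from the actual pixels of thousands of images, usually with neural networks.

```python
from math import dist

# Hypothetical stand-ins for images: each "picture" is two made-up numbers
# (say, snout length and ear pointiness on a 0-1 scale).
TRAINING_PICTURES = {
    "wolf": [(0.9, 0.8), (0.8, 0.9), (0.85, 0.75)],
    "dog": [(0.4, 0.3), (0.5, 0.2), (0.3, 0.4)],
}


def train(pictures_by_label):
    """'Look at' every labelled picture and remember the average per label."""
    centroids = {}
    for label, pictures in pictures_by_label.items():
        count = len(pictures)
        centroids[label] = tuple(sum(values) / count for values in zip(*pictures))
    return centroids


def classify(centroids, picture):
    """Return the label whose average picture is closest to the new one."""
    return min(centroids, key=lambda label: dist(centroids[label], picture))


def chatbot_reply(centroids, picture):
    """What the chatbot does: ask the trained algorithm, then answer in text."""
    label = classify(centroids, picture)
    return f"That looks like a {label} to me!"


model = train(TRAINING_PICTURES)
print(chatbot_reply(model, (0.88, 0.82)))
print(chatbot_reply(model, (0.35, 0.25)))
```

The `train` step is the "looking at 10,000 pictures" part, and `chatbot_reply` is the chatbot asking the question. The more (and the more varied) pictures go into training, the better the answers tend to be.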

Now, you might ask, how can I create such an algorithm?

This is usually done by data scientists: people who are trained to handle large amounts of data and use artificial intelligence to train an algorithm on that data.

What do you think of visual chatbots?

Now, let me ask you a question: What do you think of visual chatbots?

Have you used a visual chatbot before?

Or do you know a company that already uses a visual chatbot?

Let me know by leaving a comment below!

Comments (2)

  1. Asit

    Very good idea. Starting with a defined/narrow set will help make the chat deliver more value. So museum artifacts are defined although in 1000s , so are wines. So product discovery can be a monetisable use case.

    1. Joren Wouters

      Definitely agree, product discovery is a very good use case for visual chatbots.
