Making chatbots more human using machine learning

One of the great challenges when developing conversational interfaces is to make chatbots more human. So, we asked ourselves the question: would it be possible to create a bot that is capable of imitating a person in real life?

Given enough data, chatbots can learn the patterns of human language. This is done with techniques such as NLP (Natural Language Processing) and machine learning. A conventional chatbot tries to recognize the intent of a user and returns a fixed response for that intent. The chatbot concept that we have created is slightly different. Instead of providing fixed responses to recognized intents, our social chatbot generates responses similar to those a real person would give.

For our proof of concept, I tried to imitate Vasilis van Gemert, based on his excellent podcast The Good, The Bad, and the Interesting, which is well worth a listen.

The transcriptions of these podcasts were pre-processed so that they suit the machine learning algorithm. Next, I used TensorFlow to build the model.
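To give an idea of what such pre-processing can look like, here is a minimal sketch in Python. It assumes the transcript is available as a list of (speaker, text) turns and pairs every turn by the person we want to imitate with the turn that came just before it. The function names, the tokenizer and the sample turns are made up for illustration; the actual pre-processing for this project may have looked different.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

def build_pairs(turns, target_speaker):
    """Pair each turn by target_speaker with the turn that precedes it.

    turns is a list of (speaker, text) tuples in spoken order
    (an assumed format, not necessarily the one used in the project).
    """
    pairs = []
    for (prev_speaker, prev_text), (speaker, text) in zip(turns, turns[1:]):
        if speaker == target_speaker and prev_speaker != target_speaker:
            pairs.append((tokenize(prev_text), tokenize(text)))
    return pairs

# Hypothetical transcript excerpt, invented for this example.
turns = [
    ("Guest", "Hoe gaat het?"),
    ("Vasilis", "Ja, goed!"),
    ("Guest", "Mooi."),
]
pairs = build_pairs(turns, "Vasilis")
```

Each pair then serves as one training example: the first element is what the model reads, the second is what it should learn to say.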


TensorFlow is an open-source machine learning library developed by the Google Brain team and released in November 2015. The idea of TensorFlow is to express numerical computations as a graph: the nodes represent mathematical operations and the edges represent the data that flows between them. The machine learning model we used for our chatbot is a deep neural network. Deep neural networks are commonly taught and visualized as graphs, which makes implementing them in TensorFlow feel natural to machine learning practitioners.
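The graph idea can be illustrated without TensorFlow itself. The sketch below is plain Python, not TensorFlow's API: the Node class and the helper functions are invented for this example. It only shows the principle that you first describe a computation as connected nodes and evaluate it afterwards.

```python
class Node:
    """A graph node: an operation plus the nodes whose outputs feed into it."""

    def __init__(self, op, inputs=()):
        self.op = op
        self.inputs = inputs

    def evaluate(self):
        # Evaluate the input nodes first; their results are the data
        # that travels along the edges into this node.
        args = [node.evaluate() for node in self.inputs]
        return self.op(*args)

def constant(value):
    return Node(lambda: value)

def add(a, b):
    return Node(lambda x, y: x + y, (a, b))

def multiply(a, b):
    return Node(lambda x, y: x * y, (a, b))

# Building the graph computes nothing yet; it only describes (2 + 3) * 4.
graph = multiply(add(constant(2), constant(3)), constant(4))

# The computation happens when the graph is evaluated.
result = graph.evaluate()
```

In a real library the nodes are operations on tensors, and the graph structure is what lets the library compute gradients and distribute the work across devices.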

Figure 1: Visualization of a TensorFlow graph

The model

The deep neural network we used for our concept is known as a sequence-to-sequence (seq2seq) model. A seq2seq model can learn vocabulary, sentence structure, and more, all from pairs of input and output sequences. It contains two main components: an encoder and a decoder. The encoder processes the input and the decoder generates an output. The goal is then to tune the model so that its output resembles how Vasilis would respond. To do this, we trained the model on thousands of lines of conversational data from all podcast episodes so far.
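As a rough illustration of what the encoder and decoder each receive during training, the sketch below turns one (input, response) pair into three token sequences. It follows a common seq2seq convention: the decoder is fed the response shifted one position to the right, starting with a start marker, and is trained to predict the response followed by an end marker. The marker names, the padding scheme and the lengths are assumptions for this example, not details taken from our model.

```python
PAD, GO, EOS = "<pad>", "<go>", "<eos>"  # hypothetical special tokens

def make_example(source, target, max_source_len, max_target_len):
    """Turn one (input, response) token pair into fixed-length sequences."""
    if len(source) > max_source_len or len(target) + 1 > max_target_len:
        raise ValueError("pair does not fit the chosen lengths")

    # The encoder reads the padded input sentence.
    encoder_input = source + [PAD] * (max_source_len - len(source))

    # The decoder sees the response shifted right by one position...
    decoder_input = [GO] + target
    # ...and is trained to predict the response plus an end marker.
    decoder_target = target + [EOS]

    padding = [PAD] * (max_target_len - len(decoder_input))
    return encoder_input, decoder_input + padding, decoder_target + padding

encoder_input, decoder_input, decoder_target = make_example(
    ["hoe", "gaat", "het", "?"], ["ja", ",", "goed", "!"], 6, 6
)
```

At response time there is no known target, so the decoder is started with the start marker and fed its own previous output until it produces the end marker.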

Figure 2: seq2seq encoder-decoder model architecture

The results

In the initial phases of training, the chatbot simply output “Ja” (yes) to every input we gave it. Although a funny coincidence, this makes some sense intuitively, since “Ja” is a commonly used word in the Dutch language. Gradually, the chatbot started to form more complete sentences and to pick up basic patterns of Dutch grammar.

Figure 3: Initial phases of training. Lines starting with > denote input, [..] the output.

Figure 4: Training after 140,000 iterations.

I continued to train the model until we saw no further improvement. The responses of our final model are acceptable, but not as good as we had hoped. We noticed that the responses are often out of context. The chatbot has some understanding of the language but is not able to form coherent thoughts. It tends to repeat words and sometimes struggles to finish its own sentences. It has some idea of sentence structure and, interestingly enough, occasionally responds in a mixture of English and Dutch.

Improvements and experiences

We gained a lot of insights by implementing our own deep learning model in TensorFlow. We also looked at ways to improve the performance of our chatbot. Our insights are summarized as follows:

  • The task at hand is a challenging one. For each input you can have thousands of acceptable responses. It is difficult to score these responses in an objective way.
  • Current state-of-the-art performance for chatbots of this type is far from human-like performance.
  • Data is key. With more data we are confident that we can produce a better chatbot experience.
  • Penalizing responses that are not fully complete or contain repetition of words may improve performance.
  • Spending more time tuning hyperparameters such as the number of iterations, the choice of optimizer, the learning rate and the batch size can lead to better performance.
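The idea of penalizing repetitive or unfinished responses could, for example, be applied when choosing between several candidate responses. The sketch below is one possible approach, not something we implemented: it assumes the model can produce multiple candidates with a log-probability each, and the function names and penalty weights are arbitrary choices for illustration.

```python
def repetition_penalty(tokens):
    """Fraction of tokens that repeat an earlier token (0.0 means no repeats)."""
    if not tokens:
        return 0.0
    return 1.0 - len(set(tokens)) / len(tokens)

def is_complete(tokens, end_marks=(".", "!", "?")):
    """Treat a response as complete if it ends with sentence punctuation."""
    return bool(tokens) and tokens[-1] in end_marks

def rescore(log_prob, tokens, repeat_weight=2.0, incomplete_penalty=1.0):
    """Lower the model's score for repetitive or unfinished responses."""
    score = log_prob - repeat_weight * repetition_penalty(tokens)
    if not is_complete(tokens):
        score -= incomplete_penalty
    return score

def pick_best(candidates):
    """candidates is a list of (log_prob, tokens); return the best after rescoring."""
    return max(candidates, key=lambda c: rescore(c[0], c[1]))

# Hypothetical candidates: the model slightly prefers the repetitive one.
candidates = [
    (-1.0, ["ja", "ja", "ja", "ja"]),
    (-1.5, ["ja", ",", "dat", "klopt", "."]),
]
best = pick_best(candidates)
```

Here the repetitive candidate loses despite its higher model score. The same kind of penalty could also be built into the training objective, which would be a larger change.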


We experimented with seq2seq models in TensorFlow to build a chatbot that imitates a real person. We obtained decent results, but not as good as we had hoped for. We learned more about deep learning and how it is applied to natural language processing problems. We realized that a huge amount of data is key for machine learning applications, and we are aware of the limitations of this technology. Custom-trained chatbots have a long way to go before they reach human-level performance, but AI has enabled us to take the first step towards this goal. Let’s keep experimenting…