New conversational model
The real world is a stubborn place. It is complex in ways that resist abstraction and modeling. It notices and reacts to our attempts to affect it. Nor can we hope to examine it objectively with pre-defined rules or programmed state machines.
One fundamental aspect that cannot be ignored is the probabilistic nature of the model that will serve those dialogs.
The complexity of the system increases with the number of variables that steer the conversation toward one function instead of another. Hence the need to move away from rule-based systems and embrace uncertainty, probability, and statistics.
Once the limitations of those approaches are unveiled, we are ready to approach context and sequentiality in a completely different way. We should let machine learning do what it does best: calculate predictions over large amounts of input.
The system should be able to replicate human behavior by learning from real conversation segments.
The model should query the system for more information and make more plausible decisions about what to present to the user. With those premises, machine learning enters the game, and it comes in the form of neural networks.
Neural networks are adaptable systems whose ability to learn comes from varying the strength of the connections between their artificial neurons. They are basically universal function approximators. I described neural networks for intent classification that use convolutions to grasp the semantics behind the user's sentences.
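To make the convolution idea concrete, here is a minimal sketch of how a CNN text classifier looks at a sentence: a filter slides over windows of word embeddings, and max-over-time pooling keeps the strongest activation. All embedding and filter values below are made-up numbers for illustration; a real model learns them during training.

```python
# Toy 1D convolution over word embeddings, as used in CNN-based
# text classifiers. Values are hypothetical, not trained weights.

def conv1d(embeddings, filt, bias=0.0):
    """Slide a filter of width len(filt) over the word sequence and
    return one ReLU activation per window."""
    width = len(filt)
    out = []
    for i in range(len(embeddings) - width + 1):
        window = embeddings[i:i + width]
        s = bias
        for vec, w in zip(window, filt):
            s += sum(v * wi for v, wi in zip(vec, w))
        out.append(max(0.0, s))  # ReLU non-linearity
    return out

# "i want black tea" as 2-dimensional embeddings (illustrative values)
sentence = [[0.1, 0.3], [0.7, 0.2], [0.5, 0.9], [0.4, 0.8]]

# One filter of width 2: responds to a bigram-level pattern
filt = [[1.0, 0.0], [0.0, 1.0]]

feature_map = conv1d(sentence, filt)
# Max-over-time pooling keeps the strongest evidence, making the
# classifier insensitive to where in the sentence the pattern occurs.
pooled = max(feature_map)
print(feature_map, pooled)
```

A real classifier would use many filters of several widths and feed the pooled activations into a dense softmax layer, but the sliding-window mechanics are exactly these.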
To help the classifier, we can provide, in addition to the raw text, a set of discrete features that carry information sharing some predictive relation with the action to choose. This information may be related to the text itself, but it may also be contextual, tied to the user profile or to the ongoing marketing campaigns promoted by the merchant.
For example, should a particular search request be directed to a special promotion? Can the time of day affect the query selection, because at night chamomile sells better than black tea? Just feed the classifier with those features, and let ML do the hard work.
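One simple way to feed such discrete features to the classifier is to one-hot encode them and concatenate them with the text features. The feature names, vocabularies, and values below are hypothetical; the text features would come from the CNN, and here they are stand-in numbers.

```python
# Sketch of augmenting learned text features with discrete context
# features (hypothetical names and values).

def one_hot(value, vocabulary):
    return [1.0 if value == v else 0.0 for v in vocabulary]

TIMES = ["morning", "afternoon", "night"]        # assumed buckets
CAMPAIGNS = ["none", "tea_promo"]                # assumed campaigns

def build_features(text_features, time_of_day, campaign):
    # The classifier sees one flat vector: the CNN's text features
    # concatenated with the one-hot context features.
    return (text_features
            + one_hot(time_of_day, TIMES)
            + one_hot(campaign, CAMPAIGNS))

x = build_features([0.3, 1.6], "night", "tea_promo")
print(x)  # [0.3, 1.6, 0.0, 0.0, 1.0, 0.0, 1.0]
```

From the network's point of view there is nothing special about these extra inputs: they are just more dimensions over which it can discover predictive relations.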
The new conversational model should not just understand the single intent of a phrase, but interpret it within the overall context of the user's engagement with the Chatbot.
Time series models
The concept of a time series rests on the idea that past behavior can be used to predict future behavior. In sequence-based models, the output is determined not just by the last input, as in regression predictive models, but also by the inputs that preceded it. Such a predictor should have some interesting characteristics: the latest inputs affect the final output more than those far away in time, and the model should be able to override, remove, or remember qualifiers along the sequence. Long short-term memory (LSTM) units come to our rescue. LSTMs can remember values over arbitrary event series, and they are a more sophisticated extension of recurrent neural networks.
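The override/remove/remember behavior is exactly what the LSTM gates implement. Below is a minimal single-unit LSTM step in plain Python; all weights are made-up scalars chosen only to show the mechanics, not a trained model.

```python
import math

# Minimal single-unit LSTM step. The forget gate decides what to
# erase from memory, the input gate what to write, and the output
# gate what to expose. Weights are illustrative, not learned.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate value
    c = f * c_prev + i * g   # keep part of the old memory, add the new
    h = o * math.tanh(c)     # expose part of the memory as output
    return h, c

weights = dict(wf=0.5, uf=0.1, bf=0.0, wi=0.6, ui=0.2, bi=0.0,
               wo=0.7, uo=0.1, bo=0.0, wg=0.9, ug=0.3, bg=0.0)

h, c = 0.0, 0.0
for x in [1.0, 0.5, -0.2]:  # a toy input sequence
    h, c = lstm_step(x, h, c, weights)
print(h, c)
```

Because the cell state `c` is carried across steps and only modified through the gates, gradients can flow over long event series, which is what lets LSTMs remember values over arbitrary distances in time.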
Multi-task learning (MTL)
Consider a hypothetical recommender that has learned to predict your preferences about cars. A single-task system could be trained to give one output that matches a car model to your profile. A multi-task model could be trained to return not only the model's name, but also the color, the engine type, and the accessories. MTL aims to solve multiple classification tasks simultaneously within the same neural network. We can view MTL as a form of transfer learning where commonalities and differences across tasks are exploited to improve overall learning efficiency. It seems that neural networks love diversity: the more tasks they learn, the more accurate their predictions become, compared to training on each task separately. The conversational engine I created uses multi-task learning (MTL). It does not merely return a single label; instead it returns a fine-grained set of parameters that add expressiveness to the behavior of the Chatbot.
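Structurally, MTL means one shared representation feeding several output heads, one per task. The sketch below uses the car-recommender example; the task names, label sets, and weights are hypothetical, and in a real network the shared layer and the heads are trained jointly, so gradients from every task shape the shared features.

```python
# Multi-task sketch: one shared feature vector, several heads.
# All weights and labels are illustrative, not trained values.

def linear(vec, weights, bias):
    return sum(v * w for v, w in zip(vec, weights)) + bias

def argmax_label(scores):
    return max(scores, key=scores.get)

# One head per task; each head maps the shared features to its
# own label set via its own (weights, bias) per label.
HEADS = {
    "model":  {"sedan": ([0.9, -0.2], 0.0), "suv": ([-0.4, 0.8], 0.1)},
    "color":  {"red": ([0.3, 0.3], 0.0), "black": ([0.5, -0.1], 0.2)},
    "engine": {"petrol": ([0.2, 0.1], 0.0), "electric": ([-0.1, 0.9], 0.0)},
}

def predict(shared_features):
    out = {}
    for task, classes in HEADS.items():
        scores = {label: linear(shared_features, w, b)
                  for label, (w, b) in classes.items()}
        out[task] = argmax_label(scores)
    return out

print(predict([1.0, 0.2]))
```

The key point is that the shared features must serve every head at once; that shared pressure is where the transfer-learning benefit of MTL comes from.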
Neural network architecture
My proposal is to use both (CNN and LSTM) in a fully-connected neural network, in order to combine the classification qualities of the CNN with the sequentiality of the LSTM. That means a particular meaning of a user's utterance is not considered alone, as in the current state of the art of Chatbot classifiers, but is evaluated in the context of the conversation. "Fully-connected neural network" here means that the different layers of the network (CNN, LSTM) are affected by the same feed-forward and back-propagation iterations.
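The overall data flow can be sketched end to end: a CNN-style encoder summarizes each utterance, a recurrent step carries that summary across conversation turns, and a final layer classifies the intent in context. To keep the sketch short, the encoder is reduced to max-over-time pooling and the recurrence to a single gated blend standing in for a full LSTM layer; intent names, embeddings, and weights are all hypothetical.

```python
# End-to-end sketch of the proposed CNN + LSTM pipeline.
# All numbers are illustrative, not trained parameters.

def encode_utterance(embeddings):
    # Stand-in for the CNN: max-pool each embedding dimension
    # over the words of the utterance (max-over-time pooling).
    dims = len(embeddings[0])
    return [max(vec[d] for vec in embeddings) for d in range(dims)]

def recurrent_step(x, h_prev):
    # Stand-in for the LSTM layer: a gated blend of the previous
    # conversation state and the new utterance encoding.
    gate = 0.6  # hypothetical learned gate value
    return [gate * h + (1 - gate) * xi for h, xi in zip(h_prev, x)]

def classify(h, intents):
    # Final layer: score each intent against the conversation state.
    scores = {name: sum(hi * wi for hi, wi in zip(h, w))
              for name, w in intents.items()}
    return max(scores, key=scores.get)

INTENTS = {"order_tea": [1.0, -0.5], "ask_promo": [-0.3, 1.0]}  # toy weights

conversation = [
    [[0.2, 0.9], [0.1, 0.8]],   # turn 1: word embeddings
    [[0.9, 0.1], [0.8, 0.2]],   # turn 2: word embeddings
]

h = [0.0, 0.0]
for turn in conversation:
    h = recurrent_step(encode_utterance(turn), h)
print(classify(h, INTENTS))
```

Because the classification at each turn depends on `h`, which accumulates every previous turn, the same utterance can legitimately yield different intents in different conversations, which is the point of evaluating meaning in context rather than in isolation.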