“Ok Google, find me a flight to London”

While working in Barcelona for a travel company, I set out to explore how feasible it would be to make that work.

All good things come in threes:

  1. A speech-to-text algorithm to interpret the voice
  2. A parser to understand what the user is looking for
  3. A service provider that can do something with the query

Speech-to-text

Fortunately for us, Google offers a free speech-to-text engine that works in Chrome. We will use that for the PoC.
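To make the idea concrete, here is a minimal sketch of capturing one utterance with the Web Speech API in Chrome (the wiring in the actual front end may differ):

// Minimal sketch: capture one spoken sentence in Chrome.
// webkitSpeechRecognition is Chrome's prefixed Web Speech API.
const recognition = new webkitSpeechRecognition();
recognition.lang = 'en-GB';
recognition.interimResults = false;

recognition.onresult = (event) => {
  const transcript = event.results[0][0].transcript;
  console.log('Heard:', transcript); // e.g. "find me a flight to London"
  // next step: send `transcript` to the parser described below
};

recognition.start();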


Parser

We need to go from a sentence like “find a flight to Barcelona for next week” to something that a computer can understand:

const data = {
  destination: 'BCN', // code for Barcelona
  from: 'LDN', // user current location
  date: today + 1 * week
};
How can we make it do what we want?

If you start to think of rules that would make a computer understand that query, you will realise the complexity of it. So we are going to use a little magic called machine learning and let it learn how to interpret the sentences on its own.

This machine learning model will take an input sentence and somehow figure out which words correspond to what. In our case we have the following labels:

  • departure_date
  • return_date
  • departure
  • destination
  • O for none

So our desired output is:

find: O
a: O
flight: O
to: O
Barcelona: destination
for: O
next: departure_date
week: departure_date
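Once the model produces these word-level labels, a small post-processing step can collapse them back into the structured object shown earlier. This is only a hypothetical sketch; the real code also has to resolve airport codes and actual dates:

// Hypothetical post-processing: collapse word-level labels into a query object.
function labelsToQuery(words, labels) {
  const query = { departure: '', destination: '', departure_date: '', return_date: '' };
  words.forEach((word, i) => {
    const label = labels[i];
    if (label !== 'O') {
      query[label] = query[label] ? `${query[label]} ${word}` : word;
    }
  });
  return query;
}

labelsToQuery(
  ['find', 'a', 'flight', 'to', 'Barcelona', 'for', 'next', 'week'],
  ['O', 'O', 'O', 'O', 'destination', 'O', 'departure_date', 'departure_date']
);
// => { departure: '', destination: 'Barcelona', departure_date: 'next week', return_date: '' }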
This is the model that we will use:

[Figure: model overview]

Layer 1: Word embedding

We are going to take our input sentence, “find a flight to Barcelona for next week”, and pass it to the word embedding layer, which will transform every word into a tensor.

This is useful because the embedding gives the next layer a compact representation of the meaning of each word. The embedding we are going to use is the pre-trained GloVe embedding.
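Conceptually the embedding layer is just a lookup table from word to vector. A toy illustration (the real GloVe vectors are loaded from the downloaded files and have hundreds of dimensions, not three):

// Toy illustration of an embedding lookup; real GloVe vectors come from the
// files downloaded later with `make glove`.
const gloveVectors = {
  flight:    [0.12, -0.50,  0.33],
  barcelona: [0.81,  0.02, -0.44],
  '<unk>':   [0.00,  0.00,  0.00], // fallback for out-of-vocabulary words
};

const tokens = 'find a flight to Barcelona for next week'.toLowerCase().split(' ');
const embedded = tokens.map((w) => gloveVectors[w] || gloveVectors['<unk>']);
// `embedded` is now one vector per word, ready for the next layer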

We also use a char embedding layer plus a Bi-LSTM to increase the accuracy by a few percentage points; that idea comes from this paper:

Layer 2: Bidirectional LSTM

Now we need to go from a list of tensors to a list of numbers that indicate which label corresponds to each word.

As a brief introduction: an LSTM (Long Short-Term Memory) layer is a type of RNN (Recurrent Neural Network), and an RNN is a type of neural network where the output of a layer can feed back into its input, keeping a sort of temporary memory.

Here we will process the input query sequentially, word by word, and transform each GloVe embedding into an array of scores that represents one of the output labels previously described.
At first the algorithm will output random labels because it still doesn't know how to understand the sentence; this is where training comes in.
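The actual model lives in the Python repo below, but to make the architecture concrete, here is a rough Keras-style sketch of the same idea (embedding → bidirectional LSTM → per-word softmax) written with TensorFlow.js; the sizes are illustrative, not the repo's exact values:

// Rough architecture sketch, not the repo's implementation.
import * as tf from '@tensorflow/tfjs';

const vocabSize = 10000; // illustrative; the real vocab is built from GloVe
const numLabels = 5;     // departure_date, return_date, departure, destination, O

const model = tf.sequential();
model.add(tf.layers.embedding({ inputDim: vocabSize, outputDim: 300, inputLength: 20 }));
model.add(tf.layers.bidirectional({
  layer: tf.layers.lstm({ units: 100, returnSequences: true }), // keep one output per word
}));
model.add(tf.layers.timeDistributed({
  layer: tf.layers.dense({ units: numLabels, activation: 'softmax' }), // label scores per word
}));

model.compile({ optimizer: 'adam', loss: 'categoricalCrossentropy' });
model.summary();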

Training

You can run this on your machine or run the code here

This is the step where we give the algorithm examples of how it should work, and the machine learning part gets to work fine-tuning the internal weights until the algorithm's output matches the training data.

Let’s clone the project and download the data

$ git clone https://github.com/alepacheco/ASF-core
$ cd ASF-core
$ git checkout stable
$ pip install -r requirements.txt # To install python packages required
$ make glove # this will download the data for the embedding layer

We can see the training data that we are going to use in the file data/atis.pkl; it comes from the ATIS dataset.

Training parameters are in the file model/config.py. By default it uses:

# training
train_embeddings = False
nepochs = 5
dropout = 0.5
batch_size = 246
lr = 0.01
lr_decay = 0.9
clip = -1 # if negative, no clipping
nepoch_no_imprv = 3
# model hyperparameters
hidden_size_char = 100 # lstm on chars
hidden_size_lstm = 100 # lstm on word embeddings
Prepare the data
$ python build_data.py
Building vocab...
- done. 354 tokens
Building vocab...
- done. 400000 tokens
Writing vocab...
- done. 352 tokens
Writing vocab...
- done. 84 tokens
Writing vocab...
- done. 45 tokens
Train
$ python train.py

After some time we should see the training loss decreasing. This number is the difference between the outcome we want the algorithm to return and the one it is currently returning. When we start, it won’t know what to return and the loss will be very high, but it will start to decay as the algorithm learns what we want it to do.

Epoch 1 out of 15
28/45 [=================>............] - ETA: 15s - train loss: 19.9427
...
Epoch 15 out of 15
45/45 [==============================] - 37s - train loss: 0.5352
acc 99.42 - f1 98.49

Awesome, we get 99.42% accuracy after a few minutes of training!

Evaluate the trained model

We can now try it with a custom sentence and see how the model performs:

$ python evaluate.py
Testing model over test set
acc 99.42 - f1 98.49
input> Find a flight from london to Amsterdam for sunday
Find a flight from london to Amsterdam for sunday
O O O O B-fromloc O B-toloc O B-depart_date

If you got here, you have successfully trained and run the model. Now you can wrap the logic in an Express server to use it as a back end.

Run the evaluation server

$ python server.py

In a new terminal, try it with curl:

$ curl -X POST "http://localhost:5000/parse" -d "flight to new york from los angeles for next sunday"

And we get back a JSON object we can use in our app:

{
  "type": "",
  "departure": "LAX",
  "destination": "NYC",
  "departureDate": "2018-03-25",
  "departureTime": "",
  "returnDate": ""
}
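Coming back to the Express idea from above: a minimal sketch of such a wrapper just forwards the sentence to the Python evaluation server and returns its JSON (the port and endpoint are taken from the curl call above; the rest is an assumption):

// Minimal sketch of an Express back end that proxies to the Python parse server.
// Assumes `python server.py` is running on localhost:5000 and Node 18+ (global fetch).
const express = require('express');

const app = express();
app.use(express.text({ type: '*/*' })); // the sentence arrives as plain text

app.post('/parse', async (req, res) => {
  const response = await fetch('http://localhost:5000/parse', {
    method: 'POST',
    body: req.body, // e.g. "flight to new york from los angeles for next sunday"
  });
  res.json(await response.json()); // pass the parsed flight query back to the client
});

app.listen(3000, () => console.log('Wrapper listening on http://localhost:3000'));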

Flight service provider

Instead of booking a flight automatically, we are just going to send the user to some flight results for that query. For that we are going to use eDreams.

We are going to use the previous response to fill in this URL:

const resultsUrl = `https://www.edreams.com/#/results/type=R;dep=${params.departureDate};from=${params.departure};to=${params.destination};ret=${params.returnDate}`;
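Putting it together on the client, the transcript from the speech-to-text step goes to the parse endpoint and the user is redirected to the results page. A minimal sketch, assuming the parse server from above is reachable at localhost:5000 (the real front end is linked below):

// Minimal sketch of the front-end glue: transcript → /parse → redirect to results.
async function searchFlights(transcript) {
  const response = await fetch('http://localhost:5000/parse', {
    method: 'POST',
    body: transcript, // e.g. "find me a flight to London"
  });
  const params = await response.json();

  const resultsUrl =
    `https://www.edreams.com/#/results/type=R;dep=${params.departureDate};` +
    `from=${params.departure};to=${params.destination};ret=${params.returnDate}`;

  window.location.href = resultsUrl; // send the user to the flight results
}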

You can see the front end code here

Resources