The LSTM schema

The LSTM gates

1. Forget gate

\(f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)\)

2. Input gate

\(i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)\)

3. Output gate

\(o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)\)
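All three gates share one shape: a logistic sigmoid \(\sigma\) applied to an affine map of the concatenation \([h_{t-1}, x_t]\), so each gate value lies strictly between 0 and 1. Here is a minimal sketch in plain Python, assuming vectors are lists and each weight matrix is a list of rows. The names `sigmoid` and `gate`, the sizes, and the numbers are made up for illustration.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def gate(W, b, h_prev, x):
    """Compute sigma(W . [h_prev, x] + b), one value per hidden unit."""
    concat = h_prev + x  # list concatenation plays the role of [h_{t-1}, x_t]
    return [
        sigmoid(sum(w * v for w, v in zip(row, concat)) + b_k)
        for row, b_k in zip(W, b)
    ]

# Illustrative sizes (assumed): 2 hidden units, 1 input feature,
# so each weight row has 2 + 1 = 3 entries.
h_prev = [0.2, 0.4]
x = [1.0]
W_f = [[0.5, -0.5, 1.0],
       [0.0, 0.0, 0.0]]
b_f = [0.0, 0.0]

f_t = gate(W_f, b_f, h_prev, x)  # forget gate; i_t and o_t use the same call
print(f_t)
```

The input and output gates would reuse `gate` with their own weights (\(W_i, b_i\) and \(W_o, b_o\)); only the parameters differ.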

Getting the formula right!

1. Getting the output (hidden) state right. Start from the destination!

\[h_t = o_t * \tanh(C_t)\]

2. Now get the new cell state

Use the previous cell state and two of the gates: the forget gate and the input gate.

\[C_t = f_t * C_{t-1} + i_t * \tilde C_t\]

\(\tilde C_t\) is the new candidate cell state, which the input gate scales down before it is added to the cell state. Here \(*\) denotes elementwise multiplication.
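As a quick check of the update rule, take a single scalar cell with made-up values \(f_t = 0.9\), \(i_t = 0.5\), \(C_{t-1} = 2.0\) and \(\tilde C_t = -1.0\):

\[C_t = 0.9 \times 2.0 + 0.5 \times (-1.0) = 1.8 - 0.5 = 1.3\]

The forget gate keeps 90% of the old state, and the input gate lets through half of the candidate.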

3. Getting the candidate cell state

\[\tilde C_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)\]

4. Putting it all together!
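The equations can be chained into a single time step: compute the three gates and the candidate from \([h_{t-1}, x_t]\), update the cell state, then read out the hidden state. The sketch below uses only the Python standard library. The function names, the `params` dictionary layout, the sizes, and the constant 0.1 weights are all assumptions for illustration; they are not a trained model or any library's API.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """Return W . v + b, with W a list of rows and v, b plain lists."""
    return [sum(w * a for w, a in zip(row, v)) + b_k for row, b_k in zip(W, b)]

def lstm_step(x_t, h_prev, C_prev, params):
    """One LSTM time step; returns the new hidden state and cell state."""
    concat = h_prev + x_t  # [h_{t-1}, x_t]
    f_t = [sigmoid(z) for z in affine(params["W_f"], params["b_f"], concat)]
    i_t = [sigmoid(z) for z in affine(params["W_i"], params["b_i"], concat)]
    o_t = [sigmoid(z) for z in affine(params["W_o"], params["b_o"], concat)]
    C_tilde = [math.tanh(z) for z in affine(params["W_c"], params["b_c"], concat)]
    # C_t = f_t * C_{t-1} + i_t * C~_t  (elementwise)
    C_t = [f * c + i * g for f, c, i, g in zip(f_t, C_prev, i_t, C_tilde)]
    # h_t = o_t * tanh(C_t)  (elementwise)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, C_t)]
    return h_t, C_t

# Illustrative setup (assumed): 2 hidden units, 1 input feature, all weights 0.1.
hidden, inputs = 2, 1
params = {}
for name in ("f", "i", "o", "c"):
    params["W_" + name] = [[0.1] * (hidden + inputs) for _ in range(hidden)]
    params["b_" + name] = [0.0] * hidden

h, C = [0.0] * hidden, [0.0] * hidden
for x_t in ([1.0], [0.5], [-1.0]):  # a short made-up input sequence
    h, C = lstm_step(x_t, h, C, params)
print(h, C)
```

Note that the hidden and cell states are carried from one step to the next; that recurrence is what lets the cell state accumulate information across the sequence.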

Translator implementation with an LSTM