Language Models
Language Models - why do we need language models?
- Language models are mathematical models that can do one or more of the following -
    - text output - generative modelling.
        - the model predicts the next word that follows the words in a given sentence; chained together, it can also generate the next sentence.
        - each candidate word is assigned a probability.
        - the probability of a sentence is the product of the conditional probabilities of each word given the words that precede it.
        - Objective - compute the probability of a sentence or sequence of words: $P(W) = P(w_1, w_2, w_3, \dots, w_n)$
        - Related task - compute the probability of an upcoming word: $P(w_4 \mid w_1, w_2, w_3)$
    - embeddings - numeric vector representations of text.
    - classification - assigning labels to text, e.g. key entity identification.
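The chain rule behind $P(W)$ can be sketched with simple bigram counts: $P(w_1,\dots,w_n) = P(w_1)\prod_i P(w_i \mid w_{i-1})$ under a bigram approximation. A minimal sketch, using an invented toy corpus (not from the source):

```python
# Minimal sketch: estimate P(sentence) with the chain rule and a bigram
# approximation, P(w1..wn) ~ P(w1) * prod P(wi | w(i-1)).
# The corpus below is made up purely for illustration.
from collections import Counter

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

unigrams = Counter(corpus)                    # count of each word
bigrams = Counter(zip(corpus, corpus[1:]))    # count of each adjacent pair

def sentence_probability(sentence):
    words = sentence.split()
    # P(w1) from unigram counts
    prob = unigrams[words[0]] / len(corpus)
    # multiply by P(wi | w(i-1)) = count(w(i-1), wi) / count(w(i-1))
    for prev, word in zip(words, words[1:]):
        prob *= bigrams[(prev, word)] / unigrams[prev]
    return prob

print(sentence_probability("the cat sat"))
```

Real language models use far longer contexts (or the full context, for transformers), but the objective is the same: assign a probability to a word sequence.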
A recent history of Language Models:
- Language Models are trained with autoregressive training. The training set is not labelled by hand; instead the corpus itself supplies the labels, since each next word in the text serves as the target the model must predict from the words before it. This self-supervised language modelling objective is at the core of pre-training.
How to compute probabilities : https://samratkar.github.io/2025/02/06/probabilities.html
Applications of language models
- spell check - choose the candidate correction with the highest probability.
- speech recognition - audio to transcription.
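The spell-check application reduces to a probability comparison: among candidate corrections, pick the word the language model considers most likely. A hedged sketch with a unigram model; the word counts here are invented for illustration:

```python
# Sketch of spell check as probability ranking: given candidate corrections,
# return the one with the highest unigram probability P(w).
# These counts are made up; a real system would use corpus statistics
# and would also model the edit distance to the misspelling.
word_counts = {"the": 500, "then": 120, "than": 90, "thee": 3}
total = sum(word_counts.values())

def best_correction(candidates):
    # choose the candidate with the highest unigram probability
    return max(candidates, key=lambda w: word_counts.get(w, 0) / total)

print(best_correction(["thee", "the", "then"]))
```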
What are the different types of language models and how they work
- Encoders - language models that convert text to numeric representations. E.g. BERT, RoBERTa, DistilBERT.
- Decoders - language models that are generative in nature; they generate text. E.g. GPT, GPT-2, GPT-3, ChatGPT.
- Encoder-decoders - language models that can do both encoding and decoding. E.g. T5, Switch, Flan-T5.
- All the above models are based on the transformer architecture.
- The following are legacy, non-transformer-based language models -
    - Bag of words - an algorithm that represents text as large, sparse vectors (arrays of numbers); it simply records the presence of words, ignoring order and context.
    - Word2Vec - its numeric representation captures the meaning of a word from the context of a few neighboring words.
- Transformer-based language models represent words as dense numeric vectors that capture the meaning of a word in the context of a whole sentence or paragraph.
Bag of Words : https://samratkar.github.io/2025/02/07/bagofwords.html
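The bag-of-words representation described above can be sketched in a few lines: each document becomes a count vector over the vocabulary, and word order is discarded. The documents below are invented for illustration:

```python
# Sketch of bag of words: each document becomes a sparse count vector over
# the shared vocabulary; only word presence/frequency is recorded, not order.
docs = ["the cat sat", "the dog sat on the mat"]

# vocabulary = sorted set of all words across documents
vocab = sorted({w for doc in docs for w in doc.split()})

def bow_vector(doc):
    words = doc.split()
    # one count per vocabulary entry, in vocabulary order
    return [words.count(word) for word in vocab]

for doc in docs:
    print(bow_vector(doc))
```

Note how most entries are zero: with a realistic vocabulary of tens of thousands of words, these vectors are large and sparse, which is exactly the limitation the bullet above points out.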
Word2Vec : https://samratkar.github.io/2025/02/07/word2vec.html
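Word2Vec's training signal, in contrast, comes from neighboring words: the model learns from (center word, context word) pairs drawn from a small sliding window. A minimal sketch of the pair extraction step, with a made-up corpus and window size:

```python
# Sketch of the Word2Vec training signal: (center, context) pairs drawn from
# a small window of neighboring words. The corpus and window size are made up;
# real Word2Vec then trains a shallow network on these pairs to learn
# dense word vectors.
corpus = "language models predict the next word".split()
window = 1  # how many neighbors on each side count as context

pairs = []
for i, center in enumerate(corpus):
    # every word within the window (except the center itself) is context
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if j != i:
            pairs.append((center, corpus[j]))

print(pairs[:3])
```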