- Data Collection:
  - Processed a large text corpus to extract:
    - Unigrams (single words)
    - Bigrams (sequences of two consecutive words)
    - Trigrams (sequences of three consecutive words)
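
The extraction step can be sketched as follows. This is a minimal illustration, not the project's actual code: the `tokenize` and `extract_ngrams` helpers, the regex-based tokenizer, and the one-sentence stand-in corpus are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase, then keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def extract_ngrams(tokens, n):
    # Slide a window of size n over the token list, one n-gram per position.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Tiny stand-in corpus (hypothetical); a real run would read a large text file.
corpus = "The cat sat on the mat and the cat slept"
tokens = tokenize(corpus)

unigrams = Counter(extract_ngrams(tokens, 1))
bigrams = Counter(extract_ngrams(tokens, 2))
trigrams = Counter(extract_ngrams(tokens, 3))

print(unigrams[("the",)])               # 3
print(bigrams[("the", "cat")])          # 2
print(trigrams[("on", "the", "mat")])   # 1
```

Storing each n-gram as a tuple key in a `Counter` keeps all three tables in the same shape, which makes later lookups uniform.
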
- Model:
  - N-gram probabilities are estimated with add-1 (Laplace) smoothing.
  - Backoff model:
    - Trigrams → Bigrams → Unigrams
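
A minimal sketch of how add-1 smoothing and the backoff chain might fit together. The source does not say when the model backs off, so this example assumes it drops to a lower order whenever the higher-order context was never seen in the corpus. The names `add_one` and `backoff_prob` and the toy corpus are hypothetical.

```python
from collections import Counter

# Toy corpus (hypothetical); counts would normally come from the large corpus.
tokens = "the cat sat on the mat and the cat slept".split()
V = len(set(tokens))  # vocabulary size, used in the add-1 denominator

uni = Counter(tokens)
bi = Counter(zip(tokens, tokens[1:]))
tri = Counter(zip(tokens, tokens[1:], tokens[2:]))


def add_one(count, context_count, vocab_size):
    # Add-1 (Laplace) estimate: (count + 1) / (context_count + V).
    return (count + 1) / (context_count + vocab_size)


def backoff_prob(w, w1=None, w2=None):
    """Estimate P(w | w1 w2), where w2 is the most recent preceding word."""
    # Trigram level: used only if the two-word context was observed.
    if w1 is not None and w2 is not None and bi[(w1, w2)] > 0:
        return add_one(tri[(w1, w2, w)], bi[(w1, w2)], V)
    # Bigram level: used only if the one-word context was observed.
    if w2 is not None and uni[w2] > 0:
        return add_one(bi[(w2, w)], uni[w2], V)
    # Unigram level: always available.
    return add_one(uni[w], len(tokens), V)


print(backoff_prob("sat", "the", "cat"))  # trigram level: (1 + 1) / (2 + 7)
print(backoff_prob("mat", "dog", "the"))  # bigram level:  (1 + 1) / (3 + 7)
print(backoff_prob("the", "foo", "bar"))  # unigram level: (3 + 1) / (10 + 7)
```

Because add-1 smoothing never yields a zero probability, the backoff decision here is driven by whether the context exists rather than by the probability value itself.
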
- Prediction Flow:
  - User Input → Tokenization → N-Gram Matching → Predicted Word
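
The flow above could look roughly like this end to end. It is a sketch under stated assumptions: `tokenize`, `build_tables`, and `predict_next` are hypothetical names, the corpus is a toy sentence, and candidates are ranked by raw follow-up counts. Within a single context, raw counts give the same top word as add-1 smoothed probabilities, because the smoothing adds the same constant to every candidate and the denominator is shared.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed tokenizer: lowercase, then keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def build_tables(tokens):
    # Map each context to a Counter of the words that followed it.
    tri, bi, uni = defaultdict(Counter), defaultdict(Counter), Counter(tokens)
    for a, b in zip(tokens, tokens[1:]):
        bi[a][b] += 1
    for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
        tri[(a, b)][c] += 1
    return tri, bi, uni


def predict_next(text, tri, bi, uni):
    words = tokenize(text)
    # Try the two-word context first, then one word, then the global favorite.
    if len(words) >= 2 and tuple(words[-2:]) in tri:
        return tri[tuple(words[-2:])].most_common(1)[0][0]
    if words and words[-1] in bi:
        return bi[words[-1]].most_common(1)[0][0]
    return uni.most_common(1)[0][0]


tables = build_tables(tokenize("the cat sat on the mat and the cat slept"))
print(predict_next("Sat on the", *tables))  # mat  (trigram match)
print(predict_next("dog the", *tables))     # cat  (bigram backoff)
print(predict_next("zebra", *tables))       # the  (unigram backoff)
```
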