Next-Word Prediction App

2025-02-26

How Does the Model Work?

Data Collection:
- Processed a large text corpus to extract:
  - Unigrams (single words)
  - Bigrams (two-word combinations)
  - Trigrams (three-word combinations)
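
The extraction step above can be sketched as follows. This is a minimal illustration, not the app's actual code: the `tokenize` and `count_ngrams` helpers, the regular expression, and the toy corpus are all assumptions made for the example.

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())

def count_ngrams(tokens, n):
    """Count every run of n consecutive tokens (sentence boundaries are ignored)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Toy corpus standing in for the large text corpus (hypothetical data).
corpus = "the cat sat on the mat. the cat ate."
tokens = tokenize(corpus)

unigrams = count_ngrams(tokens, 1)
bigrams = count_ngrams(tokens, 2)
trigrams = count_ngrams(tokens, 3)
```

Keeping each n-gram as a tuple key makes it straightforward to look up a context and its continuation later.
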
Model:
- N-gram probabilities are estimated with add-one (Laplace) smoothing.
- Backoff model:
  - Trigrams → Bigrams → Unigrams
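
As a rough sketch of add-one smoothing, the estimate for a word given its context is (count(context, word) + 1) / (count(context) + V), where V is the vocabulary size. The function name `add_one_prob` and the toy token list below are assumptions for illustration only.

```python
from collections import Counter

def add_one_prob(word, context, ngram_counts, context_counts, vocab_size):
    """P(word | context) = (count(context + word) + 1) / (count(context) + V)."""
    numerator = ngram_counts[context + (word,)] + 1
    denominator = context_counts[context] + vocab_size
    return numerator / denominator

# Toy data (hypothetical): bigram probabilities conditioned on one word.
tokens = "the cat sat on the mat the cat ate".split()
unigrams = Counter((t,) for t in tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
vocab_size = len(unigrams)  # number of distinct words

# "the cat" was seen twice; "the sat" was never seen but still gets some mass.
p_seen = add_one_prob("cat", ("the",), bigrams, unigrams, vocab_size)
p_unseen = add_one_prob("sat", ("the",), bigrams, unigrams, vocab_size)
```

The added 1 in the numerator keeps unseen continuations from receiving zero probability, at the cost of taking some mass away from observed ones.
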
Prediction Flow:
- User Input → Tokenization → N-Gram Matching → Predicted Word
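
The flow above, combined with the trigram → bigram → unigram fallback, might look roughly like this. It is a simplified sketch: the source does not say which backoff variant the app uses, so this version just takes the most frequent continuation at the longest matching context. `build_tables` and `predict_next` are hypothetical names.

```python
from collections import Counter, defaultdict

def build_tables(tokens):
    """For n = 1, 2, 3, map each context of n - 1 words to a Counter of next words."""
    tables = {n: defaultdict(Counter) for n in (1, 2, 3)}
    for n in (1, 2, 3):
        for i in range(len(tokens) - n + 1):
            *context, word = tokens[i:i + n]
            tables[n][tuple(context)][word] += 1
    return tables

def predict_next(text, tables):
    """Return (predicted_word, order_used), trying the longest context first."""
    words = text.lower().split()  # simple whitespace tokenization
    for n in (3, 2, 1):
        context = tuple(words[-(n - 1):]) if n > 1 else ()
        if len(context) == n - 1 and context in tables[n]:
            return tables[n][context].most_common(1)[0][0], n
    return None, 0  # only reached when the tables are empty

# Toy training data (hypothetical).
tables = build_tables("the cat sat on the mat the cat sat down".split())

trigram_hit = predict_next("I saw the cat", tables)  # context ("the", "cat")
bigram_hit = predict_next("a big mat", tables)       # falls back to ("mat",)
unigram_hit = predict_next("hello world", tables)    # falls back to word frequency
```

Returning the order alongside the word makes the fallback visible to the caller, which is useful for showing which level produced a prediction.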

Quantitative Performance

Model Accuracy by N-Gram

Trigram matches perform best, reaching 70% prediction accuracy.
Bigrams and unigrams serve as fallbacks when no trigram match is found.
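
One way to obtain per-order accuracy figures like these is to score each prediction on held-out text and group the results by the n-gram order that produced it. The sketch below assumes a hypothetical `predict(history)` callable returning `(word, order)`. The toy predictor and the resulting numbers are illustrative only and are unrelated to the 70% reported above.

```python
from collections import defaultdict

def accuracy_by_order(test_tokens, predict):
    """Top-1 accuracy grouped by the n-gram order behind each prediction."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for i in range(1, len(test_tokens)):
        guess, order = predict(test_tokens[:i])
        totals[order] += 1
        hits[order] += guess == test_tokens[i]  # True counts as 1
    return {order: hits[order] / totals[order] for order in totals}

def toy_predict(history):
    """Hand-written stand-in for a real model (hypothetical rules)."""
    if history[-2:] == ["the", "cat"]:
        return "sat", 3
    if history[-1] == "the":
        return "cat", 2
    return "the", 1

result = accuracy_by_order("the cat sat on the mat".split(), toy_predict)
```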

How the App Works

User experience:
- Type a sentence and the app predicts the next word in real time.
- Simple and intuitive interface.
Interactive features:
- Adjust diversity with sliders.
- See fallback behavior from trigrams → bigrams → unigrams.
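
The source does not describe how the diversity slider works. One common approach, shown here purely as an assumption, is temperature scaling: candidate probabilities are raised to the power 1/temperature and renormalized before sampling. The function names and the candidate distribution are hypothetical.

```python
import random

def apply_temperature(probs, temperature):
    """Rescale a word distribution; temperature must be greater than 0."""
    weights = {w: p ** (1.0 / temperature) for w, p in probs.items()}
    total = sum(weights.values())
    return {w: weight / total for w, weight in weights.items()}

def sample_word(probs, temperature, rng):
    """Draw one word from the temperature-adjusted distribution."""
    scaled = apply_temperature(probs, temperature)
    words = list(scaled)
    return rng.choices(words, weights=[scaled[w] for w in words], k=1)[0]

# Hypothetical candidate distribution for the next word.
candidates = {"cat": 0.6, "dog": 0.3, "car": 0.1}

sharp = apply_temperature(candidates, 0.5)  # low setting: favors the top word
flat = apply_temperature(candidates, 2.0)   # high setting: more varied output
word = sample_word(candidates, 1.0, random.Random(0))
```

Under this assumption, a slider value below 1 concentrates predictions on the most likely word, while a value above 1 spreads them across more candidates.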

Why Choose This App?

Accurate Predictions: Context-aware using trigrams and bigrams.
Diverse Applications: From messaging apps to search engines.
Customizable: Adjust prediction behavior for specific use cases.
Scalable: Adapts to new data for continuous improvement.

Ready to revolutionize text input?

Conclusions

Thank you for following this presentation. For more details and to explore the Shiny app, please visit the following link:

https://pietrogazzi.shinyapps.io/DScapstone/