In supervised learning, we use metrics such as R-squared, ROC, Precision-Recall, or F-score to evaluate performance during model training. How is a Large Language Model evaluated? Large Language Models are Transformer-based models built on complex neural networks, and they fundamentally follow a supervised learning framework. They still apply the typical train-test-validation data split. …
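
As a reminder of what the classical setup looks like, here is a minimal sketch of a train-validation-test split and a precision/recall/F-score computation. It uses only the Python standard library. The function names `split_dataset` and `precision_recall_f1`, the 80/10/10 fractions, and the toy labels are all assumptions made for illustration; they are not taken from any particular library or from an LLM training pipeline.

```python
import random


def split_dataset(items, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle items and split them into train, validation, and test lists.

    The fractions are illustrative defaults; whatever remains after the
    train and validation slices becomes the test set.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data: 100 example ids and a handful of made-up binary labels.
    train, val, test = split_dataset(range(100))
    print(len(train), len(val), len(test))

    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
    print(precision_recall_f1(y_true, y_pred))
```

Metrics like these assume a single correct label per example, which is part of why evaluating free-form generated text raises the question asked above.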