The data that those large language models were built on

Chris Kuo/Dr. Dataman
18 min read · May 9, 2023

Why can Large Language Models (LLMs) answer questions, write book reports, draft notes, or summarize a document? An important reason is the data they were trained on or fine-tuned with. This post helps you understand the widely used datasets known in the LLM community. While you research LLMs, you may be curious about the data. If you are thinking about fine-tuning a…

The Dataman articles are my reflections on data science and teaching notes at Columbia University: https://sps.columbia.edu/faculty/chris-kuo