Fine-tuning a GPT — LoRA

Chris Kuo/Dr. Dataman
18 min read · Jun 19

This post explains the proven fine-tuning method LoRA, short for “Low-Rank Adaptation of Large Language Models”. I will walk you through the LoRA technique, its architecture, and its advantages, and present the background knowledge, such as the concepts of “low-rank” and “adaptation”, needed to understand it. As in “Fine-tuning a GPT — Prefix-tuning”, I include a code example and walk you through it line by line. I will pay special attention to the GPU-intensive nature of fine-tuning a Large Language Model (LLM), then discuss practical treatments that have been packaged as the Python libraries “bitsandbytes” and “accelerate”. After completing this article, you will be able to explain:

  • Why do we still need fine-tuning?
  • The challenges in adding more layers — Inference Latency
  • What is the “rank” in the “low-rank” of LoRA?
  • The architecture of LoRA
  • The advantages of LoRA
  • Fine-tuning is still GPU-intensive
  • The techniques to reduce the use of GPUs
  • The code example

Why do we still need fine-tuning?

Pretrained Large Language Models (LLMs) are already trained on many types of data for various tasks such as text summarization, text generation, and question answering. Why do we still need to fine-tune an LLM? The simple reason is to adapt the LLM to your domain data for specific tasks. Just as a well-trained chef can cook Italian or Chinese cuisine by transferring basic cooking knowledge and sharpening a few skills, a pretrained model only needs its existing knowledge adapted to the new task. The terms “transfer learning” and “fine-tuning” are sometimes used interchangeably in NLP (for example, in “Parameter-Efficient Transfer Learning for NLP” by Neil Houlsby et al. (2019)). With the arrival of LLMs, the term “fine-tuning” has become even more common. (If you would like to understand more about transfer learning for images, take a look at “Transfer Learning for Image Classification — (1) All Start Here” or the book “Transfer Learning for Image Classification — with Python Examples”.)

The challenges in adding more layers — Inference Latency
