Fine-tuning a GPT — LoRA

Chris Kuo/Dr. Dataman
18 min read · Jun 19, 2023

This post explains LoRA, a proven fine-tuning method whose name is short for “Low-Rank Adaptation of Large Language Models”. I will walk you through the LoRA technique, its architecture, and its advantages, and I will present related background, such as the concepts of “low-rank” and “adaptation”, to aid your understanding. As in “Fine-tuning a GPT — Prefix-tuning”, I include a code example and walk through it line by line. I pay particular attention to how GPU-intensive it is to fine-tune a Large Language Model (LLM), and then discuss practical remedies that have been packaged as the Python libraries “bitsandbytes” and “accelerate”. After completing this article, you will be able to explain:

  • Why do we still need fine-tuning?
  • The challenges in adding more layers — Inference Latency
  • What is the “rank” in the “low-rank” of LoRA?
  • The architecture of LoRA
  • The advantages of LoRA
  • Fine-tuning is still GPU-intensive
  • The techniques to reduce the use of GPUs
  • The code example

Why do we still need fine-tuning?

Pretrained Large Language Models (LLMs) are already trained with different types of data for various tasks such as text summarization…