Fine-tuning a GPT — LoRA

18 min read · Jun 19, 2023

This post explains LoRA, a widely used fine-tuning method whose name is short for “Low-Rank Adaptation of Large Language Models”. I will walk you through the LoRA technique, its architecture, and its advantages, and present related background such as the concepts of “low-rank” and “adaptation” to help your understanding. As in “Fine-tuning a GPT — Prefix-tuning”, I include a code example and walk through it line by line. I will pay special attention to the GPU-intensive nature of fine-tuning a Large Language Model (LLM), and then discuss practical remedies that have been packaged as the Python libraries “bitsandbytes” and “accelerate”. After completing this article, you will be able to explain:

When do we still need fine-tuning?
The challenges in adding more layers — Inference Latency
What is the “rank” in the “low-rank” of LoRA?
The architecture of LoRA
The advantages of LoRA
Fine-tuning is still GPU-intensive
The techniques to reduce the use of GPUs
The code example

Why do we still need fine-tuning?

Pretrained Large Language Models (LLMs) are already trained with different types of data for various tasks such as text summarization…

Written by Chris Kuo/Dr. Dataman