Fine-tuning a GPT — Prefix-tuning

Chris Kuo/Dr. Dataman
18 min read · Jun 10, 2023


In this and the next post, I will walk you through the fine-tuning process for a Large Language Model (LLM), or a Generative Pre-trained Transformer (GPT). There are two prominent fine-tuning methods: Prefix-tuning and LoRA (Low-Rank Adaptation of Large Language Models). This post explains Prefix-tuning, and the next post, “Fine-tuning a GPT — LoRA”, covers LoRA. In both posts, I include a code example and walk you through the code line by line. In the LoRA article, I will especially cover the GPU-intensive nature of fine-tuning an LLM.

After reading this article, you will be able to explain:

  • Why fine-tuning a GPT is needed
  • The challenges in fine-tuning a GPT
  • The idea of Prefix-tuning
  • The architecture of Prefix-tuning
  • The code for Prefix-tuning

Before jumping into fine-tuning a GPT, I first want to clear up some doubts about why fine-tuning is needed at all. Let’s start!

Why do we still need to fine-tune a GPT?

Since GPTs are already trained on diverse datasets for question answering, text summarization, translation, and classification, why do we still need to fine-tune a GPT? Here is the answer. Think of a GPT as a powerful “Transformer” robot (as in the Transformers movies) equipped with all sorts of weaponry. The robot still needs to be specialized to do certain tasks with domain data. Building a fully functioning real Transformer robot (if such a thing ever existed!) would be incredibly expensive, and so is building a GPT from scratch. Customizing a GPT, also called fine-tuning, is far less costly.

Are there any challenges in fine-tuning a GPT?

In its most basic form, customizing a GPT means iteratively updating all of its parameters to new values so it can do the specialized work. However, most LLMs have billions of parameters, so updating every parameter is still prohibitively expensive. For example, Google’s flan-t5-XXL has 11 billion parameters; its full-precision weights alone take roughly 44 GB, and full fine-tuning with gradients and optimizer states pushes the memory requirement well past 100 GB.
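To get a feel for the numbers, here is a rough back-of-the-envelope estimate (my own illustration, not a measured profile): with fp32 weights and a plain Adam optimizer, full fine-tuning has to hold the weights, one gradient per weight, and two optimizer moments per weight in memory, before counting activations.

```python
# Back-of-the-envelope memory estimate for full fine-tuning.
# Assumptions: fp32 (4 bytes per value), plain Adam (2 moments per
# parameter), activations ignored.
def full_finetune_memory_gb(num_params: float, bytes_per_value: int = 4) -> float:
    weights   = num_params * bytes_per_value       # model weights
    gradients = num_params * bytes_per_value       # one gradient per weight
    adam      = 2 * num_params * bytes_per_value   # Adam's first and second moments
    return (weights + gradients + adam) / 1e9      # gigabytes

print(f"{full_finetune_memory_gb(11e9):.0f} GB")   # flan-t5-XXL: ~176 GB before activations
```

Even ignoring activations, that is far beyond a single GPU, which is exactly why parameter-efficient methods such as Prefix-tuning are so attractive.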

Since fine-tuning a GPT is this challenging, how can we develop efficient fine-tuning methods? The primary idea of efficient fine-tuning is NOT to touch the billions of pre-trained…
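As a preview of that idea, here is a minimal sketch using the Hugging Face `peft` library (my own illustration, not the article’s code; the model name and prefix length are arbitrary choices): the pre-trained weights stay frozen, and only a small set of prefix parameters is trained.

```python
from transformers import AutoModelForSeq2SeqLM
from peft import PrefixTuningConfig, TaskType, get_peft_model

# Load a (smaller) pre-trained base model for illustration.
base = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

# Prefix-tuning: keep the base model frozen and learn a short trainable
# prefix that is prepended to every layer's attention keys and values.
config = PrefixTuningConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,   # sequence-to-sequence task
    num_virtual_tokens=20,             # length of the trainable prefix
)
model = get_peft_model(base, config)

# Only a tiny fraction of the parameters is trainable.
model.print_trainable_parameters()
```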
