Fine-tuning a GPT — Prefix-tuning

Chris Kuo/Dr. Dataman
18 min read · Jun 10, 2023


In this and the next post, I will walk you through the fine-tuning process for a Large Language Model (LLM), or a Generative Pre-trained Transformer (GPT). There are two prominent fine-tuning methods: Prefix-tuning and LoRA (Low-Rank Adaptation of Large Language Models). This post explains Prefix-tuning, and the next post, “Fine-tuning a GPT — LoRA”, covers LoRA. In both posts, I include a code example and walk you through the code line by line. In the LoRA article, I will especially cover the GPU-intensive nature of fine-tuning an LLM.

After reading this article, you will be able to explain:

  • Why fine-tuning a GPT is needed
  • The challenges in fine-tuning a GPT
  • The idea of Prefix-tuning
  • The architecture of Prefix-tuning
  • The code for Prefix-tuning

Before jumping into fine-tuning a GPT, I first want to clear up some doubts about why fine-tuning is needed at all. Let’s start!

Why do we still need to fine-tune a GPT?

Since GPTs are already trained on diverse datasets for question answering, text summarization, translation, and classification, why do we still need to fine-tune a GPT? Here is the answer. Think of a GPT as a powerful “Transformer” robot (as in the Transformers movies) equipped with all sorts of weaponry. The robot still needs to be specialized to do certain tasks with domain data. Building a fully functioning real Transformer robot (if such a thing ever existed!) would be incredibly expensive, and so is building a GPT from scratch. Customizing a GPT, also called fine-tuning, is far less costly.

Are there any challenges in fine-tuning a GPT?

In its most basic form, customizing a GPT means iteratively updating all of its parameters to new values so it can do the specialized work. However, most LLMs have billions of parameters, so updating every parameter is still prohibitively expensive. For example, Google’s flan-t5-XXL has 11 billion parameters; its full-precision weights alone take roughly 44 GB, and full fine-tuning with gradients and optimizer states pushes the memory requirement well past 100 GB.
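To get a feel for the numbers, here is a rough back-of-the-envelope estimate (my own illustration, not a measured profile): with fp32 weights and a plain Adam optimizer, full fine-tuning has to hold the weights, one gradient per weight, and two optimizer moments per weight in memory, before counting activations.

```python
# Back-of-the-envelope memory estimate for full fine-tuning.
# Assumptions: fp32 (4 bytes per value), plain Adam (2 moments per
# parameter), activations ignored.
def full_finetune_memory_gb(num_params: float, bytes_per_value: int = 4) -> float:
    weights   = num_params * bytes_per_value       # model weights
    gradients = num_params * bytes_per_value       # one gradient per weight
    adam      = 2 * num_params * bytes_per_value   # Adam's first and second moments
    return (weights + gradients + adam) / 1e9      # gigabytes

print(f"{full_finetune_memory_gb(11e9):.0f} GB")   # flan-t5-XXL: ~176 GB before activations
```

Even ignoring activations, that is far beyond a single GPU, which is exactly why parameter-efficient methods such as Prefix-tuning are so attractive.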

Since fine-tuning a GPT is this challenging, how can we develop efficient fine-tuning methods? The primary idea of efficient fine-tuning is NOT to touch the billions of pre-trained…
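As a preview of that idea, here is a minimal sketch using the Hugging Face `peft` library (my own illustration, not the article’s code; the model name and prefix length are arbitrary choices): the pre-trained weights stay frozen, and only a small set of prefix parameters is trained.

```python
from transformers import AutoModelForSeq2SeqLM
from peft import PrefixTuningConfig, TaskType, get_peft_model

# Load a (smaller) pre-trained base model for illustration.
base = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

# Prefix-tuning: keep the base model frozen and learn a short trainable
# prefix that is prepended to every layer's attention keys and values.
config = PrefixTuningConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,   # sequence-to-sequence task
    num_virtual_tokens=20,             # length of the trainable prefix
)
model = get_peft_model(base, config)

# Only a tiny fraction of the parameters is trainable.
model.print_trainable_parameters()
```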
