Reproduction of LoRA: Fall 2025


Summary

In this project, I reproduced Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning (PEFT) method that adapts large pre-trained language models by injecting trainable low-rank matrices alongside frozen weights. The core hypothesis of LoRA is that the change in weights during adaptation has a low "intrinsic rank". Instead of updating the full weight matrices, the method optimizes low-rank decomposition matrices A and B, which adapts the model with significantly fewer trainable parameters while avoiding the inference latency associated with adapter layers.
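
To make the decomposition concrete, below is a minimal PyTorch sketch of a LoRA-style linear layer. The class name LoRALinear and the hyperparameter names r (rank) and alpha (scaling) are illustrative choices of mine, not the authors' reference implementation; the forward pass computes W0·x plus (alpha/r)·B·A·x, with only A and B trainable.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative LoRA wrapper: y = base(x) + (alpha / r) * x @ A^T @ B^T.

    The pre-trained weight W0 (inside `base`) is frozen; only A and B train.
    Names (LoRALinear, r, alpha) are assumptions, not the paper's code.
    """
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze W0 (and bias)
        self.r, self.alpha = r, alpha
        # A: (r, in_features), B: (out_features, r). B starts at zero so the
        # adapted model initially behaves exactly like the pre-trained one.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (self.alpha / self.r) * (x @ self.A.T @ self.B.T)
```

As a rough sanity check on the parameter savings reported below: with r = 8 on a 768 x 768 projection, the adapter trains 8 x (768 + 768) = 12,288 parameters instead of 589,824 for the full matrix, so adapting only a couple of projection matrices per layer keeps the trainable count in the hundreds of thousands rather than the full 125 million.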

Figure 1: Diagram of LoRA. The pre-trained weights W0 are frozen, while the low-rank decomposition matrices A and B are trained.

I evaluated the method using a RoBERTa-base model on five datasets from the GLUE benchmark (SST-2, MRPC, CoLA, RTE, and STS-B). The implementation validated the original authors' central claims. Specifically, my LoRA implementation reduced the number of trainable parameters by 99.7% (from 125 million to 0.3 million) compared to Full Fine-Tuning (FFT). Despite this reduction, the model achieved performance within 2% of the FFT baseline across the GLUE subset and even outperformed FFT on the CoLA and RTE tasks. Furthermore, by merging the learned matrices with the frozen weights before deployment, I confirmed that the method introduces zero additional inference latency.
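
The zero-latency claim follows from the fact that BA has the same shape as W0, so the learned update can be folded into the frozen weight before serving and inference is a single dense matmul again. A hedged sketch of such a merge, continuing the LoRALinear class above (merge_lora is my own illustrative helper, not an API from the paper):

```python
@torch.no_grad()
def merge_lora(layer: LoRALinear) -> nn.Linear:
    """Fold the low-rank update into the frozen weight: W = W0 + (alpha/r) * B @ A.

    The returned plain nn.Linear has no extra parameters or compute at inference.
    Assumes the illustrative LoRALinear sketch defined earlier.
    """
    merged = nn.Linear(layer.base.in_features, layer.base.out_features,
                       bias=layer.base.bias is not None)
    merged.weight.copy_(layer.base.weight +
                        (layer.alpha / layer.r) * (layer.B @ layer.A))
    if layer.base.bias is not None:
        merged.bias.copy_(layer.base.bias)
    return merged
```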

Beyond standard replication, I extended the analysis to profile system efficiency on the smaller RoBERTa architecture. While I observed a 38% reduction in GPU VRAM usage, this was lower than the 66% reported for GPT-3 in the original paper, highlighting that activation memory becomes a dominant bottleneck in smaller models. Additionally, my energy analysis revealed that while LoRA improves throughput, it does not guarantee energy efficiency; for tasks requiring more epochs to converge (like CoLA), LoRA actually increased total energy consumption compared to FFT.
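
For reference, the peak-VRAM and energy figures in this analysis can be gathered along the following lines. This is a rough sketch assuming an NVIDIA GPU with the pynvml bindings installed; energy is approximated by sampling power around a step rather than read from a meter, and profile_step / train_step are hypothetical names, not part of my actual tooling.

```python
import time
import torch
import pynvml  # NVIDIA Management Library bindings (assumed installed)

def profile_step(train_step, handle):
    """Coarse profiling sketch: peak VRAM via torch, energy via NVML power samples.

    `train_step` is a hypothetical zero-argument callable running one training step;
    `handle` comes from pynvml.nvmlDeviceGetHandleByIndex.
    """
    torch.cuda.reset_peak_memory_stats()
    start_power = pynvml.nvmlDeviceGetPowerUsage(handle)  # milliwatts
    t0 = time.time()
    train_step()
    torch.cuda.synchronize()
    elapsed = time.time() - t0
    end_power = pynvml.nvmlDeviceGetPowerUsage(handle)

    peak_vram_gb = torch.cuda.max_memory_allocated() / 1e9
    # Energy ~ average power * time: a coarse estimate, sufficient for comparing
    # LoRA vs. FFT runs but not a substitute for a hardware power meter.
    energy_j = 0.5 * (start_power + end_power) / 1000 * elapsed
    return peak_vram_gb, energy_j

# Usage (illustrative):
# pynvml.nvmlInit()
# handle = pynvml.nvmlDeviceGetHandleByIndex(0)
# vram, energy = profile_step(my_train_step, handle)
```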

Report

Download report