
Attentive Perturbation: Extending Prefix Tuning to Large Language Models Inner Representations

Published in: The 9th International Conference on Machine Learning, Optimization, and Data Science

Authors: Louis Falissard, Séverine Affeldt, Mohamed Nadif

From adapters to prefix tuning, parameter-efficient fine-tuning (PEFT) has been a well-investigated research field in the past few years, and has led to an entire family of alternative approaches to large language model fine-tuning. All of these methods rely on the fundamental idea of introducing additional learnable parameters into the model while freezing all pre-trained representations during training. Standard fine-tuning, by contrast, refits all model parameters to the new, supervised objective function; this still requires a considerable amount of computing power, which might not be readily available to everyone, and, even with the use of transfer learning, substantial amounts of data. In this article, we propose a novel and fairly straightforward extension of the prefix-tuning approach that modifies both the model's attention weights and its internal representations. Our proposal introduces a "token-tuning" method relying on soft, lookup-based embeddings derived using attention mechanisms. We call this efficient extension "attentive perturbation", and empirically show that it outperforms other PEFT methods on most natural language understanding tasks in the few-shot learning setting.
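
The abstract above does not come with code, so the following PyTorch sketch is an illustration only of the general idea it describes: a soft, attention-based lookup over a small bank of learnable vectors that additively perturbs the hidden states of a frozen model. The class name `AttentivePerturbation`, the parameter `num_prefix_tokens`, the additive formulation, and the initialization choices are assumptions made for this sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentivePerturbation(nn.Module):
    """Hypothetical sketch: perturb a frozen layer's hidden states via a
    soft lookup (attention) over a small bank of learnable vectors."""

    def __init__(self, hidden_dim: int, num_prefix_tokens: int = 16):
        super().__init__()
        # Learnable bank: the only parameters updated during fine-tuning.
        self.keys = nn.Parameter(torch.randn(num_prefix_tokens, hidden_dim) * 0.02)
        self.values = nn.Parameter(torch.zeros(num_prefix_tokens, hidden_dim))
        self.scale = hidden_dim ** -0.5

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_dim), output of a frozen layer.
        # Soft lookup: each token attends over the learnable bank.
        scores = hidden_states @ self.keys.t() * self.scale   # (B, T, P)
        weights = F.softmax(scores, dim=-1)                    # soft assignment
        perturbation = weights @ self.values                   # (B, T, hidden_dim)
        # Additive perturbation of the frozen internal representation.
        return hidden_states + perturbation


# Minimal usage example with stand-in tensors; in practice the module would be
# attached to the outputs of frozen transformer layers, and only its bank
# parameters would receive gradients.
layer_dim = 768
perturb = AttentivePerturbation(layer_dim)
frozen_hidden = torch.randn(2, 10, layer_dim)
out = perturb(frozen_hidden)  # same shape as frozen_hidden
```

Initializing the value bank to zero keeps the perturbed model identical to the frozen pre-trained model at the start of training, which is a common design choice in PEFT-style methods and is assumed here rather than taken from the paper.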