Projected Language Models: A Large Model Pre-Segmented Into Smaller Ones

This paper has been accepted at the Foundation Models in the Wild workshop at ICML 2024.
Large language models are versatile tools but are not suitable for small inference budgets. Small models offer more efficient inference, but their lower capacity means they perform well only when their scope is limited to a specialized domain. This paper explores how to obtain a small language model with good specialized accuracy, even when the specialization data is unknown during pretraining. We propose a novel architecture, projected networks (PN). A PN is a high-capacity network whose parameters can be linearly projected into a small network for fine-tuning. We assess the empirical effectiveness of our solution compared to small-model training, distillation, and a hard mixture of experts.
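As a rough illustration of the projected-network idea described above, the sketch below shows one way a large weight matrix could be linearly projected into a smaller one that is then fine-tuned. This is a minimal sketch under assumptions of our own: the fixed random-projection scheme, the function name `project_weight`, and all dimensions are illustrative, not details taken from the paper.

```python
# Minimal sketch: a large weight matrix is mapped into a smaller one by a
# linear projection, and the small matrix initializes a layer that is then
# fine-tuned. The projection scheme here (fixed random maps) is an assumed
# stand-in, not the paper's method.
import torch

def project_weight(W_large: torch.Tensor, d_out: int, d_in: int,
                   seed: int = 0) -> torch.Tensor:
    """Linearly project a large weight matrix to a small one.

    Uses fixed random matrices P (d_out x D_out) and Q (D_in x d_in),
    so W_small = P @ W_large @ Q is a linear function of the large
    model's parameters.
    """
    D_out, D_in = W_large.shape
    g = torch.Generator().manual_seed(seed)
    P = torch.randn(d_out, D_out, generator=g) / D_out ** 0.5
    Q = torch.randn(D_in, d_in, generator=g) / D_in ** 0.5
    return P @ W_large @ Q

# Example: project a 4096x4096 layer down to 512x512, then fine-tune
# the small layer as usual.
W_large = torch.randn(4096, 4096)
W_small = project_weight(W_large, d_out=512, d_in=512)

small_layer = torch.nn.Linear(512, 512)
with torch.no_grad():
    small_layer.weight.copy_(W_small)  # small model inherits projected weights
```

Because the small weights are a linear function of the large ones, any such projection can be recomputed cheaply for a new specialization domain; the choice of projection (learned versus fixed) is where an actual implementation would differ from this sketch.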
