LLMLingua

Technology Radar

Published : Oct 23, 2024

NOT ON THE CURRENT EDITION

This blip is not on the current edition of the Radar. If it was on one of the last few editions, it is likely that it is still relevant. If the blip is older, it might no longer be relevant and our assessment might be different today. Unfortunately, we simply don't have the bandwidth to continuously review blips from previous editions of the Radar. Understand more

Oct 2024

Assess

LLMLingua enhances LLM efficiency by compressing prompts using a small language model to remove nonessential tokens with minimal performance loss. This approach allows LLMs to maintain reasoning and in-context learning while efficiently processing longer prompts, which addresses challenges like cost efficiency, inference latency and context handling. Compatible with various LLMs without additional training and supporting frameworks like LLamaIndex, LLMLingua is great for optimizing LLM inference performance.