In-Context Clustering with Large Language Models

New York University

Abstract

We propose In-Context Clustering (ICC), a flexible LLM-based procedure for clustering data from diverse distributions. Unlike traditional clustering algorithms constrained by predefined similarity measures, ICC flexibly captures complex relationships among inputs through an attention mechanism. We show that pretrained LLMs exhibit impressive zero-shot clustering capabilities on text-encoded numeric data, with attention matrices showing salient cluster patterns. Spectral clustering using attention matrices offers surprisingly competitive performance. We further enhance the clustering capabilities of LLMs on numeric and image data through fine-tuning using the Next Token Prediction (NTP) loss. Moreover, the flexibility of LLM prompting enables text-conditioned image clustering, a capability that classical clustering methods lack. Our work extends in-context learning to an unsupervised setting, showcasing the effectiveness and flexibility of LLMs for clustering.

ICL

Unlike previous in-context supervised learning, which requires multiple input-output pairs in the prompt, ICC extends in-context learning to an unsupervised setting where only unlabeled input data appears in the context.
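As a concrete illustration, here is a minimal Python sketch of the ICC protocol, assuming a simple prompt template and a placeholder query_llm function (the exact prompt format used in the paper may differ): unlabeled points are serialized into a single prompt and the LLM is asked to return one cluster label per point.

# Minimal sketch of an in-context clustering (ICC) prompt.
# The template and query_llm() are illustrative placeholders, not the
# exact setup used in the paper.

def build_icc_prompt(points, k):
    """Serialize unlabeled points into a single clustering prompt."""
    lines = [f"Point {i}: " + ", ".join(f"{x:.2f}" for x in p)
             for i, p in enumerate(points)]
    return (f"Cluster the following {len(points)} points into {k} groups.\n"
            + "\n".join(lines)
            + f"\nReturn one cluster id (0 to {k - 1}) per point, comma-separated.")

def parse_labels(response, n):
    """Parse a comma-separated list of integer labels from the model's reply."""
    return [int(tok) for tok in response.replace(" ", "").split(",")[:n]]

# Usage with any chat-completion API you have available:
# labels = parse_labels(query_llm(build_icc_prompt(points, k=3)), n=len(points))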


Zero-shot ICC

LLMs pretrained on large text corpora are capable of zero-shot clustering. The figure below shows the zero-shot clustering accuracy of various pretrained LLMs on data drawn from t-distributions with different degrees of freedom (df). When df is small, the distribution is heavy-tailed, which violates the Gaussian assumption underlying k-means. LLMs, especially those with larger model sizes, show impressive zero-shot clustering capabilities on such heavy-tailed data.

Figure 1: Zero-Shot Clustering Accuracy.
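The setup above can be reproduced in spirit with a few lines of NumPy and scikit-learn. The sketch below uses illustrative cluster centers and sample sizes (not the paper's exact protocol): points are drawn from Student-t noise around each center, and k-means is scored with Hungarian label matching, the baseline that degrades as df shrinks.

# Illustrative heavy-tailed clustering benchmark (not the paper's exact setup).
import numpy as np
from sklearn.cluster import KMeans
from scipy.optimize import linear_sum_assignment

def t_mixture(df, centers, n_per_cluster, dim=2, seed=0):
    """Sample points with Student-t noise around each cluster center."""
    rng = np.random.default_rng(seed)
    X = np.vstack([c + rng.standard_t(df, size=(n_per_cluster, dim)) for c in centers])
    y = np.repeat(np.arange(len(centers)), n_per_cluster)
    return X, y

def clustering_accuracy(y_true, y_pred):
    """Accuracy under the best label permutation (Hungarian algorithm)."""
    k = max(y_true.max(), y_pred.max()) + 1
    cost = np.zeros((k, k))
    for t, p in zip(y_true, y_pred):
        cost[t, p] -= 1
    rows, cols = linear_sum_assignment(cost)
    return -cost[rows, cols].sum() / len(y_true)

centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])  # hypothetical centers
for df in (1, 2, 5, 30):
    X, y = t_mixture(df, centers, n_per_cluster=50)
    pred = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
    print(f"df={df}: k-means accuracy = {clustering_accuracy(y, pred):.2f}")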

To better understand the inner mechanism of ICC, we visualize the attention scores across different transformer layers. We observe that attention matrices in intermediate layers show block structures that align with cluster identities. Spectral clustering using attention scores yields competitive performance compared to direct LLM generation. This surprising result suggests that LLM attention already encodes rich structural information beyond what is directly generated. Please refer to Section 3.2 of our paper for more details.
Figure 2: Visualization of Attention Allocation on Input Data and Corresponding Cluster Labels. The x-axis and y-axis correspond to the ground-truth cluster labels. The top-right curves show the average accuracy of spectral clustering using the input-input attention score matrices (top left) across different layers, compared with the average accuracy of LLM generation.
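A minimal sketch of this analysis, assuming you have already extracted an n x n input-to-input attention score matrix for the n data points at a chosen layer (e.g., via output_attentions=True in Hugging Face Transformers, averaged over heads): symmetrize it and pass it to spectral clustering as a precomputed affinity.

# Sketch: spectral clustering on an LLM attention matrix.
# `attn` is assumed to be an (n, n) input-to-input attention score matrix
# for the n data points at a chosen layer (heads already averaged).
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_from_attention(attn, n_clusters):
    """Symmetrize attention scores and use them as a precomputed affinity."""
    affinity = (attn + attn.T) / 2.0          # make the matrix symmetric
    affinity = np.clip(affinity, 0.0, None)   # affinities must be non-negative
    sc = SpectralClustering(
        n_clusters=n_clusters,
        affinity="precomputed",
        assign_labels="kmeans",
        random_state=0,
    )
    return sc.fit_predict(affinity)

# labels = cluster_from_attention(attn, n_clusters=3)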


Improving ICC through Fine-tuning

While pretrained LLMs show promising zero-shot clustering capabilities, small open-source models lag behind classical methods and proprietary LLMs. We create synthetic clustering data and use simple LoRA fine-tuning with the NTP loss to further improve ICC.

Figure 3: Effect of Fine-tuning.
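A minimal fine-tuning sketch under standard assumptions (Hugging Face Transformers plus PEFT, a causal LM, and training examples that append the target cluster labels to the clustering prompt so the usual next-token loss applies). The model name, LoRA rank, and example data below are illustrative choices, not the paper's exact configuration.

# Sketch: LoRA fine-tuning of a causal LM with the standard NTP loss.
# Model name, LoRA rank, and hyperparameters are illustrative choices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-3.1-8B-Instruct"   # any causal LM works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# One synthetic training example: a clustering prompt followed by the answer.
text = "Cluster the points ... into 2 groups.\nAnswer: 0, 0, 1, 1"
batch = tokenizer(text, return_tensors="pt")

# Setting labels = input ids gives the usual next-token-prediction (NTP) loss.
out = model(**batch, labels=batch["input_ids"])
out.loss.backward()   # plug into any optimizer / Trainer loop from here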

We also extend ICC to multimodal LLMs. By projecting image embeddings obtained from a pretrained visual encoder into the language embedding space, LLMs can learn to produce meaningful groupings of images based on their semantic meaning.

Figure 4: Left: Multimodal LLM Architecture with Average Pooling for Image Features. Right: Qualitative Comparison of Models on Image Clustering. ICC outperforms k-means when the data has rich semantic information.
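The left half of the figure can be sketched in PyTorch as follows, with illustrative dimensions and any pretrained vision encoder that outputs patch features: patch features are average-pooled into a single token per image, linearly projected into the LLM embedding space, and interleaved with the text token embeddings.

# Sketch: projecting pooled image features into the LLM embedding space.
# Dimensions and the fusion strategy are illustrative, not the exact paper setup.
import torch
import torch.nn as nn

class ImageProjector(nn.Module):
    def __init__(self, vision_dim=1024, llm_dim=4096):
        super().__init__()
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, patch_feats):
        # patch_feats: (batch, num_patches, vision_dim) from a frozen vision encoder
        pooled = patch_feats.mean(dim=1)   # average pooling -> one token per image
        return self.proj(pooled)           # (batch, llm_dim), ready to interleave
                                           # with the text token embeddings

# Usage: image_tokens = ImageProjector()(encoder(pixel_values)),
# then concatenate image_tokens with the embedded clustering prompt.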


Text-Conditioned Clustering

Real-world data can have multiple plausible clusterings depending on the objective. For example, the same set of animal images can be clustered by visual properties like colors (orange vs. white) or semantic categories like species (dog vs. cat). When the clustering condition changes, classical methods typically require retraining or re-engineering features. In contrast, LLMs can easily adapt to new conditions through prompting thanks to their powerful contextual understanding capability.

Figure 5: Clustering changes when the condition changes.
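In practice, the only thing that changes between the two clusterings is a short condition string in the prompt. A hypothetical sketch (the prompt wording and the query_mllm helper are illustrative, not the paper's exact prompts):

# Sketch: the same images, two different clustering conditions via the prompt.
# conditioned_prompt / query_mllm are illustrative placeholders.

def conditioned_prompt(num_images, k, condition):
    return (f"Here are {num_images} images. Cluster them into {k} groups "
            f"based on {condition}. Return one cluster id per image.")

prompt_by_color = conditioned_prompt(8, 2, "color")       # e.g. orange vs. white
prompt_by_species = conditioned_prompt(8, 2, "species")   # e.g. dog vs. cat
# labels_color = query_mllm(images, prompt_by_color)
# labels_species = query_mllm(images, prompt_by_species)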

BibTeX

@misc{wang2025incontextclusteringlargelanguage,
      title={In-Context Clustering with Large Language Models}, 
      author={Ying Wang and Mengye Ren and Andrew Gordon Wilson},
      year={2025},
      eprint={2510.08466},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2510.08466}, 
}