lcc
LCC: Latent Cluster Correction
- Neural networks take input samples and transform them into latent representations
- Semantically similar samples tend to aggregate into latent clusters
- This repository implements Latent Cluster Correction, a new technique to improve said latent clusters
Pretty images
These are examples of input datasets fed into image classifier models. Some selected latent representations are extracted and plotted in 2D (via dimensionality reduction). Initially, during the feature extraction phase, the samples are not clearly separated. But as the samples progressively get into the classification phase, visible latent clusters emerge. The goal of LCC is to help the formation of these clusters.
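As a rough illustration of the "plotted in 2D (via dimensionality reduction)" step, here is a minimal sketch that projects latent vectors onto two dimensions. It assumes PCA as the reduction method, which may not be what the images above use, and the latent vectors are synthetic stand-ins rather than real model activations. The name `pca_2d` is hypothetical and not part of this repository.

```python
import numpy as np


def pca_2d(latents: np.ndarray) -> np.ndarray:
    """Project (n_samples, n_features) latent vectors onto their top 2 principal axes."""
    centered = latents - latents.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, sorted by decreasing variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


# Synthetic stand-in for latent representations: two classes, 64 features each
rng = np.random.default_rng(0)
class_a = rng.normal(loc=0.0, scale=1.0, size=(100, 64))
class_b = rng.normal(loc=3.0, scale=1.0, size=(100, 64))
latents = np.concatenate([class_a, class_b])
labels = np.array([0] * 100 + [1] * 100)

points = pca_2d(latents)
print(points.shape)  # (200, 2)
```

With well-separated classes like these, the two groups end up far apart along the first axis, which is the kind of visible latent cluster described above.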
Installation
Make sure uv is installed. Then run
uv python install 3.10
uv sync --all-extras
Usage
- Fine-tuning with LCC: modify and run lcc.sh, or use the CLI directly:
uv run python -m lcc train --help
For example:
uv run python -m lcc train \
    microsoft/resnet-18 \
    PRESET:cifar100 \
    output_dir \
    --batch-size 256 \
    --head-name classifier.1 \
    --logit-key logits \
    --lcc-submodules resnet.encoder.stages.3 \
    --lcc-warmup 1 \
    --lcc-weight 0.01 \
    --seed 123
- Pretty-print a model structure from HuggingFace: run ./pretty-print.sh HF_MODEL_NAME, e.g.
./pretty-print.sh microsoft/resnet-18
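If you want to launch several runs from a script (e.g. to sweep over --lcc-weight), one option is to assemble the command line in Python. The sketch below only rebuilds the example command from this section. `build_train_command` is a hypothetical helper, not something shipped with this repository, and it assumes every option takes exactly one value.

```python
import shlex


def build_train_command(model: str, dataset: str, output_dir: str, **options) -> list[str]:
    """Build the argv list for `uv run python -m lcc train`.

    Keyword options are turned into CLI flags: lcc_weight=0.01 -> --lcc-weight 0.01.
    """
    argv = ["uv", "run", "python", "-m", "lcc", "train", model, dataset, output_dir]
    for name, value in options.items():
        argv += ["--" + name.replace("_", "-"), str(value)]
    return argv


argv = build_train_command(
    "microsoft/resnet-18",
    "PRESET:cifar100",
    "output_dir",
    batch_size=256,
    head_name="classifier.1",
    logit_key="logits",
    lcc_submodules="resnet.encoder.stages.3",
    lcc_warmup=1,
    lcc_weight=0.01,
    seed=123,
)
print(shlex.join(argv))
# To actually launch the run: subprocess.run(argv, check=True)
```

Run uv run python -m lcc train --help to check which flags are actually available before relying on a helper like this.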
API overview
- lcc.training: Training stuff
  - lcc.training.train: Pulls and trains a model from the HuggingFace model hub (presumably pretrained on ImageNet) on a dataset also pulled from HuggingFace. This method takes the model and dataset names as arguments, so it's pretty rigid.
- lcc.datasets
  - lcc.datasets.HuggingFaceDataset: A HuggingFace image classification dataset wrapped inside a Lightning Datamodule for easy use with PyTorch Lightning.
  - lcc.datasets.get_dataset: Creating a HuggingFaceDataset requires a bunch of arguments. I was tired of copy-pasting them around, so I made this method to create classical datasets more quickly. See nlnas.datasets.DATASET_PRESETS_CONFIGURATIONS for the list of available presets.
- lcc.classifiers: Classifier models and wrappers
  - lcc.classifiers.HuggingFaceClassifier: A HuggingFace image classification model wrapped inside a Lightning Module for easy use with PyTorch Lightning.
  - lcc.classifiers.TimmClassifier: Same but for timm models, which, despite also coming from the HuggingFace hub, require some special considerations. See also timm.list_models.
- lcc.correction: LCC stuff. You probably don't need to touch that directly since LCC is done automatically for classifier classes found in lcc.classifiers.
- lcc.plotting: Cool plotting stuff.
  - lcc.plotting.class_scatter: 2D scatter plot where samples are colored by class. Also supports "outliers", which are samples with a negative label.
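To make the "outliers" convention concrete (samples with a negative label), here is a small sketch that separates 2D points into per-class groups and outliers before plotting. It is not the implementation of lcc.plotting.class_scatter. `split_outliers` is a hypothetical helper and the data is made up.

```python
import numpy as np


def split_outliers(points: np.ndarray, labels: np.ndarray):
    """Separate labeled samples from outliers (samples with a negative label)."""
    is_outlier = labels < 0
    # One array of points per non-negative class label
    by_class = {
        int(c): points[labels == c] for c in np.unique(labels[~is_outlier])
    }
    return by_class, points[is_outlier]


points = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [9.0, -9.0]])
labels = np.array([0, 0, 1, -1])
by_class, outliers = split_outliers(points, labels)
print(sorted(by_class))  # [0, 1]
print(len(outliers))     # 1
```

Each group in by_class could then be drawn in its own color, with the outliers drawn in a neutral one.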
Cite
- Paper
  @misc{hothanhImprovingFineTuningLatent2025,
    title = {Improving {{Fine-Tuning}} with {{Latent Cluster Correction}}},
    author = {Ho Thanh, C{\'e}dric},
    year = {2025},
    month = jan,
    number = {arXiv:2501.11919},
    eprint = {2501.11919},
    primaryclass = {cs},
    publisher = {arXiv},
    doi = {10.48550/arXiv.2501.11919},
    urldate = {2025-01-22},
    archiveprefix = {arXiv},
    keywords = {Computer Science - Machine Learning},
  }
- Code
  @software{Ho_Thanh_LCC_Latent_Cluster_2025,
    author = {Ho Thanh, Cédric},
    license = {MIT},
    month = jan,
    title = {{LCC: Latent Cluster Correction}},
    url = {https://github.com/altaris/lcc},
    version = {1.0.0},
    year = {2025}
  }