LCC: Latent Cluster Correction
- Neural networks transform input samples into latent representations.
- Semantically similar samples tend to aggregate into latent clusters.
- This repository implements Latent Cluster Correction (LCC), a new technique that improves these latent clusters.
Pretty images
These are examples of input datasets fed into image classifier models. Latent representations from selected layers are extracted and plotted in 2D via dimensionality reduction. In the early layers (the feature extraction phase), samples are not clearly separated. As the samples progress into the classification phase, visible latent clusters emerge. The goal of LCC is to help these clusters form.
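The README does not say which dimensionality reduction produces these plots. As a minimal stand-in, the sketch below projects latent vectors onto their first two principal components (plain PCA via NumPy's SVD); the function `project_2d` and the synthetic data are hypothetical and only illustrate how a batch of latents becomes 2D points that can reveal clusters.

```python
import numpy as np


def project_2d(latents: np.ndarray) -> np.ndarray:
    """Hypothetical helper: project latents of shape (n_samples, ...) onto
    their first two principal components, returning shape (n_samples, 2)."""
    # Flatten trailing dimensions, e.g. (n, C, H, W) feature maps -> (n, C*H*W)
    x = latents.reshape(len(latents), -1).astype(np.float64)
    # Center the data so the principal axes pass through the mean
    x = x - x.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, sorted by decreasing singular value
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[:2].T


rng = np.random.default_rng(0)
# Two synthetic "classes" of 64-dimensional latents with different means
a = rng.normal(loc=0.0, size=(100, 64))
b = rng.normal(loc=3.0, size=(100, 64))
points = project_2d(np.concatenate([a, b]))
print(points.shape)  # (200, 2)
```

With well-separated classes, the first principal component alone already splits the two groups; real latents from early layers usually look far less tidy, which is what the plots above show.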
Installation
Make sure uv is installed. Then run:
uv python install 3.10
uv sync --all-extras
Usage
Fine-tuning with LCC: modify and run lcc.sh, or use the CLI directly:
uv run python -m lcc train --help
For example:
uv run python -m lcc train \
    microsoft/resnet-18 \
    PRESET:cifar100 \
    output_dir \
    --batch-size 256 \
    --head-name classifier.1 \
    --logit-key logits \
    --lcc-submodules resnet.encoder.stages.3 \
    --lcc-warmup 1 \
    --lcc-weight 0.01 \
    --seed 123
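To run several fine-tuning jobs, for example a sweep over the LCC loss weight, one option is to assemble the same invocation from Python. This is a minimal sketch: it only reuses the arguments and flags shown in the example above, and the helper name `lcc_train_command`, the `out/w…` directory layout and the `RUN` switch are hypothetical. Check `uv run python -m lcc train --help` for the actual options.

```python
import shlex
import subprocess

RUN = False  # set to True to actually launch the runs (needs the repo environment)


def lcc_train_command(model: str, dataset: str, output_dir: str,
                      lcc_weight: float, seed: int = 123) -> list[str]:
    """Hypothetical helper: rebuild the README's `lcc train` example,
    varying only the model, dataset, output dir, --lcc-weight and --seed."""
    return [
        "uv", "run", "python", "-m", "lcc", "train",
        model, dataset, output_dir,
        "--batch-size", "256",
        "--head-name", "classifier.1",
        "--logit-key", "logits",
        "--lcc-submodules", "resnet.encoder.stages.3",
        "--lcc-warmup", "1",
        "--lcc-weight", str(lcc_weight),
        "--seed", str(seed),
    ]


# Sweep over the LCC loss weight, one output directory per run
commands = [
    lcc_train_command("microsoft/resnet-18", "PRESET:cifar100", f"out/w{w}", w)
    for w in (0.001, 0.01, 0.1)
]
for cmd in commands:
    print(shlex.join(cmd))  # shell-quoted, copy-pasteable command line
    if RUN:
        subprocess.run(cmd, check=True)
```

Passing a list of arguments to `subprocess.run` avoids shell quoting issues; `shlex.join` is only used to print a readable version.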
Pretty-print a model structure from HuggingFace: run
./pretty-print.sh HF_MODEL_NAME
e.g.
./pretty-print.sh microsoft/resnet-18
API overview
- lcc.training: Training stuff.
  - lcc.training.train: Pulls a model from the HuggingFace model hub (presumably pretrained on ImageNet) and trains it on a dataset also pulled from HuggingFace. This method takes the model and dataset names as arguments, so it is pretty rigid.
- lcc.datasets: Datasets.
  - lcc.datasets.HuggingFaceDataset: A HuggingFace image classification dataset wrapped inside a Lightning DataModule for easy use with PyTorch Lightning.
  - lcc.datasets.get_dataset: Creating a HuggingFaceDataset requires a bunch of arguments. I was tired of copy-pasting them around, so I made this method to create classical datasets more quickly. See lcc.datasets.DATASET_PRESETS_CONFIGURATIONS for the list of available presets.
- lcc.classifiers: Classifier models and wrappers.
  - lcc.classifiers.HuggingFaceClassifier: A HuggingFace image classification model wrapped inside a Lightning Module for easy use with PyTorch Lightning.
  - lcc.classifiers.TimmClassifier: Same, but for timm models, which, despite also coming from the HuggingFace hub, require some special considerations. See also timm.list_models.
- lcc.correction: LCC stuff. You probably don't need to touch this directly, since LCC is applied automatically by the classifier classes in lcc.classifiers.
- lcc.plotting: Cool plotting stuff.
  - lcc.plotting.class_scatter: 2D scatter plot where samples are colored by class. Also supports "outliers", i.e. samples with a negative label.
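The README does not show the signature or plotting backend of lcc.plotting.class_scatter. The sketch below is a hypothetical matplotlib stand-in (`scatter_by_class` is not part of the package) that only illustrates the convention described above: one color per class, with negatively labelled samples drawn as outliers in a neutral color.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt
import numpy as np


def scatter_by_class(ax, x: np.ndarray, y: np.ndarray, outlier_color: str = "lightgray"):
    """Hypothetical helper: scatter 2D points `x` (shape (n, 2)) colored by
    integer label `y`; samples with a negative label are drawn as outliers."""
    outliers = y < 0
    if outliers.any():
        # Draw outliers first so class points are rendered on top of them
        ax.scatter(x[outliers, 0], x[outliers, 1], c=outlier_color, s=8, label="outliers")
    for label in np.unique(y[~outliers]):
        mask = y == label
        ax.scatter(x[mask, 0], x[mask, 1], s=8, label=str(label))
    ax.legend()
    return ax


rng = np.random.default_rng(0)
x = rng.normal(size=(300, 2))
y = rng.integers(-1, 3, size=300)  # labels in {-1, 0, 1, 2}; -1 marks outliers
fig, ax = plt.subplots()
scatter_by_class(ax, x, y)
# fig.savefig("class_scatter_sketch.png")
```

Each call to `ax.scatter` creates one collection, so this example produces four: the outliers plus one per class.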
Cite
Paper
@misc{hothanhImprovingFineTuningLatent2025,
  title = {Improving {{Fine-Tuning}} with {{Latent Cluster Correction}}},
  author = {Ho Thanh, C{\'e}dric},
  year = {2025},
  month = jan,
  number = {arXiv:2501.11919},
  eprint = {2501.11919},
  primaryclass = {cs},
  publisher = {arXiv},
  doi = {10.48550/arXiv.2501.11919},
  urldate = {2025-01-22},
  archiveprefix = {arXiv},
  keywords = {Computer Science - Machine Learning},
}
Code
@software{Ho_Thanh_LCC_Latent_Cluster_2025,
  author = {Ho Thanh, Cédric},
  license = {MIT},
  month = jan,
  title = {{LCC: Latent Cluster Correction}},
  url = {https://github.com/altaris/lcc},
  version = {1.0.0},
  year = {2025}
}