LCC: Latent Cluster Correction

Python 3.10 · CUDA 12

  • Neural networks take input samples and transform them into latent representations
  • Semantically similar samples tend to aggregate into latent clusters
  • This repository implements Latent Cluster Correction, a new technique to improve said latent clusters

Pretty images

The images in this section show example datasets fed into image classifier models. Selected latent representations are extracted and plotted in 2D via dimensionality reduction. In the early, feature-extraction layers, the samples are not clearly separated. As they progress toward the classification layers, visible latent clusters emerge. The goal of LCC is to help these clusters form.
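To make the 2D projection step concrete, here is a minimal sketch using NumPy. It is not the code that produced the images: the README does not say which dimensionality reduction method was used, so PCA is an assumption here, and the latent representations are synthetic stand-ins. The helper name `project_to_2d` is hypothetical.

```python
import numpy as np


def project_to_2d(latents: np.ndarray) -> np.ndarray:
    """Project latent vectors to 2D with PCA (an assumed choice of method)."""
    # Flatten each sample, e.g. a (C, H, W) feature map becomes one vector
    flat = latents.reshape(len(latents), -1)
    centered = flat - flat.mean(axis=0)
    # Rows of vt are the principal directions, sorted by decreasing variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


# Synthetic stand-in for latent representations:
# 3 classes, 50 samples each, 64 features per sample
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(3, 64))
labels = np.repeat(np.arange(3), 50)
latents = centers[labels] + rng.normal(size=(150, 64))

points = project_to_2d(latents)
print(points.shape)  # (150, 2)
```

Scatter-plotting `points` colored by `labels` gives the kind of picture described above. With real models, `latents` would be the output of a chosen submodule rather than random data.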

Installation

Make sure uv is installed, then run:

uv python install 3.10
uv sync --all-extras

Usage

  • Fine-tuning with LCC: modify and run lcc.sh, or use the CLI directly:

    uv run python -m lcc train --help
    

    For example:

    uv run python -m lcc train \
      microsoft/resnet-18 \
      PRESET:cifar100 \
      output_dir \
      --batch-size 256 \
      --head-name classifier.1 \
      --logit-key logits \
      --lcc-submodules resnet.encoder.stages.3 \
      --lcc-warmup 1 \
      --lcc-weight 0.01 \
      --seed 123
    
  • Pretty-print the structure of a Hugging Face model: run ./pretty-print.sh HF_MODEL_NAME, e.g.:

    ./pretty-print.sh microsoft/resnet-18
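
The example `train` invocation above can also be scripted, for instance to repeat a run over several seeds. The sketch below only uses the flags and values from that example. The helper names `build_train_command` and `run_sweep`, and the per-seed output directory naming, are assumptions made for illustration and are not part of the repository. It defaults to a dry run that prints the commands without executing them.

```python
import shlex
import subprocess


def build_train_command(seed: int, output_dir: str) -> list[str]:
    """Assemble the example `lcc train` invocation, varying only seed and output dir."""
    return [
        "uv", "run", "python", "-m", "lcc", "train",
        "microsoft/resnet-18",
        "PRESET:cifar100",
        output_dir,
        "--batch-size", "256",
        "--head-name", "classifier.1",
        "--logit-key", "logits",
        "--lcc-submodules", "resnet.encoder.stages.3",
        "--lcc-warmup", "1",
        "--lcc-weight", "0.01",
        "--seed", str(seed),
    ]


def run_sweep(seeds, dry_run: bool = True) -> list[list[str]]:
    """Print one command per seed, and execute each only if dry_run is False."""
    # Hypothetical naming scheme: one output directory per seed
    commands = [build_train_command(s, f"output_dir_seed_{s}") for s in seeds]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises if training exits with an error
    return commands


commands = run_sweep([123, 124, 125])
```

Passing `dry_run=False` launches the runs one after another. Check `uv run python -m lcc train --help` for the authoritative list of options before relying on any of these flags.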
    

API overview

Cite

  • Preprint

    @misc{hothanhImprovingFineTuningLatent2025,
      title = {Improving {{Fine-Tuning}} with {{Latent Cluster Correction}}},
      author = {Ho Thanh, C{\'e}dric},
      year = {2025},
      month = jan,
      number = {arXiv:2501.11919},
      eprint = {2501.11919},
      primaryclass = {cs},
      publisher = {arXiv},
      doi = {10.48550/arXiv.2501.11919},
      urldate = {2025-01-22},
      archiveprefix = {arXiv},
      keywords = {Computer Science - Machine Learning},
    }
    
  • Code

    @software{Ho_Thanh_LCC_Latent_Cluster_2025,
      author = {Ho Thanh, Cédric},
      license = {MIT},
      month = jan,
      title = {{LCC: Latent Cluster Correction}},
      url = {https://github.com/altaris/lcc},
      version = {1.0.0},
      year = {2025}
    }
    