Overview

We support basic alignment algorithms, i.e., supervised fine-tuning (SFT), direct preference optimization (DPO), and proximal policy optimization (PPO). Our implementation covers different modalities, each of which may involve additional algorithms. For instance, in the Text -> Text modality, we have also implemented SimPO, KTO, and others.

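| Modality                           | SFT | RM | DPO | PPO |
|------------------------------------|-----|----|-----|-----|
| Text -> Text (t2t)                 | ✔️  | ✔️ | ✔️  | ✔️  |
| Text+Image -> Text (ti2t)          | ✔️  | ✔️ | ✔️  | ✔️  |
| Text+Image -> Text+Image (ti2ti)   | ✔️  | ✔️ | ✔️  | ✔️  |
| Text -> Image (t2i)                | ✔️  | ⚒️ | ✔️  | ⚒️  |
| Text -> Video (t2v)                | ✔️  | ⚒️ | ✔️  | ⚒️  |
| Text -> Audio (t2a)                | ✔️  | ⚒️ | ✔️  | ⚒️  |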

Note

Align-Anything is designed to be highly extensible: algorithms are implemented through class inheritance and derivation, so researchers can quickly extend algorithms such as SimPO and KTO to other modalities. We hope this makes it more convenient to build on.
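
To illustrate the inheritance-based style the note describes, here is a minimal, self-contained sketch. It is not Align-Anything's actual code: every class name (`DPOTrainer`, `SimPOTrainer`, `TI2TDPOTrainer`, `TI2TSimPOTrainer`), the `PreferencePair` container, and the default hyperparameters are hypothetical, and the log-probabilities are plain floats rather than model outputs. The only point is to show how an algorithm variant can override the loss while a modality variant overrides the modality-specific parts, and how the two can then be combined.

```python
import math
from dataclasses import dataclass


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


@dataclass
class PreferencePair:
    """Hypothetical container: summed log-probs for one chosen/rejected pair."""
    chosen_logp: float
    rejected_logp: float
    ref_chosen_logp: float
    ref_rejected_logp: float
    chosen_len: int
    rejected_len: int


class DPOTrainer:
    """Hypothetical base trainer for the t2t modality."""
    modality = "t2t"

    def __init__(self, beta: float = 0.1):
        self.beta = beta

    def loss(self, pair: PreferencePair) -> float:
        # DPO: compare policy-vs-reference log-ratios of chosen and rejected.
        margin = (pair.chosen_logp - pair.ref_chosen_logp) - (
            pair.rejected_logp - pair.ref_rejected_logp
        )
        return -log_sigmoid(self.beta * margin)

    def mean_loss(self, pairs: list[PreferencePair]) -> float:
        # Shared logic that every derived trainer reuses unchanged.
        return sum(self.loss(p) for p in pairs) / len(pairs)


class SimPOTrainer(DPOTrainer):
    """Algorithm variant: only the loss is overridden."""

    def __init__(self, beta: float = 2.0, gamma: float = 0.5):
        super().__init__(beta)
        self.gamma = gamma

    def loss(self, pair: PreferencePair) -> float:
        # SimPO: length-normalized log-probs, no reference model, target margin gamma.
        margin = (
            pair.chosen_logp / pair.chosen_len
            - pair.rejected_logp / pair.rejected_len
        )
        return -log_sigmoid(self.beta * margin - self.gamma)


class TI2TDPOTrainer(DPOTrainer):
    """Modality variant: a real trainer would override data loading here."""
    modality = "ti2t"


class TI2TSimPOTrainer(SimPOTrainer, TI2TDPOTrainer):
    """SimPO extended to ti2t by combining the two derivations."""


pair = PreferencePair(-10.0, -10.0, -10.0, -10.0, 5, 5)
trainer = TI2TSimPOTrainer()
example_loss = trainer.mean_loss([pair])
```

In this sketch, `TI2TSimPOTrainer` needs no body of its own: it picks up the SimPO loss from one parent and the modality setting from the other, which is the kind of reuse the note refers to.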