Post-training quantization to ternary weights {−1, 0, +1}. Works on VLMs and seq2seq models, not just LLMs.
Select a model, enter a prompt, and click Generate. The first call downloads and loads the checkpoint (a few seconds); subsequent calls are fast.
Qwen3-1.7B quantized to ternary weights {−1, 0, +1}. Retains 97.5% of FP16 quality across standard benchmarks.
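The core idea of ternary post-training quantization can be sketched as follows. This is a minimal illustration, not this app's actual implementation; it assumes per-tensor absmean scaling (the recipe popularized by BitNet b1.58), where each weight is divided by the mean absolute value of the tensor, rounded, and clipped to {−1, 0, +1}:

```python
import numpy as np

def ternary_quantize(w: np.ndarray):
    """Quantize a weight tensor to {-1, 0, +1} with a per-tensor scale.

    Assumes absmean scaling: divide by the mean absolute value,
    round to the nearest integer, then clip to the ternary set.
    """
    gamma = float(np.mean(np.abs(w))) + 1e-8  # scale; eps avoids divide-by-zero
    q = np.clip(np.round(w / gamma), -1, 1).astype(np.int8)
    return q, gamma

def dequantize(q: np.ndarray, gamma: float) -> np.ndarray:
    """Reconstruct a float approximation from ternary codes and the scale."""
    return q.astype(np.float32) * gamma

# Hypothetical example on a random weight matrix.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, gamma = ternary_quantize(w)
print(np.unique(q))          # only values from {-1, 0, 1}
print(dequantize(q, gamma))  # coarse approximation of w
```

Storing only the int8 codes (or packing them at ~1.58 bits per weight) plus one scale per tensor is what makes the checkpoint small; the quality figure above comes from how well `dequantize(q, gamma)` approximates the original weights in practice.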