Ternary-Quant

Post-training quantization to {−1, 0, +1} — works on VLMs and seq2seq models, not just LLMs.
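The page doesn't spell out the quantization scheme, so here is a minimal sketch of one common ternarization approach (absmean scaling, as popularized by BitNet b1.58) — an illustration, not necessarily this project's exact method:

```python
import numpy as np

def ternarize(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Quantize a weight tensor to {-1, 0, +1} with one per-tensor scale.

    Absmean scaling: scale = mean(|w|); each weight is divided by the
    scale, rounded, and clipped to the ternary set.
    """
    scale = float(np.abs(w).mean())
    q = np.clip(np.round(w / (scale + 1e-8)), -1, 1).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Reconstruct an approximate FP32 tensor from ternary codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=(4, 8)).astype(np.float32)
q, s = ternarize(w)
w_hat = dequantize(q, s)
```

Small weights round to 0, so the scheme naturally induces sparsity; the per-tensor scale is the only floating-point state kept per layer in this sketch.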

Select a model, enter a prompt, and click Generate. The first call downloads and loads the checkpoint (a few seconds); subsequent calls are fast.

Controls: model selector; max new tokens (32–512); temperature (0–1.5, 0 = greedy).

Compression stats

Model: AsadIsmail/Qwen3-1.7B-ternary
FP16 size: 2.8 GB
Ternary size: 1.03 GB
Compression: 2.7×
Quality retained: 97.5 %
Bit-width: ~4 bits effective
GPU speed: 28.8 tok/s
CPU speed: 11.3 tok/s
Runtime mode: cached
Device: CPU

Qwen3-1.7B quantized to ternary weights {−1, 0, +1}. Retains 97.5 % of FP16 quality across standard benchmarks.
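Ternary weights need only 2 bits each, so four fit in a byte. A sketch of such packing (hypothetical helpers, not necessarily the demo's on-disk format; the table's ~4 bits effective would then be consistent with some tensors, such as embeddings, staying in higher precision):

```python
import numpy as np

def pack_ternary(q: np.ndarray) -> np.ndarray:
    """Pack ternary codes {-1, 0, +1} into bytes, 4 weights per byte."""
    codes = (q.astype(np.int8) + 1).astype(np.uint8).ravel()  # map to {0, 1, 2}
    pad = (-len(codes)) % 4                                   # pad to a multiple of 4
    codes = np.pad(codes, (0, pad))
    b = codes.reshape(-1, 4)
    return (b[:, 0] | (b[:, 1] << 2) | (b[:, 2] << 4) | (b[:, 3] << 6)).astype(np.uint8)

def unpack_ternary(packed: np.ndarray, n: int) -> np.ndarray:
    """Recover the first n ternary codes from a packed byte array."""
    codes = np.stack([(packed >> s) & 3 for s in (0, 2, 4, 6)], axis=1).ravel()[:n]
    return codes.astype(np.int8) - 1

q = np.array([-1, 0, 1, 1, -1], dtype=np.int8)
packed = pack_ternary(q)          # 5 weights -> 2 bytes
restored = unpack_ternary(packed, 5)
```

At 2 bits/weight this is an 8× reduction versus FP16 for the packed tensors; the overall 2.7× figure above reflects the whole checkpoint, not just the packed layers.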
