Usage

The benchmark scripts write JSONL and TSV output by default. Set OUT_DIR to keep a run separate from the shared benchmarks/ directory.
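For example, to route a run into its own directory (the directory name here is illustrative; the Ollama script is covered below):

sh
OUT_DIR=benchmarks/my-run scripts/ollama_bench.sh gemma4:e2b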

Ollama

Start Ollama and make sure the target model is available:

sh
ollama pull gemma4:e2b
scripts/ollama_bench.sh gemma4:e2b

Useful environment variables (combined in the example after this list):

  • OLLAMA_URL: defaults to http://127.0.0.1:11434
  • OUT_DIR: defaults to benchmarks/
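
For example, pointing the script at a remote Ollama instance and a separate output directory (the host address and directory name are placeholders):

sh
OLLAMA_URL=http://192.168.1.50:11434 \
OUT_DIR=benchmarks/remote-run \
scripts/ollama_bench.sh gemma4:e2b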

MLX VLM Direct

Run a local MLX VLM model directory:

sh
scripts/mlx_vlm_bench.py \
  --model artifacts/models/mlx-community-gemma-4-e2b-it-4bit \
  --out benchmarks/mlx-vlm-run.jsonl

Optional tuning flags (see the example after this list):

  • --kv-bits
  • --kv-quant-scheme
  • --prefill-step-size
  • --temperature
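
A sketch of a tuned invocation; the flag values are illustrative assumptions rather than recommendations, and --kv-quant-scheme is omitted because its accepted values depend on the script (check --help):

sh
scripts/mlx_vlm_bench.py \
  --model artifacts/models/mlx-community-gemma-4-e2b-it-4bit \
  --kv-bits 4 \
  --prefill-step-size 512 \
  --temperature 0.0 \
  --out benchmarks/mlx-vlm-tuned.jsonl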

MLX VLM Server

Use this when an OpenAI-compatible MLX VLM server is already running:

sh
MODEL=artifacts/models/mlx-community-gemma-4-e2b-it-4bit \
SERVER_URL=http://127.0.0.1:18080 \
scripts/mlx_vlm_server_bench.sh

The script records wall-clock timing around each /v1/chat/completions call and keeps the runtime-reported token counters when available.
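
A minimal sketch of that measurement idea, assuming curl, jq, and python3 are available; the request body is a stand-in, not the script's actual payload:

sh
start=$(python3 -c 'import time; print(time.time())')
resp=$(curl -s "$SERVER_URL/v1/chat/completions" \
  -H 'Content-Type: application/json' \
  -d '{"model":"'"$MODEL"'","messages":[{"role":"user","content":"ping"}]}')
end=$(python3 -c 'import time; print(time.time())')
# Wall-clock seconds measured around the call
python3 -c "print(f'wall_s={$end - $start:.3f}')"
# Runtime-reported token counters, when the response carries a usage object
echo "$resp" | jq '.usage // empty'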

Built for repeatable local LLM benchmarking.