Rankings
Performance rankings across 16 task categories on LongShOTBench. All scores in %.
Last updated: March 30, 2026
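Column groups: Score (Overall); Category (CP, RE, Info, MM); Modality (Visual, Audio, Speech); Duration (Short, Med, Long). Each Model cell gives the model name followed by its size or type group.

| # | Model | Params | Input | Overall | CP | RE | Info | MM | Visual | Audio | Speech | Short | Med | Long |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|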
1 | Qwen3-Omni-30B-A3B-Thinking Native Video + Audio | 30B (3B active) | V+A | 61.52 | 49.26 | 71.59 | 62.77 | 62.45 | 59.99 | 53.69 | 61.77 | 58.64 | 60.75 | 58.17
2 | Qwen3-VL-32B-Thinking Mid-Size Video VLMs (10B–50B) | 32B | Video | 45.66 | 36.73 | 52.62 | 43.70 | 49.61 | 43.63 | 42.42 | 44.45 | 42.48 | 42.85 | 45.37
3 | Qwen3-VL-30B-A3B-Thinking Mid-Size Video VLMs (10B–50B) | 30B (3B active) | Video | 42.40 | 34.21 | 48.82 | 39.92 | 46.64 | 40.21 | 38.74 | 41.06 | 38.58 | 39.33 | 42.27
4 | Qwen3-VL-8B-Thinking Small Video/Vision Models (5B–10B) | 8B | Video | 41.68 | 32.90 | 47.74 | 38.70 | 47.39 | 38.70 | 37.37 | 39.45 | 41.47 | 37.88 | 40.46 |
5 | Intern-S1-mini Mid-Size Video VLMs (10B–50B) | 14B | Video | 39.85 | 31.74 | 45.52 | 36.89 | 45.26 | 37.24 | 35.37 | 37.97 | 36.76 | 36.30 | 39.45 |
6 | Gemma-3-27B-IT Mid-Size Video VLMs (10B–50B) | 27B | Frames | 39.65 | 30.78 | 48.26 | 36.55 | 43.02 | 37.15 | 35.08 | 37.96 | 39.94 | 36.17 | 39.09 |
7 | Gemma-3-12B-IT Mid-Size Video VLMs (10B–50B) | 12B | Frames | 38.19 | 29.90 | 45.36 | 34.12 | 43.37 | 35.26 | 33.21 | 36.09 | 33.81 | 34.20 | 37.63 |
8 | Qwen3-VL-4B-Thinking Compact Models (≤5B) | 4B | Video | 36.26 | 28.57 | 41.35 | 33.08 | 42.03 | 33.64 | 32.23 | 34.21 | 35.22 | 32.93 | 35.12 |
9 | Qwen3-VL-32B-Instruct Mid-Size Video VLMs (10B–50B) | 32B | Video | 35.99 | 29.54 | 40.29 | 32.48 | 41.67 | 33.52 | 31.97 | 34.00 | 38.05 | 33.22 | 33.88 |
10 | MiMo-VL-7B-RL-2508 Small Video/Vision Models (5B–10B) | 7B | Video | 35.12 | 26.35 | 41.58 | 31.13 | 41.42 | 32.10 | 30.10 | 32.77 | 31.33 | 31.33 | 33.79 |
11 | Gemma-3-4B-IT Compact Models (≤5B) | 4B | Frames | 34.38 | 25.87 | 40.11 | 30.55 | 40.97 | 31.16 | 30.10 | 31.91 | 34.34 | 30.14 | 33.34 |
12 | Qwen3-Omni-30B-A3B-Instruct Native Video + Audio | 30B (3B active) | V+A | 33.76 | 25.64 | 42.59 | 30.24 | 36.58 | 30.72 | 27.08 | 31.65 | 22.60 | 30.65 | 31.38 |
13 | Qwen2.5-VL-32B-Instruct Mid-Size Video VLMs (10B–50B) | 32B | Video | 31.36 | 26.16 | 35.23 | 28.55 | 35.51 | 29.50 | 27.04 | 29.96 | 31.39 | 28.57 | 31.53 |
14 | GLM-4.6V-Flash Mid-Size Video VLMs (10B–50B) | 9B | Video | 30.46 | 22.74 | 34.75 | 26.67 | 37.69 | 27.20 | 25.33 | 27.77 | 29.44 | 26.08 | 29.63 |
15 | Qwen3-VL-8B-Instruct Small Video/Vision Models (5B–10B) | 8B | Video | 30.01 | 24.38 | 33.17 | 25.02 | 37.46 | 26.64 | 25.30 | 27.05 | 32.80 | 26.01 | 27.86 |
16 | Ovis2.6-30B-A3B Mid-Size Video VLMs (10B–50B) | 30B (3B active) | Video | 29.99 | 22.00 | 33.19 | 26.92 | 37.85 | 27.37 | 25.37 | 27.91 | 32.80 | 26.86 | 28.18 |
17 | Qwen3.5-35B-A3B Mid-Size Video VLMs (10B–50B) | 35B (3B active) | Video | 26.86 | 17.65 | 32.59 | 22.13 | 35.07 | 22.60 | 20.24 | 23.23 | 23.78 | 22.00 | 23.91 |
18 | Qwen3.5-27B Mid-Size Video VLMs (10B–50B) | 27B | Video | 26.64 | 18.19 | 33.27 | 23.04 | 32.04 | 23.70 | 20.60 | 24.32 | 23.24 | 22.88 | 25.58 |
19 | Qwen2.5-VL-72B-Instruct Large Video VLMs (50B+) | 72B | Video | 26.34 | 20.54 | 30.09 | 22.80 | 31.91 | 24.15 | 21.39 | 24.60 | 24.66 | 23.24 | 26.17 |
20 | Qwen3-VL-30B-A3B-Instruct Mid-Size Video VLMs (10B–50B) | 30B (3B active) | Video | 25.82 | 21.53 | 27.91 | 22.46 | 31.37 | 23.87 | 23.15 | 24.26 | 29.50 | 23.32 | 24.81 |
21 | InternVL3-78B Large Video VLMs (50B+) | 78B | Video | 23.58 | 18.47 | 25.12 | 21.16 | 29.58 | 20.89 | 18.71 | 21.21 | 25.07 | 20.31 | 21.80 |
22 | InternVL3-38B Mid-Size Video VLMs (10B–50B) | 38B | Video | 23.48 | 19.29 | 26.79 | 20.07 | 27.75 | 21.14 | 18.92 | 21.49 | 26.08 | 20.65 | 22.13 |
23 | Qwen3-VL-4B-Instruct Compact Models (≤5B) | 4B | Video | 23.34 | 18.33 | 25.42 | 19.37 | 30.25 | 20.67 | 19.13 | 21.00 | 29.03 | 20.10 | 21.71 |
24 | Qwen3.5-9B Small Video/Vision Models (5B–10B) | 9B | Video | 23.30 | 14.63 | 29.24 | 19.88 | 29.47 | 19.05 | 17.28 | 19.55 | 21.18 | 18.53 | 20.15 |
25 | Ovis2.5-9B Small Video/Vision Models (5B–10B) | 9B | Video | 23.19 | 17.50 | 25.84 | 19.16 | 30.26 | 20.31 | 18.76 | 20.60 | 25.01 | 19.69 | 21.50 |
26 | Molmo2-8B Small Video/Vision Models (5B–10B) | 8B | Video | 23.10 | 17.52 | 24.72 | 18.78 | 31.38 | 20.12 | 18.72 | 20.32 | 28.61 | 19.77 | 20.43 |
27 | InternVL3-14B Mid-Size Video VLMs (10B–50B) | 14B | Video | 21.21 | 17.35 | 23.75 | 18.56 | 25.19 | 19.69 | 17.56 | 19.98 | 20.06 | 19.07 | 21.02 |
28 | InternVL3.5-8B Small Video/Vision Models (5B–10B) | 8B | Video | 21.10 | 15.32 | 21.49 | 17.57 | 30.02 | 17.97 | 16.62 | 18.17 | 21.00 | 17.48 | 18.90 |
29 | LongVT-RL Small Video/Vision Models (5B–10B) | 7B | Video | 20.91 | 16.44 | 21.41 | 18.08 | 27.73 | 18.81 | 18.22 | 19.03 | 21.83 | 18.31 | 19.73 |
30 | InternVL3.5-38B Mid-Size Video VLMs (10B–50B) | 38B | Video | 20.73 | 15.80 | 22.06 | 16.41 | 28.65 | 18.42 | 17.22 | 18.66 | 21.42 | 17.76 | 19.76 |
31 | InternVL2.5-78B Large Video VLMs (50B+) | 78B | Video | 20.38 | 16.40 | 21.97 | 16.30 | 26.87 | 17.45 | 16.17 | 17.74 | 20.06 | 16.92 | 18.52 |
32 | Ovis2-16B Mid-Size Video VLMs (10B–50B) | 16B | Frames | 19.84 | 15.00 | 22.43 | 15.09 | 26.86 | 17.44 | 16.22 | 17.75 | 19.94 | 17.07 | 18.17 |
33 | InternVL3-8B Small Video/Vision Models (5B–10B) | 8B | Video | 18.94 | 15.36 | 18.92 | 15.97 | 25.52 | 16.88 | 15.55 | 17.00 | 20.00 | 16.45 | 17.68 |
34 | InternVL2.5-38B Mid-Size Video VLMs (10B–50B) | 38B | Video | 18.86 | 15.96 | 20.31 | 14.13 | 25.02 | 16.87 | 15.97 | 17.17 | 17.29 | 16.25 | 18.23 |
35 | Qwen2.5-VL-7B-Instruct Small Video/Vision Models (5B–10B) | 7B | Video | 18.39 | 15.26 | 20.44 | 17.00 | 20.84 | 17.75 | 16.53 | 18.04 | 20.47 | 17.32 | 18.60 |
36 | Kimi-VL-A3B-Thinking Mid-Size Video VLMs (10B–50B) | 8B (3B active) | Frames | 17.89 | 11.52 | 20.94 | 14.48 | 24.61 | 14.85 | 13.37 | 15.20 | 15.28 | 14.66 | 15.11 |
37 | Qwen3.5-4B Compact Models (≤5B) | 4B | Video | 17.78 | 11.49 | 23.02 | 15.40 | 21.20 | 15.49 | 14.39 | 15.85 | 16.64 | 15.50 | 15.23 |
38 | InternVL3.5-4B-Instruct Compact Models (≤5B) | 4B | Video | 17.11 | 13.84 | 18.02 | 13.39 | 23.20 | 15.33 | 13.75 | 15.57 | 13.39 | 14.90 | 16.43 |
39 | Keye-VL-1.5-8B Small Video/Vision Models (5B–10B) | 8B | Video | 17.09 | 12.81 | 19.26 | 14.33 | 21.94 | 15.12 | 14.56 | 15.37 | 18.29 | 14.55 | 16.24 |
40 | MiniCPM-o 4.5 Native Video + Audio | 8B | Video | 17.05 | 15.08 | 20.05 | 16.92 | 16.13 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
41 | MiniCPM-V 4.5 Small Video/Vision Models (5B–10B) | 8B | Video | 16.29 | 12.54 | 17.84 | 12.73 | 22.06 | 14.01 | 13.87 | 14.29 | 14.93 | 13.53 | 15.13 |
42 | Qwen2.5-Omni-7B Native Video + Audio | 7B | V+A | 15.89 | 12.49 | 16.92 | 14.36 | 19.81 | 14.75 | 12.97 | 15.08 | 17.17 | 14.73 | 14.81 |
43 | Ovis2-8B Small Video/Vision Models (5B–10B) | 8B | Frames | 15.16 | 11.61 | 16.26 | 12.04 | 20.74 | 12.92 | 12.51 | 13.10 | 12.80 | 12.30 | 14.36 |
44 | MiniCPM-o 2.6 Native Video + Audio | 8B | Video | 14.41 | 11.27 | 14.36 | 11.78 | 20.24 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
45 | Ovis2-4B Compact Models (≤5B) | 4B | Frames | 13.07 | 9.82 | 13.47 | 10.18 | 18.83 | 11.04 | 10.82 | 11.09 | 10.50 | 10.34 | 12.63 |
46 | Kimi-VL-A3B-Instruct Mid-Size Video VLMs (10B–50B) | 8B (3B active) | Frames | 11.35 | 8.12 | 11.23 | 8.12 | 17.92 | 9.37 | 9.15 | 9.56 | 8.85 | 9.23 | 9.65 |
47 | LLaVA-OneVision-7B Small Video/Vision Models (5B–10B) | 7B | Video | 9.14 | 7.20 | 8.86 | 7.58 | 12.92 | 7.91 | 7.89 | 8.06 | 9.91 | 7.85 | 7.95 |
48 | Qwen2-Audio-7B-Instruct Audio-Only LLMs | 7B | Audio | 8.07 | 6.04 | 7.61 | 6.16 | 12.47 | 7.00 | 6.88 | 7.14 | 6.19 | 6.64 | 7.92 |
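
The table is sorted by Overall, but a different ordering can be more useful when a single column matters, such as Short or RE. The following is a minimal sketch of how the pipe-delimited rows could be parsed and re-ranked by any score column. It assumes each row holds a rank, a model cell, a parameter count, and an input type, followed by the eleven scores in the column order shown in the header. The names `SCORE_COLUMNS`, `Row`, `parse_row`, and `rank_by` are hypothetical helpers introduced for illustration and are not part of any LongShOTBench tooling. The three sample rows are copied from the table.

```python
from typing import Dict, List, NamedTuple

# Assumed score column order, taken from the table header.
SCORE_COLUMNS = ["Overall", "CP", "RE", "Info", "MM",
                 "Visual", "Audio", "Speech", "Short", "Med", "Long"]


class Row(NamedTuple):
    rank: int
    model: str          # model name followed by its group label
    params: str
    input: str
    scores: Dict[str, float]


def parse_row(line: str) -> Row:
    """Parse one pipe-delimited leaderboard row (hypothetical helper)."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    if len(cells) != 4 + len(SCORE_COLUMNS):
        raise ValueError(f"expected {4 + len(SCORE_COLUMNS)} cells, got {len(cells)}")
    rank, model, params, input_kind = cells[:4]
    scores = dict(zip(SCORE_COLUMNS, (float(c) for c in cells[4:])))
    return Row(int(rank), model, params, input_kind, scores)


def rank_by(rows: List[Row], column: str) -> List[Row]:
    """Return rows sorted by one score column, highest first."""
    return sorted(rows, key=lambda r: r.scores[column], reverse=True)


# Rows 4 to 6 of the table, copied verbatim.
SAMPLE_ROWS = [
    "4 | Qwen3-VL-8B-Thinking Small Video/Vision Models (5B–10B) | 8B | Video | 41.68 | 32.90 | 47.74 | 38.70 | 47.39 | 38.70 | 37.37 | 39.45 | 41.47 | 37.88 | 40.46",
    "5 | Intern-S1-mini Mid-Size Video VLMs (10B–50B) | 14B | Video | 39.85 | 31.74 | 45.52 | 36.89 | 45.26 | 37.24 | 35.37 | 37.97 | 36.76 | 36.30 | 39.45",
    "6 | Gemma-3-27B-IT Mid-Size Video VLMs (10B–50B) | 27B | Frames | 39.65 | 30.78 | 48.26 | 36.55 | 43.02 | 37.15 | 35.08 | 37.96 | 39.94 | 36.17 | 39.09",
]

rows = [parse_row(line) for line in SAMPLE_ROWS]

# Re-rank the sample by the Short-duration score instead of Overall.
for row in rank_by(rows, "Short"):
    print(row.rank, row.model, row.scores["Short"])
```

On these three rows, ranking by Short swaps the Overall order of ranks 5 and 6, since Gemma-3-27B-IT scores 39.94 on Short against 36.76 for Intern-S1-mini. A row with a missing rank or an extra empty cell raises `ValueError` in this sketch rather than being silently repaired.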