Open-source LVLMs
We benchmark 15 commonly used open-source LVLMs with a single-turn perplexity (PPL) inferencer, which confines each model's output to the given answer options and computes the probability of each option.
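For concreteness, below is a minimal sketch of how a single-turn PPL inferencer scores answer options. It assumes a Hugging Face causal language model; the model name, prompt, and option strings are illustrative placeholders rather than the benchmark's actual setup, and the image-conditioning step of a real LVLM is omitted.

```python
# Minimal sketch of a single-turn PPL inferencer (assumptions: a Hugging Face
# causal LM; the model name, prompt, and options below are illustrative only,
# and the image-conditioning step of a real LVLM is omitted).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder stand-in for an LVLM's language backbone
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def option_log_likelihood(prompt: str, option: str) -> float:
    """Sum of log-probabilities of `option`'s tokens conditioned on `prompt`."""
    # Assumes tokenizing prompt + option splits cleanly at the boundary.
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + option, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Log-probability assigned to each next token, aligned with its position.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    token_lls = log_probs.gather(-1, full_ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    # Keep only the positions belonging to the option continuation.
    return token_lls[0, prompt_len - 1:].sum().item()

# Hypothetical bias probe: the prediction is the most likely option.
prompt = "Question: Who is more likely to be a nurse?\nAnswer: "
options = ["the man", "the woman", "cannot be determined"]
scores = {o: option_log_likelihood(prompt, o) for o in options}
prediction = max(scores, key=scores.get)
```

Accuracy-style metrics such as $Acc$ would then be aggregated over such argmax predictions.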
Results (in %) under multimodal bias (VL-Bias) evaluation
| Model | $I_{pss}$ | $B_{ovl}$ | $B_{max}$ | $Acc$ | $\Delta Acc$ |
|---|---|---|---|---|---|
| LLaVA1.5-7B | 51.58 | 1.85 | 15.94 | 52.15 | 95.66 | 
| LLaVA1.5-13B | 57.92 | 2.91 | 18.60 | 59.08 | 81.08 | 
| LLaVA1.6-13B | 65.64 | 3.29 | 21.06 | 67.70 | 59.77 | 
| MiniGPT-v2 | 58.20 | 2.72 | 16.48 | 59.74 | 73.02 | 
| mPLUG-Owl2 | 72.59 | 6.48 | 34.02 | 77.56 | 8.84 | 
| LLaMA-Adapter-v2 | 55.31 | 0.60 | 7.38 | 55.67 | 86.16 | 
| InstructBLIP | 74.26 | 4.10 | 19.94 | 77.52 | 14.05 | 
| Otter | 62.68 | 1.82 | 9.25 | 63.96 | 59.11 | 
| LAMM | 54.51 | 1.63 | 10.09 | 55.24 | 85.69 | 
| Kosmos-2 | 48.96 | 0.22 | 0.93 | 49.58 | 70.66 | 
| Qwen-VL | 71.29 | 4.07 | 30.14 | 74.27 | 23.27 | 
| InternLM-XC2 | 72.93 | 6.30 | 37.32 | 77.77 | 9.45 | 
| Shikra | 61.08 | 3.40 | 21.56 | 63.44 | 54.48 | 
| LLaVA-RLHF | 61.05 | 4.15 | 27.57 | 63.04 | 71.24 | 
| RLHF-V | 67.16 | 6.96 | 27.69 | 72.34 | 15.09 | 
Results (in %) under visual unimodal bias (V-Bias) evaluation
| Model | $I_{pss}$ | $B_{ovl}$ | $B_{max}$ | $Acc$ | $\Delta Acc$ |
|---|---|---|---|---|---|
| LLaVA1.5-7B | 51.67 | 1.60 | 11.34 | 52.17 | 95.62 | 
| LLaVA1.5-13B | 58.85 | 2.55 | 14.44 | 59.90 | 79.27 | 
| LLaVA1.6-13B | 66.65 | 3.36 | 17.55 | 68.79 | 56.72 | 
| MiniGPT-v2 | 55.30 | 1.58 | 7.43 | 56.14 | 83.97 | 
| mPLUG-Owl2 | 73.26 | 5.77 | 31.50 | 77.68 | 9.07 | 
| LLaMA-Adapter-v2 | 55.16 | 0.42 | 6.78 | 55.40 | 86.39 | 
| InstructBLIP | 75.06 | 3.23 | 18.02 | 77.61 | 13.60 | 
| Otter | 62.54 | 1.48 | 8.46 | 63.56 | 60.38 | 
| LAMM | 57.54 | 0.62 | 4.33 | 57.94 | 77.85 | 
| Kosmos-2 | 48.95 | 0.21 | 0.95 | 49.53 | 72.69 | 
| Qwen-VL | 71.07 | 4.54 | 29.88 | 74.36 | 23.99 | 
| InternLM-XC2 | 72.53 | 7.24 | 37.80 | 78.05 | 8.09 | 
| Shikra | 60.23 | 2.10 | 14.40 | 61.66 | 63.15 | 
| LLaVA-RLHF | 62.50 | 3.01 | 14.36 | 64.00 | 68.89 | 
| RLHF-V | 63.83 | 10.46 | 33.05 | 71.30 | 19.02 | 
Results (in %) under language unimodal bias (L-Bias) evaluation
| Model | $I_{pss}$ | $B_{ovl}$ | $B_{max}$ | $Acc$ | $\Delta Acc$ |
|---|---|---|---|---|---|
| LLaVA1.5-7B | 50.86 | 1.25 | 12.08 | 51.27 | 97.43 | 
| LLaVA1.5-13B | 55.86 | 1.65 | 14.60 | 56.41 | 86.85 | 
| LLaVA1.6-13B | 62.52 | 2.37 | 17.35 | 63.93 | 69.94 | 
| MiniGPT-v2 | 54.84 | 2.05 | 13.48 | 55.95 | 84.63 | 
| mPLUG-Owl2 | 70.37 | 4.75 | 22.58 | 73.92 | 11.45 | 
| LLaMA-Adapter-v2 | 51.72 | 0.34 | 2.22 | 51.91 | 95.45 | 
| InstructBLIP | 71.83 | 3.41 | 16.94 | 74.42 | 19.54 | 
| Otter | 59.71 | 0.93 | 4.65 | 60.36 | 68.99 | 
| LAMM | 56.13 | 0.91 | 3.72 | 56.67 | 80.50 | 
| Kosmos-2 | 49.94 | 0.03 | 0.14 | 49.99 | 74.55 | 
| Qwen-VL | 70.18 | 2.96 | 19.94 | 72.35 | 18.48 | 
| InternLM-XC2 | 71.83 | 5.38 | 37.23 | 75.80 | 9.23 | 
| Shikra | 59.69 | 3.25 | 13.86 | 61.80 | 56.65 | 
| LLaVA-RLHF | 59.70 | 3.61 | 34.59 | 61.34 | 75.23 | 
| RLHF-V | 64.08 | 7.36 | 33.69 | 69.25 | 27.68 |