Rows are grouped by score, highest first; within each score they are sorted by size on disk, smallest first.
Highlighted rows are on the Pareto frontier of score versus size on disk.
Score | Model | Parameters | Size (GB) | Loader | Additional info |
---|---|---|---|---|---|
38/48 | Meta-Llama-3.1-70B-Instruct-Q4_K_M | 70B | 42.5 | llamacpp_HF | [link] |
38/48 | Mistral-Large-Instruct-2407.i1-Q4_K_S | 123B | 69.5 | llamacpp_HF | [link] |
37/48 | Meta-Llama-3.1-70B-Instruct-Q3_K_M | 70B | 34.3 | llamacpp_HF | [link] |
37/48 | Meta-Llama-3.1-70B-Instruct-Q3_K_L | 70B | 37.1 | llamacpp_HF | [link] |
37/48 | Meta-Llama-3.1-70B-Instruct-Q3_K_XL | 70B | 38.1 | llamacpp_HF | [link] |
37/48 | Meta-Llama-3.1-70B-Instruct-Q4_K_L | 70B | 43.3 | llamacpp_HF | [link] |
37/48 | Qwen2.5-72B-Instruct-Q4_K_M | 72B | 47.42 | llamacpp_HF | |
37/48 | Mistral-Large-Instruct-2407.i1-IQ4_XS | 123B | 65.4 | llamacpp_HF | [link] |
37/48 | Mistral-Large-Instruct-2407.i1-Q6_K | 123B | 100.7 | llamacpp_HF | [link] |
36/48 | Meta-Llama-3.1-70B-Instruct-Q3_K_S | 70B | 30.9 | llamacpp_HF | [link] |
36/48 | Meta-Llama-3.1-70B-Instruct-Q4_K_S | 70B | 40.3 | llamacpp_HF | [link] |
36/48 | turboderp_Llama-3.1-70B-Instruct-exl2_6.0bpw | 70B | 54.12 | ExLlamav2_HF | |
35/48 | Meta-Llama-3.1-70B-Instruct-IQ3_XS | 70B | 29.3 | llamacpp_HF | [link] |
35/48 | Meta-Llama-3.1-70B-Instruct-IQ4_XS | 70B | 37.9 | llamacpp_HF | [link] |
35/48 | hugging-quants_Meta-Llama-3.1-70B-Instruct-AWQ-INT4 | 70B | 39.77 | Transformers | |
35/48 | turboderp_Llama-3.1-70B-Instruct-exl2_4.5bpw | 70B | 41.41 | ExLlamav2_HF | |
35/48 | Meta-Llama-3.1-70B-Instruct-Q5_K_S | 70B | 48.7 | llamacpp_HF | [link] |
35/48 | RYS-XLarge-Q4_K_M | 72B | 50.7 | llamacpp_HF | [link] |
35/48 | Meta-Llama-3.1-70B-Instruct-Q6_K | 70B | 57.9 | llamacpp_HF | [link] |
35/48 | Meta-Llama-3.1-70B-Instruct-Q6_K_L | 70B | 58.4 | llamacpp_HF | [link] |
35/48 | Mistral-Large-Instruct-2411-IQ4_XS | 123B | 65.4 | llamacpp_HF | [link] |
35/48 | turboderp_Mistral-Large-Instruct-2407-123B-exl2_4.25bpw | 123B | 65.83 | ExLlamav2_HF | |
34/48 | Meta-Llama-3.1-70B-Instruct-Q2_K | 70B | 26.4 | llamacpp_HF | [link] |
34/48 | Meta-Llama-3.1-70B-Instruct-Q2_K_L | 70B | 27.4 | llamacpp_HF | [link] |
34/48 | Meta-Llama-3.1-70B-Instruct-IQ3_M | 70B | 31.9 | llamacpp_HF | [link] |
34/48 | Qwen2.5-32B-Instruct-Q8_0 | 32B | 34.82 | llamacpp_HF | |
34/48 | platypus-yi-34b.Q8_0 | 34B | 36.5 | llamacpp_HF | |
34/48 | Meta-Llama-3-70B-Instruct-Q4_K_S | 70B | 40.3 | llamacpp_HF | [link] |
34/48 | Llama-3.1-Nemotron-70B-Instruct-HF-Q4_K_M | 70B | 42.52 | llamacpp_HF | |
34/48 | Meta-Llama-3.1-70B-Instruct-Q5_K_M | 70B | 50.0 | llamacpp_HF | [link] |
34/48 | Meta-Llama-3.1-70B-Instruct-Q5_K_L | 70B | 50.6 | llamacpp_HF | [link] |
34/48 | LoneStriker_OpenBioLLM-Llama3-70B-6.0bpw-h6-exl2 | 70B | 54.27 | ExLlamav2_HF | |
33/48 | gemma-2-27b-it-Q4_K_S | 27B | 15.74 | llamacpp_HF | [link] |
33/48 | gemma-2-27b-it-Q4_K_M | 27B | 16.65 | llamacpp_HF | [link] |
33/48 | gemma-2-27b-it-Q5_K_S | 27B | 18.88 | llamacpp_HF | [link] |
33/48 | gemma-2-27b-it-Q5_K_M | 27B | 19.41 | llamacpp_HF | [link] |
33/48 | microsoft_Phi-3.5-MoE-instruct | 16x3.8B | 20.94 | Transformers | --load-in-4bit |
33/48 | ISTA-DASLab_Meta-Llama-3-70B-Instruct-AQLM-2Bit-1x16 | 70B | 21.92 | Transformers | |
33/48 | gemma-2-27b-it-Q6_K | 27B | 22.34 | llamacpp_HF | [link] |
33/48 | gemma-2-27b-it-Q6_K_L | 27B | 22.63 | llamacpp_HF | [link] |
33/48 | Meta-Llama-3-70B-Instruct-IQ3_XXS | 70B | 27.5 | llamacpp_HF | [link] |
33/48 | gemma-2-27b-it-Q8_0 | 27B | 28.94 | llamacpp_HF | [link] |
33/48 | Meta-Llama-3-70B-Instruct-IQ3_XS | 70B | 29.3 | llamacpp_HF | [link] |
33/48 | Meta-Llama-3-70B-Instruct-Q3_K_S | 70B | 30.9 | llamacpp_HF | [link] |
33/48 | 01-ai_Yi-1.5-34B-Chat | 34B | 34.39 | Transformers | --load-in-8bit |
33/48 | 01-ai_Yi-1.5-34B-Chat-16K | 34B | 34.39 | Transformers | --load-in-8bit |
33/48 | Undi95_Meta-Llama-3-70B-Instruct-hf | 70B | 35.28 | Transformers | --load-in-4bit |
33/48 | Mistral-Large-Instruct-2407-IQ2_XS | 123B | 36.1 | llamacpp_HF | [link] |
33/48 | dolphin-2.9.3-Yi-1.5-34B-32k-Q8_0 | 34B | 36.5 | llamacpp_HF | [link] |
33/48 | Llama3-TenyxChat-70B.i1-Q4_K_S | 70B | 40.3 | llamacpp_HF | |
33/48 | Smaug-Llama-3-70B-Instruct.i1-Q4_K_S | 70B | 40.3 | llamacpp_HF | |
33/48 | Meta-Llama-3-70B-Instruct.Q4_K_M | 70B | 42.5 | llamacpp_HF | |
33/48 | turboderp_Cat-Llama-3-70B-instruct-exl2_5.0bpw | 70B | 45.7 | ExLlamav2_HF | |
33/48 | turboderp_Llama-3-70B-Instruct-exl2_5.0bpw | 70B | 45.72 | ExLlamav2_HF | |
33/48 | magnum-72b-v1-Q4_K_M | 72B | 47.4 | llamacpp_HF | [link] |
33/48 | turboderp_Llama-3-70B-Instruct-exl2_6.0bpw | 70B | 54.17 | ExLlamav2_HF | |
33/48 | LoneStriker_dolphin-2.9-llama3-70b-6.0bpw-h6-exl2 | 70B | 54.27 | ExLlamav2_HF | |
33/48 | google_gemma-2-27b-it | 27B | 54.45 | Transformers | --bf16 --use_eager_attention |
33/48 | Meta-Llama-3-70B-Instruct.Q8_0 | 70B | 75.0 | llamacpp_HF | |
33/48 | Meta-Llama-3.1-405B-Instruct.i1-IQ2_XXS | 405B | 109.0 | llamacpp_HF | [link] |
32/48 | Phi-3-medium-128k-instruct-Q4_K_S | 14B | 7.95 | llamacpp_HF | [link] |
32/48 | Phi-3-medium-128k-instruct-Q5_K_S | 14B | 9.62 | llamacpp_HF | [link] |
32/48 | Phi-3-medium-128k-instruct-Q6_K | 14B | 11.45 | llamacpp_HF | [link] |
32/48 | gemma-2-27b-it-IQ4_XS | 27B | 14.81 | llamacpp_HF | [link] |
32/48 | gemma-2-27b-it-Q3_K_XL | 27B | 14.81 | llamacpp_HF | [link] |
32/48 | Phi-3-medium-128k-instruct-Q8_0 | 14B | 14.83 | llamacpp_HF | [link] |
32/48 | gemma-2-27b-it-Q4_K_L | 27B | 16.93 | llamacpp_HF | [link] |
32/48 | Meta-Llama-3-70B-Instruct-IQ2_S | 70B | 22.2 | llamacpp_HF | [link] |
32/48 | Meta-Llama-3-70B-Instruct-IQ2_M | 70B | 24.1 | llamacpp_HF | [link] |
32/48 | turboderp_gemma-2-27b-it-exl2_8.0bpw | 27B | 26.7 | ExLlamav2_HF | --no_flash_attn --no_xformers --no_sdpa |
32/48 | Meta-Llama-3-70B-Instruct-IQ3_S | 70B | 30.9 | llamacpp_HF | [link] |
32/48 | Meta-Llama-3-70B-Instruct-Q3_K_M | 70B | 34.3 | llamacpp_HF | [link] |
32/48 | Meta-Llama-3-70B-Instruct-Q3_K_L | 70B | 37.1 | llamacpp_HF | [link] |
32/48 | turboderp_Llama-3.1-70B-Instruct-exl2_4.0bpw | 70B | 37.15 | ExLlamav2_HF | |
32/48 | Meta-Llama-3-70B-Instruct-IQ4_XS | 70B | 37.9 | llamacpp_HF | [link] |
32/48 | Meta-Llama-3-70B-Instruct-IQ4_NL | 70B | 40.1 | llamacpp_HF | [link] |
32/48 | Llama-3-Giraffe-70B-Instruct.i1-Q4_K_S | 70B | 40.3 | llamacpp_HF | [link] |
32/48 | Athene-70B.i1-Q4_K_M | 70B | 42.5 | llamacpp_HF | [link] |
32/48 | Qwen2-72B-Instruct-Q4_K_M | 72B | 47.4 | llamacpp_HF | [link] |
32/48 | dnhkng_RYS-Gemma-2-27b-it | 27B | 58.98 | Transformers | |
31/48 | Phi-3-medium-4k-instruct-IQ3_XXS | 14B | 5.45 | llamacpp_HF | [link] |
31/48 | Phi-3-medium-128k-instruct-IQ3_M | 14B | 6.47 | llamacpp_HF | [link] |
31/48 | Phi-3-medium-128k-instruct-IQ4_XS | 14B | 7.47 | llamacpp_HF | [link] |
31/48 | Phi-3-medium-128k-instruct-IQ4_NL | 14B | 7.90 | llamacpp_HF | [link] |
31/48 | Phi-3-medium-128k-instruct-Q4_K_M | 14B | 8.57 | llamacpp_HF | [link] |
31/48 | Phi-3-medium-128k-instruct-Q5_K_M | 14B | 10.07 | llamacpp_HF | [link] |
31/48 | gemma-2-27b-it-Q3_K_S | 27B | 12.17 | llamacpp_HF | [link] |
31/48 | gemma-2-27b-it-IQ3_M | 27B | 12.45 | llamacpp_HF | [link] |
31/48 | gemma-2-27b-it-Q3_K_L | 27B | 14.52 | llamacpp_HF | [link] |
31/48 | gemma-2-27b-it-Q5_K_L | 27B | 19.69 | llamacpp_HF | [link] |
31/48 | Meta-Llama-3-70B-Instruct-IQ2_XS | 70B | 21.1 | llamacpp_HF | [link] |
31/48 | Meta-Llama-3-70B-Instruct-Q2_K | 70B | 26.4 | llamacpp_HF | [link] |
31/48 | microsoft_Phi-3-medium-128k-instruct | 14B | 27.92 | Transformers | |
31/48 | Meta-Llama-3-70B-Instruct-IQ3_M | 70B | 31.9 | llamacpp_HF | [link] |
31/48 | miqu-1-70b.q4_k_m | 70B | 41.4 | llamacpp_HF | |
31/48 | Reflection-Llama-3.1-70B-Q4_K_M | 70B | 42.52 | llamacpp_HF | |
31/48 | miqu-1-70b.q5_K_M | 70B | 48.8 | llamacpp_HF | |
31/48 | oobabooga_miqu-1-70b-sf-EXL2-6.000b | 70B | 52.1 | ExLlamav2_HF | |
31/48 | cloudyu_Phoenix_DPO_60B | 60B | 60.81 | Transformers | --load-in-8bit |
31/48 | Meta-Llama-3-120B-Instruct.Q4_K_M | 120B | 73.22 | llamacpp_HF | [link] |
30/48 | Phi-3-medium-128k-instruct-IQ3_XXS | 14B | 5.45 | llamacpp_HF | [link] |
30/48 | Phi-3-medium-128k-instruct-IQ3_S | 14B | 6.06 | llamacpp_HF | [link] |
30/48 | Phi-3-medium-128k-instruct-Q3_K_L | 14B | 7.49 | llamacpp_HF | [link] |
30/48 | gemma-2-27b-it-Q3_K_M | 27B | 13.42 | llamacpp_HF | [link] |
30/48 | google_gemma-2-27b-it | 27B | 13.61 | Transformers | --load-in-4bit --bf16 --use_eager_attention |
30/48 | Qwen_Qwen2.5-7B-Instruct | 7B | 15.23 | Transformers | |
30/48 | Meta-Llama-3.1-70B-Instruct-IQ2_M | 70B | 24.1 | llamacpp_HF | [link] |
30/48 | Meta-Llama-3-70B-Instruct-abliterated-v3.5.i1-Q4_K_S | 70B | 40.3 | llamacpp_HF | [link] |
30/48 | Senku-70B-Full-Q4_K_M | 70B | 41.4 | llamacpp_HF | |
30/48 | microsoft_Phi-3.5-MoE-instruct | 16x3.8B | 41.87 | Transformers | --load-in-8bit |
30/48 | Llama3-70B-ShiningValiant2.i1-Q4_K_M | 70B | 42.5 | llamacpp_HF | [link] |
30/48 | Rhea-72b-v0.5-Q4_K_M | 72B | 43.8 | llamacpp_HF | |
30/48 | qwen1_5-72b-chat-q4_k_m | 72B | 44.2 | llamacpp_HF | |
30/48 | Phi-3-medium-128k-instruct-f32 | 14B | 55.80 | llamacpp_HF | [link] |
30/48 | Dracones_WizardLM-2-8x22B_exl2_4.0bpw | 8x22B | 70.67 | ExLlamav2_HF | |
30/48 | turboderp_Mixtral-8x22B-Instruct-v0.1-exl2_4.0bpw | 8x22B | 70.68 | ExLlamav2_HF | |
30/48 | miquliz-120b-v2.0.Q4_K_M | 120B | 72.1 | llamacpp_HF | |
30/48 | falcon-180b-chat.Q4_K_M | 180B | 108.6 | llamacpp_HF | |
29/48 | Phi-3-medium-128k-instruct-IQ3_XS | 14B | 5.81 | llamacpp_HF | [link] |
29/48 | gemma-2-9b-it-Q5_K_S | 9B | 6.48 | llamacpp_HF | [link] |
29/48 | Phi-3-medium-128k-instruct-Q3_K_M | 14B | 6.92 | llamacpp_HF | [link] |
29/48 | Phi-3-medium-4k-instruct-Q4_K_S | 14B | 7.95 | llamacpp_HF | [link] |
29/48 | Phi-3-medium-4k-instruct-Q4_K_M | 14B | 8.57 | llamacpp_HF | [link] |
29/48 | Phi-3-medium-4k-instruct-Q5_K_S | 14B | 9.62 | llamacpp_HF | [link] |
29/48 | Phi-3-medium-4k-instruct-Q5_K_M | 14B | 10.07 | llamacpp_HF | [link] |
29/48 | Phi-3-medium-4k-instruct-Q6_K | 14B | 11.45 | llamacpp_HF | [link] |
29/48 | gemma-2-27b-it-IQ3_XS | 27B | 11.55 | llamacpp_HF | [link] |
29/48 | Phi-3-medium-4k-instruct-Q8_0 | 14B | 14.83 | llamacpp_HF | [link] |
29/48 | google_gemma-2-9b-it | 9B | 18.48 | Transformers | --bf16 --use_eager_attention |
29/48 | UCLA-AGI_Gemma-2-9B-It-SPPO-Iter3 | 9B | 18.48 | Transformers | --bf16 --use_eager_attention |
29/48 | bartowski_Qwen1.5-32B-Chat-exl2_5_0 | 32B | 21.52 | ExLlamav2_HF | |
29/48 | turboderp_Llama-3.1-70B-Instruct-exl2_2.5bpw | 70B | 24.32 | ExLlamav2_HF | |
29/48 | microsoft_Phi-3-medium-4k-instruct | 14B | 27.92 | Transformers | |
29/48 | Mistral-Large-Instruct-2407-IQ2_XXS | 123B | 32.4 | llamacpp_HF | [link] |
29/48 | LoneStriker_Yi-34B-Chat-8.0bpw-h8-exl2 | 34B | 34.87 | ExLlamav2_HF | |
29/48 | 34b-beta.Q8_0 | 34B | 36.5 | llamacpp_HF | |
29/48 | Dracones_Llama-3-Lumimaid-70B-v0.1_exl2_4.5bpw | 70B | 41.43 | ExLlamav2_HF | |
29/48 | Llama3-ChatQA-1.5-70B.Q4_K_M | 70B | 42.5 | llamacpp_HF | Alpaca template. |
29/48 | turboderp_Llama-3-70B-exl2_5.0bpw | 70B | 45.7 | ExLlamav2_HF | |
29/48 | turboderp_command-r-plus-103B-exl2_3.0bpw | 104B | 47.72 | ExLlamav2_HF | |
29/48 | daybreak-miqu-1-70b-v1.0-q5_k_m | 70B | 48.8 | llamacpp_HF | |
29/48 | Phi-3-medium-4k-instruct-f32 | 14B | 55.80 | llamacpp_HF | [link] |
29/48 | c4ai-command-r-plus.i1-IQ4_XS | 104B | 56.2 | llamacpp_HF | [link] |
29/48 | Qwen1.5-110B-Chat-Q4_K_M | 110B | 67.2 | llamacpp_HF | [link] |
28/48 | gemma-2-9b-it-IQ3_M | 9B | 4.49 | llamacpp_HF | [link] |
28/48 | gemma-2-9b-it-Q3_K_M | 9B | 4.76 | llamacpp_HF | [link] |
28/48 | gemma-2-9b-it-Q3_K_L | 9B | 5.13 | llamacpp_HF | [link] |
28/48 | gemma-2-9b-it-Q3_K_L-Q8 | 9B | 5.35 | llamacpp_HF | [link] |
28/48 | gemma-2-9b-it-Q3_K_XL | 9B | 5.35 | llamacpp_HF | [link] |
28/48 | gemma-2-9b-it-Q4_K_S | 9B | 5.48 | llamacpp_HF | [link] |
28/48 | gemma-2-9b-it-Q4_K_M | 9B | 5.76 | llamacpp_HF | [link] |
28/48 | gemma-2-9b-it-Q4_K_L | 9B | 5.98 | llamacpp_HF | [link] |
28/48 | Phi-3-medium-128k-instruct-Q3_K_S | 14B | 6.06 | llamacpp_HF | [link] |
28/48 | gemma-2-9b-it-Q5_K_M | 9B | 6.65 | llamacpp_HF | [link] |
28/48 | gemma-2-9b-it-Q4_K_M-fp16 | 9B | 6.84 | llamacpp_HF | [link] |
28/48 | gemma-2-9b-it-Q5_K_L | 9B | 6.87 | llamacpp_HF | [link] |
28/48 | Phi-3-medium-4k-instruct-Q3_K_M | 14B | 6.92 | llamacpp_HF | [link] |
28/48 | gemma-2-27b-it-IQ2_M | 27B | 9.40 | llamacpp_HF | [link] |
28/48 | gemma-2-9b-it-Q8_0 | 9B | 9.83 | llamacpp_HF | [link] |
28/48 | gemma-2-9b-it-Q8_0-f16 | 9B | 10.69 | llamacpp_HF | [link] |
28/48 | gemma-2-9b-it-Q8_0_L | 9B | 10.69 | llamacpp_HF | [link] |
28/48 | CohereForAI_aya-expanse-32b | 32B | 32.30 | Transformers | --load-in-8bit |
28/48 | gemma-2-9b-it-f32 | 9B | 36.97 | llamacpp_HF | [link] |
28/48 | dolphin-2.7-mixtral-8x7b.Q8_0 | 8x7B | 49.6 | llamacpp_HF | |
28/48 | turboderp_command-r-plus-103B-exl2_3.5bpw | 104B | 54.2 | ExLlamav2_HF | |
28/48 | LoneStriker_Smaug-72B-v0.1-6.0bpw-h6-exl2 | 72B | 55.82 | ExLlamav2_HF | |
28/48 | command-r-plus-Q4_K_M | 104B | 62.81 | llamacpp_HF | |
28/48 | turboderp_command-r-plus-103B-exl2_4.5bpw | 104B | 67.18 | ExLlamav2_HF | |
27/48 | gemma-2-9b-it-Q3_K_S | 9B | 4.34 | llamacpp_HF | [link] |
27/48 | gemma-2-9b-it-IQ4_XS | 9B | 5.18 | llamacpp_HF | [link] |
27/48 | Phi-3-medium-4k-instruct-IQ3_XS | 14B | 5.81 | llamacpp_HF | [link] |
27/48 | Phi-3-medium-4k-instruct-IQ3_S | 14B | 6.06 | llamacpp_HF | [link] |
27/48 | Phi-3-medium-4k-instruct-Q3_K_L | 14B | 7.49 | llamacpp_HF | [link] |
27/48 | gemma-2-9b-it-Q6_K | 9B | 7.59 | llamacpp_HF | [link] |
27/48 | gemma-2-9b-it-Q6_K_L | 9B | 7.81 | llamacpp_HF | [link] |
27/48 | gemma-2-9b-it-Q6_K-Q8 | 9B | 7.81 | llamacpp_HF | [link] |
27/48 | Phi-3-medium-4k-instruct-IQ4_NL | 14B | 7.90 | llamacpp_HF | [link] |
27/48 | google_gemma-2-9b-it | 9B | 9.24 | Transformers | --load-in-8bit --bf16 --use_eager_attention |
27/48 | gemma-2-9b-it-Q6_K-f32 | 9B | 10.51 | llamacpp_HF | [link] |
27/48 | Qwen_Qwen2-7B-Instruct | 7B | 15.23 | Transformers | |
27/48 | 01-ai_Yi-1.5-9B-Chat | 9B | 17.66 | Transformers | |
27/48 | Meta-Llama-3.1-70B-Instruct-IQ2_S | 70B | 22.2 | llamacpp_HF | [link] |
27/48 | miqu-1-70b.q2_K | 70B | 25.5 | llamacpp_HF | |
27/48 | ISTA-DASLab_c4ai-command-r-plus-AQLM-2Bit-1x16 | 104B | 31.94 | Transformers | |
27/48 | Platypus2-70B.i1-Q4_K_M | 70B | 41.4 | llamacpp_HF | [link] |
27/48 | Midnight-Miqu-70B-v1.0.Q4_K_M | 70B | 41.7 | llamacpp_HF | |
27/48 | Llama3-ChatQA-1.5-70B.Q4_K_M | 70B | 42.5 | llamacpp_HF | NVIDIA-ChatQA template. |
27/48 | mixtral-8x7b-instruct-v0.1.Q8_0 | 8x7B | 49.6 | llamacpp_HF | |
27/48 | TheBloke_Helion-4x34B-GPTQ | 4x34B | 58.31 | ExLlamav2_HF | |
26/48 | Phi-3-mini-4k-instruct-Q5_K_S | 3.8B | 2.64 | llamacpp_HF | [link] |
26/48 | gemma-2-9b-it-IQ3_XS | 9B | 4.14 | llamacpp_HF | [link] |
26/48 | google_gemma-2-9b-it | 9B | 4.62 | Transformers | --load-in-4bit --bf16 --use_eager_attention |
26/48 | Phi-3-medium-4k-instruct-IQ3_M | 14B | 6.47 | llamacpp_HF | [link] |
26/48 | Phi-3-medium-4k-instruct-IQ4_XS | 14B | 7.47 | llamacpp_HF | [link] |
26/48 | gemma-2-27b-it-IQ2_XS | 27B | 8.40 | llamacpp_HF | [link] |
26/48 | gemma-2-27b-it-Q2_K | 27B | 10.45 | llamacpp_HF | [link] |
26/48 | gemma-2-27b-it-Q2_K_L | 27B | 10.74 | llamacpp_HF | [link] |
26/48 | gemma-2-27b-it-IQ3_XXS | 27B | 10.75 | llamacpp_HF | [link] |
26/48 | turboderp_gemma-2-27b-it-exl2_3.0bpw | 27B | 13.06 | ExLlamav2_HF | --no_flash_attn --no_xformers --no_sdpa |
26/48 | microsoft_Phi-3-small-8k-instruct | 7B | 14.78 | Transformers | |
26/48 | internlm_internlm2_5-7b-chat | 7B | 15.48 | Transformers | |
26/48 | Meta-Llama-3-70B-Instruct-IQ2_XXS | 70B | 19.1 | llamacpp_HF | |
26/48 | Qwen_Qwen1.5-14B-Chat | 14B | 28.33 | Transformers | |
26/48 | Mistral-Large-Instruct-2407-IQ1_M | 123B | 28.4 | llamacpp_HF | [link] |
26/48 | LoneStriker_Yi-34B-200K-8.0bpw-h8-exl2 | 34B | 34.84 | ExLlamav2_HF | |
26/48 | LoneStriker_dolphin-2.2-yi-34b-200k-8.0bpw-h8-exl2 | 34B | 34.89 | ExLlamav2_HF | |
26/48 | CausalLM-RP-34B.q8_0 | 34B | 37.0 | llamacpp_HF | |
26/48 | lzlv_70b_fp16_hf.Q5_K_M | 70B | 48.8 | llamacpp_HF | |
26/48 | nous-hermes-2-mixtral-8x7b-dpo.Q8_0 | 8x7B | 49.6 | llamacpp_HF | |
26/48 | Mixtral-8x22B-Instruct-v0.1.Q4_K_M | 8x22B | 85.5 | llamacpp_HF | |
25/48 | Phi-3-mini-4k-instruct-Q5_K_M | 3.8B | 2.82 | llamacpp_HF | [link] |
25/48 | Phi-3-mini-4k-instruct-old-Q5_K_M | 3.8B | 2.82 | llamacpp_HF | [link] |
25/48 | gemma-2-9b-it-IQ2_S | 9B | 3.21 | llamacpp_HF | [link] |
25/48 | gemma-2-9b-it-IQ2_M | 9B | 3.43 | llamacpp_HF | [link] |
25/48 | gemma-2-9b-it-IQ3_XXS | 9B | 3.80 | llamacpp_HF | [link] |
25/48 | gemma-2-9b-it-Q2_K | 9B | 3.81 | llamacpp_HF | [link] |
25/48 | gemma-2-9b-it-Q2_K_L | 9B | 4.03 | llamacpp_HF | [link] |
25/48 | Phi-3-medium-4k-instruct-IQ2_S | 14B | 4.34 | llamacpp_HF | [link] |
25/48 | gemma-2-27b-it-IQ2_S | 27B | 8.65 | llamacpp_HF | [link] |
25/48 | NousResearch_Nous-Hermes-2-SOLAR-10.7B | 10.7B | 21.46 | Transformers | |
25/48 | turboderp_Llama-3-70B-Instruct-exl2_2.4bpw | 70B | 23.47 | ExLlamav2_HF | |
25/48 | internlm_internlm2_5-20b-chat | 20B | 39.72 | Transformers | |
25/48 | LoneStriker_Llama-3-70B-Instruct-Gradient-524k-6.0bpw-h6-exl2 | 70B | 54.11 | ExLlamav2_HF | |
25/48 | goliath-120b.Q4_K_M | 120B | 70.6 | llamacpp_HF | |
25/48 | Meta-Llama-3.1-405B-Instruct.i1-IQ1_M | 405B | 95.1 | llamacpp_HF | [link] |
24/48 | Phi-3-mini-4k-instruct-IQ4_XS | 3.8B | 2.06 | llamacpp_HF | [link] |
24/48 | Phi-3-mini-4k-instruct-IQ4_NL | 3.8B | 2.18 | llamacpp_HF | [link] |
24/48 | Phi-3-mini-4k-instruct-old-IQ4_NL | 3.8B | 2.18 | llamacpp_HF | [link] |
24/48 | Phi-3-mini-4k-instruct-old-Q5_K_S | 3.8B | 2.64 | llamacpp_HF | [link] |
24/48 | Phi-3-mini-4k-instruct-Q6_K | 3.8B | 3.14 | llamacpp_HF | [link] |
24/48 | Phi-3-mini-4k-instruct-old-Q6_K | 3.8B | 3.14 | llamacpp_HF | [link] |
24/48 | Phi-3-mini-4k-instruct-Q8_0 | 3.8B | 4.06 | llamacpp_HF | [link] |
24/48 | Phi-3-mini-4k-instruct-old-Q8_0 | 3.8B | 4.06 | llamacpp_HF | [link] |
24/48 | Phi-3-mini-4k-instruct-old-fp16 | 3.8B | 7.64 | llamacpp_HF | [link] |
24/48 | Phi-3-mini-4k-instruct-fp32 | 3.8B | 15.29 | llamacpp_HF | [link] |
24/48 | meta-llama_Meta-Llama-3.1-8B-Instruct | 8B | 16.06 | Transformers | |
24/48 | Meta-Llama-3.1-70B-Instruct-IQ2_XS | 70B | 21.1 | llamacpp_HF | [link] |
24/48 | upstage_SOLAR-10.7B-Instruct-v1.0 | 10.7B | 21.46 | Transformers | |
24/48 | mistralai_Mistral-Nemo-Instruct-2407 | 12B | 24.5 | Transformers | |
24/48 | dnhkng_RYS-Phi-3-medium-4k-instruct | 14B | 35.42 | Transformers | |
24/48 | MultiVerse_70B.Q4_K_M | 70B | 45.2 | llamacpp_HF | |
24/48 | maid-yuzu-v8-alter.Q8_0 | 8x7B | 49.8 | llamacpp_HF | |
24/48 | LoneStriker_Llama-3-70B-Instruct-Gradient-262k-6.0bpw-h6-exl2 | 70B | 54.27 | ExLlamav2_HF | |
23/48 | Phi-3-mini-4k-instruct-old-IQ4_XS | 3.8B | 2.06 | llamacpp_HF | [link] |
23/48 | Phi-3-mini-4k-instruct-Q4_K_M | 3.8B | 2.39 | llamacpp_HF | [link] |
23/48 | Phi-3-medium-128k-instruct-Q2_K | 14B | 5.14 | llamacpp_HF | [link] |
23/48 | Phi-3-medium-4k-instruct-Q3_K_S | 14B | 6.06 | llamacpp_HF | [link] |
23/48 | microsoft_Phi-3-mini-4k-instruct | 3.8B | 7.64 | Transformers | |
23/48 | Mistral-Nemo-Instruct-2407-Q8_0 | 12B | 13.0 | llamacpp_HF | [link] |
23/48 | Zyphra_Zamba2-7B-Instruct | 7B | 15.05 | Transformers | |
23/48 | bhenrym14_airoboros-3_1-yi-34b-200k | 34B | 34.39 | Transformers | --load-in-8bit |
23/48 | xwin-lm-70b-v0.1.Q4_K_M | 70B | 41.4 | llamacpp_HF | |
22/48 | Phi-3-mini-4k-instruct-old-IQ3_XXS | 3.8B | 1.51 | llamacpp_HF | [link] |
22/48 | Phi-3-mini-4k-instruct-Q4_K_S | 3.8B | 2.19 | llamacpp_HF | [link] |
22/48 | Phi-3-mini-4k-instruct-old-Q4_K_M | 3.8B | 2.39 | llamacpp_HF | [link] |
22/48 | Phi-3-medium-128k-instruct-IQ2_S | 14B | 4.34 | llamacpp_HF | [link] |
22/48 | Meta-Llama-3-8B-Instruct-Q4_K_S | 8B | 4.69 | llamacpp_HF | |
22/48 | Phi-3-medium-128k-instruct-IQ2_M | 14B | 4.72 | llamacpp_HF | [link] |
22/48 | Phi-3-medium-4k-instruct-IQ2_M | 14B | 4.72 | llamacpp_HF | [link] |
22/48 | Phi-3-medium-4k-instruct-Q2_K | 14B | 5.14 | llamacpp_HF | [link] |
22/48 | Qwen_Qwen1.5-7B-Chat | 7B | 15.44 | Transformers | |
22/48 | THUDM_glm-4-9b-chat | 9B | 18.80 | Transformers | |
22/48 | liuhaotian_llava-v1.5-13b | 13B | 26.09 | Transformers | |
22/48 | turboderp_command-r-v01-35B-exl2_6.0bpw | 35B | 32.09 | ExLlamav2_HF | |
22/48 | turboderp_command-r-plus-103B-exl2_2.5bpw | 104B | 41.23 | ExLlamav2_HF | |
22/48 | tulu-2-dpo-70b.Q4_K_M | 70B | 41.4 | llamacpp_HF | |
22/48 | wizardlm-70b-v1.0.Q4_K_M | 70B | 41.4 | llamacpp_HF | |
22/48 | MoMo-72B-lora-1.8.6-DPO-Q4_K_M | 72B | 43.8 | llamacpp_HF | |
22/48 | meraGPT_mera-mix-4x7B | 4x7B | 48.31 | Transformers | |
21/48 | Phi-3-mini-4k-instruct-IQ3_XXS | 3.8B | 1.51 | llamacpp_HF | [link] |
21/48 | Phi-3-mini-4k-instruct-old-Q3_K_L | 3.8B | 2.09 | llamacpp_HF | [link] |
21/48 | Phi-3-mini-4k-instruct-old-Q4_K_S | 3.8B | 2.19 | llamacpp_HF | [link] |
21/48 | Phi-3.1-mini-4k-instruct-Q4_K_M | 3.8B | 2.39 | llamacpp_HF | [link] |
21/48 | Phi-3.1-mini-4k-instruct-Q5_K_S | 3.8B | 2.64 | llamacpp_HF | [link] |
21/48 | gemma-2-9b-it-IQ2_XS | 9B | 3.07 | llamacpp_HF | [link] |
21/48 | Phi-3-medium-4k-instruct-IQ2_XS | 14B | 4.13 | llamacpp_HF | [link] |
21/48 | Meta-Llama-3.1-8B-Instruct-Q4_K_S | 8B | 4.69 | llamacpp_HF | [link] |
21/48 | Meta-Llama-3.1-8B-Instruct-Q5_K_M | 8B | 5.73 | llamacpp_HF | [link] |
21/48 | Meta-Llama-3.1-8B-Instruct-Q6_K | 8B | 6.6 | llamacpp_HF | [link] |
21/48 | Meta-Llama-3.1-8B-Instruct-Q6_K_L | 8B | 6.85 | llamacpp_HF | [link] |
21/48 | microsoft_Phi-3-vision-128k-instruct | 4.2B | 8.29 | Transformers | |
21/48 | Meta-Llama-3-8B-Instruct-Q8_0 | 8B | 8.54 | llamacpp_HF | |
21/48 | 01-ai_Yi-1.5-6B-Chat | 6B | 12.12 | Transformers | |
21/48 | microsoft_Phi-3-small-128k-instruct | 7B | 14.78 | Transformers | |
21/48 | gustavecortal_oneirogen-7B | 7B | 15.23 | Transformers | |
21/48 | NurtureAI_Meta-Llama-3-8B-Instruct-64k | 8B | 16.06 | Transformers | |
21/48 | Undi95_Meta-Llama-3-8B-Instruct-hf | 8B | 16.06 | Transformers | |
21/48 | Salesforce_SFR-Iterative-DPO-LLaMA-3-8B-R | 8B | 16.06 | Transformers | |
21/48 | Meta-Llama-3-8B-Instruct-fp16 | 8B | 16.1 | llamacpp_HF | |
21/48 | c4ai-command-r-v01-Q8_0 | 35B | 37.2 | llamacpp_HF | |
21/48 | internlm_internlm2-chat-20b | 20B | 39.72 | Transformers | |
21/48 | falcon-180b.Q4_K_M | 180B | 108.6 | llamacpp_HF | |
20/48 | gemma-2-2b-it-Q4_K_S | 2.6B | 1.64 | llamacpp_HF | [link] |
20/48 | Phi-3-mini-4k-instruct-Q3_K_M | 3.8B | 1.96 | llamacpp_HF | [link] |
20/48 | Phi-3-mini-4k-instruct-Q3_K_L | 3.8B | 2.09 | llamacpp_HF | [link] |
20/48 | Phi-3.1-mini-4k-instruct-Q4_K_S | 3.8B | 2.19 | llamacpp_HF | [link] |
20/48 | Phi-3.1-mini-4k-instruct-Q5_K_M | 3.8B | 2.82 | llamacpp_HF | [link] |
20/48 | Phi-3.1-mini-4k-instruct-Q6_K | 3.8B | 3.14 | llamacpp_HF | [link] |
20/48 | Phi-3.1-mini-4k-instruct-Q6_K_L | 3.8B | 3.18 | llamacpp_HF | [link] |
20/48 | Phi-3-medium-128k-instruct-IQ2_XS | 14B | 4.13 | llamacpp_HF | [link] |
20/48 | Meta-Llama-3.1-8B-Instruct-Q4_K_M | 8B | 4.92 | llamacpp_HF | [link] |
20/48 | Meta-Llama-3.1-8B-Instruct-Q4_K_L | 8B | 5.31 | llamacpp_HF | [link] |
20/48 | Meta-Llama-3.1-8B-Instruct-Q5_K_L | 8B | 6.06 | llamacpp_HF | [link] |
20/48 | Meta-Llama-3-8B-Instruct-Q6_K | 8B | 6.6 | llamacpp_HF | |
20/48 | TheBloke_llava-v1.5-13B-GPTQ | 13B | 7.26 | ExLlamav2_HF | |
20/48 | microsoft_Phi-3-mini-4k-instruct-20240701 | 3.8B | 7.64 | Transformers | |
20/48 | openchat_openchat_3.5 | 7B | 14.48 | Transformers | |
20/48 | Weyaxi_Einstein-v6.1-Llama3-8B | 8B | 16.06 | Transformers | |
20/48 | BAAI_Bunny-Llama-3-8B-V | 8B | 16.96 | Transformers | |
20/48 | CohereForAI_aya-23-35B | 35B | 34.98 | Transformers | --load-in-8bit |
20/48 | llama-2-70b-chat.Q4_K_M | 70B | 41.4 | llamacpp_HF | |
20/48 | Ein-72B-v0.1-full.Q4_K_M | 72B | 45.2 | llamacpp_HF | |
19/48 | Phi-3.1-mini-128k-instruct-IQ3_XXS | 3.8B | 1.51 | llamacpp_HF | [link] |
19/48 | Phi-3.1-mini-128k-instruct-IQ3_XS | 3.8B | 1.63 | llamacpp_HF | [link] |
19/48 | gemma-2-2b-it-Q4_K_M | 2.6B | 1.71 | llamacpp_HF | [link] |
19/48 | Phi-3-mini-4k-instruct-IQ3_M | 3.8B | 1.86 | llamacpp_HF | [link] |
19/48 | Phi-3-mini-4k-instruct-old-IQ3_M | 3.8B | 1.86 | llamacpp_HF | [link] |
19/48 | gemma-2-2b-it-Q5_K_S | 2.6B | 1.88 | llamacpp_HF | [link] |
19/48 | Phi-3.1-mini-4k-instruct-Q3_K_M | 3.8B | 1.96 | llamacpp_HF | [link] |
19/48 | Phi-3-mini-4k-instruct-old-Q3_K_M | 3.8B | 1.96 | llamacpp_HF | [link] |
19/48 | Phi-3.1-mini-4k-instruct-IQ4_XS | 3.8B | 2.06 | llamacpp_HF | [link] |
19/48 | Phi-3.1-mini-128k-instruct-Q3_K_L | 3.8B | 2.09 | llamacpp_HF | [link] |
19/48 | Phi-3.1-mini-4k-instruct-Q3_K_L | 3.8B | 2.09 | llamacpp_HF | [link] |
19/48 | Phi-3.1-mini-128k-instruct-Q3_K_XL | 3.8B | 2.17 | llamacpp_HF | [link] |
19/48 | Phi-3.1-mini-4k-instruct-Q3_K_XL | 3.8B | 2.17 | llamacpp_HF | [link] |
19/48 | Phi-3.1-mini-128k-instruct-Q4_K_L | 3.8B | 2.47 | llamacpp_HF | [link] |
19/48 | Phi-3.1-mini-4k-instruct-Q4_K_L | 3.8B | 2.47 | llamacpp_HF | [link] |
19/48 | Phi-3.1-mini-128k-instruct-Q5_K_S | 3.8B | 2.64 | llamacpp_HF | [link] |
19/48 | Phi-3.1-mini-128k-instruct-Q5_K_M | 3.8B | 2.82 | llamacpp_HF | [link] |
19/48 | Phi-3.1-mini-128k-instruct-Q5_K_L | 3.8B | 2.88 | llamacpp_HF | [link] |
19/48 | Phi-3.1-mini-4k-instruct-Q5_K_L | 3.8B | 2.88 | llamacpp_HF | [link] |
19/48 | Meta-Llama-3-8B-Instruct-IQ3_S | 8B | 3.68 | llamacpp_HF | [link] |
19/48 | Phi-3.1-mini-4k-instruct-Q8_0 | 3.8B | 4.06 | llamacpp_HF | [link] |
19/48 | mistral-7b-instruct-v0.2.Q4_K_S | 7B | 4.14 | llamacpp_HF | |
19/48 | Meta-Llama-3-8B-Instruct-IQ4_XS | 8B | 4.45 | llamacpp_HF | |
19/48 | Meta-Llama-3-8B-Instruct-IQ4_NL | 8B | 4.68 | llamacpp_HF | |
19/48 | Meta-Llama-3-8B-Instruct-Q4_K_M | 8B | 4.92 | llamacpp_HF | |
19/48 | Meta-Llama-3-8B-Instruct-Q5_K_S | 8B | 5.6 | llamacpp_HF | |
19/48 | Meta-Llama-3-8B-Instruct-Q5_K_M | 8B | 5.73 | llamacpp_HF | |
19/48 | microsoft_Phi-3-mini-128k-instruct | 3.8B | 7.64 | Transformers | |
19/48 | microsoft_Phi-3.5-vision-instruct | 3.8B | 8.29 | Transformers | |
19/48 | Nexusflow_Starling-LM-7B-beta | 7B | 14.48 | Transformers | |
19/48 | NousResearch_Hermes-2-Pro-Mistral-7B | 7B | 14.48 | Transformers | |
19/48 | Phi-3.1-mini-4k-instruct-f32 | 3.8B | 15.29 | llamacpp_HF | [link] |
19/48 | internlm_internlm2-chat-7b | 7B | 15.48 | Transformers | |
19/48 | lightblue_suzume-llama-3-8B-multilingual | 8B | 16.06 | Transformers | |
19/48 | UCLA-AGI_Llama-3-Instruct-8B-SPPO-Iter3 | 8B | 16.06 | Transformers | |
19/48 | ai21labs_Jamba-v0.1 | 52B | 25.79 | Transformers | --load-in-4bit |
19/48 | Meta-Llama-3.1-8B-Instruct-f32 | 8B | 32.1 | llamacpp_HF | [link] |
19/48 | internlm_internlm2-chat-20b-sft | 20B | 39.72 | Transformers | |
19/48 | llama-2-70b.Q5_K_M | 70B | 48.8 | llamacpp_HF | |
19/48 | DeepSeek-Coder-V2-Instruct-IQ2_XS | 236B | 68.7 | llamacpp_HF | [link] |
19/48 | zephyr-orpo-141b-A35b-v0.1.Q4_K_M | 141B | 85.59 | llamacpp_HF | |
18/48 | gemma-2-2b-it-IQ4_XS | 2.6B | 1.57 | llamacpp_HF | [link] |
18/48 | Phi-3.1-mini-4k-instruct-IQ3_XS | 3.8B | 1.63 | llamacpp_HF | [link] |
18/48 | Phi-3-mini-4k-instruct-IQ3_S | 3.8B | 1.68 | llamacpp_HF | [link] |
18/48 | Phi-3-mini-4k-instruct-Q3_K_S | 3.8B | 1.68 | llamacpp_HF | [link] |
18/48 | Phi-3-mini-4k-instruct-old-IQ3_S | 3.8B | 1.68 | llamacpp_HF | [link] |
18/48 | Phi-3-mini-4k-instruct-old-Q3_K_S | 3.8B | 1.68 | llamacpp_HF | [link] |
18/48 | Phi-3.1-mini-128k-instruct-IQ3_M | 3.8B | 1.86 | llamacpp_HF | [link] |
18/48 | Phi-3.1-mini-4k-instruct-IQ3_M | 3.8B | 1.86 | llamacpp_HF | [link] |
18/48 | gemma-2-2b-it-Q5_K_M | 2.6B | 1.92 | llamacpp_HF | [link] |
18/48 | Phi-3.1-mini-128k-instruct-Q3_K_M | 3.8B | 1.96 | llamacpp_HF | [link] |
18/48 | gemma-2-2b-it-Q6_K | 2.6B | 2.15 | llamacpp_HF | [link] |
18/48 | Phi-3.1-mini-128k-instruct-Q4_K_S | 3.8B | 2.19 | llamacpp_HF | [link] |
18/48 | gemma-2-2b-it-Q6_K_L | 2.6B | 2.29 | llamacpp_HF | [link] |
18/48 | Phi-3.1-mini-128k-instruct-Q4_K_M | 3.8B | 2.39 | llamacpp_HF | [link] |
18/48 | gemma-2-2b-it-Q8_0 | 2.6B | 2.78 | llamacpp_HF | [link] |
18/48 | Phi-3.1-mini-128k-instruct-Q6_K | 3.8B | 3.14 | llamacpp_HF | [link] |
18/48 | Phi-3.1-mini-128k-instruct-Q6_K_L | 3.8B | 3.18 | llamacpp_HF | [link] |
18/48 | Meta-Llama-3-8B-Instruct-Q3_K_M | 8B | 4.02 | llamacpp_HF | |
18/48 | Meta-Llama-3-8B-Instruct-Q3_K_L | 8B | 4.32 | llamacpp_HF | |
18/48 | google_gemma-2-2b-it | 2B | 5.23 | Transformers | |
18/48 | microsoft_Phi-3-mini-128k-instruct | 3.8B | 7.64 | Transformers | --load-in-8bit |
18/48 | microsoft_Phi-3.5-mini-instruct | 3.8B | 8.29 | Transformers | |
18/48 | Meta-Llama-3.1-8B-Instruct-Q8_0 | 8B | 8.54 | llamacpp_HF | [link] |
18/48 | gemma-2-2b-it-f32 | 2.6B | 10.46 | llamacpp_HF | [link] |
18/48 | mistralai_Mistral-7B-Instruct-v0.2 | 7B | 14.48 | Transformers | |
18/48 | jieliu_Storm-7B | 7B | 14.48 | Transformers | |
18/48 | failspy_kappa-3-phi-abliterated | 3.8B | 15.28 | Transformers | |
18/48 | Phi-3.1-mini-128k-instruct-f32 | 3.8B | 15.29 | llamacpp_HF | [link] |
18/48 | turboderp_llama3-turbcat-instruct-8b | 8B | 16.06 | Transformers | |
18/48 | Meta-Llama-3-70B-Instruct-IQ1_M | 70B | 16.8 | llamacpp_HF | |
18/48 | Orenguteng_Lexi-Llama-3-8B-Uncensored | 8B | 17.67 | Transformers | |
18/48 | Meta-Llama-3.1-70B-Instruct-IQ2_XXS | 70B | 19.1 | llamacpp_HF | [link] |
18/48 | LoneStriker_Nous-Capybara-34B-4.65bpw-h6-exl2 | 34B | 20.76 | ExLlamav2_HF | |
18/48 | Qwen_Qwen1.5-MoE-A2.7B-Chat | 14.3B | 28.63 | Transformers | |
18/48 | TheProfessor-155b.i1-IQ3_XS | 155B | 63.2 | llamacpp_HF | [link] |
17/48 | gemma-2-2b-it-IQ3_M | 2.6B | 1.39 | llamacpp_HF | [link] |
17/48 | Phi-3-mini-4k-instruct-v0.3-IQ3_XXS | 3.8B | 1.51 | llamacpp_HF | [link] |
17/48 | gemma-2-2b-it-Q3_K_L | 2.6B | 1.55 | llamacpp_HF | [link] |
17/48 | Phi-3-mini-4k-instruct-v0.3-Q3_K_S | 3.8B | 1.68 | llamacpp_HF | [link] |
17/48 | microsoft_Phi-3-mini-128k-instruct | 3.8B | 1.91 | Transformers | --load-in-4bit |
17/48 | turboderp_Phi-3-mini-128k-instruct-exl2_5.0bpw | 3.8B | 2.54 | ExLlamav2_HF | |
17/48 | Phi-3-mini-4k-instruct-v0.3-Q5_K_S | 3.8B | 2.64 | llamacpp_HF | [link] |
17/48 | turboderp_Phi-3-mini-128k-instruct-exl2_6.0bpw | 3.8B | 2.99 | ExLlamav2_HF | |
17/48 | Phi-3.1-mini-128k-instruct-Q8_0 | 3.8B | 4.06 | llamacpp_HF | [link] |
17/48 | Meta-Llama-3.1-8B-Instruct-Q3_K_L | 8B | 4.32 | llamacpp_HF | [link] |
17/48 | Meta-Llama-3.1-8B-Instruct-IQ4_XS | 8B | 4.45 | llamacpp_HF | [link] |
17/48 | Meta-Llama-3.1-8B-Instruct-Q5_K_S | 8B | 5.6 | llamacpp_HF | [link] |
17/48 | amazingvince_Not-WizardLM-2-7B | 7B | 14.48 | Transformers | |
17/48 | Undi95_Toppy-M-7B | 7B | 14.48 | Transformers | |
17/48 | internlm_internlm2-chat-7b-sft | 7B | 15.48 | Transformers | |
17/48 | mzbac_llama-3-8B-Instruct-function-calling | 8B | 16.06 | Transformers | |
17/48 | openchat_openchat-3.6-8b-20240522 | 8B | 16.06 | Transformers | |
17/48 | kubernetes-bad_Mistral-7B-Instruct-v0.3 | 7B | 28.99 | Transformers | |
17/48 | ZeusLabs_L3-Aethora-15B-V2 | 15B | 30.02 | Transformers | |
17/48 | ggml-alpaca-dragon-72b-v1-q4_k_m | 72B | 43.8 | llamacpp_HF | |
17/48 | DeepSeek-V2-Chat-0628-IQ2_XS | 236B | 68.7 | llamacpp_HF | [link] |
17/48 | Meta-Llama-3.1-405B-Instruct.i1-IQ1_S | 405B | 86.8 | llamacpp_HF | [link] |
17/48 | grok-1-IQ2_XS | 314B | 93.31 | llamacpp_HF | [link] |
16/48 | Phi-3.1-mini-4k-instruct-Q3_K_S | 3.8B | 1.68 | llamacpp_HF | [link] |
16/48 | Phi-3.1-mini-128k-instruct-IQ4_XS | 3.8B | 2.06 | llamacpp_HF | [link] |
16/48 | Phi-3-mini-4k-instruct-v0.3-Q4_K_S | 3.8B | 2.19 | llamacpp_HF | [link] |
16/48 | Meta-Llama-3-8B-Instruct-IQ3_M | 8B | 3.78 | llamacpp_HF | |
16/48 | Meta-Llama-3.1-8B-Instruct-Q3_K_M | 8B | 4.02 | llamacpp_HF | [link] |
16/48 | TheBloke_Mistral-7B-Instruct-v0.2-GPTQ | 7B | 4.16 | ExLlamav2_HF | |
16/48 | Meta-Llama-3.1-8B-Instruct-Q3_K_XL | 8B | 4.78 | llamacpp_HF | [link] |
16/48 | microsoft_Phi-3-mini-128k-instruct-20240701 | 3.8B | 7.64 | Transformers | |
16/48 | mixtral-8x7b-instruct-v0.1.Q2_K | 8x7B | 15.6 | llamacpp_HF | |
15/48 | Phi-3-mini-4k-instruct-IQ3_XS | 3.8B | 1.63 | llamacpp_HF | [link] |
15/48 | Phi-3-mini-4k-instruct-old-IQ3_XS | 3.8B | 1.63 | llamacpp_HF | [link] |
15/48 | Phi-3.1-mini-128k-instruct-Q3_K_S | 3.8B | 1.68 | llamacpp_HF | [link] |
15/48 | Phi-3-mini-4k-instruct-v0.3-Q3_K_L | 3.8B | 2.09 | llamacpp_HF | [link] |
15/48 | Phi-3-mini-4k-instruct-v0.3-Q4_K_M | 3.8B | 2.39 | llamacpp_HF | [link] |
15/48 | Phi-3-mini-4k-instruct-v0.3-Q5_K_M | 3.8B | 2.82 | llamacpp_HF | [link] |
15/48 | Meta-Llama-3-8B-Instruct-IQ3_XS | 8B | 3.52 | llamacpp_HF | |
15/48 | Meta-Llama-3.1-8B-Instruct-Q3_K_S | 8B | 3.66 | llamacpp_HF | [link] |
15/48 | hjhj3168_Llama-3-8b-Orthogonalized-exl2 | 8B | 6.7 | ExLlamav2_HF | |
15/48 | cognitivecomputations_dolphin-2.9-llama3-8b | 8B | 16.06 | Transformers | |
15/48 | xtuner_llava-llama-3-8b-v1_1 | 8B | 16.06 | Transformers | |
15/48 | NousResearch_Hermes-2-Pro-Llama-3-8B | 8B | 16.06 | Transformers | |
15/48 | CohereForAI_c4ai-command-r-v01-4bit | 35B | 22.69 | Transformers | |
14/48 | turboderp_Phi-3-mini-128k-instruct-exl2_4.0bpw | 3.8B | 2.09 | ExLlamav2_HF | |
14/48 | Meta-Llama-3-8B-Instruct-IQ3_XXS | 8B | 3.27 | llamacpp_HF | |
14/48 | Meta-Llama-3.1-8B-Instruct-IQ3_XS | 8B | 3.52 | llamacpp_HF | [link] |
14/48 | Meta-Llama-3-8B-Instruct-Q3_K_S | 8B | 3.66 | llamacpp_HF | |
14/48 | Meta-Llama-3.1-8B-Instruct-IQ3_M | 8B | 3.78 | llamacpp_HF | [link] |
14/48 | mattshumer_Llama-3-8B-16K | 8B | 16.06 | Transformers | |
14/48 | nvidia_ChatQA-1.5-8B | 8B | 16.06 | Transformers | Alpaca template. |
14/48 | Gryphe_MythoMax-L2-13b | 13B | 26.03 | Transformers | |
14/48 | microsoft_Orca-2-13b | 13B | 52.06 | Transformers | |
14/48 | Undi95_ReMM-SLERP-L2-13B | 13B | 52.06 | Transformers | |
13/48 | Phi-3-mini-4k-instruct-v0.3-Q3_K_M | 3.8B | 1.96 | llamacpp_HF | [link] |
13/48 | Phi-3-mini-4k-instruct-v0.3-Q6_K | 3.8B | 3.14 | llamacpp_HF | [link] |
13/48 | Phi-3-medium-4k-instruct-IQ2_XXS | 14B | 3.72 | llamacpp_HF | [link] |
13/48 | gradientai_Llama-3-8B-Instruct-262k | 8B | 16.06 | Transformers | |
13/48 | gradientai_Llama-3-8B-Instruct-Gradient-1048k | 8B | 16.06 | Transformers | |
13/48 | nvidia_ChatQA-1.5-8B | 8B | 16.06 | Transformers | NVIDIA-ChatQA template. |
13/48 | alpindale_gemma-7b-it | 7B | 17.08 | Transformers | |
13/48 | internlm_internlm2-wqx-20b | 20B | 39.72 | Transformers | |
13/48 | GeorgiaTechResearchInstitute_galactica-30b-evol-instruct-70k | 30B | 59.95 | Transformers | GALACTICA template. |
13/48 | turboderp_dbrx-instruct-exl2_3.75bpw | 132B | 62.83 | ExLlamav2_HF | Without the "You are DBRX..." system prompt. |
12/48 | Phi-3.1-mini-128k-instruct-Q2_K | 3.8B | 1.42 | llamacpp_HF | [link] |
12/48 | Phi-3.1-mini-128k-instruct-Q2_K_L | 3.8B | 1.51 | llamacpp_HF | [link] |
12/48 | Phi-3-mini-4k-instruct-v0.3-Q8_0 | 3.8B | 4.06 | llamacpp_HF | [link] |
12/48 | ISTA-DASLab_Meta-Llama-3-8B-Instruct-AQLM-2Bit-1x16 | 8B | 4.08 | Transformers | |
12/48 | HuggingFaceH4_zephyr-7b-beta | 7B | 14.48 | Transformers | |
12/48 | Phi-3-mini-4k-instruct-v0.3-f32 | 3.8B | 15.29 | llamacpp_HF | [link] |
12/48 | Meta-Llama-3-70B-Instruct-IQ1_S | 70B | 15.3 | llamacpp_HF | |
12/48 | NousResearch_Llama-2-13b-chat-hf | 13B | 26.03 | Transformers | |
12/48 | mistralai_mathstral-7B-v0.1 | 7B | 28.99 | Transformers | |
12/48 | llama-65b.Q5_K_M | 65B | 46.2 | llamacpp_HF | |
11/48 | Phi-3-mini-4k-instruct-Q2_K | 3.8B | 1.42 | llamacpp_HF | [link] |
11/48 | Phi-3-mini-4k-instruct-v0.3-IQ3_M | 3.8B | 1.86 | llamacpp_HF | [link] |
11/48 | Phi-3-mini-4k-instruct-v0.3-IQ4_XS | 3.8B | 2.06 | llamacpp_HF | [link] |
11/48 | Meta-Llama-3-8B-Instruct-IQ2_M | 8B | 2.95 | llamacpp_HF | |
11/48 | Meta-Llama-3.1-8B-Instruct-IQ2_M | 8B | 2.95 | llamacpp_HF | [link] |
11/48 | Meta-Llama-3-8B-Instruct-Q2_K | 8B | 3.18 | llamacpp_HF | |
11/48 | Phi-3-medium-128k-instruct-IQ2_XXS | 14B | 3.72 | llamacpp_HF | [link] |
11/48 | meta-llama_Llama-3.2-3B-Instruct | 3B | 6.43 | Transformers | |
11/48 | mlabonne_phixtral-2x2_8 | 2x2.8B | 8.92 | Transformers | |
11/48 | Meta-Llama-3.1-70B-Instruct-IQ1_M | 70B | 16.8 | llamacpp_HF | [link] |
11/48 | mlfoundations_tabula-8b | 8B | 32.12 | Transformers | |
11/48 | nyunai_nyun-c2-llama3-56B | 56B | 55.62 | Transformers | --load-in-8bit |
10/48 | ISTA-DASLab_c4ai-command-r-v01-AQLM-2Bit-1x16 | 35B | 12.72 | Transformers | |
10/48 | facebook_galactica-30b | 30B | 60.66 | Transformers | |
9/48 | Phi-3-mini-4k-instruct-old-Q2_K | 3.8B | 1.42 | llamacpp_HF | [link] |
9/48 | Qwen_Qwen2-1.5B-Instruct | 1.5B | 3.09 | Transformers | |
9/48 | Meta-Llama-3.1-8B-Instruct-Q2_K_L | 8B | 3.69 | llamacpp_HF | [link] |
9/48 | microsoft_phi-2 | 2.7B | 5.56 | Transformers | |
9/48 | TheBloke_vicuna-33B-GPTQ | 33B | 16.94 | ExLlamav2_HF | |
8/48 | Phi-3-mini-4k-instruct-v0.3-Q2_K | 3.8B | 1.42 | llamacpp_HF | [link] |
8/48 | Meta-Llama-3-8B-Instruct-IQ2_S | 8B | 2.76 | llamacpp_HF | |
8/48 | ISTA-DASLab_Meta-Llama-3-8B-Instruct-AQLM-2Bit-1x16 | 8B | 4.08 | Transformers | |
8/48 | NousResearch_Llama-2-7b-chat-hf | 7B | 13.48 | Transformers | |
8/48 | mistralai_Mistral-7B-Instruct-v0.1 | 7B | 14.48 | Transformers | |
8/48 | NousResearch_Nous-Capybara-7B-V1.9 | 7B | 14.48 | Transformers | |
8/48 | gradientai_Llama-3-8B-Instruct-Gradient-1048k | 8B | 16.06 | Transformers | Revision of 2024/05/04. |
7/48 | Phi-3.1-mini-4k-instruct-Q2_K | 3.8B | 1.42 | llamacpp_HF | [link] |
7/48 | Phi-3-mini-4k-instruct-v0.3-IQ3_XS | 3.8B | 1.63 | llamacpp_HF | [link] |
7/48 | Meta-Llama-3.1-8B-Instruct-Q2_K | 8B | 3.18 | llamacpp_HF | [link] |
7/48 | h2oai_h2o-danube3-4b-chat | 4B | 7.92 | Transformers | |
7/48 | tiiuae_falcon-11B | 11B | 22.21 | Transformers | |
7/48 | GeorgiaTechResearchInstitute_galactica-30b-evol-instruct-70k | 30B | 59.95 | Transformers | Alpaca template. |
6/48 | Phi-3-mini-4k-instruct-IQ2_S | 3.8B | 1.22 | llamacpp_HF | [link] |
6/48 | Phi-3.1-mini-4k-instruct-Q2_K_L | 3.8B | 1.51 | llamacpp_HF | [link] |
6/48 | meta-llama_Meta-Llama-3.1-8B | 8B | 16.06 | Transformers | |
6/48 | tiiuae_falcon-40b-instruct | 40B | 41.84 | Transformers | --load-in-8bit; falcon-180B-chat instruction template. |
5/48 | Phi-3-mini-4k-instruct-old-IQ2_S | 3.8B | 1.22 | llamacpp_HF | [link] |
5/48 | Meta-Llama-3-8B-Instruct-IQ2_XS | 8B | 2.61 | llamacpp_HF | |
5/48 | Phi-3-medium-128k-instruct-IQ1_S | 14B | 2.96 | llamacpp_HF | [link] |
5/48 | internlm_internlm2-chat-1_8b-sft | 1.8B | 3.78 | Transformers | |
5/48 | TheBloke_Llama-2-13B-GPTQ | 13B | 7.26 | ExLlamav2_HF | |
5/48 | LoneStriker_deepseek-coder-33b-instruct-6.0bpw-h6-exl2 | 33B | 25.32 | ExLlamav2_HF | |
5/48 | NousResearch_Llama-2-13b-hf | 13B | 26.03 | Transformers | |
5/48 | unsloth_llama-3-70b-bnb-4bit | 70B | 39.52 | Transformers | |
4/48 | Phi-3-mini-4k-instruct-IQ2_M | 3.8B | 1.32 | llamacpp_HF | [link] |
4/48 | turboderp_Phi-3-mini-128k-instruct-exl2_3.0bpw | 3.8B | 1.63 | ExLlamav2_HF | |
4/48 | internlm_internlm2-chat-1_8b | 1.8B | 3.78 | Transformers | |
4/48 | TheBloke_deepseek-coder-33B-instruct-AWQ | 33B | 18.01 | AutoAWQ | |
3/48 | Phi-3-mini-4k-instruct-IQ2_XXS | 3.8B | 1.04 | llamacpp_HF | [link] |
3/48 | Phi-3-mini-4k-instruct-old-IQ2_XXS | 3.8B | 1.04 | llamacpp_HF | [link] |
3/48 | Phi-3.1-mini-4k-instruct-IQ2_M | 3.8B | 1.32 | llamacpp_HF | [link] |
3/48 | Phi-3-mini-4k-instruct-old-IQ2_M | 3.8B | 1.32 | llamacpp_HF | [link] |
3/48 | Phi-3-medium-128k-instruct-IQ1_M | 14B | 3.24 | llamacpp_HF | [link] |
3/48 | TheBloke_Llama-2-7B-GPTQ | 7B | 3.9 | ExLlamav2_HF | |
3/48 | turboderp_dbrx-instruct-exl2_3.75bpw | 132B | 62.83 | ExLlamav2_HF | |
2/48 | Phi-3-mini-4k-instruct-v0.3-IQ2_S | 3.8B | 1.22 | llamacpp_HF | [link] |
2/48 | Meta-Llama-3-8B-Instruct-IQ2_XXS | 8B | 2.4 | llamacpp_HF | |
2/48 | Phi-3-medium-4k-instruct-IQ1_M | 14B | 3.24 | llamacpp_HF | [link] |
2/48 | facebook_galactica-6.7b | 6.7B | 13.72 | Transformers | |
1/48 | Phi-3.1-mini-128k-instruct-IQ2_XS | 3.8B | 1.15 | llamacpp_HF | [link] |
1/48 | Phi-3-mini-4k-instruct-v0.3-IQ2_XS | 3.8B | 1.15 | llamacpp_HF | [link] |
1/48 | Phi-3-mini-4k-instruct-IQ2_XS | 3.8B | 1.15 | llamacpp_HF | [link] |
1/48 | Phi-3-mini-4k-instruct-old-IQ2_XS | 3.8B | 1.15 | llamacpp_HF | [link] |
1/48 | Phi-3.1-mini-128k-instruct-IQ2_S | 3.8B | 1.22 | llamacpp_HF | [link] |
1/48 | Phi-3.1-mini-128k-instruct-IQ2_M | 3.8B | 1.32 | llamacpp_HF | [link] |
1/48 | turboderp_Phi-3-mini-128k-instruct-exl2_2.5bpw | 3.8B | 1.41 | ExLlamav2_HF | |
1/48 | Phi-3-medium-4k-instruct-IQ1_S | 14B | 2.96 | llamacpp_HF | [link] |
1/48 | bartowski_CodeQwen1.5-7B-Chat-exl2_8_0 | 7B | 7.63 | ExLlamav2_HF | |
1/48 | NousResearch_Llama-2-7b-hf | 7B | 13.48 | Transformers | |
1/48 | Qwen_CodeQwen1.5-7B-Chat | 7B | 14.5 | Transformers | |
0/48 | facebook_galactica-125m | 0.125B | 0.25 | Transformers | |
0/48 | openai-community_gpt2 | 0.124B | 0.55 | Transformers | |
0/48 | Phi-3-mini-4k-instruct-IQ1_S | 3.8B | 0.84 | llamacpp_HF | [link] |
0/48 | Phi-3-mini-4k-instruct-old-IQ1_S | 3.8B | 0.84 | llamacpp_HF | [link] |
0/48 | Phi-3-mini-4k-instruct-IQ1_M | 3.8B | 0.92 | llamacpp_HF | [link] |
0/48 | Phi-3-mini-4k-instruct-old-IQ1_M | 3.8B | 0.92 | llamacpp_HF | [link] |
0/48 | Qwen_Qwen2-0.5B-Instruct | 0.5B | 0.99 | Transformers | |
0/48 | h2oai_h2o-danube3-500m-chat | 0.5B | 1.03 | Transformers | |
0/48 | Phi-3-mini-4k-instruct-v0.3-IQ2_M | 3.8B | 1.32 | llamacpp_HF | [link] |
0/48 | openai-community_gpt2-medium | 0.355B | 1.52 | Transformers | |
0/48 | Meta-Llama-3-8B-Instruct-IQ1_S | 8B | 2.02 | llamacpp_HF | |
0/48 | Meta-Llama-3-8B-Instruct-IQ1_M | 8B | 2.16 | llamacpp_HF | |
0/48 | TinyLlama_TinyLlama-1.1B-Chat-v1.0 | 1.1B | 2.2 | Transformers | |
0/48 | ISTA-DASLab_Llama-2-7b-AQLM-2Bit-1x16-hf | 7B | 2.38 | Transformers | |
0/48 | meta-llama_Llama-3.2-1B-Instruct | 1B | 2.47 | Transformers | |
0/48 | facebook_galactica-1.3b | 1.3B | 2.63 | Transformers | |
0/48 | openai-community_gpt2-large | 0.774B | 3.25 | Transformers | |
0/48 | EleutherAI_gpt-neo-1.3B | 1.3B | 5.31 | Transformers | |
0/48 | openai-community_gpt2-xl | 1.5B | 6.43 | Transformers | |
0/48 | EleutherAI_gpt-neo-2.7B | 2.7B | 10.67 | Transformers | |
0/48 | facebook_opt-6.7b | 6.7B | 13.32 | Transformers | |
0/48 | gpt4chan_model_float16 | 6B | 24.21 | Transformers | |
0/48 | EleutherAI_gpt-j-6b | 6B | 24.21 | Transformers | |
0/48 | facebook_opt-13b | 13B | 25.71 | Transformers | |
0/48 | EleutherAI_gpt-neox-20b | 20B | 41.29 | Transformers | |
0/48 | facebook_opt-30b | 30B | 59.95 | Transformers | |
Updates
2024/11/18
2024/10/24
2024/10/16
2024/10/14
2024/09/27
2024/08/20
2024/08/10
2024/08/08
2024/08/05
2024/08/04
2024/08/01
2024/07/29
2024/07/27
2024/07/25
2024/07/24
2024/07/23
2024/07/22
2024/07/20
2024/07/18
2024/07/15
2024/07/11
2024/07/03
2024/07/02
2024/07/01
2024/06/28
2024/06/26
2024/06/25
2024/05/24
2024/05/23
2024/05/21
2024/05/20
2024/05/19
2024/05/12
2024/05/10
2024/05/07
2024/05/06
2024/05/05
2024/05/04
2024/05/03
2024/04/28
2024/04/27
2024/04/26
2024/04/25
2024/04/24
2024/04/23
About
This test consists of 48 manually written multiple-choice questions. It evaluates a combination of academic knowledge and logical reasoning.
Compared to MMLU, it has the advantage of not being in any training dataset, and the disadvantage of being much smaller. Compared to the LMSYS Chatbot Arena, it is harsher on small models like Starling-LM-7B-beta that write nicely formatted replies but don't have much knowledge.
The correct Jinja2 instruction template is used for each model, as autodetected by text-generation-webui from the model's metadata. For base models without a template, Alpaca is used. The questions are evaluated using the /v1/internal/logits endpoint in the project's API.
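As a rough illustration of logit-based grading, the sketch below picks an answer from a set of next-token scores and tallies a score out of the number of questions. It is not the benchmark's actual code: the function names, the A-D choice letters, and the assumption that the endpoint's output can be reduced to a mapping from candidate token to score are all hypothetical, and no request is sent to the API here.

```python
from typing import Mapping, Optional, Sequence


def pick_answer(
    token_scores: Mapping[str, float],
    choices: Sequence[str] = ("A", "B", "C", "D"),  # assumed answer letters
) -> Optional[str]:
    """Return the choice letter with the highest next-token score.

    token_scores is assumed to map candidate next tokens to a score
    (logit or probability). Tokens that are not a choice letter after
    stripping whitespace are ignored.
    """
    best = {}
    for token, score in token_scores.items():
        letter = token.strip()
        if letter in choices:
            # Keep the highest score seen for each letter ("B" vs " B").
            best[letter] = max(score, best.get(letter, float("-inf")))
    if not best:
        return None  # no choice letter among the candidates
    return max(best, key=best.get)


def tally(predictions: Sequence[Optional[str]], answer_key: Sequence[str]) -> str:
    """Format the number of correct predictions as 'correct/total'."""
    correct = sum(p == a for p, a in zip(predictions, answer_key))
    return f"{correct}/{len(answer_key)}"


# Made-up scores for a single hypothetical question.
example_scores = {"A": 0.1, " B": 0.7, "B": 0.2, "the": 0.9}
example_pick = pick_answer(example_scores)
```

Reading the answer from the next-token scores rather than from generated text means a model is graded on which choice it ranks highest, regardless of how it would have phrased a full reply.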
The questions are private.
Limitations
This benchmark does not evaluate code generation, non-English languages, role-playing, RAG, or long-context understanding. Performance in those areas may have a weak or nonexistent correlation with what is being measured.