Blog
A detailed comparison between GPTQ, AWQ, EXL2, q4_K_M, q4_K_S, and load_in_4bit: perplexity, VRAM, speed, model size, and loading time.
Published at: 10/24/23, 7:24 PM
A direct comparison between llama.cpp, AutoGPTQ, ExLlama, and transformers perplexities
Published at: 7/14/23, 12:00 AM