oobabooga blog

Tags

#perplexity

  • A direct comparison between llama.cpp, AutoGPTQ, ExLlama, and transformers perplexities (7/14/23)