- I've converted Qwen QVQ to EXL2 format and uploaded the 4.65bpw version. 32K context with 4-bit cache fits in less than 48 GB of VRAM. Benchmarks are still running. Looking forward to finding out how it compares to QwQ, which was the best local model in my recent mass benchmark. huggingface.co/wolfram/QVQ-...
Dec 26, 2024 00:10