KVarN: Native vLLM KV-cache quantization back end by Huawei

30 points by theanonymousone - 4 comments
throwa356262 5 mins ago
Better performance than TQ and better quality than FP16?

Am I reading this right??

v3ss0n 5 mins ago
Why is this not a PR for vLLM?
esafak 5 mins ago
It's the output of a research paper; the authors are not trying to build up vLLM, and they probably have no incentive to do so. You can submit a PR, though! It's easier now while the divergence is low, so don't wait. Since there are six authors, I bet you could get help with the inevitable review chores if you just take the step of creating the PR.
jmalicki 5 mins ago
And with the help of AI, pointing an AI at this paper and saying "make a vLLM PR from this paper" tends to work surprisingly well, even if you need to nudge it a little along the way.