HN.zip

The Theoretical Limitations of Embedding-Based Retrieval

48 points by fzliu - 1 comment
gdiamos 5 mins ago
Their idea is that the capacity of even 4096-dimensional vectors limits retrieval performance.

Sparse models like BM25 have an enormous effective dimensionality (one per vocabulary term) and thus don’t suffer from this limit, but they don’t capture semantics and can’t follow instructions.
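To make the contrast concrete, here is a minimal BM25 (Okapi) sketch over a toy corpus; the corpus and query are purely illustrative, not from the thread. Each document's sparse "vector" lives in a space whose dimension is the whole vocabulary, with only the terms actually present stored, so there is no fixed-width capacity bottleneck; but scoring is purely lexical, so absent terms (synonyms, paraphrases) contribute nothing:

```python
import math
from collections import Counter

# Toy corpus (illustrative). Each document is a token list.
corpus = [
    "dense embeddings compress meaning into a fixed width".split(),
    "sparse models score exact term matches".split(),
    "bm25 weights terms by rarity and document length".split(),
]

N = len(corpus)
avgdl = sum(len(d) for d in corpus) / N
df = Counter(t for doc in corpus for t in set(doc))  # document frequency per term

def bm25_score(query, doc, k1=1.5, b=0.75):
    """Okapi BM25: sum of IDF-weighted, length-normalized term frequencies."""
    tf = Counter(doc)
    score = 0.0
    for t in query:
        if t not in tf:
            continue  # sparse behavior: absent terms contribute nothing
        idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
        denom = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf * tf[t] * (k1 + 1) / denom
    return score

query = "sparse term matches".split()
ranked = sorted(corpus, key=lambda d: bm25_score(query, d), reverse=True)
print(" ".join(ranked[0]))  # the document sharing the query's exact terms ranks first
```

The semantic blind spot is visible here: a document phrased with synonyms of the query would score zero, which is exactly the gap a sparse *semantic* model would aim to close.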

It seems like the holy grail is a sparse semantic model. I wonder how SPLADE would do?