Will a Kolmogorov-Arnold Network based model achieve SOTA on some significant machine learning tasks in 2024?
Dec 31 · 15% chance

https://arxiv.org/abs/2404.19756

https://github.com/KindXiaoming/pykan?tab=readme-ov-file

https://news.ycombinator.com/item?id=40219205

This sounds big to me.

I'm not sure exactly how to set the criteria. Perhaps based on https://paperswithcode.com/sota?

Someone who knows more about ML benchmarks please chime in.


These KANs are "just" rebranded symbolic regressors, with nonparametric functions taking the place of a bag of functions like "exp", "cos", and so on. The paper is a master class in doing this well, and it is super fun to read. So the results are not very surprising: we get very high expressive power and good interpretability, but also the usual pitfalls. This is much more complex than a bunch of matmuls, i.e., slow.
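For intuition, here is a minimal numpy sketch of the core idea (mine, not pykan's API; names like `edge_function` and `kan_layer` are made up): every edge carries its own learnable 1D function, here a weighted sum of fixed Gaussian bumps standing in for the paper's B-spline bases, and a layer just sums those edge functions into each output.

```python
import numpy as np

def edge_function(x, coeffs, centers, width=0.5):
    """Learnable 1D function on a single KAN edge: a weighted sum of fixed
    radial 'bumps' (a stand-in for B-spline bases).
    x: (batch,), coeffs: (num_bases,) trainable, centers: (num_bases,)."""
    bases = np.exp(-((x[:, None] - centers[None, :]) / width) ** 2)  # (batch, num_bases)
    return bases @ coeffs  # (batch,)

def kan_layer(x, coeffs, centers):
    """One KAN-style layer: out_j = sum_i phi_{j,i}(x_i).
    x: (batch, d_in); coeffs: (d_out, d_in, num_bases)."""
    batch, d_in = x.shape
    d_out = coeffs.shape[0]
    out = np.zeros((batch, d_out))
    for j in range(d_out):
        for i in range(d_in):
            out[:, j] += edge_function(x[:, i], coeffs[j, i], centers)
    return out

# toy usage: a 2 -> 3 layer with 8 basis functions per edge
rng = np.random.default_rng(0)
centers = np.linspace(-2, 2, 8)
coeffs = rng.normal(size=(3, 2, 8))
x = rng.normal(size=(16, 2))
print(kan_layer(x, coeffs, centers).shape)  # (16, 3)
```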

They approximate the functions with B-splines. The basis functions are defined by the Cox-de Boor recursion, and stable evaluation algorithms like De Boor's run in O(k^2), where k is the degree of the spline. So the action of just one neuron already hides a lot of complexity, and you pay that cost at both inference and training time. It is not clear that this can scale easily.
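To make the cost concrete, here is a textbook De Boor evaluation in plain Python/numpy (a sketch, not the pykan implementation): the nested loops over the degree p are the O(k^2) per-point work mentioned above, and something like it sits inside every edge function.

```python
import numpy as np

def de_boor(k, x, t, c, p):
    """Evaluate a degree-p B-spline with knot vector t and scalar coefficients c
    at x, where k is the knot interval with t[k] <= x < t[k+1]."""
    d = [c[j + k - p] for j in range(p + 1)]
    # the nested loops below are the O(p^2) work per evaluation point
    for r in range(1, p + 1):
        for j in range(p, r - 1, -1):
            alpha = (x - t[j + k - p]) / (t[j + 1 + k - r] - t[j + k - p])
            d[j] = (1.0 - alpha) * d[j - 1] + alpha * d[j]
    return d[p]

# toy usage: cubic (p=3) spline, clamped uniform knots, 7 coefficients
p = 3
t = np.array([0, 0, 0, 0, 1, 2, 3, 4, 4, 4, 4], dtype=float)
c = np.array([0.0, 1.0, -1.0, 2.0, 0.5, -0.5, 1.5])
x = 1.5
k = np.searchsorted(t, x, side="right") - 1  # knot interval containing x
print(de_boor(k, x, t, c, p))
```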

On the bright side, there's lots of work on using splines for computer graphics, so perhaps something can be adapted relatively easily. Or maybe a huge lookup table might do the trick.
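A rough sketch of the lookup-table idea (my illustration, nothing from the repo): after training, tabulate each learned edge function once on a dense grid, then replace spline evaluation at inference with cheap linear interpolation.

```python
import numpy as np

def tabulate(phi, lo=-3.0, hi=3.0, n=1024):
    """Sample a learned 1D edge function phi on a dense grid, once, after training."""
    grid = np.linspace(lo, hi, n)
    return grid, phi(grid)

def lookup_eval(x, grid, table):
    """Inference-time replacement for the spline: linear interpolation per input
    instead of an O(k^2) recursion. np.interp clamps values outside the grid."""
    return np.interp(x, grid, table)

# toy usage with an arbitrary stand-in for a trained edge function
grid, table = tabulate(lambda z: np.sin(z) * np.exp(-z**2 / 4))
x = np.array([-1.2, 0.0, 0.7])
print(lookup_eval(x, grid, table))
```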

Alternatively, someone could swap out splines for something better behaved, like piecewise linear approximations.

PS: I created a market operationalizing ubiquity/SOTA differently, as significant adoption at NeurIPS by 2027 https://manifold.markets/jgyou/will-kolmogorovarnold-networks-kan

Case in point: here is someone pointing out that a piecewise-linear approximation gives you an MLP back, and thus good scaling: https://twitter.com/bozavlado/status/1787376558484709691
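A minimal numpy illustration of that observation (mine, not the linked code): a piecewise-linear edge function with fixed breakpoints is exactly a weighted sum of shifted ReLUs, i.e., a tiny one-hidden-layer MLP, so a network built from such edges reduces to ordinary matmuls.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def piecewise_linear_edge(x, breakpoints, weights, bias=0.0):
    """phi(x) = bias + sum_m weights[m] * relu(x - breakpoints[m]):
    a piecewise-linear function whose slope changes at each breakpoint.
    This is literally a 1-input, 1-output MLP with len(breakpoints) hidden
    ReLU units, so a 'KAN' built from these edges is an ordinary MLP."""
    return bias + relu(x[:, None] - breakpoints[None, :]) @ weights

# toy usage: 5 breakpoints -> 5 hidden ReLUs per edge
rng = np.random.default_rng(0)
bps = np.linspace(-2, 2, 5)
w = rng.normal(size=5)
x = np.linspace(-3, 3, 7)
print(piecewise_linear_edge(x, bps, w))
```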

They did not even run an experiment on MNIST, bruh.

bought Ṁ20 NO

My first impression: the Universal Approximation Theorem means the fancy learnable function can be replaced by a bunch of simple non-linearity nodes (e.g. ReLU) to give the same result, possibly more efficiently in real-life applications once you account for the additional complexity of the spline thingy. But I may be missing something.
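As a quick numerical illustration of that intuition (my sketch, with an arbitrary target function): a fixed bank of shifted ReLU features fitted by least squares already tracks a smooth 1D function closely, which is the kind of replacement the Universal Approximation Theorem licenses.

```python
import numpy as np

# Least-squares fit of a 'fancy' 1D function by a fixed bank of ReLU features.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 400)
target = np.sin(2 * x) * np.exp(-x**2 / 8)            # stand-in for a learned spline
knots = np.linspace(-3, 3, 32)
features = np.maximum(x[:, None] - knots[None, :], 0.0)  # 32 shifted ReLUs
features = np.concatenate([features, np.ones((x.size, 1))], axis=1)  # plus a bias term
coef, *_ = np.linalg.lstsq(features, target, rcond=None)
approx = features @ coef
print("max abs error:", np.max(np.abs(approx - target)))
```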

bought Ṁ10 YES

From the author: "although we have to be honest that KANs are slower to train due to their learnable activation functions"

https://twitter.com/ZimingLiu11/status/1785489312563487072?t=oDCxKpId2MY3pfe7O84NBA&s=19

bought Ṁ30 NO

Looks like a cool idea