The required compute seems a bit high: "We trained all CIFAR-10 models on 1xB200 GPU, and all ImageNet 64×64 models on 8xB200 GPUs. The largest CIFAR-10 model uses 20 B200 hours to train, and the largest ImageNet 64×64 model uses 640 B200 hours"
20 B200 hours for CIFAR-10 seems like a lot...
andybak [3 hidden]5 mins ago
When I first learned about computer science at the age of 11 or so (in 1982 or so), the first page of the textbook put digital and analogue computers on what seemed to be an equal footing, and then proceeded to ignore the latter for the rest of the book. Apart from a few notable exceptions (https://en.wikipedia.org/wiki/Phillips_Machine), I've often wondered about analogue computing.
jerf [3 hidden]5 mins ago
If you want to understand the issue with analog computers, design a SHA-256 circuit for one of them and consider the consequences of trying to push a megabyte of data through it. While that is an extreme example I chose precisely to make the issues clear, much real computing has many of the same characteristics, just distributed a bit more widely in time and space.
Or, to put it another way, you can make anything sound good if you consider only the positives and anything sound bad if you only consider the negatives. Analog computing sounds amazing when you read the brochure and consider only the positives. But when you bring the negatives back in, it makes sense why it is not frequently used. It is not a case of the mainstream keeping some great idea down because, uh, Big Digital or something, it's a case where digital computing turns out to be a stonking good idea and it's hard for the analog world to compete and it's virtually impossible for them to ever be anything but a niche.
Neural networks are an interesting possibility for a future successful niche, although even so, it would be neural networks specifically that may grow in importance and not analog computing in general. And I still wouldn't guarantee it'll be a good idea... we may have a lot of trouble keeping what would be very deeply nested analog circuitry stable in the real world and digital may still win out, e.g., an analog neural net that has a noticeable personality shift when it gets warmer may not be the best engineering solution. That's a question for 20 or 30 years from now.
ShinyLeftPad [3 hidden]5 mins ago
Is SHA-256 something that makes sense in the analog realm, or is it something that only needs to exist due to digital constraints?
jerf [3 hidden]5 mins ago
The use case of "I have a big pile of data and I need to ensure that it survived transmission intact" isn't going anywhere. Just being in an analog computing world would not make the way some data needs to be precise go away. Not everything is a TV signal being consumed by human eyeballs.
But if you want an even better exercise, sure, work out how to send, say, precise stock market transactions where a "$5.04" becoming a "$5.05" is a very big deal to some people who have lots of money, and work out a mechanism for verifying the integrity of a lot of such data efficiently.
Bear in mind "efficiently" in this case includes the idea that "$5.04" and "$5.05" are actually close together, not separated by quite a lot of signal bandwidth. It should be of a similar size to the current digital world where that is a single bit; if you're throwing more bandwidth at your representation you've already lost to digital. Or to put it in analog terms, that needs to be pretty close to the noise barrier already; it's not a solution to make it so that you end up with "$5.04 +/- 0.000001" and "$5.05 +/- 0.000001" as the two signals you send. That is, after all, what digital is in the first place: All signals are analog in the end, and we send 0s and 1s with enough separation that the receiver can then re-amplify them into a 0 or a 1 without loss. It's not really analog if you're not hard up against the noise floor, it's just digital wearing an analog wig.
If analog is supposedly "better" than digital... at least, for the sake of argument, I recognize you did not make that claim... that would include being able to do some of these things that we do in the digital world quite comfortably. If it's just a niche... well, that's exactly where we are now in the world anyhow.
Quite a lot of real-world data is quite digital in nature. This message I'm posting is intrinsically digital. Even if I were to write something by hand, we all know, and knew even before computers, that the essence of that message is captured by a stream of letters. When we read the Gettysburg Address it doesn't even occur to us to worry about the theoretically vast amount of information we lose by not having the handwritten original. While those can be of historical interest, we all know the payload is in the digital stream of letters and words. You have probably never worried before about whether the nuances of Hacker News posts are lost because they're typed in a fixed-width font but displayed in a proportional one, because the digital text carries the vast majority of the content. Even in an "analog world" there is no escape from quite a lot of digital-characteristic data. And there is no escape from the many, many issues with truly analog solutions, such as the inability to copy data without loss. This text has undergone literally dozens of copies by the time it gets from my keyboard to your eyes, and that would drive the analog world insane. Either accept much more degradation than any of us are used to, or dedicate much, much larger amounts of bandwidth to everything in a way we would find horribly inefficient in our real world.
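To make the copying argument concrete, here is a minimal sketch in Python. It is a toy model, not a description of any real link: the hop count (36, i.e. "dozens"), the Gaussian noise level, and the 0.5 decision threshold are all assumptions chosen for illustration. Each hop adds a little noise; the "digital" chain snaps the value back to 0 or 1 after every hop, while the "analog" chain just passes the noisy value along.

```python
import random

def copy_chain(value, hops, noise, regenerate):
    """Pass a voltage-like value through `hops` noisy copies."""
    for _ in range(hops):
        value += random.gauss(0.0, noise)  # every copy adds a little noise
        if regenerate:
            # digital: re-amplify to the nearest legal level (0 or 1)
            value = 1.0 if value >= 0.5 else 0.0
    return value

random.seed(0)  # fixed seed so the run is repeatable
bits = [random.randint(0, 1) for _ in range(1000)]

# Assumed parameters: 36 hops, noise sigma = 0.05 per hop.
digital = [copy_chain(float(b), 36, 0.05, regenerate=True) for b in bits]
analog = [copy_chain(float(b), 36, 0.05, regenerate=False) for b in bits]

digital_errors = sum(1 for b, d in zip(bits, digital) if d != float(b))
analog_rms = (sum((a - b) ** 2 for a, b in zip(analog, bits)) / len(bits)) ** 0.5

print("digital bit errors:", digital_errors)
print("analog RMS drift:", round(analog_rms, 3))
```

With the levels separated by ten noise standard deviations per hop, the regenerated chain should come through intact, while the unregenerated chain accumulates drift that grows roughly with the square root of the number of hops.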
em3rgent0rdr [3 hidden]5 mins ago
Noise and component imprecision have always limited analog computing.
vintermann [3 hidden]5 mins ago
In neural networks, we seem to be pushing towards ever lower precision floats, and we use noise for all sorts of useful things.
PaulHoule [3 hidden]5 mins ago
And a general lack of reconfigurability to solve general problems. There’s been interest in analog neural networks for a long time.
Those problems you mention are important in music synthesis where people could live with limited reconfigurability but reliability is at a premium: synth players in early touring bands (e.g. Yes) had to be electronics technicians and instruments have to survive being packed in boxes and transported everywhere. The Yamaha DX-7 made FM synthesis mainstream because digital FM synthesis was absolutely reliable.
DennisP [3 hidden]5 mins ago
Analog synths are a lot more reliable these days though.
vessenes [3 hidden]5 mins ago
Also true: all computing is analog computing.
mdnahas [3 hidden]5 mins ago
My father designed processors. He says all electronics are analog. Some just pretend to act digital.
mikestorrent [3 hidden]5 mins ago
Quantum annealers (D-Wave machines) are basically analogue computers, with Josephson junctions as the primary component as opposed to oscillators. I wonder if they could render these images, too?
seanmcdirmid [3 hidden]5 mins ago
At the end of my undergrad, I remember a UW professor being poached by Intel to work on an analogue computing research project; the chair of the department at the time said that it was an opportunity that might not ever happen again and that he had to take it. I don’t think it went anywhere (since I never heard of Intel coming out with a product), but at least I knew there was an attempt.
jcims [3 hidden]5 mins ago
Take a look at Extropic. AFAICT it's a form of analog computer.
tugdual [3 hidden]5 mins ago
I love this! I used to work at Rain AI on training neural networks on unconventional hardware. People often forget that computers don't necessarily have to be electronic and digital: there is a whole domain dedicated to creating machines that can apply certain mathematical operations faster or more efficiently than their electronic counterparts. I created this site to try to create a classification of that space:
Really interesting - if I understood the article correctly, they're simulating this on conventional hardware, so in order to get the proposed benefits, it would need to be implemented in some other electronic medium.
vessenes [3 hidden]5 mins ago
Very cool. I’m reminded of Wolfram’s pitch that neural nets are a search through the very broad computational complexity of the function space they describe; he did a little work to show that you could find similar behavior in other function spaces. These oscillators are yet again a different function space, and it’s cool they can be harnessed in this way.
The question of which physical/electronic phenomenon provides the most efficient yet large enough function space to be used for inference is a really good one to think about. I have no suggestions.
WhitneyLand [3 hidden]5 mins ago
It’s not clear to me how this would ever be practical since it seems dependent on n^2 scaling.
You’ve got to wonder when you have an image generation demo why would you possibly have 64 x 64 pixel output as your demo?
If I’m understanding this properly, to generate a 4K image you need something like 5 trillion point-to-point connections on the chip. Even if power use from the oscillators is zero, that’s going to be an issue.
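For anyone who wants to redo the arithmetic, here is a small sketch. It assumes one oscillator per pixel (not something the article states), treats 4K as 3840×2160, and counts each pair of oscillators once; counting directed couplings would double the figures. Under those assumptions the 4K count lands in the trillions.

```python
def pairwise_links(n):
    """Undirected links in an all-to-all network of n nodes: n choose 2."""
    return n * (n - 1) // 2

# Assumption: one oscillator per pixel, every oscillator coupled to every other.
for label, width, height in [("64x64", 64, 64), ("4K UHD", 3840, 2160)]:
    n = width * height
    print(f"{label}: {n:,} oscillators, {pairwise_links(n):,} links")
```

The exact number matters less than the shape of the curve: doubling the oscillator count roughly quadruples the wiring.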
anigbrowl [3 hidden]5 mins ago
Yes I too am perplexed. I'm into audio synthesis so I feel I have somewhat better-than-average knowledge of oscillators, from the component or elementary mathematical level (depending on whether they're analog or digital) to complex interactions for fun and profit (frequency, phase, ring modulation).
These are cool results, but I was disappointed not to find any discussion of where oscillator array technology stands today or what the manufacturing challenges/opportunities might be. It seems like it would be prohibitively expensive for anything beyond minimal networks of a few hundred nodes that could be used in sensors. Even if you have perfectly consistent oscillators that synchronize to each other within very fine tolerances, wiring them up to each other is still a massive headache.
itishappy [3 hidden]5 mins ago
I bet 5 million coupled oscillators, all slightly detuned, would sound freakin' amazing.
What they are trying to achieve is to demonstrate that the coupling approach works in a simulated physics environment (O(n^2) as you point out) so that they can then build CMOS circuits that create actual oscillators and then let the laws of physics do the computation. This is a very bold vision!
ttul [3 hidden]5 mins ago
And anyone who has done an introductory course in VLSI design would know that capacitance (coupling) is something you usually want to get rid of. However, all kinds of amazing analog circuits have been developed over the decades that exploit coupling effects. So, their idea is not outlandish at all.
fc417fc802 [3 hidden]5 mins ago
But wouldn't capacitance as it naturally occurs be only to immediate neighbors? Not n^2 as in their model.
WhitneyLand [3 hidden]5 mins ago
Which idea is not outlandish? Physical computing? I agree physical computing is a fascinating topic.
But specifically what they’ve simulated here? I don’t see how that would ever work in real life scaled up to any kind of real size.
I’m not criticizing them for starting out small. Lots of things can be proven with small models. I’m saying in principle, I don’t see how this will work unless there’s some fundamentally new technique that is currently not known about. Maybe they have some secret idea but they haven’t shown it here.
fc417fc802 [3 hidden]5 mins ago
At 5k to 10k nodes aren't they in the ballpark of a single layer from a scaled up conventional model? Rather than scaling further presumably you could stack these. However for a physical implementation ~100M interconnects seems questionable to me (but I know next to nothing about hardware engineering TBF) so I wonder if they intend to move to a partially connected model similar to the gyroscopic model of computation that the article links to.
fluoridation [3 hidden]5 mins ago
Doesn't that require quadratically-many wires to connect all the processing units?
kannanvijayan [3 hidden]5 mins ago
I read through the article, and I'm not sure this is dependent on quadratic scaling.
Are they allowing all oscillators to influence all others, or are they picking modalities where the influences can be limited to some maximal fixed degree?
One would imagine that there'd be a variety of different topologies available to explore. Even if during training the treatment was fully connected, one could imagine the training itself biasing towards a maximal fixed degree per oscillator, and then inference later operating on a quantized version of that that drops the low-weight influences to zero.
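A rough sketch of that last idea, using plain magnitude pruning: keep only the `k` strongest incoming couplings per oscillator and zero the rest. This is a generic technique shown for illustration, not something the article describes; the function name `prune_to_degree`, the random stand-in weights, and the values of `n` and `k` are all made up.

```python
import random

def prune_to_degree(weights, k):
    """Keep the k largest-magnitude couplings in each row; zero the rest."""
    pruned = []
    for i, row in enumerate(weights):
        # Rank the other oscillators by coupling strength (skip self-coupling).
        ranked = sorted((j for j in range(len(row)) if j != i),
                        key=lambda j: abs(row[j]), reverse=True)
        keep = set(ranked[:k])
        pruned.append([w if j in keep else 0.0 for j, w in enumerate(row)])
    return pruned

random.seed(0)
n, k = 64, 8  # hypothetical sizes: 64 oscillators, at most 8 inputs each
# Random stand-in for a trained, fully connected coupling matrix.
dense = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
sparse = prune_to_degree(dense, k)

dense_links = n * (n - 1)
sparse_links = sum(1 for row in sparse for w in row if w != 0.0)
print(dense_links, "->", sparse_links)
```

The wiring then grows as n·k rather than n², though whether sample quality survives that kind of pruning is exactly the open question.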
fc417fc802 [3 hidden]5 mins ago
The oscillating elements don't map directly to pixels. Conventional models also have n^2 parameters.
WhitneyLand [3 hidden]5 mins ago
Well image generators work differently…
Do you mean that they may get away with fewer oscillators because of the decoder layer? Well, there’s the rub, isn’t it: the more work you have done by a software layer, the less power you’ve proportionally saved by having it be done by physical computing.
But let’s spitball here: what would you estimate would be needed in number of oscillators and interconnects for a 4K image?
fc417fc802 [3 hidden]5 mins ago
Conventional image generators still have to process n^2 connections so I don't think that observation is a valid objection in and of itself.
One thing I'm unclear on is that their total parameter count scales similarly to conventional models but many of those conventional models incorporate convolutions. I wonder how interconnect count (as opposed to unique parameters) compares to performance?
As to 4k images, I'm not clear how much farther their current architecture would be expected to scale. Single layer networks aren't parameter efficient compared to deep networks; I'd naively assume that to also apply here. That said given their results so far with what amounts to a single layer the naive assumption begins to seem questionable.
the8472 [3 hidden]5 mins ago
Think of the models making progress on CIFAR-10, ImageNet, CelebA, etc. 15 years ago. They had issues too and weren't just scaled-up as is to the architectures we have today.
ainch [3 hidden]5 mins ago
This method is cool and the post explains it well. It would, however, be good to get more detail on the energy efficiency they flag as their motivation: is this model actually more energy efficient than the comparators they highlight?
fc417fc802 [3 hidden]5 mins ago
It seems like total parameter count is more or less on par with conventional approaches so any gains won't be from there.
We can implement coupled oscillators in hardware but are the couplings and frequencies programmable? If they're being streamed in I guess you'd still have a memory bandwidth bottleneck and associated energy usage. If not then the fair comparison is to a conventional model hardcoded in an ASIC which AFAIU is actually quite energy efficient.
alfiedotwtf [3 hidden]5 mins ago
Do the parameters in these harmonic systems compress better? Instead of needing to hold individual parameters for each oscillator, could groupings of oscillators instead be described by their output over a given time, with that output then reversed to get the original parameters? (I’m thinking the output is like an FFT of the oscillators, which is a single value; then do an inverse FFT to get the original oscillator parameters, etc.)
dimatura [3 hidden]5 mins ago
Very cool work - refreshing to see a different approach. I learned about Kuramoto oscillators many years ago from a book called Sync, by Steven Strogatz, which I highly recommend.
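For anyone who hasn't met them, here is a minimal sketch of the textbook Kuramoto model, which may differ from whatever variant the article uses. Each oscillator has a natural frequency and is pulled toward the phases of all the others. The sizes, coupling strengths, frequency spread, and step size below are arbitrary choices for illustration.

```python
import cmath
import math
import random

def simulate_kuramoto(n, coupling, steps=1000, dt=0.02, seed=0):
    """Euler-integrate dtheta_i/dt = omega_i + (K/N) * sum_j sin(theta_j - theta_i).

    Returns the order parameter r = |mean(exp(i*theta))| at the end of the run:
    close to 0 when the phases are scattered, close to 1 when they are in sync.
    """
    rng = random.Random(seed)
    omega = [rng.gauss(0.0, 0.5) for _ in range(n)]            # natural frequencies
    theta = [rng.uniform(0.0, 2 * math.pi) for _ in range(n)]  # initial phases
    for _ in range(steps):
        # Pull on each oscillator from every other one (all-to-all coupling).
        pull = [sum(math.sin(tj - ti) for tj in theta) for ti in theta]
        theta = [ti + dt * (wi + coupling / n * p)
                 for ti, wi, p in zip(theta, omega, pull)]
    return abs(sum(cmath.exp(1j * t) for t in theta)) / n

r_uncoupled = simulate_kuramoto(32, coupling=0.0)
r_coupled = simulate_kuramoto(32, coupling=4.0)
print("r without coupling:", round(r_uncoupled, 3))
print("r with coupling:   ", round(r_coupled, 3))
```

With the coupling switched off the phases drift independently; with strong coupling they lock together. Note that the inner sum is the same all-to-all interaction people are worrying about elsewhere in this thread.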
italiansolider [3 hidden]5 mins ago
Readers beware: this requires a fair amount of physics knowledge to really understand. Not too advanced, but still, physics.
NopIdoN [3 hidden]5 mins ago
> However, the trade-off with our approach is that it requires a more complex loss that operates given only generated samples.
foax [3 hidden]5 mins ago
This kind of reminds me of DCT in lossy image compression, but in reverse.
_def [3 hidden]5 mins ago
Not at all related but still reminds me a bit of FM synthesis
fusionadvocate [3 hidden]5 mins ago
Is this somewhat related to reservoir computing?
fc417fc802 [3 hidden]5 mins ago
(Disclaimer, not my area of expertise.) It appears to be adjacent but more general. There's an entire collection of methods (including reservoir computing) that conceptually resemble or are based on physical systems in one way or another. This appears to be an attempt to develop a new method that natively takes place as a physical process that we could readily implement in hardware.
OutOfHere [3 hidden]5 mins ago
Can this even make an image having more than one "class"? Can it make an image of an astronaut riding a horse on the moon?
vessenes [3 hidden]5 mins ago
Yes, I had the same question. I don’t think so, as currently designed. It trains to specific points / classes in an embedding space. They didn’t discuss how one might go to non-trained points in the paper as far as I could read, and they did show some visualization around the idea that the runs aim at / around set points in the space.
luciana1u [3 hidden]5 mins ago
finally, a way to generate images that's slower AND worse. progress.
https://computers.tugdual.fr/