So 2000 fps, 16 million pixels, and 7000 operations per pixel, works out to 224 TFLOPS.
An RTX 4090 is advertised as being able to compute 82 TFLOPS.
Where do you think the extra is coming from? Is it just straightforward optimisations like constant folding? Or do you think the compiler is noticing that 1000 iterations of the loop doesn't change the answer and optimising it down to just 1 loop?
montag [3 hidden]5 mins ago
Impressive. Did you do all this in the past 2 hours?
kevmo314 [3 hidden]5 mins ago
Yeah, it took me an hour-ish to get the performance and then an hour-ish to figure out how to actually render it correctly. I had never seen the P2/P5 PGM format before so I didn't realize P5 files were binary. :)
> (Spoilers: it also allocates 60+ GB of RAM for intermediate results, so consider reducing the image size before running it on your own machine)
The input algorithm is essentially written in SSA form, and it's easy to analyze when each "register" stops being used; then you can drop the arrays after their last use. Turns out the number of live registers never exceeds 160 at any point in time, and with this one addition the max memory usage drops to barely above 1GB.
Jtsummers [3 hidden]5 mins ago
I don't know why you are downvoted (at the time of this writing). That's a very good point and makes the simple Python solution feasible on many more computers (due to reduced memory footprint). It's also a straightforward pass (once you know what liveness analysis is). Since there are no loops it's just one pass starting from the end and working backwards to determine whether a variable is live or not, and then an extra check on each iteration to decide whether to delete particular arrays.
tekknolagi [3 hidden]5 mins ago
I think getting downvoted (not by me) because if you scroll to the end of the post, that's the very first submission (by me).
Jtsummers [3 hidden]5 mins ago
Ah, fair. I didn't catch that.
edflsafoiewq [3 hidden]5 mins ago
Converted it to GLSL that can be dropped into shadertoy.com/new. I get about 15FPS at 714x401.
> If you evaluate the expression with (x,y) values in the ±1 square, then color pixels black or white based on their sign
I do not understand what this means at all, particularly the bit before the comma.
Can someone explain it differently?
cjs_ac [3 hidden]5 mins ago
Yeah, this took me a while. Basically, the entire file represents a sort of assembly language for a function. You need to evaluate that entire function for each pixel in the output image.
Each line starts with a line number (with a leading underscore). This is followed by an instruction, and zero, one, or two arguments. The arguments can be either floating point constants or integers (with leading underscores) which represent the results of those corresponding lines earlier in the function.
The x and y coordinates of each pixel are loaded in the assembly language using the var-x and var-y instructions.
jstanley [3 hidden]5 mins ago
Break down the range of x values from -1 to 1, and y values from -1 to 1, each into 1024 parts (or whatever resolution you want to plot at). Evaluate the function for each (x,y) coordinate. If the resulting number is negative colour the pixel white, otherwise colour it black.
Ohh.. "the +/- 1 square" means a square of size 2 centred around the origin? That was way too terse for me.
iamwil [3 hidden]5 mins ago
For a given (x, y), you color the pixel black if you're outside of a shape, and color the pixel white if you're inside the shape.
The math formula is a way to describe the contour of the shape. To tell if you're inside or outside of a shape, you plug in your (x, y) coordinate into the math formula, and if the output is > 0, then you're outside. If the output is < 0, then you're inside.
I think they call it a signed distance field. You should watch his talk that explain it. It's really great.
Can anyone explain where this blob of "assembly language" comes from? In practice, the author presumably with the desired output image and backed into this esoteric format, but I'm not following how.
What is considered an acceptable preprocessing or transformation? In the limit case, a static executable that computes nothing and outputs a constant, literal image probably violates the intention of the question.
jstanley [3 hidden]5 mins ago
> In practice, the author presumably with the desired output image and backed into this esoteric format, but I'm not following how.
I actually don't think this is how he did it. You might notice the font is kind of weird.
This is really fun, I'm getting badly nerd sniped by this.
Titan2189 [3 hidden]5 mins ago
It's been 4 hours already. How are you doing? Didn't forget to eat and drink? Just checking in on you :)
James_K [3 hidden]5 mins ago
Some of these solutions seem to copy the .vm file and rewrite it in another programming language. Is this permitted? Seems like cheating to not count the time spent compiling and transforming that code in the runtime of the program.
IshKebab [3 hidden]5 mins ago
Hash the input, if it matches the expected input output the expected image. Otherwise error.
Technically correct.
ainiriand [3 hidden]5 mins ago
This is why we can't have nice things.
mkeeter [3 hidden]5 mins ago
Quoth the article: "Obviously, you could precompute the entire image, but that's against the spirit of the challenge"
crazygringo [3 hidden]5 mins ago
Why waste time hashing?
Just output the expected image in all cases.
Still technically correct. ;)
zomglings [3 hidden]5 mins ago
Can you explain how you produce the opcodes from a text/image?
edflsafoiewq [3 hidden]5 mins ago
Looks SDF-ish. My guess is convert the letters to curves, cut them into pieces without holes so each piece is the set of points between certain curves, write down SDFs for each curve (negative on the "inside", positive on the "outside"), combine all SDFs in a piece with intersection (max), then combine all pieces with union (min).
iamwil [3 hidden]5 mins ago
the text contains a math expression. The math expression can be interpreted as an abstract syntax tree. The AST can also be translated into bytecode and interpreted by a stack-based virtual machine.
I'm thinking that we can parse the input into an expression tree, and then for each x coordinate, have each node in the tree tell us which intervals of y coordinates it would return a value < 0 (or <= 0 without loss of generality). Since we're basically dealing with real numbers we can pretend true 0 doesn't exist.
Composing these intervals is trivial for all operations except addition/subtraction. (For example, `mul` is less than 0 if exactly one of its arguments is less than 0, `neg` is less than 0 if its argument is not less than 0, `min` less than 0 if either of its arguments is less than 0, etc.).
And then we define add(a, b) as sub(a, neg(b)). So then we only need to work out which y values sub(a, b) will return a negative result for.
sub(a,b) is negative when a<b.
We can have the sub() node asks its 2 child nodes which values they would return a negative result for. Every y coordinate where a gives a negative result and b gives a positive result is a y coordinate where sub(a,b) is negative. Every y coordinate where a is positive and b is negative is positive. For the remaining y values I think we have to actually evaluate the children and find out which ones give a negative result.
Obviously memoise the evaluation so that we don't repeat any work, and turn the expression tree into a DAG by combining any identical nodes. Some subtrees won't contain var-x, so the memoisation of those nodes can persist across different x coordinates.
And then for each x coordinate we ask the expression tree to tell us which y coordinates give negative outputs, and plot those. It's possible that the idea would generalise to 2-dimensional intervals, not sure.
I haven't implemented this yet but I'm planning to try it out, but to be honest I don't expect it to be faster than Matt's GPU version based on recursive subdivision with interval arithmetic. Btw his talk is great https://www.youtube.com/watch?v=UxGxsGnbyJ4&t=80s
And, a second fun challenge would be to reverse engineer the input file to extract the font! I expect the file is basically a union of x/y translations of functions that plot individual letters, so it shouldn't be crazy hard to split out the top-level union, then split-out the x/y translations, and then collect the letters. It's possible that it's more complicated though. In particular, is each character translated in x/y individually, or is each character translated only in x to form each line of text, and then the line as a whole is translated in y? Or something weirder?
6stringmerc [3 hidden]5 mins ago
Whoa cool invocation of the magician Prospero! I recently adapted “The Tempest” as a screenplay while in jail and typed it up:
Thanks so much for writing this up and sharing. It sounds like you've been through a horrific few years, but your perseverance in creativity is inspiring and fascinating. I wish you well in your efforts to make a better life for yourself, and I hope your work finds a wider audience.
neverokay [3 hidden]5 mins ago
What compelled you in jail? I’m asking if there was more to it than just being bored.
An RTX 4090 is advertised as being able to compute 82 TFLOPS.
Where do you think the extra is coming from? Is it just straightforward optimisations like constant folding? Or do you think the compiler is noticing that 1000 iterations of the loop doesn't change the answer and optimising it down to just 1 loop?
The input algorithm is essentially written in SSA form, and it's easy to analyze when each "register" stops being used; then you can drop the arrays after their last use. Turns out the number of live registers never exceeds 160 at any point in time, and with this one addition the max memory usage drops to barely above 1GB.
https://pastebin.com/jBwNrfim
I do not understand what this means at all, particularly the bit before the comma.
Can someone explain it differently?
Each line starts with a line number (with a leading underscore). This is followed by an instruction, and zero, one, or two arguments. The arguments can be either floating point constants or integers (with leading underscores) which represent the results of those corresponding lines earlier in the function.
The x and y coordinates of each pixel are loaded in the assembly language using the var-x and var-y instructions.
The expression is (probably) a 2d signed-distance function - https://en.wikipedia.org/wiki/Signed_distance_function
The math formula is a way to describe the contour of the shape. To tell if you're inside or outside of a shape, you plug in your (x, y) coordinate into the math formula, and if the output is > 0, then you're outside. If the output is < 0, then you're inside.
I think they call it a signed distance field. You should watch his talk that explain it. It's really great.
https://www.youtube.com/watch?v=UxGxsGnbyJ4
What is considered an acceptable preprocessing or transformation? In the limit case, a static executable that computes nothing and outputs a constant, literal image probably violates the intention of the question.
I actually don't think this is how he did it. You might notice the font is kind of weird.
I think he started with an SDF font and created the "assembly language" from that. https://en.m.wikipedia.org/wiki/Signed_distance_function
Technically correct.
Just output the expected image in all cases.
Still technically correct. ;)
The op-codes are where you do that translation from an AST into bytecode. The entire 2nd half of the crafting interpreters book will guide you through how it's done. https://craftinginterpreters.com/a-bytecode-virtual-machine....
Composing these intervals is trivial for all operations except addition/subtraction. (For example, `mul` is less than 0 if exactly one of its arguments is less than 0, `neg` is less than 0 if its argument is not less than 0, `min` less than 0 if either of its arguments is less than 0, etc.).
And then we define add(a, b) as sub(a, neg(b)). So then we only need to work out which y values sub(a, b) will return a negative result for.
sub(a,b) is negative when a<b.
We can have the sub() node asks its 2 child nodes which values they would return a negative result for. Every y coordinate where a gives a negative result and b gives a positive result is a y coordinate where sub(a,b) is negative. Every y coordinate where a is positive and b is negative is positive. For the remaining y values I think we have to actually evaluate the children and find out which ones give a negative result.
Obviously memoise the evaluation so that we don't repeat any work, and turn the expression tree into a DAG by combining any identical nodes. Some subtrees won't contain var-x, so the memoisation of those nodes can persist across different x coordinates.
And then for each x coordinate we ask the expression tree to tell us which y coordinates give negative outputs, and plot those. It's possible that the idea would generalise to 2-dimensional intervals, not sure.
I haven't implemented this yet but I'm planning to try it out, but to be honest I don't expect it to be faster than Matt's GPU version based on recursive subdivision with interval arithmetic. Btw his talk is great https://www.youtube.com/watch?v=UxGxsGnbyJ4&t=80s
And, a second fun challenge would be to reverse engineer the input file to extract the font! I expect the file is basically a union of x/y translations of functions that plot individual letters, so it shouldn't be crazy hard to split out the top-level union, then split-out the x/y translations, and then collect the letters. It's possible that it's more complicated though. In particular, is each character translated in x/y individually, or is each character translated only in x to form each line of text, and then the line as a whole is translated in y? Or something weirder?
https://samhenrycliff.medium.com/adapting-shakespeares-the-t...