floats/ #8

utterances-bot · 2022-05-12T09:21:22Z

Converting Integers to Floats Using Hyperfocus - Mara's Blog

A few years ago, due to some random chain of events, I ended up implementing a conversion from 128 bit integers to 64 bit floats. This would’ve turned out to be a complete waste of time, except that my final version is faster than the builtin conversion every compiler I tested. In this blog post, I’ll explain what happened, how floats work, how this conversion works, and how it got a bit out of hand.

https://blog.m-ou.se/floats/

Dushistov · 2022-05-12T09:21:23Z

A concrete example: the value 9.75 in binary looks like 1001.11,
so it is represented as a f64 with an exponent of 1026,

exponent for 9.75 is just 4(10), or 100(2), where 1026 come from?

m-ou-se · 2022-05-12T09:36:12Z

exponent for 9.75 is just 4(10), or 100(2), where 1026 come from?

@Dushistov

If you check the table with exponents:

   :   \
1026    1mmm.mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
1025     1mm.mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
1024      1m.mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
1023       1.mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
1022       0.1mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
1021       0.01mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
1020       0.001mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
   :            \

you'll see the row with 1026 matches the binary represenation of 9.75:

1026    1mmm.mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
        1001.11000… (=0.75)

Another way of saying it, is that 9.75 is 1001.11 = 1.00111 × 2³, and 3 + 1023 = 1026.

SUPERCILEX · 2022-05-13T07:20:41Z

Can these be turned back into compiler built-ins? Or is there a reason not to do that?

lilly-lizard · 2022-05-13T08:29:02Z

what a gorgeous project! I can see why you put so much time into it, lots of unholy fun here.

I found that your final code is missing a bracket for the mantissa calculation and fails my tests as a result :'(

let m = a + (b - (b >> 63 & !a)) >> 63;

should be:

let m = a + ((b - (b >> 63 & !a)) >> 63);

because + has stronger precedence than >>

(although I personally prefer the following:)

let m = ((a) + (((b) - (((b) >> (63)) & (!a))) >> (63)));

m-ou-se · 2022-05-13T08:32:18Z

@lilly-lizard Oh oops. Those disappeared while editing this blog post somehow. Fixed. :)

m-ou-se · 2022-05-13T08:34:41Z

Can these be turned back into compiler built-ins? Or is there a reason not to do that?

@SUPERCILEX Yeah, I opened a PR for the compiler_builtins crate, two years ago, but got distracted and never finished it. It should probably also end up in Clang's compiler-rt and Gcc's libgcc, though, for it to be useful on all platforms.

SUPERCILEX · 2022-05-13T17:14:00Z

Haha, makes sense!

scips · 2022-05-15T05:55:08Z

I don´t know how I ended reading your blog on a Sunday morning but reading you gave me a great refreshing kick on how one person can share something that will speed up the work of millions.
Thanks for sharing! And I love the small notes of humor all along the journey. I was almost feeling what you felt with the "Boom" 💥, the "Uh", the "wait"... and I was thinking on how you wrote your post while revisiting your journey.
Sharing is caring!
Have a nice day!

DavidHarper · 2022-05-21T09:27:34Z

I'm reminded of some code that I wrote more than 30 years ago when I needed to convert a tape full of binary data from IBM VM/370 native 64-bit float to IEEE-754 64-bit float, without loss of any accuracy. The VM/370 used a base-16 exponent, and therefore (unlike IEEE-754) no implied leading 1. I could have written a FORTRAN program to dump the data in decimal format on the IBM and read it back on the target machine, but that seemed way too easy, and I was young and cocky, so I read the IBM assembler manual and got to work. I still have that code. Here's a gist of it, but good luck finding a machine that will run it: https://gist.github.com/DavidHarper/ba20799c6b72d6131f263316d53723fb

steven-wheeler · 2022-06-02T05:02:17Z

This reminds me of dealing with the 32-bit float format about 25 years ago, when I was trying to implement as much floating-point as I could fit into a BASIC interpreter running on a PIC16C74 processor.

lowasser · 2022-06-04T22:42:26Z

Enjoyed this a lot! Once upon a time I did something similar for Java's BigInteger.doubleValue(): https://bugs.openjdk.java.net/browse/JDK-7131192. But you managed to make yours branch-free, and much smoother and more compact! Very nice.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

floats/ #8

floats/ #8

utterances-bot commented May 12, 2022

Dushistov commented May 12, 2022

m-ou-se commented May 12, 2022

SUPERCILEX commented May 13, 2022

lilly-lizard commented May 13, 2022

m-ou-se commented May 13, 2022

m-ou-se commented May 13, 2022

SUPERCILEX commented May 13, 2022

scips commented May 15, 2022

DavidHarper commented May 21, 2022

steven-wheeler commented Jun 2, 2022

lowasser commented Jun 4, 2022

floats/ #8

floats/ #8

Comments

utterances-bot commented May 12, 2022

Converting Integers to Floats Using Hyperfocus - Mara's Blog

Dushistov commented May 12, 2022

m-ou-se commented May 12, 2022

SUPERCILEX commented May 13, 2022

lilly-lizard commented May 13, 2022

m-ou-se commented May 13, 2022

m-ou-se commented May 13, 2022

SUPERCILEX commented May 13, 2022

scips commented May 15, 2022

DavidHarper commented May 21, 2022

steven-wheeler commented Jun 2, 2022

lowasser commented Jun 4, 2022