Re: What is acceptable for -ffast-math? (2001)

Linus Torvalds - Re: What is acceptable for -ffast-math? (Was: associative law in combine)

This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


  • To: <dewar at gnat dot com>
  • Subject: Re: What is acceptable for -ffast-math? (Was: associative law in combine)
  • From: Linus Torvalds <torvalds at transmeta dot com>
  • Date: Tue, 31 Jul 2001 15:50:28 -0700 (PDT)
  • cc: <gdr at codesourcery dot com>, <fjh at cs dot mu dot oz dot au>, <gcc at gcc dot gnu dot org>, <moshier at moshier dot ne dot mediaone dot net>, <tprince at computer dot org>


On Tue, 31 Jul 2001 dewar@gnat.com wrote:
> Well it sure would be nice to hear from some of these mythical numerical
> programmers (I don't care if they are writing games or nuclear reactor codes)
> who would be happier, so far we haven't heard this! And in my experience,
> even quite inexperienced floating-point numerical programmers are very
> disturbed when optimization changes the results of their programs.

I used -ffast-math myself, when I worked on the quake3 port to Linux (it's
been five years, how time flies).

It didn't make much difference at that point, because the x86 had
hand-written assembly, and gcc for the alpha didn't do much (anything?)
with -ffast-math.

But I tried _everything_. The main FP work I did on that thing on the
alpha improved the framerate by about 50% on alpha - FP was _that_
critical for it.  Most of it was by looking at gcc output and trying to
re-organize the C code to make it be better (because gcc didn't do much on
its own).

And yes, it was exactly things like multiplying by reciprocals.
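The reciprocal trick is easy to show in isolation. Below is a minimal sketch in Python, whose floats are IEEE 754 doubles on the usual platforms; the helper names are hypothetical and no compiler is involved. It compares `x / y`, a single correctly rounded operation, with `x * (1.0 / y)`, which rounds twice. That extra rounding is why a compiler needs permission before making the swap (in modern GCC the permission is `-freciprocal-math`, which `-ffast-math` turns on).

```python
# Sketch: compare true division with the "multiply by the reciprocal" rewrite.
# Python floats are IEEE 754 doubles, so this shows what strict IEEE code computes.


def divide(x: float, y: float) -> float:
    """One operation, one rounding: the correctly rounded quotient."""
    return x / y


def multiply_by_reciprocal(x: float, y: float) -> float:
    """Hypothetical model of the fast-math rewrite: two roundings instead of one."""
    r = 1.0 / y  # first rounding: 1/y is rounded to the nearest double
    return x * r  # second rounding: the product is rounded again


def count_mismatches(y: float, n: int) -> int:
    """Count the integers x in 1..n for which the two forms disagree."""
    return sum(
        1
        for i in range(1, n + 1)
        if divide(float(i), y) != multiply_by_reciprocal(float(i), y)
    )


if __name__ == "__main__":
    print(divide(10.0, 3.0))  # 3.3333333333333335
    print(multiply_by_reciprocal(10.0, 3.0))  # 3.333333333333333 (one ulp lower)
    print(count_mismatches(3.0, 1000))
```

A one-ulp difference like this is invisible in a frame rate and visible to anyone comparing outputs bit for bit, which is the disagreement in this thread.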

> > Your arguments about "numerical computation" are just silly, as you don't
> > seem to realize that there are tons of problems where your theoretical
> > issues are nothing more than noise.
>
> If you think the arguments are silly, then I really fear you lack the full
> context for this discussion, a discussion that has, as you should know raged
> for well over thirty years.

Most of the kind of "numerical" work that you seem to be talking about has
probably rather little to do with FP performance. Most of the traditional
heavy FP code tends to be _much_ more about cache layout and good memory
access patterns.

I'm personally acquainted with trying to make a game engine go fast, where
the memory effects are fewer, and the FP itself is the bottleneck.

> Sure -ffast-math is precisely intended to allow transformations that would
> not otherwise be allowed (let's not call them optimizations, that's just
> too controversial a word in the context of this argument).

Why not call them optimizations? They are. The only thing we change is the
boundary of valid ranges.

> The question is what is the boundary of allowable transformations. No one
> agrees that there should be no boundaries (even you don't like the change
> results to zero, though note that abandoning denormals has exactly this
> effect, and might be considered acceptable).

Oh, round-to-zero is definitely acceptable in the world of "who cares
about IEEE, we want fast math, and we'll use fixed arithmetic if the FP
code is too slow".

In fact, it is _so_ acceptable that CPU designers design for it. Look at
MMX2, and wonder why they have a RTZ mode? Because it makes the _hardware_
go faster.

That should tell you something. Big companies that have billion-dollar
fabs spend time optimizing their chips that take several years to design
for _games_. Not for IEEE traditional Fortran-kind math.

But apparently some gcc developers don't think that is even a worthy
market, because you just want to do fluid dynamics.
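Dewar's remark that abandoning denormals changes results to zero can be made concrete. The sketch below models flush-to-zero (FTZ) in software with a hypothetical `flush_to_zero` helper, since Python cannot switch the FPU into such a mode; real FTZ is applied by the hardware to every operation. It shows the property that gradual underflow preserves and FTZ gives up: with denormals, `x != y` implies `x - y != 0`.

```python
import math
import sys

# Smallest positive normal double, 2**-1022; anything smaller and nonzero is subnormal.
MIN_NORMAL = sys.float_info.min


def flush_to_zero(v: float) -> float:
    """Hypothetical software model of FTZ: a subnormal result becomes a zero of the same sign."""
    if v != 0.0 and abs(v) < MIN_NORMAL:
        return math.copysign(0.0, v)
    return v


x = 1.5 * MIN_NORMAL  # exactly 1.5 * 2**-1022, still a normal number
y = MIN_NORMAL  # exactly 2**-1022

diff = x - y  # exactly 2**-1023: subnormal, but nonzero (gradual underflow)
diff_ftz = flush_to_zero(x - y)  # what an FTZ unit would deliver: 0.0

print(x != y)  # True
print(diff > 0.0)  # True
print(diff_ftz == 0.0)  # True, even though x and y differ
```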

> So, what is the boundary, can one for instance forget about denormals and
> flush to zero to save a bit of time, can one truncate instead of round,
> can one ignore negative zeroes, or infinity semantics, can one ignore
> intermediate overflow (note: nearly all the discussed transformations are
> implicitly saying yes to this last question).

Do them all by default with -ffast-math.

Then, you can have specific flags for people who want just _one_
optimization. I doubt you'll find many users who do that, but maybe I'm
wrong. Giving people the choice is always a good idea.
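Dewar's last item, intermediate overflow, is the one that reassociation quietly touches. A minimal sketch, with made-up values chosen so the inner product is exact: regrouping `(a * b) * c` as `a * (b * c)` is algebraically harmless, yet one grouping overflows to infinity and the other does not.

```python
import math

a = 1e308  # close to the largest finite double (about 1.8e308)
b = 2.0
c = 0.5  # b * c is exactly 1.0

left = (a * b) * c  # a * b = 2e308 overflows to inf, and inf * 0.5 stays inf
right = a * (b * c)  # a * 1.0 is just a, no overflow

print(left, right)  # inf 1e+308
print(math.isinf(left), math.isinf(right))  # True False
```

Under strict IEEE rules the compiler must keep the source grouping; a flag that permits reassociation implicitly says yes to Dewar's question, as his parenthetical note points out.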

> I have not seen anyone writing from the point of view of serious numerical
> coding saying [ .. ]

There you go again. What the hell do you call "serious numerical coding"?

Take a look at the computer game market today. It's a lot more serious
than most mathematicians puttering around in their labs, let me tell you.
That's a BIG industry.

Also note that _nobody_ in your kind of "serious numerical coding"
community would ever worry about "-ffast-math" in the first place. Why the
hell would they, when 99% of the time it doesn't make any difference at
all. The people you apparently consider serious are a lot more interested
in fast communication (so that they can solve the thing in parallel) and
incredible memory bandwidth.

I doubt you'll find many of your "serious numerical coding" people who
would even _notice_ the raw FP throughput. Look at SpecFP - CPUs are fast
enough, it spends most of its time waiting on memory.

> Should -ffast-math allow full precision operation? I would think so,
> since it definitely improves performance, and reduces surprises.

Ehh.. gcc right now allows full precision operation BY DEFAULT. No
-ffast-math required.

Same goes for negative zero as far as I remember - on some HP-PA stuff at
least. Simply because it was too painful to get "right" on their earlier
hardware.
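Negative zero is a good example of why "harmless" rewrites need permission. The sketch below, using a hypothetical `add_zero` helper, shows that `x + 0.0` is not an identity under IEEE rules: for `x = -0.0` the sum is `+0.0`, so folding `x + 0.0` to `x` is only valid if the sign of zero may be ignored (modern GCC calls this `-fno-signed-zeros`, one of the options `-ffast-math` enables).

```python
import math


def add_zero(x: float) -> float:
    """Hypothetical helper: the expression a compiler would like to fold to just `x`."""
    return x + 0.0


neg = -0.0

print(neg == 0.0)  # True: the two zeros compare equal
print(math.copysign(1.0, neg))  # -1.0: but the sign bit is set
print(math.copysign(1.0, add_zero(neg)))  # 1.0: -0.0 + 0.0 rounds to +0.0
print(math.copysign(1.0, neg * 1.0))  # -1.0: multiplying by 1.0 keeps the sign
```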

In short, what you seem to argue that -ffast-math should mean are all
things gcc _already_ does, with no

> By the way, I said I would be shocked to find a Fortran compiler that did
> associative redistribution in the absence of parens. I am somewhat surprised
> that no one stepped forward with a counter-example, but I suspect in fact that
> there may not be any shocking Fortran implementations around.

I would suspect that very few people have Fortran compilers around and
bother to check it.
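The "associative redistribution" Dewar mentions is what lets a compiler regroup sums, for example to split a loop into several partial sums for vector units. A minimal sketch in Python: the first part is the classic three-term example, and the second models a hypothetical four-lane vectorized reduction (`sum_in_lanes` is an illustrative name, not a real API). Modern GCC gates this kind of regrouping on `-fassociative-math`, again part of `-ffast-math`.

```python
import math
from typing import Sequence


def sum_sequential(xs: Sequence[float]) -> float:
    """Left-to-right sum, the order a strict compiler must preserve."""
    total = 0.0
    for v in xs:
        total += v
    return total


def sum_in_lanes(xs: Sequence[float], lanes: int = 4) -> float:
    """Hypothetical model of a vectorized reduction: `lanes` partial sums, combined at the end."""
    partial = [0.0] * lanes
    for i, v in enumerate(xs):
        partial[i % lanes] += v
    return sum_sequential(partial)


left = (0.1 + 0.2) + 0.3  # grouping as written
right = 0.1 + (0.2 + 0.3)  # regrouped
print(left, right, left == right)  # 0.6000000000000001 0.6 False

xs = [0.1 * k for k in range(1, 1001)]
# The two orders may differ in the last bits, but both stay close to the exact sum.
print(sum_sequential(xs), sum_in_lanes(xs), math.fsum(xs))
```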

> It is an old argument, the one that says that fpt is approximate, so why bother
> to be persnickety about it. Seymour Cray always took this viewpoint, and it
> did not bother him that 81.0/3.0 did not give exactly 27.0 on the CDC 6000
> class machines.

..and he was universally respected for making the fastest machines around.

What you forget to mention is that these days it's so _cheap_ to get
IEEE, that from a hardware standpoint pretty much everybody includes it
anyway.
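What IEEE 754 buys in Dewar's example is correct rounding: the basic operations (+, -, *, / and square root) return the exact result rounded to the nearest representable value. Under that rule 81.0/3.0 cannot miss, because the exact quotient 27 is representable. The sketch below checks this in Python, using `fractions.Fraction` for the exact quotient; `is_correctly_rounded_quotient` is a hypothetical helper, and the check assumes CPython's `float(Fraction)` rounds to nearest.

```python
from fractions import Fraction


def is_correctly_rounded_quotient(x: float, y: float) -> bool:
    """True if x / y equals the exact rational quotient rounded to the nearest double."""
    exact = Fraction(x) / Fraction(y)  # exact: converting a float to Fraction is lossless
    return x / y == float(exact)  # float() rounds the exact quotient to nearest


print(81.0 / 3.0 == 27.0)  # True
print(all(is_correctly_rounded_quotient(float(n), 3.0) for n in range(1, 1001)))  # True
```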

But they then often throw it away because it ends up having expensive
run-time issues (alpha with exception handling and proper denormals, Intel
with special RTZ modes etc).

Why? Because in many areas Seymour Cray is _still_ right. The thing that
killed off non-IEEE was not that he was wrong, but the fact that _some_
people do need IEEE "exact FP". Not everybody. Not even the majority. But
because some people do need it, you need to support it. Which is why
everybody does, today.

But do you see the difference between

  "We have to support it because a portion of the user base has to have
   it, and if we don't have it we won't be able to sell to any of that
   user base"

and

  "Everybody must use it, because anything else is wrong"

Eh?

Do you see that Seymour's approach didn't fail because he was always wrong?
It failed because he was _sometimes_ wrong.

And you know what? He was right enough of the time to have built up an
empire for a while. That's something not everybody can say about
themselves. And that is something that you should respect.

			Linus


