JOIN
Get Time
forums   
Search | Watch Thread  |  My Post History  |  My Watches  |  User Settings
View: Flat (newest first)  | Threaded  | Tree
Previous Thread  |  Next Thread
fast floating point sqrt() | Reply
I'm developing a program that computes real square roots for over 50% of time. Does anyone knows a way to optimize this operation, if it's possible?
Using gcc, of course.
Re: fast floating point sqrt() (response to post by bhackerozzo) | Reply
Edit: Sorry, my link was targeted towards PPC architecture specifically, I didn't notice that before.
Re: fast floating point sqrt() (response to post by bhackerozzo) | Reply
How accurate do they have to be, and what sort of range do you have to compute it for? I.e. does it have to be as close as possible, and work for any double you can throw at it?
I'm just wondering about the feasability of lookup table + interpolation, which is a common speed up for cos/sin calculation
Re: fast floating point sqrt() (response to post by bhackerozzo) | Reply
You can use Newton's method. If you want sqrt(x) then each succesive guess can be be improved by averaging it with x/guess. Here is some untested code:

double guess=1	//you can start with any number here (except 0)
for (int i=1; i<=ITERATIONS; i++)
	guess=(guess+x/guess)/2;


To increase accuracy you simply increase ITERATIONS. Although from memory this converges fast (quadratically?).
Re: fast floating point sqrt() (response to post by sql_lall) | Reply
More precisely I should calculate power 1.5 of reals in range 0.0-5.0
These are my conclusions:
1) sqrt(x*x*x) is quite fast
2) pow(x,1.5) is TOO slow
3) lookup table is faster than first method, interpolation requires addictional operations, that make it slow. much better increasing table size.
4) newton method is fine, can converge in 4-5 cycles, but is imprecise for low inputs (i.e. <0.01).

Can we improve 4) playing with bits in order to start with a better guess? for example, halving exponent, or what?
Re: fast floating point sqrt() (response to post by bhackerozzo) | Reply
1) How does it compare to x * sqrt(x)?
3) What sort of interpolation? You should be able to implement something with only a few extra adds / subtractions and probably a multiplication / division, and it'll get relatively close.
{hmm...i'll try and come up with something, but it may take some thought}
4) What are you starting with for the guess, & how precise do you need?
I.e. guess = i would possibly be a better start than guess = 1, or you might be able to add a conditional first to get a better guess (or even use a lookup for a guess).
Maybe scaling up by 4 then dividing by the answer by two could help for lower values.

Oh, and here's some code from a game making manual I have...runs correct to within 5%, apparently:
float Fast_Sqrt(float f){
    float result;
    _asm {
        mov eax, f
        sub eax, 0x3f800000
        sar eax, 1
        add eax, 0x3f800000
        mov result, eax
    }
    return result;
}
This'll have to be modified for doubles I guess, but I'm not sure what exactly as I donm't know enough asm.
Re: fast floating point sqrt() (response to post by sql_lall) | Reply
x*sqrt(x) is *slightly* fast and of course precise.
your asm code is very quick and has a mean error of about 2.4%, then it's a good start point to apply newton method.
I wrote these lines:

float fast_sqrt(float d) {
	float guess;
	asm (
		"movl %0, %%eax;"
		"subl $0x3f800000, %%eax;"
		"sar  $1, %%eax;"
		"addl $0x3f800000, %%eax;"
		"movl %%eax, %1;"
		:"=r"(guess)
		:"r"(d)
		:"eax"
	);
	guess=(guess+d/guess)/2;
	guess=(guess+d/guess)/2;
	guess=(guess+d/guess)/2;
	return guess;
}


asm is fast and with only three iterations it's very precise.
But those 3 lines are a real performance disaster!
Do i have to compute these iterations in asm too?

(thanks for any help)
Re: fast floating point sqrt() (response to post by bhackerozzo) | Reply
I don't think you can do it faster than processor, because it uses the same method internally. Most probably it also has some table for initial values to reduce number of required iterations.
The only condition you can do it faster is if you don't need full precision and do less iterations.
Re: fast floating point sqrt() (response to post by venco) | Reply
Finally, i solved the problem, learning and using SSE instructions.
These are the benchmarks on my centrino laptop (expressed in clock cycles, mean values).
//13000 cycles (!!)
powf(f,0.5)
 
//63 cycles
sqrtf(f)
 
//30 cycles, precise as sqrtf
inline float fast_sqrt(float f) {
	asm("movss  (%%eax), %%xmm0;"
	    "sqrtss  %%xmm0, %%xmm0;"
	    "movss   %%xmm0, (%%eax);"
	    ::"a"(&f):"xmm0","memory");
	return f;
}
 
//5 cycles (11 bits precision)
inline float very_fast_sqrt(float f) {
	asm("movss  (%%eax), %%xmm0;"
	    "rsqrtss %%xmm0, %%xmm0;"
	    "rcpss   %%xmm0, %%xmm0;"
	    "movss   %%xmm0, (%%eax);"
	    ::"a"(&f):"xmm0","memory");
	return f;
}

Playing with parallel instructions, it's also possible to do up to four of operations as sqrt at the *same* time. Just what i needed.
I hope this will be useful for everybody!
Re: fast floating point sqrt() (response to post by bhackerozzo) | Reply
Ignore post, my idea won't be that fast.
RSS