Dienstag, 11. Oktober 2011

Ist KDB+/Q really that fast?

Let's try to answer this question by implementing the algorithm to generate gaussian distributed random numbers. There are actually two ways using box-muller transform:

Polar form: Given u and v, independent and uniformly distributed in the closed interval [−1, +1], set s = R2 = u2 + v2. If s = 0 or s ≥ 1, throw u and v away and try another pair (uv).
After that calculate (u;v)*sqrt -2f*log[s] % s
Let's see the implementation in q: 
polarform:{:x#1_raze{ u*sqrt -2f*log[s] % s:{u$u::-1+2?2.0}/[1<;2]}\[`int$ x % 2 ;0] };

Basic form: Given u and v, independent and uniformly distributed on (0, 1], just calculate:
sqrt[-2f*log u]*(sin r;cos r:2f*3.14159*v)
Let's see the implementation in q: 
basicform:{hn:`int$ x % 2;pi2:2f*3.14159;
                   x#raze exec  s*f,s*g from select s:sqrt -2.0*log u1,f:cos pi2 * u2,g:sin pi2 * u2 from ([]u1:hn?1.0;u2:hn?1.0)};

Executing the command in q with (value "\\t polarform 1000000";value "\\t basicform 1000000")
i got on my machine: 3277 155.

The implementation in basic form is about 21 times faster. Why is this the case?
Actually i dont know the answer. Most people will say the basic form avoids using scan and over. And this seems to be the bottleneck in the algorithm. Kdb+/Q is fast for problems that can be vectorized.

And is the implementation in basic form really that fast?
In c++ boost library there is a function to generate gaussian random numbers. Let's extend kdb+ with this function:

kx::K randn(kx::K k)
{
    kx::vector<kx::qtype::float_> result(kx::value<kx::qtype::int_>(k));
    boost::random::normal_distribution<double> dist;
    std::generate(result.begin(),result.end(),boost::phoenix::bind(dist,gen));

    return result();
}

Now load this function in kb+ with

randn: dll 2: (`$"randn";1)

and compare the speed from the boost library against the basic form using
(value "\\t .bst.randn 1000000";value "\\t basicform 1000000").
The result is 161 160. The basic form is as fast as the function in boost.

Let's increase the number of random numbers to 5 million:

command    time
"\\t basicform 5000000"    908
"\\t polarform 5000000"    17595
"\\t .bst.randn 5000000"    795

There is a little improvement using the function from boost. 
Can we still improve the speed?
Actually the cuda library thrust has a function to generate gaussian random numbers. 


struct nrd : public thrust::unary_function<unsigned int,float>
{

  __host__ __device__
  float operator()(unsigned int thread_id)
  {
    unsigned int seed = hash(thread_id);
    thrust::minstd_rand rng(seed);
    //thrust::default_random_engine rng(seed);
    thrust::random::experimental::normal_distribution<float> u(0.0f,1.0f);

    return u(rng);
  };


};



Now let's extend kdb+ with this function.


As you can see from the table the fastest one is using the function from cuda:


command    time
"
\\t basicform 5000000"    908
"
\\t polarform 5000000"    17595
"
\\t .cda.randn 5000000"    596
"
\\t .bst.randn 5000000"    795
 

Keine Kommentare:

Kommentar veröffentlichen