Let's try to answer this question by implementing the algorithm to generate gaussian distributed random numbers. There are actually two ways using box-muller transform:
Polar form: Given u and v, independent and uniformly distributed in the closed interval [−1, +1], set s = R2 = u2 + v2. If s = 0 or s ≥ 1, throw u and v away and try another pair (u, v).
After that calculate (u;v)*sqrt -2f*log[s] % s
Let's see the implementation in q:
polarform:{:x#1_raze{ u*sqrt -2f*log[s] % s:{u$u::-1+2?2.0}/[1<;2]}\[`int$ x % 2 ;0] };
Basic form: Given u and v, independent and uniformly distributed on (0, 1], just calculate:
sqrt[-2f*log u]*(sin r;cos r:2f*3.14159*v)
Let's see the implementation in q:
basicform:{hn:`int$ x % 2;pi2:2f*3.14159;
x#raze exec s*f,s*g from select s:sqrt -2.0*log u1,f:cos pi2 * u2,g:sin pi2 * u2 from ([]u1:hn?1.0;u2:hn?1.0)};
Executing the command in q with (value "\\t polarform 1000000";value "\\t basicform 1000000")
i got on my machine: 3277 155.
The implementation in basic form is about 21 times faster. Why is this the case?
Actually i dont know the answer. Most people will say the basic form avoids using scan and over. And this seems to be the bottleneck in the algorithm. Kdb+/Q is fast for problems that can be vectorized.
And is the implementation in basic form really that fast?
In c++ boost library there is a function to generate gaussian random numbers. Let's extend kdb+ with this function:
kx::K randn(kx::K k)
{
kx::vector<kx::qtype::float_> result(kx::value<kx::qtype::int_>(k));
boost::random::normal_distribution<double> dist;
std::generate(result.begin(),result.end(),boost::phoenix::bind(dist,gen));
return result();
}
Now load this function in kb+ with
randn: dll 2: (`$"randn";1)
and compare the speed from the boost library against the basic form using
(value "\\t .bst.randn 1000000";value "\\t basicform 1000000").
The result is 161 160. The basic form is as fast as the function in boost.
Let's increase the number of random numbers to 5 million:
command time
"\\t basicform 5000000" 908
"\\t polarform 5000000" 17595
"\\t .bst.randn 5000000" 795
There is a little improvement using the function from boost.
Can we still improve the speed?
Actually the cuda library thrust has a function to generate gaussian random numbers.
struct nrd : public thrust::unary_function<unsigned int,float>
{
__host__ __device__
float operator()(unsigned int thread_id)
{
unsigned int seed = hash(thread_id);
thrust::minstd_rand rng(seed);
//thrust::default_random_engine rng(seed);
thrust::random::experimental::normal_distribution<float> u(0.0f,1.0f);
return u(rng);
};
};
Now let's extend kdb+ with this function.
As you can see from the table the fastest one is using the function from cuda:
command time
"\\t basicform 5000000" 908
"\\t polarform 5000000" 17595
"\\t .cda.randn 5000000" 596
"\\t .bst.randn 5000000" 795