Tuesday, 11 October 2011

Is KDB+/Q really that fast?

Let's try to answer this question by implementing an algorithm that generates Gaussian-distributed random numbers. The Box-Muller transform comes in two forms:

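Polar form: Given u and v, independent and uniformly distributed in the closed interval [−1, +1], set $s = R^2 = u^2 + v^2$. If $s = 0$ or $s \ge 1$, throw u and v away and try another pair $(u, v)$.
Then compute both deviates at once with the q expression (u;v)*sqrt -2f*log[s] % s, which q evaluates right to left as $(z_0, z_1) = (u, v)\sqrt{-2\ln s / s}$.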
Let's see the implementation in q: 
polarform:{:x#1_raze{ u*sqrt -2f*log[s] % s:{u$u::-1+2?2.0}/[1<;2]}\[`int$ x % 2 ;0] };
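For comparison, here is a minimal C++ sketch of the same polar (Marsaglia) method. It is not part of the benchmark; the engine (std::mt19937_64), the seed and the name polar_form are my own choices for illustration. Like the q version, it redraws pairs until one lands strictly inside the unit disk and emits two deviates per accepted pair, then trims the result to n.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Marsaglia polar method: each accepted (u, v) pair yields two N(0,1) deviates.
std::vector<double> polar_form(std::size_t n, unsigned seed = 42) {
    std::mt19937_64 gen(seed);
    std::uniform_real_distribution<double> unif(-1.0, 1.0);  // u, v in [-1, 1)
    std::vector<double> out;
    out.reserve(n + 1);
    while (out.size() < n) {
        double u, v, s;
        do {  // rejection loop: keep only points strictly inside the unit disk
            u = unif(gen);
            v = unif(gen);
            s = u * u + v * v;
        } while (s == 0.0 || s >= 1.0);
        const double factor = std::sqrt(-2.0 * std::log(s) / s);
        out.push_back(u * factor);
        out.push_back(v * factor);
    }
    out.resize(n);  // drop the surplus deviate when n is odd (like x# in q)
    return out;
}
```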

Basic form: Given u and v, independent and uniformly distributed on (0, 1], just calculate:
sqrt[-2f*log u]*(sin r;cos r:2f*3.14159*v)
Let's see the implementation in q: 
basicform:{hn:`int$ x % 2;pi2:2f*3.14159;
                   x#raze exec  s*f,s*g from select s:sqrt -2.0*log u1,f:cos pi2 * u2,g:sin pi2 * u2 from ([]u1:hn?1.0;u2:hn?1.0)};
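A C++ sketch of the basic form, again with an illustrative name (basic_form) and engine. Like the q code, it first fills two columns of uniforms and then transforms them column by column, concatenating the cosine results and the sine results. One small difference: q's hn?1.0 draws from [0, 1), so u1 can in principle be 0, where log is -inf; the sketch flips the draws to (0, 1] to keep the logarithm finite.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Basic Box-Muller: two independent uniforms on (0, 1] give two N(0,1) deviates.
// Two passes over whole arrays, mirroring the column-wise q version.
std::vector<double> basic_form(std::size_t n, unsigned seed = 42) {
    const double two_pi = 2.0 * std::acos(-1.0);
    const std::size_t half = (n + 1) / 2;
    std::mt19937_64 gen(seed);
    std::uniform_real_distribution<double> unif(0.0, 1.0);  // [0, 1)
    std::vector<double> u1(half), u2(half);
    for (std::size_t i = 0; i < half; ++i) {
        u1[i] = 1.0 - unif(gen);  // map [0, 1) to (0, 1] so log(u1) stays finite
        u2[i] = unif(gen);
    }
    std::vector<double> out(2 * half);
    for (std::size_t i = 0; i < half; ++i) {
        const double r = std::sqrt(-2.0 * std::log(u1[i]));
        out[i] = r * std::cos(two_pi * u2[i]);         // first half: cosine deviates
        out[half + i] = r * std::sin(two_pi * u2[i]);  // second half: sine deviates
    }
    out.resize(n);
    return out;
}
```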

Timing both functions in q with (value "\\t polarform 1000000";value "\\t basicform 1000000")
gave 3277 155 on my machine, i.e. 3277 ms for the polar form and 155 ms for the basic form.

The implementation in basic form is about 21 times faster. Why is this the case?
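I don't know the definitive answer. Most people will say that the basic form avoids the scan (\) and over (/) iterators, and these seem to be the bottleneck here: the polar form calls an interpreted lambda once per pair of numbers (and again for every retry of the rejection loop), while the basic form applies each primitive (log, sqrt, cos, sin) to whole vectors at once. Kdb+/Q is fast for problems that can be vectorized.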
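Part of the polar form's extra work is the rejection loop itself. A pair drawn uniformly from the square $[-1, 1]^2$ is kept only if it lands inside the unit disk, so the acceptance probability is the ratio of the two areas, and the expected number of tries per accepted pair is its reciprocal:

$$
P(\text{accept}) = \frac{\pi \cdot 1^2}{2^2} = \frac{\pi}{4} \approx 0.785,
\qquad
E[\text{tries per pair}] = \frac{4}{\pi} \approx 1.27
$$

In the q implementation each of those tries is one more interpreted call of the inner lambda, on top of the outer scan that already runs once per pair.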

But is the basic form really that fast?
The C++ Boost library has a function for generating Gaussian random numbers, so let's extend kdb+ with it:

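#include <algorithm>
#include <boost/random/mersenne_twister.hpp>
#include <boost/random/normal_distribution.hpp>
// kx:: is the C++ wrapper around the kdb+ C API used in this post (header not shown)

boost::random::mt19937 gen;  // global engine; assumed, since the original uses gen without declaring it

// randn[n]: return a q float vector of n standard normal deviates
kx::K randn(kx::K k)
{
    kx::vector<kx::qtype::float_> result(kx::value<kx::qtype::int_>(k));
    boost::random::normal_distribution<double> dist;  // mean 0, sigma 1
    // the lambda uses gen by reference, so every element gets a fresh draw
    std::generate(result.begin(), result.end(), [&] { return dist(gen); });
    return result();
}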

Now load this function into kdb+ with the following, where dll is the path (a file symbol) of the compiled shared library:

.bst.randn: dll 2: (`randn;1)

and compare the speed of the Boost function with the basic form using
(value "\\t .bst.randn 1000000";value "\\t basicform 1000000").
The result is 161 160, i.e. 161 ms for Boost and 160 ms for the basic form: the basic form is as fast as the function in Boost.

Let's increase the number of random numbers to 5 million:

command                    time (ms)
\t basicform 5000000             908
\t polarform 5000000           17595
\t .bst.randn 5000000            795

The Boost function is slightly faster (795 ms versus 908 ms for the basic form).
Can we still improve the speed?
The CUDA library Thrust also provides a normal distribution for generating Gaussian random numbers on the GPU.


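#include <thrust/random.h>
#include <thrust/functional.h>

// Robert Jenkins' 32-bit integer hash, as in Thrust's monte_carlo example.
// Assumed: the post calls hash() but does not show its definition.
__host__ __device__
unsigned int hash(unsigned int a)
{
  a = (a + 0x7ed55d16) + (a << 12);
  a = (a ^ 0xc761c23c) ^ (a >> 19);
  a = (a + 0x165667b1) + (a << 5);
  a = (a + 0xd3a2646c) ^ (a << 9);
  a = (a + 0xfd7046c5) + (a << 3);
  a = (a ^ 0xb55a4f09) ^ (a >> 16);
  return a;
}

// Maps an element index to one N(0,1) deviate
struct nrd : public thrust::unary_function<unsigned int,float>
{
  __host__ __device__
  float operator()(unsigned int thread_id)
  {
    unsigned int seed = hash(thread_id);  // spread neighbouring indices into unrelated seeds
    thrust::minstd_rand rng(seed);        // one small engine per element
    thrust::random::experimental::normal_distribution<float> u(0.0f,1.0f);
    return u(rng);
  }
};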



Now let's extend kdb+ with this function as well, under the name .cda.randn.


As you can see from the table, the fastest one is the function from CUDA:

command                    time (ms)
\t basicform 5000000             908
\t polarform 5000000           17595
\t .cda.randn 5000000            596
\t .bst.randn 5000000            795

Sunday, 4 September 2011

kdb+ and option pricing/hedging

Hedging a short call option

Background

Imagine four guys (Christoph, Tsvetan, Laziz and Kim) who are each short a call option and now want to hedge their risk. Let's look at their strategies and compare the resulting risks.

Christoph does nothing and simply pays out the difference between the underlying price and the strike price at the expiration date, if that difference is positive.

Tsvetan buys the underlying at the beginning and closes his position at the expiration date (a covered call).

Laziz uses a so-called stop-loss strategy: each day he buys the underlying when its price is above the strike price and sells it when the price is below the strike price.

Kim wants to be smart: each day he calculates the delta of the option and holds that many units of the underlying (delta hedging).

Assume further that
  1. the spot price is 100,
  2. the strike price is 100,
  3. the maturity is 1.0 (this usually means one year),
  4. the volatility is 0.25,
  5. the risk-free rate is 0.01.
We will use kdb+ to analyse the four strategies. For this purpose I implemented several functions in q; the code can be downloaded from here. All the functions are in the namespace .ql.

Black-Scholes formula
All four guys receive the option premium at the beginning. To know how much that is, we need to implement the Black-Scholes formula in kdb+; the function .ql.bls does the job.
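The source of .ql.bls is not reproduced in this post. As a cross-check, here is a C++ sketch of the standard Black-Scholes call price (the function names are illustrative): with $d_1 = [\ln(S/K) + (r + \sigma^2/2)T] / (\sigma\sqrt{T})$ and $d_2 = d_1 - \sigma\sqrt{T}$, the price is $C = S\,N(d_1) - K e^{-rT} N(d_2)$, where N is the standard normal CDF.

```cpp
#include <cmath>

// Standard normal CDF via the complementary error function.
double norm_cdf(double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); }

// Black-Scholes price of a European call on a non-dividend-paying underlying.
double bls_call(double spot, double strike, double maturity, double vol, double rate) {
    const double sqrt_t = std::sqrt(maturity);
    const double d1 = (std::log(spot / strike) + (rate + 0.5 * vol * vol) * maturity)
                      / (vol * sqrt_t);
    const double d2 = d1 - vol * sqrt_t;
    return spot * norm_cdf(d1) - strike * std::exp(-rate * maturity) * norm_cdf(d2);
}
```

With the parameters above, bls_call(100, 100, 1.0, 0.25, 0.01) gives about 10.40354 (here d1 = 0.165 and d2 = -0.085).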



We can see from the first column that the option value is 10.4035392. This is the money they receive at the beginning.

Gaussian-distributed random numbers
We will use geometric Brownian motion to simulate the underlying price process, and simulating geometric Brownian motion requires Gaussian-distributed random numbers.
The function .ql.randn does the job.

Here you can see the typical bell shape of the Gaussian distribution.


Numerical solution of the stochastic differential equation
Now that we have Gaussian random numbers, we need to implement a numerical solution of the geometric Brownian motion. The function .ql.paths does the job.
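The code of .ql.paths is not shown, so I can't say whether it uses an Euler step or the exact solution. Here is a sketch using the exact log-normal update of geometric Brownian motion $dS = \mu S\,dt + \sigma S\,dW$, i.e. $S_{t+\Delta t} = S_t \exp((\mu - \sigma^2/2)\Delta t + \sigma\sqrt{\Delta t}\,Z)$. I assume the drift $\mu$ equals the risk-free rate, and gbm_path is a hypothetical name.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Simulate one GBM path with the exact log-normal update
//   S(t+dt) = S(t) * exp((r - vol^2/2) dt + vol * sqrt(dt) * Z),  Z ~ N(0,1).
// Returns `points` prices, starting at spot and ending at maturity.
std::vector<double> gbm_path(double spot, double rate, double vol, double maturity,
                             std::size_t points, std::mt19937_64& gen) {
    std::normal_distribution<double> normal(0.0, 1.0);
    const double dt = maturity / static_cast<double>(points - 1);
    const double drift = (rate - 0.5 * vol * vol) * dt;
    const double diffusion = vol * std::sqrt(dt);
    std::vector<double> path(points);
    path[0] = spot;
    for (std::size_t i = 1; i < points; ++i)
        path[i] = path[i - 1] * std::exp(drift + diffusion * normal(gen));
    return path;
}
```

With 251 points and maturity 1.0, each step is dt = 1/250; the terminal prices are log-normal with mean spot * exp(rate * maturity).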

Here you can see the typical shape of the lognormal distribution. 


Christoph's Strategy
We simulate 100000 paths with 251 points each, since a year has about 251 trading days.
 
The frequency table shows that in 53243 of the 100000 cases Christoph does not need to pay anything, because the underlying ended below the strike price. Now let's plot the cumulative distribution function.
 
It shows that there is a 5% probability that Christoph loses more than 47.1 over one year of trading; in other words, his 95% value at risk (VaR) is 47.1.
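Because Christoph holds nothing, his payout depends only on the final price, so a sketch of his value at risk can sample $S_T$ directly instead of whole paths. The assumptions are the same as before (drift equals the risk-free rate, illustrative function name); since the details of the post's simulation are not shown, the quantile this produces need not match the 47.1 above exactly.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Christoph does nothing: his payout at expiry is max(S_T - K, 0).
// S_T is drawn directly from its log-normal law (no intermediate steps needed).
// Returns the q-quantile of the payout, e.g. q = 0.95 for a 95% value at risk.
double naked_call_payout_quantile(double spot, double strike, double rate, double vol,
                                  double maturity, std::size_t paths, double q,
                                  unsigned seed = 1) {
    std::mt19937_64 gen(seed);
    std::normal_distribution<double> normal(0.0, 1.0);
    std::vector<double> payout(paths);
    for (auto& x : payout) {
        const double st = spot * std::exp((rate - 0.5 * vol * vol) * maturity
                                          + vol * std::sqrt(maturity) * normal(gen));
        x = std::max(st - strike, 0.0);
    }
    const std::size_t k = static_cast<std::size_t>(q * static_cast<double>(paths - 1));
    std::nth_element(payout.begin(), payout.begin() + k, payout.end());  // k-th smallest
    return payout[k];
}
```

Usage: naked_call_payout_quantile(100, 100, 0.01, 0.25, 1.0, 100000, 0.95) returns the payout that is exceeded in about 5% of the simulated cases.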


Tsvetan's Strategy
Tsvetan's strategy generates a negative cash flow at the beginning: he pays 100 for the underlying but receives only the premium of about 10.40.
Let's take a look at the cumulative distribution function.
 
Here you can see that Tsvetan has a lower value at risk than Christoph.


Laziz's Strategy
 
On the first day Laziz buys the underlying because it is above the strike price. On the second day he closes his position, because the price is below the strike price. The asset then stays below the strike price until day 14, when he is long the asset again.
 
Laziz's VaR is 25.1.
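The stop-loss rule itself is easy to write down. Here is a minimal sketch for one price path, assuming Laziz trades exactly one unit at each day's price and ignoring interest and the premium received; stop_loss_cost is a hypothetical name, and the post's own cost definition (which produced the 25.1) may differ.

```cpp
#include <cstddef>
#include <vector>

// Stop-loss hedge of a short call along one price path (interest ignored).
// Rule: hold one unit of the underlying while the price is above the strike, none otherwise.
// Returns the total cost: purchases minus sales, minus the strike received on exercise.
double stop_loss_cost(const std::vector<double>& path, double strike) {
    double cost = 0.0;
    bool holding = false;
    for (std::size_t i = 0; i < path.size(); ++i) {
        const bool above = path[i] > strike;
        if (above && !holding) { cost += path[i]; holding = true; }   // buy one unit
        if (!above && holding) { cost -= path[i]; holding = false; }  // sell it again
    }
    // The call is exercised exactly when the final price is above the strike;
    // in that case Laziz holds the share and delivers it against the strike.
    if (holding) cost -= strike;
    return cost;
}
```

For the short path {101, 99, 98, 102, 105} with strike 100 the cost is 4: buy at 101, sell at 99, buy again at 102, and deliver the share for 100 at expiry.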

Kim's Strategy
Each day Kim calculates the option's delta from the current underlying price and adjusts his position in the underlying to match it.
 
The frequency plot shows that his cost is concentrated around the option value.
 
Kim's VaR is 11.3.
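Finally, here is a sketch of Kim's delta hedge along one simulated path. None of these assumptions are confirmed by the post: the underlying drifts at the risk-free rate, the position is rebalanced at 250 daily steps, the financed balance accrues interest at r, and the result is discounted back to today. call_delta and delta_hedge_cost are illustrative names. Averaged over many paths, the discounted hedging cost should land close to the Black-Scholes price of about 10.40, which is consistent with Kim's costs being concentrated around the option value.

```cpp
#include <cmath>
#include <cstddef>
#include <random>

double norm_cdf(double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); }

// Black-Scholes delta of a European call with time to expiry tau > 0.
double call_delta(double s, double k, double tau, double vol, double r) {
    const double d1 = (std::log(s / k) + (r + 0.5 * vol * vol) * tau) / (vol * std::sqrt(tau));
    return norm_cdf(d1);
}

// Present value of the cost of delta hedging a short call along one simulated GBM path.
// Shares are bought or sold at each step's price; the running cost accrues interest at r.
double delta_hedge_cost(double s0, double k, double maturity, double vol, double r,
                        std::size_t steps, std::mt19937_64& gen) {
    std::normal_distribution<double> normal(0.0, 1.0);
    const double dt = maturity / static_cast<double>(steps);
    double s = s0;
    double delta = call_delta(s, k, maturity, vol, r);
    double cost = delta * s;  // initial purchase of delta shares
    for (std::size_t i = 1; i <= steps; ++i) {
        s *= std::exp((r - 0.5 * vol * vol) * dt + vol * std::sqrt(dt) * normal(gen));
        cost *= std::exp(r * dt);  // interest on the financed position
        const double tau = maturity - static_cast<double>(i) * dt;
        // at expiry the hedge becomes 1 share if in the money, 0 otherwise
        const double new_delta = (i == steps) ? (s > k ? 1.0 : 0.0)
                                              : call_delta(s, k, tau, vol, r);
        cost += (new_delta - delta) * s;  // rebalance at today's price
        delta = new_delta;
    }
    if (s > k) cost -= k;  // exercised: deliver the share, receive the strike
    return cost * std::exp(-r * maturity);
}
```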