High Frequency Finance
From LiteratePrograms
An investment strategy is called "high frequency" when it uses all the available data rather than a sampled subset. This means that intra day data are used (as opposed to "extra day" data, such as daily closing prices).
Where to find data
While extra day data are easy to download from the internet (see for instance Yahoo Finance), intra day data are far more difficult to find. Some exchanges make the data of the current day downloadable from their web site, as EuroNext does here.
<<read_euronext>>=
% Download the intra day quotes of the day from the EuroNext web site
row_data = urlread( [ 'http://www.euronext.com/tools/datacentre/' ...
    'dataCentreDownloadExcell.jcsv?' ...
    'quote=on&volume=on&lan=EN&cha=2593&time=on&' ...
    'selectedMep=1&isinCode=FR0000133308&typeDownload=1&' ...
    'format=xls&indexCompo=']);
% Extract time, id, price and volume from each record
% (the decimal point of the price is escaped so that it is matched literally)
parsed_data = regexp( row_data, [ '(?<date>[0-9][0-9]:[0-9][0-9]:[0-9][0-9])' ...
    '[^0-9]+(?<id>[0-9]+)[^0-9]+(?<price>[0-9]+\.[0-9]+)[^0-9]+(?<volume>[0-9]+)' ], 'names');
% Store everything in one structure, one row of 'value' per record
data = struct('title', 'France Telecom the 17th of July', ...
    'value', [str2double({parsed_data.id})', ...
              str2double({parsed_data.price})', ...
              str2double({parsed_data.volume})'], ...
    'date', datenum( {parsed_data.date}, 'HH:MM:SS'), ...
    'colnames', {{ 'id', 'Price', 'Volume'}});

<<plot_data>>=
% Time of day in hours
dts = mod(data.date,1)/datenum(0,0,0, 1,0,0);
figure;
% Top panel: prices
a1=subplot(4,1,1:3);
plot( dts, data.value(:,2), '.k');
ax = axis;
axis([min(dts) max(dts) ax(3:4)]);
ylabel(data.colnames{2});
title( data.title)
% Bottom panel: volumes, sharing the time axis with the prices
a2=subplot(4,1,4);
stem( dts, data.value(:,3),'.','linewidth',2);
ylabel('Volume')
linkaxes([a1,a2],'x')
This MATLAB code block shows how to read high frequency data from the EuroNext web site and store them in a structure.
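Since the download only works while the exchange serves the file, the parsing step can also be tried offline. The following is a minimal sketch on a hand-written sample: the layout of `sample_text` is an assumption (the real file may be laid out differently), and the pattern is a variant of the one in read_euronext, with positional tokens and an escaped decimal point.

```matlab
% Hypothetical sample text: the real layout of the downloaded file may differ.
sample_text = sprintf([ ...
    '09:00:12\t1\t17.25\t1500\n' ...
    '09:00:47\t2\t17.26\t300\n' ...
    '09:01:05\t3\t17.24\t820\n']);

% Variant of the read_euronext pattern using positional tokens.
pattern = ['([0-9][0-9]:[0-9][0-9]:[0-9][0-9])' ...
           '[^0-9]+([0-9]+)[^0-9]+([0-9]+\.[0-9]+)[^0-9]+([0-9]+)'];
tokens = regexp(sample_text, pattern, 'tokens');
tokens = vertcat(tokens{:});          % one row per record: time, id, price, volume

values = str2double(tokens(:, 2:4));  % numeric columns: id, price, volume
hours  = datenum(tokens(:, 1), 'HH:MM:SS');
hours  = mod(hours, 1) * 24;          % time of day in hours
```

Each row of `values` then corresponds to one record, in the same column order as `data.value`.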
Volatility estimation
To capture the complexity of intra day volatility estimation, we first need some code to sample data according to any chosen frequency:
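<<sampling>>=
% Sort the records by time
[data.date, idx] = sort(data.date);
data.value = data.value(idx,:);
% Time of day in hours
dts = mod(data.date,1)/datenum(0,0,0, 1,0,0);
% Sampling step in hours (1/6 hour = 10 minutes)
dt_step = 1/6;
sample = dts(1):dt_step:dts(end);
% For each sampling time, index of the first record strictly after it
idx_from = arrayfun(@(d)find(dts>d,1),sample);
% Plot all the prices, with the sampled ones on top
figure;
plot( dts, data.value(:,2), '.k');
ax = axis;
axis([min(dts) max(dts) ax(3:4)]);
ylabel(data.colnames{2});
title( data.title)
hold on
stem( sample, data.value(idx_from,2), 'or');
hold off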
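The sampling rule used above (for each sampling time, take the first observation strictly after it) can be checked on a small example. The times and prices below are made up for illustration, and `grid_times` is a name introduced only for this sketch.

```matlab
% Hypothetical irregular trade times (in hours) and prices.
t = [9.00 9.02 9.10 9.11 9.30 9.55 9.70];
p = [17.25 17.26 17.24 17.25 17.30 17.28 17.27];

step       = 0.25;                   % 15 minute grid
grid_times = t(1):step:t(end);       % 9.00, 9.25, 9.50

% Index of the first trade strictly after each grid time.
idx     = arrayfun(@(d) find(t > d, 1), grid_times);
sampled = p(idx);
```

Here `idx` is `[2 5 6]`: the very first trade is skipped because the comparison is strict. For the same reason, `find` returns an empty result, and `arrayfun` fails, if a grid time coincides with the last trade time.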
Changing the sampling rate
The important point to observe is that the volatility estimate goes to infinity as we use more and more data (that is, as the time step decreases):
- the fewer points we use, the more variance the estimator has (right part of the plot)
- when we use more data, the noise decreases
- but when we really want to use all the points (left part of the plot), the volatility estimate increases rapidly
This is a well-known theoretical result (see Jacod, Mykland, Aït-Sahalia, Zhang and others).
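The reason for this divergence can be sketched with the additive noise model commonly used in that literature. The notation is introduced here for illustration and does not come from the code on this page: $Y$ is the observed log price, $X$ the underlying log price with spot volatility $\sigma_t$, and $\varepsilon$ an i.i.d. zero-mean noise independent of $X$.

```latex
\begin{align*}
Y_{t_i} &= X_{t_i} + \varepsilon_{t_i}, \qquad i = 0, \dots, n, \\
\mathbb{E}\Big[\sum_{i=1}^{n} (Y_{t_i} - Y_{t_{i-1}})^2 \,\Big|\, X\Big]
  &= \sum_{i=1}^{n} (X_{t_i} - X_{t_{i-1}})^2 + 2n\,\mathbb{E}[\varepsilon^2].
\end{align*}
```

As $n$ grows, the first term on the right converges to the integrated variance $\int \sigma_t^2\,dt$ while the second grows linearly in $n$, so an estimate computed from all the points ends up dominated by the noise.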
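<<volatility_estimated>>=
% Prices sampled every dt_step hours (first record strictly after each sampling time)
f_sampled=@(dt_step)data.value(arrayfun(@(d)find(dts>d,1),dts(1):dt_step:dts(end)),2);
% Time steps from 1 minute to 4 hours, in hours
sampling = (1:240)/60;
volatilities = nan(length(sampling),1);
for s=1:length(sampling)
    S = f_sampled( sampling(s));
    % Standard deviation of the returns (price change over previous price),
    % scaled to one hour, then annualised with 8.5 hours a day and 256 days a year,
    % in percent
    volatilities(s)=100*std( diff(S)./S(1:end-1))/sqrt(sampling(s))*sqrt(8.5*256);
end
figure;
plot( sampling, volatilities, 'k', sampling, volatilities, '.k')
xlabel('Time step (hours)');
ylabel('Volatility (empirical)');
title( data.title)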
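The same behaviour can be reproduced on simulated data. The following is a minimal sketch, not the method used above: it assumes a random walk log price observed every second with additive Gaussian noise, and every parameter value (20% annual volatility, noise level, list of steps) is an assumption chosen for illustration.

```matlab
% Simulated data only: all parameter values below are assumptions.
n_sec     = 8.5 * 3600;                 % one trading day of 8.5 hours, in seconds
ann       = sqrt(8.5 * 256);            % annualisation factor for hourly volatility
sigma_h   = 0.20 / ann;                 % hourly volatility giving 20% per year
noise_std = 5e-4;                       % assumed noise on the observed log price

x = cumsum(sigma_h / 60 * randn(n_sec, 1));   % underlying log price, one step per second
y = x + noise_std * randn(n_sec, 1);          % observed log price = underlying + noise

steps_sec = [1 5 15 60 300 600 1800];   % sampling steps in seconds
vol = nan(size(steps_sec));
for k = 1:numel(steps_sec)
    r = diff(y(1:steps_sec(k):end));    % log returns at this sampling step
    % Scale to one hour, annualise, express in percent
    vol(k) = 100 * std(r) / sqrt(steps_sec(k) / 3600) * ann;
end
disp([steps_sec' vol'])
```

With these assumptions, the estimate at the one second step should come out several times larger than the 20% used to generate the data, while the estimates at steps of a few minutes stay much closer to it. The exact numbers change from run to run because no random seed is set.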