TimeSeries and TimeSeries Collections (Matlab)
From LiteratePrograms
Since release R14SP3 (mid 2005), MATLAB has offered two new objects:
- timeseries
- tscollection
They implement a way to use structured datasets.
Core objects
TimeSeries
A timeseries is defined by at least:
- a name
- a date vector
- a column of values
For instance:
ts1 = timeseries(cumsum(randn(100,1)),(1:100)','name','my first dataset');
To plot such an object, we can use the overloaded plot function with the usual options:
plot(ts1, 'linewidth',2);
Some properties of the timeseries are easy to access:
- the date vector, available through |ts1.time|
- the vector of values, through |ts1.data|
- the name, through |ts1.name|
The more generic |get| function allows access to more properties:
>> get(ts1)
    Events: []
    Name: 'my first dataset'
    Data: [100x1 double]
    DataInfo: [1x1 tsdata.datametadata]
    Time: [100x1 double]
    TimeInfo: [1x1 tsdata.timemetadata]
    Quality: []
    QualityInfo: [1x1 tsdata.qualmetadata]
    IsTimeFirst: true
    TreatNaNasMissing: true
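As a minimal sketch of property access, the snippet below reads and writes a few of the properties listed by |get|. The series and the unit strings are hypothetical values chosen for illustration only.

```matlab
% Build a small series (hypothetical values, for illustration only).
ts = timeseries((1:5)', (1:5)', 'Name', 'demo');

% Dot notation and get give the same result for a public property.
n1 = ts.Name;
n2 = get(ts, 'Name');

% Properties can also be assigned; the metadata objects hold the units.
ts.Name = 'renamed';
ts.DataInfo.Units = 'm/s';     % assumed unit of the values
ts.TimeInfo.Units = 'days';    % assumed unit of the time vector
```

Since timeseries objects behave like values, a modified copy has to be assigned back to a variable, as done here with dot assignment.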
TimeSeries Events
A special field called |Events| can be used to store sparse information about the timeseries.
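The following sketch shows one way events might be attached and used. The event names and times are made up for the example; the calls used are tsdata.event, addevent, gettsbeforeevent and gettsafterevent.

```matlab
% A random walk with 100 samples at times 1..100 (default unit: seconds).
ts = timeseries(cumsum(randn(100,1)), (1:100)', 'Name', 'prices');

% Two hypothetical events, each defined by a name and a time.
e1 = tsdata.event('announcement', 25);
e2 = tsdata.event('split', 60);

% addevent returns a modified copy, so the result is assigned back.
ts = addevent(ts, e1);
ts = addevent(ts, e2);

% Events can then be used to cut the series.
before = gettsbeforeevent(ts, 'split');   % samples before the event
after  = gettsafterevent(ts, 'split');    % samples after the event
```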
TSCollection
A TScollection is a collection of synchronized timeseries. It has its own name.
To build a TSCollection, you only need some synchronized timeseries, as in the following code block:
<<my_first_tscollection>>=
ts1 = timeseries(cumsum(randn(100,1)), (1:100)','name','one');
ts2 = timeseries(cumsum(randn(100,1)*.5),(1:100)','name','two');
tsc = tscollection({ts1, ts2}, 'Name', 'my first TSCollection');
Unfortunately, the plot function does not work on TSCollections, so we need to write our own plotting function:
<<tsc_plot.m>>=
function h = tsc_plot(tsc, varargin)
% TSC_PLOT - plot TSCollection
% example:
%  tsc_plot(tsc, 'linewidth',2)
h = figure;
names = gettimeseriesnames(tsc);
colors = 'brmckyg';
for n=1:length(names)
    ts = tsc.(names{n});
    % cycle through the colors; mod(n-1,...)+1 keeps the index in 1..length(colors)
    plot(tsc.time, ts.data, colors(mod(n-1,length(colors))+1), varargin{:});
    hold on
end
hold off
legend(gca, names);
title(tsc.name);
TimeSeries names into a TSCollection
Once timeseries are put into a TSCollection object, their names are translated: characters other than letters, digits and underscores are replaced by their hexadecimal code prefixed with 0x. For instance:
>> tscollection(timeseries((1:10)',(1:10)','name','anycharaters(\_)'))
Time Series Collection Object: unnamed
Time vector characteristics
    Start time    1 seconds
    End time      10 seconds
Member Time Series Objects:
    anycharaters0x280x5C_0x29
This clearly makes it harder to retrieve your timeseries. It is possible to write a function implementing this translation, and to use it to retrieve timeseries by their original names:
<<translate4tsc.m>>=
function z = translate4tsc(op, s)
% TRANSLATE4TSC - translation in two directions:
%  'anycharaters0x280x5C_0x29' = translate4tsc('to-tsc', 'anycharaters(\_)')
%  'anycharaters(\_)' = translate4tsc('from-tsc', 'anycharaters0x280x5C_0x29')
switch lower(op)
    case {'20xhex', 'to-tsc'}
        %<* convert to hex
        code2keep = [48:57,65:90,97:122,95];   % digits, letters, underscore
        t = double(s);
        t = t(:);
        ikeep = ismember(t,code2keep);
        iconv = ~ikeep;
        if all(ikeep)
            z = s;
            return
        end
        % one column of 4 characters per input character
        z = repmat(' ',4,length(s));
        % the extra 100 (0x64) forces dec2hex to return two digits per row
        tmp = dec2hex([t(iconv);100]);
        z(3:4,iconv) = tmp(1:end-1,:)';
        z(1,iconv) = '0';
        z(2,iconv) = 'x';
        z(1,ikeep) = s(ikeep);
        % flatten column by column and drop the padding
        z = strrep(z(:)',' ','');
        %>*
    case {'0x2str', 'from-tsc'}
        %<* convert 0x hex to string
        idx = strfind(s, '0x');
        if isempty(idx)
            z = s;
            return
        end
        h = char(hex2dec(s([idx(:)+2,idx(:)+3])));
        % overwrite '0xH' with '___', put the decoded character last
        s([idx(:);idx(:)+1;idx(:)+2])='_';
        s(idx(:)+3) = h;
        s = strrep(s, '___', '');
        z = s;
        %>*
    otherwise
        error('translate4tsc:InvalidMode','Invalid mode <%s>',op);
end
This function is not perfect at this stage (it has problems with spaces in names), so feel free to improve it.
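One possible starting point for such an improvement is sketched below as a plain script. It assumes the naming scheme is exactly the one observed in the output above (0x followed by two uppercase hexadecimal digits); this is an assumption drawn from that single example, not a documented rule. The test name is hypothetical and contains a space on purpose.

```matlab
name = 'any characters(\_)';   % hypothetical name containing a space

% Encode: keep digits, letters and underscore, replace the rest by 0xHH.
keep  = ismember(double(name), [48:57, 65:90, 97:122, 95]);
parts = cell(1, numel(name));
for k = 1:numel(name)
    if keep(k)
        parts{k} = name(k);
    else
        parts{k} = ['0x', dec2hex(double(name(k)), 2)];
    end
end
encoded = [parts{:}];

% Decode: split around every 0xHH group and put the decoded character back.
[tokens, pieces] = regexp(encoded, '0x([0-9A-F]{2})', 'tokens', 'split');
decoded = pieces{1};
for k = 1:numel(tokens)
    decoded = [decoded, char(hex2dec(tokens{k}{1})), pieces{k+1}]; %#ok<AGROW>
end
```

A name that already contains the literal text 0x followed by two hexadecimal digits cannot be decoded unambiguously with this approach.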
Functions
Simple manipulations
TimeSeries
- getqualitydesc
- getdatasamplesize
- Sample manipulations
  - addsample
  - delsample
- ctranspose
- detrend
- filter
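A few of these functions can be tried on a tiny series. This is only a sketch with made-up values, and each call returns a new object that has to be assigned.

```matlab
% Five samples at times 1..5 (hypothetical data).
ts = timeseries((1:5)', (1:5)', 'Name', 'demo');

% Append one sample, then delete the first one by index.
ts = addsample(ts, 'Data', 6, 'Time', 6);
ts = delsample(ts, 'Index', 1);

% Size of one data sample (the data here has a single column).
sampleSize = getdatasamplesize(ts);

% Remove the mean of the values.
tsd = detrend(ts, 'constant');
```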
TSCollection
- TimeSeries manipulations
  - addts
  - removets
- Sample manipulations
  - addsampletocollection
  - delsamplefromcollection
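The collection-level functions can be sketched in the same way. The names and values below are hypothetical, and the argument forms follow the documented name/value syntax as far as I know it.

```matlab
t   = (1:10)';
ts1 = timeseries(t,   t, 'Name', 'one');
ts2 = timeseries(2*t, t, 'Name', 'two');

% Start with one member, then add and remove timeseries.
tsc = tscollection({ts1}, 'Name', 'demo');
tsc = addts(tsc, ts2);
namesBefore = gettimeseriesnames(tsc);
tsc = removets(tsc, 'one');

% Sample manipulations act on every member at once.
tsc = delsamplefromcollection(tsc, 'Index', 1);
tsc = addsampletocollection(tsc, 'time', 11, 'two', 22);
```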
More complex operations
Synchronization
Synchronization of timeseries is a critical point. Unfortunately, it seems impossible at this stage to synchronize TSCollections directly. To illustrate synchronization of individual timeseries, we first create two of them:
<<create two timeseries>>=
ts1=timeseries(cumsum(randn(100,1)),(1:100)','name','first');
ts2=timeseries(cumsum(randn(51,1)) ,(50:2:150)','name','second');
Then it is possible to try to synchronize them, for instance using the union option:
<<synchronize and plot >>=
[ts1s, ts2s] = synchronize(ts1, ts2,'union');
tsc_plot(tscollection({ts1s, ts2s}), 'linewidth', 2, 'marker', 'o')
hold on; plot(ts1,'.','marker','+','markersize',20)
hold on; plot(ts2,'.r','marker','+','markersize',20)
The plotting options are chosen so that the new (synchronized) values are plotted with o and the original ones with +.
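Besides union, synchronize accepts other methods. The sketch below compares three of them on the same two series; the interval of 5 for the uniform method is an arbitrary choice for illustration.

```matlab
ts1 = timeseries(cumsum(randn(100,1)), (1:100)',    'Name', 'first');
ts2 = timeseries(cumsum(randn(51,1)),  (50:2:150)', 'Name', 'second');

% Each call returns both series resampled on a common time vector.
[u1, u2] = synchronize(ts1, ts2, 'union');
[i1, i2] = synchronize(ts1, ts2, 'intersection');
[r1, r2] = synchronize(ts1, ts2, 'uniform', 'Interval', 5);

fprintf('union: %d samples, intersection: %d, uniform: %d\n', ...
        u1.Length, i1.Length, r1.Length);

% Once synchronized, the two series can be put into one collection.
tsc = tscollection({u1, u2}, 'Name', 'synchronized pair');
```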
The most interesting feature is clearly synchronization. Unfortunately, TSCollections cannot be synchronized (synchronization is only available for timeseries), and a self-made synchronization is far faster than the MathWorks one. The figure (at left) shows the relative performance (elapsed time measured with tic; toc) of the synchronization of TSCollections of different sizes versus an equivalent self-made synchronization on simple structures. The figure at right shows the run-time ratio between built-in and self-made synchronization (blue) and instantiation (green) for several data sizes (size(.,1) on the x-axis).
The results are clear enough:
- for instantiation, TSCollection is around 1,500 times slower than a self-made structure
- for synchronization, TSCollection is at least 60 times slower than a self-made one; the ratio is very high for small sizes (around 1,000 times slower) and decreases for the largest sizes (around 100 times).
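A comparison of this kind could be set up roughly as follows. This is not the benchmark behind the figures, only a sketch: the size n is arbitrary, the two synchronizations are not guaranteed to return identical time vectors, and the timings depend on the machine and the MATLAB release.

```matlab
n  = 2000;                       % hypothetical data size
t1 = (1:n)';       v1 = cumsum(randn(n,1));
t2 = (1:2:2*n)';   v2 = cumsum(randn(numel(t2),1));

% Instantiation: objects versus plain structures.
tic;
ts1 = timeseries(v1, t1, 'Name', 'first');
ts2 = timeseries(v2, t2, 'Name', 'second');
tBuildObj = toc;

tic;
s1 = struct('date', t1, 'value', v1);
s2 = struct('date', t2, 'value', v2);
tBuildStruct = toc;

% Synchronization: built-in versus union + interp1.
tic;
[a, b] = synchronize(ts1, ts2, 'union');
tSyncObj = toc;

tic;
t0 = union(s1.date, s2.date);
w1 = interp1(s1.date, s1.value, t0);
w2 = interp1(s2.date, s2.value, t0);
tSyncStruct = toc;

fprintf('instantiation ratio:   %.1f\n', tBuildObj / tBuildStruct);
fprintf('synchronization ratio: %.1f\n', tSyncObj / tSyncStruct);
```

Single tic; toc measurements are noisy, so repeating each step several times and keeping the median gives steadier ratios.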
Self made TimeSeries equivalent
Because of the inefficiency of the timeseries and TScollection objects, we can try to implement our own equivalents.
Main object
As stated in another article (Swiss army knife MATLAB programs for quantitative finance), we can use a simple structure to store everything we need:
<<simple_structure_example1.m>>=
data = struct('title', 'my TScollection title', 'value', cumsum(randn(100,3)), 'date', (now-100+1:now)', ...
              'names', {{'column1', 'column2', 'column3'}})
Here is a simple function to build such objects:
<<myTSCobject.m>>=
function data = myTSCobject(varargin)
% MYTSCOBJECT - self made efficient TScollection
% use:
%  data = myTSCobject('title', 'my TScollection title', 'value', cumsum(randn(100,3)), ...
%                     'date', (now-100+1:now)', ...
%                     'names', {'column1', 'column2', 'column3'})
data = [];
% store each name/value pair as a field of the structure
for f=1:2:length(varargin)-1
    field_name  = varargin{f};
    field_value = varargin{f+1};
    data.(field_name) = field_value;
end
if ~isfield(data,'value') || ~isfield(data,'date') || ~isfield(data,'names') || ~isfield(data,'title')
    error('myTSCobject:field', 'fields <value> <date> <names> <title> mandatory for myTSCobject');
end
% one row of values per date, one name per column of values
[nv,pv] = size(data.value);
[nd,pd] = size(data.date);
[nn,pn] = size(data.names);
if nv ~= nd || pv ~= pn || nn ~= 1 || pd ~= 1
    warning('myTSCobject:check', 'problem with dimension of fields');
end
Main functions
Here we need at least a synchronization function, which in turn needs an interpolation function. It is surprising that the MATLAB timeseries synchronization function does not use the MATLAB interp1 function: why do twice what has already been done once?
<<mySynchro.m>>=
function data0 = mySynchro(data1, data2, varargin)
% MYSYNCHRO - a simple self made synchronization
% extra arguments (e.g. 'nearest') are passed on to interp1
date0 = union(data1.date, data2.date);
value0_1 = interp1(data1.date, data1.value, date0(:), varargin{:});
value0_2 = interp1(data2.date, data2.value, date0(:), varargin{:});
data0 = myTSCobject('title', 'synchronized dataset', 'date', date0(:), 'value', [value0_1, value0_2], ...
                    'names', {data1.names{:}, data2.names{:}});
Which can be used like this:
<<simple_structure_example2.m>>=
dt1 = (1:10:200)';
data1 = myTSCobject('title','A','value',[sin(dt1/100*pi),cos(dt1/100*pi)], ...
                    'names',{'sin', 'cos'},'date',dt1,'plot_style','points')
dt2 = (10:1:200)';
data2 = myTSCobject('title','B','value',[(dt2/1000).^2,1./dt2], ...
                    'names',{'x2', '1/x'},'date',dt2,'plot_style','points')
data0 = mySynchro(data1,data2)
data0p = mySynchro(data1,data2,'nearest')
As you can see, we can now use all the |interp1| options.
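To make the effect of these options concrete, here is a small standalone sketch on made-up sample points. It calls interp1 directly with the kind of trailing arguments that mySynchro forwards through varargin.

```matlab
x  = [0; 10; 20];        % known dates (hypothetical)
v  = [0; 100; 50];       % known values
xq = [2; 12; 25];        % query dates, the last one outside the range

lin = interp1(x, v, xq);                      % linear, NaN outside the range
nea = interp1(x, v, xq, 'nearest');           % value of the closest date
pch = interp1(x, v, xq, 'pchip');             % shape-preserving cubic
ext = interp1(x, v, xq, 'linear', 'extrap');  % linear with extrapolation

disp([xq, lin, nea, pch, ext]);
```

With the default linear method, dates outside the range covered by a dataset come back as NaN, which is worth keeping in mind when the two datasets only partly overlap.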