TimeSeries and TimeSeries Collections (Matlab)

From LiteratePrograms


Since release R14SP3 (mid-2005), MATLAB has offered two new objects:

  • timeseries
  • tscollection

They implement a way to use structured datasets.


Core objects

TimeSeries

A timeseries is defined by at least:

  • a name
  • a date vector
  • a column of values
Plot of a MATLAB time-series

For instance:

ts1 = timeseries(cumsum(randn(100,1)),(1:100)','name','my first dataset');

To be able to plot such an object, we can use the polymorphic plot function with usual options:

plot(ts1, 'linewidth',2);

Some properties of the timeseries are easy to access:

  • the date vector, available through |ts1.time|
  • the vector of values, through |ts1.data|
  • the name, through |ts1.name|
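
As a minimal sketch of reading these properties, the snippet below builds a tiny timeseries and prints a summary. It assumes the capitalized property names (`Time`, `Data`, `Name`) that `get` reports; the variable names and values are illustrative only.

```matlab
% Build a small timeseries and read its basic properties.
ts = timeseries((10:10:50)', (1:5)', 'Name', 'demo');
t  = ts.Time;   % time vector
v  = ts.Data;   % vector of values
nm = ts.Name;   % name given at construction
fprintf('%s: %d samples from t=%g to t=%g\n', nm, numel(v), t(1), t(end));
```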

The more generic |get| function allows access to more properties:

>> get(ts1)
             Events: []                       
               Name: 'first'                  
               Data: [100x1 double]           
           DataInfo: [1x1 tsdata.datametadata]
               Time: [100x1 double]           
           TimeInfo: [1x1 tsdata.timemetadata]
            Quality: []                       
        QualityInfo: [1x1 tsdata.qualmetadata]
        IsTimeFirst: true                     
  TreatNaNasMissing: true

TimeSeries Events

A special field called |Events| can be used to store sparse information about the timeseries.
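
A possible way to fill this field is sketched below, using the `tsdata.event` constructor and the `addevent` method. The event name and time are hypothetical values chosen for illustration.

```matlab
% Attach a named event to a timeseries and read it back.
ts = timeseries(cumsum(randn(10,1)), (1:10)', 'Name', 'demo');
ev = tsdata.event('regime change', 4);   % event name and time (hypothetical values)
ts = addevent(ts, ev);                   % returns the timeseries with the event attached
fprintf('%d event(s); first is "%s" at t=%g\n', ...
        numel(ts.Events), ts.Events(1).Name, ts.Events(1).Time);
```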

TSCollection

A TScollection is a collection of synchronized timeseries. It has its own name.

Plot of a MATLAB TSCollection using tsc_plot

To build a TSCollection, you only need some synchronized timeseries, as in the following code block:

<<my_first_tscollection>>=
ts1 = timeseries(cumsum(randn(100,1)),   (1:100)','name','one');
ts2 = timeseries(cumsum(randn(100,1)*.5),(1:100)','name','two');
tsc = tscollection({ts1, ts2}, 'Name', 'my first TSCollection');

Unfortunately, the plot function does not work on TSCollections, so we need to write our own plotting function:

<<tsc_plot.m>>=
function h = tsc_plot( tsc, varargin)
% TSC_PLOT - plot TSCollection
% example:
%  tsc_plot(tsc, 'linewidth',2)
h = figure;
names  = gettimeseriesnames(tsc);
colors = 'brmckyg';
for n=1:length(names)
   ts = tsc.(names{n});
   % cycle through the colors; mod(n-1,...)+1 keeps the index within 1..length(colors)
   plot(tsc.Time, ts.Data, colors(mod(n-1,length(colors))+1), varargin{:});
   hold on
end
hold off
legend(gca, names);
title(tsc.Name);

TimeSeries names in a TSCollection

Once timeseries are put into a TSCollection object, their names are translated: each character that is not a standard ASCII letter, digit or underscore is replaced by 0x followed by its hexadecimal code. For instance:

>> tscollection(timeseries((1:10)',(1:10)','name','anycharaters(\_)'))
Time Series Collection Object: unnamed
Time vector characteristics
     Start time            1 seconds
     End time              10 seconds
Member Time Series Objects:
     anycharaters0x280x5C_0x29

This clearly makes it harder to retrieve your timeseries by name. It is possible to write a function implementing this translation in both directions, and to use it to retrieve timeseries by their original names:

<<translate4tsc.m>>=
function z = translate4tsc(op, s)
% TRANSLATE4TSC - translation in two directions:
%  'anycharaters0x280x5C_0x29' = translate4tsc('to-tsc',  'anycharaters(\_)')
%  'anycharaters(\_)'          = translate4tsc('from-tsc', 'anycharaters0x280x5C_0x29')
switch lower(op)
    case {'20xhex', 'to-tsc'}
        %<* convert to hex
        code2keep = [48:57,65:90,97:122,95];
        t     = double(s);
        t     = t(:);
        ikeep = ismember(t,code2keep);
        iconv = ~ikeep;
        if all(ikeep)
            z = s;
            return
        end
        z = repmat(' ',4,length(s));
        tmp = dec2hex([t(iconv);100]);
        z(3:4,iconv) = tmp(1:end-1,:)';
        z(1,iconv) = '0';
        z(2,iconv) = 'x';
        z(1,ikeep) = s(ikeep);
        z = strrep(z(:)',' ','');
        %>*
    case {'0x2str', 'from-tsc'}
        %<* convert 0x hex to string
        idx = strfind(s, '0x');
        if isempty(idx)
            z = s;
            return
        end
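        % decode each 0xHH group into its character, then delete the '0x'
        % prefix and the first hex digit by index, so that underscores
        % already present in the name are left untouched
        h = char(hex2dec(s([idx(:)+2,idx(:)+3])));
        s(idx(:)+3) = h;
        s([idx(:);idx(:)+1;idx(:)+2]) = [];
        z = s;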
        %>*
    otherwise
        error('translate4tsc:InvalidMode','Invalid mode <%s>',op);     
end

This function is not perfect at this stage (there is a problem with spaces in names), so feel free to improve it.
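
A short usage sketch follows. It assumes that translate4tsc.m from this article is on the MATLAB path and that the collection applies the naming rule shown above; the check against `gettimeseriesnames` guards against releases that name members differently.

```matlab
% Retrieve a member of a TSCollection by its original (untranslated) name.
% Assumes translate4tsc.m from this article is on the MATLAB path.
original = 'anycharaters(\_)';
tsc = tscollection(timeseries((1:10)', (1:10)', 'name', original));
key = translate4tsc('to-tsc', original);   % name as stored in the collection
if ismember(key, gettimeseriesnames(tsc))
    ts = tsc.(key);                        % dynamic access with the translated name
    disp(ts.Data');
else
    warning('demo:name', 'no member named %s in this MATLAB release', key);
end
```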

Functions

Simple manipulations

TimeSeries

  • getqualitydesc
  • getdatasamplesize
  • Sample manipulations
    • addsample
    • delsample
  • ctranspose
  • detrend
  • filter

TSCollection

  • TimeSeries manipulations
    • addts
    • removets
  • Sample manipulations
    • addsampletocollection
    • delsamplefromcollection
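
The sketch below exercises a few of the functions listed above (`addsample`, `delsample`, `addts`, `removets`). The data and names are made up for illustration, and only the basic call forms are shown.

```matlab
% Sample manipulations on a single timeseries.
ts = timeseries((1:5)', (1:5)', 'Name', 'demo');
ts = addsample(ts, 'Data', 6, 'Time', 6);   % append one sample at t = 6
ts = delsample(ts, 'Index', 1);             % drop the first sample (t = 1)

% Member manipulations on a collection; both members share the same time vector.
a   = timeseries(rand(5,1), (1:5)', 'Name', 'a');
b   = timeseries(rand(5,1), (1:5)', 'Name', 'b');
tsc = tscollection(a, 'Name', 'demo collection');
tsc = addts(tsc, b);                        % add a second member
tsc = removets(tsc, 'a');                   % remove the first member by name
names = gettimeseriesnames(tsc);            % names of the remaining members
```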

More complex operations

Synchronization

Plot of a MATLAB TSCollection using tsc_plot

Synchronization of timeseries is a critical feature. Unfortunately, it seems impossible at this stage to synchronize TSCollections; only individual timeseries can be synchronized. To illustrate this, we first need to create two timeseries:

<<create two timeseries>>=
ts1=timeseries(cumsum(randn(100,1)),(1:100)','name','first');
ts2=timeseries(cumsum(randn(51,1)) ,(50:2:150)','name','second');

Then it is possible to try to synchronize them, for instance using the union option:

<<synchronize and plot >>=
[ts1s, ts2s] = synchronize(ts1, ts2,'union');
tsc_plot(tscollection({ts1s, ts2s}), 'linewidth', 2, 'marker', 'o')
hold on; plot(ts1,'.','marker','+','markersize',20)
hold on; plot(ts2,'.r','marker','+','markersize',20)

The plotting options are chosen so that the synchronized values are drawn with o markers and the original values with + markers.
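
Besides plotting, one simple way to check the result is to compare the time vectors of the two outputs, which should be identical after synchronization. This is a minimal sketch that rebuilds the two series so it can run on its own.

```matlab
% Synchronize two timeseries and verify that they now share one time grid.
ts1 = timeseries(cumsum(randn(100,1)), (1:100)',    'Name', 'first');
ts2 = timeseries(cumsum(randn(51,1)),  (50:2:150)', 'Name', 'second');
[ts1s, ts2s] = synchronize(ts1, ts2, 'union');
same_grid = isequal(ts1s.Time, ts2s.Time);   % true when both share the same times
fprintf('common grid: %d, %d samples from t=%g to t=%g\n', ...
        same_grid, ts1s.Length, ts1s.Time(1), ts1s.Time(end));
```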

Plot of the relative performance of synchronization and creation
Plot of the relative performance of synchronization (obtained here with MATLAB R14SP3; with MATLAB R2006a, the first points (100 to 500 points) are around twice as fast, which is still far slower than a self-made solution)

The most interesting feature is clearly synchronization. Unfortunately, TSCollections cannot be synchronized (synchronization is only available for timeseries), and a self-made synchronization is far faster than the MathWorks one. The figure on the left shows the relative performance (elapsed time measured with tic and toc) of the synchronization of TSCollections of different sizes versus an equivalent self-made synchronization on simple structures. The figure on the right shows the time ratio between built-in and self-made synchronization (blue) and instantiation (green) for several data sizes (size(.,1) on the x axis).

The results are clear enough:

  • for instantiation, TSCollection is around 1,500 times slower than a self-made structure
  • for synchronization, TSCollection is at least 60 times slower than a self-made one; the ratio is very high for small sizes (around 1,000 times slower) and decreases for the largest sizes (around 100 times).
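
A rough way to reproduce this kind of measurement for instantiation is sketched below. It is not the benchmark used for the figures: the sample count and the number of repetitions are arbitrary assumptions, and the timings will vary with the machine and the MATLAB release.

```matlab
% Rough timing of timeseries creation versus plain-struct creation.
n    = 1000;                 % number of samples (arbitrary)
reps = 20;                   % repetitions to smooth out timer noise
v = cumsum(randn(n,1));
t = (1:n)';

tic;
for k = 1:reps
    ts = timeseries(v, t, 'Name', 'bench');
end
t_obj = toc / reps;          % mean time per timeseries creation

tic;
for k = 1:reps
    s = struct('title', 'bench', 'value', v, 'date', t, 'names', {{'bench'}});
end
t_struct = toc / reps;       % mean time per struct creation

fprintf('timeseries: %.3g s, struct: %.3g s per creation\n', t_obj, t_struct);
```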




Self made TimeSeries equivalent

Because of the inefficiency of the timeseries and TScollection objects, we can try to implement our own equivalents.

Main object

As stated in another article (Swiss army knife MATLAB programs for quantitative finance), we can use a simple structure to store everything we need:

<<simple_structure_example1.m>>=
data = struct('title', 'my TScollection title', 'value', cumsum(randn(100,3)), 'date', (now-100+1:now)', ...
              'names', {{'column1', 'column2', 'column3'}})

Here is a simple function to build such objects:

<<myTSCobject.m>>=
function data = myTSCobject( varargin)
% MYTSCOBJECT - self-made efficient TSCollection
%  use:
% data = myTSCobject('title', 'my TScollection title', 'value', cumsum(randn(100,3)), ...
%                    'date', (now-100+1:now)', ...
%                    'names', {'column1', 'column2', 'column3'})
data = [];
for f=1:2:length(varargin)-1
   field_name  = varargin{f};
   field_value = varargin{f+1};
   data.(field_name) = field_value;
end
if ~isfield(data,'value') || ~isfield(data,'date') || ~isfield(data,'names') || ~isfield(data,'title')
   error('myTSCobject:field', 'fields <value> <date> <names> <title> mandatory for myTSCobject');
end
[nv,pv] = size(data.value);
[nd,pd] = size(data.date);
[nn,pn] = size(data.names);
if nv ~= nd || pv ~= pn || nn ~= 1 || pd ~= 1
   warning('myTSCobject:check', 'problem with dimension of fields');
end

Main functions

Here we need at least a synchronization function, which in turn needs an interpolation function. It is surprising that the MATLAB timeseries synchronization function does not use the MATLAB interp1 function: why do twice what has already been done once?

<<mySynchro.m>>=
function data0 = mySynchro(data1, data2, varargin)
% MYSYNCHRO - a simple self-made synchronization
%  extra arguments (varargin) are passed on to interp1
% common time grid: sorted union of both date vectors
date0    = union(data1.date, data2.date);
% interpolate every column of each dataset on the common grid
value0_1 = interp1(data1.date, data1.value, date0(:), varargin{:});
value0_2 = interp1(data2.date, data2.value, date0(:), varargin{:});
data0    = myTSCobject('title', 'synchronized dataset', 'date', date0(:), 'value', [value0_1, value0_2], ...
                       'names', {data1.names{:}, data2.names{:}});

Plot of data1
Plot of data2
Plot of data0
Plot of data0p

The function can be used like this:

<<simple_structure_example2.m>>=
dt1   = (1:10:200)';
data1 = myTSCobject('title','A','value',[sin(dt1/100*pi),cos(dt1/100*pi)], ...
                   'names',{'sin', 'cos'},'date',dt1,'plot_style','points')
dt2   = (10:1:200)';
data2 = myTSCobject('title','B','value',[(dt2/1000).^2,1./dt2], ...
                   'names',{'x2', '1/x'},'date',dt2,'plot_style','points')
data0  = mySynchro(data1,data2)
data0p = mySynchro(data1,data2,'nearest')

As you can see, we can now use all the |interp1| options.
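
To make the effect of these options concrete, here is a small self-contained sketch that calls `interp1` directly on a merged grid, as mySynchro does internally. The data are made up: the values lie on the straight line v = 2t, so the linear results are easy to check by eye.

```matlab
% Effect of a few interp1 options on a merged time grid.
t1 = (1:10:41)';                 % coarse grid: 1, 11, 21, 31, 41
v1 = 2 * t1;                     % values on a straight line
t0 = union(t1, [10; 20; 45]);    % merged grid; 45 lies outside the range of t1

v_lin  = interp1(t1, v1, t0);                      % linear; NaN outside [1, 41]
v_near = interp1(t1, v1, t0, 'nearest');           % value of the closest sample
v_ext  = interp1(t1, v1, t0, 'linear', 'extrap');  % linear with extrapolation

disp([t0, v_lin, v_near, v_ext]);
```

Because mySynchro forwards its extra arguments unchanged, a call such as mySynchro(data1, data2, 'linear', 'extrap') should behave like the last line of the sketch.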
