TREE Data Server: Introduction and Overview


This document gives an overview of the TREE Data Server system,
including brief specifications, a discussion of the system architecture,
and project status.


Specifications


The TREE Data Server system provides for capture of real-time financial data
from one or several datafeed services. Tick-level data can be captured for
equities, indexes, futures, and options. Incoming data is translated into
a common internal format that is the same across all supported datafeeds.
Live data is made available to client applications via TCP/IP connections;
therefore, the clients can run on the same machine as the datafeed parser
or different machine(s). Collected data is also stored into a historical
database.
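
As a rough illustration of what "a common internal format" means in practice, the sketch below normalizes records from two made-up feeds into one tick type. Everything in it is an assumption for illustration only: the Tick fields, the parse_feed_a and parse_feed_b helpers, and both input layouts are hypothetical and do not describe TREE's actual record format or any real vendor's wire format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tick:
    """Hypothetical common tick record (not TREE's real layout)."""
    symbol: str
    timestamp: float  # seconds since the epoch
    price: float
    size: int


def parse_feed_a(line: str) -> Tick:
    """Hypothetical feed A: a CSV line 'symbol,epoch_seconds,price,size'."""
    symbol, ts, price, size = line.strip().split(",")
    return Tick(symbol, float(ts), float(price), int(size))


def parse_feed_b(record: dict) -> Tick:
    """Hypothetical feed B: millisecond timestamps and prices in cents."""
    return Tick(
        record["sym"],
        record["ts_ms"] / 1000.0,
        record["px_cents"] / 100.0,
        record["qty"],
    )
```

Once both parsers produce the same Tick type, downstream code (storage, filtering, clients) no longer needs to know which feed a tick came from.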


The system has extremely good real-time performance; it copes easily with
a high-speed datafeed, even on machines that are small and slow by current
standards.


The data server code is intended to run on most Unix platforms.
See tree/doc/INSTALL for supported platforms. Client applications
can run on anything that supports TCP networking and NFS.


The code SSS released as open source includes the following
components:


  • Data capture/parsing daemon

  • Data parsing modules for S&P Comstock and Reuters Selectfeed services
    (VERSION 1.0.0 ONLY)

  • Data server daemon for sending live data to client applications

  • Database updater daemon for filtering collected data and storing it
    into the permanent database

  • Sophisticated self-tuning tick filter for removing erroneous ticks

  • Elementary data import/export programs and other administrative tools
    for the tick database

Where'd the name TREE come from, anyway?


TREE (Trade Research and Execution Environment) is the project name
SSS used for the trading support software. SSS chose to release the data server
and database components of the project as open source, because these
components needed more development effort than SSS could spare for them.


Project status as of February, 2008


The SSS code base has been updated to support the following platforms
and datafeed:

  • Platforms (32-bit):
    • Linux x86 (Ubuntu 7.10)
    • Mac OS X PPC (10.4.11)
    • Windows XP SP2 x86 (MinGW port; see tree/doc/ib.notes)
  • Datafeed: Real-time tick (snapshot) data from Interactive Brokers TWS API
    • VERSIONS >= 1.1.0.x ONLY

See tree/README, tree/doc/ib.notes and tree/doc/TODO.


Project status as of March, 2000


SSS has been using this code in-house for about five years, so the core
components are pretty thoroughly debugged and useful. The administrative
tools are fairly crude and could use a lot more effort --- for example,
enabling collection of a new futures series is not as simple as one could
wish. Better documentation is badly needed as well.


We expect that the initial thrust of open-source development will be on
smoothing out portability glitches (as the code is used on platforms we
haven't tried), interfacing to new datafeed services, and design of a
widely useful client-side interface library. (Our own client-side code
is tightly integrated into applications that we don't intend to release,
and probably wouldn't be very useful to other developers anyway.)


Further down the road, things that we hope to see developed in the
open-source project include:


  • Collection of "fundamental data", such as corporate earnings, on datafeeds
    that transmit such data.

  • Capture of news stories from feeds that transmit financial news, and
    development of code for storing and presenting news.

  • Interfacing to industry-standard APIs such as TIB.

System architecture overview


A running TREE system contains the following components:


  1. Master data capture/parsing daemon. This process receives all incoming
    data from the datafeed source(s). Since it could not be taken down without
    losing data, SSS ran it 24x7, with planned outages on weekends when the
    software needed updating. In VERSION >= 1.1.0.x, the parser runs in
    non-daemon mode by default, so it can be taken down and restarted, with
    all tick buffers flushed to disk. See tree/doc/INSTALL for further details.
    The parser keeps track of which instruments are being collected and
    discards data for uninteresting instruments. It stores interesting
    ticks into a "tick holding area", and also transmits them to any active data
    server daemons that have requested data for the particular instrument.

  2. Data server daemons. An instance of the server process is started
    whenever a client application connects to the data server system,
    typically by the standard Unix daemon inetd. (In VERSION >= 1.1.0.x the
    server can be started manually from the command line; see
    tree/doc/INSTALL.) The server daemon exists mainly to insulate the parser
    from direct contact with client apps, so that a misbehaving client cannot
    interfere with parser activity.

  3. Database update daemon. The updater is launched once a day (typically
    late at night, local time) by the master data capture daemon. The updater
    retrieves the ticks collected today from the "holding area", applies a tick
    filtering algorithm to eliminate bad data, and stores the cleaned data into
    the permanent database. The updater also constructs and stores daily-bar
    data, as well as intraday-bar data if wanted.
    In VERSION >= 1.1.0.x the updater can instead be run manually, once at
    the end of the day after the parser has been shut down, to update the
    database.

  4. Periodic maintenance tasks. These are launched from cron on an appropriate
    schedule. Originally SSS had just one such task, which scanned the permanent
    database for inactive data files and compressed them to save space.
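
The parser's role in item 1 (discard uninteresting instruments, store the rest, forward to servers that asked for them) can be sketched as follows. This is a minimal in-memory model under stated assumptions, not TREE's implementation: the ParserSketch class, its method names, the list standing in for the tick holding area, and the callbacks standing in for data server daemons are all hypothetical.

```python
from collections import defaultdict


class ParserSketch:
    """Toy model of the capture daemon's routing logic (hypothetical)."""

    def __init__(self, wanted_symbols):
        self.wanted = set(wanted_symbols)
        # Stand-in for the on-disk tick holding area.
        self.holding_area = []
        # symbol -> callbacks, each standing in for one data server daemon.
        self.subscribers = defaultdict(list)

    def subscribe(self, symbol, deliver):
        """Register a server's interest in one instrument."""
        self.subscribers[symbol].append(deliver)

    def on_tick(self, symbol, price):
        """Handle one incoming tick; return True if it was kept."""
        if symbol not in self.wanted:
            return False  # uninteresting instrument: discard
        tick = (symbol, price)
        self.holding_area.append(tick)
        # Forward only to servers that requested this instrument.
        for deliver in self.subscribers.get(symbol, ()):
            deliver(tick)
        return True
```

In the real system the forwarding step crosses a process boundary, which is what keeps a misbehaving client from stalling the parser; the callbacks here only show who receives what.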

In typical usage of the system, client applications read historical data
(everything up through yesterday's close) directly from the permanent
database files. Activity since yesterday's close can be retrieved from
the tick holding area via a data server, and the server provides a smooth
transition into reading live ticks as the application runs in real time.
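
One way a client-side library might stitch the three sources together is shown below. It is a sketch only, and it assumes something the text above does not state: that every tick carries a strictly increasing sequence number, so that ticks seen in two overlapping sources can be recognized and skipped. The stitched_ticks function and the (seq, price) tuple layout are hypothetical.

```python
from itertools import chain


def stitched_ticks(historical, holding_area, live):
    """Yield (seq, price) ticks from three sources in order, without repeats.

    Each source is an iterable of (seq, price) pairs sorted by seq. Sources
    may overlap at their boundaries; a tick whose sequence number has
    already been yielded is skipped.
    """
    last_seq = None
    for seq, price in chain(historical, holding_area, live):
        if last_seq is not None and seq <= last_seq:
            continue  # already delivered from an earlier source
        last_seq = seq
        yield seq, price
```

Because the sources are consumed lazily, the live source can be an open-ended iterator (for example, one reading from a socket) and the generator will keep yielding as ticks arrive.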


This combination of methods was chosen to ensure high performance for
back-testing of automated trading systems, which was one of the key
concerns at SSS. As long as the back-testing program is on the same machine
as the permanent data files --- which need not be where the data capture
process is running --- it will be able to read its input data at essentially
full disk speed. Lower-performance clients can access the permanent data
files from other machines via NFS mounting.


(SSS recognized that NFS access would not be a suitable answer for all
situations, so they expected that one of the high-priority tasks for the
open-source project would be to extend the data server daemon to support
retrieval of historical as well as current-day data. In that way, clients
would need only a TCP connection to the data server machine, not NFS access.
However, for typical in-house setups where the clients and servers are all
on the same LAN, the existing approach is probably preferable. In any case,
one would want the access mechanism to be isolated inside a client-side access
library.)


The main data capture daemon expects to read all of its input sources
across TCP/IP connections. This assumption works fine with modern
datafeed arrangements such as the Interactive Brokers TWS API and
S&P Comstock's "CSP" datafeed handler,
and should pose no problem for Internet-based feed delivery arrangements
either. For legacy setups where the datafeed is provided via a serial
port or similar hardware, we recommend installing an additional data
capture process that reads the data port, buffers data (at least a few
seconds' worth), and transmits it to the main capture daemon. By running
this data capture process at higher-than-normal priority, loss of data
can be avoided without having to worry about ensuring hard-real-time
response in the main capture daemon.
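
The buffering relay recommended above for legacy feeds can be sketched like this. It is an illustration under assumptions, not code from TREE: the relay function and its read_chunk and send_chunk parameters are hypothetical, with read_chunk standing in for a read on the serial port and send_chunk for a write on the TCP connection to the main capture daemon.

```python
import queue
import threading


def relay(read_chunk, send_chunk, max_chunks=1024):
    """Copy chunks from read_chunk() to send_chunk() through a bounded buffer.

    read_chunk() must return b"" at end of input. A dedicated reader thread
    keeps draining the source even while send_chunk() is slow, as long as
    the buffer (max_chunks entries) is not full.
    """
    buf = queue.Queue(maxsize=max_chunks)

    def reader():
        while True:
            chunk = read_chunk()
            buf.put(chunk)  # blocks only when the buffer is full
            if not chunk:
                return  # the empty chunk doubles as the end marker

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()
    while True:
        chunk = buf.get()
        if not chunk:
            break
        send_chunk(chunk)
    thread.join()
```

The buffer should be sized to hold at least a few seconds of feed traffic. In a real deployment the reading side would be a separate process run at raised priority, as described above; a thread is used here only to keep the sketch short.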