Comments, suggestions, feedback, and questions on TREE Data Server
can be sent to George Paul.
This document gives an overview of the TREE Data Server system,
including brief specifications, a discussion of the system architecture,
and project status.
The TREE Data Server system provides for capture of real-time financial data
from one or several datafeed services. Tick-level data can be captured for
equities, indexes, futures, and options. Incoming data is translated into
a common internal format that is the same across all supported datafeeds.
Live data is made available to client applications over TCP/IP connections,
so clients can run on the same machine as the datafeed parser or on other
machines. Collected data is also stored in a historical database.
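
As an illustration of translating feed-specific records into one common
format, here is a minimal Python sketch. The Tick fields, the two input
layouts, and the function names are all invented for this example; they
are not the actual TREE internal format or any real datafeed's wire format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tick:
    """Hypothetical common tick record (not the real TREE layout)."""
    symbol: str
    timestamp: float  # seconds since the Unix epoch
    price: float
    size: int


def from_feed_a(line: str) -> Tick:
    """Parse an invented CSV record: SYMBOL,EPOCH_SECONDS,PRICE,SIZE."""
    symbol, ts, price, size = line.strip().split(",")
    return Tick(symbol, float(ts), float(price), int(size))


def from_feed_b(msg: dict) -> Tick:
    """Parse an invented dict record with millisecond times and prices in cents."""
    return Tick(msg["sym"], msg["t_ms"] / 1000.0, msg["px_cents"] / 100.0, msg["qty"])
```

Once both parsers return the same Tick type, downstream code such as the
database writer or a client connection does not need to know which
datafeed a tick came from.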
The system has extremely good real-time performance; it copes easily with
a high-speed datafeed, even on machines that are small and slow by current
standards.
The data server code is intended to run on most Unix platforms.
See tree/doc/INSTALL for supported platforms. Client applications
can run on anything that supports TCP networking and NFS.
The code SSS released as open source includes the following
components:
TREE (Trade Research and Execution Environment) is the project name
SSS used for the trading support software. SSS chose to release the data server
and database components of the project as open source, because these
components needed more development effort than SSS could spare for them.
The SSS code base has been updated to support the following platforms
and datafeed:
See tree/README, tree/doc/ib.notes and tree/doc/TODO.
SSS has been using this code in-house for about five years, so the core
components are thoroughly debugged and useful. The administrative
tools are fairly crude and need much more work; for example,
enabling collection of a new futures series is not as simple as one could
wish. Better documentation is badly needed as well.
We expect that the initial thrust of open-source development will be on
smoothing out portability glitches (as the code is used on platforms we
haven't tried), interfacing to new datafeed services, and design of a
widely useful client-side interface library. (Our own client-side code
is tightly integrated into applications that we don't intend to release,
and probably wouldn't be very useful to other developers anyway.)
Further down the road, things that we hope to see developed in the
open-source project include:
A running TREE system contains the following components:
In typical usage of the system, client applications read historical data
(everything up through yesterday's close) directly from the permanent
database files. Activity since yesterday's close can be retrieved from
the tick holding area via a data server, and the server provides a smooth
transition into reading live ticks as the application runs in real time.
This combination of methods was chosen to ensure high performance for
back-testing of automated trading systems, which was one of the key
concerns at SSS. As long as the back-testing program is on the same machine
as the permanent data files --- which need not be where the data capture
process is running --- it will be able to read its input data at essentially
full disk speed. Lower-performance clients can access the permanent data
files from other machines via NFS mounting.
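
The access pattern described above can be sketched from the client's
side as follows. This is a minimal illustration, not TREE's client code:
the tuple layout and the function name are hypothetical, and it assumes
that any tick from the server with a timestamp at or before the last
historical tick is an overlap that can be skipped.

```python
from typing import Iterable, Iterator, Tuple

# (timestamp, symbol, price): a hypothetical layout for this sketch only
Tick = Tuple[float, str, float]


def history_then_live(historical: Iterable[Tick],
                      server: Iterable[Tick]) -> Iterator[Tick]:
    """Yield ticks from the permanent files first, then from the data server."""
    last_ts = float("-inf")
    # Phase 1: bulk read of historical data, e.g. from local or NFS-mounted files.
    for tick in historical:
        last_ts = max(last_ts, tick[0])
        yield tick
    # Phase 2: ticks from the data server (holding area, then live),
    # skipping anything already covered by the historical read.
    for tick in server:
        if tick[0] <= last_ts:
            continue
        yield tick
```

A back-testing program would consume only the first phase, which is why
its speed is bounded by the disk rather than by the server.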
(SSS recognizes that NFS access is not a suitable answer for all
situations, and expects that one of the high-priority tasks for the
open-source project will be to extend the data server daemon to support
retrieval of historical as well as current-day data. In that way, clients
will need only a TCP connection to the data server machine, not NFS access.
However, for typical in-house setups where the clients and servers are all
on the same LAN, the existing approach is probably preferable. In any case
one would want the access mechanism to be isolated inside a client-side access
library.)
The main data capture daemon expects to read all of its input sources
across TCP/IP connections. This assumption works fine with modern
datafeed arrangements such as the Interactive Brokers TWS API and
S&P Comstock's "CSP" datafeed handler,
and should pose no problem for Internet-based feed delivery arrangements
either. For legacy setups where the datafeed is provided via a serial
port or similar hardware, we recommend installing an additional data
capture process that reads the data port, buffers data (at least a few
seconds' worth), and transmits it to the main capture daemon. By running
this data capture process at higher-than-normal priority, loss of data
can be avoided without having to worry about ensuring hard-real-time
response in the main capture daemon.
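
A buffering relay of the kind recommended above might look like the
following Python sketch. It is an assumption-laden illustration, not part
of the TREE code: relay, read_chunk, and send_chunk are hypothetical
names, and the two callables stand in for a read from the data port and
a send to the main capture daemon.

```python
import queue
import threading


def relay(read_chunk, send_chunk, max_chunks=1024):
    """Copy data from read_chunk() to send_chunk() through a bounded buffer.

    read_chunk() returns bytes, or b"" at end of input; send_chunk(data)
    accepts bytes. Both are hypothetical stand-ins for the port and the
    TCP connection.
    """
    buf = queue.Queue(maxsize=max_chunks)

    def reader():
        # Drain the input promptly so the port never overruns; put() blocks
        # only if the buffer is completely full.
        while True:
            chunk = read_chunk()
            buf.put(chunk)
            if not chunk:
                return  # the empty chunk doubles as the end-of-input marker

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()
    # The forwarding side may stall briefly without losing data,
    # because the reader keeps filling the buffer in the meantime.
    while True:
        chunk = buf.get()
        if not chunk:
            break
        send_chunk(chunk)
    thread.join()
```

In a real setup, read_chunk could wrap os.read on the port's file
descriptor and send_chunk could be a connected socket's sendall method.
Raising the process priority is a separate, platform-specific step that
is not shown here.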