the ultimate stock trading data model



Just looking for more feedback on a very forward-looking OPEN SOURCE
data model. XL traders could really help us out here!!!

(see below for the general dialogue going on) Requirements
(see the discussion section)

(main site is



From Garth
Hi folks, couple of thoughts.
1. I think that the choice of development language has already been
made for us. Since we all want to go the open source route, the only
sensible option would be Java. My rationale for this is:
a. Cross platform (runs nicely on Linux)
b. Volume of Developers
c. Libraries, middleware, etc.
2. I agree that we need the realtime AND end of day system.
3. The scope of the project will depend a lot on the type of user we
are targeting. My beef with a lot of investment software is that there
is an arbitrary divide between "consumer" and "institutional"
investors. To be fair, the institutional investor does have greater
needs (corporate compliance and integration with other systems).
HOWEVER, all the needs of the corporate user, i.e., risk management and
portfolio management, are just as CRITICAL to us as "consumer"
investors, if not more so.

More from Garth

Another thing that I would like to add to our functional requirements
is a module that can calculate yield given commissions, leverage,
interest rates, etc. A combination of risk management and portfolio
management. I have already done some work on this and it's not as
daunting as it first appears.
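
To make the yield idea concrete, here is a minimal sketch, assuming a very simple one-period model: the position is financed partly with borrowed money, interest is paid only on the borrowed part, and commissions are a fraction of the position value. The class name, method name, parameters and the formula itself are illustrative assumptions, not the module described above.

```java
/** Hypothetical one-period yield calculator; the cost model is an assumption. */
final class LeveredYield {
    private LeveredYield() {}

    /**
     * @param grossReturn   return on the position before costs (0.10 = 10%)
     * @param leverage      position value divided by own equity (1.0 = no borrowing)
     * @param borrowRate    interest paid on the borrowed part over the period
     * @param commissionPct round-trip commissions as a fraction of position value
     * @return net return on the investor's own equity over the period
     */
    static double netReturnOnEquity(double grossReturn, double leverage,
                                    double borrowRate, double commissionPct) {
        if (leverage < 1.0) {
            throw new IllegalArgumentException("leverage must be >= 1");
        }
        double gain = leverage * grossReturn;            // gain scales with position size
        double interest = (leverage - 1.0) * borrowRate; // only the borrowed part costs interest
        double commissions = leverage * commissionPct;   // charged on the whole position
        return gain - interest - commissions;
    }
}
```

With these assumptions, a 10% gross return at 2x leverage, 5% borrowing cost and 1% commissions works out to roughly 13% on equity (0.20 - 0.05 - 0.02).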

Rick to Garth

Java is a great choice. Still may have to deal with some mixed code
in various parts of the system. Hopefully the messaging system will
accommodate bridging some applications in different languages.

Major role for Excel as a programmable user interface, data entry,
reporting, charting, etc.

Real-time and end of day data systems don't have to be developed
concurrently. Still, it's a great idea to allow the design for both to
work together in the future. Especially combining time buckets into
days so the real time system will mesh with the end of day/week/month

Target user (proposed for discussion): One person, managing 100
separate accounts, with securities in 100 different countries,
average account size 1 million dollars U.S., for total assets managed
100 million dollars. Also fully usable by a mutual fund that manages
ONE account, of 100 million dollars, across 100 different countries.
Also fully usable by a hedge fund that manages 100 million for three
investors, long short, with derivatives, 100 different types of odd
fixed income paper, lots of FX and all types of exchange traded

The above is to set the design direction that this ultimately is "a
single person aircraft." No need for co pilot, radio man, navigation
officer, several stewardesses, full ground crew, etc. Push off from
the gate and fly this thing well all by yourself.

On the other hand, it is a VERY capable system. Designed from the
ground up to be a turret of action whenever the market becomes
exceedingly tumultuous. It has the capability of dealing very well
with 5 million dollar transactions (5% of a $100 million portfolio)
in a single ticker without the investor getting screwed.


Rick Replies to Garth
Hi folks, I am really enjoying this discourse. It is good to have
like minded individuals to bounce things off. I agree with everything
Christian said. The Data model is crucial. I have been thinking
through the data model and here are some of my thoughts. Feedback
Have a planet class, then country class, then exchange class, then
financial instrument class, then data point classes (for intraday
data points and end of day). Obviously, we will have to fill out the
attributes and create other classes via inheritance. I think that we
could use Hibernate to manage the object-relational mapping to MySQL.
This could be done fairly quickly. Just some thoughts to get things
rolling. An all open source data layer :) !!!
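
A minimal sketch of that hierarchy in plain Java, just to have something to argue over. All class names, fields and constructors are placeholders, and the Hibernate mapping is left out on purpose so the skeleton has no dependencies.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical skeleton: names and fields are placeholders, not a settled design.
class Planet {
    final String name;
    final List<Country> countries = new ArrayList<>();
    Planet(String name) { this.name = name; }
}

class Country {
    final Planet planet;
    final String code; // assumed to hold an ISO 3166 country code
    final List<Exchange> exchanges = new ArrayList<>();
    Country(Planet planet, String code) {
        this.planet = planet;
        this.code = code;
        planet.countries.add(this);
    }
}

class Exchange {
    final Country country;
    final String name;
    final List<FinancialInstrument> instruments = new ArrayList<>();
    Exchange(Country country, String name) {
        this.country = country;
        this.name = name;
        country.exchanges.add(this);
    }
}

class FinancialInstrument {
    final Exchange exchange;
    final String symbol;
    final List<DataPoint> history = new ArrayList<>();
    FinancialInstrument(Exchange exchange, String symbol) {
        this.exchange = exchange;
        this.symbol = symbol;
        exchange.instruments.add(this);
    }
}

// Common base for observations; each subclass adds its own notion of time.
abstract class DataPoint {
    final double price;
    DataPoint(double price) { this.price = price; }
}

class IntradayPoint extends DataPoint {
    final Instant time;
    IntradayPoint(Instant time, double price) { super(price); this.time = time; }
}

class EndOfDayPoint extends DataPoint {
    final LocalDate date;
    EndOfDayPoint(LocalDate date, double price) { super(price); this.date = date; }
}
```

Each child registers itself with its parent, so you can walk from an instrument up to the planet or from the planet down to the data points. Real attributes and further subclasses would be filled in later.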

Planet Class
One of the best ideas yet! Not because we intend to trade with
extraterrestrials but to illustrate to everyone we want some "room"
designed into the system to possibly use later. We are going for
20-year forward foundation code, so let's leave room "on both ends of the
string" for expansion. Remember when Bell made telephones, they put
in 4 wires when they only really needed two? When you wanted the
Princess Light those two wires came in handy!

Country Class
Would prefer this to be a "geodemographic" class with latitude,
longitude and boundary vectors. Remember, in the United States you
have States, Cities, Towns, Counties, Municipalities, Zip Codes,
Census Tracts, Sales Tax Jurisdictions, Private Toll Roads, etc., all
of which overlap and follow no rules. Why is geography important?
Commodities are delivered and unloaded at a specific port, and firm
names can be duplicated (same name but different firms in two
countries, etc.). Also, emerging countries' borders can change. Hell,
large established countries' boundaries change. Debt can be issued by
a firm, a municipality, or a project. Things like SEC fees may be due
on U.S. transactions in listed securities, even if they are exchanged
across just a bulletin board communication network. Etc.

Plus, in doing research you might want to know "odd things" involving
geodemography. For instance, which insurance company (or reinsurance
company) will be hit hardest after the hurricane? That requires you to
be able to search company records as to who they insured, map that to
geodemographic coordinates, establish the likely geographic track of
the hurricane, estimate damage, etc. Similarly, estimate the damage to
oil refining capacity in the U.S. using the same technique: find oil
facilities by zip code, translate to zip code centroids (or better
yet, map each facility by satellite photo to lat/long coordinates),
plot the hurricane probability paths, and estimate the capacity
reduction. Another key data point to have for each facility would be
the VERY CRITICAL "feet above sea level" statistic, building
earthquake rating, etc. All this just illustrates the need for a good
geodemographic data structure.
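
A rough sketch of the refinery example, assuming each facility carries a lat/long, an elevation and a capacity figure, and that "in the path" simply means within some radius of a forecast storm position. The class names, fields and the radius/elevation rule are all made up for illustration. The distance calculation is the standard haversine formula on a spherical Earth.

```java
import java.util.List;

// Hypothetical facility record; field names are illustrative only.
final class Facility {
    final String name;
    final double latDeg;
    final double lonDeg;
    final double feetAboveSeaLevel;
    final double capacity; // e.g. barrels per day

    Facility(String name, double latDeg, double lonDeg,
             double feetAboveSeaLevel, double capacity) {
        this.name = name;
        this.latDeg = latDeg;
        this.lonDeg = lonDeg;
        this.feetAboveSeaLevel = feetAboveSeaLevel;
        this.capacity = capacity;
    }
}

final class StormExposure {
    static final double EARTH_RADIUS_KM = 6371.0; // mean radius, spherical approximation

    private StormExposure() {}

    /** Great-circle distance between two lat/long points (haversine formula). */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** Sums the capacity of facilities within radiusKm of the storm point
     *  that also sit at or below maxElevationFeet. */
    static double capacityAtRisk(List<Facility> facilities, double stormLat,
                                 double stormLon, double radiusKm,
                                 double maxElevationFeet) {
        double total = 0;
        for (Facility f : facilities) {
            boolean near = distanceKm(f.latDeg, f.lonDeg, stormLat, stormLon) <= radiusKm;
            boolean low = f.feetAboveSeaLevel <= maxElevationFeet;
            if (near && low) {
                total += f.capacity;
            }
        }
        return total;
    }
}
```

A real version would run this over every point on each probability path and weight the results, but even this much shows why lat/long and elevation need to live in the data model.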

See also:
ISO 3166 - Country codes (domicile codes)
ISO 4217 - Currency codes

Exchange class

Not only physical, organized exchanges but also Electronic
Communications Networks (ECN), Auctions and other markets. Some of
these do NOT map to a geographic point.
ISO 10383 Codes for exchanges and market identification (MIC)
(wiki page excellent)

Financial instrument class
We don't need to do too much original thinking here... "except" ...
ISO 6166 International securities identification numbering
system (ISIN)
ISO 10962 Classification of Financial Instruments (CFI code)

The design should allow the derivatives and underlyings to be
built "like tinker toys" - snapped together and taken apart. The VaR
module and portfolio must be able to "see" what the combinatorial
possiblities are. For example the data structure must support the
ability to see all "Synthetics" you have created in the test
portfolio or the impact of adding a certain derivative:

Some Synthetic Examples
Long Call - long put and a long stock or future
Long Put - long call and a short stock or future
Long Stock - short put and a long call
Short Call - short put and a short stock or future
Short Put - short call and a long stock or future
Short Stock - short call and a long put
Straddle - Futures and options combined to create a delta neutral
Underlying - long (short) call together with a short (long) put. Both
options have the same underlying, the same strike price and the same
expiration date
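
A minimal sketch of how a portfolio module could "see" the six stock/option synthetics listed above. It assumes the legs share the same underlying, strike and expiration and are matched in size, and it looks only at the direction of each leg. The enum, class and method names are hypothetical.

```java
// Hypothetical names; covers only the six stock/call/put synthetics listed above.
enum Synthetic {
    LONG_CALL, LONG_PUT, LONG_STOCK, SHORT_CALL, SHORT_PUT, SHORT_STOCK, NONE
}

final class Synthetics {
    private Synthetics() {}

    /**
     * Classifies a combination by the direction of each leg.
     * Positive quantity = long, negative = short, zero = leg not present.
     * Assumes same underlying, strike and expiration, and matched sizes.
     */
    static Synthetic classify(int stockQty, int callQty, int putQty) {
        int s = Integer.signum(stockQty);
        int c = Integer.signum(callQty);
        int p = Integer.signum(putQty);

        if (s > 0 && c == 0 && p > 0) return Synthetic.LONG_CALL;   // long stock + long put
        if (s < 0 && c > 0 && p == 0) return Synthetic.LONG_PUT;    // short stock + long call
        if (s == 0 && c > 0 && p < 0) return Synthetic.LONG_STOCK;  // long call + short put
        if (s < 0 && c == 0 && p < 0) return Synthetic.SHORT_CALL;  // short stock + short put
        if (s > 0 && c < 0 && p == 0) return Synthetic.SHORT_PUT;   // long stock + short call
        if (s == 0 && c < 0 && p > 0) return Synthetic.SHORT_STOCK; // short call + long put
        return Synthetic.NONE;
    }
}
```

For example, `Synthetics.classify(1, 0, 1)` (long stock, long put) returns `LONG_CALL`. A real "tinker toy" design would work on instrument objects and check strikes, expirations and ratios instead of bare signs.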

ISO 19312 Financial Instrument Attributes and other Market Data
Model (Extended with 20022)
ISO 20022-1 UNIversal Financial Industry message scheme - Part 1:
Overall methodology and format specifications for inputs to and
outputs from the ISO 20022 Repository (the repository where
processes, messages and the data dictionary are modelled together
using the Unified Modelling Language (UML))
ISO 20022-2 UNIversal Financial Industry message scheme - Part 2:
Roles and responsibilities of the registration bodies
ISO 20022-3 UNIversal Financial Industry message scheme - Part 3:
ISO 20022 modelling guidelines
ISO 20022-4 UNIversal Financial Industry message scheme - Part 4:
ISO 20022 XML design rules
ISO 20022-5 UNIversal Financial Industry message scheme - Part 5:
ISO 20022 reverse engineering

data point classes (for intraday data points and end of day)

Need a Time Class
Note that at its most detailed level we probably need hundredths of a
second or finer to properly sequence bids/offers/transactions in a very
heavily traded security during a huge volume day. Our system needs to
be MUCH better than typical exchanges (which report only Seconds Since
Midnight and use a proprietary sequence number that is never made
public; as a result only THEY and a court order can see who exactly
had priority).

Need to be able to store "tick by tick" data, timestamp, change in
any of the following: bid, bid volume, offer, offer volume,
executions, execution size, limit orders, special order instructions
(fill or kill, all or none, etc) plus others.
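
A minimal sketch of a tick record, assuming we keep both a high-resolution timestamp and our own visible sequence number so that events with the same timestamp still have a definite order. The class, enum and field names are hypothetical, and only three event types are shown. `java.time.Instant` can hold nanoseconds, which is finer than the hundredths of a second mentioned above.

```java
import java.time.Instant;
import java.util.Comparator;

// Hypothetical tick record; a real one would carry many more event types and fields.
final class Tick {
    enum Type { BID, OFFER, EXECUTION }

    final Instant time;   // timestamp, up to nanosecond resolution
    final long sequence;  // our own tie-breaker for events sharing a timestamp
    final Type type;
    final double price;
    final long size;

    Tick(Instant time, long sequence, Type type, double price, long size) {
        this.time = time;
        this.sequence = sequence;
        this.type = type;
        this.price = price;
        this.size = size;
    }

    /** Orders ticks by timestamp first, then by sequence number. */
    static final Comparator<Tick> BY_PRIORITY =
            Comparator.comparing((Tick t) -> t.time)
                      .thenComparingLong(t -> t.sequence);
}
```

Sorting a list with `ticks.sort(Tick.BY_PRIORITY)` then gives a replayable order in which anyone can see who had priority.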

Need a way to consolidate tick-by-tick data into "bars" or time
buckets. E.g. Hi/Low/Last for 5 minutes, 10 minutes, hour, day, week,
month, etc.
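
A minimal sketch of the consolidation step, assuming trades arrive in time order and that buckets are fixed-width windows aligned to the Unix epoch in UTC. Day, week and month bars would really need an exchange calendar and time zone, which this ignores. `Trade`, `Bar` and `BarBuilder` are hypothetical names.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical trade print: time, price and size only.
final class Trade {
    final Instant time;
    final double price;
    final long size;
    Trade(Instant time, double price, long size) {
        this.time = time;
        this.price = price;
        this.size = size;
    }
}

// One time bucket: Hi/Low/Last plus total volume.
final class Bar {
    final Instant start;
    double high;
    double low;
    double last;
    long volume;

    Bar(Instant start, double firstPrice) {
        this.start = start;
        this.high = firstPrice;
        this.low = firstPrice;
        this.last = firstPrice;
    }

    void add(Trade t) {
        high = Math.max(high, t.price);
        low = Math.min(low, t.price);
        last = t.price; // correct only if trades are fed in time order
        volume += t.size;
    }
}

final class BarBuilder {
    private BarBuilder() {}

    /** Groups trades into fixed-width buckets keyed by bucket start time. */
    static Map<Instant, Bar> build(List<Trade> tradesInTimeOrder, Duration bucket) {
        long width = bucket.toMillis();
        Map<Instant, Bar> bars = new TreeMap<>();
        for (Trade t : tradesInTimeOrder) {
            long ms = t.time.toEpochMilli();
            // Round down to the start of the bucket containing this trade.
            Instant start = Instant.ofEpochMilli(ms - Math.floorMod(ms, width));
            Bar bar = bars.get(start);
            if (bar == null) {
                bar = new Bar(start, t.price);
                bars.put(start, bar);
            }
            bar.add(t);
        }
        return bars;
    }
}
```

Calling `BarBuilder.build(trades, Duration.ofMinutes(5))` returns the 5 minute bars in time order. Buckets with no trades are simply absent from the map.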

Obviously, we will have to fill out the attributes and create other
classes via inheritance.

Need an Entity class for financial reporting

XBRL eXtensible Business Reporting Language (Corporate business
SWIFT ? Tends to deal mostly with settlement, ISO authorized for
Bank Identifier Code see ISO 9362, 15022?

Need a map from Reporting Entity to Financial Instruments issued
and/or traded (derivatives)

Need a messaging class?
See ISO 20022

I think that we could use Hibernate to manage the object-relational
mapping to MySQL.

I don't have a great background here but was thinking we should
have an "Industrial Strength" messaging layer in the architecture
between application modules and persistent storage.

Other thoughts:

MySQL - nice, stable db, but how about a "connector" so multiple
databases can be easily hooked up? For example
Must be something very similar for Java?

This could be done fairly quickly.

I'm hoping we move a bit more slowly, especially when laying in
cornerstones of the architecture. If people are really just itching to
run forward they should take just a small section and run (for
example - arbitrage commodity to underlying stock using end of day
data. Just go for it as a separate project and try to make some $$$,
hell that's what this is all about!!!). Keep the eye on both balls: 1)
the current "make a quick buck" stuff (which by the way will be very
helpful testing out some of the longer term architecture stuff) and 2)
the longer term 20 year project.

Just some thoughts to get things rolling. An all open source data
layer :) !!!

Excellent thoughts! Open discussion of the issues is the only way we
will ever get close to getting this right.


