Jim Green, Chairman & CEO, Composite Software

Jim Green
So we’ll try to bring you up to speed very quickly, and also offer a bit of a reflection for those of you who have worked in the space for several years. Following me there will be Ted Friedman, who’s a distinguished analyst in all things data from Gartner, and he’ll give an industry perspective on data virtualization, and Gartner’s take on the whole thing. Through the course of the day we have customers who will talk about their projects, about their methodologies, about their practices, about their experiences, and that may be something that you would relate to very strongly. At the end of the day, Dave Besemer, our CTO, will then wrap up the day and talk some about the future of data virtualization, where we see it going, what we think is going to happen.
So with that introduction, let me talk a little bit about the journey that got us to this point. The data virtualization work has been underway for a very long time; I don’t know how to give a perspective of the industry as a whole. I will do so through the lens of Composite Software, and talk a little bit about our evolution, which I think mirrors the evolution of data virtualization, since we’re the leader in the space and we’ve been working in it for a very long time.

Our particular company was formed in 2002. In 2003 the first product came out, and the first problem that we solved was query optimization, to do queries across multiple databases. This diagram should be straightforward; in the beginning it was a bit mystical because people had to figure out some really hard problems in order to make this work, but we grappled with that, got that under our belt, and then very quickly expanded beyond the database. There was an emerging world at that point in time, called XML; and people immediately wanted to go beyond the database and understand how we could actually deal with a wider range of data, and become more encompassing in terms of the scope of what everyone was grappling with.
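To make "queries across multiple databases" concrete, here is a minimal sketch of the idea, not Composite's implementation. It assumes two independent SQLite databases with hypothetical `customers` and `orders` tables. The middle tier pushes the filter and the aggregation down to each source and combines only the small result sets.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create an in-memory database standing in for one physical source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


# Two separate "databases" with hypothetical schemas and sample data.
customers_db = make_source(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT)",
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Acme", "WEST"), (2, "Globex", "EAST"), (3, "Initech", "WEST")],
)
orders_db = make_source(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)",
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 250.0), (11, 2, 75.0), (12, 3, 40.0), (13, 1, 60.0)],
)


def federated_totals(region):
    """Return (customer name, order total) pairs for one region."""
    # Step 1: push the region filter down to the customers source.
    customers = customers_db.execute(
        "SELECT id, name FROM customers WHERE region = ?", (region,)
    ).fetchall()
    if not customers:
        return []
    names = dict(customers)

    # Step 2: ask the orders source only for the matching customers,
    # and let it aggregate so that few rows cross the wire.
    placeholders = ",".join("?" * len(names))
    rows = orders_db.execute(
        "SELECT customer_id, SUM(total) FROM orders "
        f"WHERE customer_id IN ({placeholders}) GROUP BY customer_id",
        list(names),
    ).fetchall()

    # Step 3: combine the two partial results in the middle tier.
    return sorted((names[cid], total) for cid, total in rows)


print(federated_totals("WEST"))
```

The design choice shown is the one an optimizer has to make for every query: what to push down to each source and what to do in the middle tier.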

Of course, once we introduced XML we introduced a new data type, and then that required us to introduce a bunch of real time data transformations and reshaping and actually added quite a bit of complexity, so it took a year or so in order to get that under our belt.

Once we did that, we figured why stop at that point, and so somewhere around 2005 we started dealing with packaged applications. It should not be underestimated that packaged applications actually have quite a different model: instead of doing a fetch from a database, or a call to a REST or a SOAP service, you’re making a procedure call, and instead of passing (inaudible 0:03:07) you’re passing parameters, so the mappings on that actually were quite sophisticated. Nonetheless, the bottom line is that people now had a single place to go where they could access all the data in the corporation, regardless of whether it was in their databases, their XML documents, or their packaged applications.

Of course, at the same time, there was a parallel universe going on, which was application integration. That’s gone by several names; historically it was known as EAI, and today it’s sometimes referred to as business integration. With application integration, there was an attempt to standardize, starting around the year 2000, and SOA became very popular. Of course, with SOA there was another set of standards, there was another set of conventions people used, we got into WSDL interfaces and protocols, etc., and we thought it would be extremely exciting, at about this juncture, to try to bring application integration and data integration together. So instead of simply talking about data virtualization, the term data service was coined somewhere along this timeframe, and people were trying to figure out how to build an application service which would have the singular function of retrieving data, so that it would bridge the gap between application integration and data integration. It was also required at this point that we become much more sophisticated in our XML handling, so an XQuery engine started showing up as a prerequisite for the product, in addition to the SQL engine that sits underneath the covers.
At this point, I think we were also crossing about a million lines of code, and we were still going as fast as we could in order to handle everything that people threw at us. But we did figure out how to handle data services as well as application services, and we figured out how to bridge the gap between these two endeavors, and some of our customers, most notably customers in the government and typically in the Department of Defense, had very significant SOA initiatives that actually combined data and applications.

As people then moved beyond this point and got comfortable with that particular concept, we started looking at the Composite Information Server not as a project-based tool, but as an enterprise-based tool. Not as a single server, but as the potential of actually creating an abstraction layer that would host multiple applications and connect to multiple databases underneath, and we have seen projects that go out to 30 or 40 applications connecting to 15 or 20 different databases, all under a single architecture where people started propagating this horizontally throughout their system. That is a new form of three-tier architecture. Computing has seen three-tier architectures at numerous different times, but here the bottom tier is data, the middle tier is your abstractions, where you see the proliferation of data views or perhaps data services, and at the top you have multiple applications. So this trend continues strongly, and we see a number of people start with a single project, perhaps a reporting project, then expand to other applications, and finally expand to the point where it becomes an architectural still point in their IT systems.
That actually brings a whole bunch of interesting new problems for us, as we start to look at the issues of reuse and the issues of tracking of the services, etc. But it also led us to the next great leap that was required, and that was that instead of dealing with project-level computing, we were dealing with enterprise-level computing. Instead of dealing with sporadic queries, we were dealing with continuous queries. And in fact, I think one of the speakers yesterday said that they processed 25,000 queries a day through the system, so the requirement then became one for high availability.

So how do you handle high availability? Keep in mind that the server knows a great deal about the metadata, both from the perspective of the data sources and because the developers have loaded some specific views or data sources into the server which may not be in other servers. Keep in mind also that the server has actually done some queries, and has cached the results, so the results are local to a single server. If you want to actually have high-availability failover, you have to share all of that, and so you have to have the servers working in conjunction with each other.
We did the hard thing: we built an active-active system and we did high availability so that people could run this thing on a continuous basis.

Of course, once you go enterprise, then there’s a number of other requirements that hit you. For example, people are interested in version control. They want to run the old version of the service as well as the new version of the service, and they want to have it all recorded in a repository where they can access it. They also want to expand on the number of interfaces, so today we support JMS and ESBs and other applications as well as REST interfaces. They want to see team development, because the projects get bigger, and because you’re now deploying reuse: somebody uses somebody else’s data service, the subsequent user changes it, so what does it mean to the original user, etc. There was an increasing number of security options; in fact we still expand the security models as we move from release to release, and we incorporate things like NTLM and Kerberos, and SAML and PKI systems, as well as the typical username and password mechanisms.
As time went on, we also embraced more data sources. Some of you are familiar with the recent announcement that we made with Netezza; we’re going to market with Netezza and we’ll be the data federation technology for their chassis and their racks. As we move through time and we discover more and more scenarios, we also come up continually with more optimizations. Some of the optimizations are specific to certain data types, some of them are specific to certain joins and certain circumstances, but there seems to be a continuing stream of query optimizations that you discover, that you grapple with, and that you capture over the course of time. So it’s been a really fascinating journey for us, and we’ve really enjoyed working with technology, inventing new things, and actually pushing the state of the art forward, as we move forward and continue to add more capability to the product.
Somewhere along the line, over the past couple of years, we gradually started understanding that this model was a very sophisticated model, but we actually had missed something. There was an element in this that could be improved upon.

If you take this simple picture here, I would depict the issue as a gap that exists between the bottom of the chart and the top of the chart. For example, we typically say what you should do is take the server, take the tool, point to your data sources and then create a view. In some cases, however, people didn’t know enough about their data to actually know how to create that view. I remember one meeting I went to where a company said they had 5,000 databases, and they really were unclear about the schema of their databases and where the authoritative data was that they needed to get access to. I was kind of reporting that in a general fashion, without naming the customer, in a subsequent meeting, and everybody started chuckling, and they said “well, we tried to count our databases, and we stopped when we got to 10,000”. So we have these scenarios where people don’t understand the data very well, and there is some precursor work needed prior to creating a view, prior to creating a service, prior to putting anything in place. They want to understand what they’ve got, they want to understand the data better, and they want to make sure that what they’re doing is correct.
We work with some investment banks that are responsible for generating reports to the SEC. These compliance reports are enormously important, because obviously investment banks are under a lot of scrutiny from the government at this point. How do you know you’ve got all the data? How do you know you’ve captured all the trades and all the activity? How can you be sure of what’s going on? So there was a need for a capability to actually understand your data, to get you to the point where you could actually use data virtualization. Therefore, what we did is we built a tool that allowed you to create a model. We don’t create a model in the abstract; this is not ERwin or ER Studio. This is a tool that actually looks inside your databases, extracts the metadata, understands what’s going on and constructs the model for you. We’ll show you that tool later today. We think it could be a very powerful accelerator in terms of making sure that you do your work correctly and efficiently, and we also think that the tool is unique. It’s unique in that it actually creates the model on your behalf. It’s unique in that it doesn’t just give you statistics about your data, like a profiler would, but it shows you the relationships of your data, as is depicted in this diagram.
It’s unique in that it shows relationships of data across different databases, across different vendors, across different localities, and so can create a model for you of all the things that you pull from that. It’s also unique in that with just a couple of clicks of the mouse you can turn sections of this model into views and then you can do runtime with the views, so we’ve bridged the gap between the abstract modeling tools that exist in the industry today and the runtime systems that everybody wants to move to as quickly as possible. This is really fascinating to us, because once again we’ve found another way of actually pushing the state of the art forward, and doing something which we think adds value to that which we’ve done before.
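As a rough illustration of what "extracting the metadata and showing relationships across databases" can mean, the sketch below introspects two SQLite databases. It proposes a relationship wherever the values of one column largely overlap with those of a column in a different table. This is a simple value-overlap heuristic chosen for illustration, not a description of how the Composite tool works. The database names, schemas and the `min_overlap` threshold are all assumptions.

```python
import sqlite3
from itertools import combinations


def extract_columns(db_name, conn):
    """Read the catalog and return {(db, table, column): distinct values}."""
    columns = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        for info in conn.execute(f'PRAGMA table_info("{table}")'):
            column = info[1]  # second field of table_info is the column name
            values = {row[0] for row in conn.execute(
                f'SELECT DISTINCT "{column}" FROM "{table}" '
                f'WHERE "{column}" IS NOT NULL')}
            columns[(db_name, table, column)] = values
    return columns


def discover_relationships(sources, min_overlap=0.8):
    """Propose column pairs from different tables whose values overlap."""
    catalog = {}
    for db_name, conn in sources.items():
        catalog.update(extract_columns(db_name, conn))

    found = []
    ordered = sorted(catalog.items(), key=lambda item: item[0])
    for (a, values_a), (b, values_b) in combinations(ordered, 2):
        if a[:2] == b[:2] or not values_a or not values_b:
            continue  # skip columns of the same table and empty columns
        overlap = len(values_a & values_b) / min(len(values_a), len(values_b))
        if overlap >= min_overlap:
            found.append((a, b, round(overlap, 2)))
    return found


# Hypothetical sample sources: a CRM database and a billing database.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "Acme"), (2, "Globex"), (3, "Initech")])

billing = sqlite3.connect(":memory:")
billing.execute(
    "CREATE TABLE invoices (invoice_no INTEGER, cust_id INTEGER, amount REAL)")
billing.executemany("INSERT INTO invoices VALUES (?, ?, ?)",
                    [(100, 1, 9.5), (101, 2, 20.0), (102, 1, 5.0)])

relationships = discover_relationships({"crm": crm, "billing": billing})
for left, right, score in relationships:
    print(left, "<->", right, score)
```

A discovered pair such as `invoices.cust_id` and `customers.id` is only a candidate. A person still has to confirm it before turning that part of the model into a view.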

We love innovating, we love moving forward, and in fact that is kind of the story and history of data virtualization. As we move forward, we’re going to be doing some new things. In 2011 we’re going to be bringing some new capabilities out. Another thing that people talk to us about a lot is that as you put together these large systems, as you employ data virtualization in a serious way that becomes mission critical to your company, it really becomes required that you understand what’s going on. For example, I talked about the clustering; you can cluster eight servers together and they will share each other’s work, they will automatically balance the load across the cluster, and things will operate in as efficient a way as possible.
What happens when one of the servers fails? Well, when one fails, the others pick up the work. The problem is, since they pick up the work, how do you monitor and maintain your systems to make sure that everything is working the way it is actually supposed to be working? In order to do that, we have to provide some visibility; we have to provide some capability. This goes into a much different direction, for example, some people are interested in which applications are most active, which applications represent the most load in the databases, even to the point where people want to actually do chargebacks to departments for their use of the system so they can actually track the business use of what’s happened.
As a result, we built a monitor. The monitor is in its final stages of development, we’ll be bringing it out in early 2011, and in a manner that’s very typical at Composite Software, it actually has a really cool graphical user interface and looks more like an application than a piece of infrastructure. So it is a bit of a switch for us, in that we’ve actually brought everything out to the surface so people can manage it and monitor it. We no longer have to ignore the guy on the sales call who represents operations; we now turn to him and say yes, we actually considered you too, as well as the developers and the other members of the team. So now we can handle the entire suite.
Of course, what’s happening in the industry is some interesting scenarios where people are talking about revamping their IT architectures and employing something that they refer to as the Cloud. The Cloud, I think, is unique in that just about everybody who talks about it has a slightly different definition of it, so we’re like everybody in the industry, trying to struggle with what is the definition of the Cloud, what is the difference between public and private Clouds, what are the issues behind the Clouds, etc. But regardless of the details of how you define elasticity and some of the other topics that people refer to, the Cloud will hold some data. So taking it down to the base elements, you’re going to have some of the data in the Cloud, whether you use a software-as-a-service application, or whether you actually do custom development and then host that in the Cloud. There will be data in the Cloud; there will be data in your corporation, now and forever. So it represents yet another mechanism that has to be handled, as we try to do data virtualization in various settings with various Cloud topologies through various firewall settings, etc. And again, we have to optimize the experience, have a rapid development mechanism and create a system that is actually maintainable and can be monitored. So we are in the process of working on some technologies that will allow you to incorporate your Cloud data along with your corporate data into a single data system that we can then virtualize, so that your applications can have transparent access to the data, regardless of whether it’s hosted in the Cloud or on the local premises.
Dave Besemer will talk some more about that at the end of the day; however, I would encourage any of you who have questions about the Cloud, or designs of what you would like to do or what you would like to see us do, by all means to find us throughout the course of the day, and let’s chat. We’re very interested in this as a next area of emerging technology and things that we could do about it. So that is a quick tutorial on data virtualization; it is a quick history of Composite. It is probably an accurate history of the technology through the course of time; as you can see, we’ve been working on it consistently. We weren’t the first to do it, but we were one of the earliest, and we were the most serious and we’ve been the most successful, but mostly we have a singular focus on it. It’s the only thing we do; it’s the only thing we want to do. We want to be the best in this particular topic, in this field of computer science.
With that background, let me introduce Bob Eve, the VP of Marketing, and we’ll move ahead to our agenda. Thank you.