Closing the Gap between IT and the Business by Virtualizing Data
The Issue: Business units are making decisions outside of IT with regard to Information Access applications and tools – and then expecting IT to quickly provision and support those applications. Information Access applications include every business-facing application – from Word to a trading system to CRM to e-Discovery. Priority-one mandates, such as regulatory compliance and legal, are especially "hot" currently. Business-critical applications, those designed to extract incremental value from existing information, are taking a backseat.
The Result: IT is becoming further marginalized in the eyes of the business. IT is forced to say "no" to business requests, as it simply cannot bring new applications online in any short-term window due to legacy issues. As "hot" applications are brought online, they further stress IT resources as they tend to be implemented in a stovepipe fashion – where the business unit cares only about that application and not about the impact it may have on other back-end IT operations. The Business Unit is therefore acquiring these tools/services, and handing them off to IT to support AFTER the decisions have been made.
The situation today is becoming flammable. The business wants to be able to react to requirements quickly, without having to be overly concerned for IT and their ability to deliver. The BU wants known costs for known services in a known timeframe – and the ability to add or delete service levels based on costs and requirements. The BU believes it is mandated to act, so as IT pushes back, the BU moves ahead regardless.
IT wants to be able to fulfill all the requirements of the BU, but must attempt to do so within the constraints it has – from people to power and cooling to space. IT has been addressing the independence of the business unit in one of a few basic ways:
- IT attempts to support the timeline demands of the business by creating yet another stovepipe operation – intentionally keeping the infrastructure, data, and operations separate from the mainstream. While all recognize this is the most expensive, least efficient, and worst-case scenario for creating common data value, protection, usage, and management, it is more often than not the solution IT is getting "jammed" with by the business unit.
- IT attempts to support the demands of the business but requires that the new Information Access application adhere to existing IT operational standards, preferably on better-utilized, shared infrastructure assets and people. This will always take longer: it requires more planning, testing, and implementation work, plus downstream regression testing to determine what effects the new application will have on existing processes, people, and infrastructure. It typically costs more money and resources as well.
- The business bypasses IT altogether and either sets up the application as an external service offering, or worse, brings the solution in house with no IT involvement at all.
EXAMPLE:
Archive/E-Discovery: According to ESG Research (E-Discovery Requirements Escalate, November 2007), only 7% of the time does IT make the decision to use funds to build out infrastructure, tools, applications and processes to support E-Discovery mandates, whereas 37% of the time the Legal Department makes those decisions by themselves – with no involvement up front from IT at all.
Examples such as this are becoming more common as unknown business unit requirements continue to appear – widening the rift in an already tenuous relationship between the core business and internal IT.
The True Result:
Situations as described previously are bad for business – but in order to facilitate change we must acknowledge and understand the realities within the cycle. There is a common flow that tends to occur regardless of when IT is brought into the loop.
- The business unit has a requirement.
- The business makes a decision on E-Discovery tools and policies.
- IT is handed a mandate from the business to implement and support the decision.
- Even if the implementation is flawless, a new stovepipe has been created.
- That application only looks for data that it ingests – requiring decisions to be made as to what that data is and how to get it into the system.
- Applications such as this may crawl existing data sources to ingest, but must be directed as to what specific data types to look for.
- Applications such as this normally only support one or two different data types – email for example, but not database/transaction records, or unstructured data living outside the core data center.
- A discovery request from the new application "archive" is only successful if the request contains all of the relevant data – which rarely (if ever) exists entirely within that archive.
- IT tends to attempt to evolve the new application stack into existing processes for backup/recovery, disaster recovery, etc. stressing existing systems and processes. It is easier for already taxed IT personnel to add to existing operating processes versus creating new ones – regardless of the applicability.
Summary Result: IT is already operating above capacity. The business unit views IT's inflexibility and time-to-service delays as unacceptable, and as such begins to make decisions independently of IT. The business may gain accelerated time to deployment of the new application, but it has little knowledge of the fact that it may be doing more harm than good overall. IT becomes further stressed and the cycle continues until something breaks.
I worry about the inevitable long term effects of this cycle, but recognize that most will not have the time or luxury to concern themselves with such things. In the short term, IT is being further removed from the decision making process for business unit information access decisions. The result is that IT ends up having to support the goals of the business unit but has no (or limited) control over the decision processes and the effects of those decisions on IT's overall ability to deliver services.
The Solution:
The fundamental problem of running IT as a service bureau is rigidity: infrastructure is stovepiped, complex, requires hyper-specialization at every element, and has incalculable points of interdependency. The concept of "fluidity" is abstract at best. In an ideal world the data center would simply be a collection of infrastructural resources capable of morphing into virtual stovepipes, which in turn are capable of delivering on the immediate and long-term needs of the business – and malleable in semi-real time in order to deal with unknown new requirements or unforeseen events.
In short, data center virtualization is required such that the business no longer needs to be concerned with IT and its idiosyncrasies and IT no longer needs to say "no". If the data center were "liquid", IT could say yes first, bring up the application, and pick up the pieces as a background task.
Server virtualization technologies are the first infrastructure layer that begins to enable this reality. By creating a server infrastructure that provides for virtual machines, server fluidity is enabled. Virtual machines can move between physical machines at will and even automatically in the event of failure, new performance criteria, or any other new event or issue. Server virtualization means that at least from the perspective of "always having a machine ready for the unknown", we can appear fluid.
Being able to provide a virtual server to a business unit on a moment's notice is nice, but limited. It doesn't address all the other issues downstream. It is a good start to begin to alter the perception of IT and to close the gap by providing a "can do" answer up front, but it will only slow the problem.
What is really required is to stop the primary focus on infrastructure and begin to focus on data. The business application doesn't care about infrastructure – it assumes infrastructure can support its requirements. The business unit cares about the data associated with that application – while the overall corporation needs to care about the data from a holistic perspective. Nobody outside of IT cares about infrastructure. IT needs to focus on how the data can be best managed – since housing, manipulating, finding, and protecting data is the baseline reason for IT's being.
Data Virtualization is the next next thing. Applications connect to information via infrastructure. Infrastructure change interrupts that connection. By creating a virtual connection between the application and data, we can solve most of today's primary IT problems and re-establish a tighter bond between IT and the business.
The business owns the application – it should decide which requirements it needs to perform its stated objective – and not IT. IT should own the data (note: not information, but data – the individual applications create and manipulate that data which when utilized becomes information.). When the business unit executes on their own without IT, IT ends up controlling nothing and reacting constantly in a no-win situation.
As long as IT can say "yes, we can provide you a way to execute your application and provide you access to your data based on your requirements", the business will gladly change its perception and hand off infrastructure and data control to IT.
Here's how I see it working in the real world. In the previous example, Legal chose an E-Discovery application (glorified search) and created corporate governance policies that got shoved into IT. Everything in the solution ended up stovepiped, which means it is invariably riddled with holes. In the new world of data virtualization combined with infrastructure virtualization, IT starts with one simple rule to the business: Your application must house its data "here".
"Here" is a virtual data abstraction interface that accepts any and all types of data, from any and all types of applications – in one common virtual place. Want to have your E-discovery tool query against our email data? Point it here. Want to search across email and structured transactional data? Point it to the same place. Want to write new data generated by a new application or an old Word file? Yes, click "save" and here is where it will be.
If there is only one virtual place to put all data, then there is only one virtual place to find all data. Behind that data abstraction IT still has to do all the hard things it's always done – decide what data is going to reside where, for how long, how to protect it, etc. – but if it can be done "fluidly" then change suddenly isn't paramount. If you can react to changing infrastructural requirements without the business unit calling, did it even happen? I suggest that if the phone isn't ringing, things are good.
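To make the "one virtual place" idea a little more concrete, here is a minimal sketch in Python. Everything in it is my own illustration, not any product's interface: the `Here` class, the `DataObject` record, and the store names "primary" and "archive" are all hypothetical. The only point is that applications save to and search one logical place, while the mapping to physical stores stays on IT's side of the abstraction.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    key: str
    kind: str      # e.g. "email", "transaction", "document"
    content: str


@dataclass
class Here:
    """The single virtual place applications save to and search against."""
    # Physical stores, visible only to IT (names are hypothetical).
    stores: dict = field(default_factory=lambda: {"primary": {}, "archive": {}})
    # key -> store name; applications never see this mapping.
    placement: dict = field(default_factory=dict)

    def save(self, obj: DataObject, store: str = "primary") -> None:
        # If the key already lives in another store, drop the stale copy.
        old = self.placement.get(obj.key)
        if old is not None and old != store:
            del self.stores[old][obj.key]
        self.stores[store][obj.key] = obj
        self.placement[obj.key] = store

    def load(self, key: str) -> DataObject:
        return self.stores[self.placement[key]][key]

    def search(self, term: str, kinds=None) -> list:
        """Return keys of all objects containing term, across every store."""
        needle = term.lower()
        hits = []
        for store in self.stores.values():
            for obj in store.values():
                if kinds is not None and obj.kind not in kinds:
                    continue
                if needle in obj.content.lower():
                    hits.append(obj.key)
        return sorted(hits)


here = Here()
here.save(DataObject("mail-1", "email", "Re: Acme contract renewal"))
here.save(DataObject("txn-9", "transaction", "ACME invoice 2007-11"), store="archive")
here.save(DataObject("doc-3", "document", "Holiday party memo"))

print(here.search("acme"))                   # email and transaction data together
print(here.search("acme", kinds={"email"}))  # email only
```

The search tool never asks which store holds what; it points at `here` and gets email and transactional hits from the same call.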
Server virtualization enables fluidity of virtual machines executing application stacks – so that if a failure occurs or if new powerful machine technologies come out they can be integrated dynamically, and based on priorities we might move a virtual machine to a whole new environment – without the business unit knowing or caring. Server migration, high availability, disaster recovery, performance optimization, and asset utilization/optimization are all functions within change states that normally cause disruption – or at the very least they cause the phone to ring. Virtualization enables the automation and fluidity beneath that abstraction layer to be invisible.
Fluidity now exists between the business unit/application and the server layer. By adding the same construct to the data layer, we further the overall fluidity goal – and now create an abstracted path between the business unit/application and the data itself. By implementing the data abstraction layer, we can now (manually or automated) create fluidity for tactical functions that today cause disruption and phone calls – such as data migrations, failure scenarios, data protection (recovery operations), capacity addition, capacity reductions, etc.
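As a rough illustration of what a disruption-free data migration could look like behind such a layer, consider the sketch below. The `DataLayer` class, its methods, and the store names are assumptions I made up for the example. The application only ever uses a key, so IT can drain one physical store into another without the application's view changing.

```python
class DataLayer:
    """Hypothetical abstraction: applications use keys, IT decides placement."""

    def __init__(self, store_names):
        self._stores = {name: {} for name in store_names}
        self._location = {}  # key -> name of the store currently holding it

    def write(self, key, value, store):
        previous = self._location.get(key)
        if previous is not None and previous != store:
            del self._stores[previous][key]
        self._stores[store][key] = value
        self._location[key] = store

    def read(self, key):
        # The application-facing call: no store name involved.
        return self._stores[self._location[key]][key]

    def where(self, key):
        # IT-facing call: which physical store holds this key right now.
        return self._location[key]

    def migrate(self, key, target):
        source = self._location[key]
        if source == target:
            return
        self._stores[target][key] = self._stores[source][key]  # copy first
        self._location[key] = target                           # then repoint
        del self._stores[source][key]                          # then reclaim

    def drain(self, source, target):
        """Move everything off one store, e.g. before retiring hardware."""
        for key in list(self._stores[source]):
            self.migrate(key, target)


layer = DataLayer(["old_array", "new_array"])
layer.write("q3-report", "quarterly numbers", "old_array")

before = layer.read("q3-report")
layer.drain("old_array", "new_array")
after = layer.read("q3-report")

print(before == after, layer.where("q3-report"))
```

A real layer would also have to deal with writes that arrive mid-migration, failures, and consistency, none of which this toy attempts.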
Data virtualization is not storage virtualization. Storage sits at the bottom of the data layer, and like the rest of infrastructure, should also be virtualized. By creating basic data abstractions, logically all data can exist in one place – making it easier to perform any application or data operational function. Data layer services – such as database management, logical provisioning, file system management, performance optimization, protection, etc. – are functions that can be more easily addressed simply because all data exists in one virtual location. IT managers would continue to have to operate and optimize the physical storage layer beneath, but by creating a fluid data abstraction layer, they are able to mitigate the physical effects of change, which results in less negative visibility and fewer phone calls.
One of the reasons storage virtualization has been slow to move upstream is that specialized skills and knowledge about devices and functions within this layer are lost when the abstraction moves above those devices. For example, if your storage administrators are gurus at managing and operating EMC Clariions, giving them the ability to see those Clariions as generic disk storage has not been enough of a benefit to offset losing the ability to use the specific tools and skills they acquired in order to manage those devices. It has been a losing proposition for the industry to take non-commoditized functional infrastructure and say "now you can treat these expensive devices as disposable – all you have to do is forget all the skills and tools you know and learned and instead do everything my way". By creating a data virtualization approach, you don't have to throw out the baby with the bathwater – you can simply buy time to do it the right way.
What's required? In simple terms, a global virtual data access layer that encapsulates and centralizes data management functionality in one place. Ideally, this virtual data "portal" will present itself as whatever the application wants it to be – regardless of the type of data it spits out. It would ingest the data and route it to whatever appropriate underlying infrastructure meets the business unit requirements. As a central data management engine, it would be able to apply universal and object specific policies (retention, protection, security, categorization, performance/lifecycle management/HSM, etc.) based on a "menu" of options the business unit chooses (each with a known cost).
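The "menu" with known costs could be sketched along these lines. The option names and prices are invented purely for illustration, as is the `monthly_cost` function; prices are kept in whole cents per GB per month so the arithmetic stays exact.

```python
# Hypothetical menu: option -> price in cents per GB per month.
MENU_CENTS_PER_GB = {
    "retain_7_years": 2,
    "offsite_replica": 5,
    "encryption": 1,
    "fast_tier": 10,
}
BASE_CENTS_PER_GB = 3  # hypothetical baseline charge for simply holding data


def monthly_cost(options, gigabytes):
    """Return the monthly charge in dollars for the chosen menu options."""
    chosen = set(options)
    unknown = chosen - MENU_CENTS_PER_GB.keys()
    if unknown:
        raise ValueError(f"not on the menu: {sorted(unknown)}")
    cents_per_gb = BASE_CENTS_PER_GB + sum(MENU_CENTS_PER_GB[o] for o in chosen)
    return cents_per_gb * gigabytes / 100


# Legal asks for seven-year retention plus encryption on 1,000 GB.
legal_quote = monthly_cost(["retain_7_years", "encryption"], gigabytes=1000)
print(legal_quote)
```

The business unit adds or drops a service level by changing the list of options, and the cost changes with it, which is the "known costs for known services" conversation the BU wants to have.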
Consider data as either "dynamic" or "fixed" (non-changing digital asset). I'd suggest that every single data object lives in this layer once it becomes fixed – or persistent. In this way, all the data within the organization is "alive". It may be physically relegated to offsite, offline media – but to the application or the business, it is alive – until a policy states that it must be destroyed. In that way, when legal wants to bring a new e-discovery search tool online in the future, it can point at ALL of the living corporate data – not just portions. When the marketing department wants to mine data for business intelligence and new value creation – it points its gun at the one place where everything lives. Imagine how much easier this could make it to garner new value from old data – and destroy the "chasm" that exists between the business and IT at the same time.
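One way to picture "alive until a policy states that it must be destroyed" is the small sketch below. The `FixedObject` record, the `alive` helper, and the sample catalog are all hypothetical. Note that the physical media an object sits on plays no part in whether a search tool can see it; only the retention policy does.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FixedObject:
    key: str
    fixed_on: date          # the day the object stopped changing
    retain_years: int       # retention policy chosen by the business unit
    media: str = "online"   # physical placement, IT's concern only

    def destroy_on(self) -> date:
        target_year = self.fixed_on.year + self.retain_years
        try:
            return self.fixed_on.replace(year=target_year)
        except ValueError:  # 29 February landing in a non-leap year
            return self.fixed_on.replace(year=target_year, day=28)


def alive(objects, today):
    """Keys of every object a discovery or mining tool may still see."""
    return [o.key for o in objects if today < o.destroy_on()]


catalog = [
    FixedObject("mail-2001", date(2001, 3, 1), retain_years=7, media="offline_tape"),
    FixedObject("memo-2000", date(2000, 1, 15), retain_years=7),
]

visible = alive(catalog, today=date(2007, 12, 5))
print(visible)
```

The offline object is still visible because its retention period has not lapsed, while the online memo is gone because policy says so.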
This approach allows IT to re-evaluate and completely alter faulty processes, and enables consistency and speed in our ability to deliver services to the business. From media management to regulatory compliance, all the tactical and difficult IT functions which cause us to say "no" so often could now be centrally managed and controlled, enabling IT to say "yes" first, and dynamically make the necessary changes happen without being a drag on the progress of the business – and that will be one happy day for a lot of people.
Now, somebody go figure out how to package this idea up.



Either I don't understand, or I don't agree with you.
Virtualization (whether server or storage) is implemented as an abstraction layer. If we picture an application or service stack, this is an additional layer. This is very valuable when, as you said, it allows changes to take place at layers below the virtualization layer that are transparent to the higher layers.
Where I disagree is your statement that the administrators can "forget all the skills and tools you know and learned and instead do everything my way". The lower layers of the stack still exist and they still have to be managed, optimized, and tuned. The value is NOT that those layers can cease to exist. The value IS that the management of those layers can be transparent to the high-level applications, and the BU.
In fact, virtualization adds complexity and management requirement to the data center. It is additional technology. I'm not suggesting that people stop virtualizing - I'm a fan; there are a lot of good reasons to add virtualization. However, it's important to recognize that adding a layer to the stack and adding another technology to the data center will result in adding complexity and a requirement for administration of that additional layer.
-----The good news is we agree - the bad news is clearly I wasn't as concise and clear as I would have liked. The quote I reference is what the VENDORS say - not the user. In other words, vendors think they can solve any problem as long as you throw out everything you already have and replace it with their stuff. I was trying (badly) to make your exact point - you still need to do all the same management things tomorrow that you do today - but if we can abstract those functions from the business, at least the phone won't ring. Ultimately, we want to automate as many of those functions as possible.
Virtualization definitely adds complexity at certain points regardless, so I agree with you there. However, it seems inevitable to me that we find ways to mitigate infrastructure knob twiddling by killing the direct dependence on specific devices or interfaces to devices. I'm not even suggesting a new layer is required in the software stack - since I posted this I've had 97 vendors tell me that they already do this! A few have a legitimate argument - Bridgehead Software positions itself as an archive play, but in reality they are a universal ingestion engine that can apply management and automation across any type of content. Seven-Ten software can make a similar claim - they just present a giant virtual disk to the applications. There are plenty of others fighting to dominate this new category already. I can't wait to see the first ad.
More than anything, I wanted to get folks thinking about solving the problem a different way. It appears to me that if, instead of moving serially as normal (app to server to network to storage), we focus on app to data directly, we could expose (and presumably fix) the holes in the middle faster.
Thanks ------Steve
Posted by: Carl Isenburg | December 05, 2007 at 03:25 PM