A Practical Example Of How Process Change Will Save Huge Dough....
Let’s assume we are Joe IT guy in the ABC company – an upper middle market company with a few thousand employees, a dozen sites, and all the problems folks like us deal with. We run our transactional production systems and our distributed Windows stuff. We have big SANs and file servers. We have stuff everywhere. We back things up, we do some DR. We are tragically overworked and chronically undervalued. We are Joe IT.
Let’s pick one little area where process improvement can yield big results – Test and Development. Everyone has T/D operations. Most operate in some version of the following:
1. T/D is used to make sure internal and external software applications, new infrastructure, and upgrades work the way they are supposed to prior to them being rolled out into production.
2. In order to perform step 1, T/D has to get real life data that is complete and current from production systems into their own systems.
3. In order to perform step 2, T/D has to beg, borrow and steal – and lives at the mercy of the production people. It takes time, planning, and prayer. The application normally has to come down, the database has to be quiesced, the infrastructure specialists have to turn knobs and push buttons, and the data has to be moved.
4. Once step 2 is complete, the actual testing can occur. Usually T/D will make additional copies of the data sets (which are probably already out of date by the time they are used) to test different things.
Now, let’s talk about the practical inefficiencies of what just happened. First, the production system runs some application(s), most likely on top of a database. We tend to find our production data really important, so normally we would have at least two copies of that data. We tend to keep our production data on very expensive, very big, very power-hungry infrastructure. We create the data on that infrastructure, and then we keep it there. We then make more copies of that data, also tending to keep them there. We might have 2, 3, 4 or more copies of the exact same data, at various points in time or at the same point in time. That means our cost of housing that data is 2, 3, 4 or more times the cost of housing it as a single instance – and not just for capital – but for power, cooling, footprint, etc.
When we make our test copies, we also tend to create and keep those copies on our production system. Sometimes we delete them when we are done, but sometimes we leave them around for a long time. Sometimes we forget about them altogether. Those copies take up more space, power, and brains. Those tend to be backed up along with everything else on that mission critical production system. That means we might be backing up 23 copies of the exact same data in one single backup. If we do a full backup each week, we will create new backups of the exact same 23 copies of the exact same data each time. You see where I’m going with this?
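To put rough numbers on that, here is a back-of-the-envelope sketch. The data set size, copy count, and retention period are made-up figures for illustration, and the arithmetic assumes the backup system does no deduplication, so every copy is stored in full every time.

```python
def backup_footprint_tb(dataset_tb: float, copies: int, fulls_retained: int) -> float:
    """Backup storage consumed when every copy of the data set lands in every
    retained full backup (assumes no deduplication)."""
    return dataset_tb * copies * fulls_retained


# Hypothetical figures, chosen only to illustrate the multiplication.
dataset_tb = 2.0      # size of one instance of the production data
copies = 4            # the original plus test copies left on production
fulls_retained = 12   # weekly full backups kept for roughly a quarter

with_copies = backup_footprint_tb(dataset_tb, copies, fulls_retained)
single_instance = backup_footprint_tb(dataset_tb, 1, fulls_retained)

print(f"With stray copies: {with_copies} TB")
print(f"Single instance:   {single_instance} TB")
print(f"Avoidable:         {with_copies - single_instance} TB")
```

The point is not the specific numbers but the shape: the copy count multiplies the retention count, so every forgotten copy costs you again each week.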
We have test servers that we run that sit around sucking juice whether they are used or not.
It gets worse. Not only was the process of getting the database copies difficult, and expensive, but no one considers the security implications associated with having potentially hundreds of copies of real production data floating around – in the production system, in the test systems, on the backup systems, at the DR site, and on the tape at Iron Mountain. Backup is good, right?
Having 2, 10, or 100 copies of the same, non-changing data sitting on “production” systems isn’t even the real problem. The downstream effects are the issue. Extra stuff on the production system slows down the production application, slows down the database, and slows down the user. We tend to combat this by buying more hardware. Bigger, faster hardware that sucks more juice, takes up more room, and causes more disruption. More “data” means other processes suffer – networks get clogged so we need more bandwidth, backup servers get bogged down so we need bigger machines, backup targets get full faster, etc. The rest of our processes don’t know or care that it’s the same data – only that there is more. More causes problems. Problems cost money.
The conundrum is that if you are the vendor in the production system, you kind of like it when Joe IT calls for more stuff. It’s hard to tell Joe not to buy more – but instead to just change some of the behavior that causes the problems. If we were to try to really help Joe, however, we’d lay it out like this:
- Define the objectives for Test and Development – in a perfect world
  a. Get a complete, accurate, and timely copy of the exact production database
    i. Zero impact – non-disruptive to production
    ii. Non-disruptive to production IT
    iii. Automated
  b. Put that copy somewhere else – NOT on the production system
    i. Don’t create work for production systems or people
    ii. Do it dynamically
    iii. Run virtual machines everywhere you can
  c. Create a protection policy for the Test/Dev data
    i. Do we back it up?
    ii. If so, when, why, and how often?
  d. Create a security policy for the Test/Dev data
    i. Protect the assets as if they were still in production
      1. This assumes you are not TJX or TSA
    ii. Enforce disposition/destruction
  e. Define a Data Repurpose Policy
    i. Who else could use a copy of this data?
      1. Should we use this copy as a backup copy?
      2. Should we replicate this copy as a DR copy?
    ii. Are there other applications that could use this?
      1. Data Warehouse
      2. Business Intelligence
      3. Those guys in marketing
      4. Business partners
If Joe did this, nice things would happen. By simply grooming the production systems of Test/Dev copies, Joe frees up a lot of space and performance, and sheds all the associated costs. Joe just took a universal pain in the rump from the production IT staff and made it disappear. The Test/Dev folks have a consistent schedule to get fresh whole data sets to play with. The security people are happy that there are fewer copies of stuff everywhere. The backup guy is happy that the extra stuff isn’t killing his systems. The Finance guy is happy because the cost of the test/dev infrastructure is an order of magnitude lower than the production stuff. The licensing costs alone for running the applications and database on bigger machines are huge – and now they will be pushed out or even go away.
If Joe did this, he might also figure out that not only should ABC keep Test/Dev data off of production – but that even the production data itself is “groomable”. 90% of the data that makes up production is fixed content, or data that isn’t going to change. It is no longer “dynamic”, and as such, the same questions can be asked of it. If it isn’t going to change, should we still back it up the same way? Should we still have 4 primary copies of it? Shouldn’t we put it on infrastructure that has attributes that are more aligned with the current state of that data?
If the data isn’t changing, but the attributes are – then continuing to do things the same way is illogical. If, after some period of time, access of a certain piece of “formerly transactional” data went from frequent to never, why don’t we put it through the same exercise that we did with the Test/Dev data? If we moved that data out of “production” physically, but maintained its place logically (i.e. we could still access it in the same way if required – but it might take longer to show up) – we could then do some really interesting things. By delineating our production data into Dynamic and Static (fixed) buckets, under a single consistent logical view, we would change things forever. Our static data would have a different “lifecycle” – with different attribute requirements as time (or whatever metric) moved on. We would move it out of our “tier 1” outrageously expensive gear and onto much lower cost gear, which would mean that our tier 1 gear would perform at optimal levels all of the time. In theory, if our Dynamic to Static flow was predictable, we may never have to buy another piece of hardware or upgrade another license for our production systems again. We would not only move data off of our most expensive, mission critical stuff, but we would change the attribute requirements of that data when we did. For example, we might keep 4 primary mirrors of that data while it is dynamic, and back those up every hour to a disk target, and replicate those targets every 8 hours off site, etc. – but we’d stop doing that once the data became static and moved to the “static production” state. Perhaps we would make sure we had one mirror locally, and one on-line backup locally, and another copy at the DR site, with 3 “oh no” copies on tape. Then we’d never back it up again – because what would the point be?
If you did that you’d save a lot more than 25% of your power and cooling budget. You would save so much money you would probably affect the earnings per share of your stock. You might want to negotiate a bonus up front!
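The Dynamic/Static split described above can be sketched as a small policy table plus a classification rule. This is only an illustration: the 180-day idle threshold, the field names, and the `classify` and `policy_for` helpers are assumptions made for the example, not something any particular product provides. The mirror, backup, replication, and tape figures are the ones used in the example above.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class ProtectionPolicy:
    primary_mirrors: int
    backup_interval_hours: Optional[int]     # None means no recurring backup
    replicate_interval_hours: Optional[int]  # None means no recurring replication
    dr_copies: int
    tape_copies: int


# Dynamic data: 4 mirrors, hourly backup to disk, replicated off site every 8 hours.
DYNAMIC = ProtectionPolicy(primary_mirrors=4, backup_interval_hours=1,
                           replicate_interval_hours=8, dr_copies=0, tape_copies=0)

# Static data: one local mirror, one copy at DR, three tape copies, then no more backups.
STATIC = ProtectionPolicy(primary_mirrors=1, backup_interval_hours=None,
                          replicate_interval_hours=None, dr_copies=1, tape_copies=3)

# Hypothetical threshold: data untouched this long is treated as static.
STATIC_AFTER = timedelta(days=180)


def classify(last_accessed: date, today: date) -> str:
    """Return 'static' once the data has sat idle past the threshold."""
    return "static" if today - last_accessed >= STATIC_AFTER else "dynamic"


def policy_for(last_accessed: date, today: date) -> ProtectionPolicy:
    """Pick the protection policy that matches the data's current state."""
    return STATIC if classify(last_accessed, today) == "static" else DYNAMIC


today = date(2024, 12, 31)
print(classify(date(2024, 1, 1), today), policy_for(date(2024, 1, 1), today))
print(classify(date(2024, 12, 1), today), policy_for(date(2024, 12, 1), today))
```

Elapsed time since last access is just one possible trigger; as noted above, the metric could be anything that tracks how the data is actually used.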
Vendors, who naturally are opposed to such intelligence unless they aren’t the ones getting all of your money, are starting to wake up. It’s hard to upset the gravy train, but they are starting. The fact is, the very nature of data is changing, so just jamming more of the same down your throat might help them short term, but eventually will cause them more problems because you won’t be able to ingest any more stuff from them. They are already seeing it. Those that learn to cannibalize themselves are the ones who survive long term. The principle of adaptability in Darwinism works.
IBM bought Softek to do things like this – at least some of the things like this. They are the ones who are at the very core – they are the Mainframe. By providing customers a way to groom production systems, they take food out of their own mouths. There have been heated arguments inside the blue machine for just this reason. The bet is that while they may push off mainframe revenues, they pick it up in other areas. EMC owns tons of the storage market attached to those mainframes. Oracle gets a big chunk of that pie too. By helping customers groom their production worlds and create new efficiencies, IBM takes money away from EMC and Oracle – and gives themselves a shot to compete for that storage business, and the data management functions that Oracle provided.
Network Appliance has been working with Solix to pull things out of those same environments and have them land on NetApp boxes. Not only is a NetApp box a heck of a lot less money than a big giant tier-1 mega array, but once the data is there people are astounded at how easy it becomes to do really useful things. One big wig at a Fortune 1000 told me that they could now create zero-footprint copies of their production data instantly using NetApp’s writable snapshot, which lets them do things that haven’t even been remotely possible previously. He said, “It was like we hired 4 more people with the time we saved – and now things that took days take seconds.” Most folks really can’t afford to stop production to do backups or run reports, but they have had no choice. Now they do, and they do it securely.
EMC won’t like losing the core disk business, but if they can take the server dollars, the application/database dollars and get the customer to see the light, they have a whole new opportunity to sell the other thousand things they have in their bag. If it’s fixed content, why not let Documentum manage it? Who needs Oracle?
It’s sort of like being a stock trader – you don’t care whether it’s up or down, only that a transaction is occurring. If you are an “emerging” player – Isilon, Pillar, or anybody else – this is an opportunity to shine. If you are a little guy trying to displace NetBackup, this is your chance. Real change opens doors.
Process changes of this magnitude are truly game changing – for IT and vendors alike. It tends to take a massive event to get either side to move out of its comfort zone, however, and not being able to buy any more power might just be that event. Once something like that occurs, and people are forced to look at new ways of doing things, Pandora’s Box might fly open. With the lights turned on, people can see. Sometimes things don’t look as good with the lights on – just ask my wife.


