Most IT operational issues can be pared down to one common denominator – data growth. If it were not for endlessly growing volumes of data demanding equally endless amounts of infrastructure, we would have solved every major hurdle to creating a stable, predictable, service-based utility model within IT.
In short, data growth strains and eventually breaks every single process within IT. Data growth:
- Causes the need for more and more infrastructure, which in turn causes:
  - Greater likelihood of infrastructure failures/outages
  - The need for additional IT staff to manage the additional infrastructure
  - Increased levels of interdependencies between infrastructure elements
  - Larger footprint, power, and cooling requirements within the data center
  - Increased difficulty in problem isolation/resolution
  - Increased likelihood of missing SLAs, falling outside governance parameters, etc.
- Causes process failures and magnifies downstream issues:
  - Backup/Recovery – more data requires more time, more horsepower, and more precision to ensure adherence to established SLAs (RPO/RTO)
    - Further exacerbated by multiple copies of primary data kept for various reasons
    - Requires more media and more off-site vaulting expense
  - Disaster Recovery – more data requires more time, more bandwidth, and more infrastructure to ensure recoverability SLAs continue to be met
  - Application slowdowns – more data to comb through requires more infrastructure to maintain application performance. Data grooming/archiving may relieve the performance burden, but it simply shifts that burden off of production environments and rarely eliminates the other downstream negatives.
- Delays new business applications and initiatives – more data/infrastructure/processes require more planning and downtime when rolling out new applications.
  - Data migrations and infrastructure rollouts/replacements are further complicated by growing data, which equates to slower overall IT response time to the business
- Negatively affects compliance/governance – simply finding the right information becomes harder as the volume of information increases. Having more than a single place to look for data creates more opportunities for data to fall through the cracks, and makes it far more difficult for organizations to derive incremental benefit from an information asset after its primary use.
- Creates increased opportunities for security issues, lost information, unprotected data, etc.
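To put the backup point in numbers, here is a rough Python sketch of how compound growth stretches a full-backup window when throughput stays flat. The starting size, growth rate, and throughput are illustrative assumptions of mine, not measurements from any real environment.

```python
def projected_size_tb(start_tb: float, annual_growth: float, years: int) -> float:
    """Data size after compounding at `annual_growth` for `years` years."""
    return start_tb * (1 + annual_growth) ** years


def backup_window_hours(size_tb: float, throughput_tb_per_hour: float) -> float:
    """Hours needed for one full backup at a fixed throughput."""
    return size_tb / throughput_tb_per_hour


if __name__ == "__main__":
    # Assumed values, for illustration only.
    start_tb = 10.0
    annual_growth = 0.5   # 50% growth per year
    throughput = 2.0      # TB per hour, held constant
    for year in range(6):
        size = projected_size_tb(start_tb, annual_growth, year)
        hours = backup_window_hours(size, throughput)
        print(f"year {year}: {size:7.1f} TB -> {hours:5.1f} h full backup")
```

Under those assumptions the window more than doubles in two years without a single process change, which is exactly how an SLA that was comfortable yesterday gets missed tomorrow.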
Without data growth, or at least without the meteoric compound rates of data growth we have experienced over the last several decades, almost every popular problem discussed in IT would have been solved. Processors, storage, and networks get faster, cheaper, and easier. Management and automation tools are more than adequate – until data growth creates an unforeseen dynamic. We would have mastered the processes required to protect, deliver, store, manipulate, move, and access our data if it weren't for the never-ending growth.
The Realities:
Data growth is not going to abate – it is going to accelerate. More people connected in more ways have more ability to create data than ever before, and whether we like it or not, they will do so. The usage, size, shape, and requirements of data will change as well. Faced with these realities, it is apparent that continuing down a dysfunctional path can only end in failure, so something needs to change.
The Solution:
Since you cannot really impact the growth of newly created data, you have to deal with the elements that do reside within your control. For the purposes of this blog, we assume that not allowing data to be created is not a realistic alternative. Therefore, there are only two fundamental control points you can affect to reduce the burden of data growth: the number of copies of data which exist at a given point in time, and the treatment (processes) of that data at a given point in time. Focusing on these two control points will give you valuable insight and enable you to add predictability to your entire operation.
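The first control point is simple arithmetic, and a few lines of Python make it hard to ignore. This is only a sketch: the copy categories and counts are hypothetical, and it assumes every copy is a full copy of the primary data.

```python
from typing import Mapping


def total_footprint_tb(primary_tb: float, copies: Mapping[str, int]) -> float:
    """Primary data plus every additional full copy kept of it."""
    return primary_tb * (1 + sum(copies.values()))


if __name__ == "__main__":
    # Hypothetical copy counts for 10 TB of primary data.
    copies = {"backup": 3, "dr_replica": 1, "test_dev": 2}
    print(total_footprint_tb(10.0, copies))   # primary plus six copies

    # Retiring one test/dev copy removes a full 10 TB from the footprint.
    copies["test_dev"] = 1
    print(total_footprint_tb(10.0, copies))
```

The growth of the primary data is outside your control, but the multiplier applied to it is not.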
To frame the suggested methodology, we have devised four (4) simple stages of "life" to consider for any type of data. For each stage there is a common set of questions you should answer. Answering them gives you a conscious understanding of what is currently occurring and what its ramifications are, and an opportunity to alter those results.
The Basic Lifecycle Stages of Data:
Stage 1: Dynamic Active Online Data
Stage 2: Persistent Active Online Data
Stage 3: Persistent Inactive Online/Nearline Data
Stage 4: Persistent Inactive Offline/Deep Archive Data
Definitions:
There are two primary data life forms – Dynamic and Persistent. Dynamic data is any data that is living in a change state – where fluidity and changes occur. Persistent data is non-changing, fixed, or static. A change of data in a persistent state creates a new data object.
Both definitions are subjective – they are not hard and fast rules. While it is true that there is most likely an exact moment when a piece of data stops changing and becomes persistent, that is not the point. Eventually all data becomes persistent. When you deem that to have happened, and what you do about it, are entirely up to you. It is the consideration that is important – reflecting on the state of data and the treatment of data in that state is where the value is to be realized.
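Because the cut-offs are yours to pick, the four stages can be captured in a small classifier. The Python below is a minimal sketch under my own assumptions: it uses days since last change and days since last access as the only signals, and the default thresholds are placeholders rather than recommendations.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    DYNAMIC_ACTIVE_ONLINE = 1
    PERSISTENT_ACTIVE_ONLINE = 2
    PERSISTENT_INACTIVE_NEARLINE = 3
    PERSISTENT_INACTIVE_DEEP_ARCHIVE = 4


@dataclass(frozen=True)
class Thresholds:
    """Cut-offs in days. Assumed defaults; choose your own."""
    persistent_after: int = 30      # unchanged this long -> persistent
    inactive_after: int = 90        # unread this long -> inactive
    deep_archive_after: int = 365   # unread this long -> deep archive


def classify(days_since_change: int, days_since_access: int,
             t: Thresholds = Thresholds()) -> Stage:
    """Map a data set's age signals to one of the four lifecycle stages."""
    if days_since_change < t.persistent_after:
        return Stage.DYNAMIC_ACTIVE_ONLINE
    if days_since_access < t.inactive_after:
        return Stage.PERSISTENT_ACTIVE_ONLINE
    if days_since_access < t.deep_archive_after:
        return Stage.PERSISTENT_INACTIVE_NEARLINE
    return Stage.PERSISTENT_INACTIVE_DEEP_ARCHIVE


if __name__ == "__main__":
    print(classify(days_since_change=2, days_since_access=0))
    print(classify(days_since_change=45, days_since_access=10))
    print(classify(days_since_change=200, days_since_access=120))
    print(classify(days_since_change=800, days_since_access=400))
```

The value is not in the thresholds themselves. Each time a data set crosses one, you get a natural moment to stop and reconsider where it lives and how it is treated.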
If you do nothing else, recognize that while you don't have to do anything differently – you should take the opportunity to think about whether you should do anything differently each time some major set of data evolves to a new stage. You should ask yourself two simple questions when that occurs:
- Should the data itself remain on the same physical infrastructure required during its dynamic phase?
- Should we alter any of the process requirements associated with the data at this stage?
My guess is most of the time the answers to both of these questions will be "no" – but the answer to the question of "are you going to do anything about it?" is probably also "no". If you at least stop to consider it, however, you provide yourself a ray of hope.
More to come……



On the subject of file backup, sharing and storage ...
Online backup is becoming common these days. It is estimated that 70-75% of all PCs will be connected to online backup services within the next decade.
Thousands of online backup companies exist, from one guy operating in his apartment to Fortune 500 companies.
Choosing the best online backup company will be very confusing and difficult. One website I find very helpful in making a decision to pick an online backup company is:
http://www.BackupReview.info
This site lists more than 400 online backup companies in its directory and ranks the top 25 on a monthly basis.
----While a flagrant plug which really has no contextual bearing on the blog point at all would normally be nuked by me, the site is worthy of notation.
This site concept would be really cool if you supported user-generated reviews and didn't focus on companies "getting reviewed". With the number of users leveraging backup services, centralizing user opinions would be useful. ----- Steve
Posted by: Jennifer | June 17, 2008 at 12:47 AM
I agree completely with the problem statement. Lots of solutions of course...Compellent is intriguing in that they seem to be able to do it all in a box without much fuss.
Where do you see their solution fitting best?
-----There are tons of infrastructure solutions; that isn't the point here. The point is around the mental state of those in IT. It boils down to "what problem am I trying to solve?" more than anything else. I love what Compellent does, but even they aren't the answer to questions such as "how many copies of stuff will I have at any given point in time?" or "is it good that 80% of the stuff on my brilliant box is Napster based?" Finding moments to reflect on life from the data's perspective gives users some reasonable context in which to question the basics. Questions like "what should my infrastructure be capable of at each stage?" and "what should I do with that data once it reaches each stage?" are more germane.
Finding the proper delineation by data first will help people consider the infrastructure and process context more clearly. Cheers
Posted by: Pete Steege | June 17, 2008 at 12:41 PM