Storage Capacity KPIs
01/22/2010
When I first started working with Distributed Storage, and for many years afterward, I worked with Asset Management and various other departments to answer the question, “How much storage do we have available, and how much is used?” The problem was that, depending on how the numbers were summarized and presented, management was left with impressions that did not communicate a complete picture. This invariably led to inaccurate assumptions that required many subsequent explanations. If we are to overcome these problems and communicate a clear picture of storage capacity, we must address several issues.
The first issue to address is whether to report storage capacity as raw capacity or usable capacity. Reporting raw capacity is simplest, but because raw numbers do not take protection and management overhead into account, service delivery management is tempted to think that more storage is available than there is in reality. If these overheads aren't taken into account when projecting future demand, the projected supply may be overstated. For this reason it is probably best to provide facilities for reporting both: raw numbers, which are used more in day-to-day support, and usable numbers, for planning and estimation.
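To illustrate how far raw and usable figures can diverge, here is a minimal sketch that converts raw capacity to usable capacity for a few common protection schemes. The usable fractions are standard RAID arithmetic; the function names, the hypothetical `hot_spares` parameter and the example disk counts are assumptions for illustration. Real arrays also lose capacity to vendor-specific metadata and management overhead, so actual usable figures will be lower still.

```python
def usable_fraction(scheme: str, group_width: int = 0) -> float:
    """Fraction of raw capacity left for data under a protection scheme.

    Standard RAID arithmetic only; vendor metadata overhead is ignored.
    """
    if scheme == "raid1":
        return 0.5  # every block is mirrored
    if scheme == "raid5" and group_width >= 3:
        return (group_width - 1) / group_width  # one parity disk per group
    if scheme == "raid6" and group_width >= 4:
        return (group_width - 2) / group_width  # two parity disks per group
    raise ValueError(f"unsupported scheme/width: {scheme}/{group_width}")


def usable_gb(disk_count: int, disk_gb: float, scheme: str,
              group_width: int = 0, hot_spares: int = 0) -> float:
    """Usable GB after setting aside hot spares and protection overhead."""
    data_disks = disk_count - hot_spares
    return data_disks * disk_gb * usable_fraction(scheme, group_width)


raw_gb = 64 * 300  # raw capacity of 64 x 300 GB disks
mirrored = usable_gb(64, 300, "raid1", hot_spares=4)
parity = usable_gb(64, 300, "raid5", group_width=4, hot_spares=4)
print(raw_gb, mirrored, parity)  # 19200 9000.0 13500.0
```

The same 64 disks yield either 9,000 or 13,500 usable GB depending on the protection scheme, which is why a raw-only report invites overestimates of supply.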
To complicate matters further, most corporations have heterogeneous storage environments, and each vendor uses its own terminology for features and metrics. For example, EMC uses the terms SRDF and TimeFinder for replicated storage, and EMC Control Center reports the capacity used by these facilities as Local Replica Usable and Remote Replica Usable, whereas IBM refers to replicated storage as Copy Services and reports the storage allocated to Copy Services as Local Copy Usable and Remote Copy Usable.
The third and final problem to overcome is one of perspective. When storage technicians talk about capacity, they are for the most part speaking from the standpoint of the storage hardware: they have in mind spindles and the logical abstraction of that storage by disk subsystems for presentation to host systems. Many service delivery analysts and IT system users, by contrast, look at storage from the standpoint of how host systems have further abstracted it by formatting the capacity into file systems. One group is thinking about file space and the other about drive space. I have seen this difference in perspective cause confusion when management looks at a new application that needs X amount of storage, sees free space on another system, and erroneously assumes that the free space is available to any other system. The truth is that free space within a file system may only be used from that file system, and free space on a partially allocated disk may only be used by a new or existing file system on that host. Free space on a disk that is assigned to a system but not yet allocated through logical formatting may be used by another system, but only after additional work to remove it from the first system and place it on the other. In other words, there are degrees of freedom with regard to available capacity.
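The degrees of freedom described above can be expressed as a small decision function. The enum members, the `who_can_use` helper and its return strings are hypothetical; the rules only restate the paragraph: file-system free space serves that file system alone, free space on a partially allocated disk serves file systems on the owning host, and an assigned but unformatted disk can serve another host only after it is removed and reassigned.

```python
from enum import Enum


class FreeSpace(Enum):
    """Where a piece of free capacity sits (hypothetical labels)."""
    IN_FILE_SYSTEM = 1    # free space inside an existing file system
    PARTIAL_DISK = 2      # unformatted space on a partially allocated disk
    UNALLOCATED_DISK = 3  # disk assigned to a host but not yet formatted


def who_can_use(kind, owner_host, owner_fs, req_host, req_fs=None):
    """Describe whether a request on req_host/req_fs can use the free space."""
    if kind is FreeSpace.IN_FILE_SYSTEM:
        # Only the file system that already contains the space can use it.
        if (req_host, req_fs) == (owner_host, owner_fs):
            return "usable now"
        return "not usable"
    if kind is FreeSpace.PARTIAL_DISK:
        # New or existing file system, but only on the owning host.
        return "usable now" if req_host == owner_host else "not usable"
    if kind is FreeSpace.UNALLOCATED_DISK:
        # Another host can use it only after the disk is reassigned.
        return "usable now" if req_host == owner_host else "usable after reassignment"
    raise ValueError(kind)


print(who_can_use(FreeSpace.IN_FILE_SYSTEM, "hostA", "/data", "hostB", "/app"))
print(who_can_use(FreeSpace.UNALLOCATED_DISK, "hostA", None, "hostB"))
```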
To resolve some of these problems, the first step is to define a vendor-agnostic terminology. Storage capacity terms must start at a high level, defining terms that generically encompass the functionality implemented by any storage vendor. To resolve the issue of degrees of available capacity, the relationship of these terms to overall capacity must be established.
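One way to begin a vendor-agnostic vocabulary is a translation table from each vendor's metric names to generic terms. The EMC and IBM metric names below are the ones quoted earlier; the generic keys (`local_replica_usable`, `remote_replica_usable`) and the `normalize` helper are hypothetical names chosen for illustration.

```python
# Vendor-specific metric names mapped to generic terms.
# The generic term names are hypothetical.
TERM_MAP = {
    ("EMC", "Local Replica Usable"): "local_replica_usable",
    ("EMC", "Remote Replica Usable"): "remote_replica_usable",
    ("IBM", "Local Copy Usable"): "local_replica_usable",
    ("IBM", "Remote Copy Usable"): "remote_replica_usable",
}


def normalize(vendor: str, metrics: dict[str, float]) -> dict[str, float]:
    """Re-key one array's vendor metrics (GB) to vendor-agnostic terms."""
    generic: dict[str, float] = {}
    for name, gb in metrics.items():
        term = TERM_MAP.get((vendor, name))
        if term is None:
            raise KeyError(f"no generic term for {vendor} metric {name!r}")
        generic[term] = generic.get(term, 0.0) + gb
    return generic


emc = normalize("EMC", {"Local Replica Usable": 120.0, "Remote Replica Usable": 80.0})
ibm = normalize("IBM", {"Local Copy Usable": 50.0, "Remote Copy Usable": 30.0})
print(emc)  # {'local_replica_usable': 120.0, 'remote_replica_usable': 80.0}
print(ibm)  # {'local_replica_usable': 50.0, 'remote_replica_usable': 30.0}
```

Once both arrays report under the same keys, their replica capacity can be summed or compared without knowing which vendor produced it.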

Storage Capacity KPI Hierarchy
A generic set of terms is depicted as a hierarchy in the “Storage Capacity KPI Hierarchy” diagram. We will quickly review each term. Every storage solution has a potential capacity: the maximum amount of storage that can be installed in a storage subsystem. Drilling down into a storage subsystem, whatever capacity is already installed is termed the total capacity, and the difference between the potential capacity and the total capacity is called the expansion capacity. Some of the total capacity is configured for use, and whatever remains is unconfigured. Within the configured capacity, some may be reserved for maintenance, replication, protection (RAID sparing) or other overhead, some is assigned, and what remains is unassigned capacity. Assigned capacity may be available for use by the system to which it is assigned, but it is considered unallocated until it is logically formatted into a logical volume and/or file system. Once capacity is assigned to a host, the terminology shifts from being storage-subsystem-centric to being file-space-centric. Once allocated to a file system, the space within the file system is either used by file data or free.
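Because each level of the hierarchy is the sum of the levels beneath it, only the leaf values need to be collected and the rest can be derived. A minimal sketch, with hypothetical field names and illustrative figures:

```python
from dataclasses import dataclass


@dataclass
class CapacityHierarchy:
    """Leaf values of the KPI hierarchy, in usable GB (hypothetical layout)."""
    potential: float     # maximum installable capacity of the subsystem
    unconfigured: float  # installed but not configured
    reserved: float      # configured, held for sparing/replication/overhead
    unassigned: float    # configured, not assigned to any host
    unallocated: float   # assigned to a host, not yet in an LV or file system
    used: float          # inside file systems, holding data
    free: float          # inside file systems, empty

    @property
    def allocated(self) -> float:
        return self.used + self.free

    @property
    def assigned(self) -> float:
        return self.unallocated + self.allocated

    @property
    def configured(self) -> float:
        return self.reserved + self.assigned + self.unassigned

    @property
    def total(self) -> float:
        return self.configured + self.unconfigured

    @property
    def expansion(self) -> float:
        return self.potential - self.total


array = CapacityHierarchy(potential=100_000, unconfigured=5_000, reserved=10_000,
                          unassigned=15_000, unallocated=5_000, used=40_000, free=10_000)
print(array.total, array.expansion, array.assigned)  # 85000 15000 55000
```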
Once this terminology and these high-level metrics are established, the next undertaking is to determine how vendor-specific metrics may be used to calculate them. Then the reports that use these metrics must be defined. It may also be useful to combine some of the metrics to simplify reporting. We will focus on usable capacities for reporting back to Service Delivery and users. Below are some examples of metrics and the formulas that may be used to calculate them from EMC Control Center (ECC) and HP EVA’s Virtual Controller Software (VCS).
Expansion: Usable* GB of expandability, based on the number of free slots in the array that may be used to expand capacity, multiplied by the standard spindle size (min(disk size)).
ECC: =(Total Slots for indicated model – “# Disks”) * (min(spindle_size)/2)
VCS: =Max Capacity – Current Capacity
Total Capacity: All usable* storage in the array, including unconfigured, unallocated, reserved, free, and used storage.
ECC: =“Configured – Usable (GB)” + (“Unconfigured – Total (GB)”/2); on ECC SP1, where “Configured – Usable” may not be accurate, use: =(“Local Replica Usable” + “Remote Replica Usable” + “Primary Usable”) + (“Unconfigured – Total (GB)”/2)
VCS: Current Capacity
Unassigned: Unconfigured usable* storage within the array and configured usable* storage not presented to any host or reserved for any application including COD.
ECC: =“Unallocated – Unmapped – Usable – Total (GB)”
VCS: =Current Capacity – (Allocated Storage + Lost to Overhead)
Reserved: Usable* GB of configured storage that is either reserved for a specific application or reserved through presentation to a specific host or front-end adapter, but is not physically in use or not under volume management on a host.
ECC: =“Accessible – Free – No Vol Grps (GB)” + “Unallocated – Mapped – Usable (GB)”
VCS: =Lost to Overhead, estimated as Current Capacity * 0.25
Assigned: Configured usable* storage presented to a host and in use, including free space within the file system and LVM structures.
ECC: =Total Capacity – (Unassigned + Reserved)
VCS: Allocated Storage
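The formulas above translate directly into code once the vendor values have been extracted. This is a minimal sketch, assuming each array's ECC values arrive as a dictionary keyed by the column names quoted above (the exact names, including their dash characters, may differ in a given ECC export) and that the VCS values are passed in as plain numbers; the function names, parameters and sample figures are hypothetical. The divisions by 2 and the 0.25 overhead factor are carried over unchanged from the formulas; the divisions appear to assume mirrored protection for unconfigured and expansion space.

```python
def ecc_metrics(row: dict, total_slots: int, min_spindle_gb: float,
                sp1: bool = False) -> dict:
    """Apply the ECC formulas above to one array's Control Center values (GB)."""
    expansion = (total_slots - row["# Disks"]) * (min_spindle_gb / 2)
    unconfigured_usable = row["Unconfigured – Total (GB)"] / 2
    if sp1:
        # On ECC SP1 "Configured – Usable" may be inaccurate, so rebuild it.
        configured = (row["Local Replica Usable"] + row["Remote Replica Usable"]
                      + row["Primary Usable"])
    else:
        configured = row["Configured – Usable (GB)"]
    total = configured + unconfigured_usable
    unassigned = row["Unallocated – Unmapped – Usable – Total (GB)"]
    reserved = (row["Accessible – Free – No Vol Grps (GB)"]
                + row["Unallocated – Mapped – Usable (GB)"])
    return {"expansion": expansion, "total": total, "unassigned": unassigned,
            "reserved": reserved, "assigned": total - (unassigned + reserved)}


def vcs_metrics(max_capacity: float, current_capacity: float,
                allocated_storage: float) -> dict:
    """Apply the VCS formulas above; overhead uses the 25% estimate."""
    lost_to_overhead = current_capacity * 0.25
    return {"expansion": max_capacity - current_capacity,
            "total": current_capacity,
            "unassigned": current_capacity - (allocated_storage + lost_to_overhead),
            "reserved": lost_to_overhead,
            "assigned": allocated_storage}


ecc_row = {"# Disks": 100, "Configured – Usable (GB)": 10_000.0,
           "Unconfigured – Total (GB)": 2_000.0,
           "Local Replica Usable": 1_000.0, "Remote Replica Usable": 500.0,
           "Primary Usable": 8_500.0,
           "Unallocated – Unmapped – Usable – Total (GB)": 1_500.0,
           "Accessible – Free – No Vol Grps (GB)": 500.0,
           "Unallocated – Mapped – Usable (GB)": 300.0}
print(ecc_metrics(ecc_row, total_slots=120, min_spindle_gb=146))
print(vcs_metrics(max_capacity=20_000, current_capacity=16_000, allocated_storage=9_000))
```

In both cases Unassigned + Reserved + Assigned equals Total Capacity, which is a useful sanity check when one report mixes vendors.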
These metrics may be extracted and reported in many ways, including reporting tools such as BIRT or Crystal Reports, PHP, and others. ECC provides an Oracle API that creates an ODBC connection; other solutions either provide database connectivity, or the information can be scraped from web pages if necessary. (The reports below were generated using an Access database with table links through ODBC to a couple of ECC implementations, plus data access pages. I will work on a follow-up article on how to do this.)
Now that we have defined formulas for extracting some metrics from our storage platforms, the next step is to define some reports. The first report should present an overall picture of capacity. A pie chart is best suited to presenting the overall percentage of capacity utilization. The chart below is an example of a Storage Capacity Pie Chart.
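The pie slices are simply each category's share of Total Capacity. A minimal sketch, assuming the slices are the Unassigned, Reserved and Assigned metrics defined above (which together make up Total Capacity); the function name and figures are illustrative, and the printed table stands in for whatever charting tool is used.

```python
def pie_slices(metrics: dict) -> dict:
    """Percent of Total Capacity held by each slice of the pie chart."""
    slices = ("unassigned", "reserved", "assigned")
    total = sum(metrics[k] for k in slices)
    return {k: 100 * metrics[k] / total for k in slices}


shares = pie_slices({"unassigned": 3_000.0, "reserved": 4_000.0, "assigned": 9_000.0})
for name, pct in shares.items():
    print(f"{name:<10} {pct:5.1f}%")
```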

Storage Capacity Pie Chart
The next report should drill down into the capacity, breaking it out by location and storage array. A percentage bar chart normalizes the data to percentages, and the actual capacities may then be inserted as labels. This allows analysts to quickly assess where resources are running low.
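Normalizing each array to 100% while keeping the GB figures as labels might look like the sketch below. The record layout, site and array names, figures, and the 10% `threshold_pct` used to flag arrays running low are all hypothetical; the text output stands in for a charting tool.

```python
from collections import defaultdict


def bar_rows(records):
    """Group (location, array, category, gb) records into 100% stacked bars.

    Returns {(location, array): {category: (percent, gb)}}.
    """
    grouped = defaultdict(lambda: defaultdict(float))
    for location, array, category, gb in records:
        grouped[(location, array)][category] += gb
    bars = {}
    for key, cats in grouped.items():
        total = sum(cats.values())
        bars[key] = {c: (100 * gb / total, gb) for c, gb in cats.items()}
    return bars


def running_low(bars, threshold_pct=10.0):
    """Arrays whose unassigned share is below a hypothetical threshold."""
    return sorted(key for key, cats in bars.items()
                  if cats.get("unassigned", (0.0, 0.0))[0] < threshold_pct)


records = [
    ("SiteA", "Array1", "assigned", 8_700.0), ("SiteA", "Array1", "reserved", 800.0),
    ("SiteA", "Array1", "unassigned", 1_500.0),
    ("SiteB", "Array2", "assigned", 9_000.0), ("SiteB", "Array2", "reserved", 6_400.0),
    ("SiteB", "Array2", "unassigned", 600.0),
]
bars = bar_rows(records)
for (location, array), cats in sorted(bars.items()):
    labels = ", ".join(f"{c} {pct:.0f}% ({gb:,.0f} GB)" for c, (pct, gb) in cats.items())
    print(f"{location}/{array}: {labels}")
print(running_low(bars))  # [('SiteB', 'Array2')]
```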

Storage Capacity Bar Chart
The next task would be to drill down further by categorizing the storage resources by tier. This assumes that a tiered approach has been applied to storage resources. Most large corporations have categorized storage array platforms into tiers or classes of storage based on architecture, performance and resiliency. This classification has become more complicated in the past five years with the advent of unified storage, cloud storage and storage virtualization, technologies that make it possible to provide varied classes of storage within the same storage subsystem. Because these technologies complicate tiered storage capacity reporting, the topic is beyond the scope of this article and will be handled in future articles.