<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>DCA Wired</title>
	<atom:link href="http://dcawired.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://dcawired.com</link>
	<description>Information Technology Perspective</description>
	<lastBuildDate>Sun, 13 Nov 2011 03:32:51 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='dcawired.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://0.gravatar.com/blavatar/04ea3af0a790c0d27771a38066e5c739?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>DCA Wired</title>
		<link>http://dcawired.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://dcawired.com/osd.xml" title="DCA Wired" />
	<atom:link rel='hub' href='http://dcawired.com/?pushpress=hub'/>
		<item>
		<title>Fixing Excel Data &#8211; Text to Rows and Fill Down</title>
		<link>http://dcawired.com/2011/09/30/massaging-excel-data-text-to-rows-and-fill-down/</link>
		<comments>http://dcawired.com/2011/09/30/massaging-excel-data-text-to-rows-and-fill-down/#comments</comments>
		<pubDate>Fri, 30 Sep 2011 03:13:03 +0000</pubDate>
		<dc:creator>Devin Adint</dc:creator>
				<category><![CDATA[Excel Procedures]]></category>
		<category><![CDATA[IT Tools]]></category>
		<category><![CDATA[excel]]></category>
		<category><![CDATA[Excel Data]]></category>
		<category><![CDATA[Fill Down Copy]]></category>
		<category><![CDATA[Text to Rows]]></category>

		<guid isPermaLink="false">http://dcawired.com/?p=585</guid>
		<description><![CDATA[Excel spreadsheets are used at almost every level of business.  As a result, spreadsheets are created by people with different goals and different levels of familiarity.  What many don&#8217;t realize is that Excel is primarily a data processing and information storage tool.  Excel is much like a database and is most effective when data is stored [...]]]></description>
			<content:encoded><![CDATA[<div class="zemanta-img">
<div class="wp-caption alignright" style="width: 250px"><a href="http://en.wikipedia.org/wiki/File:Pivottable-Flatdata.PNG"><img class=" " title="Pivot table" src="http://upload.wikimedia.org/wikipedia/en/thumb/8/81/Pivottable-Flatdata.PNG/300px-Pivottable-Flatdata.PNG" alt="Pivot table" width="240" height="112" /></a><p class="wp-caption-text">Image via Wikipedia</p></div>
</div>
<p>Excel spreadsheets are used at almost every level of business. As a result, spreadsheets are created by people with different goals and different levels of familiarity. What many don&#8217;t realize is that Excel is primarily a data processing and information storage tool. Excel is much like a database and is most effective when data is stored the way one would store it in a database. The problem is that data laid out as in a database isn&#8217;t necessarily presentable, so many people enter information into Excel and then try to format it for presentation. They create groups, merge cells, or put multiple entries into a single cell and then apply special formatting. The downside appears when someone wants to use Excel&#8217;s graphing and report building features: the combined or grouped data isn&#8217;t accessible, either because merging left cells empty or because several entries share a single cell and cannot be addressed individually. The best method is to treat the data as a database and ensure that every cell in a column holds the value that belongs to its row, so that each row is a complete record. Then use the grouping and pivot table features to create reports that are presentable and much more functional, because the data can be grouped and categorized in two dimensions.</p>
<p>But what do you do if you get a spreadsheet in which several entries have been typed into one cell, or in which a value has been entered once and left blank in (or merged through) the rows below it? The first thing to do is to select the column with merged data and unmerge it. But then what? You can massage the data manually, adding rows and extracting the entries that were entered in a single cell, and using a fill down copy where an entry was merged through rows. But if you have hundreds of rows this can be time consuming. I have run into this problem frequently enough that I have taken some time to write a couple of procedures that speed up the process.</p>
<h3><span style="color:#666699;">Text To Rows</span></h3>
<p>If you have ever imported data into Excel you may be familiar with the Text to Columns function under the Data menu. If you didn&#8217;t specify delimiters or missed one while importing, you can select the column with multiple fields of data, tell it what to split the data on, and it will automatically split the data across columns. But what about rows? The procedure below may be run on any column where multiple entries have been combined into single cells. Select the first cell to process before running it; it works down the column until it reaches an empty cell. The default delimiter is a comma, but this may be modified to meet your particular needs.</p>
<div style="color:#555555;background-color:#e0e0e0;border:#000000 2px dashed;font:8pt/8pt Courier New;padding:2px 6px 4px;">
<p>Sub TxtToRows()</p>
<p>&#39; Procedure TxtToRows<br />
&#39; Author : Devin Adint<br />
&#39; Purpose: This procedure extracts multiple entries in a single cell into<br />
&#39; newly inserted subsequent rows, preserving the contents of the<br />
&#39; surrounding cells in that row. This procedure helps to address<br />
&#39; times when Excel isn&#8217;t treated as a database and splits out the<br />
&#39; unique data into its own row as one would expect in a data set.<br />
&#39; Usage : Select the first cell of the column to split, then run.<br />
&#39;<br />
Dim ThisRow As Long<br />
Dim CrntCol As Long<br />
Dim vCellData As Variant<br />
Dim vCellEntryCount As Long<br />
Dim vCellCount As Long<br />
&#39; Start at the active cell and work down its column<br />
ThisRow = ActiveCell.Row<br />
CrntCol = ActiveCell.Column</p>
<p>With ActiveSheet</p>
<p style="padding-left:25px;">Do While .Cells(ThisRow, CrntCol) &lt;&gt; &quot;&quot;<br />
&#39; Extract cell data into array<br />
vCellData = Split(.Cells(ThisRow, CrntCol), &quot;,&quot;)<br />
vCellEntryCount = UBound(vCellData)<br />
&#39; loop through each delimited value from cell stored in the array<br />
&#39; and copy the current line inserting a copy below<br />
&#39; then over-write the column&#8217;s cell with the delimited value<br />
&#39; If there was only 1 entry move to the next row<br />
If vCellEntryCount = 0 Then</p>
<p style="padding-left:50px;">ThisRow = ThisRow + 1</p>
<p style="padding-left:25px;">Else &#39; loop through the entries adding a new row</p>
<p style="padding-left:50px;">For vCellCount = 0 To vCellEntryCount</p>
<p style="padding-left:75px;">&#39; If this is not the last entry insert a copy of the current line<br />
&#39; for the next entry<br />
If vCellCount &lt; vCellEntryCount Then</p>
<p style="padding-left:100px;">.Rows(ThisRow).Copy<br />
.Rows(ThisRow).Insert Shift:=xlDown</p>
<p style="padding-left:75px;">End If</p>
<p style="padding-left:75px;">&#39; Update the current line of the multi-entry cell being processed<br />
&#39; with the current entry<br />
.Cells(ThisRow, CrntCol) = vCellData(vCellCount)<br />
&#39; Move to the next row<br />
ThisRow = ThisRow + 1</p>
<p style="padding-left:50px;">Next</p>
<p style="padding-left:25px;">End If<br />
Loop</p>
<p>End With<br />
&#39; Clear the copy marquee left behind by the row copies<br />
Application.CutCopyMode = False<br />
End Sub
</p></div>
<p>&nbsp;</p>
<h3><span style="color:#666699;">Fill Down Copy</span></h3>
<p>If a value sits in one cell that is either merged through several rows or followed by blank rows, you will not be able to group by that column until every record has the corresponding value in its own cell. You could page down, select each populated cell and double click the little corner box to perform a fill down copy, or you can execute a procedure. The procedure below may be run to copy data down into unpopulated cells.</p>
<div style="color:#555555;background-color:#e0e0e0;border:#000000 2px dashed;font:8pt/8pt Courier New;padding:2px 6px 4px;">
<p>Sub FillDownCopy()</p>
<p>&#39; Procedure FillDownCopy<br />
&#39; Author : Devin Adint<br />
&#39; Purpose: This procedure moves down a column copying down data<br />
&#39; into unpopulated cells<br />
&#39; Usage : Select the first populated cell of the column, then run.</p>
<p>&#39; Stop once no populated cell is left below the selection<br />
&#39; (Rows.Count - 1 is 1048575 in Excel 2007 and later)<br />
While Selection.End(xlDown).Row &lt; Rows.Count - 1</p>
<p style="padding-left:25px;">If (ActiveCell.Offset(1, 0).Value2 = &quot;&quot;) Then</p>
<p style="padding-left:50px;">&#39; Copy the value, select it plus the blank run below it, and paste<br />
Selection.Copy<br />
Range(ActiveCell.Offset(1, 0).Address).Select<br />
Selection.End(xlDown).Select<br />
Range(ActiveCell.Offset(-1, 0).Address).Select<br />
Range(Selection, Selection.End(xlUp)).Select<br />
ActiveSheet.Paste<br />
Application.CutCopyMode = False<br />
Selection.End(xlDown).Select</p>
<p style="padding-left:25px;">Else</p>
<p style="padding-left:50px;">&#39; The next cell is already populated, so just step down<br />
Range(ActiveCell.Offset(1, 0).Address).Select</p>
<p style="padding-left:25px;">End If</p>
<p>Wend<br />
End Sub</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://dcawired.com/2011/09/30/massaging-excel-data-text-to-rows-and-fill-down/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/fcb2522560b707f366239aca6c741cec?s=96&#38;d=&#38;r=G" medium="image">
			<media:title type="html">Devin C. Adint</media:title>
		</media:content>

		<media:content url="http://upload.wikimedia.org/wikipedia/en/thumb/8/81/Pivottable-Flatdata.PNG/300px-Pivottable-Flatdata.PNG" medium="image">
			<media:title type="html">Pivot table</media:title>
		</media:content>
	</item>
		<item>
		<title>Optimizing Disk IO Through Abstraction</title>
		<link>http://dcawired.com/2010/02/02/optimizing-disk-io-through-abstraction/</link>
		<comments>http://dcawired.com/2010/02/02/optimizing-disk-io-through-abstraction/#comments</comments>
		<pubDate>Tue, 02 Feb 2010 20:34:03 +0000</pubDate>
		<dc:creator>Devin Adint</dc:creator>
				<category><![CDATA[Storage Administration]]></category>
		<category><![CDATA[System Administration]]></category>
		<category><![CDATA[Disk IO]]></category>
		<category><![CDATA[IO optimization]]></category>
		<category><![CDATA[storage layout]]></category>
		<category><![CDATA[volume management]]></category>

		<guid isPermaLink="false">http://dcawired.com/?p=496</guid>
		<description><![CDATA[To Engineer or Not To&#8230; When disk capacity is released to a new application or service many times the projects do not consider how best to use the storage that has been provided. Essentially the approaches fall into one of two schools of thought. The first is to reduce upfront engineering into a couple design [...]]]></description>
			<content:encoded><![CDATA[<h3>To Engineer or Not To&#8230;</h3>
<p>When disk capacity is released to a new application or service, projects often do not consider how best to use the storage that has been provided. Essentially the approaches fall into one of two schools of thought. The first is to reduce upfront engineering to a couple of design options and resolve issues when they arise. The second is to engineer several solution sets with variable parameters, providing a broader palette of solutions and policies from which an appropriate solution may be selected.</p>
<h4>Reduced Simplified Engineering</h4>
<ul>
<li>Apply one of a couple infrastructure designs to a project.</li>
<li>This approach involves less work upfront, has a simpler execution and involves less work gathering requirements.</li>
<li>Potentially more time and effort will be spent resolving issues when resources and design are insufficient.</li>
</ul>
<h4>Solution Engineering</h4>
<ul>
<li>Develop several standard solution sets and models and document policies and procedures which will be used to fit applications/components into these models.</li>
<li>This approach involves substantial requirements gathering and entails more work and complexity.</li>
<li>The result is a more efficient use of resources and less time spent resolving issues because the design and resources should better fit the system needs.</li>
</ul>
<h3>Engineering Disk Performance</h3>
<p>When it comes to disk performance, the cost of not considering how to optimize the use of the storage infrastructure can be very significant. Consider the IO operation cycle time table below. CPU and memory operations differ from device IO by several orders of magnitude. If a one nanosecond CPU cycle is scaled up to one second, CPU and memory operations take seconds to minutes while device IO takes months to years to complete. Poorly orchestrated device use can therefore have a significant impact. A memory miss will only cost a few hundred nanoseconds, but a disk IO miss could cost a couple dozen milliseconds. On the scaled clock, that is the difference between four minutes and eight months.</p>
<div class="wp-caption alignnone" style="width: 496px">
<table id="table1" style="border-collapse:collapse;height:122px;" border="1" width="486">
<tbody>
<tr>
<td style="border-bottom-style:double;border-bottom-width:3px;" width="120"><strong><span style="font-family:Arial;">Device</span></strong></td>
<td style="border-bottom-style:double;border-bottom-width:3px;" width="131" align="right"><strong><span style="font-family:Arial;">Cycle Time</span></strong></td>
<td style="border-bottom-style:double;border-bottom-width:3px;" width="160" align="right"><strong><span style="font-family:Arial;">Cycle in Seconds</span></strong></td>
<td style="border-bottom-style:double;border-bottom-width:3px;" colspan="2" align="right"><strong><span style="font-family:Arial;">Scaled Cycle</span></strong></td>
</tr>
<tr>
<td width="120"><span style="font-family:Arial;">CPU</span></td>
<td width="131" align="right"><span style="font-family:Arial;">1 nanosecond </span></td>
<td width="160" align="right"><span style="font-family:Arial;">1.00&#215;10^-9</span></td>
<td width="68" align="right"><span style="font-family:Arial;">1</span></td>
<td width="109" align="left"><span style="font-family:Arial;"> seconds</span></td>
</tr>
<tr>
<td width="120"><span style="font-family:Arial;">CPU Register</span></td>
<td width="131" align="right"><span style="font-family:Arial;">5 nanoseconds</span></td>
<td width="160" align="right"><span style="font-family:Arial;">5.00&#215;10^-9</span></td>
<td width="68" align="right"><span style="font-family:Arial;">5</span></td>
<td width="109" align="left"><span style="font-family:Arial;"> seconds</span></td>
</tr>
<tr>
<td width="120"><span style="font-family:Arial;">Memory</span></td>
<td width="131" align="right"><span style="font-family:Arial;">100 nanoseconds</span></td>
<td width="160" align="right"><span style="font-family:Arial;">1.00&#215;10^-7</span></td>
<td width="68" align="right"><span style="font-family:Arial;">2</span></td>
<td width="109" align="left"><span style="font-family:Arial;"> minutes</span></td>
</tr>
<tr>
<td width="120"><span style="font-family:Arial;">Disk</span></td>
<td width="131" align="right"><span style="font-family:Arial;">10 milliseconds</span></td>
<td width="160" align="right"><span style="font-family:Arial;">1.00&#215;10^-2</span></td>
<td width="68" align="right">4</td>
<td width="109" align="left"><span style="font-family:Arial;"> months</span></td>
</tr>
<tr>
<td width="120"><span style="font-family:Arial;">NFS Op</span></td>
<td width="131" align="right"><span style="font-family:Arial;">50 milliseconds</span></td>
<td width="160" align="right"><span style="font-family:Arial;">5.00&#215;10^-2</span></td>
<td width="68" align="right">2</td>
<td width="109" align="left"><span style="font-family:Arial;"> years</span></td>
</tr>
<tr>
<td width="120"><span style="font-family:Arial;">Tape</span></td>
<td width="131" align="right"><span style="font-family:Arial;">10 seconds</span></td>
<td width="160" align="right"><span style="font-family:Arial;">1.00&#215;10^1</span></td>
<td width="68" align="right"><span style="font-family:Arial;">3</span></td>
<td width="109" align="left"><span style="font-family:Arial;"> centuries</span></td>
</tr>
</tbody>
</table>
<p>&nbsp;</p>
<p class="wp-caption-text">IO Op Cycle Time Scales</p></div>
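<p>The scaled column can be reproduced with a few lines of arithmetic. The Python sketch below is illustrative only: it assumes the scale factor is one nanosecond to one second, a 365.25 day year and a month of one twelfth of a year, none of which are stated in the table, and the names <code>SCALE</code>, <code>UNITS</code> and <code>scaled</code> are made up for the example.</p>

```python
# Assumption: a 1 ns CPU cycle is stretched to exactly one second.
SCALE = 1e9

# Cycle times in seconds, taken from the table.
CYCLE_SECONDS = {
    "CPU": 1e-9,
    "CPU Register": 5e-9,
    "Memory": 100e-9,
    "Disk": 10e-3,
    "NFS Op": 50e-3,
    "Tape": 10.0,
}

# Assumed unit sizes in seconds, largest first (365.25 day year).
YEAR = 365.25 * 86400
UNITS = [
    ("centuries", 100 * YEAR),
    ("years", YEAR),
    ("months", YEAR / 12),
    ("days", 86400),
    ("hours", 3600),
    ("minutes", 60),
    ("seconds", 1),
]


def scaled(seconds):
    """Return (value, unit) for a cycle time after multiplying by SCALE."""
    total = seconds * SCALE
    for name, size in UNITS:
        if total >= size:
            return total / size, name
    # Anything shorter than one scaled second is reported in seconds.
    return total, "seconds"


if __name__ == "__main__":
    for device, cycle in CYCLE_SECONDS.items():
        value, unit = scaled(cycle)
        print(f"{device:12s} {value:8.2f} {unit}")
```

<p>Under these assumptions the disk figure works out to about 3.8 months and the tape figure to about 3.2 centuries, in line with the rounded values in the table, while the NFS figure comes to roughly 1.6 years.</p>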
<p>How IO devices are used by systems can impact device performance in several ways. First, between the system and the device there is usually a cache, which attempts to predict what data will be needed and pre-stages it in faster memory in anticipation of access requests. Traditional cache algorithms identify data that is accessed in ordered proximity on the IO device and then read ahead to pre-fetch subsequent data. These pages are left in cache until they are aged out by subsequent access to other data on the device. The result is that sequential IOs experience better performance because the data is already in cache, while random IOs and IOs that occur sporadically require direct access to the slower storage device. Newer caching algorithms such as SARC are designed to better anticipate random IO, but they are no substitute for considering data layout on the devices. An example of poor data layout is overly aggressive server side striping through volume management, which can cause IOs that are sequential on the system to appear random on the storage device, in essence circumventing the cache.</p>
<p>The second area of consideration is the storage device back-end performance. Back-end performance can be impacted by IO sizes and by how IOs are distributed across array boundaries. The most significant impact comes from conflicting IOs that request data from different portions of the storage device media at the same time. The result is what is termed &#8220;thrashing&#8221;: the device access mechanism must be repositioned from one part of the media to another over and over, and the time spent repositioning adds to every request.</p>
<h3>Optimizing the Layout of Information on Storage</h3>
<div class="wp-caption alignright" style="width: 286px"><a href="http://lh3.ggpht.com/_0dog6GbAQhg/S2cP1iztSpI/AAAAAAAAAp4/ogXNa7ZhE1Q/s640/ComponentActivities.jpg"><img class=" " title="Component Activity Diagram" src="http://lh3.ggpht.com/_0dog6GbAQhg/S2cP1iztSpI/AAAAAAAAAp4/ogXNa7ZhE1Q/s640/ComponentActivities.jpg" alt="Component Activity Diagram" width="276" height="220" /></a><p class="wp-caption-text">Component Activity Diagram</p></div>
<p>With all these factors that can impact storage performance, how can a method for optimizing the use of storage devices be formulated? Some of the decisions have to be made on the back-end by the Storage Analysts who configure and present the storage to hosts. Back-end configuration considerations vary by storage device implementation, but most storage sub-systems have facilities that automate redistribution of data to balance IOs across the device media. Directions on optimum logical device configuration on the storage system are available in architecture documentation and from field engineers. Once these configured devices are presented to a host, however, a method of laying out data is needed that will increase sequential IO and reduce the potential for IO conflict. Developing a standard method, rather than creating a unique custom solution for each application component, establishes a more predictable, re-creatable and supportable environment. This method should consider three factors: what the component does, what the component stores, and how large an implementation the component is supporting.</p>
<h3>Categorizing Component Functionality</h3>
<div class="wp-caption alignleft" style="width: 250px"><a href="http://lh5.ggpht.com/_0dog6GbAQhg/S2cP1ihzNbI/AAAAAAAAAp0/3EFdY0rhgnI/s512/AppComponentTypes.jpg"><img class="  " title="Application Functional Categories" src="http://lh5.ggpht.com/_0dog6GbAQhg/S2cP1ihzNbI/AAAAAAAAAp0/3EFdY0rhgnI/s512/AppComponentTypes.jpg" alt="Application Functional Categories" width="240" height="246" /></a><p class="wp-caption-text">Application Functional Categories</p></div>
<p>The first step is to determine what an application component does. Some applications are a single entity, while others are non-monolithic and break the work they perform down into subsets of functionality performed by components. All application or component functionality can be categorized into one or more of four categories.</p>
<ul>
<li><strong>Client servers</strong> include web servers, web portals, portal servers, SNA gateways, and Citrix servers. This category is mostly comprised of executables and static configurations or models, with some temporary file requirements.</li>
<li><strong>Utility servers</strong> provide peripheral or supporting functionality such as reporting and authentication, and include LDAP servers, Active Directory servers, messaging, certificate servers, file transfer servers, EDI and OLAP servers. This category has a broad constituency including executables along with configuration files, temporary file space and data space for certificates and directory structures. The IO profiles can range from sporadic to rather intense when dealing with OLAP servers.</li>
<li><strong>Application servers</strong> provide a framework for software development through distributed objects and APIs that supply business logic and business process functionality, and include IBM WebSphere, GlassFish, DataStage TX, BEA Tuxedo, and AX Workflow manager. This category generally has executables and configuration files and uses some temporary space, but interfaces with data servers for any major data functions.</li>
<li><strong>Data servers</strong> manage access to structured, semi-structured and unstructured data. They include databases such as Oracle and DB2 and document management applications such as Documentum. This category generally includes executables with configuration files and large data stores that are managed by a data engine, and tends to perform the brunt of data manipulation.</li>
</ul>
<p>Categorizing components using these definitions should help to narrow down storage infrastructure performance requirements.</p>
<h3>Categorizing Storage IO Activity</h3>
<p>The next step is to determine what types of data are stored and how a component accesses them. The data that an application component stores and accesses falls into one of four categories based upon how it is used and accessed, as shown in the diagram below.</p>
<div class="wp-caption alignnone" style="width: 473px"><a href="http://lh3.ggpht.com/_0dog6GbAQhg/S2cbc4HQuFI/AAAAAAAAAow/K9Snfdw6jzo/s720/StorageAccessTypes.jpg"><img class="  " title="Types of Storage Access" src="http://lh3.ggpht.com/_0dog6GbAQhg/S2cbc4HQuFI/AAAAAAAAAow/K9Snfdw6jzo/s720/StorageAccessTypes.jpg" alt="Types of Storage Access" width="463" height="183" /></a><p class="wp-caption-text">Types of Storage Access</p></div>
<ul>
<li><strong>Static data</strong> is comprised of executables, configuration files, models, static images, templates, style sheets, etc. This data usually gets loaded once when a process starts; much of what is needed is instantiated in the system&#8217;s memory as a running process or as cached copies of frequently used templates and image files. IO therefore happens at start up and less frequently thereafter.</li>
<li><strong>Transient data</strong> is comprised of temporary files, database transaction logs, database redo logs, spools, queues, temporary database tables, etc. This data is usually only accessed a few times and then either purged or over-written in a cyclical file. The IO profile is more random and sporadic and, in the case of data servers, usually works in concert with access to retained active data.</li>
<li><strong>Active data</strong> includes RDBMS data, HDMS data, file shares, document or image stores, etc. This is the retained working data set used by an application component and typically involves heavier sequential IO operations.</li>
<li><strong>Reference data</strong> includes RDBMS indexes, HDMS directory data and meta data. It is generally referenced in concert with access to retained active data, to identify which active data needs to be accessed based upon criteria stored in the reference data.</li>
</ul>
<p>Categorizing the file systems/directories specified by a component into one of these categories will help to identify how data can be structured to isolate IO, which reduces contention, and to consolidate similar IO patterns, thereby increasing sequential IO potential.</p>
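<p>As a rough illustration of this categorization step, the Python sketch below groups a component&#8217;s file spaces by storage activity category. The data kinds are the examples named in this section; the function names and the file system paths in the usage example are hypothetical.</p>

```python
# Example data kinds per storage activity category, as listed in this section.
CATEGORY_EXAMPLES = {
    "static": {"executables", "configuration files", "models",
               "static images", "templates", "style sheets"},
    "transient": {"temporary files", "database transaction logs",
                  "database redo logs", "spools", "queues",
                  "temporary database tables"},
    "active": {"RDBMS data", "HDMS data", "file shares",
               "document stores", "image stores"},
    "reference": {"RDBMS indexes", "HDMS directory data", "meta data"},
}


def categorize(data_kind):
    """Return the storage activity category for a data kind, or None if unknown."""
    for category, kinds in CATEGORY_EXAMPLES.items():
        if data_kind in kinds:
            return category
    return None


def group_by_category(file_spaces):
    """Map each category to the sorted paths whose data kind falls in it.

    file_spaces is a dict of path -> data kind; unknown kinds land under None.
    """
    grouped = {}
    for path, kind in file_spaces.items():
        grouped.setdefault(categorize(kind), []).append(path)
    return {category: sorted(paths) for category, paths in grouped.items()}


if __name__ == "__main__":
    # Hypothetical file systems requested for a database component.
    requested = {
        "/u01/app": "executables",
        "/u02/redo": "database redo logs",
        "/u03/data": "RDBMS data",
        "/u04/index": "RDBMS indexes",
        "/u05/tmp": "temporary files",
    }
    for category, paths in group_by_category(requested).items():
        print(category, paths)
```

<p>File spaces that end up in the same category are candidates for sharing a device group, while categories with conflicting IO patterns are candidates for separation.</p>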
<h3>Determining a Size Model</h3>
<table id="table1" style="border-collapse:collapse;height:86px;" border="1" width="299">
<tbody>
<tr>
<td width="86" bgcolor="#9999ff"><span style="font-family:Arial;">Model</span></td>
<td bgcolor="#9999ff"><span style="font-family:Arial;">Description</span></td>
</tr>
<tr>
<td width="86"><span style="font-family:Arial;">Small</span></td>
<td><span style="font-family:Arial;">50 &#8211; 256 GB or up to 1200 IO/s</span></td>
</tr>
<tr>
<td width="86"><span style="font-family:Arial;">Medium</span></td>
<td><span style="font-family:Arial;">256 &#8211; 750 GB or up to 1800 IO/s</span></td>
</tr>
<tr>
<td width="86"><span style="font-family:Arial;">Large</span></td>
<td><span style="font-family:Arial;">750 &#8211; 5000 GB or up to 3600 IO/s</span></td>
</tr>
<tr>
<td width="86"><span style="font-family:Arial;">Huge</span></td>
<td><span style="font-family:Arial;">&gt; 5000 GB or &gt; 3600 IO/s</span></td>
</tr>
</tbody>
</table>
<p>Now that we know what the application component does and what types of data files it uses, the next step is to determine how large an implementation the component is supporting. The size determination should be based upon the size of the file spaces and/or the estimated number of IOs. The resulting model should then be used to determine whether the file spaces need to be separated, and to what extent, based upon the storage activity category.</p>
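<p>The size table can be turned into a simple lookup. The Python sketch below is one possible reading of it, not a definitive rule: it assumes that when capacity and IO rate point to different models the larger model wins, that each upper bound belongs to the smaller model, and that anything under 50 GB is still treated as Small. The function name <code>size_model</code> is made up for the example.</p>

```python
from bisect import bisect_left

MODELS = ["Small", "Medium", "Large", "Huge"]

# Upper bounds of the Small, Medium and Large models, from the size table.
GB_LIMITS = [256, 750, 5000]
IOPS_LIMITS = [1200, 1800, 3600]


def size_model(capacity_gb, io_per_sec):
    """Pick a size model from capacity and IO rate.

    Assumption: the more demanding of the two measures decides the model,
    and a value equal to a bound stays in the smaller model.
    """
    by_capacity = bisect_left(GB_LIMITS, capacity_gb)
    by_io = bisect_left(IOPS_LIMITS, io_per_sec)
    return MODELS[max(by_capacity, by_io)]


if __name__ == "__main__":
    print(size_model(100, 500))    # small capacity, light IO
    print(size_model(100, 1500))   # small capacity, heavier IO
    print(size_model(6000, 100))   # very large capacity, light IO
```

<p>Taking the larger of the two results errs on the side of more separation, which seems the safer default when the estimates are uncertain.</p>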
<h3>Using OS Storage Abstractions to Optimize Layout and Use</h3>
<p>Finally, now that much has been learned about the component being implemented and the file space and implementation size have been determined, what mechanism may be used to manage IOs? Most operating systems have facilities for creating logical abstractions of storage. The diagram of OS Storage Abstraction below depicts a typical volume management facility. This is true of HP-UX LVM, AIX Logical Volume Manager and Veritas Volume Manager, and is even true of Solaris ZFS storage pools (zpools) and ZFS file systems.</p>
<div class="wp-caption alignnone" style="width: 472px"><a href="http://lh6.ggpht.com/_0dog6GbAQhg/S2cP15CL6vI/AAAAAAAAAqA/PffiQOKP84I/s720/StorageAbstraction.jpg"><img class="     " title="OS Storage Abstraction Functionality" src="http://lh6.ggpht.com/_0dog6GbAQhg/S2cP15CL6vI/AAAAAAAAAqA/PffiQOKP84I/s720/StorageAbstraction.jpg" alt="OS Storage Abstraction Functionality" width="462" height="226" /></a><p class="wp-caption-text">OS Storage Abstraction Functionality</p></div>
<p>These abstraction structures provide the means to limit storage access in three ways. Device groups, volume groups or Windows labeled disks provide a means of isolating IO. The logical partitions are distributed across the presented disks based upon several parameters, but all their IO is limited to the devices within the group. The logical partitions and file systems limit data growth, and directories provide a logical segregation of files into hierarchies for organizational purposes. Once an implementation size is determined a model may be assigned. In a small model all the storage activity categories may be handled by a single device group. In a medium model the reference and transient data should be placed in one device group and the static and active data in another. In a large model static and reference data are placed in one device group, transient data in a second and active data in a third, with file systems created per specifications. Finally, if a component is determined to have a huge IO or size requirement, four or even more device groups should be considered, including multiple partitions of transient, active and reference data.</p>
<p>Division of the volume groups may be tackled in several ways. One way would be to track all device group names and their purpose, and add new device groups as new application components are introduced for the service or application they support. This becomes more complex in shared service environments where one system may host several instances of a component. An easier way is to use a device group naming convention. With a naming scheme, analysts don&#8217;t need to know what device groups already exist; how storage is used on a system can be seen at a glance from the device group names; and unnecessary volume groups are not created, since any new file systems that fall into pre-existing categories end up assigned to the appropriate pre-existing device group. Essentially the device group names are self-tracking, since applying the naming process yields either an existing name or a new name when applicable. The implementation of the naming scheme can be automated in a spreadsheet form using lookups based upon the models and categorizations of the file systems requested in the form. The naming scheme I have used has the following form:</p>
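<p>The model-to-device-group rules can be written down as a small lookup table. The Python sketch below uses the STAR letters (Static, Transient, Active, Reference) for the storage activity categories. The Small, Medium and Large rows follow the text; the Huge row, with one device group per category, is only an assumed starting point, since the text calls for four or more groups. The names <code>LAYOUTS</code> and <code>group_for</code> are made up for the example.</p>

```python
# Device groups per size model; each tuple lists the categories sharing a group.
LAYOUTS = {
    "Small": [("S", "T", "A", "R")],
    "Medium": [("R", "T"), ("S", "A")],
    "Large": [("S", "R"), ("T",), ("A",)],
    # Assumption: one group per category as the minimum for a huge component.
    "Huge": [("S",), ("T",), ("A",), ("R",)],
}


def group_for(model, category):
    """Return the index of the device group that holds a category in a model."""
    for index, members in enumerate(LAYOUTS[model]):
        if category in members:
            return index
    raise ValueError(f"unknown storage activity category: {category!r}")


if __name__ == "__main__":
    for model, groups in LAYOUTS.items():
        print(model, len(groups), "device group(s):", groups)
```

<p>Keeping the rules in one table makes it easy to check that every category is placed exactly once in each model.</p>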
<ul>
<li>[app abbreviation up to 4 characters][00 Instance number][storage activity category(STAR)][Environment Code]</li>
<li>example: Oracle PeopleSoft Data Table Production = orps1ap<br />
Oracle PeopleSoft Index Table Production = orps1rp</li>
</ul>
<table id="table1" style="border-collapse:collapse;" border="1" width="77%">
<tbody>
<tr>
<td bgcolor="#6666ff"><strong><span style="font-family:Arial;">Component</span></strong></td>
<td bgcolor="#6666ff"><strong><span style="font-family:Arial;">Code</span></strong></td>
<td bgcolor="#6666ff"><strong><span style="font-family:Arial;">Storage Access</span></strong></td>
<td bgcolor="#6666ff"><strong><span style="font-family:Arial;">Code</span></strong></td>
<td bgcolor="#6666ff"><strong><span style="font-family:Arial;">Environment</span></strong></td>
<td bgcolor="#6666ff"><strong><span style="font-family:Arial;">Code</span></strong></td>
</tr>
<tr>
<td><span style="font-family:Arial;">Client</span></td>
<td><span style="font-family:Arial;">C</span></td>
<td><span style="font-family:Arial;">Static</span></td>
<td><span style="font-family:Arial;">S</span></td>
<td><span style="font-family:Arial;">Development</span></td>
<td><span style="font-family:Arial;">D</span></td>
</tr>
<tr>
<td><span style="font-family:Arial;">Utility</span></td>
<td><span style="font-family:Arial;">U</span></td>
<td><span style="font-family:Arial;">Transient</span></td>
<td><span style="font-family:Arial;">T</span></td>
<td><span style="font-family:Arial;">Integration Test</span></td>
<td><span style="font-family:Arial;">I</span></td>
</tr>
<tr>
<td><span style="font-family:Arial;">Application</span></td>
<td><span style="font-family:Arial;">A</span></td>
<td><span style="font-family:Arial;">Active</span></td>
<td><span style="font-family:Arial;">A</span></td>
<td><span style="font-family:Arial;">Acceptance Test</span></td>
<td><span style="font-family:Arial;">A</span></td>
</tr>
<tr>
<td><span style="font-family:Arial;">Data</span></td>
<td><span style="font-family:Arial;">D</span></td>
<td><span style="font-family:Arial;">Reference</span></td>
<td><span style="font-family:Arial;">R</span></td>
<td><span style="font-family:Arial;">Quality Assurance</span></td>
<td><span style="font-family:Arial;">Q</span></td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td><span style="font-family:Arial;">Production</span></td>
<td><span style="font-family:Arial;">P</span></td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td><span style="font-family:Arial;">Disaster Recovery<br />
</span></td>
<td><span style="font-family:Arial;">R</span></td>
</tr>
</tbody>
</table>
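<p>The naming scheme and code table above can be sketched as a small function.  The code tables are transcribed from the article; the function name and its parameters are hypothetical, and the instance number is assumed to be written without zero padding because that is how the two examples (orps1ap, orps1rp) are written.</p>

```python
# Code tables transcribed from the table above; the function is only a sketch.
ACTIVITY_CODES = {"static": "s", "transient": "t", "active": "a", "reference": "r"}
ENVIRONMENT_CODES = {
    "development": "d",
    "integration test": "i",
    "acceptance test": "a",
    "quality assurance": "q",
    "production": "p",
    "disaster recovery": "r",
}


def device_group_name(app, instance, activity, environment):
    """Build a name: app abbreviation, instance, STAR code, environment code."""
    abbreviation = app.lower()[:4]  # at most four characters
    # Assumption: no zero padding of the instance number, as in the examples.
    return "{}{}{}{}".format(
        abbreviation,
        instance,
        ACTIVITY_CODES[activity.lower()],
        ENVIRONMENT_CODES[environment.lower()],
    )


print(device_group_name("orps", 1, "Active", "Production"))
print(device_group_name("orps", 1, "Reference", "Production"))
```

<p>Because the name is derived only from the categorization, requesting a second file system in the same category produces the same device group name, which is what makes the scheme self-tracking.</p>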
<h3>Conclusion</h3>
<p>Exploring the structures that comprise an application component, and then making informed judgments based upon the size of an implementation, results in better performance and more efficient use of resources.  Isolating IO by the type of data access places like data together, increasing the potential for sequential access and decreasing the potential for IO conflicts caused by concurrent access to active, transient and reference data.</p>
<p>Below is a link to a worksheet that I created a while ago to determine volume group names based upon data types.</p>
<p><a href="http://dcawired.files.wordpress.com/2010/12/vg-san-layout-worksheet.xls"><img class="alignnone" title="Excel" src="http://officeimg.vo.msecnd.net/en-us/files/905/850/ZA102383410.png" alt="VG SAN Layout Worksheet" width="48" height="46" /></a>VG Storage Layout Excel Worksheet</p>
<br />Filed under: <a href='http://dcawired.com/category/storage-administration/'>Storage Administration</a>, <a href='http://dcawired.com/category/system-administration/'>System Administration</a> Tagged: <a href='http://dcawired.com/tag/disk-io/'>Disk IO</a>, <a href='http://dcawired.com/tag/io-optimization/'>IO optimization</a>, <a href='http://dcawired.com/tag/storage-layout/'>storage layout</a>, <a href='http://dcawired.com/tag/volume-management/'>volume management</a>]]></content:encoded>
			<wfw:commentRss>http://dcawired.com/2010/02/02/optimizing-disk-io-through-abstraction/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/fcb2522560b707f366239aca6c741cec?s=96&#38;d=&#38;r=G" medium="image">
			<media:title type="html">Devin C. Adint</media:title>
		</media:content>

		<media:content url="http://lh3.ggpht.com/_0dog6GbAQhg/S2cP1iztSpI/AAAAAAAAAp4/ogXNa7ZhE1Q/s640/ComponentActivities.jpg" medium="image">
			<media:title type="html">Component Activity Diagram</media:title>
		</media:content>

		<media:content url="http://lh5.ggpht.com/_0dog6GbAQhg/S2cP1ihzNbI/AAAAAAAAAp0/3EFdY0rhgnI/s512/AppComponentTypes.jpg" medium="image">
			<media:title type="html">Application Functional Categories</media:title>
		</media:content>

		<media:content url="http://lh3.ggpht.com/_0dog6GbAQhg/S2cbc4HQuFI/AAAAAAAAAow/K9Snfdw6jzo/s720/StorageAccessTypes.jpg" medium="image">
			<media:title type="html">Types of Storage Access</media:title>
		</media:content>

		<media:content url="http://lh6.ggpht.com/_0dog6GbAQhg/S2cP15CL6vI/AAAAAAAAAqA/PffiQOKP84I/s720/StorageAbstraction.jpg" medium="image">
			<media:title type="html">OS Storage Abstraction Functionality</media:title>
		</media:content>

		<media:content url="http://officeimg.vo.msecnd.net/en-us/files/905/850/ZA102383410.png" medium="image">
			<media:title type="html">Excel</media:title>
		</media:content>
	</item>
		<item>
		<title>Storage Capacity KPIs</title>
		<link>http://dcawired.com/2010/01/22/storage-capacity-kpis/</link>
		<comments>http://dcawired.com/2010/01/22/storage-capacity-kpis/#comments</comments>
		<pubDate>Fri, 22 Jan 2010 20:08:20 +0000</pubDate>
		<dc:creator>Devin Adint</dc:creator>
				<category><![CDATA[Capacity Planning]]></category>
		<category><![CDATA[Capacity]]></category>
		<category><![CDATA[KPI]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Storage Reporting]]></category>

		<guid isPermaLink="false">http://dcawired.com/?p=403</guid>
		<description><![CDATA[When I first started working with distributed storage, I worked for many years with Asset Management and various other departments to answer the question, &#8220;How much storage do we have available and how much is used?&#8221;  The problem was that, depending upon how the numbers were summarized and presented, various impressions were left with management that [...]]]></description>
			<content:encoded><![CDATA[<p>When I first started working with distributed storage, I worked for many years with Asset Management and various other departments to answer the question, &#8220;How much storage do we have available and how much is used?&#8221;  The problem was that, depending upon how the numbers were summarized and presented, management was left with various impressions that didn&#8217;t communicate a complete picture.  This invariably led to inaccurate assumptions that required many subsequent explanations.  To overcome these problems and communicate a clear picture of storage capacity we must address several issues.</p>
<p>The first issue is whether to report storage capacities as raw capacity or usable capacity.  Reporting raw capacity is the simplest method, but because raw numbers do not account for protection and management overhead, service delivery management is tempted to think that more storage is available than there really is.  If these overheads aren&#8217;t taken into account when projecting future demand, the projected supply may be overstated.  For this reason it is probably best to provide facilities for reporting both raw numbers, which are used more in day-to-day support, and usable numbers for planning and estimation purposes.</p>
<p><span id="more-403"></span>To further complicate matters, most corporations have heterogeneous storage environments, and each vendor uses its own terminology for features and metrics.  For example, EMC uses the terms SRDF and TimeFinder for replicated storage, and EMC Control Center reports the capacity used by these facilities as Local Replica Usable and Remote Replica Usable, whereas IBM calls replicated storage Copy Services and reports the storage allocated to it as Remote Copy Usable and Local Copy Usable.</p>
<p>The third and final problem is one of perspective.  For the most part, storage technicians talking about capacity speak from the standpoint of the storage hardware; they have in mind spindles and the logical abstraction of that storage by disk subsystems for presentation to host systems.  Many service delivery analysts and IT system users, on the other hand, look at storage from the standpoint of how it has been further abstracted by host systems, which have formatted the capacity into file systems.  One group is thinking about file space and the other about drive space.  I have seen this difference in perspective cause confusion when management looks at a new application that needs X amount of storage, sees free space on another system and erroneously assumes that the free space is available to any other system.  The truth is that free space within a file system may only be used from that file system, and free space on a partially allocated disk may only be used by a new or existing file system on that system.  Free space on a disk assigned to a system but not yet allocated through logical formatting may be used by another system, but additional work is required to remove it from that system and place it on another.  In other words, there are degrees of freedom with regard to available capacity.</p>
<p>The first step in resolving these problems is to define a vendor-agnostic terminology.  Storage capacity terms must start at a high level, generically encompassing functionality implemented by any storage vendor.  To resolve the issue of degrees of available capacity, the relationship of these terms to overall capacity must also be established.</p>
<div class="wp-caption alignright" style="width: 324px"><img title="Storage Capacity KPI Hierarchy" src="http://lh6.ggpht.com/_0dog6GbAQhg/Sqp_ImBBk-I/AAAAAAAAAQY/x_R4IvV5Lj4/s512/CapacityKPIs2.jpg" alt="Storage Capacity KPI Hierarchy" width="314" height="322" /><p class="wp-caption-text">Storage Capacity KPI Hierarchy</p></div>
<p>A generic set of terms is depicted as a hierarchy in the &#8220;Storage Capacity KPI Hierarchy&#8221; diagram.  We will quickly review each term.  Every storage solution has a potential capacity: the maximum amount of storage that can be installed in the storage sub-system.  Drilling down, whatever capacity is already installed in a sub-system is termed the total capacity, and the remainder of the potential capacity is the expansion capacity.  Within the total capacity, some capacity is configured for use and whatever remains is unconfigured.  Within the configured capacity, some may be reserved for maintenance, replication, protection (RAID sparing) or other overhead, some is assigned, and what remains is unassigned capacity.  Assigned capacity is available to the system to which it is assigned, but it is considered unallocated until it is logically formatted into a logical volume and/or file system.  Once capacity is assigned to a host, the terminology shifts from being storage sub-system centric to being file space centric: space allocated to a file system is either used by file data or free.</p>
<p>Once this terminology and these high-level metrics are established, the next undertaking is to determine how they may be calculated from the vendor-specific metrics.  After that, the reports that use these metrics must be defined.  It may also be useful to combine some of the metrics to simplify reporting.  We will focus on usable capacities for reporting back to Service Delivery and users.  Below are some example metrics, with formulas for calculating them from EMC Control Center (ECC) and HP EVA&#8217;s Virtual Controller Software (VCS).</p>
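<p>One way to make the hierarchy concrete is to store only the lowest-level values and derive every parent term by addition.  The sketch below assumes exactly the parent and child relationships described in the paragraph above; the class name and the figures are made up for illustration.</p>

```python
from dataclasses import dataclass


@dataclass
class CapacityBreakdown:
    """Leaf values of the capacity hierarchy, all in the same unit (e.g. GB)."""
    expansion: float
    unconfigured: float
    reserved: float
    unassigned: float
    unallocated: float
    used: float
    free: float

    @property
    def allocated(self):
        # Space formatted into file systems is either used or free.
        return self.used + self.free

    @property
    def assigned(self):
        return self.allocated + self.unallocated

    @property
    def configured(self):
        return self.reserved + self.assigned + self.unassigned

    @property
    def total(self):
        return self.configured + self.unconfigured

    @property
    def potential(self):
        return self.total + self.expansion


# Made-up figures purely for illustration.
example = CapacityBreakdown(
    expansion=400, unconfigured=100, reserved=50,
    unassigned=150, unallocated=60, used=500, free=140,
)
print(example.allocated, example.assigned, example.configured,
      example.total, example.potential)
```

<p>Deriving the parents this way guarantees that each level of a report adds up to the level above it.</p>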
<p style="padding-left:120px;text-indent:-100px;">Expansion:          Usable* GB expandability based upon the number of free slots in the array that may be used to expand capacity multiplied by standard spindle sizes (min(disk size))<br />
ECC: =(Total Slots for indicated model &#8211; “# Disks”) * (min(spindle_size)/2)<br />
VCS: =Max Capacity – Current Capacity</p>
<p style="padding-left:120px;text-indent:-100px;">Total Capacity:    All usable* storage in the array including unconfigured, unallocated, reserved free, and used storage.<br />
ECC: =“Configured &#8211; Usable (GB)” + (“Unconfigured – Total (GB)”/2); on SP1 of ECC, where “Configured – Usable” may not be accurate, use: =(“Local Replica Usable” + “Remote Replica Usable” + “Primary Usable”) + (“Unconfigured – Total (GB)”/2)<br />
VCS: =Current Capacity</p>
<p style="padding-left:120px;text-indent:-100px;">Unassigned:        Unconfigured usable* storage within the array and configured usable* storage not presented to any host or reserved for any application including COD.<br />
ECC: =“Unallocated – Unmapped &#8211; Usable &#8211; Total (GB)”<br />
VCS: =Current Capacity – (Allocated Storage + Lost to Overhead)</p>
<p style="padding-left:120px;text-indent:-100px;">Reserved:           Usable* GB of configured storage that is either reserved for a specific application or reserved through presentation to a specific host or front-end adapter, and that is not physically in use or not under volume management on a host.<br />
ECC: =“Accessible &#8211; Free &#8211; No Vol Grps (GB)” + “Unallocated &#8211; Mapped &#8211; Usable (GB)”<br />
VCS: Lost to Overhead = Current Capacity * 0.25</p>
<p style="padding-left:120px;text-indent:-100px;">Assigned:             Configured usable* storage presented to a host and in use, including free space within the file system and LVM structures.<br />
ECC: =Total Storage – (Unallocated + Reserved storage)<br />
VCS: Allocated Storage</p>
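<p>As a worked example, the VCS formulas listed above can be collected into one function.  This is a sketch under stated assumptions: the function and parameter names are hypothetical stand-ins for the VCS figures &#8220;Max Capacity&#8221;, &#8220;Current Capacity&#8221; and &#8220;Allocated Storage&#8221;, and the input values are invented.</p>

```python
def vcs_capacity_metrics(max_capacity, current_capacity, allocated_storage):
    """Map HP EVA VCS figures (GB) onto the generic metrics listed above."""
    # Reserved on VCS is the "Lost to Overhead" estimate: 25% of current capacity.
    lost_to_overhead = current_capacity * 0.25
    return {
        "expansion": max_capacity - current_capacity,
        "total": current_capacity,
        "reserved": lost_to_overhead,
        "assigned": allocated_storage,
        "unassigned": current_capacity - (allocated_storage + lost_to_overhead),
    }


# Made-up input figures, in GB, purely for illustration.
metrics = vcs_capacity_metrics(max_capacity=2000, current_capacity=1000,
                               allocated_storage=600)
for name, value in metrics.items():
    print(name, value)
```

<p>With these formulas the reserved, assigned and unassigned figures always sum to the total capacity, which is a useful sanity check before the numbers go into a report.</p>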
<p style="text-align:left;">These metrics may be extracted and reported on in many ways, including reporting tools like BIRT or Crystal Reports, PHP and others.  ECC provides an Oracle API which creates an ODBC connection; other solutions either provide database connectivity or present the information on web pages from which it could be scraped if necessary.  (The reports below were generated using an Access database, with table links through ODBC to a couple of ECC implementations, and data access pages.  I will work on a follow-up article on how to do this.)</p>
<p>Now that we have defined formulas for extracting some metrics from our storage platforms, the next step is to define some reports.  The first report should present an overall picture of capacity.  A pie chart is well suited to showing the percentage of capacity utilization at a glance.  The chart below is an example of a Storage Capacity Pie Chart.</p>
<div class="wp-caption aligncenter" style="width: 506px"><img class=" " title="Storage Capacity Pie Chart" src="http://lh3.ggpht.com/_0dog6GbAQhg/S1oADqFSPTI/AAAAAAAAAkU/HhmwmXqi5Gw/s912/StoragePieChart.jpg" alt="Storage Capacity Pie Chart" width="496" height="203" /><p class="wp-caption-text">Storage Capacity Pie Chart</p></div>
<p>The next report should drill down into the capacity and break it out by location and storage array.  A percentage bar chart normalizes the data to percentages, and the actual capacities may then be inserted as labels.  This allows analysts to quickly assess where resources are running low.</p>
<div class="wp-caption aligncenter" style="width: 514px"><img class="  " title="Storage Capacity Bar Chart" src="http://lh3.ggpht.com/_0dog6GbAQhg/S1oDW6TVEDI/AAAAAAAAAkg/WcG5euiywqg/s800/StorageBarChart.jpg" alt="" width="504" height="314" /><p class="wp-caption-text">Storage Capacity Bar Chart</p></div>
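<p>The normalization behind a percentage bar chart like the one above can be sketched in a few lines.  The array names and capacities here are invented, and <code>normalize_to_percent</code> is a hypothetical helper, not part of any reporting tool mentioned in this article.</p>

```python
def normalize_to_percent(capacities):
    """Convert one array's capacities (GB) into percentages of its total."""
    total = sum(capacities.values())
    if total == 0:
        return {name: 0.0 for name in capacities}
    return {name: round(100.0 * value / total, 1)
            for name, value in capacities.items()}


# Made-up arrays and figures purely for illustration.
arrays = {
    "site-a/array-1": {"assigned": 600, "reserved": 250, "unassigned": 150},
    "site-b/array-2": {"assigned": 300, "reserved": 100, "unassigned": 400},
}
for array_name, capacities in arrays.items():
    percentages = normalize_to_percent(capacities)
    # Keep the actual capacity next to each percentage for use as a bar label.
    labels = {key: "{}% ({} GB)".format(percentages[key], capacities[key])
              for key in capacities}
    print(array_name, labels)
```

<p>Normalizing per array makes arrays of different sizes comparable, while the labels preserve the absolute numbers that planners still need.</p>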
<p>The next task would be to drill down further by categorizing the storage resources by tier.  This assumes that a tiered approach has been applied to storage resources.  Most large corporations have categorized storage array platforms into tiers or classes of storage based upon architecture, performance and resiliency.  This classification has become more complicated in the past five years with the advent of unified storage, cloud storage and storage virtualization, which make it possible to provide varied classifications of storage within the same storage subsystem.  Because these technologies complicate tiered storage capacity reporting, the topic is somewhat beyond the scope of this article and will be handled in future articles.</p>
<br />Posted in Capacity Planning Tagged: Capacity, KPI, Storage, Storage Reporting]]></content:encoded>
			<wfw:commentRss>http://dcawired.com/2010/01/22/storage-capacity-kpis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/fcb2522560b707f366239aca6c741cec?s=96&#38;d=&#38;r=G" medium="image">
			<media:title type="html">Devin C. Adint</media:title>
		</media:content>

		<media:content url="http://lh6.ggpht.com/_0dog6GbAQhg/Sqp_ImBBk-I/AAAAAAAAAQY/x_R4IvV5Lj4/s512/CapacityKPIs2.jpg" medium="image">
			<media:title type="html">Storage Capacity KPI Hierarchy</media:title>
		</media:content>

		<media:content url="http://lh3.ggpht.com/_0dog6GbAQhg/S1oADqFSPTI/AAAAAAAAAkU/HhmwmXqi5Gw/s912/StoragePieChart.jpg" medium="image">
			<media:title type="html">Storage Capacity Pie Chart</media:title>
		</media:content>

		<media:content url="http://lh3.ggpht.com/_0dog6GbAQhg/S1oDW6TVEDI/AAAAAAAAAkg/WcG5euiywqg/s800/StorageBarChart.jpg" medium="image">
			<media:title type="html">Storage Capacity Bar Chart</media:title>
		</media:content>
	</item>
		<item>
		<title>Pushing Your Profile and SSH Keys</title>
		<link>http://dcawired.com/2010/01/21/pushing-your-profile-and-ssh-keys/</link>
		<comments>http://dcawired.com/2010/01/21/pushing-your-profile-and-ssh-keys/#comments</comments>
		<pubDate>Thu, 21 Jan 2010 22:58:48 +0000</pubDate>
		<dc:creator>Devin Adint</dc:creator>
				<category><![CDATA[System Administration]]></category>
		<category><![CDATA[Expect.pm]]></category>
		<category><![CDATA[perl]]></category>
		<category><![CDATA[profile distribution]]></category>
		<category><![CDATA[ssh]]></category>

		<guid isPermaLink="false">http://dcawired.com/?p=419</guid>
		<description><![CDATA[Whenever you start supporting a new environment, especially in a large corporation, you are usually confronted with many systems.  Security will take care of setting up your access across whatever platforms there may be.  But generally you are left holding the bag with setting up your ssh keys and any profile customizations not to [...]]]></description>
			<content:encoded><![CDATA[<p>Whenever you start supporting a new environment, especially in a large corporation, you are usually confronted with many systems.  Security will take care of setting up your access across whatever platforms there may be.  But generally you are left holding the bag with setting up your ssh keys and any profile customizations, not to mention distributing any scripts or tools you have come to rely upon.  Of course, before you put any tools on a system there are several things to consider.  You definitely want to consider which environments you perform the first distributions on; it is always good to start with development or lab environments and move out from there.  You will also need to consider the corporate policies for the environment, which might limit your ability to even have your own set of tools and scripts.  You may be limited to simple .profile changes and ssh keys.  Implementing a script to push these keys and profiles out may need to go through various degrees of red tape.  It is your responsibility to know whatever policies and requirements exist in your organization and to determine how, or if, the tools discussed here may be used.</p>
<p><span id="more-419"></span>First, you should have run ssh-keygen to create your keys, and you should have created an .ssh/authorized_keys file containing your public key.  The next order of business is to create a script that may be copied to any other system to retrieve your profiles from a designated home or administrative system.  This script is easily written in Korn shell.  An example getprof script is inserted below.  The script uses ssh and file descriptors to stream tar through an ssh connection.  The reason for this is that I have run into organizations that have disabled scp, as well as incompatibilities between the versions of ssh on various platforms.  Streaming tar through ssh has been the most effective means of distribution.</p>
<p style="color:#555555;background-color:#e0e0e0;border:#000000 2px dashed;font:8pt/8pt Courier New;padding:2px 6px 4px;">#!/bin/ksh<br />
HST=${HST:-AdminHost}<br />
USR=${USR:-UserName}<br />
RCP=${RCP:-"scp -p"}<br />
RSH=${RSH:-"ssh -l"}<br />
SRC=${SRC:-"$USR@$HST"}<br />
## Get Tools<br />
mkdir -p ~/tools/bin<br />
mkdir -p ~/tools/sbin<br />
mkdir -p ~/tools/rbin<br />
mkdir -p ~/tools/dba<br />
mkdir -p ~/tools/etc<br />
mkdir -p ~/tools/man<br />
mkdir -p ~/tools/source<br />
mkdir -p ~/tools/depot<br />
mkdir -p ~/.ssh<br />
## Get ALL<br />
exec 3&gt;all.tar<br />
${RSH} ${USR} ${HST} 'tar cf - ./.ssh ./.ssh2 .profile .kshrc .hosts .exrc .screenrc tools/bin tools/sbin tools/etc tools/dba tools/man tools/rbin' &gt;&amp;3<br />
exec 3&gt;&amp;- ; tar xvf all.tar &amp;&amp; rm all.tar</p>
<p>But this is only half the battle.  The next problem is to get this script distributed to all the systems you support and to execute it, so that your profile is retrieved from the central system onto each of them.  To do this manually you would have to log in to each system, scp the script to it, execute it and answer all the ssh prompts.  A better way is to use a Perl script with the Expect.pm module on the central system to automate the login, copy and execution of this script.  You will first need to go to <a href="http://www.cpan.org/">www.cpan.org</a> and get the Expect and IO-Tty Perl modules and install them on your systems, as most native OS Perl distributions do not include them.  Instructions for Perl module installation may also be found on CPAN.</p>
<p>Start your Perl script with a good header for documentation and load your modules.  Also set up any global variables in this section.</p>
<p style="color:#555555;background-color:#e0e0e0;border:#000000 2px dashed;font:8pt/8pt Courier New;padding:2px 6px 4px;">#!/usr/bin/perl<br />
#<br />
# Header Name: ~/pushprof.pl<br />
#<br />
# Purpose:<br />
#<br />
# Expected Parameters:<br />
#<br />
# Variables:<br />
#<br />
# Tech/Func Leads:<br />
#<br />
# Target Dependencies:<br />
#<br />
# SH Version: $Header$<br />
#<br />
# History:<br />
# 07/11/00	{Your Name}	&#8211; Script created, Func: None<br />
#<br />
#&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8211;<br />
# &#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<br />
#| Use/Include Modules						|<br />
# &#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<br />
use File::Basename;<br />
use Sys::Hostname;<br />
use Shell qw(uname);<br />
use Expect;<br />
use Net::Ping;<br />
# &#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<br />
#| Define Header Global Variables 		|<br />
# &#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<br />
## Determine Current Hosts Name and OS<br />
$THISHOST = hostname() ;<br />
$THISOS = uname("-s") ;<br />
chomp $THISOS ;<br />
$OS = lc(substr($THISOS,0, 2)) ;    ## lower case first two chars<br />
### Global Variables<br />
use vars qw/ %opt /;<br />
$shell_prompt = qr/[\$\#&gt;][&gt;:\s]*\r?$/;<br />
<br />
### Local Variables<br />
my ($rc) ;<br />
<br />
## Other Script Variables
</p>
<p>Then create a section for functions and procedures.  It is good practice to have a cleanup procedure and command-line processing procedures.  This gives you a good foundation for any other Perl scripts you may write in the future, and it is a template that I have found very useful.</p>
<p style="color:#555555;background-color:#e0e0e0;border:#000000 2px dashed;font:8pt/8pt Courier New;padding:2px 6px 4px;"># &#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<br />
#| Define Script Functions/Procedures  |<br />
# &#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<br />
#<br />
# Trap signals and clean up<br />
$Interrupted = 0;   # to ensure it has a value<br />
sub CleanUp()<br />
{<br />
syswrite(STDERR, "ouch\n", 5);<br />
die "\n";<br />
}<br />
$SIG{INT} = \&amp;CleanUp;<br />
$SIG{TERM} = \&amp;CleanUp;<br />
$SIG{QUIT} = \&amp;CleanUp;<br />
#$SIG{EXIT} = \&amp;CleanUp;<br />
#<br />
# Command line options processing<br />
#<br />
sub Init()<br />
{<br />
my $opt_string = 'hs:u:';<br />
use Getopt::Std;<br />
getopts( "$opt_string", \%opt ) or Usage();<br />
Usage() if $opt{h};<br />
if ( $opt{s} ) {<br />
$host = $opt{s} ;<br />
} else {<br />
Usage();<br />
};<br />
<br />
if ( $opt{u} ) {<br />
$username= $opt{u} ;<br />
} else {<br />
Usage();<br />
};<br />
<br />
}<br />
sub Usage()<br />
{<br />
print STDERR "<br />
Syntax: $0 -h | -s {host} -u {username}<br />
-h             Help<br />
-s {hostname}  Hostname<br />
-u {username}  User Name<br />
" ;<br />
exit;<br />
}
</p>
<p>We will also need a special subroutine to handle any login prompts that may be encountered.  The Expect object, timeout, passphrase and password have to be passed to this subroutine so that it can call the Expect methods.</p>
<p style="color:#555555;background-color:#e0e0e0;border:#000000 2px dashed;font:8pt/8pt Courier New;padding:2px 6px 4px;">sub LoginHandler<br />
{<br />
my $exp = shift;<br />
my $timeout = shift;<br />
my $passphrase = shift;<br />
my $password = shift;<br />
$spawn_ok = 0;<br />
$exp-&gt;expect($timeout*2,<br />
[<br />
qr'(yes\/no)',<br />
sub {<br />
$spawn_ok = 1;<br />
my $fh = shift;<br />
$fh-&gt;send("yes\n");<br />
exp_continue_timeout;<br />
}<br />
],<br />
[<br />
'-re', qr'(login: $)',<br />
sub {<br />
$spawn_ok = 1;<br />
my $fh = shift;<br />
$fh-&gt;send("$username\n");<br />
exp_continue_timeout;<br />
}<br />
],<br />
[<br />
qr/try again/i,<br />
sub {<br />
$spawn_ok = 1;<br />
my $fh = shift;<br />
print $fh "^[\n";<br />
print STDERR "\nSent Break...\n";    ## don't continue to avoid lock out<br />
die "ERROR: password not accepted exiting to avoid lockout!\n";<br />
}<br />
],<br />
[<br />
qr/password.*:\s*\r?$/i,<br />
sub {<br />
$spawn_ok = 1;<br />
my $fh = shift;<br />
print $fh "$password\n";<br />
exp_continue_timeout;<br />
}<br />
],<br />
[<br />
qr/passphrase/i,<br />
sub {<br />
$spawn_ok = 1;<br />
my $fh = shift;<br />
print $fh "$passphrase\n";<br />
exp_continue_timeout;<br />
}<br />
],<br />
[<br />
'-re', qr/\#\s*\#\s*\#\s*\r?$/i,  ## match banner and just continue<br />
sub {<br />
exp_continue_timeout;<br />
}<br />
],<br />
[<br />
eof =&gt;<br />
sub {<br />
if ($spawn_ok) {<br />
die "ERROR: premature EOF in login.\n";<br />
} else {<br />
die "ERROR: could not spawn ssh.\n";<br />
}<br />
}<br />
],<br />
'-re', $shell_prompt, # wait for shell prompt, then exit expect<br />
);		## First expect handle login steps<br />
}
</p>
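<p>The dispatch logic of LoginHandler can be illustrated outside of Expect by checking a captured chunk of output against the same patterns in the same order.  The sketch below is only that: the patterns are transcribed from the Perl above, the action names are hypothetical labels, and nothing here drives a real ssh session.</p>

```python
import re

# Patterns transcribed from the Perl LoginHandler; action names are hypothetical.
PROMPT_ACTIONS = [
    (re.compile(r"\(yes/no\)"), "confirm_host_key"),
    (re.compile(r"login: $"), "send_username"),
    (re.compile(r"try again", re.IGNORECASE), "abort_to_avoid_lockout"),
    (re.compile(r"password.*:\s*\r?$", re.IGNORECASE), "send_password"),
    (re.compile(r"passphrase", re.IGNORECASE), "send_passphrase"),
    (re.compile(r"[$#>][>:\s]*\r?$"), "shell_ready"),
]


def classify_prompt(output):
    """Return the action for the first pattern (in list order) found in output."""
    for pattern, action in PROMPT_ACTIONS:
        if pattern.search(output):
            return action
    return "keep_waiting"


print(classify_prompt("Are you sure you want to continue connecting (yes/no)? "))
print(classify_prompt("user@host:~$ "))
```

<p>Listing the &#8220;try again&#8221; pattern ahead of the password pattern is what lets a rejected password stop the run instead of being retried until the account locks.</p>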
<p>The next step is to code the main subroutine.  I create a main subroutine not because one has to, but because I first learned coding in ANSI C and am used to having a main function.  I feel that it also adds a certain level of neatness and form to the code.</p>
<p style="color:#555555;background-color:#e0e0e0;border:#000000 2px dashed;font:8pt/8pt Courier New;padding:2px 6px 4px;"># --------------------------------<br />
#| Define Main Routine            |<br />
# --------------------------------<br />
sub Main()<br />
{<br />
# Get Command Line Options<br />
Init();<br />
<br />
# Local Variables<br />
my ($rc ); $rc=0;<br />
my $RT=0;<br />
<br />
my $password = '<span style="color:#ff0000;">Password</span>';<br />
my $passphrase = '<span style="color:#ff0000;">PassPhrase</span>';<br />
$username = '<span style="color:#ff0000;">UserName</span>' unless defined($username);  ## default if not already set (e.g. by -u)<br />
my $adminhost = '<span style="color:#ff0000;">AdminHost</span>';<br />
my $homedir = "/export/home/$username";<br />
my $shell_prompt = qr/[\$\#]\s*$/;<br />
my $login = "ssh $username\@$host";<br />
my $timeout = 120;<br />
<br />
print STDERR "$login\n";<br />
<br />
## Test if the host can be pinged<br />
my $p = Net::Ping-&gt;new();<br />
if ( $p-&gt;ping($host) ) {<br />
print "Deploying profile to $host \n";<br />
} else {<br />
print "Seems $host is not reachable \n";<br />
}<br />
$p-&gt;close();<br />
<br />
## Start Expect instance and open ssh<br />
my $exp = Expect-&gt;spawn($login)<br />
or die "Can't login to $host as $username: $!\n";<br />
$exp-&gt;log_file("./pushprof.log");<br />
<br />
## Handle login prompts<br />
LoginHandler($exp, $timeout, $passphrase, $password);<br />
<br />
## scp getprof from AdminHost<br />
$exp-&gt;send("scp $username\@$adminhost:$homedir/getprof .\n");<br />
LoginHandler($exp, $timeout, $passphrase, $password);<br />
<br />
## execute getprof<br />
$exp-&gt;send("sh getprof \n");<br />
LoginHandler($exp, $timeout, $passphrase, $password);<br />
<br />
$exp-&gt;send("exit\n");<br />
$exp-&gt;soft_close();<br />
$exp-&gt;log_file(undef);<br />
<br />
return $rc;<br />
}<br />
<br />
# --------------------------------<br />
#| End of Main Routine            |<br />
# --------------------------------<br />
$rc = Main();<br />
exit $rc;
</p>
<p>Now we have a script that we can call to push our profile out to any system.  To push the profile to more than one system, simply put the list of hosts into a .hosts file and use a for loop that calls pushprof.pl, e.g.:</p>
<p>for host in `cat .hosts`<br />
do<br />
./pushprof.pl -s $host -u YourUserName<br />
done</p>
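<p>If the host list grows, a slightly more defensive loop may be worth the extra lines.  The following is only a sketch: the names <code>push_all</code> and <code>PUSHPROF</code> are made up for this example, and it assumes pushprof.pl returns a non-zero exit status when a push fails.  It reads one host per line, skips blank lines and lines starting with #, and reports any host whose push failed.</p>

```shell
#!/bin/sh
# Sketch only: push_all and PUSHPROF are illustrative names,
# they are not part of pushprof.pl itself.

# Assumed location of the script; override by setting PUSHPROF.
PUSHPROF="${PUSHPROF:-./pushprof.pl}"

# push_all HOSTS_FILE USERNAME
# Calls $PUSHPROF once per host and returns 1 if any call failed.
push_all() {
    hosts_file="$1"
    user="$2"
    failed=0

    # One host per line; the "|| [ -n ... ]" part keeps a final
    # line that has no trailing newline.
    while IFS= read -r host || [ -n "$host" ]; do
        case "$host" in
            ''|'#'*) continue ;;    # skip blank lines and comments
        esac

        # stdin comes from /dev/null so the command cannot
        # swallow the rest of the host list.
        if ! "$PUSHPROF" -s "$host" -u "$user" < /dev/null; then
            echo "push to $host failed"
            failed=1
        fi
    done < "$hosts_file"

    return "$failed"
}
```

<p>With that function loaded, the call would look like <code>push_all .hosts YourUserName</code>.</p>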
<br />Posted in System Administration Tagged: Expect.pm, perl, profile distribution, ssh]]></content:encoded>
			<wfw:commentRss>http://dcawired.com/2010/01/21/pushing-your-profile-and-ssh-keys/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/fcb2522560b707f366239aca6c741cec?s=96&#38;d=&#38;r=G" medium="image">
			<media:title type="html">Devin C. Adint</media:title>
		</media:content>
	</item>
	</channel>
</rss>
