A CommVault White Paper: Business Continuity: Best Practices for Transitioning Backup Software

April 4, 2018 | Author: Oswin McLaughlin | Category: N/A



CommVault Corporate Headquarters 2 Crescent Place Oceanport, New Jersey 07757-0900 USA Telephone: 888.746.3849 or 732.870.4000 ©2007 CommVault Systems, Inc. All rights reserved. CommVault, CommVault and logo, the “CV” logo, CommVault Systems, Solving Forward, SIM, Singular Information Management, CommVault Galaxy, Unified Data Management, QiNetix, Quick Recovery, QR, QNet, GridStor, Vault Tracker, Quick Snap, QSnap, Recovery Director, CommServe, and CommCell, are trademarks or registered trademarks of CommVault Systems, Inc. All other third party brands, products, service names, trademarks, or registered service marks are the property of and used to identify the products or services of their respective owners. All specifications are subject to change without notice.

Best Practices for Transitioning Backup Software

Data Protection through backup is a ubiquitous activity within the Data Center. Even the best-planned backup environments can take on a coral-like growth pattern as data and systems accumulate within the enterprise. Perceived issues with one vendor’s backup application often have less to do with failures within the application than with failures to accommodate the enterprise’s requirements for using it. Once the decision has been made to migrate Data Protection operations from Vendor A to Vendor B, a number of decision points should be addressed to ensure that the migration is SUCCESSFUL and that the new implementation meets the requirements that prompted the migration.

This document outlines a number of the considerations associated with a successful migration between backup application vendors. It is NOT intended to be a comprehensive migration plan, but rather a summary of the factors that can ensure success (or failure) during the transition between Data Protection software solutions.

Transitioning from one backup software solution to another can be stressful, especially when you already have your hands full with other IT tasks. In addition to time, there are other major obstacles to address in planning a migration between backup applications:

• Backup is pervasive within the enterprise. Replacing an application that is installed on every server within the Data Center(s) requires planning to ensure that Change Control requirements are met, end user impact is minimized (or negated), and reboot windows are allocated for those systems that will receive new device drivers during the conversion.
• Use of proprietary storage formats that cannot be read by another vendor’s software. While some vendors advertise that they can read / write the public Microsoft Tape Format (NTBackup), that feature is handicapped by slower performance and lacks the indexing capability of the proprietary format.
  o Additionally, any backup job that was split into multiple streams by the backup policy can ONLY be read by the application that wrote it.
• Control of storage resources (libraries, tape drives) is usually all or nothing unless a third party device manager (example: Sun ACSLS) is used to virtualize control of the robotics and tape drives within the Tape Library Unit.
  o Consideration should also be given to any planned refresh of the type and quantity of storage targets used for backup and recovery. Because a portion of the devices used by the legacy application must remain available to restore data backed up by that application, the enterprise may have an opportunity to move to net new tape or disk devices for use with the new application layer.
• Systems dedicated to backup and recovery operations MAY require “mothballing” in order to remain available for legacy restore operations. If mothballing is required, new server hardware can be assessed against the backplane and connectivity requirements of the new application layer.
• TCP / IP infrastructure requirements for operations v. data protection activity can be re-assessed to determine:
  o Whether the current network infrastructure meets the throughput required to finish within the backup window, OR whether backup traffic should be moved off the LAN.
• Software services and drivers from different backup vendors compete and conflict with each other, making a parallel implementation impractical to manage and adherence to the compliance requirements of established Service Level Agreements impossible.
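The backup window question in the list above reduces to simple arithmetic: the data volume divided by the window gives the sustained rate the network must carry. The sketch below is a minimal illustration, not a CommVault tool; the function names, the decimal units, and the assumption that only 70% of nominal link speed is usable for backup traffic are all hypothetical choices.

```python
# Sketch: can the network carry the nightly backup within the window?
# Assumptions (not from the white paper): decimal units (1 GB = 1000 MB, 1 MB = 8 Mbit)
# and an assumed 70% of nominal link speed usable for backup traffic.


def required_mb_per_s(data_gb: float, window_hours: float) -> float:
    """Sustained throughput in MB/s needed to move data_gb within window_hours."""
    return data_gb * 1000 / (window_hours * 3600)


def usable_mb_per_s(link_mbit_s: float, efficiency: float = 0.7) -> float:
    """Throughput in MB/s realistically available on a link of link_mbit_s."""
    return link_mbit_s / 8 * efficiency


def fits_window(data_gb: float, window_hours: float, link_mbit_s: float,
                efficiency: float = 0.7) -> bool:
    """True if the link can move the data inside the backup window."""
    return required_mb_per_s(data_gb, window_hours) <= usable_mb_per_s(link_mbit_s, efficiency)


if __name__ == "__main__":
    # Hypothetical example: 2,000 GB per night in an 8-hour window.
    print(f"required: {required_mb_per_s(2000, 8):.1f} MB/s")
    for name, mbit in [("100BaseT", 100), ("Gigabit Ethernet", 1000)]:
        print(name, "fits" if fits_window(2000, 8, mbit) else "does not fit")
```

A link that does not fit is the signal, described above, to consider moving backup traffic off the LAN.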


To minimize the pain of converting between Data Protection applications and improve the odds of a successful migration, it is imperative to have a transition plan. At a minimum, the plan should address:

• Validation from Line of Business (LoB) Management / Legal of the data retention requirements for backups on tape and disk storage;
  o The migration event is an excellent opportunity to re-assess the business’s requirements for maintaining copies of data on tape and disk. Changes in data retention requirements driven by corporate e-discovery and compliance efforts can be better defined, and therefore accounted for in the migration process, before the migration cycle starts.
• Continued recovery support for long term archive data already on backup media;
  o Planning should address where, how much (or how many), and in what format(s) legacy backup media will be maintained. While disk provides the fastest restore times for legacy backup data, its cost and volatility as a storage resource preclude its use as a long term storage medium for all but the most critical systems’ data. It is STRONGLY suggested that retention for the legacy application layer be classified by data type and recovery requirements as follows:

Data Classification             DAY ONE          DAY 2-7          DAY 8-30         DAY 31-90
Mission Critical (revenue)      (system names)   (system names)   (system names)   (system names)
Key Systems (services)          (system names)   (system names)   (system names)   (system names)
Key Systems (data sets)         (system names)   (system names)   (system names)   (system names)
Active Storage (operations)     (system names)   (system names)   (system names)   (system names)
Inactive Storage (archive)      (system names)   (system names)   (system names)   (system names)
Restore at Will                 (system names)   (system names)   (system names)   (system names)

• Maintaining data protection while replacing existing backup software agents with the new backup software agents;
  o Establish a cutover timeline per system type that allows for both failback (in the event of error) and adequate time for validation testing of the new application.
• Sharing or transition of storage resources (libraries, drives, etc.) to the new backup solution; and
• Training. Although it is associated with the conversion event rather than part of the actual conversion process, it is STRONGLY recommended that personnel responsible for data protection operations complete training on the new software.
  o CommVault recommends a two-tier approach to training: administration and operations personnel attend an initial class covering the fundamentals of the Galaxy application environment (Galaxy 6.1 System Administration Level I) BEFORE the migration begins, followed by an Advanced Class (Galaxy 6.1 System Administration Level II) AFTER the migration is completed. This training cycle gives backup-centric personnel training that matches their information and expertise requirements throughout the conversion.
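The per-system-type cutover timeline recommended in the plan above can be kept as a small data structure. This is a minimal sketch under stated assumptions: the `CutoverPlan` class, its fields, and the rule that the legacy agent stays until both the validation and failback windows have closed are hypothetical planning conventions, not part of any CommVault product.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class CutoverPlan:
    """One row of a hypothetical per-system-type cutover timeline."""
    system_type: str
    cutover: date          # first night the new application protects these systems
    validation_days: int   # time reserved for test backups and test restores
    failback_days: int     # time the legacy agent stays installed for failback

    @property
    def validation_complete(self) -> date:
        return self.cutover + timedelta(days=self.validation_days)

    @property
    def earliest_legacy_removal(self) -> date:
        # Keep the legacy agent until both the validation and failback windows close.
        return self.cutover + timedelta(days=max(self.validation_days, self.failback_days))


if __name__ == "__main__":
    # Hypothetical timeline; dates and durations are placeholders.
    plans = [
        CutoverPlan("File / print", date(2007, 3, 5), 7, 14),
        CutoverPlan("Exchange", date(2007, 3, 19), 14, 21),
        CutoverPlan("Oracle / SQL", date(2007, 4, 9), 14, 30),
    ]
    for p in plans:
        print(f"{p.system_type:<14} cutover {p.cutover}  validated {p.validation_complete}"
              f"  remove legacy agent after {p.earliest_legacy_removal}")
```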


Dealing with existing backup data

Data protected by backup software can be classified as either production recovery data or archival data. Production recovery data should be written to storage targets so that, in the event of a disaster, production systems can be reconstituted within Recovery Time / Recovery Point objectives. Production recovery data normally has a life span of 0-30 days. This life span supports better than 90% of individual file/message/system recovery requests, most of which occur within two weeks of a backup.

[Figure: Percentage of Occurrence by Data Creation Life Span (Day One, Day 2-7, Day 8-30, Day 31-90, Day 91+)]
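To check the 0-30 day assumption against your own environment, restore requests can be bucketed by the age of the backup they needed, using the same buckets as the chart. A minimal sketch, assuming a hypothetical restore log that records that age in days (day 1 being the day the backup was taken); the function names are illustrative.

```python
from collections import Counter

# Inclusive upper bound of each bucket, in days since the backup was taken.
# Day 1 is the day of the backup; labels mirror the chart's x-axis.
BUCKETS = [("Day One", 1), ("Day 2-7", 7), ("Day 8-30", 30), ("Day 31-90", 90)]
OVERFLOW = "Day 91+"


def bucket(age_day: int) -> str:
    """Map a backup age (in days) to its life-span bucket."""
    for label, upper in BUCKETS:
        if age_day <= upper:
            return label
    return OVERFLOW


def distribution(age_days: list[int]) -> dict[str, float]:
    """Percentage of restore requests falling in each age bucket."""
    counts = Counter(bucket(d) for d in age_days)
    labels = [label for label, _ in BUCKETS] + [OVERFLOW]
    return {label: 100 * counts[label] / len(age_days) for label in labels}


def share_within(age_days: list[int], limit_day: int = 30) -> float:
    """Percentage of requests for backups no older than limit_day."""
    return 100 * sum(1 for d in age_days if d <= limit_day) / len(age_days)


if __name__ == "__main__":
    # Hypothetical restore log: age (in days) of the backup each request needed.
    sample = [1, 1, 2, 3, 5, 9, 14, 20, 45, 120]
    print(distribution(sample))
    print(f"{share_within(sample):.0f}% of requests within 30 days")
```

If the share within 30 days is well below the 90% cited above, the production recovery retention period deserves a second look before the legacy environment is scaled down.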

The most traditional approach to migrating Data Protection operations is to keep a running copy of the legacy backup software, along with the resources (library, drives, agents) needed to support your largest single restore event (example: multiple streams, etc.) and any unique restore requirement (e.g. proprietary application, unique OS, etc.). Assuming that other systems associated with backup operations (Storage Nodes, SAN Media Servers) meet hardware and connectivity requirements, migration planning should assume that the remaining resources and all client systems will be transitioned to the new backup software in order to realize the anticipated ROI of the application conversion. The old production recovery data will eventually age off as planned, and the remaining storage resources can then be reassigned to the new backup software. During the transition period, while viable data is still managed by both backup solutions, emergent requests for files, documents, or messages held by the old backup software can be met with minimal inconvenience by restoring them with the reduced assets and manually copying them to their required destination.

Use the previous backup software’s cross-client restore capability (restoring one client’s data to a different host) to temporarily support restores of production recovery data with minimal resources. Size the dedicated resources to meet your largest expected restore event (e.g. parallel streams, memory, storage, CPU, etc.) and any unique application or OS requirement.
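Sizing the retained legacy resources for the largest expected restore event can be done by taking, for each resource, the maximum over all restore events you still need to support. A minimal sketch, assuming hypothetical `RestoreEvent` records and illustrative figures; it also assumes only one legacy restore runs at a time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RestoreEvent:
    """A restore the legacy environment must still be able to perform (planning estimate)."""
    name: str
    streams: int       # parallel streams the original backup was written with
    memory_gb: float
    staging_gb: float  # disk needed to land the data before copying it onward
    cpus: int


def size_legacy_environment(events: list[RestoreEvent]) -> dict[str, float]:
    """Per-resource maximum across all events, so any single restore fits."""
    return {
        "streams": max(e.streams for e in events),
        "memory_gb": max(e.memory_gb for e in events),
        "staging_gb": max(e.staging_gb for e in events),
        "cpus": max(e.cpus for e in events),
    }


if __name__ == "__main__":
    # Hypothetical restore events; substitute figures from your own environment.
    events = [
        RestoreEvent("Exchange mail store", streams=4, memory_gb=4, staging_gb=200, cpus=2),
        RestoreEvent("Oracle database", streams=8, memory_gb=8, staging_gb=500, cpus=4),
        RestoreEvent("File server volume", streams=2, memory_gb=2, staging_gb=900, cpus=2),
    ]
    print(size_legacy_environment(events))
```

Taking the maximum per resource can exceed what any single event needs (here the staging disk comes from the file server while the streams come from Oracle); that is a deliberately conservative choice.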

Full system / application recovery using data maintained by the previous backup software should be a rare event. If a full system restore from a legacy backup is required, the appropriate legacy backup agent can be installed on the rebuilt base system and the restore performed. This level of support needs to be maintained only until the production recovery data expires.
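The date on which that support can end is the latest expiry among the final legacy backups of production recovery data. A minimal sketch, assuming you can list, per backup policy, the date of the last legacy backup and its retention in days; archival data is deliberately excluded, since it follows the separate rules described below.

```python
from datetime import date, timedelta


def legacy_support_end(policies: list[tuple[date, int]]) -> date:
    """Date the last legacy production recovery backup expires.

    policies: (date of the final legacy backup, retention in days) per backup policy.
    """
    return max(last + timedelta(days=retention) for last, retention in policies)


if __name__ == "__main__":
    # Hypothetical policies: final legacy backups and their retention periods.
    policies = [(date(2007, 6, 30), 30), (date(2007, 6, 30), 14), (date(2007, 6, 15), 30)]
    print(legacy_support_end(policies))  # 2007-07-30
```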


Alternatively, for a quicker response time, it is possible to keep a minimal configuration of the previous backup software’s manager in a system image repository (e.g. Norton Ghost), or even to back it up with the new backup software. However, this option requires keeping a compatible host system available whose configuration is similar to that of the original backup “Master” server. This may be practical for two or three years, but not over the long term.

TIP: Keep the previous software’s installation CDs, licenses, and compatible media drives, along with manuals, to support archival data restores. Store all of these items in a separate, accessible, and fireproof location.

Dealing with Archival Data

Archival data generally has historical or financial value to the enterprise and is used in defining the company’s future existence, function, or past performance. As such, archival data is retained for a much longer period of time. The length of retention is driven predominantly by legal, regulatory, government, and / or company policy. Along with long term retention, archival data is normally characterized by hardware independence (generally no requirement for a specific hardware configuration on the target host to recover the data), a low restore request rate (once a year on average), and an acceptably lengthy response time to complete the restore (days, not minutes).

Archival data should be a fraction of the total data managed by the backup software. Once production recovery support is transitioned, the breadth of storage resources required to maintain archival data should be minimal. Archival data recovery needs should be adequately handled by simply maintaining the backup software installation disks, metadata, and licenses. As the need arises, install the backup software and metadata, and perform the restore.

Note: Data protected via Hierarchical Storage Management (HSM), as opposed to backup, is free from the constraints of application layer conversion planning; however, resource contention still requires specific planning. HSM integration with the legacy backup application at the device layer (shared devices) may be a factor in determining the rollout of the new Data Protection application (example: SUN ASM). Alternatively, software-integrated HSM relies on interleaving with the backup application, requiring that both application layers (backup and HSM) be preserved on a dedicated, separate server (example: Symantec VSM).

The most difficult task when transitioning existing backup data may be separating the archival data from the production recovery data. If this was not planned for and done previously, you’re looking at a significant cost in the time needed to recover, separate, and re-archive the data. The alternative is to maintain both the production recovery and archival data for the long term. While doing so may tie up large volumes of storage that could otherwise be used, it may be your least costly option. Consider this when configuring data protection with your new backup software.
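A first pass at separating the two classes can be made from the legacy catalog by retention period, which also shows how much storage each class ties up. A minimal sketch, assuming a hypothetical catalog export with per-job retention and size, and using the 0-30 day production life span from this paper as the default cut-off; retention is only a proxy for the definitions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """A backup job as exported from the legacy catalog (hypothetical fields)."""
    job_id: str
    client: str
    retention_days: int
    size_gb: float


def split_catalog(entries, production_max_days=30):
    """Separate production recovery jobs from archival jobs by retention period."""
    production, archival = [], []
    for entry in entries:
        target = production if entry.retention_days <= production_max_days else archival
        target.append(entry)
    return production, archival


def total_gb(entries):
    """Total size of a group of catalog entries."""
    return sum(entry.size_gb for entry in entries)


if __name__ == "__main__":
    # Hypothetical catalog export.
    catalog = [
        CatalogEntry("J1001", "fs01", 14, 120.0),
        CatalogEntry("J1002", "ex01", 30, 300.0),
        CatalogEntry("J1003", "erp01", 2555, 80.0),   # roughly a seven-year hold
        CatalogEntry("J1004", "fs01", 365, 150.0),
    ]
    production, archival = split_catalog(catalog)
    print("production recovery:", [e.job_id for e in production], total_gb(production), "GB")
    print("archival:", [e.job_id for e in archival], total_gb(archival), "GB")
```

Where production and archival jobs share the same media, this split only tells you which data would have to be recovered and re-archived; it does not separate the tapes themselves.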

TIP: Periodically test restores using the previous backup software to validate the software, hardware, procedures, and backup media. While this is a GREAT general practice for all backup operations, a policy of periodic legacy restore validation is essential to ensuring that data that must be retrievable remains so.


Storage Target Resource Allocation

To support recovery requirements associated with the legacy backup application, it is essential to maintain the media and compatible hardware (e.g. 8MM, 4MM, DLT tape drives, etc.) necessary to read/restore the data. Unless the data on legacy media is to be converted to Galaxy, it is recommended that the legacy media be removed from the active libraries and Vaulted. If media is Vaulted, it is important to keep it in a secure environment that preserves the magnetic information on the tape / disk. All media has a recommended shelf life. Even under optimum storage conditions, media can deteriorate and become unrecoverable.

Life Expectancy: How Long Will Magnetic Media Last?

Unfortunately, media life expectancy (LE) information is largely undocumented, and a standard method for determining magnetic media lifetimes has yet to be established. According to manufacturers' data sheets and other technical literature, thirty years appears to be the upper limit for magnetic tape data storage products. LE values for storage media, however, are similar to miles per gallon ratings for automobiles: your actual mileage may vary.

Note: An article in the January 1995 Scientific American (Jeff Rothenberg, "Ensuring the Longevity of Digital Documents") conservatively estimated the physical lifetime of digital magnetic recording tape at one year. Because of the confusion such a statement can cause, the Imation National Media Lab officially responded with a letter to the editor that appeared in the June 1995 issue of Scientific American. The letter states that the "physical lifetimes for digital magnetic tape are at least 10 to 20 years."

Media and Longevity

It is common for the enterprise to assess media storage solely in terms of cost. This view assumes that the information stored on the media has no intrinsic value. However, Vaulted Storage should be evaluated in terms of the cost of losing the recorded information if the storage medium degrades irreversibly. The value of the Vaulting process must be equated with the cost of preserving the data. When the cost of losing the information is considered, it may be economically justified to invest more in a storage environment of proven reliability. It may also warrant the cost of making and keeping replicated copies of original data and stockpiling systems to play back the data in the future.

TIP: As media nears the end of its reliability life, copy the data to newer media to reduce the risk of media failure by age.
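The tip above can be turned into a routine vault report that flags media whose age has reached a set fraction of its rated life. A minimal sketch under stated assumptions: the rated life comes from the manufacturer’s data sheet (a conservative 10 years is used here, the low end of the figure quoted in the letter above), and the 80% refresh threshold and media IDs are hypothetical.

```python
from datetime import date


def media_due_for_refresh(written, today, rated_life_years, refresh_fraction=0.8):
    """Return media IDs whose age has reached refresh_fraction of the rated life.

    written: mapping of media ID -> date the data was written.
    rated_life_years: life expectancy taken from the manufacturer's data sheet.
    refresh_fraction: assumed safety margin; 0.8 copies data at 80% of rated life.
    """
    limit_days = rated_life_years * 365.25 * refresh_fraction
    return sorted(mid for mid, when in written.items() if (today - when).days >= limit_days)


if __name__ == "__main__":
    # Hypothetical vault inventory with a conservative 10-year rated life.
    vault = {"A00017": date(1998, 1, 1), "A00342": date(2003, 6, 1), "A00519": date(2000, 6, 1)}
    print(media_due_for_refresh(vault, date(2007, 12, 31), rated_life_years=10))
```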


Data Protection System Resource Allocation

Data Protection hardware can also become inoperable or unrepairable if no spare parts exist. At some point, maintaining a mothballed, wholly separate hardware environment may also become impractical. Conversely, if the media and/or systems associated with the legacy application are proprietary, or if compliance requirements force the retirement (rather than reuse) of existing hardware, the opportunity to invest in net new system hardware to host the new Data Protection application infrastructure gives the enterprise the ability to improve protection performance.

A long list of components determines the performance of a Data Protection solution. CommVault recommends considering these components as part of the validation process when planning the deployment of the new application layer. Note: It is important to understand that the slowest of all components determines the maximum achievable throughput of the Data Protection solution. The goal for the backup software is then to make the most of the existing components:

• Data structure and layout (compressibility, layout across storage system)
• Storage / Disk subsystem for the Backup Server(s)
• Storage connection (SAN, SCSI, Host Bus Adapters...)
• I/O Bus of Backup Server(s)
• CPU of Backup Client
• OS / Filesystem Implementation
• Network
• I/O Bus of Backup Server(s)
  o Single Ended disk on the server cannot stream data efficiently across the backplane to write to tape. Old servers are SLOW for many reasons…
• CPU of Backup Server
• Memory of Backup Server
• Tape device / Backup device connection (SAN, SCSI, Host Bus Adapters...)
• Tape device(s) / Backup device(s)

Note: Depending on the backup concept (example: LAN-free), not all of these components may be relevant.
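Because the slowest component caps the whole path, a quick way to use measured rates is to find the minimum and estimate the backup time at that rate. A minimal sketch, assuming hypothetical per-component throughput figures in MB/s for one backup path; it ignores compression and contention, which the first item in the list above can change considerably.

```python
def bottleneck(components: dict[str, float]) -> tuple[str, float]:
    """Return the slowest component and its sustained throughput (MB/s)."""
    name = min(components, key=components.get)
    return name, components[name]


def backup_hours(data_gb: float, components: dict[str, float]) -> float:
    """Hours to move data_gb when the whole path runs at the bottleneck rate."""
    _, rate = bottleneck(components)
    return data_gb * 1000 / rate / 3600


if __name__ == "__main__":
    # Hypothetical measured rates in MB/s for one backup path.
    path = {
        "Storage / disk subsystem": 120,
        "Network (Gigabit Ethernet)": 90,
        "I/O bus of backup server": 250,
        "Tape device connection": 200,
        "Tape device": 80,
    }
    print(bottleneck(path))
    print(f"{backup_hours(2000, path):.1f} h for 2,000 GB")
```

Upgrading any component other than the bottleneck leaves the estimate unchanged, which is why the list above should be validated as a whole.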

Server Hardware Considerations

CommVault CommServe servers will be constrained first and foremost by bandwidth and I/O wait states, not by CPU. Gigabit Ethernet matters more than CPU sizing for streaming enough data to tape when multiple drives are used simultaneously, when higher-bandwidth backup drives such as LTOx, SDLT, or 9x40 are used, or when backing up to disk.

To optimize CommVault Media Agent server performance, the following formula is recommended for sizing RAM. Note that these figures are approximations:

• 512MB for the CommVault software (minimum); 1GB recommended
• 256MB per attached tape drive
• 512MB for the OS
• 512MB for the TCP/IP stack (100BaseT), or 1GB for the TCP/IP stack with Gigabit Ethernet

Solid server sizing and planning requires testing to ensure optimized backup performance. Data type, block sizes, network, CPUs, RAM, etc. all interact and factor into the performance of data written from HDD (DASD or SAN) to tape.
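The RAM formula above can be expressed as a small function. This sketch treats the 100BaseT and Gigabit Ethernet TCP/IP figures as alternatives rather than a sum, which is an interpretation of the list; the function name and parameters are illustrative.

```python
def media_agent_ram_mb(tape_drives: int, gigabit: bool = True, recommended: bool = True) -> int:
    """Approximate Media Agent RAM in MB from the sizing figures above."""
    software = 1024 if recommended else 512   # CommVault software: 1 GB recommended, 512 MB minimum
    drives = 256 * tape_drives                # 256 MB per attached tape drive
    operating_system = 512                    # 512 MB for the OS
    tcp_ip = 1024 if gigabit else 512         # 1 GB with Gigabit Ethernet, 512 MB with 100BaseT
    return software + drives + operating_system + tcp_ip


if __name__ == "__main__":
    print(media_agent_ram_mb(4))                                     # 3584
    print(media_agent_ram_mb(2, gigabit=False, recommended=False))   # 2048
```

As the text notes, these are approximations; testing with your own data types and block sizes should confirm the result.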


Phased, Plunge, or Parallel Implementation?

Transitioning a client or storage resource to the new backup software may be done in various ways depending on the data, host, and data protection requirements.

Parallel

Most companies would like to use a parallel transition approach. In this method, the previous and new backup software peacefully co-exist. Resources are shared, and the previous backup agents are turned off in an orderly fashion, with a quick reversal possible if an error occurs in the deployment of the new application layer. In reality, parallel transitions are rarely possible. The inability to share storage resources, software conflicts, and the potential for inconsistent data protection are the major roadblocks.

Example: Protecting online databases requires exclusive management of transaction logs. Each backup software agent requires controlled access to those logs via the database API. If the logs become split between backup software agents, recovery becomes nearly impossible.

Example: Some backup software uses the archive bit of a file’s properties to determine whether the file needs to be backed up. Once the file has been backed up, the archive bit is reset. If both backup agents use the archive bit, each will back up a different and incomplete set of files.

Benefits to this approach: Most auditable in terms of successful backups and recoveries; transition requirements are implicit in monitoring between two application layers.

Negatives to this approach: Prone to failure because of numerous contention issues between the separate applications.
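The archive bit conflict in the second example can be reproduced with a small simulation. This is a minimal sketch, not either vendor’s actual agent logic: each hypothetical agent backs up every file whose archive bit is set and then clears it, and the order in which the two agents run is varied across two nights.

```python
def run_incremental(files: dict[str, bool]) -> set[str]:
    """Back up every file whose archive bit is set, then clear the bit."""
    selected = {name for name, archive_bit in files.items() if archive_bit}
    for name in selected:
        files[name] = False
    return selected


def simulate() -> dict[str, set[str]]:
    """Two agents sharing one set of archive bits over two nights."""
    files = {"a": False, "b": False, "c": False}
    captured = {"legacy": set(), "new": set()}
    # Night 1: a and b change; the legacy agent happens to run first.
    files["a"] = files["b"] = True
    captured["legacy"] |= run_incremental(files)
    captured["new"] |= run_incremental(files)
    # Night 2: c changes; this time the new agent runs first.
    files["c"] = True
    captured["new"] |= run_incremental(files)
    captured["legacy"] |= run_incremental(files)
    return captured


if __name__ == "__main__":
    print(simulate())
```

Neither agent ends up with all three changed files, so neither application alone can restore the current state.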

TIP: Don’t run parallel backups using both backup software packages unless you have verified that no conflict exists and both are recoverable.


Phased

Given that an in-parallel transition is generally not possible, the next best method for conversion between applications is a phased transition. In this method, systems are designated for migration in blocks of hosts: a pre-determined number of systems at a time are moved to the new backup software, configured, and stabilized. Lessons learned from each transition are applied to the next group. Deployment via the Phased Approach is the most typical conversion process and is based on the following generalized workflow:

1) Identify conversion groups by system role
   a. Test Group
   b. User Group (generally file / print)
   c. Production Phase 1 (key service systems such as Exchange)
   d. Production Phase 2 (key data systems such as Oracle / SQL)
   e. “Clean Up”
2) Establish the NEW application foundation (CommServe and at least one Media Agent)
3) Validate connectivity between the foundation systems and storage targets
4) Per conversion group:
   a. Stop services / daemons for the legacy backup application (set to manual)
   b. Install the new application binaries
   c. Complete test backups (FULL and INCREMENTAL)
   d. Complete test restores
   e. On validation of successful restores, remove the legacy backup application binaries
5) Complete knowledge transfer with Operations / Administration personnel
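Step 4 of the workflow above can be sketched as a loop with a validation gate: legacy binaries are removed only when every earlier step succeeds for a host. A minimal sketch in which the step names and the `actions` callables are hypothetical stand-ins for whatever scripts or manual checks your team uses; nothing here calls a real CommVault interface.

```python
from typing import Callable

Step = Callable[[str], bool]

# Steps 4a-4d in order; each hypothetical action returns True on success.
STEPS = ["stop_legacy_services", "install_new_binaries",
         "test_full_backup", "test_incremental_backup", "test_restore"]


def convert_group(hosts: list[str], actions: dict[str, Step]) -> tuple[list[str], list[str]]:
    """Apply step 4 to one conversion group.

    Legacy binaries are removed (4e) only when every earlier step succeeds; a host
    that fails keeps its legacy binaries so its services can be restarted for failback.
    """
    converted, held_back = [], []
    for host in hosts:
        # all() stops at the first failing step for this host.
        if all(actions[step](host) for step in STEPS):
            actions["remove_legacy_binaries"](host)
            converted.append(host)
        else:
            held_back.append(host)
    return converted, held_back


if __name__ == "__main__":
    removed = []
    actions = {step: (lambda host: True) for step in STEPS}
    actions["test_restore"] = lambda host: host != "sql01"   # simulate one failed restore
    actions["remove_legacy_binaries"] = lambda host: removed.append(host) or True
    print(convert_group(["fs01", "ex01", "sql01"], actions))  # (['fs01', 'ex01'], ['sql01'])
    print(removed)
```

Hosts that are held back feed the “lessons learned” applied to the next group.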

The advantage of a phased transition is that it allows a lengthy transition period with minimal complexity and maximum data protection. The difficulty with this approach is sharing storage resources. If a common library and/or drives cannot be shared between the previous and new backup software, then the migration of conversion groups is governed by common resources, not by business or practical requirements. Where backup targets are consolidated into a single library, the phased transition is actually a plunge transition.

Plunge

A plunge transition means that the conversion between application layers is completed “all at once”: the previous backup software is dropped “cold turkey”, and the new backup software is brought up and running as quickly as possible. A plunge transition is normally not undertaken without at least running a pilot implementation to validate the new backup software and practice agent installation. The Pilot process for a Plunge implementation follows the Phased Approach process outlined above.

Benefits to this approach: Minimized risk to production systems / end user access to services and data assets. The defined cutover of systems creates a very “clean” audit trail for control test reporting.

Negatives to this approach: Time associated with Pilot testing, and the duplicated hardware assets required to complete the Pilot in preparation for the Plunge.

TIP: Complete a parallel transition if possible; phased if practical; and plunge if it is possible to PILOT before production.


Sharing Storage Resources

As noted earlier in this document, sharing a common resource pool for tape-based backup is, at best, a complex endeavor. While newer tape libraries offer a virtualization capability, these libraries are still relatively uncommon in the real world, and procuring a virtualized library device or VTL solely for the backup software transition would be costly. Alternatively, backups to disk can be configured as a short term cache point between the two backup configurations, assuming that the storage target is SAN-based and both backup environments reside in a fibre channel fabric. The negative to this approach is the relative cost of SAN connectivity.

A less expensive configuration would be to use a third party library control software package that both backup software packages support. SUN / StorageTek’s ACSLS for UNIX / LIBAttach for Windows are prime examples of library management software that allows multiple applications to share the same tape library. Software-based library sharing, however, is proprietary to each library manufacturer and may require additional server hardware dedicated to the software virtualization of the library (example: a SUN server to host the ACSLS binaries). A much less capable, but library independent, example is Microsoft’s Removable Storage Manager (RSM). RSM has limitations (robotic management via RSM creates conflicts between some backup applications and the operating system); however, it presents a low cost, hardware agnostic option for sharing removable media libraries during a backup software transition.

If library virtualization or third party library management software is not a viable solution for sharing storage resources, then dedicated physical assignment of resources is the only remaining configuration option.

NOTE: CommVault does not recommend resource timesharing as a viable solution. The risks associated with manually (or script-) moving library control back and forth between backup software applications are numerous and unpredictable. Given the risk of data loss between unreconciled backup applications, this configuration is not worth the cost avoidance.

TIP: Use library virtualization or third party library management software to share resources during backup software transition. Do not timeshare resources. The risks are too great.


Managing the Transition

Transition Management is the subject of numerous books and the primary source of income for many consultants. Change does not come easily. There is a natural urge to maintain the status quo: it takes less effort and is definitely less stressful to NOT CHANGE. However, without change there is no progress. Without progress you will stagnate, and in the world of business and IT, stagnation is a death sentence.

There are numerous examples of failed implementations and transitions where proper planning and preparation were lacking. The reasons most often cited for transition failure were “not enough time/resources” or “too many problems with the new software - the current application works just fine”. The actual reasons were “we didn’t understand the transition requirements” and “we aren’t comfortable using the new solution”.

Understanding backup software transition requirements comes with experience. It also requires an in-depth examination of the enterprise’s data storage environment and data protection needs. Transition experience comes from practice and is best found in consultants who have successfully managed similar transitions. These consultants will help document the data storage environment and protection requirements BEFORE those requirements are acted on. The demands of day-to-day management in an IT environment leave little, if any, time to properly transition a critical IT function such as backup. Transition delays or problems can cause unacceptable gaps in data protection. It is best to leave transition planning and management to professionals.

Training is also critical to the success of a transition to a new backup software solution. Old habits die hard and, without adequate training, can cause problems that should never have occurred. Even the best artist takes time to learn and practice with a new tool set before attempting to create a masterpiece.

TIP: Use consultants to do transition planning and management. Get your key administrators and operators trained on the new software before it is implemented.

