One of the problems with ever-increasing storage volumes is moving that volume of data to a new storage system. When that nice, new storage system is added to a row in your data center, you turn it on, make sure everything lights up and finish connecting servers. At some point, you are going to have to migrate data from the old storage platform to the new one. New storage systems bring new technologies like thin provisioning and automated tiering. How do you make sure your migration does not undermine them? Migration migraines now stem not only from having to move TBs of information but also from making sure that the migration does not render new storage system capabilities useless or invalidate current data retention policies. Software and systems are now available that address these needs while still delivering migration performance.
Even before thin provisioning and automated tiering, moving TBs of data took time, no matter what the infrastructure. It still does. The good news was that migration had evolved into a block transfer that in many cases could be done on the SAN side, storage system to storage system. It was almost ideal: little or no server impact and very fast raw block-to-block transfers. The problem is that this block-to-block transfer can throw a wrench into modern storage capabilities like thin provisioning and automated tiering.
The block-to-block transfer is unaware of what is actually in those blocks. They may be full of deleted data, or they may be full of actual data. To be safe, everything is transferred, block by block, to the new system, and the new system has to assume that all of these blocks hold real data. As a result, your new thin volume is now not so thin. Storage companies are addressing this problem. First, look for thin-aware file systems that can communicate with the thin-provisioned volume on the new storage system so that blocks holding only deleted data are not written. Second, look for thin-provisioned systems that can detect blocks of deleted data that have been zeroed out (the zeroing is a separate step) and then skip writing those zero blocks during the migration. This approach does not require a thin-aware file system, but it is an extra step that may cause some performance issues on the new system.
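To make the two approaches concrete, here is a minimal Python sketch under stated assumptions. `ThinVolume` is a hypothetical toy model in which only written blocks consume capacity, the 4 KiB block size is an assumption, and the `in_use` set is a hypothetical stand-in for whatever a thin-aware file system actually reports to the array; none of these names come from a real product. The sketch counts how many blocks end up allocated on the target for a plain block copy, for zero detection with and without the prior zeroing pass, and for a thin-aware copy.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4096  # assumed allocation unit; real arrays use their own page size


@dataclass
class ThinVolume:
    """Toy thin-provisioned volume: only blocks that are written consume capacity."""
    logical_blocks: int
    allocated: dict = field(default_factory=dict)  # block index -> block contents

    def write_block(self, index: int, data: bytes) -> None:
        if not 0 <= index < self.logical_blocks:
            raise IndexError(index)
        self.allocated[index] = data

    def read_block(self, index: int) -> bytes:
        # Unallocated blocks read back as zeros.
        return self.allocated.get(index, bytes(BLOCK_SIZE))


def migrate(source_blocks, target, skip_zero_blocks=False, in_use=None):
    """Copy source blocks to a thin target; return the blocks allocated on it.

    skip_zero_blocks: target-side zero detection; only helps if deleted
        blocks were zeroed out in a separate step beforehand.
    in_use: optional set of block indexes that a thin-aware file system
        reports as holding live data (hypothetical stand-in).
    """
    zero = bytes(BLOCK_SIZE)
    for index, data in enumerate(source_blocks):
        if in_use is not None and index not in in_use:
            continue  # only deleted data here, per the file system
        if skip_zero_blocks and data == zero:
            continue  # zeroed block: leave it unallocated on the target
        target.write_block(index, data)
    return len(target.allocated)


if __name__ == "__main__":
    live, stale, zero = b"L" * BLOCK_SIZE, b"D" * BLOCK_SIZE, bytes(BLOCK_SIZE)
    raw = [live, stale, live, stale]     # deleted data still on disk
    zeroed = [live, zero, live, zero]    # after a separate zeroing pass
    print(migrate(raw, ThinVolume(4)))                            # 4: plain block copy
    print(migrate(raw, ThinVolume(4), skip_zero_blocks=True))     # 4: stale blocks are not zero
    print(migrate(zeroed, ThinVolume(4), skip_zero_blocks=True))  # 2
    print(migrate(raw, ThinVolume(4), in_use={0, 2}))             # 2: thin-aware file system
```

Zero detection alone saves nothing on the `raw` layout because stale deleted data is not zero, so it gets copied; that is why the zeroing pass has to run before the migration.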
Automated tiering brings another wrinkle to the migration process. If you bought into the concept of thin provisioning, you might have taken the vendor up on the "you don't need as much storage" pledge. If the new system can perform a thin migration, the vendor is probably right: you won't need as much storage. If you also decided to invest in three tiers of storage and let the system decide where to place data, you might have an initial migration issue. On initial migration all data is hot, but obviously it can't all go to SSD. You may not have enough Fibre Channel capacity to store the initial migration, either. The practice I have seen recommended so far is to migrate everything to a SATA tier first and let hot data start to "migrate up." This could cause some performance issues on initial use but should balance itself out fairly quickly. The other recommended option is to make sure your Fibre Channel/SAS tier is large enough to hold the initial data set. Regardless of which tier you choose, I think you are going to need enough space on a single tier to hold the entire initial migration.
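The capacity reasoning above can be sketched in a few lines of Python. This is a hypothetical model, not any vendor's tiering algorithm: `Tier`, `landing_tiers`, and `promote` are illustrative names, capacities are in TB, the tier names (SSD, Fibre Channel as "FC", SATA) follow the text, and promotion is a simple greedy pass that places the most-accessed extents on the fastest tier with room once the array has gathered access statistics after the initial migration.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    capacity_tb: int


def landing_tiers(tiers, migration_tb):
    """Names of tiers that could hold the entire initial migration by themselves."""
    return [t.name for t in tiers if t.capacity_tb >= migration_tb]


def promote(extents, tiers):
    """Greedy re-placement once access statistics exist.

    extents: (extent_id, size_tb, access_count) tuples.
    tiers: ordered fastest first. The most-accessed extents are placed first,
    each on the fastest tier that still has room.
    Returns a dict mapping extent_id to tier name.
    """
    free = {t.name: t.capacity_tb for t in tiers}
    placement = {}
    for extent_id, size_tb, _ in sorted(extents, key=lambda e: e[2], reverse=True):
        for t in tiers:
            if free[t.name] >= size_tb:
                free[t.name] -= size_tb
                placement[extent_id] = t.name
                break
        else:
            raise ValueError(f"no tier has room for extent {extent_id!r}")
    return placement


if __name__ == "__main__":
    tiers = [Tier("SSD", 2), Tier("FC", 10), Tier("SATA", 40)]
    print(landing_tiers(tiers, 30))  # ['SATA']: only SATA can take all 30 TB at once
    extents = [("db", 1, 900), ("mail", 1, 500), ("home", 6, 50), ("archive", 22, 1)]
    print(promote(extents, tiers))
```

With these made-up numbers the two hottest extents move up to SSD, `home` goes to FC, and `archive` stays on SATA. On day one only the `landing_tiers` check matters; `promote` becomes meaningful after the array has observed which data is actually hot.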
Another area of interest is migration on the disk archive tier. These systems are typically built on content-addressed storage (CAS), so migrating data off them can be more challenging. In theory, these systems are supposed to grow very large and stay in place for decades. We all know that storage systems don't last for decades. One solution is to use a node-based disk archive: as new nodes are added, old ones can be retired, and data auto-migrates to the newer nodes in smaller chunks. But what if you want to move away from your current disk archive supplier altogether? A CAS-to-NAS file copy can take months on most of these systems, and the move may violate your compliance policies. Some vendors are starting to develop CAS migration software that aims to make these migrations faster while maintaining compliance.
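As an illustration of what maintaining compliance can involve during a CAS migration, the following Python sketch verifies each object's content address before and after copying it and carries the object's retention date to the new store, where deletion is refused until that date has passed. It is a minimal sketch under assumptions: both archives are modeled as in-memory dictionaries, objects are assumed to be addressed by a SHA-256 hash of their content (real CAS products use their own addressing and metadata schemes), and `ArchivedObject`, `migrate_cas`, and `delete_object` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ArchivedObject:
    content: bytes
    retain_until: date  # retention date that must survive the migration


def content_address(data: bytes) -> str:
    # Assumption: the address is the SHA-256 hex digest of the content.
    return hashlib.sha256(data).hexdigest()


def migrate_cas(source: dict, target: dict) -> list:
    """Copy verified objects from source to target, keeping retention dates.

    Returns the addresses that failed verification; those stay on the source only.
    """
    failed = []
    for address, obj in source.items():
        # Refuse to propagate an object whose content no longer matches its address.
        if content_address(obj.content) != address:
            failed.append(address)
            continue
        target[address] = ArchivedObject(obj.content, obj.retain_until)
        # Read back from the target and verify again before counting it as migrated.
        if content_address(target[address].content) != address:
            del target[address]
            failed.append(address)
    return failed


def delete_object(store: dict, address: str, today: date) -> None:
    """Delete an object only after its retention period has expired."""
    retain_until = store[address].retain_until
    if today < retain_until:
        raise PermissionError(f"{address[:12]}... is retained until {retain_until}")
    del store[address]


if __name__ == "__main__":
    record = ArchivedObject(b"2009 invoice", date(2030, 1, 1))
    source = {content_address(record.content): record}
    target = {}
    print(migrate_cas(source, target))  # []: every object verified
```

Keeping the retention date attached to each object, rather than re-deriving it after a plain file copy, is what lets the new store keep enforcing the same policy.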