Save storage with data deduplication

This blog post isn’t directly about software development, because I’m going to look at a feature of Windows Server, but my motivation still comes from development and testing.

When you want to set up AX 7 (Microsoft Dynamics 365 for Finance and Operations, Enterprise Edition :)) environments for development, testing or demonstration purposes, you can either deploy them to Azure or download a virtual disk and create a virtual machine locally or in your own data center. Many partners (especially VARs) prefer hosting VMs themselves, because they’ve already invested in data centers and they’re used to hosting virtual machines with previous versions of AX.

They may need a large number of VMs for AX 7 (for many projects and many developers), but that isn’t a problem for compute resources, because only a fraction of these VMs are running at any given time. For example, a developer may have VMs for five projects but use only one of them for development at a time. The problem people complain about is the storage requirement, because the virtual disks are pretty big (a single fresh VHD takes approximately 70 GB and it grows with use).

Most of the data in these virtual disks is the same, so a feature that stores identical bits just once would drastically reduce the amount of data in physical storage. Fortunately, that’s exactly what Windows Server offers.

Windows Server 2012 introduced the Data Deduplication feature, which detects identical blocks of data (not whole files, which would be useless for virtual disks) and stores them only once, keeping a kind of link from the other places. It’s completely transparent to you: you work with all files in the same way as before. Windows Server 2012 R2 then added extra support for virtual disks in VDI (Virtual Desktop Infrastructure) scenarios, so this is the minimum version you want to use. (Windows Server 2016 supports Data Deduplication too, of course, and has added a few improvements.)
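To make the idea of block-level deduplication more concrete, here is a minimal Python sketch. It is only an illustration of the general technique, not how Windows Server implements it: the ChunkStore class, the fixed 4 KB chunk size, the SHA-256 hashing and the file names are all my assumptions for the example.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, chosen only for illustration


class ChunkStore:
    """Toy block-level deduplicating store (hypothetical, not the Windows design)."""

    def __init__(self):
        self.chunks = {}  # digest -> chunk bytes, each unique chunk stored once
        self.files = {}   # file name -> ordered list of chunk digests ("links")

    def write(self, name, data):
        digests = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # Keep the chunk only if an identical one is not stored yet.
            self.chunks.setdefault(digest, chunk)
            digests.append(digest)
        self.files[name] = digests

    def read(self, name):
        # Reassemble the file from its chunk references.
        return b"".join(self.chunks[d] for d in self.files[name])

    def logical_size(self):
        # Size the files would take without deduplication.
        return sum(len(self.chunks[d]) for ds in self.files.values() for d in ds)

    def physical_size(self):
        # Size of the unique chunks that are actually stored.
        return sum(len(c) for c in self.chunks.values())


# Two simulated "virtual disks": eight shared chunks plus one unique chunk each.
base_image = b"".join(bytes([i]) * CHUNK_SIZE for i in range(8))
vm1 = base_image + b"A" * CHUNK_SIZE
vm2 = base_image + b"B" * CHUNK_SIZE

store = ChunkStore()
store.write("vm1.vhd", vm1)
store.write("vm2.vhd", vm2)

print("logical bytes: ", store.logical_size())
print("physical bytes:", store.physical_size())
```

Both files still read back unchanged, which is the transparency mentioned above, while the eight shared chunks are kept only once.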

According to Microsoft’s measurements, the usual space saving in these scenarios is 80-95%, and I would indeed expect high numbers with AX VMs, because they’re almost the same. This means that any additional VM will need (after deduplication) just a few extra gigabytes of storage, and having many VMs won’t be a problem anymore.
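As a rough back-of-the-envelope check, the following Python sketch applies the 80-95% range to the 70 GB disk size mentioned earlier. The count of 20 VMs and the function name are hypothetical, and the calculation ignores disk growth and any deduplication overhead, so treat the result as an estimate only.

```python
def physical_storage_gb(vm_count, vhd_size_gb, savings_percent):
    """Estimate the storage left after deduplication (simplified assumption:
    the savings rate applies uniformly to the total logical size)."""
    if not 0 <= savings_percent < 100:
        raise ValueError("savings_percent must be in the range [0, 100)")
    logical_gb = vm_count * vhd_size_gb
    return logical_gb * (100 - savings_percent) / 100


VM_COUNT = 20     # hypothetical number of VMs
VHD_SIZE_GB = 70  # approximate size of a fresh VHD, from the text above

for percent in (80, 95):
    total = physical_storage_gb(VM_COUNT, VHD_SIZE_GB, percent)
    print(f"{percent}% saving: {total} GB instead of {VM_COUNT * VHD_SIZE_GB} GB")
```

Under these assumptions, 1400 GB of virtual disks would shrink to somewhere between 70 GB and 280 GB.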

I haven’t tried it myself, though, because I don’t need it at home and clients have specialized infrastructure people dealing with these things. If you have practical experience, please share it in the comments below this post.

If you want to learn more about data deduplication in Windows Server, the latest documentation is here.