A step-by-step guide to migrating Microsoft Data Quality Services to Azure

Advantages of migrating to HEDDA.IO

In recent months, more and more features of the classic on-premises Microsoft SQL Server product have found their way into Azure.

For example, the Azure Data Factory SSIS Integration Runtime (SSIS-IR) is a complete PaaS offering that is fully compatible with on-premises SQL Server Integration Services. And with the current version of Azure SQL Managed Instance, Microsoft SQL Server Master Data Services can now be used in Azure as well.

Microsoft Data Quality Services, on the other hand, cannot be used in Azure except on an Azure VM as IaaS. Many of the processes involved in loading a data warehouse or moving data between services are increasingly taking place in the cloud, and with them the need to run pure data quality processes in the cloud grows as well. This leaves a gap: necessary data quality processes either cannot be integrated into cloud processes, or existing processes cannot be migrated to Azure.

HEDDA.IO is a data quality (DQ) service developed entirely for the cloud. With its concepts of knowledge bases, domains, and composite domains, HEDDA.IO is compatible with Microsoft Data Quality Services. Through an SSIS component that is fully compatible with the SSIS-IR, data can be validated and cleansed easily within ETL processes. Existing on-premises processes built on Microsoft Data Quality Services can thus be migrated quickly and easily to Azure using HEDDA.IO.

With the discontinuation of the Azure Data Marketplace, Reference Data Services were removed from Data Quality Services. This means that various checks and cleansing operations based on composite domains can no longer be performed with Microsoft Data Quality Services. HEDDA.IO has an open API through which reference data services can be quickly and easily integrated into the product. Various services can be deployed directly with HEDDA.IO from the marketplace, and some of them are available as open source on GitHub so that end users can create their own services.
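To give an idea of what such a custom service can look like, here is a minimal sketch of a reference data service in Python. The route, request format, and response shape shown here are illustrative assumptions, not the actual HEDDA.IO contract; the open-source services on GitHub show the real interface.

```python
# Minimal sketch of a custom reference data service.
# The /validate route and the JSON request/response shapes are
# illustrative assumptions -- see the open-source services on GitHub
# for the actual HEDDA.IO interface.
from flask import Flask, jsonify, request

app = Flask(__name__)

# Tiny in-memory lookup standing in for a real reference data source.
COUNTRIES = {
    "germany": "Germany",
    "deutschland": "Germany",
    "u.s.a.": "United States",
}

@app.route("/validate", methods=["POST"])
def validate():
    value = request.get_json().get("value", "")
    leading_value = COUNTRIES.get(value.strip().lower())
    return jsonify({
        "input": value,
        "isValid": leading_value is not None,
        # Analogous to the "Correct To" leading value in DQS.
        "correctedValue": leading_value,
    })

if __name__ == "__main__":
    app.run(port=5000)
```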

DQS Knowledge Base

Let’s start with a Knowledge Base and a domain in Microsoft Data Quality Services. Open the SQL Server 2017 Data Quality Client. On the start screen, the Knowledge Base Management area on the left displays the Knowledge Bases you have already defined. Click the Open Knowledge Base button and select the DQS Data Knowledge Base in the following dialog. You may also find the DQS Data Knowledge Base under Recent Knowledge Base.

DQS Data is a standard Knowledge Base that is created automatically on your system when Data Quality Services is installed.


Once you have opened the Knowledge Base, you will see the domains that belong to it in the Domain area on the left. On the right you will see the domain properties for the selected domain, in this example the Country/Region domain. In addition to the name, these include a more detailed description of the domain and its data type.

 

Via the Domain Values tab you can switch to the actual values within your domain. Here you see the individual values of the selected domain as well as the assigned leading values in the “Correct To” column.

 

As you can see in the selection, the values of a domain can also use completely different character sets.

Exporting a DQS Knowledge Base

To export a Knowledge Base, click the second icon from the right in the Domain Management area on the left. The icon looks like a small table with an arrow pointing to the right.


You can now select whether you want to export the full Knowledge Base or the selected domain. Click on Export Knowledge Base to export the entire KB.

In the next dialog, “Export to Data File”, select the storage location for the Knowledge Base and enter a name for it. Exporting the Knowledge Base may take a few seconds, depending on the amount of data stored in it.

 

After the export, you can import the Knowledge Base into HEDDA.IO.

Importing a DQS Knowledge Base to HEDDA.IO

To import the previously exported DQS Knowledge Base into HEDDA.IO, start Excel and log on to your HEDDA.IO service via the HEDDA.IO tab. You can get the URL and the API key from the properties of your service in the Azure Portal. Further information can be found in the HEDDA.IO documentation at: https://hedda.io/documentation
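If you want to talk to the service outside of Excel, the same URL and API key can be used against the REST API. The following is a minimal sketch only; the endpoint path and header name are assumptions for illustration, so check the documentation linked above for the actual API.

```python
# Sketch of an authenticated call to a HEDDA.IO service.
# The endpoint path and the API-key header name are assumptions for
# illustration -- consult the HEDDA.IO documentation for the real API.
import requests

SERVICE_URL = "https://<your-service>.hedda.io"  # from the Azure Portal
API_KEY = "<your-api-key>"                       # from the Azure Portal

response = requests.get(
    f"{SERVICE_URL}/api/knowledgebases",         # hypothetical endpoint
    headers={"x-api-key": API_KEY},              # hypothetical header name
    timeout=30,
)
response.raise_for_status()
print(response.json())
```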

 

Once you are connected to your HEDDA.IO instance, you can select Import from the Configuration Group.

 

The Import dialog allows you to import both HEDDA.IO exports and DQS exports. To import the previously exported DQS file, select DQS Import from the Import dialog.

 

When you import a DQS file, it is uploaded from your local computer to the HEDDA.IO service.
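The Excel add-in performs this upload for you. Purely as an illustration of what the upload step amounts to, a hypothetical sketch of the same call against the REST API might look like this (the endpoint and form field name are assumptions):

```python
# Hypothetical sketch of uploading a DQS export file to a HEDDA.IO
# service. The /api/import endpoint and form field name are assumptions;
# in practice the Excel add-in handles this step for you.
import requests

SERVICE_URL = "https://<your-service>.hedda.io"
API_KEY = "<your-api-key>"

with open("DQS Data.dqs", "rb") as dqs_file:   # the exported Knowledge Base
    response = requests.post(
        f"{SERVICE_URL}/api/import",           # hypothetical endpoint
        headers={"x-api-key": API_KEY},        # hypothetical header name
        files={"file": dqs_file},
        timeout=300,
    )
response.raise_for_status()
```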

 

The HEDDA.IO service then imports the DQS file and creates a new Knowledge Base and the corresponding domains from it. Depending on the file size, the import may take some time. During the import, the Import button is disabled on every connected client.

After the backup has been imported from the Data Quality Server, you can access the Knowledge Base and its corresponding domains.

 

All members, including synonyms and validation status, have been imported into HEDDA.IO. Both DQS and HEDDA.IO can handle UTF-8 and can export or import corresponding members.

 

Creating an SSIS Package

To create an SSIS package that cleanses data using the previously imported DQS backup and the HEDDA.IO service, you must first install the HEDDA.IO SSIS component. You can download the component either from the portal of your previously created service or from the HEDDA.IO website at

https://hedda.io/download

 

After installing the component, open SQL Server Data Tools. Create a new SSIS project with a data flow using the HEDDA.IO Domain Cleansing component.

 

Configure the component to use the previously imported DQS domain, and then write the cleansed data back to a database.

Creating an SSIS-IR with HEDDA.IO

To use an Azure Data Factory SSIS-IR with HEDDA.IO components, you must specify a custom setup script when creating the SSIS-IR.

Follow the next steps to create an Azure Data Factory with HEDDA.IO components.

Create a new Azure Data Factory using the Azure Portal.

 

You can create the Azure Data Factory in the same resource group in which you created the HEDDA.IO service, or of course in a new or another existing resource group. For performance reasons, however, make sure that the Azure Data Factory is created in the same region as the HEDDA.IO service.
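If you prefer scripting over the portal, the factory and the SSIS-IR with its custom setup can also be created with the Azure SDK for Python. The sketch below assumes the azure-identity and azure-mgmt-datafactory packages, placeholder resource names, and an existing blob container that holds the custom setup script (main.cmd) together with the HEDDA.IO component installer.

```python
# Sketch: create a Data Factory and an SSIS-IR with a custom setup script
# using the azure-mgmt-datafactory SDK. Resource names, region, and the
# blob container URI/SAS token are placeholders you must supply.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    CustomSetupScriptProperties,
    Factory,
    IntegrationRuntimeComputeProperties,
    IntegrationRuntimeResource,
    IntegrationRuntimeSsisProperties,
    ManagedIntegrationRuntime,
    SecureString,
)

SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"   # same group as the HEDDA.IO service
REGION = "westeurope"                 # same region as the HEDDA.IO service

client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Create the Data Factory in the same region as the HEDDA.IO service.
client.factories.create_or_update(
    RESOURCE_GROUP, "hedda-demo-adf", Factory(location=REGION)
)

# Create the SSIS-IR; the custom setup script installs the HEDDA.IO
# components on every node. The blob container must contain a main.cmd
# plus the component installer.
ssis_ir = IntegrationRuntimeResource(
    properties=ManagedIntegrationRuntime(
        compute_properties=IntegrationRuntimeComputeProperties(
            location=REGION,
            node_size="Standard_D2_v3",
            number_of_nodes=1,
        ),
        ssis_properties=IntegrationRuntimeSsisProperties(
            custom_setup_script_properties=CustomSetupScriptProperties(
                blob_container_uri="<blob-container-uri>",
                sas_token=SecureString(value="<sas-token>"),
            ),
        ),
    )
)
client.integration_runtimes.create_or_update(
    RESOURCE_GROUP, "hedda-demo-adf", "hedda-ssis-ir", ssis_ir
)
```

After the runtime has been created, it still has to be started (via the portal or the SDK) before packages using the HEDDA.IO component can run on it.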