Thursday, July 02, 2015

PolyBase (now a new feature in SQL Server 2016 editions)

What is PolyBase?  PolyBase is a new technology that integrates PDW (SQL Server Parallel Data Warehouse) with the Hadoop Distributed File System (HDFS).  It used to work only with PDW, so most small to medium enterprises that don't have the appliance weren't able to benefit from it.

PolyBase allows users to query non-relational data in Hadoop, blobs, and files, whether on premises or in the cloud, and to run analytics and BI on that data from within SQL Server. It also supports the concept of a data lake, where you query the data where it is stored and, once the query completes, leave it where it was. This concept facilitates analysis of big data in its current location and reduces the costs associated with moving the data. The following diagram is taken from a Microsoft white paper and shows the interactions you can have with different data sources from within SQL Server when using the PolyBase feature.