SQL 2016 is in market for just under two weeks and Microsoft released a new sample databases(WideWorldImporters and WideWorldImportersDW) Those sample database can be used to evaluate the new features of SQL 2016 such as
• Polybase
• Query store
• In-Meomory OLTP
• Updatable column store index (both clustered and non-clustered)
• Native JSON support
• Security (Dynamic data masking, row level security and Always encrypted)
• Temporal database and tables
There are also few scripts and applicaiton that was released with new sample databases to test out the new features. Enjoy
In this blog you will find various tools and articles that are useful to any DBAs. I will try to put more practical articles while at the same time post any database software bugs and some important tips on internals of sql server and optimization. Address cons and pros of different relational database management in use and new developments. This blog will also cover the market perception of different database system including some hard facts, reviews and benchmarks. Teshome Asfaw
Monday, June 13, 2016
Monday, June 06, 2016
Azure SQL Data Warehouse
Azure SQL
Data Warehouse is the new SaaS offering from Microsoft. It is still in preview
state and I had a chance to set up and play with it. It looks very promising.
So, what
is Azure SQL Data Warehouse? Microsoft defines Azure SQL Data warehouse as
follows: “Azure
SQL Data Warehouse is a cloud-based, scale-out database capable of processing
massive volumes of data - both relational and non-relational. Built on
massively parallel processing (MPP) architecture, SQL Data Warehouse can handle
your enterprise workload”.
FlyData [1] that compares Azure data warehouse vs Amazon Redshift defined
it as “a distributed and enterprise level database capable of handling large
amounts of relational and non-relational data”
Historically,
those type of MPP are only available for enterprises that can afford to fork
out a lot of money to buy appliance upfront. I believe what Microsoft done is
great in trying to bring this powerful massively parallel processing
architecture to all of us and now we are in position to investigate, experiment
and perform cost benefit analysis and see if it can benefit the business under
our domain. I believe the SaaS offering of the product will not discriminate
against any size of companies whether small, medium or large. It is also a
great opportunity for data scientists to experiment their modelling and avoid
hassle of initial investment cost. Whether this will be a game
changer, we won’t say it for sure now but time will tell.
The
abstraction and separation of compute from azure data warehouse is the one I
personally like very much. I believe the flexibility to dynamically
deploy, grow, shrink and pause is indeed another feature. In this
article, I will look at two aspects of the product; architecture and pricing
Architecture
There are
four layer that defines azure data warehouse. There are disagreement if
it is four or three or even two layers that defines Azure data warehouse
architecture. I wouldn't be drawn into the layering as they all talk
about the same thing. Those layers are DMS, storage, control and compute
nodes. The image below (source: Microsoft) gives you a visual
architecture of Azure data warehouse
- Data Movement services is one of the layer
that spans across both control node and compute nodes. It is a windows
service that coordinates and/or manages data movement between different
nodes.
- Control node is the entry point for all
applications. It is like connecting to an instance of SQL server. The
application that connects to ADW (Azure data warehouse) doesn’t know the
underlying compute nodes nor will be able to connect to any of the compute
nodes. The compute nodes are abstracted from our application. The control
node is responsible in distributing our requests to underlying compute
nodes
- Compute nodes are the engine of our computing
power hence called compute nodes. When you issue a big query the first
node you hit will be control node which will then distribute the query to
compute node. Once the query run on compute nodes it is then the work of
control nodes that will aggregate the results from all the compute node
and return’s to our application.
- One of the great achievements of Azure data
warehouse is the separation of dependence of storage layer from the
compute node when scaling up. You should be able to increase your storage
independent of your compute node. The storage layer and compute
nodes scales separately. The data will be stored in Azure Blob storage.
The compute nodes are directly interacting with the storage layer when
reading or writing data.
Image
source: https://azure.microsoft.com/en-gb/documentation/articles/sql-data-warehouse-overview-what-is/
Charging
Charging
is based on number of DWU. DWU (data warehouse unit) represents the amount of
compute resources used at any given time. The more DWU you use, the
faster your query runs as it will be executed on more distributed CPU and
memory. It will take some time and experience to understand the trade-off
and/or optimum DWU to pricing to run your query to be completed within the
time-frame you planned or anticipated.
As
per information I have got at current time, the charge is per minute. For
instance, if you used 100 DWU for 30 minute you will be charged for 30 minutes
instead of per hour pricing that is shown for compute capabilities. Currently,
the minimum DWU you can configure from the portal is 100 DWU (with a cost of
£0.43/hour) with increment of 100 up to a maximum of 2000 DWU (with a cost of
£8.55/hour). If you pause you will not be charged.
References
- https://azure.microsoft.com/en-gb/services/sql-data-warehouse/
- (https://azure.microsoft.com/en-gb/documentation/articles/sql-data-warehouse-get-started-provision/
- https://www.simple-talk.com/cloud/cloud-data/azure-sql-data-warehouse/
Subscribe to:
Posts (Atom)