Microsoft Azure Data Lake is a highly scalable data storage and analytics service. The service is hosted in Azure, Microsoft’s public cloud, and is largely intended for big data storage and analysis. Like other data lakes, Azure Data Lake allows developers, scientists, business professionals and other users to gain insight from large, complex data sets. To do this, users write queries that process data and generate results. Because Azure Data Lake is a cloud computing service, it gives customers a faster and more efficient alternative to deploying and managing big data infrastructure within their own data centers.
As with most data lake offerings, the Azure Data Lake service is composed of two parts: data storage and data analytics. Users can store enormous volumes of structured, semi-structured or unstructured data produced from any application, ranging from large archival stores to small, time-sensitive transactional data. According to Microsoft, users can provision Azure Data Lake to store terabytes or even exabytes of data. The storage service also provides high throughput for fast data processing.
On the analytics side, Azure Data Lake users can write their own code for specific data transformation and analysis tasks, or use existing tools, such as Microsoft’s Analytics Platform System or Azure Data Lake Analytics, to query data sets.
Azure Data Lake is based on the Apache Hadoop YARN (Yet Another Resource Negotiator) cluster management platform and is intended to scale dynamically within the Azure public cloud. This helps the service accommodate the needs of big data projects, which tend to be compute-intensive.
Users can write their own processing code for Azure Data Lake with a programming language such as U-SQL, which combines SQL’s declarative query structure with user-written C# code. This also allows users to run analytics across SQL Server instances in Azure, as well as across Azure SQL Database and Azure SQL Data Warehouse, unifying access to most data sources in Azure.
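To illustrate the SQL-plus-code style described above, here is a minimal U-SQL sketch. The file paths, column names and schema are hypothetical, chosen only for illustration; an actual script would reference data stored in the user’s Azure Data Lake Store account.

```sql
// Hypothetical schema and paths, for illustration only.
// EXTRACT reads a tab-separated file from the data lake into a rowset.
@searchlog =
    EXTRACT UserId int,
            Region string,
            Query  string
    FROM "/Samples/Data/SearchLog.tsv"
    USING Extractors.Tsv();

// Aggregate with familiar SQL clauses; expressions use C# semantics.
@result =
    SELECT Region,
           COUNT(*) AS QueryCount
    FROM @searchlog
    GROUP BY Region;

// OUTPUT writes the result back to the lake as a CSV file.
OUTPUT @result
TO "/output/QueriesByRegion.csv"
USING Outputters.Csv();
```

A script like this is submitted as an Azure Data Lake Analytics job, which the service parallelizes across the allocated analytics units.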
Pricing for Azure Data Lake involves several components, including storage capacity, analytics units (AUs) consumed per minute, the number of completed jobs and the cost of managed Hadoop and Spark clusters. The Azure Pricing Calculator can help users estimate their data lake costs.