
Leverage an existing Hadoop cluster with Composable

The Composable DataOps Platform lets users run ad-hoc queries on data residing in the Hadoop Distributed File System (HDFS) through Apache Hive. Hive acts as a bridge between SQL-like queries and the underlying Hadoop infrastructure, so users can work with the data using familiar SQL syntax.


When a user submits a SQL-like query through the Hive ODBC driver, Hive translates it into one or more MapReduce jobs. MapReduce is Hadoop's parallel processing framework for distributed data processing. Because the query is broken into smaller tasks that are distributed across the machines in the cluster, it executes in parallel against the HDFS data.
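As a rough illustration of what that translation means, the sketch below mimics in plain Python how an aggregate query such as SELECT page, COUNT(*) FROM page_views GROUP BY page decomposes into a map phase, a shuffle, and a reduce phase. The table name, column names, and sample rows are hypothetical, and this is a conceptual model only, not Hive's actual query planner or execution engine.

```python
from collections import defaultdict

# Hypothetical rows standing in for a page_views table stored in HDFS.
PAGE_VIEWS = [
    {"user": "u1", "page": "/home"},
    {"user": "u2", "page": "/pricing"},
    {"user": "u1", "page": "/home"},
    {"user": "u3", "page": "/docs"},
    {"user": "u2", "page": "/home"},
]


def map_phase(rows):
    """Emit one (page, 1) pair per row, as a mapper would for COUNT(*) ... GROUP BY page."""
    for row in rows:
        yield row["page"], 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key to produce the final counts."""
    return {key: sum(values) for key, values in grouped.items()}


def count_views_by_page(rows):
    """Run the three phases in sequence on an in-memory list of rows."""
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    for page, views in sorted(count_views_by_page(PAGE_VIEWS).items()):
        print(page, views)
```

On a real cluster the map and reduce functions run as many parallel tasks on different machines; here they run sequentially in one process purely to show the data flow.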

One key advantage of using the Composable DataOps Platform alongside Hive is the ability to chain multiple MapReduce jobs together within Composable DataFlows. DataFlows provide a visual interface for designing and orchestrating data pipelines. By integrating Hive with DataFlows, users can build data transformation workflows that combine SQL-like Hive queries with the data manipulation capabilities of DataFlows.

Furthermore, the tabular results of a Hive query can be fused with data residing in other systems connected to the Composable DataOps Platform, such as relational databases, cloud storage, and web services. Combining data from these different systems gives a more complete view and supports broader analysis and reporting.
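To make the idea of fusing results concrete, here is a minimal sketch of joining the tabular output of a Hive query with rows from a relational database. Everything in it is an assumption for illustration: the Hive result is a hard-coded list, the relational source is an in-memory SQLite table, and the column names are invented. Within Composable this kind of join would be configured in a DataFlow rather than written by hand.

```python
import sqlite3

# Hypothetical tabular result of a Hive query: page views per customer ID.
HIVE_RESULT = [
    {"customer_id": 101, "page_views": 42},
    {"customer_id": 102, "page_views": 7},
    {"customer_id": 103, "page_views": 19},
]


def load_customers(conn):
    """Create and fill a small stand-in for a relational customers table."""
    conn.execute(
        "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers (customer_id, region) VALUES (?, ?)",
        [(101, "EMEA"), (102, "APAC")],
    )


def fuse(hive_rows, conn):
    """Left-join the Hive rows with the relational table on customer_id."""
    regions = dict(conn.execute("SELECT customer_id, region FROM customers"))
    return [
        {**row, "region": regions.get(row["customer_id"])}  # None when no match
        for row in hive_rows
    ]


def demo():
    """Build the in-memory database, run the join, and return the fused rows."""
    conn = sqlite3.connect(":memory:")
    try:
        load_customers(conn)
        return fuse(HIVE_RESULT, conn)
    finally:
        conn.close()


if __name__ == "__main__":
    for row in demo():
        print(row)
```

The join keeps every Hive row and leaves the region empty where the relational source has no matching customer, which is usually the safer default for exploratory work.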

By using the Hive ODBC driver with the Composable DataOps Platform, users can perform on-demand, exploratory analysis of their data stored in HDFS. They can write SQL-like queries, rely on Hive’s translation to MapReduce, and incorporate these queries into larger data workflows using Composable DataFlows. Integration with other data systems then adds contextual data to the results of the Hive queries.
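For readers who want to see what an ad-hoc query over ODBC might look like outside the platform, the following is a sketch that assumes the third-party pyodbc package and an ODBC data source already configured for Hive. The DSN name HiveDSN, the page_views table, and its columns are hypothetical, and this is not how Composable itself issues queries.

```python
def build_connection_string(dsn, user=None, password=None):
    """Build a DSN-based ODBC connection string; credentials are optional."""
    parts = [f"DSN={dsn}"]
    if user:
        parts.append(f"UID={user}")
    if password:
        parts.append(f"PWD={password}")
    return ";".join(parts)


def run_query(connection_string, sql):
    """Run one query through ODBC and return (column_names, rows)."""
    import pyodbc  # third-party; imported here so the helper above works without it

    # Hive ODBC drivers typically do not support ODBC transactions,
    # so autocommit is enabled (an assumption about the driver in use).
    conn = pyodbc.connect(connection_string, autocommit=True)
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        columns = [col[0] for col in cursor.description]
        return columns, cursor.fetchall()
    finally:
        conn.close()


# Hypothetical exploratory query; the table and columns are illustrative only.
TOP_PAGES_SQL = """
SELECT page, COUNT(*) AS views
FROM page_views
GROUP BY page
ORDER BY views DESC
LIMIT 10
"""

# Usage (requires pyodbc and a configured Hive DSN):
#   columns, rows = run_query(build_connection_string("HiveDSN"), TOP_PAGES_SQL)
```

The returned rows are plain tabular data, so they can be passed on to whatever step combines them with other sources.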