The first thing you’ll need to do is install the 64-bit Simba Spark ODBC Driver. This can be downloaded here: https://databricks.com/spark/odbc-drivers-download
Next, you’ll need to create a new ODBC connection settings key in the Composable Key Vault.
Below are the settings you’ll need to specify. If the Simba Spark ODBC Driver does not appear in the driver list after installation, you may need to restart the Composable web app in IIS.
- Host: This is the same host name you use for accessing Databricks via the user interface (example: xyz.azuredatabricks.net)
- Username: token
- Password: To create an authentication token, click the profile icon in the upper-right corner of Databricks. Under ‘User Settings’, click ‘Access Tokens’, then click ‘Generate New Token’. Paste the token into this password field.
- SSL: 1
- transportMode: http
- ThriftTransport: 2 (2 corresponds to the HTTP transport)
- httpPath: This is the path to the cluster that will execute the query. In the Databricks UI, click your cluster, and under ‘Configuration’, scroll down to ‘Advanced Options’. Click the ‘JDBC/ODBC’ tab, where you will find a field called ‘HTTP Path’.
- AuthMech: 3 (3 corresponds to UserName and Password)
- UseSystemTrustStore: 1
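The settings above correspond to a DSN-less ODBC connection string. The Python sketch below assembles one, which can be handy for testing the connection outside Composable. It is a sketch under stated assumptions: the standard ODBC keys UID and PWD stand in for the Username and Password fields, port 443 is assumed, and `build_connection_string` and `normalize_host` are hypothetical helpers, not part of Composable or the driver.

```python
from urllib.parse import urlparse

# Driver name as installed in the first step.
DRIVER_NAME = "Simba Spark ODBC Driver"


def normalize_host(host: str) -> str:
    """Return a bare host name, dropping a pasted scheme, port, or trailing slash."""
    host = host.strip()
    if "://" in host:
        host = urlparse(host).hostname or ""
    return host.rstrip("/")


def build_connection_string(host: str, http_path: str, token: str,
                            use_system_trust_store: bool = True,
                            port: int = 443) -> str:
    """Assemble a DSN-less ODBC connection string from the settings above (hypothetical helper)."""
    settings = {
        "Driver": "{" + DRIVER_NAME + "}",
        "Host": normalize_host(host),
        "Port": str(port),          # assumption: HTTPS on port 443
        "SSL": "1",
        "transportMode": "http",    # as listed in the settings above
        "ThriftTransport": "2",     # 2 = HTTP transport
        "HTTPPath": http_path.strip(),
        "AuthMech": "3",            # 3 = user name and password
        "UID": "token",             # the literal user name "token"
        "PWD": token,               # the personal access token
    }
    if use_system_trust_store:
        settings["UseSystemTrustStore"] = "1"
    return ";".join(f"{key}={value}" for key, value in settings.items())
```

With the driver and the `pyodbc` package installed, the resulting string can be passed to `pyodbc.connect(conn_str, autocommit=True)` to check the settings before entering them in the Key Vault.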
If you receive the error "ERROR [HY000] [Simba][ThriftExtension] (14) Unexpected response from server during a HTTP connection: SSL_connect: certificate verify failed.", set UseSystemTrustStore to 1; otherwise, you can ignore this setting.
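The rule above (add UseSystemTrustStore=1 only when the certificate error appears) can be expressed as a retry. The sketch below is hypothetical: the connect function is passed in (for example `pyodbc.connect`), so the logic can be tried without a live cluster, and it assumes connection-string values contain no semicolons.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

# Fragment of the error text quoted above.
CERT_ERROR_FRAGMENT = "certificate verify failed"


def with_system_trust_store(conn_str: str) -> str:
    """Return conn_str with UseSystemTrustStore=1, replacing any existing value."""
    parts = [p for p in conn_str.split(";")
             if p and not p.lower().startswith("usesystemtruststore=")]
    parts.append("UseSystemTrustStore=1")
    return ";".join(parts)


def connect_with_trust_store_fallback(conn_str: str,
                                      connect: Callable[[str], T]) -> T:
    """Try conn_str; on the SSL certificate error, retry once with the system trust store."""
    try:
        return connect(conn_str)
    except Exception as exc:  # e.g. pyodbc.Error when connect is pyodbc.connect
        if CERT_ERROR_FRAGMENT not in str(exc):
            raise  # unrelated failures are not retried
        return connect(with_system_trust_store(conn_str))
```

Retrying only on this specific message keeps other failures, such as a wrong token or HTTP path, visible instead of masking them.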
Composable and Databricks
Composable DataOps Platform and Databricks are two powerful tools that, when used together, offer an exceptional solution for data engineering, analytics and ML. Here are several reasons why integrating Composable with Databricks can be advantageous:
- Seamless Data Integration: Composable provides a comprehensive data integration and orchestration framework that enables seamless connectivity with various data sources and systems. By integrating with Databricks, Composable allows users to easily ingest, transform, and analyze data from diverse sources as well as from data objects in the Databricks Lakehouse.
- Scalable Data Processing: Composable, with its ability to orchestrate complex data workflows, complements Databricks by providing a visual interface to design and manage data pipelines. This combination enables organizations to efficiently process large volumes of data and perform advanced analytics at scale.
- Advanced Analytics and Machine Learning: By integrating Composable with Databricks, organizations can use Composable’s data preparation, feature engineering, and model deployment capabilities to enhance their analytics workflows in Databricks. This helps data scientists and analysts build and deploy sophisticated models more easily, enabling faster insights and decision-making.
- Extensibility and Customization: Composable’s modular, extensible architecture lets organizations add custom functionality and integrate with external systems. Combined with Databricks, this extensibility makes it possible to connect additional tools, systems, and services, further enhancing data engineering and analytics capabilities.
Launch a Composable instance via the Azure Marketplace and follow the above steps to install and configure the Databricks ODBC driver. Then, follow one of the Composable tutorials to create end-to-end analytics workflows.