How To
Jun 21, 2024

How to Connect Excel to Databricks

This article explains how to connect Excel to Databricks, in a step-by-step guide.

Get started with this process only if you:

  1. Have enough time on hand. This is a very complex and time-consuming process.

(Good luck if you aren’t a techie!)

  1. Have less than 1,048,576 rows of data to import. Excel has a row limitation and will simply cut off the rest.

Find these restrictions limiting? Unfortunately, there's no way to bypass this in Excel.

(Hint: You can easily import and analyze large datasets on Gigasheet, a big data spreadsheet tailored for big data. Jump to the end to learn more.)

Either way, let’s dive in. 

Prerequisites

Ensure you have these in place before you begin: 

  1. Azure Databricks setupsome text
    • Create an Azure Databricks workspace.
    • Add an Azure Databricks cluster and associate your data with it.
  2. ODBC driver installationsome text
    • Download the 64-bit version of the ODBC driver for your operating system.
    • Install and configure the ODBC driver to set up a Data Source Name (DSN).
  3. Token managementsome text
    • Setup Azure Databricks personal access token.
  4. Excel installationsome text
    • Install Microsoft Excel (a trial version is sufficient).

How to connect Excel to Databricks

Follow these steps to connect Excel and Databricks using OAuth 2.0: 

Step 1: Launch ODBC data sources

Search for ODBC data sources on your device and open it.

connect excel to Databricks using OBC Data Sources

Go to the System DSN tab and select “Simba Spark”. Click “Configure”.

Go to System DSN and choose Simba Spark

Step 2: Configure the DSN

Select the mechanism:

  • Choose "OAuth 2.0".
Choose OAuth 2.0

Choose OAuth options:

  • Click "OAuth Options".
Choose OAuth Options
  • Select "Browser Based Authorization Code".
  • Uncheck "Ignore SQL_DRIVER_NOPROMPT".

Uncheck Ignore SQL_Driver

Configure the HTTP path:

  • Open your cluster in Databricks. Go to Advanced Options>JDBC/ODBC. Copy the HTTP Path.
Configure the HTTP Path
  • Go back to the ODBC data source pop-up window. Click "HTTP Options" and paste the HTTP path.
Configure the HTTP Path

Set advanced options:

  • Press "Advanced Options".
Set the Advanced Options
  • In the pop-up window, open "Server Side Properties".
Edit Server Side Properties
  • Choose “Add”
Add a Server Side Property
  • Enter the key as “Auth_Flow” with the value “2”.
Enter the key

Step 3: Connect Excel

Now, open Excel and go to the “Data” tab.

Now time to Connect Excel to Databricks

Select “Get Data”:

  • If you have the newer versions of Excel (2016 and above), click on Get Data > From Other Sources > From ODBC.
Get Data From ODBC
  • If you have the older versions of Excel (2013 and below), you’ll find this under Get External Data > From Other Sources > From Data Connection Wizard > ODBC DSN.

Choose the DSN you just configured. Click “OK”. Next, authenticate yourself on a browser pop-up window.

Choose Simba Spark

Lastly, choose the columns you want to import and click “Next”. 

Choose the columns to import

Select “Return Data to Microsoft Excel” and press “Finish”.

Return Data to Microsoft Excel

After the seemingly endless steps, you can finally import data into Excel. Unless you have more than 1,048,576 rows of data. Then, you can’t.  

Reminder: Excel has a row limitation!

Even with a few thousand rows, Excel becomes painfully slow.

If you use Databricks, you’re accustomed to speed and accuracy. When it comes to big data, Excel simply can’t meet your expectations.

You need a platform like Gigasheet, that is specifically designed for big data.

Gigasheet is a big data spreadsheet that helps businesses upload, analyze, and collaborate on up to a billion rows of data. 

Why use Gigasheet, the big data spreadsheet?

This no-code platform is a database disguised as a spreadsheet, at a cloud scale. It’s quick to connect to Databricks with only 3 steps. The best part?  You can manage large datasets, without worrying about data cutoffs or technical complexities.

Why Gigasheet? Well, it

  1. Handles a billion rows, without splitting data
  2. Requires no setup, training, or coding skills. Even non-technical professionals can use it
  3. Supports real-time collaboration

Quick comparison: Excel vs. Gigasheet

Gigasheet vs Excel

How to connect Gigasheet to Databricks

It’s easy to connect Gigasheet and Databricks. Here’s how you can do it: 

Step 1: Open Gigasheet

Log in to your Gigasheet account (if you don’t have one, sign up for free).

Navigate to “Import from Platform”.

Connect Gigasheet to Databricks

Select “Databricks Personal Access Token”.

Use Personal Acess Token to connect to Databricks

Step 2: Fill out the information

Enter the necessary connection details for your Databricks environment. 

Connecting Gigasheet to Databricks

Step 3: Specific the tables

Enter the name of the table you want to import to Gigasheet.

Choosing the Data to import

In a few minutes, you’ll look at a screen like this 👇🏻. Voila, you’re done! 

Importing data into Gigasheet

Overcome Excel’s limitations with Gigasheet

Excel is a popular choice for data management. It’s ideal for day-to-day operations that involve minimal reporting and analytics. However, it struggles with big data. It lags and delays your work. 

You need a platform that’s quick to set up, easy to use, and advanced enough to handle complex datasets.

A spreadsheet (also a database) for big data analytics. 

Sign up for a free account today!

The ease of a spreadsheet with the power of a data warehouse.

No Code
No Training
No Installation
Sign Up, Free

Similar posts

By using this website, you agree to the storing of cookies on your device to enhance site navigation, analyze site usage, and assist in our marketing efforts. View our Privacy Policy for more information.