How to use Dremio with R and Exploratory
Dremio provides a SQL interface to various data sources such as MongoDB, JSON files, Redshift, etc. It is often described as a Data Fabric because it takes care of query optimization and data cache management across all the different types of data sources, so users don’t need to deal with the differences among them. It can also accelerate query performance, sometimes by up to 1000 times, by using highly optimized physical representations of the source data in Apache Parquet, columnar in-memory processing with Apache Arrow, and advanced push-downs into the underlying data sources (when dealing with RDBMS or NoSQL sources).
In this post, I’m going to walk you through how to install Dremio on your local machine and connect to it from R and Exploratory.
Before you start
If your local machine does not have Java yet, you need to install it. You can download Java from here.
First, you need to download Dremio from the Dremio Download Site. In this blog post, I’ll explain the installation of Dremio Community Edition on Mac.
Once you have downloaded the installation file (in my case, a .dmg file), double-click it and you’ll see something like this. Drag the Dremio icon into the Applications folder.
That’s it! Now let’s start Dremio.
To start Dremio, double-click the Dremio application you just installed.
A controller window pops up like this. Click the Start button.
Once Dremio has started, the Open Dremio button becomes clickable like this. Click it.
Create Admin Account
This opens a browser (or a new tab if your browser is already open). Fill in the required fields for your Admin account.
Click Next, and congratulations! Now you can see the Dremio page like below.
Add Sample Source
Dremio provides a sample source, so let’s check it out. Click Add Sample Source.
Now you can see that samples.dremio.com has been added to the Samples data source. Click it.
You can see there are three samples under samples.dremio.com. Let’s click the first one, called ‘SF_incidents2016.json’, which is a JSON file about criminal incidents that happened in San Francisco in 2016.
This opens a dialog like below where you can preview its data, so click
Now it shows Dremio’s query dialog, which means that you can access this JSON file with SQL! And you can do so not only from Dremio’s query dialog but also from many other applications, including R and Exploratory. That’s what I’m going to do in the next section.
Dremio ODBC Driver
Before accessing the data in Dremio from R and Exploratory, you need to set up ODBC on your local machine. Below is an example for Mac.
Now let’s download the ODBC driver so that you can connect to your Dremio from R and Exploratory. You can download the Dremio ODBC Driver from the same Dremio Download Site.
Double-click the downloaded file (in my case, a .dmg file). This opens a window like below.
Double-click the Dremio ODBC.pkg file and follow the instructions in the dialog (basically, click Continue and agree to the license).
Setup Data Source on Exploratory
Open the Connection dialog either from the Project List page of Exploratory Desktop or from inside a project.
Open Connection inside Project
If you already have a project open, select the Connections menu from the project header menu.
Add New Dremio Connection
Click the Add button on the Connection List dialog.
From the connection chooser, click the Dremio icon.
Enter the following fields:
- Host (in this example, localhost)
- Port (by default it’s 31010)
- Username (the username that you set up for the Dremio Admin account)
- Password (the password that you set up for the Dremio Admin account)
Once you have entered these fields, click the Test Connection button and make sure the connection test succeeds.
After confirming it, click the Add button to save the connection.
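Outside Exploratory, an ODBC client needs the same four values, usually packed into a single connection string. Here is a minimal Python sketch of how such a string could be assembled. The function name `build_dremio_conn_str` is made up for illustration, and the driver name and the `ConnectionType` / `AuthenticationType` keys are assumptions about the Dremio ODBC driver, so check the driver documentation for your installed version before relying on them.

```python
def build_dremio_conn_str(host, port, user, password,
                          driver="Dremio ODBC Driver"):
    """Assemble an ODBC connection string from the four dialog fields.

    Assumption: the driver name and the ConnectionType / AuthenticationType
    keys below match your installed Dremio ODBC driver. Adjust if they differ.
    Values containing ';' or '}' are not escaped in this sketch.
    """
    fields = [
        ("Driver", "{" + driver + "}"),
        ("ConnectionType", "Direct"),
        ("HOST", host),
        ("PORT", str(port)),
        ("AuthenticationType", "Plain"),
        ("UID", user),
        ("PWD", password),
    ]
    return ";".join(f"{key}={value}" for key, value in fields)


# Placeholder credentials; use the Admin account you created earlier.
conn_str = build_dremio_conn_str("localhost", 31010, "admin", "secret")
print(conn_str)
```

A string like this is what you would typically hand to an ODBC library's connect call instead of filling in the dialog.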
Query Dremio Data
In the tree on the left-hand side, click the plus (+) button next to the Data Frames label and select Dremio.
On the Data Import dialog, select the Dremio connection (e.g. Dremio Local Mac) that you just created and expand Samples and then samples.dremio.com. You can see SF_incidents2016.json, which you added in Dremio from its samples. Click the table name, which automatically generates a SQL query that fetches the whole data set. Click the Preview button to see the data returned in Exploratory.
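To give an idea of what such a query looks like, here is a small Python sketch that builds a "select everything" statement for the sample file. The exact text Exploratory generates may differ. The helper names are made up for illustration, and the sketch assumes standard SQL double-quoted identifiers, which are needed here because the folder and file names contain dots.

```python
def quote_identifier(name):
    """Wrap one path element in double quotes, doubling any embedded quote."""
    return '"' + name.replace('"', '""') + '"'


def select_all(path_parts):
    """Build a SELECT * statement for a dataset given its path elements."""
    quoted_path = ".".join(quote_identifier(part) for part in path_parts)
    return "SELECT * FROM " + quoted_path


# Source, folder, and file name as shown in the Dremio UI.
query = select_all(["Samples", "samples.dremio.com", "SF_incidents2016.json"])
print(query)
# prints: SELECT * FROM "Samples"."samples.dremio.com"."SF_incidents2016.json"
```

Quoting each path element separately keeps the dots inside samples.dremio.com and SF_incidents2016.json from being read as path separators.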
Now you know how to install and set up Dremio and its ODBC driver on your machine and access Dremio data from Exploratory. In the next couple of blog posts, I’ll talk about how you can query YOUR data (JSON files, MongoDB, Redshift, etc.) with Dremio and Exploratory.