How to use Dremio with R and Exploratory

Dremio provides a SQL interface to various data sources such as MongoDB, JSON files, Redshift, and so on. It is often described as a Data Fabric because it takes care of query optimization and data cache management across all the different types of data sources, so users don’t need to deal with the differences among them. It can also accelerate query performance, sometimes by up to 1000 times, by using highly optimized physical representations of source data in Apache Parquet, columnar in-memory processing with Apache Arrow, and advanced push-downs into the underlying data sources (when dealing with RDBMS or NoSQL sources).

In this post, I’m going to walk you through how to install Dremio on your local machine and connect to it from R and Exploratory.

Dremio Server

Before You Start

If your local machine does not have Java yet, you need to install it. You can download Java from here.
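If you are not sure whether Java is already there, a quick check like the following Python sketch can tell you whether a `java` launcher is on your PATH. The function name `find_java` is made up for this post. Treat the result as a first hint only: on macOS a placeholder `java` command may exist even when no Java runtime is installed.

```python
import shutil


def find_java(executable="java"):
    """Return the full path of the Java launcher found on PATH, or None."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_java()
    if path:
        print(f"Found a java launcher at {path}")
    else:
        print("No java launcher on PATH; install Java before starting Dremio")
```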

Download

First, download Dremio from the Dremio Download Site. In this post, I’ll explain the installation of Dremio Community Edition on Mac.

Install

Once you have downloaded the installation file (in my case, a .dmg file), double-click it. An installer window opens. Drag the Dremio icon into the Applications folder.

That’s it! Now let’s start Dremio.

Setup

To start Dremio, double-click the Dremio application you just installed.

A controller window pops up. Click the Start button.

Once Dremio has started, the Open Dremio button becomes clickable. Click it.

Create Admin Account

This opens a browser (or a new tab if your browser is already open). Enter the required fields for your Admin account.

Click Next, and congratulations! Now you can see the Dremio home page.

Add Sample Source

Dremio provides a sample source, so let’s check it out. Click the Add Sample Source button.

Now you can see that samples.dremio.com has been added under the Samples data source. Click samples.dremio.com.

You can see there are three samples under samples.dremio.com. Let’s click the first one, ‘SF_incidents2016.json’, which is a JSON file of criminal incidents that happened in San Francisco in 2016.

This opens a dialog where you can preview its data. Click the Save button.

Now Dremio’s query dialog appears, which means you can access this JSON file with SQL! And you can access it not only from Dremio’s query dialog but also from many other applications, including R and Exploratory. That’s what I’m going to do in the next section.

Dremio ODBC Driver

Before accessing the data in Dremio from R and Exploratory, you need to set up ODBC on your local machine. Below is an example for Mac.

Download

Now let’s download the ODBC driver so that you can connect to Dremio from R and Exploratory. You can download the Dremio ODBC Driver from the same Dremio Download Site.

Install

Double-click the downloaded file (in my case, a .dmg file). This opens a new window.

Double-click the Dremio ODBC.pkg file and follow the instructions in the dialog (basically, click Continue and agree to the license).

That’s it!

Setup Data Source on Exploratory

Open the Connection dialog either from the Exploratory Desktop Project List page or from inside a project.

Open Connection inside Project

If you have already opened a project, select the Connections menu from the project header menu.

Add New Dremio Connection

Click the Add button on the Connection List dialog.

From the Connection chooser, click the Dremio icon.


Enter the following fields:

  • Name
  • Host (in this example, localhost)
  • Port (by default it’s 31010)
  • Username (the username you set up for the Dremio Admin account)
  • Password (the password you set up for the Dremio Admin account)

Test Connection

Once you have entered these fields, click the Test Connection button and make sure the connection test succeeds.

After confirming it, click the Add button to save the connection.
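If the connection test fails, one quick thing to rule out is whether anything is listening on the Dremio port at all. The Python sketch below only checks that a TCP connection to the host and port can be opened. It does not verify the username, the password, or the ODBC driver. The helper name `port_is_open` is made up for this post, and the defaults simply mirror the example values above.

```python
import socket


def port_is_open(host="localhost", port=31010, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Something is listening on localhost:31010")
    else:
        print("Nothing reachable on localhost:31010; is Dremio started?")
```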

Query Dremio Data


In the left-hand tree, click the plus (+) button next to the Data Frames label and select Database Data.

Then select Dremio.

In the Data Import dialog, select the Dremio connection you just created (e.g. Dremio Local Mac) and expand Samples.'samples.dremio.com'. You can see SF_incidents2016.json, which you added in Dremio from the sample source. Click the table name, and a SQL query that fetches the whole data set is generated automatically. Click the Preview button, and you will see the data returned in Exploratory!
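To give an idea of what such a query looks like, here is a small Python sketch that builds a `SELECT *` statement for the sample file. Each part of the path is wrapped in double quotes because names like samples.dremio.com contain dots. The exact SQL text that Exploratory generates may differ, and the helper names here are hypothetical.

```python
def quote_identifier(name):
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_select_all(path, limit=None):
    """Build a SELECT * query for a data set given its path components."""
    table = ".".join(quote_identifier(part) for part in path)
    sql = f"SELECT * FROM {table}"
    if limit is not None:
        # An optional row limit is handy when you only want a quick look.
        sql += f" LIMIT {int(limit)}"
    return sql


if __name__ == "__main__":
    sample = ["Samples", "samples.dremio.com", "SF_incidents2016.json"]
    print(build_select_all(sample))
    print(build_select_all(sample, limit=100))
```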

Summary

So now you know how to install and set up Dremio and its ODBC driver on your machine and access Dremio data from Exploratory. In the next couple of blog posts, I’ll talk about how you can query YOUR data (JSON files, MongoDB, Redshift, etc.) with Dremio and Exploratory.
