Amazon Redshift Database Data Import

You can quickly import data from your Amazon Redshift Database into Exploratory.

Here is a blog post introducing this support in detail.

1. Create a Connection to use

Create a connection following this instruction.

2. Open Redshift Import dialog

Click '+' button next to 'Data Frames' and select 'Import Database Data'.

Click Redshift to select.

3. Preview and Import

Click Preview button to see the data back from your Redshift db.

If it looks ok, then you can click 'Import' to import the data into Exploratory.

4. Querying Random Sample Data

You might want to take a random sample of the data that would be reasonable size for your analysis.

You can use md5 function to get random number generated and use it like below to get the random sample of the data.

SELECT *
   FROM airline_2016_01
   ORDER BY md5('randomSeed' || flight_num)
   LIMIT 100000

5. Using Variables in SQL

First, create a custom R script.

Second, define a variable in the R script and save.

cutoff_date <- "\'2016-01-15\'"

Note that the ‘\’ (backslash) symbols are used to escape the single quotes, which are required to be used for characters in SQL queries.

Finally, you can use @{} to surround a variable name inside the query like below.

select *
from airline_2016_01
where fl_date > @{cutoff_date}

Here's a blog post for more detail.

6. AWS Security Group Setup

If you encounter a database connection error, please go to AWS console and make sure you added your client PC's IP address to your Security Group (both Inbound and Outbound) associated with the Redshift cluster.

7. Exploratory Data Analysis for Amazon Redshift with R & dplyr

Here is the link to the blog post Exploratory Data Analysis for Amazon Redshift with R & dplyr

results matching ""

    No results matching ""