# Amazon Redshift

You can quickly import data from your Amazon Redshift Database into Exploratory.

Here is a [blog post](https://blog.exploratory.io/exploratory-data-analysis-for-amazon-redshift-with-r-dplyr-9a14441020eb#.aqcbfa6h8) introducing this support in detail.

## 1. Create a Connection to use

Create a connection following [this instruction](https://docs.exploratory.io/data_import/database-data/connection).

## 2. Open Redshift Import dialog

Click '+' button next to 'Data Frames' and select 'Database Data'.

![](https://2850417076-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M4HLCK3olgduYoe3RVS%2F-M4oMvCUDQwHTJ0eWi_f%2F-M4oN-qBMezITjny_pQQ%2Fimport-database.png?generation=1586795453631702\&alt=media)

Click Amazon Redshift to select.

![](https://2850417076-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M4HLCK3olgduYoe3RVS%2F-M4oMvCUDQwHTJ0eWi_f%2F-M4oN66k4YJQfHTwMU5I%2Fredshift.png?generation=1586795459845940\&alt=media)

## 3. Preview and Import

Click Preview button to see the data back from your Redshift db.

![](https://2850417076-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M4HLCK3olgduYoe3RVS%2F-M4oMvCUDQwHTJ0eWi_f%2F-M4oN66mzrfS3x8GJpWw%2Fimport-redshift-dialog.png?generation=1586795460214677\&alt=media)

If it looks ok, then you can click 'Import' to import the data into Exploratory.

## 4. Querying Random Sample Data

You might want to take a random sample of the data that would be reasonable size for your analysis.

You can use [md5](http://docs.aws.amazon.com/redshift/latest/dg/r_MD5.html) function to get random number generated and use it like below to get the random sample of the data.

```
SELECT *
   FROM airline_2016_01
   ORDER BY md5('randomSeed' || flight_num)
   LIMIT 100000
```

## 5. Using Parameters in SQL

First, click a parameter link on the SQL Data Import Dialog.

![](https://2850417076-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M4HLCK3olgduYoe3RVS%2F-M4oMvCUDQwHTJ0eWi_f%2F-M4oN2yAtTsqxNt3r0_M%2Fadd_parameter.png?generation=1586795457457840\&alt=media)

Second, define a parameter and click Save button.

![](https://2850417076-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M4HLCK3olgduYoe3RVS%2F-M4oMvCUDQwHTJ0eWi_f%2F-M4oN2yC4K_GmMFFjzoo%2Fdefine_parameter.png?generation=1586795457181349\&alt=media)

Finally, you can use @{} to surround a variable name inside the query like below.

```
select *
from airline_2016_01
where carrier = @{carrier}
```

If you type @ then it suggests parameters like below.

![](https://2850417076-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M4HLCK3olgduYoe3RVS%2F-M4oMvCUDQwHTJ0eWi_f%2F-M4oN2yEXevqQptAWUEl%2Finsert_param_in_query.png?generation=1586795457117048\&alt=media)

Here's a [blog post](https://exploratory.io/note/kanaugust/An-Introduction-to-Parameter-in-Exploratory-WCO4Vgn7HJ) for more detail.

## 6. AWS Security Group Setup

![](https://2850417076-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M4HLCK3olgduYoe3RVS%2F-M4oMvCUDQwHTJ0eWi_f%2F-M4oN66v_0i1voeo7Yp-%2Fredshift4.png?generation=1586795460765180\&alt=media)

If you encounter a database connection error, please go to AWS console and make sure you added your client PC's IP address to your Security Group (Inbound) associated with the Redshift cluster.

## 7. Number of rows

From performance point of view, we no longer show actual number of rows which can be only fetched by executing whole query again.

![](https://2850417076-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M4HLCK3olgduYoe3RVS%2F-M4oMvCUDQwHTJ0eWi_f%2F-M4oN66xWkcCxqo19jUZ%2Fsql_number_of_rows.png?generation=1586795460160662\&alt=media)

If you still want to show the actual number of query for your query, you can do so by setting System Configuration.

![](https://2850417076-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M4HLCK3olgduYoe3RVS%2F-M4oMvCUDQwHTJ0eWi_f%2F-M4oN66zCHttnANqtQHx%2Fnum_of_rows_config_menu.png?generation=1586795459932943\&alt=media)

Then set "Yes" For "Show Actual Number of Rows on SQL Data Import Dialog"

![](https://2850417076-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M4HLCK3olgduYoe3RVS%2F-M4oMvCUDQwHTJ0eWi_f%2F-M4oN670aVGCuTHA4oG4%2Fnum_of_rows_config.png?generation=1586795460286823\&alt=media)

This will show you Actual Number of Rows like below.

![](https://2850417076-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M4HLCK3olgduYoe3RVS%2F-M4oMvCUDQwHTJ0eWi_f%2F-M4oN6725Fzzj5A208wn%2Factual_num_of_rows.png?generation=1586795460230058\&alt=media)

## 8. Exploratory Data Analysis for Amazon Redshift with R & dplyr

Here is the link to the blog post [Exploratory Data Analysis for Amazon Redshift with R & dplyr](https://blog.exploratory.io/exploratory-data-analysis-for-amazon-redshift-with-r-dplyr-9a14441020eb)
