Create Custom Data Source

Last updated 5 years ago

Exploratory provides a framework with which developers can create their own data sources that can be invoked from Exploratory's Data Import Dialog.

Data Source Extension Overview

A user-defined data source extension consists of an extension definition file in JSON format (extension.json) and an R script file (extension.r). These two files need to be stored inside a zip file as follows.

File name: <extension_name>.zip

Content of the zip file:

<extension_name>/extension.json
<extension_name>/scripts/extension.r

Data Source Extensions can be installed in Exploratory through the UI, giving Exploratory access to the additional data sources.

Here are examples of Data Source Extensions.

  • tidyquant Data Source (Financial Data)

  • riem_measures Data Source (Weather Data)

  • SPARQL Data Source (RDF Data)

How to create Data Source Extension

Let's say you want to create a data source extension that uses the tidyquant R package to get stock data. The following is a step-by-step tutorial for creating such a data source extension, named tidyquant_example.

Create Data Source Extension directory structure

First, create a directory called tidyquant_example, and a scripts directory below it. NOTE: The directory name is the data source extension name and should be unique among user-defined data source extensions.

On Mac, you can do:

$ mkdir tidyquant_example
$ mkdir tidyquant_example/scripts

Create Data Source Extension R script

Create an extension.r file with the following content. It defines a function, execute_tidyquant, which gets stock price data through the tq_get function of the tidyquant package.

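execute_tidyquant <- function(stocks = "AAPL,GOOG,MSFT,AMZN", from = "2006-01-01") {
  # str_split returns a list; take its first element to get the vector of symbols.
  stockList <- stringr::str_split(stocks, ",")[[1]]
  # Fetch the price data for each symbol separately.
  result <- lapply(stockList, function(stock) {
    tidyquant::tq_get(stock, get = "stock.prices", from = from)
  })
  # Stack the per-symbol data frames and return the combined data frame.
  do.call("rbind", result)
}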

To access functions of an additional R package (in this case, the tq_get function of tidyquant), use an explicit namespace, as in tidyquant::tq_get. This makes sure that your function can access the package function when used inside Exploratory.

Place extension.r under tidyquant_example/scripts.

$ cp extension.r tidyquant_example/scripts
$ ls tidyquant_example/scripts
extension.r
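
Before packaging, it can help to check the split, fetch, and combine shape of such a function offline. The following is a minimal sketch under stated assumptions: fetch_prices_stub and execute_stub are hypothetical names introduced only for illustration, and the stub stands in for tidyquant::tq_get so that no network access or extra package is needed.

```r
# Hypothetical stand-in for tidyquant::tq_get so the sketch runs offline.
fetch_prices_stub <- function(stock, from) {
  data.frame(symbol = stock, date = as.Date(from), close = NA_real_,
             stringsAsFactors = FALSE)
}

# Same shape as execute_tidyquant: split the symbols, fetch each, bind the rows.
execute_stub <- function(stocks = "AAPL,GOOG", from = "2016-01-01") {
  symbols <- trimws(strsplit(stocks, ",", fixed = TRUE)[[1]])
  pieces <- lapply(symbols, fetch_prices_stub, from = from)
  do.call(rbind, pieces)
}

prices <- execute_stub("AAPL, MSFT", from = "2016-01-01")

# The function called by an extension must return a data frame.
stopifnot(is.data.frame(prices))
```

Swapping the stub for the real package call should leave the rest of the function unchanged.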

Create Data Source Extension Definition File

This is what your Data Source Extension Definition File (extension.json) looks like. The file must be in valid JSON format.

{
  "name": "tidyquant_example",
  "displayName": "Stocks Data with tidyquant (example)",
  "iconURL" : "https://raw.githubusercontent.com/mdancho84/tidyquant/master/img/tidyquant-logo.png",
  "helpURL" :  "https://github.com/mdancho84/tidyquant",
  "function": "execute_tidyquant",
  "packages": [{"name" : "tidyquant", "version" : "0.4.0"}],
  "rSourceFile" : "extension.r",
  "version":"0.1",
  "inputParameters": [
    {
      "name": "stocks",
      "displayName":"Stock Symbols",
      "dataType": "lov",
      "defaultValue" : "",
      "dynamicSuggestion" : false,
      "required" : true,
      "colSpan" : 10,
      "labelClassName" : "lov-label",
      "listOfValues" : [
          {"name" : "AAPL", "displayName" : "Apple"},
          {"name" : "GOOG", "displayName" : "Google"},
          {"name" : "MSFT", "displayName" : "Microsoft"},
          {"name" : "AMZN", "displayName" : "Amazon"},
          {"name" : "FB", "displayName" : "Facebook"}
      ]
    },
    {
       "name": "from",
       "displayName":"Date From",
       "dataType": "text",
       "defaultValue" : "2016-01-01",
       "required" : true
    }
  ]
}

This file should be placed as extension.json under the tidyquant_example directory.

$ cp extension.json tidyquant_example
$ ls tidyquant_example
extension.json	scripts

The attributes of the Data Source Extension Definition File are described in detail below.

Attributes of Data Source Extension Definition File

name (required)

The name attribute holds the name of the data source extension; in this case, tidyquant_example. Please make sure to use your directory name for this attribute.

displayName (required)

The displayName attribute is used as the display name on the Data Source Extension Picker Dialog and the Import Data Dialog.

iconURL (required)

iconURL holds the URL of the icon image file for your data source extension. Currently, only external URLs are supported.

For example:

   "iconURL" : "http://xmllondon.com/images/sparqlThumb.png",

If you do not have an icon, the default icon is used instead.

If you use an external URL, iconWidth and iconHeight are set to 32px by default to fit the icon in the UI. If you want to change them, specify the following attributes.

"iconWidth" : "64px",
"iconHeight" : "32px"

helpURL (required)

helpURL holds the URL of the help page for your data source extension. The help link is shown in the Import Dialog header. If you do not have one, you can set the default Exploratory documentation link as below:

"helpURL" :  "http://docs.exploratory.io/",

packages (optional)

packages is an array of package name and version pairs that the data source extension depends on. For example, if your data source extension depends on tidyquant, set it as below:

"packages": [{"name" : "tidyquant", "version" : "0.4.0"}],

function (required)

function holds the name of the R function that the data source extension calls to get data. The R function must return a data frame as output. In this example, it is execute_tidyquant, which is defined in extension.r.

rSourceFile (required)

rSourceFile holds the name of the R script file that the data source extension depends on. This example uses extension.r.

hasQueryField (optional)

If you want a dedicated query input field that has much more space for your query string, set hasQueryField to true and add a parameter whose name is query and whose dataType is text.

For example, if you define the following in your extension.json,

"hasQueryField" : true,
"inputParameters": [
  {
    "name" : "query",
    "displayName":"Query",
    "dataType":"text",
    "defaultValue" : "",
    "required" : true
  }
]

then a dedicated query input field appears on the right-hand side of the Import Dialog.

version (required)

Version of your data source extension.
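
The top-level attributes marked (required) above can be checked before packaging. The following is a minimal sketch, not part of Exploratory: check_extension_definition and required_attrs are hypothetical names, and the definition list is built by hand so the example runs with base R only. In practice the list could be obtained by parsing extension.json, for example with the jsonlite package (assuming it is installed).

```r
# Attributes this page marks as (required) at the top level of extension.json.
required_attrs <- c("name", "displayName", "iconURL", "helpURL",
                    "function", "rSourceFile", "version")

# `definition` is a named list, e.g. the result of
# jsonlite::fromJSON("tidyquant_example/extension.json", simplifyVector = FALSE).
# Returns a character vector of problems; empty when nothing was found.
check_extension_definition <- function(definition, dir_name) {
  problems <- character(0)
  missing_attrs <- setdiff(required_attrs, names(definition))
  if (length(missing_attrs) > 0) {
    problems <- c(problems, paste("missing attribute:", missing_attrs))
  }
  # The name attribute should match the extension directory name.
  if (!is.null(definition[["name"]]) && !identical(definition[["name"]], dir_name)) {
    problems <- c(problems, "name does not match the directory name")
  }
  problems
}

# Hand-built stand-in for the parsed tidyquant_example definition.
definition <- list(
  name = "tidyquant_example",
  displayName = "Stocks Data with tidyquant (example)",
  iconURL = "https://raw.githubusercontent.com/mdancho84/tidyquant/master/img/tidyquant-logo.png",
  helpURL = "https://github.com/mdancho84/tidyquant",
  `function` = "execute_tidyquant",
  rSourceFile = "extension.r",
  version = "0.1"
)

problems <- check_extension_definition(definition, "tidyquant_example")
print(problems)
```

An empty result only means the listed attributes are present; it does not confirm that Exploratory will accept the file.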

inputParameters

inputParameters is an array of parameters passed to the function (in this case, execute_tidyquant). They are rendered as input fields on the Data Import Dialog. Parameter order matters: define the input parameters in the order that the underlying R function expects. For example, if your R function has the arguments stocks and from, define your inputParameters in that order (stocks, then from).

If you prefer a named parameter, set withName to true. The argument is then passed to the R function with its name as the key. For example, if you have a withSentiment parameter and set withName to true,

{
  "name": "withSentiment",
  "withName": true,
  "displayName":"Score Sentiment",
  "dataType": "boolean",
  "defaultValue" : false,
  "required" : false
}

then the final R script would look like this.

exploratory::getTwitterTimeline(..., withSentiment = TRUE)
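
The difference between positional and named parameters can be illustrated in plain R. This is a sketch of the idea only, not Exploratory's actual implementation: get_data, build_args, and the params list are hypothetical and exist purely for illustration.

```r
# Hypothetical target function standing in for an extension function.
get_data <- function(stocks, from, withSentiment = FALSE) {
  data.frame(stocks = stocks, from = from, withSentiment = withSentiment,
             stringsAsFactors = FALSE)
}

# Parameter values in the order they appear in inputParameters.
params <- list(
  list(name = "stocks", value = "AAPL", withName = FALSE),
  list(name = "from", value = "2016-01-01", withName = FALSE),
  list(name = "withSentiment", value = TRUE, withName = TRUE)
)

# Build an argument list: parameters with withName = TRUE keep their name,
# all others get an empty name and are therefore matched by position.
build_args <- function(params) {
  args <- lapply(params, function(p) p$value)
  names(args) <- vapply(params, function(p) {
    if (isTRUE(p$withName)) p$name else ""
  }, character(1))
  args
}

# Equivalent to get_data("AAPL", "2016-01-01", withSentiment = TRUE).
result <- do.call(get_data, build_args(params))
print(result)
```

Because the first two values are matched by position, swapping their order in params would silently pass the date as stocks, which is why the order in inputParameters has to follow the function signature.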

Each parameter can have the following attributes:

  • name

  • displayName

  • colSpan

  • dataType

  • showLabel

  • defaultValue

  • withName

  • isStringArray

  • placeholder

  • required

name (required)

The name of the input parameter.

displayName (required)

The displayName attribute is used as the parameter display name on the Import Data Dialog.

dataType (required)

The dataType attribute holds the type of the input parameter. Supported types are:

  • text

  • select

  • lov

  • number

  • boolean

We do not support Date type parameter for now.

text

If you use text, the parameter becomes an input field that accepts characters.

select

This is useful when you want to create a static single-value selector. For example, to create a time range selector, specify options and itemDataType as shown below. options is an array of selector options, and each option needs label and value attributes. If the option values are text, set itemDataType to text; if they are numbers, set it to number. Date is not supported for itemDataType for now. To set a default selection, set the defaultValue attribute to the value of that option. If you want to pass NULL to the underlying R function, set value to null in your JSON file.

{
  "name": "time_range",
  "displayName": "Time Range",
  "dataType": "select",
  "itemDataType" : "text",
  "options": [
    {"label":"Last 5 Years", "value":"5y"},
    {"label":"Last 1 Hour", "value":"1h"},
    {"label":"Last 4 Hours", "value":"4h"},
    {"label":"Last 1 Day", "value":"1d"},
    {"label":"Last 7 Days", "value":"7d"},
    {"label":"Last 30 Days", "value":"30d"},
    {"label":"Last 90 Days", "value":"90d"},
    {"label":"Last 12 Months", "value":"1y"},
    {"label":"From 2004", "value":"all"}
  ],
  "defaultValue" : "5y"
}
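
On the R side, the function receives whichever value was selected. How that value is interpreted is up to the extension author. As one possible sketch, the hypothetical helper range_to_start below (not part of Exploratory) turns the values from the options above into a start time, treating "all" and NULL as "no lower bound" and approximating a year as 365 days.

```r
# Hypothetical helper: convert a time_range value such as "4h", "30d" or "5y"
# into a start time relative to `now`. Returns NULL for "all" or NULL input.
range_to_start <- function(time_range, now = Sys.time()) {
  if (is.null(time_range) || time_range == "all") {
    return(NULL)
  }
  amount <- as.numeric(sub("[a-z]+$", "", time_range))  # leading digits
  unit <- sub("^[0-9]+", "", time_range)                # trailing unit letter
  seconds <- switch(unit,
    h = 3600,
    d = 86400,
    y = 365 * 86400,  # assumption: a year is approximated as 365 days
    stop("Unsupported unit: ", unit)
  )
  now - amount * seconds
}

now <- as.POSIXct("2020-01-02 00:00:00", tz = "UTC")
print(range_to_start("1d", now))
print(range_to_start("all", now))
```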

lov

This is useful when you want to create a static multi-select List of Values. For example, to create a Stock Symbols List of Values, specify listOfValues as shown below. listOfValues is an array of options, and each option needs name and displayName attributes.

{
  "name": "stocks", 
  "displayName":"Stock Symbols", 
  "dataType": "lov", 
  "defaultValue" : "",
  "dynamicSuggestion" : false,
  "required" : true,
  "colSpan" : 10,
  "labelClassName" : "lov-label",
  "listOfValues" : [
    {"name" : "AAPL", "displayName" : "Apple"},
    {"name" : "GOOG", "displayName" : "Google"},
    {"name" : "MSFT", "displayName" : "Microsoft"},
    {"name" : "AMZN", "displayName" : "Amazon"},
    {"name" : "FB", "displayName" : "Facebook"}
  ]
},

number

If you use number, the parameter becomes an input field that accepts a numeric value.

boolean

If you use boolean, the parameter becomes a list of values with two choices (true and false).

defaultValue

If the parameter needs a default value, you can set it through the defaultValue attribute.

withName

Set this to true if you want to pass this argument as a "named" parameter. See inputParameters for details.

isStringArray

If you want to support R character vector parameters like c("a", "b", "c"), set isStringArray to true. In this case, the user enters comma-separated values in an input field and the Data Source Extension Framework converts them into the c("a", "b", "c") form.

{
    "name": "keywords",
    "displayName":"Keywords",
    "dataType":"text",
    "defaultValue" : "",
    "isStringArray": true,
    "placeholder" : "Use \",\" for multiple entries",
    "required" : true
},
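
The conversion described above can be pictured with a few lines of base R. This is only an illustration of the idea, not the framework's actual code; to_r_vector_literal is a hypothetical name.

```r
# Hypothetical illustration: turn a comma-separated input string into the
# text of an R character vector literal.
to_r_vector_literal <- function(input) {
  values <- trimws(strsplit(input, ",", fixed = TRUE)[[1]])
  values <- values[nzchar(values)]            # drop empty entries
  quoted <- encodeString(values, quote = '"') # add quotes and escape as needed
  paste0("c(", paste(quoted, collapse = ", "), ")")
}

literal <- to_r_vector_literal("a, b,c")
print(literal)

# Evaluating the literal gives back an ordinary character vector.
keywords <- eval(parse(text = literal))
print(keywords)
```

The function then receives keywords as a single character vector argument rather than as one comma-separated string.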

placeholder

If you want to show a description in the input field, use the placeholder attribute. For example, if you want your users to type in multiple comma-separated values, add a placeholder attribute to the input parameter.

  "placeholder" : "Use \",\" for multiple entries",

required

To make a parameter mandatory, set the required attribute to true. Users are then forced to enter values for these parameters: if a user clicks the Get Data button without filling them in, an error is shown on the Import Dialog.
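
The kind of check that required implies can be sketched in base R. The code below is an assumption-laden illustration, not how Exploratory implements it: missing_required and the trimmed input_parameters list are hypothetical.

```r
# inputParameters as they might look once parsed from extension.json
# (trimmed to the attributes relevant here).
input_parameters <- list(
  list(name = "stocks", required = TRUE),
  list(name = "from", required = TRUE),
  list(name = "withSentiment", required = FALSE)
)

# Returns the names of required parameters that have no usable value.
missing_required <- function(input_parameters, values) {
  is_blank <- function(v) {
    is.null(v) || length(v) == 0 ||
      (is.character(v) && !any(nzchar(trimws(v))))
  }
  needed <- Filter(function(p) isTRUE(p$required), input_parameters)
  needed_names <- vapply(needed, function(p) p$name, character(1))
  blank <- vapply(needed_names, function(n) is_blank(values[[n]]), logical(1))
  needed_names[blank]
}

# "from" was left empty, so it is reported as missing.
missing <- missing_required(input_parameters, list(stocks = "AAPL", from = ""))
print(missing)
```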

Compress the Directory Structure to create Data Source Plugin file

Now we have the R source file and the extension definition file (extension.json) under the tidyquant_example directory. Compress the directory into a zip file to create the Data Source Plugin file tidyquant_example.zip.

$ zip -r tidyquant_example.zip tidyquant_example
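
After zipping, the archive layout can be compared with the structure described in the overview. The following sketch uses a hypothetical helper, has_extension_layout, and a hand-written list of entries so that it runs without a real archive. With an actual file, the entries could be read with utils::unzip(zip_path, list = TRUE)$Name.

```r
# Returns TRUE when the archive entries contain the two files this page
# describes: <name>/extension.json and <name>/scripts/extension.r.
has_extension_layout <- function(entries, extension_name) {
  expected <- c(
    file.path(extension_name, "extension.json"),
    file.path(extension_name, "scripts", "extension.r")
  )
  all(expected %in% entries)
}

# Entries as they might be listed for tidyquant_example.zip (hand-written here).
entries <- c(
  "tidyquant_example/",
  "tidyquant_example/extension.json",
  "tidyquant_example/scripts/",
  "tidyquant_example/scripts/extension.r"
)

layout_ok <- has_extension_layout(entries, "tidyquant_example")
print(layout_ok)
```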

How to install

Once you have created your data source extension zip file, you can install it into Exploratory through the UI.

Go to Project and select the project header menu.

Click the "Extensions" menu. The Extensions dialog will open.

In the pane on the left, under the "Data Source" menu, click "Add New".

Click "Add from Local". A file picker will open. Select your extension zip file in the file picker and click "Open". Then your extension is installed.

How to use

After installing your data source extension, you can test it by clicking the + (plus) icon next to Data Frames and selecting Import Extension Data.

Now you should be able to see your data source extension in the list, shown under the displayName set in extension.json (Stocks Data with tidyquant (example) in this tutorial).

And now you should be able to get the stock data.

How to update

To update an installed extension with a new version of your extension zip file, just repeat the above install steps. The old extension will be overwritten by the new one.

If you want to keep multiple versions of your extension for experimental purposes, give each version a separate name (e.g. tidyquant_example_1, tidyquant_example_2) so that they don't overwrite each other.
