Exploratory
  • Introduction
  • Product Features
    • Summary View
    • Table View
    • Row Filter
    • Column Filter
    • Dashboard
    • Dashboard (日本語)
    • Note
    • Note (日本語)
    • Steps (Right-hand side)
    • Branch
    • Parameter
    • Parameter (日本語)
    • Export
    • Share
      • Share Type
      • Chart / Analytics
      • Data
      • Report (Note / Dashboard)
      • Notification
      • Version History
      • Restore Older Version
      • CSV API
    • Share (日本語)
      • 共有のタイプ
      • チャート / アナリティクス
      • データ
      • レポート (ノート / ダッシュボード)
      • 通知
      • バージョンの履歴
      • 古いバージョンの復元
      • CSV API
    • Schedule
      • Manage Schedules
      • Notification
      • Scheduling History
    • Schedule (日本語)
      • スケジュールの設定
      • 通知
      • スケジュールの履歴
    • Team
      • Manage Teams
    • Team (日本語)
      • チームの設定
    • Project
      • Import
      • Export
      • Search
  • Data Import
    • File Data
      • CSV / Delimited File
      • Amazon S3
      • Google Drive
      • Google Cloud Storage
      • Excel
      • JSON
      • Log File
      • Microsoft Azure
      • Stats - SAS / SPSS / STATA
      • RData / RDS
      • Parquet File
      • EDF - Exploratory
    • Database Data
      • SQL Troubleshooting
      • Create Connection
      • Amazon Athena
      • Amazon Aurora
      • Amazon Redshift
      • Amazon Redshift (日本語)
      • Google BigQuery
      • HP Vertica
      • MariaDB / MySQL DB
      • MariaDB / MySQL DB (日本語)
      • Microsoft Access
      • MongoDB
      • ODBC
      • Oracle
      • PostgreSQL
      • PostgreSQL (日本語)
      • Presto
      • Snowflake
      • SQLServer (DSN)
      • SQLServer
      • Teradata
      • Treasure Data
    • Cloud Apps Data
      • Create Connection
      • FRED - Federal Reserve of Economic Data
      • Github Issues
      • Google Analytics
      • Google Analytics (日本語)
      • Google Spreadsheet
      • Google Cloud Storage
      • Salesforce
      • Twitter Search
      • Stripe
      • Weather Data
      • Stock Price Data
    • Write R Script as Data
      • Currency Exchange Rate
    • Write R Script as Data (日本語)
    • Web Page Scraping
    • Text Input Data
    • Data Source Extension
      • Quandl
      • Holiday
      • RSS Data
    • Create Custom Data Source
  • Data Wrangling
    • Command Line mode for faster and more flexible data interaction in Exploratory
    • Select / Remove Columns
    • Reorder Columns
    • Create New Calculation
    • Create New Calculation for Multiple Columns
    • Summarize (Aggregate)
    • Group
    • Filter
    • Rename
    • Arrange (Sort)
    • Top / Bottom N
    • Join
    • Merge
    • Gather
    • Spread
    • Pivot
    • Expand
    • Complete
    • Separate
    • Unite
    • Bind Rows
    • Bind Columns
    • Keep Only Unique Rows
    • Keep Only Duplicated Rows
    • Slice
    • Drop NA
    • Sample
    • Impute NA
    • Fill
    • Create Buckets
    • Assign New Values to Existing Values - Recode
    • Assign New Values by Setting Conditions - Case When
    • Work with Categories
    • Data Type Conversion
    • Row as header
    • Ungroup
    • Unnest
    • Separate List Items into Columns (Unnest Wider)
    • Separate List Items into Rows (Unnest Longer)
    • Separate Address (Japan)
    • Hoist
    • Remove Empty Rows
    • Remove Empty Columns
    • Clean Column Names
    • Window Calculation
    • Window Calculation (日本語)
    • Add Row
    • Text Wrangling
    • Regular Expression Cheat Sheet
    • Regular Expression Cheat Sheet (日本語)
  • Visualization
    • Types
      • Pivot
      • Summarize Table
      • Table
      • Bar
      • Line
      • Area
      • Pie/Ring
      • Radar
      • Histogram
      • Density Plot
      • Scatter (No Aggregation)
      • Scatter (With Aggregation)
      • Boxplot
      • Violin
      • Error Bar
      • Error Bar (Summarized Data)
      • Map - Standard
      • Map - Extension
      • Map - Long/Lat
      • Map - Heatmap
      • Heatmap
      • Contour
      • Number
      • Word Cloud
      • Word Cloud (日本語)
    • Features
      • Trend Line
      • Reference Line
      • Repeat By
      • Window Calculation
      • Date/Time Aggregation
      • Show Range
      • Highlight
      • Change Marker
      • Multiple Y-Axis Columns
      • Layout Configuration
      • Column Configuration
      • Column Configuration Dialog
      • Color and Group Setting
      • Color and Group Setting (日本語)
      • Color Setting
      • User Color Palette Setting
      • Pin
      • Save as PNG/SVG
      • Save as Exploratory Data File
      • Share/Schedule
      • URL Link
      • Category (Binning)
      • Highlight
      • Limit Values
      • 'Others' Group
      • Edit Display Name
      • Missing Value Handling
      • Rename Column Names
      • Axis Setting
      • Axis Formatting
      • Show Detail
      • Fit to Screen (Table)
      • Number of Unique Values Check
      • Number of Unique Values Check (日本語)
  • Analytics
    • Correlation
    • Distance
    • K-Means Clustering
    • Principal Component Analysis
    • Factor Analysis
    • Correspondence Analysis
    • Linear Regression Analysis
    • Logistic Regression Analysis
    • Generalized Linear Models
    • Survival Curve
    • Cox Regression
    • Random Survival Forest
    • Decision Tree
    • Random Forest
    • XGBoost
    • Time Series Forecasting (Prophet)
    • Time Series Forecasting (ARIMA)
    • Time Series Clustering
    • Anomaly Detection
    • Word Count
    • Text Clustering with Topic Model (LDA)
    • Market Basket Analysis
    • T Test
    • T Test (Aggregated Data)
    • ANOVA
    • Wilcoxon Test
    • Kruskal-Wallis Test
    • Chi-Square Test
    • A/B Test
    • Normality Test
    • Prediction
    • Dictionaries for Text Analysis
  • Statistics
    • Correlation
    • Distance
    • Cosine Similarity
    • SVD
    • Multi Dimensional Scaling
    • T-test
    • F-test
    • Chi-square test
    • A/B Test (Bayesian)
  • Machine Learning
    • Linear Regression
    • Logistic Regression
    • GLM
    • Multinomial Logistic Regression
    • K-means Clustering
    • Random Forest
    • XGBoost
    • Forecasting
    • Time Series Clustering
    • Anomaly Detection
    • Survival Curve
    • Survival Model (Cox Regression)
    • Market Basket
    • Causal Impact
    • Evaluate Prediction - Regression
    • Evaluate Prediction - Binary
    • Calculate ROC
    • Evaluate Prediction - Multiclass
    • Prediction
    • Prediction - Binary Classification
    • Prediction - Survival Model
    • Simulate Survival Curve
    • Extract Summary of Fit
    • Extract Parameter Estimates
    • Run ANOVA Test
    • Fix Imbalanced Data (SMOTE)
  • Text Analysis
    • Tokenize Text
    • Create N-gram Tokens
    • Calculate tf-idf
    • Count Text Pairs
  • Extend with R
    • R Package Install
    • Custom R Script
    • Custom Model Function
  • Setup
    • Disable McAfee virus scan
    • Change Repository Location
    • Change Repository Location (日本語)
    • Holidays Data for Forecast
    • Possible Reasons for Install Error
    • Upgrade Microsoft .NET Framework
  • Diagnostics
    • Log file for debugging
    • Log file for debugging (日本語)
    • Startup Log file for debugging
    • Startup Log file for debugging (日本語)
    • Check version of Exploratory Desktop
    • How to Recover the History Data
  • Keyboard shortcuts
Powered by GitBook
On this page
  • Remove
  • How to Use
  • Text
  • Text (All)
  • Text (Multiple Candidates)
  • Text (Multiple Candidates: All)
  • Text Inside Special Characters
  • Text Inside Special Characters (All)
  • Range of Text
  • Alphabets
  • Numbers
  • First Word
  • Last Word
  • Spaces
  • Repeated Spaces (Including Tabs and Line Breaks)
  • Leading & Trailing Spaces
  • Email Address
  • URL
  • Emoji
  • Punctuations Characters
  • Replace
  • How to Use
  • Text
  • Text (All)
  • Text (Multiple Candidates)
  • Text (Multiple Candidates: All)
  • Text Inside Special Characters
  • Text Inside Special Characters (All)
  • Range of Text
  • Alphabets
  • Numbers
  • First Word
  • Last Word
  • Spaces
  • Email Address
  • URL
  • Punctuations Characters
  • Extract
  • How to Use
  • Text
  • Text (All)
  • Text (Multiple Candidates)
  • Text (Multiple Candidates: All)
  • Text Inside Special Characters
  • Text Inside Special Characters (All)
  • Range of Text
  • Alphabets
  • Numbers
  • First Word
  • Last Word
  • N th Word
  • Email Address
  • URL
  • Convert
  • How to Use
  • UPPERCASE
  • Country
  • US State
  • US County
  • IP Address - Country
  • Anonymize
  • Detect a Give Text (TRUE/FALSE)
  • Character Encoding

Was this helpful?

  1. Data Wrangling

Text Wrangling

With the Text Wrangling UI, you can remove, replace, and extract strings from your text data.

Remove

You can remove strings from your text data with various options listed below.

  • Text

  • Text (All)

  • Text (Multiple Candidate)

  • Text (Multiple Candidates: All)

  • Text Inside Special Characters

  • Text Inside Special Characters (All)

  • Range of Text

  • Alphabets

  • Numbers

  • First Word

  • Last Word

  • Spaces

  • Repeated Spaces (Including Tabs and Line Breaks)

  • Leading & Trailing Spaces

  • Email Address

  • URL

  • Emoji

  • Punctuations Characters

How to Use

You can access Text Wrangling UI Remove option from the column header menu.

Text

Type in strings that you want to remove from your text. In the below example, it removes "COVID19" from the text.

Regular Expression

You can use regular expression to remove strings from your text. Click the Regular Expression radio button and type in a regular expression that you want to try. In the below example, it removes the "pandemic" at the end of the text.

Ignore Case

If you select "Yes" for Ignore Case radio button, it removes string with case insensitive fashion. For example, in the below example, it removes both 'pandemic' and 'Pandemic'.

Position

In the above regular expression case, you used $ to match the string at the end. But you can do it without manually writing regular expression by selecting "end" for the Position parameter.

Text (All)

It's basically same as "Text" option. The difference is it removes the all the matches. In below example, there are two "Pandemic" in the text and both of them are removed. In case of "Text", it only removes the first match.

Text (Multiple Candidates)

This option removes strings using multiple candidates for matching. In below example, it removes strings that match either "pandemic" or "covid".

Text (Multiple Candidates: All)

It's basically same as Text (Multiple Candidates). The difference is it removes all the matches like the below screenshot.

Text Inside Special Characters

This option removes text inside specified special characters. It's useful when you want to remove text inside parentheses. If you select "Include Special Characters", it removes the text along with begin and end special characters.

Text Inside Special Characters (All)

It's basically the same as Text Inside Special Characters. The difference is it removes all the matches in the text like the below screenshot.

Range of Text

This option removes strings in a specified range from your text. The below example shows removing strings between 10th and 20th.

Alphabets

This option removes Alphabets from your text like the below screenshot.

Numbers

This option removes Numbers from your text like the below screenshot.

First Word

This option separate text into words with specified separator (space for the below example) and removes the first word like the below screenshot.

Last Word

This option separate text into words with specified separator (space for the below example) and removes the last word like the below screenshot.

Spaces

This option removes spaces in the text like below screenshot. By clicking "Show Invisible Characters" checkbox, you can see space with "␣" character in the preview table.

Repeated Spaces (Including Tabs and Line Breaks)

This option removes repeated spaces, tabs, and line breaks. In the below example, you can see double spaces are changed to single space and line breaks are removed.

Leading & Trailing Spaces

This option removes leading and trailing spaces. For example, in the below screenshot shows the case where the "Japan" has trailing space and it removes the trailing space.

Email Address

This option removes email addresses from your text like below screenshot.

URL

This option removes URLs from your text like below screenshot.

Emoji

This option removes Emojis from your text like below screenshot.

Punctuations Characters

This option removes Punctuation Characters from your text like below screenshot.

Replace

You can replace strings from your text data with various options listed below.

  • Text

  • Text (All)

  • Text (Multiple Candidate)

  • Text (Multiple Candidates: All)

  • Text Inside Special Characters

  • Text Inside Special Characters (All)

  • Range of Text

  • Alphabets

  • Numbers

  • First Word

  • Last Word

  • Spaces

  • Email Address

  • URL

  • Punctuations Characters

How to Use

You can access Text Wrangling UI Replace option from the column header menu.

Text

Type in strings that you want to replace. In the below example, it replaces "COVID19" with "Corona Virus".

Regular Expression

You can use regular expression to replace strings in your text. Click the "Regular Expression" radio button and type in a regular expression that you want to try. In the below example, with the regular expression it can replace the strings like "New York" to NYC.

Ignore Case

If you select "No" for Ignore Case radio button, it replaces string with case sensitive fashion. For example, in the below example, it replace 'new York' but not 'New York'.

Position

If you select the "End" for the Position, it only matches the string at the end of the text. In the below example, it only matches "COVID19" at the end of the text and ignore the rest.

Text (All)

It's basically same as "Text" option. The difference is it replaces the ALL the matches. In below example, there are two "new" in the text and both of them are replaced with "NEW". In case of "Text", only the first one got replaced.

Text (Multiple Candidates)

This option replace strings using multiple candidates for matching. In below example, it replaces strings that match either "covid" or "corona" to "New". You can enter comma separate values to the From input field as candidates.

Text (Multiple Candidates: All)

It's basically same as Text (Multiple Candidates). The difference is it replaces all the matches like the below screenshot.

Text Inside Special Characters

This option replaces text inside specified special characters with the value entered in "Replace With" input field. It's useful when you want to replace text inside parentheses. If you select No for "Include Special Characters", it keeps the begin and end special characters and replace strings inside of these.

Text Inside Special Characters (All)

It's basically the same as Text Inside Special Characters. The difference is it replaces all the matches in the text like the below screenshot.

Range of Text

This option replaces strings in a specified range in your text with the value entered in "Replace With" input field. The below example shows replacing strings between 10th and 20th with "Special".

Alphabets

This option replaces Alphabets in your text with the value entered in "Replace With" input field like the below screenshot.

Numbers

This option replaces Numbers in your text like the below screenshot.

First Word

This option separate text into words with specified separator (space for the below example) and replaces the first word like the below screenshot.

Last Word

This option separate text into words with specified separator (space for the below example) and replaces the last word like the below screenshot.

Spaces

This option replaces spaces in the text like below screenshot. By clicking "Show Invisible Characters" checkbox, you can see space with "␣" character in the preview table.

Email Address

This option replaces email addresses in your text like below screenshot.

URL

This option replaces URLs in your text like below screenshot.

Punctuations Characters

This option replaces Punctuation Characters in your text like below screenshot.

Extract

You can replace strings from your text data with various options listed below.

  • Text

  • Text (All)

  • Text (Multiple Candidate)

  • Text (Multiple Candidates: All)

  • Text Inside Special Characters

  • Text Inside Special Characters (All)

  • Range of Text

  • Alphabets

  • Numbers

  • First Word

  • Last Word

  • Nth Word (2nd, 3rd, etc.)

  • Email Address

  • URL

How to Use

You can access Text Wrangling UI Extract option from the column header menu.

Text

Type in strings that you want to replace. In the below example, it extracts "COVID" from text.

Regular Expression

You can use regular expression to replace strings in your text. Click the "Regular Expression" radio button and type in a regular expression that you want to try. In the below example, with the regular expression it can extracts the strings like "New York".

Ignore Case

If you select "No" for Ignore Case radio button, it extracts string with case sensitive fashion. For example, in the below example, it extracts 'new York' but not 'New York'.

Position

If you select the "End" for the Position, it only matches the string at the end of the text. In the below example, it only matches "COVID19" at the end of the text and ignore the rest.

Text (All)

It's basically same as "Text" option. The difference is it extracts the ALL the matches. In below example, there are two "new" in the text and both of them are replaced with "NEW". In case of "Text", only the first one got replaced.

Text (Multiple Candidates)

This option extracts strings using multiple candidates for matching. In below example, it extracts strings that match either "covid" or "corona" to "New". You can enter comma separate values to the From input field as candidates.

Text (Multiple Candidates: All)

It's basically same as Text (Multiple Candidates). The difference is it extracts all the matches like the below screenshot.

Text Inside Special Characters

This option extracts text inside specified special characters. It's useful when you want to extract text inside parentheses.

Text Inside Special Characters (All)

It's basically the same as Text Inside Special Characters. The difference is it extracts all the matches in the text like the below screenshot.

Range of Text

This option extracts strings in a specified range from your text. The below example shows extracting strings between 10th and 20th.

Alphabets

This option extracts Alphabets from your text like the below screenshot.

Numbers

This option extracts Numbers from your text like the below screenshot.

First Word

This option separates text into words with specified separator (space for the below example) and extracts the first word like the below screenshot.

Last Word

This option separates text into words with specified separator (space for the below example) and extracts the last word like the below screenshot.

N th Word

This option separates text into words with specified separator (space for the below example) and extracts the nth word (e.g. 2nd word) like the below screenshot.

Email Address

This option extracts email addresses in your text like below screenshot.

URL

This option extracts URLs in your text like below screenshot.

Convert

You can convert strings to various way as listed below.

  • UPPERCASE

  • lowercase

  • Title Case

  • Normalize (Zenkaku/Hankaku)

  • Country

  • US State

  • US County

  • IP Address - Country

  • Anonymize

  • Detect a Given Text (TRUE/FALSE)

  • Character Encoding

How to Use

You can access Text Wrangling UI Extract option from the column header menu.

UPPERCASE

It converts text to uppercase like the below screenshot.

lowercase

It converts text to lowercase like the below screenshots.

Title Case

If you select "No" for Ignore Case radio button, it extracts string with case sensitive fashion. For example, in the below example, it extracts 'new York' but not 'New York'.

Normalize (Zenkaku/Hankaku)

It normalize text. In the below example, it converts Zenakaku numbers Hankaku numbers.

Country

You can convert country code to country name or vice versa. The supported country codes are:

  • Correlates of War (Character)

  • Correlates of War (Numeric)

  • ISO3 (Character)

  • ISO3 (Numeric)

  • ISO2 (Character)

  • IMF (Numeric)

  • FIPS (ederal Information Processing Standard) 10-4 (Numeric)

  • FAO (Numeric)

  • United Nations (Numeric)

  • Word Bank (Character)

US State

It converts US State code to US State Name or vice versa. The supported US State codes are:

  • US States Abbreviation

  • US States Number

Also, you can convert US state to Division like "Pacific", "Middle Atlantic", etc. or to Region like "West", "Northeast", etc.

US County

It generates US county code (FIPS - Federal Information Processing Standard) based on US State and County names. To use this operation, you need to specify the column includes US State Names for the US County.

IP Address - Country

It converts IP address to country name.

Anonymize

it anonymizes values by hashing algorithms.

Detect a Give Text (TRUE/FALSE)

it returns TRUE or FALSE based on whether Text data contains a given text or not.

Character Encoding

It converts column character encoding. Below example, converts encoding from Japanese cp932 to UTF-8.

PreviousAdd RowNextRegular Expression Cheat Sheet

Last updated 3 years ago

Was this helpful?