Text Wrangling

With the Text Wrangling UI, you can remove, replace, extract, and convert strings in your text data.

Remove

You can remove strings from your text data using the options listed below.

  • Text

  • Text (All)

  • Text (Multiple Candidates)

  • Text (Multiple Candidates: All)

  • Text Inside Special Characters

  • Text Inside Special Characters (All)

  • Range of Text

  • Alphabets

  • Numbers

  • First Word

  • Last Word

  • Spaces

  • Repeated Spaces (Including Tabs and Line Breaks)

  • Leading & Trailing Spaces

  • Email Address

  • URL

  • Emoji

  • Punctuation Characters

How to Use

You can access the Remove option of the Text Wrangling UI from the column header menu.

Text

Type in the string that you want to remove from your text. In the example below, it removes "COVID19" from the text.

Regular Expression

You can use a regular expression to remove strings from your text. Click the Regular Expression radio button and type in the regular expression that you want to use. In the example below, it removes "pandemic" at the end of the text.
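
Outside the UI, the same idea can be sketched in Python with the standard re module. This is only an illustration, not the tool's actual implementation; the helper name remove_first and the sample text are assumptions made for this example.

```python
import re

def remove_first(text: str, pattern: str) -> str:
    """Remove only the first match of a regular expression (hypothetical helper)."""
    return re.sub(pattern, "", text, count=1)

sample = "Life before the pandemic, life after the pandemic"
# "$" anchors the match to the end of the text, so only the final
# "pandemic" is removed and the earlier one is left alone.
result = remove_first(sample, r"pandemic$")
print(result)
```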

Ignore Case

If you select "Yes" for the Ignore Case radio button, strings are removed in a case-insensitive fashion. In the example below, it removes both 'pandemic' and 'Pandemic'.

Position

In the regular expression example above, $ was used to match the string at the end of the text. You can do the same without writing a regular expression by selecting "end" for the Position parameter.
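
As a rough sketch of how the Ignore Case and Position settings might translate into a regular expression, the Python function below builds the pattern from a literal string. The function name remove_text and the position values "any", "start", and "end" are hypothetical; the UI's internal behavior may differ.

```python
import re

def remove_text(text, target, ignore_case=False, position="any"):
    """Remove the first occurrence of a literal string (hypothetical helper)."""
    # Escape the target so it is matched literally, not as a regex.
    pattern = re.escape(target)
    if position == "start":
        pattern = "^" + pattern      # match only at the beginning
    elif position == "end":
        pattern = pattern + "$"      # match only at the end
    flags = re.IGNORECASE if ignore_case else 0
    return re.sub(pattern, "", text, count=1, flags=flags)

print(remove_text("Pandemic news", "pandemic", ignore_case=True))
print(remove_text("pandemic after pandemic", "pandemic", position="end"))
```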

Text (All)

It's basically the same as the "Text" option. The difference is that it removes all the matches. In the example below, there are two occurrences of "Pandemic" in the text and both are removed. The "Text" option removes only the first match.
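
The difference between removing the first match and removing every match can be illustrated with Python's re.sub, where the count argument limits the number of substitutions. The sample text is made up for this sketch.

```python
import re

text = "Pandemic response plans changed as the Pandemic went on"

# count=1 stops after the first match, like the "Text" option.
first_only = re.sub("Pandemic", "", text, count=1)

# Without count, every match is removed, like "Text (All)".
every_match = re.sub("Pandemic", "", text)

print(first_only)
print(every_match)
```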

Text (Multiple Candidates)

This option removes strings using multiple candidates for matching. In the example below, it removes strings that match either "pandemic" or "covid".

Text (Multiple Candidates: All)

It's basically the same as Text (Multiple Candidates). The difference is that it removes all the matches, as shown in the screenshot below.
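
One way to picture matching against multiple candidates is a regular expression alternation. The Python sketch below joins the candidates with "|"; the helper remove_candidates and its remove_all flag are assumptions for illustration only.

```python
import re

def remove_candidates(text, candidates, remove_all=False):
    """Remove the first (or every) match of any candidate string."""
    # Escape each candidate and join with "|" so any one of them can match.
    pattern = "|".join(re.escape(c) for c in candidates)
    # In re.sub, count=0 means "replace every match".
    return re.sub(pattern, "", text, count=0 if remove_all else 1)

sample = "covid pandemic covid"
print(repr(remove_candidates(sample, ["pandemic", "covid"])))
print(repr(remove_candidates(sample, ["pandemic", "covid"], remove_all=True)))
```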

Text Inside Special Characters

This option removes text inside specified special characters. It's useful when you want to remove text inside parentheses. If you select "Include Special Characters", it removes the text along with the beginning and ending special characters.
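
A minimal Python sketch of this behavior, assuming parentheses as the special characters, is shown here. The function name and its parameters are hypothetical and only approximate what the option does.

```python
import re

def remove_inside(text, begin="(", end=")", include_special=True, remove_all=False):
    """Remove text between begin/end markers (hypothetical helper)."""
    b, e = re.escape(begin), re.escape(end)
    if include_special:
        # Drop the markers together with the text between them.
        pattern, repl = f"{b}.*?{e}", ""
    else:
        # Keep the markers (captured as groups 1 and 2), drop only the inside.
        pattern, repl = f"({b}).*?({e})", r"\1\2"
    # ".*?" is non-greedy, so each begin/end pair is handled separately.
    return re.sub(pattern, repl, text, count=0 if remove_all else 1)

sample = "Tokyo (Japan) and Paris (France)"
print(remove_inside(sample))
print(remove_inside(sample, include_special=False))
```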

Text Inside Special Characters (All)

It's basically the same as Text Inside Special Characters. The difference is that it removes all the matches in the text, as shown in the screenshot below.

Range of Text

This option removes the part of your text that falls within a specified range. The example below shows removing the characters between the 10th and 20th positions.
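
Removing a range can be pictured as string slicing. The sketch below assumes the range is given as 1-based, inclusive character positions; that convention is an assumption of this example, not something the UI description confirms.

```python
def remove_range(text, start, end):
    """Remove characters from position start to end (1-based, inclusive; assumed)."""
    # Keep everything before `start` and everything after `end`.
    return text[: start - 1] + text[end:]

print(remove_range("abcdefghijklmnopqrstuvwxyz", 10, 20))
```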

Alphabets

This option removes alphabetic characters from your text, as shown in the screenshot below.

Numbers

This option removes numbers from your text, as shown in the screenshot below.

First Word

This option separates text into words using a specified separator (a space in the example below) and removes the first word, as shown in the screenshot below.

Last Word

This option separates text into words using a specified separator (a space in the example below) and removes the last word, as shown in the screenshot below.
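
The split-then-drop logic behind First Word and Last Word can be sketched in Python as follows. The helper names are made up for this example, and a single space is assumed as the default separator.

```python
def remove_first_word(text, sep=" "):
    """Split on the separator and drop the first word."""
    words = text.split(sep)
    return sep.join(words[1:])

def remove_last_word(text, sep=" "):
    """Split on the separator and drop the last word."""
    words = text.split(sep)
    return sep.join(words[:-1])

print(remove_first_word("New York City"))
print(remove_last_word("New York City"))
```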

Spaces

This option removes spaces from the text, as shown in the screenshot below. By selecting the "Show Invisible Characters" checkbox, you can see each space displayed as a "␣" character in the preview table.

Repeated Spaces (Including Tabs and Line Breaks)

This option removes repeated spaces, tabs, and line breaks. In the example below, you can see that double spaces are changed to a single space and line breaks are removed.

Leading & Trailing Spaces

This option removes leading and trailing spaces. For example, the screenshot below shows a case where "Japan" has a trailing space, and the option removes it.
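
The three whitespace options can be compared side by side in a short Python sketch. The helper names are hypothetical. In particular, replacing each run of spaces, tabs, and line breaks with one space is an assumption about how "Repeated Spaces" behaves.

```python
import re

def remove_spaces(text):
    """Delete every space character (like the Spaces option)."""
    return text.replace(" ", "")

def squeeze_whitespace(text):
    """Collapse each run of spaces, tabs, and line breaks into one space (assumed)."""
    return re.sub(r"\s+", " ", text)

def trim(text):
    """Strip leading and trailing whitespace only."""
    return text.strip()

print(repr(remove_spaces("New York City")))
print(repr(squeeze_whitespace("New  York\nCity")))
print(repr(trim("Japan ")))
```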

Email Address

This option removes email addresses from your text, as shown in the screenshot below.

URL

This option removes URLs from your text, as shown in the screenshot below.

Emoji

This option removes emoji from your text, as shown in the screenshot below.

Punctuation Characters

This option removes punctuation characters from your text, as shown in the screenshot below.
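
For a sense of what punctuation removal involves, here is a small Python sketch. It only handles the ASCII punctuation listed in string.punctuation, which is a simplifying assumption; the UI may cover a wider set of characters.

```python
import string

def remove_punctuation(text):
    """Delete ASCII punctuation characters (assumed subset)."""
    # str.maketrans with a third argument maps those characters to None.
    table = str.maketrans("", "", string.punctuation)
    return text.translate(table)

print(remove_punctuation("Hello, world! (COVID-19)"))
```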

Replace

You can replace strings in your text data using the options listed below.

  • Text

  • Text (All)

  • Text (Multiple Candidates)

  • Text (Multiple Candidates: All)

  • Text Inside Special Characters

  • Text Inside Special Characters (All)

  • Range of Text

  • Alphabets

  • Numbers

  • First Word

  • Last Word

  • Spaces

  • Email Address

  • URL

  • Punctuation Characters

How to Use

You can access the Replace option of the Text Wrangling UI from the column header menu.

Text

Type in the string that you want to replace. In the example below, it replaces "COVID19" with "Corona Virus".

Regular Expression

You can use a regular expression to replace strings in your text. Click the "Regular Expression" radio button and type in the regular expression that you want to use. In the example below, the regular expression replaces strings like "New York" with "NYC".

Ignore Case

If you select "No" for the Ignore Case radio button, strings are replaced in a case-sensitive fashion. In the example below, it replaces 'new York' but not 'New York'.
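
A replacement with a regular expression and an optional case-insensitive flag could look like the Python sketch below. The pattern New\s+York is a made-up example, since the original expression is only visible in the screenshot, and replace_first is a hypothetical helper.

```python
import re

def replace_first(text, pattern, replacement, ignore_case=False):
    """Replace only the first regex match (hypothetical helper)."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.sub(pattern, replacement, text, count=1, flags=flags)

# \s+ tolerates one or more whitespace characters between the two words.
print(replace_first("I moved to New  York last year", r"New\s+York", "NYC"))

# Case-sensitive by default: only the exact spelling 'new York' matches.
print(replace_first("New York or new York", "new York", "NYC"))
```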

Position

If you select "End" for Position, it only matches the string at the end of the text. In the example below, it only matches "COVID19" at the end of the text and ignores the rest.

Text (All)

It's basically the same as the "Text" option. The difference is that it replaces all the matches. In the example below, there are two occurrences of "new" in the text and both are replaced with "NEW". With the "Text" option, only the first one is replaced.

Text (Multiple Candidates)

This option replaces strings using multiple candidates for matching. In the example below, it replaces strings that match either "covid" or "corona" with "New". You can enter comma-separated values in the From input field as candidates.

Text (Multiple Candidates: All)

It's basically the same as Text (Multiple Candidates). The difference is that it replaces all the matches, as shown in the screenshot below.

Text Inside Special Characters

This option replaces text inside specified special characters with the value entered in the "Replace With" input field. It's useful when you want to replace text inside parentheses. If you select No for "Include Special Characters", it keeps the beginning and ending special characters and replaces only the strings inside them.

Text Inside Special Characters (All)

It's basically the same as Text Inside Special Characters. The difference is that it replaces all the matches in the text, as shown in the screenshot below.

Range of Text

This option replaces the part of your text that falls within a specified range with the value entered in the "Replace With" input field. The example below shows replacing the characters between the 10th and 20th positions with "Special".

Alphabets

This option replaces alphabetic characters in your text with the value entered in the "Replace With" input field, as shown in the screenshot below.

Numbers

This option replaces numbers in your text, as shown in the screenshot below.

First Word

This option separates text into words using a specified separator (a space in the example below) and replaces the first word, as shown in the screenshot below.

Last Word

This option separates text into words using a specified separator (a space in the example below) and replaces the last word, as shown in the screenshot below.

Spaces

This option replaces spaces in the text, as shown in the screenshot below. By selecting the "Show Invisible Characters" checkbox, you can see each space displayed as a "␣" character in the preview table.

Email Address

This option replaces email addresses in your text, as shown in the screenshot below.

URL

This option replaces URLs in your text, as shown in the screenshot below.

Punctuation Characters

This option replaces punctuation characters in your text, as shown in the screenshot below.

Extract

You can extract strings from your text data using the options listed below.

  • Text

  • Text (All)

  • Text (Multiple Candidates)

  • Text (Multiple Candidates: All)

  • Text Inside Special Characters

  • Text Inside Special Characters (All)

  • Range of Text

  • Alphabets

  • Numbers

  • First Word

  • Last Word

  • Nth Word (2nd, 3rd, etc.)

  • Email Address

  • URL

How to Use

You can access the Extract option of the Text Wrangling UI from the column header menu.

Text

Type in the string that you want to extract. In the example below, it extracts "COVID" from the text.

Regular Expression

You can use a regular expression to extract strings from your text. Click the "Regular Expression" radio button and type in the regular expression that you want to use. In the example below, the regular expression extracts strings like "New York".

Ignore Case

If you select "No" for the Ignore Case radio button, strings are extracted in a case-sensitive fashion. In the example below, it extracts 'new York' but not 'New York'.

Position

If you select "End" for Position, it only matches the string at the end of the text. In the example below, it only matches "COVID19" at the end of the text and ignores the rest.

Text (All)

It's basically the same as the "Text" option. The difference is that it extracts all the matches. In the example below, there are two occurrences of "new" in the text and both are extracted. With the "Text" option, only the first one is extracted.
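
In Python terms, extracting the first match versus all matches corresponds roughly to re.search versus re.findall. The sample text below is invented for this sketch, and how the UI presents multiple extracted values is not shown here.

```python
import re

text = "new rules, new normal"

# re.search returns only the first match, like the "Text" option.
match = re.search("new", text)
first = match.group(0) if match else None

# re.findall returns every match as a list, like "Text (All)".
every = re.findall("new", text)

print(first)
print(every)
```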

Text (Multiple Candidates)

This option extracts strings using multiple candidates for matching. In the example below, it extracts strings that match either "covid" or "corona". You can enter comma-separated values in the input field as candidates.

Text (Multiple Candidates: All)

It's basically the same as Text (Multiple Candidates). The difference is that it extracts all the matches, as shown in the screenshot below.

Text Inside Special Characters

This option extracts text inside specified special characters. It's useful when you want to extract text inside parentheses.
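
Extraction of text inside special characters can be sketched with a capture group in Python. The helper extract_inside is hypothetical, and parentheses are assumed as the default markers.

```python
import re

def extract_inside(text, begin="(", end=")"):
    """Return the text between the first begin/end pair, or None if absent."""
    # The capture group (.*?) holds only the text between the markers.
    pattern = re.escape(begin) + "(.*?)" + re.escape(end)
    match = re.search(pattern, text)
    return match.group(1) if match else None

print(extract_inside("Tokyo (Japan) and Paris (France)"))
print(extract_inside("a [b] c", begin="[", end="]"))
```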

Text Inside Special Characters (All)

It's basically the same as Text Inside Special Characters. The difference is that it extracts all the matches in the text, as shown in the screenshot below.

Range of Text

This option extracts the part of your text that falls within a specified range. The example below shows extracting the characters between the 10th and 20th positions.

Alphabets

This option extracts alphabetic characters from your text, as shown in the screenshot below.

Numbers

This option extracts numbers from your text, as shown in the screenshot below.

First Word

This option separates text into words using a specified separator (a space in the example below) and extracts the first word, as shown in the screenshot below.

Last Word

This option separates text into words using a specified separator (a space in the example below) and extracts the last word, as shown in the screenshot below.

Nth Word

This option separates text into words using a specified separator (a space in the example below) and extracts the nth word (e.g. the 2nd word), as shown in the screenshot below.
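
Picking the nth word amounts to splitting on the separator and indexing into the result. The Python sketch below uses a hypothetical helper with a 1-based n; returning None for an out-of-range n is an assumption about how missing words are handled.

```python
def nth_word(text, n, sep=" "):
    """Return the nth word (1-based), or None if there are fewer than n words."""
    words = text.split(sep)
    return words[n - 1] if 1 <= n <= len(words) else None

print(nth_word("New York City", 2))
print(nth_word("a,b,c", 3, sep=","))
```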

Email Address

This option extracts email addresses from your text, as shown in the screenshot below.

URL

This option extracts URLs from your text, as shown in the screenshot below.
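
To give a feel for what email and URL extraction involves, the Python sketch below uses two deliberately simplified patterns. They are assumptions for illustration and will miss or over-match some real-world addresses; the patterns the UI actually uses are not documented here.

```python
import re

# Simplified patterns (assumptions): real email and URL grammars are more complex.
EMAIL_PATTERN = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"
URL_PATTERN = r"https?://\S+"

def extract_emails(text):
    """Return every substring that looks like an email address."""
    return re.findall(EMAIL_PATTERN, text)

def extract_urls(text):
    """Return every substring that starts with http:// or https://."""
    return re.findall(URL_PATTERN, text)

print(extract_emails("Contact info@example.com or sales@example.co.jp today"))
print(extract_urls("Visit https://example.com/docs for details"))
```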

Convert

You can convert strings in various ways, as listed below.

  • UPPERCASE

  • lowercase

  • Title Case

  • Normalize (Zenkaku/Hankaku)

  • Country

  • US State

  • US County

  • IP Address - Country

  • Anonymize

  • Detect a Given Text (TRUE/FALSE)

  • Character Encoding

How to Use

You can access the Convert option of the Text Wrangling UI from the column header menu.

UPPERCASE

It converts text to uppercase, as shown in the screenshot below.

lowercase

It converts text to lowercase, as shown in the screenshot below.

Title Case

It converts text to title case, where the first letter of each word is capitalized.
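
The three case conversions have direct counterparts among Python's built-in string methods, shown here as a small sketch. Note that str.title() capitalizes after any non-letter character, so its results can differ from the UI for text with apostrophes or hyphens.

```python
text = "new york city"

upper = text.upper()   # UPPERCASE
lower = upper.lower()  # lowercase
title = text.title()   # Title Case: first letter of each word capitalized

print(upper)
print(lower)
print(title)
```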

Normalize (Zenkaku/Hankaku)

It normalizes text. In the example below, it converts Zenkaku (full-width) numbers to Hankaku (half-width) numbers.
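
Full-width to half-width conversion of digits and Latin letters can be reproduced in Python with Unicode NFKC normalization. Whether the UI uses NFKC or another normalization form is an assumption of this sketch.

```python
import unicodedata

def normalize_width(text):
    """Apply NFKC normalization, which maps full-width ASCII to half-width."""
    return unicodedata.normalize("NFKC", text)

# Full-width (Zenkaku) digits and letters become ordinary ASCII characters.
print(normalize_width("１２３ＡＢＣ"))
```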

Country

You can convert a country code to a country name or vice versa. The supported country codes are:

  • Correlates of War (Character)

  • Correlates of War (Numeric)

  • ISO3 (Character)

  • ISO3 (Numeric)

  • ISO2 (Character)

  • IMF (Numeric)

  • FIPS (Federal Information Processing Standard) 10-4 (Numeric)

  • FAO (Numeric)

  • United Nations (Numeric)

  • World Bank (Character)
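
Conceptually, converting between code systems is a lookup in a table with one column per system. The Python sketch below uses a tiny hand-written table with three countries and ISO codes only; the table, the column names, and the function convert_country are all hypothetical and stand in for the much larger mapping the tool relies on.

```python
# Tiny illustrative table (hypothetical column names, ISO 3166-1 values).
COUNTRIES = [
    {"name": "Japan", "iso2c": "JP", "iso3c": "JPN", "iso3n": 392},
    {"name": "United States", "iso2c": "US", "iso3c": "USA", "iso3n": 840},
    {"name": "France", "iso2c": "FR", "iso3c": "FRA", "iso3n": 250},
]

def convert_country(value, origin, destination):
    """Look up `value` in the `origin` column and return the `destination` column."""
    for row in COUNTRIES:
        if row[origin] == value:
            return row[destination]
    return None  # no match in the table

print(convert_country("JP", "iso2c", "name"))
print(convert_country("United States", "name", "iso3c"))
```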

US State

It converts a US state code to a US state name or vice versa. The supported US state codes are:

  • US States Abbreviation

  • US States Number

You can also convert a US state to a Division such as "Pacific" or "Middle Atlantic", or to a Region such as "West" or "Northeast".

US County

It generates the US county code (FIPS - Federal Information Processing Standard) based on US state and county names. To use this operation, you need to specify the column that contains the US state names for the counties.

IP Address - Country

It converts an IP address to a country name.

Anonymize

It anonymizes values using a hashing algorithm.
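
Hash-based anonymization can be illustrated with Python's hashlib. SHA-256 is chosen here only as an example; which algorithms the UI offers is not stated in this page, and the helper name anonymize is hypothetical.

```python
import hashlib

def anonymize(value, algorithm="sha256"):
    """Return a hex digest of the value (algorithm choice is an assumption)."""
    digest = hashlib.new(algorithm)
    digest.update(value.encode("utf-8"))
    return digest.hexdigest()

# The same input always yields the same digest, so joins and counts still work
# on the anonymized column, while the original value is not shown.
print(anonymize("alice@example.com"))
```

Hashing unsalted, low-variety values (such as short IDs) can be reversed by brute force, so this sketch should not be read as a complete privacy measure.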

Detect a Given Text (TRUE/FALSE)

It returns TRUE or FALSE based on whether the text data contains a given text.
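
The detection logic is essentially a substring test. A minimal Python sketch follows; the function name detect and its ignore_case parameter are assumptions made for illustration.

```python
def detect(text, target, ignore_case=False):
    """Return True if `target` occurs anywhere in `text`."""
    if ignore_case:
        # Compare lowercase copies so that case differences are ignored.
        return target.lower() in text.lower()
    return target in text

print(detect("COVID19 cases are rising", "COVID19"))
print(detect("covid19 cases are rising", "COVID19"))
```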

Character Encoding

It converts the character encoding of a column. The example below converts the encoding from Japanese cp932 to UTF-8.
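
An encoding conversion is a decode from the source encoding followed by an encode to the target encoding. The Python sketch below shows this for cp932 to UTF-8 on a single byte string; convert_encoding is a hypothetical helper and the sample text is made up.

```python
def convert_encoding(data, source="cp932", target="utf-8"):
    """Decode bytes from the source encoding and re-encode them in the target."""
    return data.decode(source).encode(target)

# Simulate a cp932-encoded value, then convert it to UTF-8.
cp932_bytes = "日本語".encode("cp932")
utf8_bytes = convert_encoding(cp932_bytes)

print(len(cp932_bytes), len(utf8_bytes))
print(utf8_bytes.decode("utf-8"))
```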
