Word Count

Runs text analysis based on frequency and co-occurrence of words.

Input Data

Input data should contain following column.

  • Text Column - The column of the text data.

Properties

  • Sample Data Size - Number of rows to sample before running the analysis.

  • Text Tokenization

    • Remove Stopwords - Default is Yes.

    • Language for Stopwords - By default it is automatically selected based on the content of the text.

    • Additional Stopwords - Words to be added to the default set of stopwords.

    • Exclude from Stopwords - Words to be excluded from the default set of stopwords.

    • Words To Be Treated As One Word - If a word or phrase that should be treated as one token is separated into multiple tokens, it can be fixed by specifying the word/phrase here.

    • Remove Punctuations

    • Remove Numbers

    • Clean Up Twitter Data - Whether to remove hashtag (starts with #) and mention (starts with @). The default is No.

    • Remove Hiragana Only Words - You can treat often meaningless short Japanese Hiragana words as stopwords altogether by selecting an option here.

  • Words - Bar

    • Number of Bars (Words)

  • Word Cloud

    • Number of Words

    • Min Count for Words to Show - Minimum count of words to be shown on the word cloud. Words that appear only less than this count will not show up on the word cloud.

    • Min Word Size (px)

    • Max Word Size (px)

    • Rotated Text Ratio (0-1)

    • Chart Width (px)

    • Chart Height (px)

  • Word Pairs - Bar

    • Number of Bars (Word Pairs)

  • Word Pairs - Network

    • Number of Words

    • Max Size of Circle (px)

    • Criterion for Size of Circle

    • Max Width of Edge (px)

    • Font Size (Ratio)

How to Use This Feature

  1. Under Analytics view, select "Word Count" for Analytics Type.

  2. Select a column for Text Column.

  3. Click Run button to run the analytics.

  4. Select each view type (explained below) see the detail of the analysis.

"Word Cloud" View

"Word Cloud" View displays the word cloud of the words used in the text data. Both the color and the size of a word indicates the count of the word.

"Words - Bar" View and "Words - Count" View

"Words - Bar" View and "Words - Count" View display the count of the words in the text data.

"Words - Data" View

"Words - Data" View displays the text data after tokenization.

"Word Pairs - Bar" View and "Word Pairs - Count" View

"Word Pairs - Bar" View and "Word Pairs - Count" View display the count of the word pairs in the text data.

"Word Pairs - Network" View

"Word Pairs - Network" View displays the network graph of co-occurrence of the words. The line thickness in the graph represents the count of co-occurrence. The sizes of the circles represent the count of the word. The colors of circles are the result of clustering of the words based on co-occurrence.

R Package

Text Analysis Analytics View uses the quanteda R Package under the hood.

Exploratory R Package

For details about quanteda usage in Exploratory R Package, please refer to the github repository

Last updated