Create N-gram Tokens

How to Access This Feature

From + (plus) Button

How to Use?

  • Select a column that has tokenized text - Set a column that has tokens. This is "token" column if it's tokenized by do_tokenize function.

  • Select a column as a document - A column considered as document id. If you run do_tokenize beforehand, this can be document_id.

  • Select a column as sentence id - A column considered as sentence id in a document. If you run do_tokenize beforehand, this can be sentence_id.

  • Max # of Tokens to be Concatenated (Optional) - The default is 2. Maximum number of tokens to be connected.

  • Text to Concatenate Words (Optional) - The default is "_". Character to be used to connect ngrams.

    )

Last updated