Introduction

Templates are frequently populated with text data. String functions make it easy to clean that text, transform it, and reformat it to fit neatly into templates.

What will I learn?

After following this guide, you'll be able to:

  • Find strings from a list that meet certain criteria.
  • Separate long strings by specifying separators, and put them back together.
  • Build formatted output strings by joining and interpolating values.
  • Fix casing of strings.

Prerequisites

You'll need access to a Documotor tenant and a basic understanding of the platform. Also, you should be familiar with JSON data types and JMESPath expressions, as well as the basics of working with arrays and objects.

Sample data

We'll use the following JSON document to demonstrate the basic ways of working with strings in JMESPath. You'll be able to copy it directly into your Documotor template and play around with it, or find the file and the transformations in the starter pack that you received with your tenant.

{
  "sampleString": "lorem ipsum dolor sit amet. phasellus sit amet mi lacinia, tincidunt velit non, hendrerit nunc."
}

Notice that strings in the JSON data are enclosed in double quotes "". In JMESPath, we sometimes define string literals: hard-coded strings that the user writes inside the transformation, rather than fetching them from the data. String literals are enclosed in single quotes ''.

Cleaning

The sample string is made up of two sentences which are not correctly capitalized. Let's start by fixing that:

  correctedCasing: to_sentencecase(sampleString),

to_sentencecase capitalizes the first word of a string and the first word after each period.
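
To make the expected result concrete, here is a rough Python model of that rule applied to the sample string. It is only a sketch under the assumption that nothing besides the first letter of each sentence changes; the helper name to_sentence_case is hypothetical and this is not how Documotor implements to_sentencecase.

```python
import re


def to_sentence_case(text):
    """Hypothetical model: uppercase the first letter of the text
    and the first letter that follows each period."""
    # Group 1 is either the start of the text or a period, plus any whitespace.
    # Group 2 is the lowercase letter that should be capitalized.
    return re.sub(
        r"(^\s*|\.\s*)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )


sample_string = (
    "lorem ipsum dolor sit amet. phasellus sit amet mi lacinia, "
    "tincidunt velit non, hendrerit nunc."
)
corrected_casing = to_sentence_case(sample_string)
print(corrected_casing)
# Lorem ipsum dolor sit amet. Phasellus sit amet mi lacinia, tincidunt velit non, hendrerit nunc.
```

Only the "l" of "lorem" and the "p" of "phasellus" change; words after commas stay lowercase.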

For analysis of words, it's usually helpful to split the string into an array of words. We'll work with the correctedCasing version of the input string.

  // Split the text into words, removing basic punctuation and spaces.
  // Consecutive separators (such as a period followed by a space) would produce empty strings,
  // but ignoreEmpty is set to true, so they are dropped.
  arrayOfWords: split_on($.correctedCasing, `true`, ' ', ',', '.'),
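
The following Python sketch models what this split is expected to produce, and shows why dropping empty strings matters. The Python function split_on below is a hypothetical stand-in that mirrors the argument order used above (text, ignoreEmpty flag, then separators); it is an assumption about the behavior, not Documotor's implementation.

```python
import re


def split_on(text, ignore_empty, *separators):
    """Hypothetical model: split text on any of the given separators."""
    # Build one pattern that matches any single separator, escaped literally.
    pattern = "|".join(re.escape(sep) for sep in separators)
    parts = re.split(pattern, text)
    # Adjacent separators (". " or ", ") leave empty strings between them.
    return [part for part in parts if part] if ignore_empty else parts


corrected_casing = (
    "Lorem ipsum dolor sit amet. Phasellus sit amet mi lacinia, "
    "tincidunt velit non, hendrerit nunc."
)

array_of_words = split_on(corrected_casing, True, " ", ",", ".")
with_empties = split_on(corrected_casing, False, " ", ",", ".")

print(array_of_words)
print(len(array_of_words), len(with_empties))
```

In this model the sample string yields 15 words. Without the ignoreEmpty flag, four extra empty strings would appear: one after each of the two commas and the two periods.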

Querying

Use filter expressions to find strings meeting some criteria. Here are some examples:

  // Find words equal to their lowercase versions, i.e., lowercase words.
  lowercaseWords: $.arrayOfWords[?@==to_lower(@)],
  
  // Find the words that are palindromes, equal to themselves in reverse.
  palindromes: $.arrayOfWords[?@==reverse(@)],
  
  // Get a list of words containing a lowercase or an uppercase p.
  wordsContainingP: $.arrayOfWords[?contains(to_upper(@), 'P')],

  // Substring fetches the first letter and ends_with checks if the ending is the same.
  wordsThatStartAndEndWithTheSameLetter: $.arrayOfWords[?ends_with(@, substring(@, `0`, `1`))],

These are standard filtering expressions, combined with some string functions:

  • to_lower(text) and to_upper(text) convert their arguments to lowercase and uppercase respectively.
  • reverse(text) reverses the string or array argument.
  • contains(arg, value) checks whether the first argument (array or string) contains the second argument (as an element or a substring respectively). In this case, we are fetching the words of arrayOfWords that, when converted to uppercase, contain 'P'. We convert everything to uppercase in order to capture both words that contain an uppercase 'P' and a lowercase 'p'.
  • substring(text, startIndex, substringLength) fetches the substring of text starting at startIndex, of length substringLength.
  • ends_with(text, substring) checks if text ends in substring, regardless of the length of substring.
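
To check your own results against something, the four filters can be modeled in Python as list comprehensions. This is a sketch that assumes arrayOfWords holds the 15 words of the sample string after sentence casing, and that all comparisons are case-sensitive; the variable names are illustrative only.

```python
# Assumed contents of arrayOfWords for the sample string.
array_of_words = [
    "Lorem", "ipsum", "dolor", "sit", "amet", "Phasellus", "sit", "amet",
    "mi", "lacinia", "tincidunt", "velit", "non", "hendrerit", "nunc",
]

# [?@==to_lower(@)]: keep words equal to their lowercase version.
lowercase_words = [w for w in array_of_words if w == w.lower()]

# [?@==reverse(@)]: keep words equal to themselves reversed.
palindromes = [w for w in array_of_words if w == w[::-1]]

# [?contains(to_upper(@), 'P')]: uppercase first, then look for 'P'.
words_containing_p = [w for w in array_of_words if "P" in w.upper()]

# [?ends_with(@, substring(@, `0`, `1`))]: the first letter is also the ending.
same_first_and_last = [w for w in array_of_words if w.endswith(w[0:1])]

print(palindromes)          # ['non']
print(words_containing_p)   # ['ipsum', 'Phasellus']
print(same_first_and_last)  # ['tincidunt', 'non']
```

Note that "Lorem" and "Phasellus" are the only words excluded from the lowercase list, and that "Phasellus" does not count as starting and ending with the same letter because "P" and "s" differ.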

Joining and string interpolation

Use the join function to combine an array of strings into a single string, with a specified separator placed between the elements of the array. The syntax is join(separator, array).


  // Using the join function to list the first and last words as part of a sentence.
  firstAndLastWord: join('', [
    'The first word was ', 
    $.arrayOfWords[0], 
    ', and the last word ',
    $.arrayOfWords[-1],
    '.'
  ]),

  // Perform the same action more elegantly with string_interpolate.
  firstAndLastWordBetter: string_interpolate(
    'The first word was {0}, and the last word {1}.', 
    [$.arrayOfWords[0],$.arrayOfWords[-1]]),

  // Make a title string from the list of lowercase words.
  title: to_titlecase(join(' ', $.lowercaseWords), 'en-US') 

As you can see, string_interpolate produces the same sentence as join, but more elegantly. You write the sentence once, mark the insertion points with numbered placeholders such as {0} and {1}, and pass the values to insert as an array argument.
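
The equivalence of the two approaches can be illustrated with a small Python sketch. Python's str.format happens to use the same {0}, {1} placeholder style, so it serves as a loose analogy for string_interpolate here; it is not a statement about how Documotor implements the function, and the word list is the assumed content of arrayOfWords.

```python
# Assumed contents of arrayOfWords for the sample string.
array_of_words = [
    "Lorem", "ipsum", "dolor", "sit", "amet", "Phasellus", "sit", "amet",
    "mi", "lacinia", "tincidunt", "velit", "non", "hendrerit", "nunc",
]
first_word = array_of_words[0]
last_word = array_of_words[-1]

# Joining with an empty separator: every fragment is listed explicitly.
joined = "".join(
    ["The first word was ", first_word, ", and the last word ", last_word, "."]
)

# Interpolation: the sentence is written once, with numbered placeholders.
interpolated = "The first word was {0}, and the last word {1}.".format(
    first_word, last_word
)

print(joined)
# The first word was Lorem, and the last word nunc.
print(joined == interpolated)
# True
```

The interpolated form keeps the sentence readable in one piece, which makes typos in spacing and punctuation easier to spot.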

Finally, the title example shows another casing function and another use of join. to_titlecase requires a culture argument (here 'en-US') to accommodate uppercase letters in different scripts. Whichever culture you pass, it capitalizes each word.
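
As a rough illustration of the title example, the Python sketch below joins the lowercase words with a space and title-cases the result. It assumes lowercaseWords holds the 13 lowercase words from the sample string. Python's str.title is not culture-aware, so it only approximates to_titlecase for plain English text.

```python
# Assumed contents of lowercaseWords for the sample string.
lowercase_words = [
    "ipsum", "dolor", "sit", "amet", "sit", "amet", "mi",
    "lacinia", "tincidunt", "velit", "non", "hendrerit", "nunc",
]

# join(' ', ...) puts a single space between the words,
# then title casing capitalizes the first letter of each one.
title = " ".join(lowercase_words).title()

print(title)
# Ipsum Dolor Sit Amet Sit Amet Mi Lacinia Tincidunt Velit Non Hendrerit Nunc
```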

Learn more

A detailed overview of all functions for working with strings is given in the reference page for String functions.