Arrays

How to work with arrays in JMESPath

Introduction

An array, sometimes called a list, is a data type representing an ordered sequence of values of any type. Here's an example of an array:

["Striped T-shirt", "T-shirt", "Jacket"]

Arrays are a natural data format that you'll encounter often. In business environments, common examples are inventory lists, transactions, or sales. In JMESPath, the elements of an array are separated by commas, and the entire array is enclosed in square brackets.

What will I learn?

This guide will teach you to:

  • Retrieve data from an array.
  • Sort and filter arrays.
  • Update arrays.
  • Reformat arrays to adapt them to templates.

Prerequisites

You'll need access to a Documotor tenant and a basic understanding of the platform. Also, you should be familiar with the basics of working with objects.

Sample data

Consider the following dataset, consisting of two arrays that represent a clothes shop's offer and its sales:

{
  "offer": [
    {
      "product": "Striped T-shirt",
      "category": "T-shirt",
      "sizes": ["XS", "S", "M", "L", "XL", "XXL"],
      "colors": ["blue/white", "red/white", "green/white"],
      "price": 119.99
    },
    {
      "product": "Standard T-shirt",
      "category": "T-shirt",
      "sizes": ["XS", "S", "M", "L", "XL", "XXL"],
      "colors": ["blue", "red", "green", "white", "black"],
      "price": 99.99
    },
    {
      "product": "Winter jacket",
      "category": "Jacket",
      "sizes": ["S", "M", "L", "XL"],
      "colors": ["black", "red", "brown"],
      "price": 599.99
    }
  ],
  "sales": [
    {
      "product": "Standard T-shirt",
      "size": "S",
      "color": "blue"
    },
    {
      "product": "Standard T-shirt",
      "size": "L",
      "color": "white"
    },
    {
      "product": "Striped T-shirt",
      "size": "XXL",
      "color": "red/white"
    },
    {
      "product": "Winter jacket",
      "size": "L",
      "color": "brown"
    }
  ]
}

Let's analyze the structure of the dataset. The object containing all the data is split into two properties: offer and sales. The value of the offer key is an array of objects, each object representing one product of a clothes shop. Each product has five properties, two of which, sizes and colors, have array values.

The value of the sales key is also an array of objects, this one representing a list of items sold by the shop. Only the basic info for identifying the product is given there.

Retrieving data from arrays

After inspecting the data, we're ready to start working with transformations. We'll try to fetch some data from an array first.

Indexing, slices and wildcards

One way of retrieving data from an array is indexing, where we specify the index, i.e., the position of the element in the array, in square brackets following the name of the array. Indices start at zero, so the index of the last element is equal to the length of the array minus one. Negative indices are valid as well: the index -i refers to the ith element from the end of the array. Here's how to access some array elements using indices.

  // The entirety of the first item sold.
  theFirstSale: sales[0],

  // The color of the second item sold.
  theSecondSaleColor: sales[1].color,

  // Get the size of the last item sold.
  theLastSaleSize: sales[-1].size,
  
  // Get the biggest three sizes of the first clothing item on offer.
  bigSizesOfFirstItem: offer[0].sizes[-3:],
  
  // Get all the product names.
  productNames: offer[*].product

The sales[i] expression results in an object here, so we access its properties as usual, with subexpressions. The bigSizesOfFirstItem example contains a slice expression, which selects a portion of an array, and the productNames example contains a wildcard. A wildcard expression acts as all indices at the same time: it applies the expression that follows it to every element of the offer array, and returns the results as an array.
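For readers who want to check these results outside Documotor, here is a rough Python sketch of the same lookups. It is only an analogy, not Documotor code: the variable names are illustrative, and the data is a trimmed copy of the sample dataset. Python lists happen to share zero-based indexing, negative indices, and slice syntax with the expressions above, while a list comprehension plays the role of the wildcard.

```python
# Trimmed copy of the sample data (only the fields used below).
sales = [
    {"product": "Standard T-shirt", "size": "S", "color": "blue"},
    {"product": "Standard T-shirt", "size": "L", "color": "white"},
    {"product": "Striped T-shirt", "size": "XXL", "color": "red/white"},
    {"product": "Winter jacket", "size": "L", "color": "brown"},
]
offer = [
    {"product": "Striped T-shirt", "sizes": ["XS", "S", "M", "L", "XL", "XXL"]},
    {"product": "Standard T-shirt", "sizes": ["XS", "S", "M", "L", "XL", "XXL"]},
    {"product": "Winter jacket", "sizes": ["S", "M", "L", "XL"]},
]

# sales[0]: the entire first sale.
the_first_sale = sales[0]

# sales[1].color: index 1 is the second element.
second_sale_color = sales[1]["color"]  # "white"

# sales[-1].size: -1 is the last element.
last_sale_size = sales[-1]["size"]  # "L"

# offer[0].sizes[-3:]: a slice with the last three elements.
big_sizes_of_first_item = offer[0]["sizes"][-3:]  # ["L", "XL", "XXL"]

# offer[*].product: evaluate "product" on every element.
product_names = [item["product"] for item in offer]

print(product_names)
```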

Consider the problem of finding all colors offered among the items. Using a wildcard expression [*] similar to the one in the productNames example above will yield an array of arrays. Here's where the flatten operator [] comes in; it unpacks subarrays, making a single-level array of all the subarray elements. Finally, we can use the distinct function to avoid duplicates.

  // Get a list of all distinct colors that we use in our products.
  colors: distinct(offer[*].colors[])
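The three steps (wildcard, flatten, distinct) can be traced one at a time in a short Python sketch. This is an illustration rather than Documotor code, and it assumes that distinct keeps the first occurrence of each value in its original order, which the text above does not confirm.

```python
# Only the colors property of each product from the sample data.
offer = [
    {"product": "Striped T-shirt", "colors": ["blue/white", "red/white", "green/white"]},
    {"product": "Standard T-shirt", "colors": ["blue", "red", "green", "white", "black"]},
    {"product": "Winter jacket", "colors": ["black", "red", "brown"]},
]

# Step 1, offer[*].colors: one array of colors per product (an array of arrays).
color_lists = [item["colors"] for item in offer]

# Step 2, the flatten operator []: unpack the subarrays into a single-level array.
flattened = [color for sublist in color_lists for color in sublist]

# Step 3, distinct: drop duplicates.
# Assumption: first occurrences are kept in order (dict keys preserve insertion order).
distinct_colors = list(dict.fromkeys(flattened))

print(len(color_lists), len(flattened), len(distinct_colors))  # 3 11 9
```

Here "black" and "red" occur in two products each, so the eleven flattened values reduce to nine distinct colors.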

Filtering and current scope

We often need to fetch array elements that meet some criteria. This is done via filter expressions. Instead of an index, an expression that returns true or false is given, and a subarray of all the elements for which the expression evaluates to true is returned. Let's find all the items available in XXL.

  // Find the product names of items available in XXL.
  availableInXXL: offer[?contains(@.sizes, 'XXL')].product

Let's parse this.

  • The question mark ? indicates a filter expression is used, rather than a standard index.
  • The contains function returns true if the first array argument contains the second argument as an element.
  • @ is the current scope operator. The current scope in this case, as always in filter expressions, is set to each element of the queried array offer in turn - the truth value of the expression is checked for each element.
    • Since each element is an object, and we're interested in checking the sizes property of the object, we access it by @.sizes.
  • Thus, we receive a list of all elements of offer whose sizes property contains the element 'XXL'.
  • We specify the product key to only fetch the product names of the elements.
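As a point of comparison, the breakdown above maps closely onto a Python list comprehension. The sketch below is an analogy only (the names are illustrative, and the data is a trimmed copy of the sample dataset): the loop variable item stands in for @, the if clause stands in for the filter, and item["product"] stands in for the trailing .product.

```python
# Trimmed copy of the sample data (only the fields used below).
offer = [
    {"product": "Striped T-shirt", "sizes": ["XS", "S", "M", "L", "XL", "XXL"]},
    {"product": "Standard T-shirt", "sizes": ["XS", "S", "M", "L", "XL", "XXL"]},
    {"product": "Winter jacket", "sizes": ["S", "M", "L", "XL"]},
]

available_in_xxl = [
    item["product"]            # .product: keep only the product name
    for item in offer          # each element of offer becomes the current scope in turn
    if "XXL" in item["sizes"]  # contains(@.sizes, 'XXL')
]

print(available_in_xxl)  # ['Striped T-shirt', 'Standard T-shirt']
```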

You can find more information about filtering, the current scope operator, and much more on the Expressions page.

Modifying arrays

After accessing the array elements, it's also good to learn some ways to modify arrays. If a new sale is made, it needs to be added to the sales array:

  // Since the data was acquired, a new sale was made. Update the sales array.
  updatedSales: append(sales, {
    // String literals are enclosed in '' in JMESPath.
    "product": 'Winter jacket',
    "size": 'M',
    "color": 'red'
  })
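The result is stored under a new key, updatedSales, and the original sales array is still used alongside it in later examples. The Python sketch below mirrors that behavior under the assumption that append returns a new array and leaves its input unchanged; it is an illustration, not Documotor code.

```python
# Copy of the sales array from the sample data.
sales = [
    {"product": "Standard T-shirt", "size": "S", "color": "blue"},
    {"product": "Standard T-shirt", "size": "L", "color": "white"},
    {"product": "Striped T-shirt", "size": "XXL", "color": "red/white"},
    {"product": "Winter jacket", "size": "L", "color": "brown"},
]

new_sale = {"product": "Winter jacket", "size": "M", "color": "red"}

# List concatenation builds a new list, so sales itself is not modified
# (assumed to match how append behaves in a transformation).
updated_sales = sales + [new_sale]

print(len(sales), len(updated_sales))  # 4 5
```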

Here are a few other transformations that often come in handy. Note that the ampersand & denotes an expression. It is not strictly necessary, since technically all identifiers of variables are expressions, but it is sometimes used to emphasize the use of expressions and make the code easier to read.

  // One product from each category.
  differentCategories: distinct_by(offer, &category),

  // The offer has been expanded with a winter jacket in XXL. 
  // Create an array containing the new sizes of the Winter jacket.
  // Step 1: fetch the array of sizes of the Winter jacket.
  jacketSizes: offer[?product=='Winter jacket'].sizes[],

  // Step 2: Add XXL to the array of sizes of the Winter jacket.
  updatedJacketSizes: append($.jacketSizes, 'XXL'),


  // Get sales by product, i.e., group the sales entries concerning the same product type
  // Step 1: sort_by sorts the array by the product property
  sortedProducts: sort_by($.updatedSales, &product),

  // Step 2: use group_adjacent to group adjacent elements which have the same value of product
  salesByProduct: group_adjacent($.sortedProducts, &product),

  // Step 3: organize the structure of the object
  salesByProduct2: map({product: @[0].product, sales: @[*].{size: size, color: color}}, $.salesByProduct),


  // After selling many items, it's easier to consider sales per item with a given size and color.
  // Step 1: Create a bigger data set to see the transformation working.
  // The following transformation creates an array with 3 array elements and then flattens them out.
  tripledSales: [$.updatedSales, sales, sales][],

  // Step 2: Group sales per model, matching on all properties.
  groupedModels: group_adjacent(sort_by($.tripledSales, &to_string(@)), &@),

  // Step 3: Tidy up the data structure and add the number of items sold as a property.
  salesPerModel: map(merge(@[0], {sold: length(@)}), $.groupedModels),


  // Sort offer by price ascending.
  offerByPriceAscending: sort_by(offer, &price),

  // Sort offer by price descending, showing only product name and price.
  offerByPriceDescending: reverse($.offerByPriceAscending)[*].{product: product, price: price},

  // Find the combined price of sold items.
  // Step 1: join the sales data with the offer data to add prices to it.
  joinedData: list_join(offer, $.updatedSales, &product, &product),

  // Then each product from the left will have an array of such products sold on the right.
  // Step 2: The total sales value is the product of price from the left and 
  //      the length of the array of sold products on the right, 
  //      summed across all products.
  totalSales: sum(map(multiply(left[0].price, length(right)), $.joinedData))
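To make the expected results of two of these transformations concrete, here is a Python sketch of the "sales by product" grouping and the total sales value. It is an analogy under stated assumptions, not Documotor code: sorted followed by itertools.groupby stands in for sort_by followed by group_adjacent (groupby also only groups neighboring elements, which is why the sort comes first), and the join is replaced by counting the matching sales for each product. The exact output shape of list_join is not reproduced here.

```python
from itertools import groupby
from operator import itemgetter

# Trimmed offer data and the updated sales array (five sales, including the new one).
offer = [
    {"product": "Striped T-shirt", "price": 119.99},
    {"product": "Standard T-shirt", "price": 99.99},
    {"product": "Winter jacket", "price": 599.99},
]
updated_sales = [
    {"product": "Standard T-shirt", "size": "S", "color": "blue"},
    {"product": "Standard T-shirt", "size": "L", "color": "white"},
    {"product": "Striped T-shirt", "size": "XXL", "color": "red/white"},
    {"product": "Winter jacket", "size": "L", "color": "brown"},
    {"product": "Winter jacket", "size": "M", "color": "red"},
]

# Sales by product: sort so equal products are adjacent, then group neighbors.
sorted_sales = sorted(updated_sales, key=itemgetter("product"))
sales_by_product = [
    {
        "product": product,
        "sales": [{"size": s["size"], "color": s["color"]} for s in group],
    }
    for product, group in groupby(sorted_sales, key=itemgetter("product"))
]

# Total sales: price of each product times the number of matching sales, summed.
total_sales = sum(
    item["price"] * sum(1 for s in updated_sales if s["product"] == item["product"])
    for item in offer
)

print([group["product"] for group in sales_by_product])
print(round(total_sales, 2))  # 1519.95
```

With the five sales above, the total is 1 × 119.99 + 2 × 99.99 + 2 × 599.99 = 1519.95.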

If you want to read up in detail on any of these, check out the reference pages for array functions. Many expressions are used here as well - the expressions reference page documents them in more detail. Otherwise, learn more about operations on specific data types: