JOLT Guide

Index

  1. DISCLAIMER
  2. What is JOLT?
  3. How to start?
  4. How to use the JOLT window
  5. How to write a JOLT
  6. Beta Operations
  7. Exercises

DISCLAIMER

This manual serves as a handy reference for anyone encountering the JOLT processor in NiFi.

After working with this tool, I noticed that the online documentation is very poor: there is no well-written reference that fully covers what the tool can do.

This guide is provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the author or copyright holders be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the guide or the use or other dealings in the guide.
The author has made every effort to ensure the accuracy and completeness of the information in this guide; however, he cannot guarantee that it is free of errors. The author shall not be responsible for any loss of profit or any other commercial damages resulting from the use of this guide.

Contributions are welcome!
If you're keen to enhance this documentation, I encourage you to submit a pull request. Please note that by doing so, you agree to license your contributions under the same terms as the original project. I look forward to seeing your contributions and appreciate your efforts to enrich this guide. Share your expertise and insights freely!


I hope you find it useful!

What is JOLT?

JOLT stands for JSON to JSON Transformation. It is a powerful library and processor in NiFi that allows for the transformation and manipulation of JSON data. With JOLT, you can define a set of transformation rules to reshape, filter, and modify JSON objects, making it easier to work with JSON data in various applications and systems.

How to start?

To write a JOLT specification in NiFi, you first need a processor to write it in.
The processor in question is
JoltTransformJSON:

After opening the processor's configuration, you will have two options:

I highly recommend using the second option, as it provides a section for an input and output sample, and it offers a larger writing window.

And now everything is ready to go!

How to use the JOLT window

Processor properties

Advanced settings

The JOLT advanced settings window provides specific tools that assist in constructing and debugging JOLT specifications.
Here is the window:

Now let’s analyze the function of each component:

How to write a JOLT

JOLT specification

In the realm of data transformation, particularly within the NiFi environment, the JOLT specification emerges as a powerful tool.

It's designed to offer a declarative manner of specifying JSON-to-JSON transformations, making it an essential utility for handling JSON data effectively.

The JOLT specification (or spec) comprises several key components, each serving a unique function in the transformation process.

The structure of a spec is something like this:

[
    {
      "operation":"...",//here goes the operation we want to use (instead of '...')
      "spec":{
        ...//body of the spec
      }
    }
  ]

Here we can see that the spec itself is written in JSON: more specifically, it is an array of JSON objects.

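We see two keywords: "operation", which names the operation to apply, and "spec", which contains the body of that operation.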

It is possible to chain several operations in a single specification; all that is needed is a comma “,” separating one operation from the next:

[
    {
      "operation":"...",//here goes the operation we want to use (instead of '...')
      "spec":{
        ...//body of the first spec
      }
    },
    {
      "operation":"...",//here goes the operation we want to use (instead of '...')
      "spec":{
        ...//body of the second spec
      }
    },
    ...//other specs
  ]
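
Because a spec is just a JSON array of operation objects, it can also be assembled programmatically before pasting it into the processor. The following is a minimal Python sketch, not part of JOLT or NiFi: the helper name build_chain is hypothetical, and the two operations used in it are only placeholders for a real chain.

```python
import json

def build_chain(*operations):
    """Return a JOLT-style chain: a list of {"operation": ..., "spec": ...} objects."""
    chain = []
    for name, spec in operations:
        step = {"operation": name}
        if spec is not None:  # an operation written without a body gets no "spec" key
            step["spec"] = spec
        chain.append(step)
    return chain

# Two chained operations: the output of the first is the input of the second.
chain = build_chain(
    ("shift", {"customer": {"name": "CustomerName"}}),
    ("sort", None),
)
spec_text = json.dumps(chain, indent=2)
print(spec_text)
```

Serializing with json.dumps guarantees that the commas and brackets between operations are always well-formed, which is the most common source of errors when chaining by hand.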

JOLT operations

The JOLT transformation tool comprises several primary operations, each designed for specific aspects of JSON transformation:

  1. Shift:
    • Overview: This is the most commonly used operation in JOLT. It enables the reorganization or 'shifting' of data from the input JSON to a new structure in the output JSON. It's particularly useful for changing the hierarchy of data or renaming keys.
    • Usage: On the left side we put the key of the value we want to move; on the right side we put the name and position (path) where that value should be written in the output.
    • Examples:
    {
        "customers": {
          "name1": "Ross",
          "surname1": "Bing",
          "email1": "BgRoss@email.com",
          "birthDate1": "14/06/1996",
          "name2": "Chandler",
          "surname2": "Geller",
          "email2": "GLer@email.com",
          "birthDate2": "01/04/1987"
        }
      }

    In this case, we want to reorganize the JSON to make it more readable, so the goal is to create a Customer1 and a Customer2, each containing the name, surname and email.

    So we expect an output like this:

    {
        "Customer1" : {
          "name" : "Ross",
          "surname" : "Bing",
          "email" : "BgRoss@email.com"
        },
        "Customer2" : {
          "name" : "Chandler",
          "surname" : "Geller",
          "email" : "GLer@email.com"
        }
      }

    To obtain this result, we need to shift the elements in the json:

    [
        {
          "operation": "shift",
          "spec": {
            //We want to enter the dictionary with the key called 'customers'(because
            //all the data are in there), and to do so, we have to write it like this
            "customers": {
              //Now that we are in the 'customers' we want to create a 'Customer1' and
              //a 'Customer2' containing what we said before.
              //Writing "value":"something.else" we are saying that we want to put the
              //"value" in a dictionary named "something" with a key "else"
              "name1": "Customer1.name",
              "surname1": "Customer1.surname",
              "email1": "Customer1.email",
              "name2": "Customer2.name",
              "surname2": "Customer2.surname",
              "email2": "Customer2.email"
            }
          }
        }
      ]
    This operation could have been written more simply using * and &, as we will see later.

    Another function of the shift operation is to remove undesired fields from a JSON, selecting only the desired ones.
    For example:

    {
        "customer": {
          "name": "Gunter",
          "surname": "Tribiani",
          "email": "TrEr@email.com",
          "birthDate": "12/11/1973",
          "address": "8392 West Penn Avenue",
          "gender": "M",
          "phone-number": "555-87924"
        }
      }

    And we only want the name, like this:

    {
        "CustomerName" : "Gunter"
      }

    The spec is very simple:

    [
        {
          "operation": "shift",
          "spec": {
            "customer": {
              "name": "CustomerName"
            }
          }
        }
      ]
    We could have used the remove operation instead, but this way is much easier: as we will see later, remove requires listing every field we want to drop. In cases like this, where most of the fields have to go, it is quicker and more efficient to use shift.

    To better understand how this operation works, it helps to read each line of the spec from right to left.
    For example:

    {
    ...
    "value": "path_to_the.key"
    ...
    }
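
To make the "value goes to path" idea concrete, here is a rough Python sketch of what the simplest form of shift does. It is an illustration only, not JOLT's implementation: it assumes literal keys on the left and dot-separated output paths on the right, with no wildcards or arrays, and the function name simple_shift is made up for this guide.

```python
def simple_shift(data, spec, output=None):
    """Copy values from `data` to dotted paths in `output`, guided by `spec`."""
    if output is None:
        output = {}
    for key, target in spec.items():
        if not isinstance(data, dict) or key not in data:
            continue  # keys missing from the input are simply skipped
        if isinstance(target, dict):
            # Nested spec: descend into the matching input object.
            simple_shift(data[key], target, output)
        else:
            # String target: create the dotted path, then place the value at its end.
            *parents, leaf = target.split(".")
            node = output
            for part in parents:
                node = node.setdefault(part, {})
            node[leaf] = data[key]
    return output

source = {"customer": {"name": "Gunter", "surname": "Tribiani", "gender": "M"}}
spec = {"customer": {"name": "Customer.name", "surname": "Customer.surname"}}
result = simple_shift(source, spec)
print(result)  # {'Customer': {'name': 'Gunter', 'surname': 'Tribiani'}}
```

Note that "gender" does not appear in the result: anything the spec does not mention is left behind, which is why shift doubles as a way to select fields.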

  1. Default:
    • Overview: The Default operation is used to ensure that certain fields in the output JSON always have a value. If a key is present in the Default specification but not in the input JSON, it will be added to the output with the specified value.
    • Usage: It’s very easy to use: simply write each key you want to guarantee, together with the value it should get when it is missing.
    • Example:
    {
        "Name": "George",
        "Surname": "Kramer"
      }

    Let’s say we want to add more details to the JSON, such as lists, numbers or dictionaries, and make it look like this:

    {
        "Name" : "George",
        "Surname" : "Kramer",
        "Favorite things" : {
          "number" : -4,
          "colors" : [ "Red", "Blue", "Yellow" ]
        },
        "Age" : 35
      }

    So the spec will be this:

    [
        {
          "operation": "default",
          "spec": {
            "Age": 35,
            "Favorite things": {
              "number": -4,
              "colors": [
              "Red",
              "Blue",
              "Yellow"
            ]
            }
          }
        }
      ]
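
The "only add what is missing" rule can be sketched in a few lines of Python. This is a simplified illustration under the assumptions stated in the comments, not JOLT's own code, and apply_default is a hypothetical name.

```python
import copy

def apply_default(data, spec):
    """Add keys from `spec` that are missing in `data`; existing values are kept."""
    for key, default_value in spec.items():
        if key not in data:
            # Copy the default so later changes to the output never touch the spec.
            data[key] = copy.deepcopy(default_value)
        elif isinstance(data[key], dict) and isinstance(default_value, dict):
            # Assumption of this sketch: nested objects are filled in recursively.
            apply_default(data[key], default_value)
    return data

person = {"Name": "George", "Surname": "Kramer", "Age": 40}
spec = {"Age": 35, "Favorite things": {"number": -4}}
result = apply_default(person, spec)
print(result)
```

Here "Age" keeps its input value of 40 because the key already exists, while "Favorite things" is added because it was missing.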

  1. Remove:
    • Overview: This operation allows for the removal of specified keys and their values from the JSON data. It's useful for cleaning up the data by getting rid of unnecessary or sensitive information.
    • Usage: This operation is really easy to use, since there is only one way to use it: put the name of the field to remove on the left side, and an empty string (“”) on the right side.
    • Example:
    {
        "Person": {
          "Name": "Roy",
          "Surname": "Moss",
          "Pets": {
            "Dogs": 2,
            "Cats": 1
          }
        },
        "Numbers": [
          1,
          2,
          3
        ]
      }

    We want to remove everything except for the name.

    Like this:

    {
        "Person" : {
          "Name" : "Roy"
        }
      }

    So we simply remove everything we don’t need:

    [
        {
          "operation": "remove",
          "spec": {
            "Person": {
              "Surname": "",
              "Pets": ""
            },
            "Numbers": ""
          }
        }
      ]
    To achieve the same result, we could have used the shift operator. In these cases, it is recommended to use remove only when there is a necessity to remove one or only a few elements from the JSON.
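
As a rough mental model, the remove spec above behaves like the following Python sketch. It is only an illustration (apply_remove is a hypothetical helper, and wildcards are not handled), but it reproduces the example's output.

```python
def apply_remove(data, spec):
    """Delete from `data` every key that `spec` maps to a non-dict value."""
    for key, rule in spec.items():
        if key not in data:
            continue  # nothing to remove
        if isinstance(rule, dict) and isinstance(data[key], dict):
            # A nested spec means: keep this object, but remove some of its fields.
            apply_remove(data[key], rule)
        else:
            del data[key]
    return data

source = {
    "Person": {"Name": "Roy", "Surname": "Moss", "Pets": {"Dogs": 2, "Cats": 1}},
    "Numbers": [1, 2, 3],
}
spec = {"Person": {"Surname": "", "Pets": ""}, "Numbers": ""}
result = apply_remove(source, spec)
print(result)  # {'Person': {'Name': 'Roy'}}
```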

  1. Sort:
    • Overview: The Sort operation arranges the keys of the JSON data in alphabetical order. This is especially helpful for ensuring consistent output formats and improving readability.
    • Usage: This operation has the simplest syntax of all the operations: you only need to write the name of the operation and that’s it!
    • Example:
    {
        "People": {
          "2": {
            "Surname": "Wolowitz",
            "Name": "Sheldon"
          },
          "3": {
            "Surname": "Cooper",
            "Name": "Leonard"
          },
          "1": {
            "Surname": "Bloom",
            "Name": "Howard"
          }
        }
      }

    We want to sort the numbered keys inside People from smallest to largest.

    The output should be like this:

    {
        "People" : {
          "1" : {
            "Name" : "Howard",
            "Surname" : "Bloom"
          },
          "2" : {
            "Name" : "Sheldon",
            "Surname" : "Wolowitz"
          },
          "3" : {
            "Name" : "Leonard",
            "Surname" : "Cooper"
          }
        }
      }

    So we simply write this:

    [
        {
          "operation": "sort"
        }
      ]
    This operation is powerful but limited. It is not possible to sort by value, only by key, and it sorts every key in the JSON. In fact, if you look at the example, Name and Surname changed position because of the sorting.
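
The "sorts every key, at every level" behavior can be pictured with this short Python sketch. It is an approximation for illustration (sort_keys is a made-up name, and JOLT's exact ordering rules may differ in edge cases); it sorts keys as plain strings.

```python
def sort_keys(value):
    """Return a copy of `value` with the keys of every nested object sorted."""
    if isinstance(value, dict):
        return {key: sort_keys(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        # Lists keep their order; only objects inside them are sorted.
        return [sort_keys(item) for item in value]
    return value

source = {
    "People": {
        "2": {"Surname": "Wolowitz", "Name": "Sheldon"},
        "3": {"Surname": "Cooper", "Name": "Leonard"},
        "1": {"Surname": "Bloom", "Name": "Howard"},
    }
}
result = sort_keys(source)
print(list(result["People"]))       # ['1', '2', '3']
print(list(result["People"]["1"]))  # ['Name', 'Surname']
```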

  1. Cardinality:
    • Overview: This is used to change the "many-ness" of elements within the input JSON. In simpler terms, it allows you to specify how many instances of a particular element should exist in the output JSON.
    • Usage: There are two usages of this operation: with the keyword “MANY” it wraps a value in a list (if it is not a list already), and with the keyword “ONE” it extracts the first element of a list.
    • Example:
    {
        "Clients": {
          "1": {
            "ID": "123"
          },
          "2": {
            "ID": "456"
          },
          "3": {
            "ID": "789"
          }
        },
        "RandomList": [
          1,
          2,
          3,
          "4",
          false
        ]
      }

    Let’s say we want to put Clients in a list and we want to extract the first element of RandomList.
    So the output should look like this:

    {
        "RandomList" : 1,
        "Clients" : [ {
          "1" : {
            "ID" : "123"
          },
          "2" : {
            "ID" : "456"
          },
          "3" : {
            "ID" : "789"
          }
        } ]
      }

    To obtain this, we will use both of the keywords, like this:

    [
        {
          "operation": "cardinality",
          "spec": {
            "Clients": "MANY",
            "RandomList": "ONE"
          }
        }
      ]
    This operation is primarily designed for changing the structure of data elements (e.g., converting a single item to a list or vice versa). It doesn't restructure or reformat the keys into list elements directly.
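
The two keywords can be summarized with the Python sketch below. It is a simplified illustration, not JOLT's implementation: apply_cardinality is a hypothetical name, only top-level keys are handled, and the result for an empty list is an assumption of this sketch.

```python
def apply_cardinality(data, spec):
    """Apply "ONE" / "MANY" rules from `spec` to the top-level keys of `data`."""
    for key, rule in spec.items():
        if key not in data:
            continue
        value = data[key]
        if rule == "ONE" and isinstance(value, list):
            # Keep only the first element (None for an empty list: an assumption).
            data[key] = value[0] if value else None
        elif rule == "MANY" and not isinstance(value, list):
            # Wrap a single value in a one-element list.
            data[key] = [value]
    return data

source = {
    "Clients": {"1": {"ID": "123"}, "2": {"ID": "456"}},
    "RandomList": [1, 2, 3, "4", False],
}
result = apply_cardinality(source, {"Clients": "MANY", "RandomList": "ONE"})
print(result)
```

The whole Clients object ends up as the single element of a list, which matches the output shown above: its keys are not turned into separate list elements.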

  1. Modify-default-beta: The Modify-default-beta operation in JOLT allows you to add new key-value pairs to your JSON or to provide default values for existing keys.
  1. Modify-overwrite-beta: This operation is used for modifying existing values or adding new ones to your JSON data.
Later, we will dedicate a section solely to the last two operations, which are both very powerful and useful.

Each operation is unique, and the last two are the most peculiar. But before looking at them, let's discuss another important subject of JOLT: the wildcards.

Wildcards

Wildcards in JOLT are special symbols used in JOLT specifications to generalize the matching of keys or indices in JSON data. They add a layer of flexibility and power to JOLT transformations, allowing you to write more dynamic and adaptable specifications.

Each wildcard has one or more functionalities, depending on the side of the transformation where it is placed.

When we refer to the sides of the transformation, we mean the left-hand side (LHS) and the right-hand side (RHS), like this:

"LHS":"RHS"
  //or
  "LHS":{...}

Now let's take a look at each wildcard and its usage:

  1. Asterisk (*)
    • Usage: LHS
    • Operations: All
    • Function: The asterisk is the most common wildcard, representing any number of characters. It's used to match any key at a particular level in the JSON hierarchy.
    • Example: In a transformation, specifying "*" would match any key at that level. This is particularly useful when you want to apply the same transformation to multiple keys without knowing their exact names.

    Input:

    {
        "name": "Walden",
        "surname": "Harper",
        "age": 43,
        "Gender": "M"
      }

    We want to put everything into another key named “Customers”, so we do:

    [
        {
          "operation": "shift",
          "spec": {
            "*": "Customers.&"
            //without the '&' here, the result would've been an array
            //like this: "Customers":["Walden", "Harper", 43, "M"]
            //We will see later why
          }
        }
      ]

    And we obtain:

    {
        "Customers" : {
          "name": "Walden",
          "surname": "Harper",
          "age": 43,
          "Gender": "M"
        }
      }

    Another usage of the “*” is matching only part of a key name.

    So if we have this as our input:

    {
        "color0": "blue",
        "color1": "red",
        "color2": "black",
        "color": 3,
        "anotherField": 12
      }

    And we want only the keys made of the prefix “color” followed by a number, we can search for “color*”, which means: take every field that starts with the string “color” followed by any other characters (at least one).

    So if we wrote the following spec:

    [
        {
          "operation": "shift",
          "spec": {
            "color*": "&"
            //without the '&' here, the result would've been an array
            //like this: [ "blue", "red", "black" ]
            //We will see later why
          }
        }
      ]

    We will get the following output:

    {
        "color0" : "blue",
        "color1" : "red",
        "color2" : "black"
      }
    Note that the element "color": 3 is not taken: we are searching for “color” + “*”, and “*” cannot match an empty string.
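
The matching rule described above (a * must stand for at least one character) can be written as a small regular expression. The Python sketch below only mirrors the behavior described in this guide; star_match is a hypothetical helper and not a JOLT function.

```python
import re

def star_match(pattern, key):
    """True if `key` matches `pattern`, where each * stands for one or more characters."""
    # Escape the literal parts and put "(.+)" wherever the pattern had a *.
    regex = "^" + "(.+)".join(re.escape(part) for part in pattern.split("*")) + "$"
    return re.match(regex, key) is not None

keys = ["color0", "color1", "color2", "color", "anotherField"]
matched = [key for key in keys if star_match("color*", key)]
print(matched)  # ['color0', 'color1', 'color2']
```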

  1. At Sign (@)
    • Usage: LHS and RHS
    • Operations:
      • Shift
      • Modify-default-beta
      • Modify-overwrite-beta
    • Function: The at sign is used to reference the value of the current element being processed. It allows you to use the value of the current element in your transformation.
    • Example: In a shift operation, "@(n,key)" references the value of key, looked up n levels above the current position. Here n is the number of levels we want to go up from the current context, plus 1.
    The +1 is there because the counting starts from 1 and not from 0, so to stay on the same level, n is 1.

    Input:

    {
        "First_name": "Lisa",
        "Student": {
          "Number": "3618",
          "Recognition_by": "Student_id",
          "Last_name": "Flanders"
        }
      }

    Let's say we want to restructure the JSON in a more readable way. For example, we want to obtain the following output:

    {
        "Student" : {
          "Name" : "Lisa",
          "Surname" : "Flanders",
          "Student_id" : "3618"
        }
      }

    As we can see, we made these operations:

    1. Shifted the First_name in the Student dictionary.
    1. Took the value of Recognition_by and made it a key with the value of Number.

    So the spec will be this:

    [
        {
          "operation": "shift",
          "spec": {
            "Student": {
              //We want to go out of Student and take "First_name", so we
              //go up by one level, therefore n = 1 + 1 ->2
              "@(2,First_name)": "Student.Name",
              "Last_name": "Student.Surname",//Without this we will lose the Last_name
              //And now we want to name a key as one of the values we already have,
              //like in this case we want the value of "Recognition_by" to be the key
              //of the "Number" value.
              "Number": "Student.@(1,Recognition_by)"
            }
          }
        }
      ]

    There are two @ in the spec. The first one has to exit from the context it’s in (Student), so it goes up 1 level → n=2.
    The second one needs to remain in the same context (technically 0 levels up) → n=1.

  1. Hash Sign (#)
    • Usage: LHS and RHS (only in shift operation)
    • Operations:
      • Shift
      • Modify-default-beta
      • Modify-overwrite-beta
    • Function: The hash sign has two functionalities:
      1. on the LHS it is used to assign a default value to a field;
      1. on the RHS, written inside square brackets as in [#n], it builds an array index from the number of matches found so far.

      It's useful for operations where you want to add a constant to the output or turn repeated matches into the elements of an array.

    • Example: First let’s see an LHS example:
    {
        "Workers": 15
      }

    We want to add more information to the JSON using the shift operation, so the output will look something like this:

    {
        "Office" : {
          "City" : "Scranton",
          "Employees" : 15
        }
      }

    So we use this spec:

    [
        {
          "operation": "shift",
          "spec": {
            "Workers": "Office.Employees",
            //Here we wrote # followed by the value we wanted
            "#Scranton": "Office.City"
          }
        }
      ]
    Basically, it mimics the default operation inside the shift operation.

    Now let’s see an example with the wildcard on the RHS:

    In a transformation, "[#n]" goes up the hierarchy by the given number of levels, asks that node how many matches it has had so far, and uses that count as the array index.

    So for example:

    {
        "Employers": [
          {
            "Name": "Michael",
            "Role": "Manager"
          },
          {
            "Name": "David",
            "Role": "CTO"
          }
        ],
        "Employees": [
          {
            "Name": "Oscar",
            "Role": "Accounting"
          },
          {
            "Name": "Creed",
            "Role": "Quality assurance"
          },
          {
            "Name": "Jim",
            "Role": "Salesman"
          }
        ]
      }

    And let’s say we want to obtain the following output:

    {
        "Office" : [ {
          "Company" : "Munder Difflin",
          "Name" : "Oscar",
          "Role" : "Accounting"
        }, {
          "Company" : "Munder Difflin",
          "Name" : "Creed",
          "Role" : "Quality assurance"
        }, {
          "Company" : "Munder Difflin",
          "Name" : "Jim",
          "Role" : "Salesman"
        } ]
      }

    We took only the records from Employees and then added the default Company field.

    To do this, we used both the LHS and RHS # wildcard:

    [
        {
          "operation": "shift",
          "spec": {
            "Employees": {
              "*": {
                "Name": "Office[#2].&",
                "Role": "Office[#2].&",
                // We are saying to create a dictionary with the key 'Office' and
                // the value set as an array([]), then we use the # in the brackets to
                // tell how many levels it has to go up for collecting the values.
                // In this case we put 2, considering level 1 would have grouped
                // all the fields together on the lower level.
                "#Munder Difflin": "Office[#2].Company"
                // With this we assign to every record a default value Company
              }
            }
          }
        }
      ]

    Understanding how levels work in this case may not be very simple. To grasp the concept better, I personally suggest trying out some examples by putting different numbers after the # sign and observing the changes.

  1. Ampersand (&)
    • Usage: RHS
    • Operations:
      • Shift
      • Modify-default-beta
      • Modify-overwrite-beta
    • Function: The ampersand references the key matched on the LHS. Used on the RHS, it writes the value to the output under the same key name it had in the input.
    • Example:

      Let’s look at an easy example to better understand this wildcard:

      {
          "Example": {
            "Name": "Thomas",
            "Surname": "Jeremy",
            "Age": 84
          }
        }

      And now, suppose we only want certain information. The output might look like this:

      {
          "Age" : 84,
          "Surname" : "Jeremy"
        }

      We can obtain this output by simply writing “Key”:“Key” for every value we want, but there is a simpler way to do it by using the & :

      [
          {
            "operation": "shift",
            "spec": {
              "Example": {
                "Age": "&",
                "Surname": "&",
                // The notation "Age": "&" above could have been written as "Age":"Age"
                // without changing the output.
                // By using "&", the spec searches for a key with the same name as the
                // one on the LHS (Age and Surname).
                // If it finds a match, it assigns the same key name.
                "fullname": "&"
                // Here instead, we tell the spec to search for a key called "fullname"
                // but we can see that that key doesn't exist, so therefore, it writes
                // nothing.
              }
            }
          }
        ]
        

      The & is primarily used in conjunction with the *, as it processes each element using only a single line.

      For example if we had this input:

      {
          "Data": [
            {
              "0": "Name"
            },
            {
              "1": "Surname",
              "2": "Fullname"
            },
            {
              "3": "Age",
              "4": "Email",
              "5": "Phone number"
            },
            {
              "6": "Gender"
            }
          ]
        }

      And we want an array with all the information like this:

      [ "Name", "Surname", "Fullname", "Age", "Email", "Phone number", "Gender" ]

      We could manually write each line, but the input may change or grow to include many more elements.

      So we can take advantage of the combination of * and & and write this spec:

      [
          {
            "operation": "shift",
            "spec": {
              "Data": {
                "*": {
                  "*": "[&]"
                  // This line is the same as if someone had put manually every single
                  // Key:Key association.
                  // Take note that without the [] the output would have been a json with:
                  // {"0":"Name", "1":"Surname", ...}
                }
              }
            }
          }
        ]
        
      It’s possible to use &n, where n indicates the number of levels you want to go up in the hierarchy.
      For example, if we had written
      “*”:“[&1]” in the previous spec, the output would have been like this:

      ["Name",[ "Surname", "Fullname" ], [ "Age", "Email", "Phone number" ],"Gender"]

  1. Dollar Sign ($)
    • Usage: LHS
    • Operations: Shift
    • Function: The dollar sign is used to extract the name of a field or object from the input JSON and use it as the value in the output JSON.
    • Example:

      We have this JSON:

      {
          "Info": {
            "Name": "Marshal",
            "Surname": "Aldrin",
            "Age": 28,
            "Work": "Lawyer"
          }
        }

      And we want a list of all the fields, like this:

      [ "Name", "Surname", "Age", "Work" ]

      The spec is pretty simple:

      [
          {
            "operation": "shift",
            "spec": {
              "Info": {
                "*": {
                  "$": "[]"
                  // Here we say to put in a list every Key in the "Info" Json
                }
              }
            }
          }
        ]
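
In plain programming terms, the effect of this particular spec is simply "collect the keys of Info, in order". The Python lines below show that equivalent result; they illustrate the outcome only, not how JOLT evaluates the $ wildcard, and keys_as_list is a made-up name.

```python
def keys_as_list(data, parent_key):
    """Return the field names found under `parent_key`, in their original order."""
    return list(data[parent_key].keys())

source = {"Info": {"Name": "Marshal", "Surname": "Aldrin", "Age": 28, "Work": "Lawyer"}}
fields = keys_as_list(source, "Info")
print(fields)  # ['Name', 'Surname', 'Age', 'Work']
```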

Beta Operations

Previously, we observed two special JOLT operations, known as beta operations. Let's explore what they are and why they are considered special:

  1. Modify-default-beta:
    • Description: The modify-default-beta operation offers enhanced capabilities compared to the default operation. It navigates into the pre-existing elements of the input JSON, including every item of an array, and inserts new elements where they are missing, enriching the data without altering the original schema unnecessarily.

    • Example: Consider the following input JSON that lists doctors along with the average number of patients they see:
      {
          "Doctors": [
            {
              "Fullname": "Gregory Wilson",
              "Average Patients": 5
            },
            {
              "Fullname": "James Cuddy",
              "Average Patients": 2
            },
            {
              "Fullname": "Lisa House",
              "Average Patients": 4
            }
          ]
        }

      Suppose we need to append a "Hospital" attribute to each doctor's record, with the value "Princeboro-Plainston" for each. The desired output would look like this:

      {
          "Doctors" : [ {
            "Fullname" : "Gregory Wilson",
            "Average Patients" : 5,
            "Hospital" : "Princeboro-Plainston"
          }, {
            "Fullname" : "James Cuddy",
            "Average Patients" : 2,
            "Hospital" : "Princeboro-Plainston"
          }, {
            "Fullname" : "Lisa House",
            "Average Patients" : 4,
            "Hospital" : "Princeboro-Plainston"
          } ]
        }

      To achieve this transformation, the modify-default-beta operation allows us to seamlessly introduce the new "Hospital" attribute into each element of the "Doctors" array, maintaining the integrity of the original data structure while adding the required new information.
      Like this:

      [
          {
            "operation": "modify-default-beta",
            "spec": {
              "Doctors": {
                "*": {
                  "Hospital": "Princeboro-Plainston"
                }
              }
            }
          }
        ]
      We could have used a shift operation instead, but modify-default-beta is quicker and easier here.
      The shift would have looked something like this:

      [
          {
            "operation": "shift",
            "spec": {
              "Doctors": {
                "*": {
                  "*": "Doctors[&1].&0",
                  "#Princeboro-Plainston": "Doctors[&1].Hospital"
                }
              }
            }
          }
        ]
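
For comparison, the effect of the modify-default-beta spec on the Doctors example corresponds to the Python sketch below. It is a simplified illustration with a hypothetical helper name (add_default_to_each): the new key is added to every element of the array, and an element that already has the key keeps its own value.

```python
def add_default_to_each(items, key, value):
    """Add `key: value` to every object in `items` that does not have `key` yet."""
    for item in items:
        item.setdefault(key, value)
    return items

doctors = [
    {"Fullname": "Gregory Wilson", "Average Patients": 5},
    {"Fullname": "James Cuddy", "Average Patients": 2, "Hospital": "Other"},
]
result = add_default_to_each(doctors, "Hospital", "Princeboro-Plainston")
print(result)
```

The second record keeps "Other" because a default never replaces a value that is already present; replacing existing values is the job of modify-overwrite-beta.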

  1. Modify-overwrite-beta:
    • Description: The modify-overwrite-beta operation allows for the modification of existing values within the input JSON. This operation differs from modify-default-beta by providing the capability to overwrite existing elements. It is particularly useful for updating information directly, enabling changes to be applied to specific data points within your JSON structure without affecting the overall schema.
      In addition, this operation comes with the possibility to use supportive functions to access and modify the JSON structure.

    • Example: Let's consider an input JSON that catalogues members of a family and their respective occupations:
      {
          "Family": [
            {
              "Name": "Peter Swanson",
              "Occupation": "Safety Inspector"
            },
            {
              "Name": "Lois Brown",
              "Occupation": "Piano Teacher"
            },
            {
              "Name": "Stewie Quagmire",
              "Occupation": "Student"
            }
          ]
        }

      Now we want to update the Occupation of the third element. Let’s say we expect something like this:

      {
          "Family" : [ 
          {
            "Name" : "Peter Swanson",
            "Occupation" : "Safety Inspector"
          }, 
          {
            "Name" : "Lois Brown",
            "Occupation" : "Piano Teacher"
          }, 
          {
            "Name" : "Stewie Quagmire",
            "Occupation" : "Scientist"//<--- the change from Student to Scientist
          } 
          ]
        }

      To do this, we simply access the wanted element and overwrite the field:

      [
          {
            "operation": "modify-overwrite-beta",
            "spec": {
              "Family": {
                "[2]": {
                  "Occupation": "Scientist"
                }
              }
            }
          }
        ]



      Now let’s take a look at the support functions mentioned before:

      • Numbers functions
        • =intSum()
          • Parameters: 1 List of integer numbers
          • Return: the sum of every number in the list as an integer
          • Example:
            "test": "=intSum(2,3,4,21)"
              //Output:
              "test" : 30
        • =doubleSum()
          • Parameters: 1 list of double numbers
          • Return: the sum of every number in the list as a double
          • Example:
            "test": "=doubleSum(2.5,3.2,4.1,20.2)"
              //Output:
              "test" : 30.0
        • =avg()
          • Parameters: 1 list of numbers
          • Return: the average of the list as a double
          • Example:
            "test": "=avg(-2.1,40,3.1,110,-1)"
              //Output:
              "test" : 30.0
        • =sort()
          • Parameters: 1 list of numbers
          • Return: the parameter list sorted in ascending order
          • Example:
            "test": "=sort(2,4,1,-5,102,6)"
              //Output:
              "test" : [ -5, 1, 2, 4, 6, 102 ]
        • =min()
          • Parameters: 1 list of numbers
          • Return: the smallest number in the list
          • Example:
            "test": "=min(21,4,5,3)"
              //Output:
              "test" : 3
        • =max()
          • Parameters: 1 list of numbers
          • Return: the largest number in the list
          • Example:
            "test": "=max(21,4,5,3)"
              //Output:
              "test" : 21
        • =abs()
          • Parameters: 1 list of numbers
          • Return: a list with the absolute value of each number
          • Example:
            "test": "=abs(-22,5,-3.3421)"
              //Output:
              "test" : [ 22, 5, 3.3421 ]
        • =divide()
          • Parameters: 2 numbers, the dividend and the divisor (the divisor must not be 0)
          • Return: the quotient of the 2 numbers as a double
          • Example:
            "test": "=divide(12,4)"
              //Output:
              "test" : 3.0
        • =divideAndRound()
          • Parameters: the number n of decimal places to keep, followed by 2 numbers, the dividend and the divisor (the divisor must not be 0)
          • Return: the quotient of the 2 numbers as a double, rounded to n decimal places
          • Example:
            "test": "=divideAndRound(1,13,4)"
              //Output:
              "test" : 3.3
      • String functions
        • =concat()
          • Parameters: 1 list
          • Return: a string with every element of the list concatenated
          • Example:
            "test": "=concat('Hello ', 'World ', 0)"
              //Output:
              "test" : "Hello World 0"
        • =join()
          • Parameters: the character separating each element of the final string, 1 list
          • Return: a string with each element of the list separated by the first parameter
          • Example:
            "test": "=join('-',1,2,'3','four')"
              //Output:
              "test" : "1-2-3-four"
        • =split()
          • Parameters: the separator on which the string is to be split, 1 string
          • Return: a list of the substrings obtained by splitting the parameter string on the separator
          • Example:
            "test": "=split('-','a-b-c-de')"
              //Output:
              "test" : [ "a", "b", "c", "de" ]
        • =trim()
          • Parameters: 1 string or a list of strings
          • Return: the input with leading and trailing spaces removed (spaces in the middle are kept)
          • Example:
            "test": "=trim('   Hello World !    ')"
              //Output:
              "test" : "Hello World !"
        • =substring()
          • Parameters: 1 string, the start index of the substring (inclusive), the end index of the substring (exclusive)
          • Return: the part of the parameter string between the two indexes
          • Example:
            "test": "=substring('Hello World!',0,5)"
              //Output:
              "test" : "Hello"
        • =toLower()
          • Parameters: 1 string or a list of strings
          • Return: the string, or a list with each string, converted to lowercase
          • Example:
            "test": "=toLower('HELLO MY NAME IS')"
              //Output:
              "test" : "hello my name is"
        • =toUpper()
          • Parameters: 1 string or a list of strings
          • Return: the string, or a list with each string, converted to uppercase
          • Example:
            "test": "=toUpper('what?')"
              //Output:
              "test" : "WHAT?"
        • =leftPad()
          • Parameters: 1 string, the total length the final string must have, the character used to fill the remaining space
          • Return: a string with the desired size, with the parameter char used on the left
          • Example:
            "test": "=leftPad('Slim',6,'_')"
              //Output:
              "test" : "__Slim" //<--- the size is 6
        • =rightPad()
          • Parameters: 1 string, the total length the final string must have, the character used to fill the remaining space
          • Return: a string with the desired size, with the parameter char used on the right
          • Example:
            "test": "=rightPad('Shady',6,'_')"
              //Output:
              "test" : "Shady_" //<--- the size is 6
      • List functions
        • =firstElement()
          • Parameters: 1 list
          • Return: the first element of the list
          • Example:
            "test" : "=firstElement(9,4,1,6,'hello',true)"
              //Output:
              "test" : 9
        • =lastElement()
          • Parameters: 1 list
          • Return: the last element of the list
          • Example:
            "test" : "=lastElement(9,4,1,6,'hello',true)"
              //Output:
              "test" : true
        • =size()
          • Parameters: 1 list
          • Return: a number indicating the size of the list in input
          • Example:
            "test": "=size(9,4,1,6,'hello',true)"
              //Output:
              "test" : 6
        • =sort()
          • Parameters: 1 list
          • Return: the list sorted in ascending order. It only accepts a list whose values all share the same type (only numbers, only strings, or only booleans)
          • Example:
            "wrongTest": "=sort('Hello ', 'World ', 0)",
              "test": "=sort(9,8,7,6,5)"
              //Output:
              "test" : [ 5, 6, 7, 8, 9 ]
              //wrongTest won't appear, because the list is a mix of different types
        • []

          The brackets are used to access the index n of the array.

          Example:

          //Input:
            {
              "test": [null, 3, "test", 5, true, 100]
            }
            //spec:
            [
              {
                "operation": "modify-overwrite-beta",
                "spec": {
                  "firstIndex": "@(1,test[0])",
                  "randomIndex": "@(1,test[3])",
                  "outOfLength": "@(1,test[18])"
                }
              }
            ]
            //Output:
            {
              "test" : [ null, 3, "test", 5, true, 100 ],
              "firstIndex" : null,
              "randomIndex" : 5
              //outOfLength is not shown because the index 18 does not exist in the
              //list
            }

      • Type functions
        • =toInteger()
          • Parameters: 1 list
          • Return: a list in which every element that can be converted to an integer is converted; the other elements are left unchanged
          • Example:
            "test": "=toInteger('8','2.435','word')"
              //Output:
              "test" : [ 8, 2, "word" ]
        • =toDouble()
          • Parameters: 1 list
          • Return: a list in which every element that can be converted to a double is converted; the other elements are left unchanged
          • Example:
            "test": "=toDouble('8','2.435','word')"
              //Output:
              "test" : [ 8.0, 2.435, "word" ]
        • =toLong()
          • Parameters: 1 list
          • Return: a list in which every element that can be converted to a long is converted; the other elements are left unchanged
          • Example:
            "test": "=toLong('834783632873','word')"
              //Output:
              "test" : [ 834783632873, "word" ]
        • =toString()
          • Parameters: 1 list
          • Return: a list where all the elements are converted to strings
          • Example:
            "test": "=toString(1, null, true)"
              //Output:
              "test" : [ "1", "null", "true" ]
        • =toBoolean()
          • Parameters: 1 list
          • Return: a list in which every element that can be converted to a boolean is converted; the other elements are left unchanged
          • Example:
            "test": "=toBoolean('0',1,'true')"
              //Output:
              "test" : [ "0", 1, true ]
        • =size()
          • Parameters: 1 list or dictionary
          • Return: the size of the element in the parameter
          • Example:
            //Input:
              {
                "test": [
                  {
                    "0": 1,
                    "2": "3"
                  },
                  {
                    "1": true
                  },
                  {
                    "three": [
                      1,
                      2,
                      3,
                      4,
                      5
                    ]
                  }
                ]
              }
              //spec
              "test": "=size(@(1,test))"          //1st example
              "test": "=size(@(1,test[0]))"       //2nd example
              "test": "=size(@(1,test[2].three))" //3rd example
              //Output:
              "test" : 3  //1st example
              "test" : 2  //2nd example
              "test" : 5  //3rd example
        • =recursivelySquashNulls
          • Parameters: none (it is written without brackets)
          • Return: the object with every null element removed, searching recursively through all elements and sub-elements
          • Example:
            //Input:
              {
                "test": {
                  "name": "chka",
                  "surname": null,
                  "other": {
                    "toBeCancelled": null
                  }
                }
              }
              //spec
              "test": "=recursivelySquashNulls"
              //Output:
              {
                "test" : {
                  "name" : "chka",
                  "other" : { }
                }
              }
              
        • _

          This one is a bit odd. When placed at the beginning of an LHS key, the _ means “add this element if not already present”.

          Like in this example:

          //Input:
            {
              "tests": [
                {
                  "test": "done",
                  "result": 1
                },
                {
                  "test": "not done",
                  "result": 0
                },
                {
                  "result": null
                }
              ]
            }
            //spec:
            [
              {
                "operation": "modify-overwrite-beta",
                "spec": {
                  "tests": {
                    "*": {
                      "_test": "undefined"
                    }
                  }
                }
              }
            ]
            //Output:
            {
              "tests" : [ {
                "test" : "done",
                "result" : 1
              }, {
                "test" : "not done",
                "result" : 0
              }, {
                "result" : null,
                "test" : "undefined"
              } ]
            }
      When a parameter is described as a list, you can either write the values inline:

      "test":"=sort(4,3,2,1)"
      or reference a list from the input:

      "test":"=sort(@(1,list))"
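
      As a minimal sketch of the second form, assume a hypothetical input with a numeric list prices and two strings first and last (these field names are illustrative and do not come from the examples above):

      ```json
      {
        "prices": [3, 5, 2],
        "first": "Peter",
        "last": "Swanson"
      }
      ```

      A spec that feeds these values to some of the functions listed above could look like this:

      ```json
      [
        {
          "operation": "modify-overwrite-beta",
          "spec": {
            "total": "=intSum(@(1,prices))",
            "cheapest": "=min(@(1,prices))",
            "fullName": "=concat(@(1,first),' ',@(1,last))"
          }
        }
      ]
      ```

      With that input, total should come out as 10, cheapest as 2, and fullName as "Peter Swanson", alongside the original fields.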

      Using expression language in beta operations

      In NiFi, the Expression Language (EL) gives dynamic access to the attributes of a FlowFile. To use it in a Jolt specification, wrap the expression in ${}. For example, to reference the attribute filename within a Jolt transformation, write ${filename} in the specification.

      This lets a transformation adapt on the fly to the attributes of each FlowFile.

      The Expression Language also includes a variety of functions for manipulating values dynamically, which enables more complex and conditional transformations.
      The syntax for these functions is simple:

      ...  //beta operation
        //you can call the function without calling an attribute
        "example1":"${function()}",
        //you can call the function on an attribute
        "example2":"${attribute:anotherFunction()}",
        //you can use this syntax multiple times
        "example3":"${attribute:yetAnotherFunction()}${anotherAttribute:tooManyFunctions()}"
        ...

      Please note that a single RHS value cannot combine Expression Language (EL) syntax with Jolt function calls: it must contain either EL expressions or Jolt functions, but not both. Mixing the two in one value is unsupported and leads to processing errors, so keep them on separate lines.
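
      A hedged sketch of how this separation might look in practice: the first operation copies values produced by EL into fields, and the second applies a Jolt function to one of those fields. It assumes the FlowFile carries the standard filename attribute; the field names file, fileUpper, and label are hypothetical.

      ```json
      [
        {
          "operation": "modify-overwrite-beta",
          "spec": {
            "file": "${filename}",
            "fileUpper": "${filename:toUpper()}"
          }
        },
        {
          "operation": "modify-overwrite-beta",
          "spec": {
            "label": "=concat('Source file ',@(1,file))"
          }
        }
      ]
      ```

      Each RHS here holds either an EL expression or a Jolt function, never both.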

Exercises:

Now I will provide some inputs and outputs for you to practice with.

These examples can serve as a method to practice using Jolt or even to check if I accidentally solved a use case similar to the one you are working on.

If you want to solely focus on practicing, I suggest trying to find your solution first and then comparing it to mine.

Remember, there isn't always only one solution to a problem, so you might discover other approaches as well :) .

Example 1:

Input:

{
    "Employers": [
      {
        "Name": "Michael",
        "Role": "Manager"
      },
      {
        "Name": "David",
        "Role": "CTO"
      }
    ],
    "Employees": [
      {
        "Name": "Oscar",
        "Role": "Accounting"
      },
      {
        "Name": "Creed",
        "Role": "Quality assurance"
      },
      {
        "Name": "Jim",
        "Role": "Salesman"
      }
    ]
  }

Output:

{
    "Office" : [ {
      "Company" : "Munder Difflin",
      "Name" : "Michael",
      "Role" : "Manager"
    }, {
      "Company" : "Munder Difflin",
      "Name" : "David",
      "Role" : "CTO"
    }, {
      "Company" : "Munder Difflin",
      "Name" : "Oscar",
      "Role" : "Accounting"
    }, {
      "Company" : "Munder Difflin",
      "Name" : "Creed",
      "Role" : "Quality assurance"
    }, {
      "Company" : "Munder Difflin",
      "Name" : "Jim",
      "Role" : "Salesman"
    } ]
  }

Solution:

[
    {
      "operation": "shift",
      "spec": {
        "Employers": {
          "*": "Office[]"
        },
        "Employees": {
          "*": "Office[]"
        }
      }
    },
    {
      "operation": "shift",
      "spec": {
        "Office": {
          "*": {
            "Name": "Office[#2].&",
            "Role": "Office[#2].&",
            "#Munder Difflin": "Office[#2].Company"
          }
        }
      }
    }
  ]
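
Since more than one solution is possible, here is a variant offered as a sketch: it keeps the first shift and replaces the second one with a modify-overwrite-beta that writes the constant directly. The Company key would then be appended after Name and Role instead of appearing first, which makes no difference to consumers that ignore key order.

```json
[
  {
    "operation": "shift",
    "spec": {
      "Employers": {
        "*": "Office[]"
      },
      "Employees": {
        "*": "Office[]"
      }
    }
  },
  {
    "operation": "modify-overwrite-beta",
    "spec": {
      "Office": {
        "*": {
          "Company": "Munder Difflin"
        }
      }
    }
  }
]
```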

Example 2:

Input:

{
    "Users": [
      {
        "Fullname": "Ted Stinson",
        "ID": "1"
      },
      {
        "Fullname": "Robin Mosby",
        "ID": "2"
      },
      {
        "Fullname": "Barney Eriksen",
        "ID": "3"
      }
    ],
    "Company": "Mymih"
  }

Output:

{
    "Users" : [ {
      "Fullname" : "Ted Stinson",
      "Company" : "Mymih"
    }, {
      "Fullname" : "Robin Mosby",
      "Company" : "Mymih"
    }, {
      "Fullname" : "Barney Eriksen",
      "Company" : "Mymih"
    } ]
  }

Solution:

[
    {
      "operation": "shift",
      "spec": {
        "Users": {
          "*": {
            "@(0,Fullname)": "Users.[&].Fullname",
            "@(2,Company)": "Users.[&].Company"
          }
        }
      }
    }
  ]
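
An equivalent variant, sketched here for comparison, matches Fullname by its key and addresses the array index explicitly with &1 (one level above the matched key):

```json
[
  {
    "operation": "shift",
    "spec": {
      "Users": {
        "*": {
          "Fullname": "Users[&1].Fullname",
          "@(2,Company)": "Users[&1].Company"
        }
      }
    }
  }
]
```

ID is still dropped because no rule matches it.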

Example 3:

Input:

{
    "Info": {
      "Name": "Marshal",
      "Surname": "Aldrin",
      "Age": 28,
      "Work": {
        "Employed": false,
        "Job": "none"
      }
    }
  }

Output:

{
    "scalars" : [ "Name", "Surname", "Age", "Work" ],
    "other" : [ "Employed", "Job" ]
  }

Solution:

[
    {
      "operation": "shift",
      "spec": {
        "Info": {
          "*": {
            "*": {
              "*": {
                "$1": "other.[]"
              }
            },
            "$": "scalars[]"
          }
        }
      }
    }
  ]
  

Example 4:

Input:

{
    "features": [
      {
        "id": "1",
        "type": "String",
        "others": {
          "type": "Integer",
          "size": null
        },
        "properties": {
          "specific": {
            "firstKey": {
              "id": 1,
              "externalId": "5001",
              "name": "Johnathan",
              "age": 23,
              "group": "First",
              "alive": true
            },
            "secondKey": {
              "id": 1,
              "randomVal": "1s4726f32Q9",
              "ecc": "etera",
              "timestamp": "2024-05-13T13:56:53.119Z"
            },
            "list": []
          }
        }
      },
      {
        "id": "2",
        "type": "Double",
        "others": {
          "type": "Integer",
          "size": 3
        },
        "properties": {
          "specific": {
            "firstKey": {
              "id": 2,
              "externalId": "5002",
              "name": "Joseph",
              "age": 19,
              "group": "Second",
              "alive": true
            },
            "secondKey": {
              "id": 1,
              "randomVal": "893y4h3a39E2",
              "ecc": "etera",
              "timestamp": "2024-10-01T04:15:33.109Z"
            },
            "list": []
          }
        }
      }
    ]
  }

Output:

[ {
    "firstKey" : {
      "id" : 1,
      "externalId" : "5001",
      "name" : "Johnathan",
      "age" : 23,
      "group" : "First",
      "alive" : true
    },
    "secondKey" : {
      "id" : 1,
      "randomVal" : "1s4726f32Q9",
      "ecc" : "etera",
      "timestamp" : "2024-05-13T13:56:53.119Z"
    },
    "list" : [ ]
  }, {
    "firstKey" : {
      "id" : 2,
      "externalId" : "5002",
      "name" : "Joseph",
      "age" : 19,
      "group" : "Second",
      "alive" : true
    },
    "secondKey" : {
      "id" : 1,
      "randomVal" : "893y4h3a39E2",
      "ecc" : "etera",
      "timestamp" : "2024-10-01T04:15:33.109Z"
    },
    "list" : [ ]
  } ]
  

Solution:

[
    {
      "operation": "shift",
      "spec": {
        "features": {
          "*": {
            "properties": {
              "specific": "[]"
            }
          }
        }
      }
    }
  ]
  

Example 5:

Input:

[
    {
      "id": 345,
      "name": "Edd",
      "birthday": "20/09/99",
      "info": null
    },
    {
      "id": 634,
      "name": "Ed",
      "birthday": "16/08/98",
      "info": "language:english,vote:F,level:C1"
    },
    {
      "id": 809,
      "name": "Eddie",
      "birthday": "29/07/97",
      "info": "language:english,vote:D,level:C2,language:french,vote:F,level:A2"
    }
  ]

Output:

[ {
    "id" : 345,
    "name" : "Edd",
    "birthday" : "20/09/99"
  }, {
    "id" : 634,
    "name" : "Ed",
    "birthday" : "16/08/98",
    "language" : [ "english" ],
    "vote" : [ "F" ],
    "level" : [ "C1" ]
  }, {
    "id" : 809,
    "name" : "Eddie",
    "birthday" : "29/07/97",
    "language" : [ "english", "french" ],
    "vote" : [ "D", "F" ],
    "level" : [ "C2", "A2" ]
  } ]
  

Solution:

[
    {
      "operation": "modify-overwrite-beta",
      "spec": {
        "*": {
          "list": "=split(',',@(1,info))"
        }
      }
    },
    {
      "operation": "shift",
      "spec": {
        "*": {
          "*": "&1.&",
          "list": {
            "*": "&2.el[].a"
          }
        }
      }
    },
    {
      "operation": "modify-overwrite-beta",
      "spec": {
        "*": {
          "el": {
            "*": {
              "i": "=split(':',@(1,a))"
            }
          }
        }
      }
    },
    {
      "operation": "shift",
      "spec": {
        "*": {
          "*": "[&1].&",
          "el": {
            "*": {
              "@(0,i[1])": "[&3].@(0,i[0])[]"
            }
          }
        }
      }
    },
    {
      "operation": "remove",
      "spec": {
        "*": {
          "info": ""
        }
      }
    }
  ]
  

Additional Sources for Learning Jolt:

The following sites are where I learned how to use Jolt:

JOLT Guide © 2024 by Luca Biscotti is licensed under Creative Commons Attribution 4.0 International