
Spotlight: Looping with ForEach

The purpose of this Spotlight is to demonstrate the mechanics of a simple, classic For loop within a Composable DataFlow.

Our scenario:

We are told to expect delivery of a plain-text dataset with the following column names: first name, last name, street address, city, state, and zip code.  We will want to add this data to a formal data structure, like an RDBMS datastore, and want to make sure that we don’t have spaces in any of the field names.  We will also want this transformation to be part of our pipeline and to match the eventual datastore, so a one-off, manual adjustment will not do.

How could we cycle through this list to transform what needs transformation, and leave what doesn’t alone?

First, it makes sense to decide what that transformation would be.  There is no right answer here, just a selection among options. Since the original source used only lower case, I am inclined to not change case.  That means that, for readability’s sake, I’d prefer to replace the spaces with another character rather than jam the words together.  For our task, let’s go with underscores replacing spaces and leave the rest as-is.

Our solution:

Like they would (and will soon again?) say on the everything-old-is-new-again game show Name That Tune, “I can build that solution in… four modules”. 

OK… they name songs within only a certain number of notes, but songs don’t get our data transformed, so I’ll stick with modules.

Module One:

First, we will isolate this discussion to just the loop, not the entire pipeline.   

Start by creating a new DataFlow.

There are several iterating modules available in Composable DataFlows.  Here we will choose the ForEach module to add to our canvas.

The ForEach module
Module Details for the ForEach module

We can see that the module is looking for a List to iterate through as input.  For our looping task, we just want to show how to manipulate the field names.  So, as a stand-in for an actual full dataset, we will simply need a list of strings to kick us off.

Module Two:

To build our list of strings, let’s use an Array Builder.

The Array Builder module
Module Details for the Array Builder module

We fill in our initial field names by clicking in the Input box.  We can use either the Builder or Text mode, as shown.

Array Builder module’s InputCollection, Builder mode
Array Builder module’s InputCollection, Text mode


 

Keep your additions by clicking Save & Close.

Since we are doing string manipulation, we want our OutputType to be String.  We are doing a simple one-level list, so we don’t need to check NestArrays.

The Array Builder module with chosen settings

Next, let’s change the module’s display name to Field Names. You could use the gear icon to pop up Module Settings, but most folks will simply double-click on the module name to get a typing cursor.

Changing the title of the Array Builder module

Finally, let’s connect the output of our newly-named Field Names module to the List input of our ForEach module.

Connecting the first two modules of the DataFlow

If your line doesn’t look like mine and you want to change it, take a look at our DYK blog post on Changing Wire Type.

Module Three:

The next step may be a fresh idea for folks who are new to looping using Composable DataFlows.

Every programmatic approach to looping needs a way to signal the endpoint of the looping, one that serves as both a turnaround point as the loop continues, and also an exit point when the loop is done.  Depending on the programming language, how that endpoint is represented differs.  In Composable DataFlows, you will use one of the varieties of terminating modules.

The simplest version of these is Loop Terminator.

The Loop Terminator module
Module Details for the Loop Terminator module

This would be used in cases where the looping did something external, like working through a list of files or a message queue, and simply needed a signal that looping was complete so that the rest of the flow could continue.  Note that the inputs are Trigger, which is used by the iterating module to signal that it has finished, and Break, which allows for terminating the loop early.  The sole output is a Boolean to pass along execution when true.

However, for our purpose, we want to not only cycle and end the loop, but also accumulate a list of each of the transformed strings from each run of the loop and be able to pass along the final list after the loop is complete. 

To get that extra bit of functionality, instead of a Loop Terminator, we will use the Accumulator module.

The Accumulator module
Module Details for the Accumulator module

To connect this as the endpoint of our looping, we connect the LoopComplete output of the ForEach to the Trigger input of the Accumulator.

Connecting the outer loop structure together

For each iteration of the loop, the ForEach module signals the Accumulator whether the looping is complete.  On all of the passes except the final one, a false is passed, to let the Accumulator know to return for another cycle.  On the final pass, the ForEach sends a true so that the Accumulator knows to wrap things up after receiving the final input object.
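For readers who think in code, here is a minimal Python sketch of that handshake. It is only an analogy, not how Composable implements these modules: the names `for_each`, `Accumulator`, and `receive` are hypothetical, and the sketch assumes iterations are counted from 1 and that the true signal arrives together with the last item.

```python
from typing import Iterator, List, Optional, Tuple


def for_each(items: List[str]) -> Iterator[Tuple[str, bool, int]]:
    """Yield (object, loop_complete, current_iteration) for each item.

    loop_complete is False on every pass except the final one.
    """
    last = len(items)
    for iteration, item in enumerate(items, start=1):
        yield item, iteration == last, iteration


class Accumulator:
    """Collects one input per pass; exposes a result only after the final pass."""

    def __init__(self) -> None:
        self._items: List[str] = []
        self.result: Optional[List[str]] = None  # stays None until the loop ends

    def receive(self, value: str, trigger: bool) -> None:
        self._items.append(value)
        if trigger:  # True only on the final pass
            self.result = list(self._items)


acc = Accumulator()
signals = []
for obj, loop_complete, iteration in for_each(["first name", "last name", "city"]):
    signals.append(loop_complete)
    acc.receive(obj, loop_complete)

print(signals)     # [False, False, True]
print(acc.result)  # ['first name', 'last name', 'city']
```

Note that `acc.result` stays `None` until the pass that carries the true signal, which mirrors the Accumulator holding its output back until the loop is complete.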

All that’s left for our solution is to build the inner processing that is the purpose of the loop.  In our scenario, that is the string manipulation.

Module Four:

With the rest of our loop constructed, slipping our basic string manipulation into place feels easy.

As a reminder, our goal stated at the outset was to go through our list of field names and replace spaces with underscores.

To do that in a Composable DataFlow, we will use a String Replacer module.

The String Replacer module
Module Details for the String Replacer module

Using this module is fairly straightforward. 

Our input string will be handed off by the ForEach at each iteration.  The String Replacer will then have a space as the ToReplace input, and an underscore as the Replacement input.

We are not skipping any instances (SkipFirstN can stay at the default of 0) and we would want to replace every instance of a space in each InputString with an underscore, so there is no MaxReplacements value (null means “all”).
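To make those two settings concrete, here is a rough Python sketch of how a replacer with a skip count and a replacement cap might behave. The function `replace_with_options` and its parameter names are hypothetical, and the semantics are an assumption inferred from the input names (skip the first N matches, then replace up to a maximum, with null meaning no cap); it is not the String Replacer's actual implementation.

```python
from typing import Optional


def replace_with_options(
    text: str,
    to_replace: str,
    replacement: str,
    skip_first_n: int = 0,
    max_replacements: Optional[int] = None,
) -> str:
    """Replace occurrences of to_replace, honoring a skip count and a cap.

    Assumed semantics: the first skip_first_n matches are left alone, and
    max_replacements=None means "replace all remaining matches".
    """
    if not to_replace:
        return text  # nothing to search for
    pieces = text.split(to_replace)
    out = [pieces[0]]
    replaced = 0
    for match_index, piece in enumerate(pieces[1:]):
        under_cap = max_replacements is None or replaced < max_replacements
        if match_index >= skip_first_n and under_cap:
            out.append(replacement)
            replaced += 1
        else:
            out.append(to_replace)  # keep the original text for this match
        out.append(piece)
    return "".join(out)


# The settings used in this post: skip nothing, replace everything.
print(replace_with_options("street address", " ", "_"))  # street_address
print(replace_with_options("city", " ", "_"))            # city
```

With the defaults (0 and no cap), the behavior reduces to a plain replace-all, which is all our field names need.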

Finally, we connect the String Replacer’s output to the Accumulator’s Input input.

When we’re done, it should look like the DataFlow below.  Note that I selected the space in the ToReplace input so that it would highlight and be visible in the image, as opposed to looking empty.

Completing the DataFlow

If we’ve done this all correctly, we’re done building.  But any process needs testing, so we’ll do that next.

Testing:

I always like to Save before running anything, so I’ll do that here.

Next, I want to be able to step through the early stages of the loop to see what is happening, so I’ll switch from Run mode to Debug mode.

Selecting Debug Mode
Debug Mode selected

Now when I press Debug, I get the stepping controls at the top of the canvas and the blue color is drained out of the outputs and connectors until those tasks have been activated and completed.

Debug Mode initiated

Press the play button five times. That will allow each of our four modules to run once (the first click initiates the debug run and sets the first module to be ready to run, plus one click to run each module from that point forward), so that we can examine the first cycle of the loop to completion.

Reviewing the first iteration through the loop

Now, let’s use View Results from the outputs shown.

The ArrayOutput shows that the Field Names array has been created as desired and passed to ForEach’s List input.

ForEach’s Object, LoopComplete, and CurrentIteration results show that the first item in the list (“first name”) is being passed to the String Replacer’s InputString, the Boolean false is being passed to the Accumulator’s Trigger, and we are on iteration 1.

The String Replacer’s OutputString is now “first_name”.

The Accumulator’s Result is not yet available because the loop is not yet finished.  This is a safety measure so that no results are passed out of the Accumulator until the finished results are ready to go.

Pressing the debugger’s play button three more times has us cycle the three-module loop one more time.


Reviewing the second iteration through the loop

This time, we can just check the outputs of the ForEach and the String Replacer, as the other values have not changed.

On CurrentIteration 2, the ForEach gives us the second item in the list (“last name”) and tells the Accumulator that we’re still not done, and the String Replacer again does its job (“last_name”).

We could step through each iteration, particularly to see that “city” and “state” – which have no spaces – don’t bomb out, but let’s jump to the end by hitting the debugger’s Continue button (third of the three).

Now we can see all of the outputs at their final stage.

Reviewing the final iteration through the loop

After all six list items have passed through the ForEach (CurrentIteration 6), the ForEach’s LoopComplete passes the Boolean true to the Accumulator’s Trigger, signaling the last iteration of the loop.  The final list item has entered and been transformed by the String Replacer. Our Accumulator now shows that the loop is complete (LoopComplete is true) and makes the final accumulated list available as a Result.

Our items with spaces have had them replaced with underscores and those list items without spaces have passed along unscathed.
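As a cross-check outside the DataFlow, the same end-to-end transformation can be written in a few lines of Python. This is just a sketch for comparison: it uses the six field names from our scenario and Python's built-in `str.replace`, and the variable names are my own.

```python
# The six field names from the scenario, in the order they were listed.
field_names = [
    "first name",
    "last name",
    "street address",
    "city",
    "state",
    "zip code",
]

# Loop over the list, replace spaces with underscores, and collect the results.
cleaned = [name.replace(" ", "_") for name in field_names]

print(cleaned)
# ['first_name', 'last_name', 'street_address', 'city', 'state', 'zip_code']
```

The list comprehension plays the roles of the ForEach, the String Replacer, and the Accumulator all at once, which is a handy way to confirm the result we expect to see in the Accumulator.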

Success!

Wrap up:

We have built this DataFlow as a standalone, but obviously these same steps could be taken as part of a larger DataFlow, with real data being passed along both into and out of these modules.

In fact, you’ll see such a DataFlow in our upcoming Project Lab in our NASDAQ API series.  I will link here when that Lab is available.

Our soon-to-be-released next Spotlight post will loop to build a Fibonacci Series using DoWhile.  That post will also show how to pass values back into the loop (required for Fibonacci) and show two more ways to terminate the looping (required by While loops).

Until next time…

Enjoy!