Composable provides special mechanisms for managing the complexity of dataflows. Specifically, Composable allows you to reuse parts of your dataflow by encapsulating them as stand-alone custom modules.
When developing dataflow applications in Composable, you have access to hundreds of “first-class” modules that are available immediately. You may take a subset of these modules, expose its inputs and outputs, and save it as a custom module. For those familiar with traditional text-based programming languages, this is essentially the Composable equivalent of a function, and it allows for code reuse.
Code reuse is the use of existing code across multiple applications. Software engineers employ systematic software reuse as a strategy for increasing productivity and improving quality. This is captured in the general programming philosophy of Don’t Repeat Yourself (DRY), which states: “Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.” This strategy holds true for flow-based programming as well as traditional text-based programming, and it is a key feature when authoring complex dataflows in Composable.
As an example of code reuse, let us assume that we need to perform an identical task many times, either within a single dataflow or across multiple distinct dataflows. For this example, the repeated task takes in a specific column of a table, performs an aggregation (e.g., SUM, AVG, or COUNT), and outputs the result. We can compose a dataflow for this task, utilizing first-class modules wrapped with External Input and External Output modules, and save it as its own dataflow. The External Input and External Output modules essentially expose the inputs and outputs of this dataflow within a parent dataflow.
Here is what this dataflow will look like.
What we have are three inputs (the data table, the name of the column to be aggregated, and the aggregate function) and one output (the resulting aggregate value). The inputs and outputs are exposed through the External Input and External Output modules, which become module connections within any parent dataflow where this dataflow is used. Note that the value in the description box becomes helper text for these connections, so be sure to make good use of it!
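For readers who think in text-based code, the function analogy can be made concrete. The following is only a rough Python sketch of the same contract (three inputs, one output), not how Composable implements the module; the name `aggregate_column`, the `AGGREGATES` mapping, and the list-of-dicts table representation are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical mapping from an aggregate-function name to a Python callable.
AGGREGATES = {"SUM": sum, "AVG": mean, "COUNT": len}


def aggregate_column(table, column, function):
    """Text-based stand-in for the custom module.

    Inputs (the three exposed connections):
      table    -- the data table, modeled here as a list of dict rows
      column   -- the name of the column to aggregate
      function -- the aggregate function name: "SUM", "AVG", or "COUNT"
    Output (the single exposed connection):
      the resulting aggregate value
    """
    values = [row[column] for row in table]
    return AGGREGATES[function.upper()](values)


# Made-up sample data, for illustration only.
sample_table = [{"price": 10}, {"price": 20}, {"price": 30}]
print(aggregate_column(sample_table, "price", "SUM"))
```

The docstring plays the same role as the description box on each External Input and External Output module: it tells whoever reuses the piece what each connection expects.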
With the inputs and outputs exposed, our resulting dataflow can now be encapsulated into a single custom module.
Now, a parent dataflow may use this just like any other module. For example, we can reuse the above dataflow three times as follows:
Here, we upload a single CSV file, and flow the resulting data table into three instances of our custom module to perform a SUM, AVG and COUNT on a specific column.
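In text-based terms, this parent dataflow amounts to loading one table and calling the same function three times with a different aggregate each time. The sketch below is an illustrative analogy only; the inline CSV text, the `price` column, and the `aggregate_column` helper are made up for the example and are not part of Composable.

```python
import csv
import io
from statistics import mean


def aggregate_column(table, column, function):
    # Hypothetical stand-in for the custom module: table, column name,
    # and aggregate function go in; one aggregate value comes out.
    values = [float(row[column]) for row in table]
    return {"SUM": sum, "AVG": mean, "COUNT": len}[function](values)


# Stand-in for the uploaded CSV file (made-up data).
csv_text = "item,price\napple,1.5\npear,2.5\nplum,2.0\n"
table = list(csv.DictReader(io.StringIO(csv_text)))

# One table flows into three "instances" of the same reusable piece.
results = {fn: aggregate_column(table, "price", fn) for fn in ("SUM", "AVG", "COUNT")}
print(results)
```

The aggregation logic lives in exactly one place, so a fix or improvement to it reaches all three uses at once.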
Code (dataflow) reuse is an excellent way to organize and condense complex workflows. Combined with the Code module, the same encapsulation mechanism lets you build custom modules around your own code. This can be helpful when no combination of existing first-class modules can achieve a required function.
As an example, assume that a repetitive data engineering task requires you to replace all occurrences of a given substring (in this case DATE_HOLDER) with a new string (the current date). This can be achieved as follows:
Note that here we used the Python Code module, but the Code module allows you to insert your own algorithm in any number of languages (from Python to R to SAS…).
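The Python inside such a Code module could be quite short. The snippet below is a minimal sketch of the substitution logic only; it assumes ISO date formatting and does not show how the Code module passes values in and out, since that interface is not described here. The function name `fill_date_placeholder` and its parameters are hypothetical.

```python
from datetime import date


def fill_date_placeholder(text, placeholder="DATE_HOLDER", today=None):
    """Replace every occurrence of `placeholder` in `text` with a date.

    `today` defaults to the current date; passing it explicitly keeps
    the function easy to test. The ISO format (YYYY-MM-DD) is an
    assumption, not something the original dataflow specifies.
    """
    if today is None:
        today = date.today()
    return text.replace(placeholder, today.isoformat())


# Example with a fixed date so the result is reproducible.
print(fill_date_placeholder("Report for DATE_HOLDER", today=date(2019, 1, 2)))
```

Keeping the placeholder and the date as parameters mirrors exposing them as external inputs, so the same piece can be reused with a different marker string.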
The above dataflow is encapsulated as a single custom module.
And because Composable is a collaborative platform, you may share this custom module with other users within your enterprise.
Latest posts by Lars Fiedler (see all)
- Code Reuse and Modularity in Composable - January 2, 2019