
Correcting Pesky Timezones in a DataFlow

Ryan O'Shea

Anyone who works with lots of data can attest that dealing with time and timezones can be tricky business. At Composable, we orchestrate millions of pieces of date/time data through our platform each day. Often, when working with dates, you’ll run into some data where timezones are important, but the data was stored without them. Or, even worse, at some point in your data pipeline, a program made an assumption about your data’s timezone and altered the data itself. When we design our DataFlow modules, we do our best to preserve any time data that may flow through them, but no set of assumptions is perfect for all situations.

To help solve this problem at scale, we’ve introduced a new module, the Table Column Timezone Converter. In this post, we’ll briefly explore how it can help you ensure your data’s integrity is preserved all the way to your final destination.

Let’s use an example. In our use case, we have the simple table pictured below.

Let’s just ignore for the moment that having a specific time of day is an odd choice for a Date of Birth field…

We have two columns: DOB and Name. Importantly, DOB is declared as a DateTimeOffset column in the table, as it might be if we had queried it from a SQL Server database. (If your data isn't yet a DateTimeOffset, you can use our Table Column Type Converter module to specify it yourself.) Take a quick look at the date values: they all share the same date and hour/minute but differ in their timezone. Ryan's DOB is at UTC-4, Sophia's is at UTC+10, and Steve's has no timezone at all. (Composable assumes local time in that case, which here gives Steve's DOB a UTC-4 offset, just like Ryan's.)
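In Python terms, a DateTimeOffset value behaves like an offset-aware datetime. Here is a minimal sketch of the three DOB values using the standard library; the date and clock time are illustrative assumptions, not values read from the table image:

```python
from datetime import datetime, timezone, timedelta

# Offset-aware analogues of the table's DOB column (date/time assumed).
ryan = datetime(1990, 5, 1, 17, 53, tzinfo=timezone(timedelta(hours=-4)))   # UTC-4
sophia = datetime(1990, 5, 1, 17, 53, tzinfo=timezone(timedelta(hours=10)))  # UTC+10
steve = datetime(1990, 5, 1, 17, 53)  # naive: no offset stored at all
```

Note that `steve` carries no `tzinfo`; it is the platform, not the stored value, that decides how to interpret it.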

Let's say that, for our use case, our client wants all of these dates written to a database in UTC-7 (Mountain Standard Time). Let's take a look at our new module and see what we can do with it.

The Table Column Timezone Converter module takes our table as its first input, then accepts some configuration settings. First, we tell it that we'd like to convert just the DOB column, though the module supports converting any number of date columns simultaneously. Next, we specify the offset that we'd like our DOB column to assume. In this case, we choose -7 hours and 0 minutes, or UTC-7:00.

The final configuration option depends on our use case. If we know that all of our DOB values are correct in their particular timezones, we’ll probably want to convert them to UTC-7, changing the hour appropriately to preserve the moment in time. For this use case, we’ll want to enable the ConvertTimes option, as pictured above. If we run this dataflow with our table as the input, we’ll get the following result:

Each of our times has been converted. Ryan's and Steve's, which both had (or were interpreted as having) a UTC-4 offset, have been converted to 2:53PM UTC-7. And Sophia's has made the full 17-hour transition from UTC+10 to UTC-7, winding her 5:53PM back to 12:53AM.
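This instant-preserving conversion is what Python's `astimezone` does. A small sketch of the same arithmetic (dates assumed for illustration):

```python
from datetime import datetime, timezone, timedelta

target = timezone(timedelta(hours=-7))  # UTC-7

ryan = datetime(1990, 5, 1, 17, 53, tzinfo=timezone(timedelta(hours=-4)))
sophia = datetime(1990, 5, 1, 17, 53, tzinfo=timezone(timedelta(hours=10)))

# astimezone changes the offset while preserving the moment in time,
# like running the module with ConvertTimes enabled.
print(ryan.astimezone(target))    # 1990-05-01 14:53:00-07:00
print(sophia.astimezone(target))  # 1990-05-01 00:53:00-07:00
```

Both results compare equal to their originals, because only the representation changed, not the instant.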

However, we don’t always get lucky with clean data, and sometimes the timezones are just wrong. If we disable the ConvertTimes option, we can produce a different result:

With ConvertTimes turned off, the 5:53PM timestamp has been preserved for each row, but the timezone offset has been overwritten with our desired value. This version is also useful as a normalization step: if you instead had data that you knew was all 2 hours off but didn't share a timezone, you could run it through this module once with ConvertTimes turned off to normalize the timezones, then again with ConvertTimes enabled to perform the required 2-hour shift.

We think this will help you gain better control over annoying time data in all of your DataFlows. Give the Table Column Timezone Converter module a try in your next application.
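Both behaviors, and the two-pass repair trick, can be sketched with Python's `datetime`. This is only an analogy for what the module does to each value, under assumed dates and offsets:

```python
from datetime import datetime, timezone, timedelta

target = timezone(timedelta(hours=-7))
sophia = datetime(1990, 5, 1, 17, 53, tzinfo=timezone(timedelta(hours=10)))

# ConvertTimes off: keep the 5:53PM wall-clock reading, overwrite the offset.
stamped = sophia.replace(tzinfo=target)
print(stamped)  # 1990-05-01 17:53:00-07:00

# Two-pass repair for data known to read 2 hours ahead, with mixed offsets:
wrong = datetime(1990, 5, 1, 17, 53)  # clock reads 2 hours ahead of reality
step1 = wrong.replace(tzinfo=timezone(timedelta(hours=-5)))  # pass 1: stamp UTC-5 (ConvertTimes off)
step2 = step1.astimezone(target)  # pass 2: convert to UTC-7 (ConvertTimes enabled)
print(step2)  # 1990-05-01 15:53:00-07:00, the 2-hour shift applied
```

The choice of UTC-5 as the intermediate stamp is the trick: it sits exactly 2 hours from the UTC-7 target, so the second pass winds each clock back by the known error.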

Ryan O'Shea

Ryan is a software engineer at Composable with a degree in Electrical Engineering from Princeton University.