
When is a parameterized SQL query slower than having inline values?

Software engineers have always been told to parameterize their SQL queries.  Why?

  1. It avoids any possibility of SQL injection.  Rather than an arbitrary value being formatted into the SQL query text, the value is sent as a separate name-value pair, and a variable is created and then referenced by name in the query.  This also allows reserved characters to be used without escaping.
  2. The SQL engine can perform better query execution planning and cache query plans.

When a query comes into the SQL engine, the engine needs to analyze it and determine how it’s going to execute it.  It does this by creating a query execution plan.  If the engine has seen the query before, it will most likely reuse an existing pre-computed estimated plan.  And if the query has executed successfully before, it can leverage the previous actual execution graph.  The engine can simply do a string comparison of the query against previously seen queries to find the plan.  However, if the values are inlined in the query text, the query most likely won’t match any previously executed query, and the engine will have a harder time figuring out whether it already has a computed plan or a new plan needs to be created.  In addition, if the inline values are constantly changing, you may end up with hundreds, thousands, even millions of distinct query strings, and the engine will most likely not remember all of them.  SQL engines also look at the actual parameter values to determine how to execute the query.  Most engines maintain statistics about column values to decide which lookup algorithm should be used (hash lookup, table scan, B-tree seek).
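To make the contrast concrete, here is a minimal sketch of the same lookup written both ways (the table and column names are made up for illustration):

```sql
-- Inline values: every distinct value produces a distinct query string,
-- so the plan cache fills up with near-duplicate plans.
SELECT * FROM Orders WHERE CustomerId = 'C-10042';
SELECT * FROM Orders WHERE CustomerId = 'C-10043';

-- Parameterized: one query string, one cached plan; the value is sent
-- separately as a name-value pair.
EXEC sp_executesql
    N'SELECT * FROM Orders WHERE CustomerId = @CustomerId',
    N'@CustomerId varchar(15)',
    @CustomerId = 'C-10042';
```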

So, if parameterized queries make the database engine’s job easier and allow caching of plans, why would one ever be slower than the same query with the value inlined?  One of our customers recently hit this issue when using Query View.  Let’s dig in and find out.

In Composable’s Query View feature, a user defines a templated query and a set of inputs.  Each input is referenced in the query using the {{inputname}} syntax.  For this discussion, you can think of each input as a parameter in the SQL query.  Let’s say we have this simple templated query:
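The original post showed the query inline; here is a sketch of what such a templated query might look like (the table name is an assumption; KLNK is the lookup column discussed below):

```sql
-- {{klnk}} is a Query View input, substituted at execution time
SELECT *
FROM dbo.Records
WHERE KLNK = '{{klnk}}'
```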

This particular query is hitting a table of about 300 million rows, and is taking forever to return.  So to speed it up, we’ve created an index on the KLNK column.  This should allow the engine to perform close to an O(1) lookup.  When we execute the below query in SQL Server Management Studio, it’s now lightning fast.
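The index and the hand-run query looked roughly like this (index and table names are illustrative; only the KLNK column and its clustered index are from the original post):

```sql
-- Clustered index on the lookup column
CREATE CLUSTERED INDEX IX_Records_KLNK ON dbo.Records (KLNK);

-- With the value inlined as a literal, SQL Server seeks straight
-- to the matching rows
SELECT *
FROM dbo.Records
WHERE KLNK = 'ABC123';
```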

And when we look at the actual execution plan, we can see that it is using our newly created clustered index and seeking to the correct rows.

But now when we execute the query via Query View, it’s still slow … What’s going on?  Let’s take a look at the actual query that is being executed in SQL Server via Composable.  To do that, we can run the below query to see the currently executing queries on the instance.
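A standard way to do this is to join the currently executing requests DMV with the SQL text DMV:

```sql
-- Show the text of queries currently executing on this instance
SELECT r.session_id,
       r.status,
       r.start_time,
       t.text AS query_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.session_id <> @@SPID;  -- exclude this monitoring query itself
```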

And we can see the executing query from Composable:
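Reconstructed here as a sketch (the table name is an assumption; note the declared type of the parameter, which matters shortly):

```sql
EXEC sp_executesql
    N'SELECT * FROM dbo.Records WHERE KLNK = @P1',
    N'@P1 nvarchar(15)',
    @P1 = N'ABC123';
```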

Now let’s take this query and execute it directly in SSMS:

And let’s see the estimated query plan of this query:

Yikes!  It’s performing an index scan, rather than a seek.  It’s going to scan the entire index, rather than seeking directly to the right rows based on the input value.  Why would the database engine do that?   It knows there’s a clustered index.  Hmmm…

The issue lies in the declared type of the P1 parameter.  The KLNK column is a varchar(15), but our parameter is declared as an nvarchar(15).  The SQL engine can’t simply take a Unicode (NVARCHAR) string and seek for it in an index built over non-Unicode (VARCHAR) values.  So instead, it scans the index and compares against every column value, implicitly converting each VARCHAR to an NVARCHAR along the way.

Let’s change the NVARCHAR to a VARCHAR and see what happens.
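The corrected call, again as a sketch with the same illustrative names; the only change is the parameter’s declared type:

```sql
EXEC sp_executesql
    N'SELECT * FROM dbo.Records WHERE KLNK = @P1',
    N'@P1 varchar(15)',   -- now matches the column's declared type
    @P1 = 'ABC123';
```

With the types matching, the engine can seek the clustered index directly instead of converting and scanning.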

Hey, that’s lightning fast!  That’s great.  We now have a parameterized query that is as fast as our inline query, if not faster.

But how do we fix the Query View so it creates a VARCHAR parameter, rather than an NVARCHAR parameter?

When choosing the input datatype, specify AnsiString (varchar), rather than String (nvarchar).


Now most developers would ask: why are you storing data in ASCII / varchar columns?  That’s totally a 1991 thing to do.  It’s been pounded into us that we should use Unicode / wide characters for everything.  You never know when you’re going to have to support multiple languages, and it’s a pain to go back and change something like that.

That’s true, and here at Composable, we use nvarchar for our internal databases.  But not all database creators are developers.  Lots of databases are created by business analysts and data scientists, and they like varchars.  They usually don’t work with multi-language data, and key identifiers don’t normally contain non-ASCII characters, so there’s really no need to double the storage size.  VARCHAR is a much more prevalent column type in the databases analyzed by Composable’s customers.