AND and OR interactions
I’ve been working through a particularly nasty bug recently, and when I eventually found the cause it turned out to be a mistake in a WHERE
clause including several AND
s and OR
s. I thought it’d make an interesting topic to dive into for a quick blog post.
The basic issue looked something like this:
INSERT INTO dbo.TargetTable
(
TableGUID
, Column1
, Column2
)
SELECT DISTINCT
TableGUID
, Column1
, Column2
FROM dbo.SourceTable
WHERE SourceTable.StatusColumn = 'A'
OR (SourceTable.StatusColumn = 'B' AND SourceTable.StatusDate IS NULL)
AND SourceTable.TableGUID NOT IN
(SELECT TableGUID FROM dbo.TargetTable)
The problem was we wanted to apply the last AND
predicate every time, but the interactions between the AND
s and the OR
meant that wasn’t happening. To see exactly what I mean, here’s a couple of simplified versions of the code where I’ve used brackets to make it clearer what is happening:
SELECT 1 -- returns successfully
WHERE 1 = 1
OR 2 = 2
AND 2 = 1 -- we want it to not return because of this
SELECT 1 -- this is what is actually happening
WHERE 1 = 1
OR (2 = 2 AND 2 = 1)
SELECT 1 -- this is what we should have done
WHERE (1 = 1 OR 2 = 2)
AND 2 = 1
So, basically, the OR
treats everything after it as being part of the OR
, so when the first predicate returns true it doesn’t matter what the rest of the predicates are because they’re all on the other side of the OR
. At this point we have a diagnosis, and the solution seems pretty clear: re-write the code with some brackets to tell the query engine what to do.
INSERT INTO dbo.TargetTable
(
TableGUID
, Column1
, Column2
)
SELECT DISTINCT
TableGUID
, Column1
, Column2
FROM dbo.SourceTable
WHERE (SourceTable.StatusColumn = 'A'
OR (SourceTable.StatusColumn = 'B' AND SourceTable.StatusDate IS NULL))
AND SourceTable.TableGUID NOT IN
(SELECT TableGUID FROM dbo.TargetTable)
That gives us a functionally correct solution, but to me there’s another issue. We have re-written the code to clarify things for the query engine, but I’d argue we haven’t made it particularly clear for the next developer who has to edit this code (this is all part of the same insane block of code I wrote about in my code noise post a couple of weeks ago), and that can lead to all kinds of issues further down the line.
I have a particular approach whenever I’m writing a set of predicates connected with both AND
s and OR
s. I effectively layer the predicates, starting with a top layer of either AND
s or OR
s, then moving to the second layer which will be the opposite. Each sub-layer is wrapped in brackets and indented, and I usually keep each predicate on a different line. For example, this is how I would lay out the code we started this post with:
INSERT INTO dbo.TargetTable
(
TableGUID
, Column1
, Column2
)
SELECT DISTINCT
TableGUID
, Column1
, Column2
FROM dbo.SourceTable
WHERE 1 = 1
AND (SourceTable.StatusColumn = 'A' -- top layer of ANDs
OR (SourceTable.StatusColumn = 'B' -- second layer of ORs
AND SourceTable.StatusDate IS NULL)) -- third layer of ANDs
AND SourceTable.TableGUID NOT IN
(SELECT TableGUID FROM dbo.TargetTable)
This makes it quite clear that the last AND
needs to be evaluated separately to the rest of the WHERE
clause.
Now you might be wondering where the 1 = 1
came from. That’s something I like to include in all of my code to make it easier to debug by allowing you to comment out the first predicate easily. Without that, if you want to comment out the first predicate and keep the second you end up having to do something awkward like this:
FROM dbo.SourceTable
WHERE --(SourceTable.StatusColumn = 'A'
--OR (SourceTable.StatusColumn = 'B'
--AND SourceTable.StatusDate IS NULL))
--AND
SourceTable.TableGUID NOT IN
(SELECT TableGUID FROM dbo.TargetTable)
But with the 1 = 1
you can do this instead:
FROM dbo.SourceTable
WHERE 1 = 1
--AND (SourceTable.StatusColumn = 'A'
--OR (SourceTable.StatusColumn = 'B'
--AND SourceTable.StatusDate IS NULL))
AND SourceTable.TableGUID NOT IN (SELECT TableGUID FROM dbo.TargetTable)
Which saves you from messing about with the last AND
predicate at all.
Now, if your query is largely OR
s so you want that to be your top layer, you can’t do quite the same thing because the OR
means the WHERE
always comes back as TRUE. So, what you use instead is 1 = 2
, which achieves the same thing as far as ease of debugging is concerned:
FROM dbo.SourceTable
WHERE 1 = 2
OR (SourceTable.StatusColumn = 'A'
AND SourceTable.TableGUID NOT IN
(SELECT TableGUID FROM dbo.TargetTable))
OR (SourceTable.StatusColumn = 'B'
AND SourceTable.StatusDate IS NULL
AND SourceTable.TableGUID NOT IN
(SELECT TableGUID FROM dbo.TargetTable))
This isn’t the neatest way of writing the code, because we have to repeat the NOT IN
across the different OR
predicates, but it does the same thing as the rest of the code we’ve been looking at. I suppose for consistency, I should include the 1 = 1
or 1 = 2
in the bracketed predicates as well, and that would help when it comes to debugging, but it would also clutter the code more than a little as we can see:
FROM dbo.SourceTable
WHERE 1 = 2
OR (1 = 1
AND SourceTable.StatusColumn = 'A'
AND SourceTable.TableGUID NOT IN
(SELECT TableGUID FROM dbo.TargetTable))
OR (1 = 1
AND SourceTable.StatusColumn = 'B'
AND SourceTable.StatusDate IS NULL
AND SourceTable.TableGUID NOT IN
(SELECT TableGUID FROM dbo.TargetTable))
Having said that, I do quite like the way that looks. In particular, I like the way each new AND
block is clearly defined with the 1 = 1
. These kind of standards are something to discuss with your team, if possible, and work together to standardise the way you write code.
Finally, here’s a made up example with several layers to show how this can look with very complex statements. The numeric predicates (1 = 1 and 1 = 2) are there to allow the commenting out of other predicates, and everything else is there as a stand-in for actual query logic:
SELECT
1
WHERE 1 = 2 -- false or false or true or false = true
OR (1 = 1 -- true and true and false = false
AND 'A' = 'A' -- true
AND 'B' = 'B' -- true
AND (1 = 2 -- false or false = false
OR 'AB' = 'AC' -- false
OR 'AB' IN -- false
('AD', 'A', 'AA', 'ABA')))
OR (1 = 1 -- true and true and false = false
AND (1 = 2 -- true or false = true
OR 'X' = 'X' -- true
OR 'Z' = 'A') -- false
AND (1 = 2 -- true or false = true
OR 'G' = 'G' -- true
OR 'F' = 'W') -- false
AND (1 = 2 -- false or false = false
OR 'F' = 'G' -- false
OR 'H' = 'I')) -- false
OR (1 = 1 -- true and true = true
AND (1 = 2 -- false or true = true
OR (1 = 1 -- false and true = false
AND 'E' = 'F' -- false
AND 'G' = 'G') -- true
OR 'A' = 'A') -- true
AND 'B' = 'B') -- true
OR NOT 'A' = 'A' -- not true = false
So, in conclusion, if you include an OR
in your code, be aware that anything after the OR
should be treated as being bracketed together. And ideally write your code with explicit brackets and style it in a way to make it clear what is going on.