Skip to content

AND and OR interactions

May 5, 2020

I’ve been working through a particularly nasty bug recently, and when I eventually found the cause it turned out to be a mistake in a WHERE clause including several ANDs and ORs. I thought it’d make an interesting topic to dive into for a quick blog post.

The basic issue looked something like this:

INSERT INTO dbo.TargetTable
    (
          TableGUID
        , Column1
        , Column2
    )
SELECT DISTINCT
      TableGUID
    , Column1
    , Column2
FROM dbo.SourceTable
WHERE SourceTable.StatusColumn = 'A'
    OR (SourceTable.StatusColumn = 'B' AND SourceTable.StatusDate IS NULL)
    AND SourceTable.TableGUID NOT IN 
        (SELECT TableGUID FROM dbo.TargetTable)

The problem was we wanted to apply the last AND predicate every time, but the interactions between the ANDs and the OR meant that wasn’t happening. To see exactly what I mean, here’s a couple of simplified versions of the code where I’ve used brackets to make it clearer what is happening:

SELECT 1 -- returns successfully
WHERE 1 = 1
    OR 2 = 2
    AND 2 = 1 -- we want it to not return because of this
    
SELECT 1 -- this is what is actually happening
WHERE 1 = 1
    OR (2 = 2 AND 2 = 1)

SELECT 1 -- this is what we should have done
WHERE (1 = 1 OR 2 = 2)
    AND 2 = 1

So, basically, the OR treats everything after it as being part of the OR, so when the first predicate returns true it doesn’t matter what the rest of the predicates are because they’re all on the other side of the OR. At this point we have a diagnosis, and the solution seems pretty clear: re-write the code with some brackets to tell the query engine what to do.

INSERT INTO dbo.TargetTable
    (
          TableGUID
        , Column1
        , Column2
    )
SELECT DISTINCT
      TableGUID
    , Column1
    , Column2
FROM dbo.SourceTable
WHERE (SourceTable.StatusColumn = 'A'
    OR (SourceTable.StatusColumn = 'B' AND SourceTable.StatusDate IS NULL))
    AND SourceTable.TableGUID NOT IN 
        (SELECT TableGUID FROM dbo.TargetTable)

That gives us a functionally correct solution, but to me there’s another issue. We have re-written the code to clarify things for the query engine, but I’d argue we haven’t made it particularly clear for the next developer who has to edit this code (this is all part of the same insane block of code I wrote about in my code noise post a couple of weeks ago), and that can lead to all kinds of issues further down the line.

I have a particular approach whenever I’m writing a set of predicates connected with both ANDs and ORs. I effectively layer the predicates, starting with a top layer of either ANDs or ORs, then moving to the second layer which will be the opposite. Each sub-layer is wrapped in brackets and indented, and I usually keep each predicate on a different line. For example, this is how I would lay out the code we started this post with:

INSERT INTO dbo.TargetTable
    (
          TableGUID
        , Column1
        , Column2
    )
SELECT DISTINCT
      TableGUID
    , Column1
    , Column2
FROM dbo.SourceTable
WHERE 1 = 1
    AND (SourceTable.StatusColumn = 'A' -- top layer of ANDs
         OR (SourceTable.StatusColumn = 'B' -- second layer of ORs
             AND SourceTable.StatusDate IS NULL)) -- third layer of ANDs
    AND SourceTable.TableGUID NOT IN
        (SELECT TableGUID FROM dbo.TargetTable)

This makes it quite clear that the last AND needs to be evaluated separately to the rest of the WHERE clause.

Now you might be wondering where the 1 = 1 came from. That’s something I like to include in all of my code to make it easier to debug by allowing you to comment out the first predicate easily. Without that, if you want to comment out the first predicate and keep the second you end up having to do something awkward like this:

FROM dbo.SourceTable
WHERE --(SourceTable.StatusColumn = 'A'
       --OR (SourceTable.StatusColumn = 'B'
           --AND SourceTable.StatusDate IS NULL))
    --AND 
    SourceTable.TableGUID NOT IN
        (SELECT TableGUID FROM dbo.TargetTable)

But with the 1 = 1 you can do this instead:

FROM dbo.SourceTable
WHERE 1 = 1
    --AND (SourceTable.StatusColumn = 'A'
         --OR (SourceTable.StatusColumn = 'B'
             --AND SourceTable.StatusDate IS NULL))
    AND SourceTable.TableGUID NOT IN (SELECT TableGUID FROM dbo.TargetTable)

Which saves you from messing about with the last AND predicate at all.

Now, if your query is largely ORs so you want that to be your top layer, you can’t do quite the same thing because the OR means the WHERE always comes back as TRUE. So, what you use instead is 1 = 2, which achieves the same thing as far as ease of debugging is concerned:

FROM dbo.SourceTable
WHERE 1 = 2
    OR (SourceTable.StatusColumn = 'A'
        AND SourceTable.TableGUID NOT IN
            (SELECT TableGUID FROM dbo.TargetTable))
    OR (SourceTable.StatusColumn = 'B'
        AND SourceTable.StatusDate IS NULL
        AND SourceTable.TableGUID NOT IN
            (SELECT TableGUID FROM dbo.TargetTable))

This isn’t the neatest way of writing the code, because we have to repeat the NOT IN across the different OR predicates, but it does the same thing as the rest of the code we’ve been looking at. I suppose for consistency, I should include the 1 = 1 or 1 = 2 in the bracketed predicates as well, and that would help when it comes to debugging, but it would also clutter the code more than a little as we can see:

FROM dbo.SourceTable
WHERE 1 = 2
    OR (1 = 1
        AND SourceTable.StatusColumn = 'A'
        AND SourceTable.TableGUID NOT IN
            (SELECT TableGUID FROM dbo.TargetTable))
    OR (1 = 1
        AND SourceTable.StatusColumn = 'B'
        AND SourceTable.StatusDate IS NULL
        AND SourceTable.TableGUID NOT IN
            (SELECT TableGUID FROM dbo.TargetTable))

Having said that, I do quite like the way that looks. In particular, I like the way each new AND block is clearly defined with the 1 = 1. These kind of standards are something to discuss with your team, if possible, and work together to standardise the way you write code.

Finally, here’s a made up example with several layers to show how this can look with very complex statements. The numeric predicates (1 = 1 and 1 = 2) are there to allow the commenting out of other predicates, and everything else is there as a stand-in for actual query logic:

SELECT
      1
WHERE 1 = 2 -- false or false or true or false = true
    OR (1 = 1        -- true and true and false = false
        AND 'A' = 'A'  -- true
        AND 'B' = 'B'  -- true
        AND (1 = 2     -- false or false = false
            OR 'AB' = 'AC' -- false
            OR 'AB' IN     -- false
                ('AD', 'A', 'AA', 'ABA')))
    OR (1 = 1        -- true and true and false = false
        AND (1 = 2     -- true or false = true
            OR 'X' = 'X'  -- true
            OR 'Z' = 'A') -- false
        AND (1 = 2     -- true or false = true
            OR 'G' = 'G'  -- true
            OR 'F' = 'W') -- false
        AND (1 = 2     -- false or false = false
            OR 'F' = 'G'   -- false
            OR 'H' = 'I')) -- false
    OR (1 = 1        -- true and true = true
        AND (1 = 2     -- false or true = true
            OR (1 = 1     -- false and true = false
                AND 'E' = 'F'  -- false
                AND 'G' = 'G') -- true
            OR 'A' = 'A') -- true
        AND 'B' = 'B') -- true
    OR NOT 'A' = 'A' -- not true = false

So, in conclusion, if you include an OR in your code, be aware that anything after the OR should be treated as being bracketed together. And ideally write your code with explicit brackets and style it in a way to make it clear what is going on.

From → Uncategorized

Leave a Comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: