Code noise

April 21, 2020

I love the term code noise. It’s one of those terms that succinctly encapsulates a quite complex topic in a couple of words, and is instantly recognisable to anyone who’s encountered it even if they had never heard the term before.

Basically code noise is anything that pulls your attention away from what the code is supposed to be doing, or obscures the true nature of the code in some way. It’s not something we consider enough when writing T-SQL code, but I think there is a lot to be said for writing code the next person will be able to read.

As a small example, I was debugging something recently and found that all of the insert statements had ORDER BY clauses. I couldn’t work out why these were making me so angry, after all it’s not doing anything to hurt performance, and in fact isn’t doing anything at all, until one of the other devs in the office pointed out that it’s one example of the code noise that the whole code base is filled with.

More extreme examples are the tendency some developers have to load data into temporary tables after temporary table, or to write nested subqueries 5 layers deep. Both of these things largely just hide where the actual logic of the code is, and make it a nightmare to debug. Here’s an example:

WITH WorkOrderRoutingPlusLocation
AS
    (
        SELECT
              WrkOrdrRtng.WorkOrderID
            , WrkOrdrRtng.ProductID
            , WrkOrdrRtng.ActualCost
            , WrkOrdrRtng.PlannedCost
            , WrkOrdrRtng.OperationSequence
            , Loc.Name AS LocationName
        FROM Production.WorkOrderRouting AS WrkOrdrRtng
        INNER JOIN Production.[Location] AS Loc
            ON WrkOrdrRtng.LocationID = Loc.LocationID
    )
SELECT
      WrkOrdr.WorkOrderID
    , WrkOrdr.ScrappedQty
    , WrkOrdrRtng.OperationSequence
    , Prod.Name AS ProductName
    , WrkOrdrRtng.LocationName AS LocationMovedTo
    , PrevWrkOrdrRtng.LocationName AS LocationMovedFrom
    , WrkOrdrRtng.PlannedCost
    , WrkOrdrRtng.ActualCost
    , CASE
          WHEN WrkOrdrRtng.ActualCost
              > WrkOrdrRtng.PlannedCost
          THEN 'Over budget'
          WHEN WrkOrdrRtng.ActualCost
              < WrkOrdrRtng.PlannedCost
          THEN 'Under budget'
          WHEN WrkOrdrRtng.ActualCost
              = WrkOrdrRtng.PlannedCost
          THEN 'On budget'
          ELSE NULL
      END AS BudgetStatus
FROM Production.WorkOrder AS WrkOrdr
INNER JOIN WorkOrderRoutingPlusLocation AS WrkOrdrRtng
    ON WrkOrdr.WorkOrderID = WrkOrdrRtng.WorkOrderID
INNER JOIN Production.Product AS Prod
    ON WrkOrdrRtng.ProductID = Prod.ProductID
LEFT JOIN WorkOrderRoutingPlusLocation AS PrevWrkOrdrRtng
    ON WrkOrdrRtng.WorkOrderID = PrevWrkOrdrRtng.WorkOrderID
    AND WrkOrdrRtng.ProductID = PrevWrkOrdrRtng.ProductID
    AND WrkOrdrRtng.OperationSequence 
        = PrevWrkOrdrRtng.OperationSequence + 1
INNER JOIN
    (
        SELECT
              WrkOrdrRtng.WorkOrderID
            , SUM(WrkOrdrRtng.PlannedCost) AS TotalPlannedCost
            , SUM(WrkOrdrRtng.ActualCost) AS TotalActualCost
        FROM Production.WorkOrderRouting AS WrkOrdrRtng
        GROUP BY
              WrkOrdrRtng.WorkOrderID
    ) AS sq_WrkOrdrRtngTotals
    ON WrkOrdr.WorkOrderID = sq_WrkOrdrRtngTotals.WorkOrderID
WHERE 1 = 1
    AND sq_WrkOrdrRtngTotals.TotalActualCost
        > sq_WrkOrdrRtngTotals.TotalPlannedCost
ORDER BY
      WrkOrdr.WorkOrderID
    , WrkOrdrRtng.OperationSequence

I think this is fairly self-explanatory code, even without any comments. There’s not much here that isn’t necessary, just the 1 = 1 in the WHERE clause, but that’s to help with debugging. The CTE is there because we use these tables joined together more than once in the query, and one of those times is the right side of a left join. The subquery is there because we genuinely want to look at things at a different level of aggregation to the main query. Everything else is joined together very logically.

Under other circumstances I would have formatted it slightly different, but to make it fit well on the blog post I’ve tried to make it as thin as possible. To that end I’ve done things like splitting predicates across multiple lines that I wouldn’t ordinarily do, but I don’t think that affects the readability of the code too much.

Now, consider this alternative way of writing this query:

SELECT
      Sub3.WorkOrderID
    , WrkOrdr.ScrappedQty
    , Sub3.OperationSequence
    , Prod.Name AS ProductName
    , LNam AS LocationMovedTo
    , L.[Name] AS LocationMovedTo
FROM
    (
        SELECT
              Sub2.WorkOrderID
            , ProductID
            , BudgetStatus
            , OperationSequence
        FROM
            (
                SELECT
                      WorkOrderID
                    , ScrappedQty
                    , SUM(PlannedCost) AS TotalPlannedCost
                    , SUM(ActualCost) AS TotalActualCost
                FROM
                    (
                        SELECT DISTINCT
                              WrkOrdr.WorkOrderID
                            , ScrappedQty
                            , PlannedCost
                            , ActualCost
                        FROM Production.WorkOrder AS WrkOrdr
                        INNER JOIN Production.WorkOrderRouting AS WrkOrdrRtng
                            ON WrkOrdr.WorkOrderID
                                = WrkOrdrRtng.WorkOrderID
                    ) AS Sub1
                GROUP BY
                      WorkOrderID
                    , ScrappedQty
            ) AS Sub2
        INNER JOIN
            (
                SELECT
                      CASE
                          WHEN ActualCost > PlannedCost
                          THEN 'Over budget'
                          WHEN ActualCost < PlannedCost
                          THEN 'Under budget'
                          WHEN ActualCost = PlannedCost
                          THEN 'On budget'
                          ELSE NULL
                      END AS BudgetStatus
                    , WorkOrderID
                    , ProductID
                    , OperationSequence
                FROM Production.WorkOrderRouting AS WrkOrdrRtng
            ) AS Sub21
            ON Sub2.WorkOrderID = Sub21.WorkOrderID
        WHERE 1 = 1
            AND TotalPlannedCost < TotalActualCost
    ) AS Sub3
INNER JOIN Production.Product AS Prod
    ON Sub3.ProductID = Prod.ProductID
INNER JOIN Production.WorkOrder AS WrkOrdr
    ON Sub3.WorkOrderID = WrkOrdr.WorkOrderID
INNER JOIN Production.WorkOrderRouting AS WrkOrdrRtng
    ON Sub3.WorkOrderID = WrkOrdrRtng.WorkOrderID
    AND Sub3.ProductID = WrkOrdrRtng.ProductID
    AND Sub3.OperationSequence = WrkOrdrRtng.OperationSequence
INNER JOIN
    (
        SELECT
              L.LocationID
            , L.[Name] AS LNam
        FROM Production.[Location] AS L
    ) AS Sub4
    ON WrkOrdrRtng.LocationID= Sub4.LocationID
LEFT JOIN
    (
        SELECT
              WorkOrderID
            , ProductID
            , OperationSequence
            , LocationID
        FROM Production.WorkOrderRouting
    ) SubWOR
    ON WrkOrdrRtng.WorkOrderID = SubWOR.WorkOrderID
    AND WrkOrdrRtng.ProductID = SubWOR.ProductID
    AND WrkOrdrRtng.OperationSequence
        = SubWOR.OperationSequence + 1
LEFT JOIN Production.[Location] AS L
    ON SubWOR.LocationID = L.LocationID

Now, I think these two statements do the same thing, although to be honest I got a little lost writing this second one. It should be obvious, however, that the second statement is not nearly as clear, that there is a lot of extra code around it making a lot of noise.

Obviously this is a made up example, but it is similar to a lot of real-world examples I’ve seen. In particular, the overuse of subqueries (and subqueries inside subqueries inside subqueries) to filter or join data. The danger here, apart from it looking ugly, is that another developer comes along, can’t read the intention behind the original code because of all the noise, and just hacks something else onto the existing mess. You can see this has happened on line 63 when someone has added the WorkOrderRouting table to the query, because they need to join from it to the Location table. The WorkOrderRouting table is already part of this query, in Sub21 inside of Sub3, but the new developer hasn’t been able to figure this out, or maybe they’re not sure about how to bubble up the LocationID through all of the subqueries (especially as WorkOrderRouting also exists in Sub1 inside Sub2 inside Sub3 but can’t be bubbled up because there’s some aggregation along the way). Instead they’ve just hacked a new join to the table onto the existing mess and everything has gotten that much harder to understand.

Another thing that’s obviously bad about this statement is the inconsistent naming standards. Sub1 is always a terrible name, you need to alias anything, but especially a subquery, with something meaningful. I like to prefix any subquery aliases with sq_ so when you reference it elsewhere in the query you know you’re referencing a subquery. You also need to make all column names 2 part. Where does LNam in the outer SELECT come from? Or OperationSequence in Sub3? Without 2 part names for all columns, this can be a nightmare to figure out.

I want to end with an example of really bad code noise I found yesterday in my actual work. Table and column names are changed, but the rest of the code is as is:

DELETE FROM TriggerTable_A
WHERE A_GUID IN (SELECT tmp_Sync7.A_GUID
                        FROM tmp_Sync7
                        INNER JOIN TriggerTable_A a on tmp_Sync7.A_GUID = a.A_GUID
                        WHERE tmp_Sync7.UpdatedDate = a.UpdatedDate)

This looks a bit odd, but the main question you have looking at it is what’s that tmp_Sync7 table, right? Well, almost 300 lines of code previously we have this little code snippet (and I checked a few times and there is nothing in those 300 lines of code that does anything to tmp_Sync7):

IF EXISTS(SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'tmp_Sync7') 
    DROP TABLE tmp_Sync7

SELECT * INTO tmp_Sync7 from TriggerTable_A

My brain pretty much exploded when I saw this. What we’re saying here is DELETE from TriggerTable_A if A_GUID is in TriggerTable_A joined to TriggerTable_A. Basically, DELETE FROM TriggerTable_A. As a bonus, you don’t have that nasty permanent tmp_Sync7 taking up space on your database, and you improve performance because you’re doing a lot less.

Bottom line is code noise makes it harder for other developers to read your code. It makes it harder for you to read your code when you come back to it in 6 months to fix some crazy bug that’s only just started showing up. It can make it harder for the query optimiser to read your code, meaning it takes longer to come up with an execution plan, and has more chance of hitting the time-out limit and returning a sub-optimal plan. Both of these things hurt performance, but ultimately your drive to eliminate code noise shouldn’t just be driven by that. You should want to keep your code clean, simple, and elegant, so that when your fellow developer come to build on what you’ve done in 6 months or a year, they can easily understand what your code is doing and make clean, simple, elegant changes to it themselves.

From → Coding, key concepts

2 Comments

Code noise

Trackbacks & Pingbacks

Leave a comment Cancel reply

Recent Posts

Archives

Categories

Meta

Subscribe to Blog via Email

Code noise

Share this:

Related

Trackbacks & Pingbacks

Leave a comment Cancel reply

Recent Posts

Archives

Categories

Meta

Subscribe to Blog via Email