Code noise
I love the term code noise. It’s one of those terms that succinctly encapsulates a quite complex topic in a couple of words, and is instantly recognisable to anyone who’s encountered it even if they had never heard the term before.
Basically code noise is anything that pulls your attention away from what the code is supposed to be doing, or obscures the true nature of the code in some way. It’s not something we consider enough when writing T-SQL code, but I think there is a lot to be said for writing code the next person will be able to read.
As a small example, I was debugging something recently and found that all of the insert statements had ORDER BY
clauses. I couldn’t work out why these were making me so angry, after all it’s not doing anything to hurt performance, and in fact isn’t doing anything at all, until one of the other devs in the office pointed out that it’s one example of the code noise that the whole code base is filled with.
More extreme examples are the tendency some developers have to load data into temporary tables after temporary table, or to write nested subqueries 5 layers deep. Both of these things largely just hide where the actual logic of the code is, and make it a nightmare to debug. Here’s an example:
WITH WorkOrderRoutingPlusLocation
AS
(
SELECT
WrkOrdrRtng.WorkOrderID
, WrkOrdrRtng.ProductID
, WrkOrdrRtng.ActualCost
, WrkOrdrRtng.PlannedCost
, WrkOrdrRtng.OperationSequence
, Loc.Name AS LocationName
FROM Production.WorkOrderRouting AS WrkOrdrRtng
INNER JOIN Production.[Location] AS Loc
ON WrkOrdrRtng.LocationID = Loc.LocationID
)
SELECT
WrkOrdr.WorkOrderID
, WrkOrdr.ScrappedQty
, WrkOrdrRtng.OperationSequence
, Prod.Name AS ProductName
, WrkOrdrRtng.LocationName AS LocationMovedTo
, PrevWrkOrdrRtng.LocationName AS LocationMovedFrom
, WrkOrdrRtng.PlannedCost
, WrkOrdrRtng.ActualCost
, CASE
WHEN WrkOrdrRtng.ActualCost
> WrkOrdrRtng.PlannedCost
THEN 'Over budget'
WHEN WrkOrdrRtng.ActualCost
< WrkOrdrRtng.PlannedCost
THEN 'Under budget'
WHEN WrkOrdrRtng.ActualCost
= WrkOrdrRtng.PlannedCost
THEN 'On budget'
ELSE NULL
END AS BudgetStatus
FROM Production.WorkOrder AS WrkOrdr
INNER JOIN WorkOrderRoutingPlusLocation AS WrkOrdrRtng
ON WrkOrdr.WorkOrderID = WrkOrdrRtng.WorkOrderID
INNER JOIN Production.Product AS Prod
ON WrkOrdrRtng.ProductID = Prod.ProductID
LEFT JOIN WorkOrderRoutingPlusLocation AS PrevWrkOrdrRtng
ON WrkOrdrRtng.WorkOrderID = PrevWrkOrdrRtng.WorkOrderID
AND WrkOrdrRtng.ProductID = PrevWrkOrdrRtng.ProductID
AND WrkOrdrRtng.OperationSequence
= PrevWrkOrdrRtng.OperationSequence + 1
INNER JOIN
(
SELECT
WrkOrdrRtng.WorkOrderID
, SUM(WrkOrdrRtng.PlannedCost) AS TotalPlannedCost
, SUM(WrkOrdrRtng.ActualCost) AS TotalActualCost
FROM Production.WorkOrderRouting AS WrkOrdrRtng
GROUP BY
WrkOrdrRtng.WorkOrderID
) AS sq_WrkOrdrRtngTotals
ON WrkOrdr.WorkOrderID = sq_WrkOrdrRtngTotals.WorkOrderID
WHERE 1 = 1
AND sq_WrkOrdrRtngTotals.TotalActualCost
> sq_WrkOrdrRtngTotals.TotalPlannedCost
ORDER BY
WrkOrdr.WorkOrderID
, WrkOrdrRtng.OperationSequence
I think this is fairly self-explanatory code, even without any comments. There’s not much here that isn’t necessary, just the 1 = 1 in the WHERE
clause, but that’s to help with debugging. The CTE is there because we use these tables joined together more than once in the query, and one of those times is the right side of a left join. The subquery is there because we genuinely want to look at things at a different level of aggregation to the main query. Everything else is joined together very logically.
Under other circumstances I would have formatted it slightly different, but to make it fit well on the blog post I’ve tried to make it as thin as possible. To that end I’ve done things like splitting predicates across multiple lines that I wouldn’t ordinarily do, but I don’t think that affects the readability of the code too much.
Now, consider this alternative way of writing this query:
SELECT
Sub3.WorkOrderID
, WrkOrdr.ScrappedQty
, Sub3.OperationSequence
, Prod.Name AS ProductName
, LNam AS LocationMovedTo
, L.[Name] AS LocationMovedTo
FROM
(
SELECT
Sub2.WorkOrderID
, ProductID
, BudgetStatus
, OperationSequence
FROM
(
SELECT
WorkOrderID
, ScrappedQty
, SUM(PlannedCost) AS TotalPlannedCost
, SUM(ActualCost) AS TotalActualCost
FROM
(
SELECT DISTINCT
WrkOrdr.WorkOrderID
, ScrappedQty
, PlannedCost
, ActualCost
FROM Production.WorkOrder AS WrkOrdr
INNER JOIN Production.WorkOrderRouting AS WrkOrdrRtng
ON WrkOrdr.WorkOrderID
= WrkOrdrRtng.WorkOrderID
) AS Sub1
GROUP BY
WorkOrderID
, ScrappedQty
) AS Sub2
INNER JOIN
(
SELECT
CASE
WHEN ActualCost > PlannedCost
THEN 'Over budget'
WHEN ActualCost < PlannedCost
THEN 'Under budget'
WHEN ActualCost = PlannedCost
THEN 'On budget'
ELSE NULL
END AS BudgetStatus
, WorkOrderID
, ProductID
, OperationSequence
FROM Production.WorkOrderRouting AS WrkOrdrRtng
) AS Sub21
ON Sub2.WorkOrderID = Sub21.WorkOrderID
WHERE 1 = 1
AND TotalPlannedCost < TotalActualCost
) AS Sub3
INNER JOIN Production.Product AS Prod
ON Sub3.ProductID = Prod.ProductID
INNER JOIN Production.WorkOrder AS WrkOrdr
ON Sub3.WorkOrderID = WrkOrdr.WorkOrderID
INNER JOIN Production.WorkOrderRouting AS WrkOrdrRtng
ON Sub3.WorkOrderID = WrkOrdrRtng.WorkOrderID
AND Sub3.ProductID = WrkOrdrRtng.ProductID
AND Sub3.OperationSequence = WrkOrdrRtng.OperationSequence
INNER JOIN
(
SELECT
L.LocationID
, L.[Name] AS LNam
FROM Production.[Location] AS L
) AS Sub4
ON WrkOrdrRtng.LocationID= Sub4.LocationID
LEFT JOIN
(
SELECT
WorkOrderID
, ProductID
, OperationSequence
, LocationID
FROM Production.WorkOrderRouting
) SubWOR
ON WrkOrdrRtng.WorkOrderID = SubWOR.WorkOrderID
AND WrkOrdrRtng.ProductID = SubWOR.ProductID
AND WrkOrdrRtng.OperationSequence
= SubWOR.OperationSequence + 1
LEFT JOIN Production.[Location] AS L
ON SubWOR.LocationID = L.LocationID
Now, I think these two statements do the same thing, although to be honest I got a little lost writing this second one. It should be obvious, however, that the second statement is not nearly as clear, that there is a lot of extra code around it making a lot of noise.
Obviously this is a made up example, but it is similar to a lot of real-world examples I’ve seen. In particular, the overuse of subqueries (and subqueries inside subqueries inside subqueries) to filter or join data. The danger here, apart from it looking ugly, is that another developer comes along, can’t read the intention behind the original code because of all the noise, and just hacks something else onto the existing mess. You can see this has happened on line 63 when someone has added the WorkOrderRouting
table to the query, because they need to join from it to the Location
table. The WorkOrderRouting
table is already part of this query, in Sub21
inside of Sub3
, but the new developer hasn’t been able to figure this out, or maybe they’re not sure about how to bubble up the LocationID
through all of the subqueries (especially as WorkOrderRouting
also exists in Sub1
inside Sub2
inside Sub3
but can’t be bubbled up because there’s some aggregation along the way). Instead they’ve just hacked a new join to the table onto the existing mess and everything has gotten that much harder to understand.
Another thing that’s obviously bad about this statement is the inconsistent naming standards. Sub1
is always a terrible name, you need to alias anything, but especially a subquery, with something meaningful. I like to prefix any subquery aliases with sq_
so when you reference it elsewhere in the query you know you’re referencing a subquery. You also need to make all column names 2 part. Where does LNam
in the outer SELECT
come from? Or OperationSequence
in Sub3
? Without 2 part names for all columns, this can be a nightmare to figure out.
I want to end with an example of really bad code noise I found yesterday in my actual work. Table and column names are changed, but the rest of the code is as is:
DELETE FROM TriggerTable_A
WHERE A_GUID IN (SELECT tmp_Sync7.A_GUID
FROM tmp_Sync7
INNER JOIN TriggerTable_A a on tmp_Sync7.A_GUID = a.A_GUID
WHERE tmp_Sync7.UpdatedDate = a.UpdatedDate)
This looks a bit odd, but the main question you have looking at it is what’s that tmp_Sync7 table, right? Well, almost 300 lines of code previously we have this little code snippet (and I checked a few times and there is nothing in those 300 lines of code that does anything to tmp_Sync7
):
IF EXISTS(SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'tmp_Sync7')
DROP TABLE tmp_Sync7
SELECT * INTO tmp_Sync7 from TriggerTable_A
My brain pretty much exploded when I saw this. What we’re saying here is DELETE
from TriggerTable_A
if A_GUID
is in TriggerTable_A
joined to TriggerTable_A
. Basically, DELETE FROM TriggerTable_A
. As a bonus, you don’t have that nasty permanent tmp_Sync7
taking up space on your database, and you improve performance because you’re doing a lot less.
Bottom line is code noise makes it harder for other developers to read your code. It makes it harder for you to read your code when you come back to it in 6 months to fix some crazy bug that’s only just started showing up. It can make it harder for the query optimiser to read your code, meaning it takes longer to come up with an execution plan, and has more chance of hitting the time-out limit and returning a sub-optimal plan. Both of these things hurt performance, but ultimately your drive to eliminate code noise shouldn’t just be driven by that. You should want to keep your code clean, simple, and elegant, so that when your fellow developer come to build on what you’ve done in 6 months or a year, they can easily understand what your code is doing and make clean, simple, elegant changes to it themselves.
Trackbacks & Pingbacks