Skip to content

Service Broker 101 Lesson 4: All about Queues

So far, this series has covered what service broker is, the different components that make it up, and how we use these to open a conversation and send a message to a queue. We’ve talked about queues quite a bit in these sessions, but never really gone into detail about what they are, so that is what this lesson is all about.

Retrieving data from Queues

A queue is a full database object, like a table or a stored procedure. As such, it is part of a schema, and appears in the sys.objects view. A queue holds messages that have been sent to it, in the same way that a table does, and these messages can even be queried in the same way that you would query a table.

You can’t change the columns that are available, and there are quite a few of them. To see what there is, just run SELECT * against any queue, but a few of the key ones are service_name, service_contract_name, message_type_name, message_body, message_enqueue_time, conversation_handle.

You can also retrieve data from a queue using a RECEIVE statement. These work very much like SELECTs, except that they remove the received message from the queue. This is how you would typically process messages, to make sure that the queue does not grow too large. Also note that the RECEIVE statement, like a CTE, needs the previous statement to terminate with a ;

RECEIVE TOP (1)
      priority
    , conversation_handle
    , message_type_name
    , CAST(message_body AS XML) AS XMLMessageBody
    , message_body
FROM dbo.TargetQueue

As you can see, you need to convert the message_body to XML or a string data type for it to make any sense. When you specify a TOP (X) in your RECEIVE statement, you will get the top X messages in the order they arrived (oldest first). You can also filter RECEIVE statements, but only on conversation_handle or conversation_group_id.

Creating queues

Queues can be a bit more complex to create, and there’s a few things you need to be aware of. The basic queue creation is pretty simple

CREATE QUEUE dbo.TargetQueue

But there are a few more options than we’ve seen for the other components we’ve created. Note that queues cannot use the CREATE OR ALTER script that was brought in for SQL 2016.

CREATE QUEUE dbo.TargetQueue
WITH
      STATUS = ON
    , RETENTION = OFF
    , ACTIVATION
        (
              STATUS = ON
            , PROCEDURE_NAME = dbo.TargetQueueProcedure
            , MAX_QUEUE_READERS = 4
            , EXECUTE AS SELF
        )
    , POISON_MESSAGE_HANDLING (STATUS = ON)

Everything in the WITH block can be changed with an ALTER QUEUE statement.

STATUS – This effectively says if the queue is active or not, so whether it can send and receive messages or not. The default is ON
RETENTION – If this is on, messages on the queue are kept until the end of a conversation regardless of if they have been RECEIVEd or not. It defaults to OFF
ACTIVATION – This allows you to attach a stored procedure to the queue, that will handle messages as they arrive. This is the subject of the next lesson, so I won’t say anything more about it here.
POISON_MESSAGE_HANDLING – This relates to the ACTIVATION section, so I will cover it next lesson.

All of these settings can be changed once the queue is created, using ALTER QUEUE. That’s pretty much all there is to queues, next lesson will dive deeper into the stored procedures we attach to them and how they work.

Service Broker 101 Lesson 3: Conversations and messages

The last post went through the basics of the different components Service Broker uses in SQL Server. I talked about message types, queues that send and receive messages, contracts that specify particular message types to be sent to the target or initiator queue, and services that attach to queues and specify the contracts that can target that queue.

This lesson is all about how we fit these things together to open a conversation and send a message. I’ve seen some other posts that rush through this so I want to take my time with this and go through it step by step.

The first thing we need is an open conversation. This is generated with the BEGIN DIALOG CONVERSATION statement. Oddly, the CONVERSATION part is redundant, despite the fact that every other command refers to these items as conversations, not dialogues. Anyway, the command to open a simple conversation is as follows:

DECLARE @ConversationHandle UNIQUEIDENTIFIER;

BEGIN DIALOG CONVERSATION @ConversationHandle
    FROM SERVICE [ServiceBrokerExample/Example1/Service/ServiceSource]
    TO SERVICE 'ServiceBrokerExample/Example1/Service/ServiceTarget'
    ON CONTRACT [ServiceBrokerExample/Example1/Contract/Complicated]

There are a few key things here.

First, each conversation has its own unique dialog_handle (not conversation handle, despite everything calling it a conversation from now on, score one for consistency Microsoft). We need to capture this handle in a UNIQUEIDENTIFIER variable, as we will need it later on to send messages across the conversation. In fact, the statement will error if you don’t supply a variable to capture the handle.

Second, we need to supply both FROM and TO services. These tell the conversation which service is the source and which is the target. Remember, each service is attached to a queue, and can have one or more contracts attached to it. The source service is a database object, but the target service is an NVARCHAR. This allows the target service to live outside the database, which is something that I will cover at some point in the Service Broker 201 series.

Finally we need to give the contract the conversation will obey. This contract details what message types can be sent by the initiator and target services, and it has to be one of the contracts allowed by the target service.

At this point we have a conversation, and we can start sending messages, the code to do that is below:

SEND
    ON CONVERSATION @ConversationHandle
    MESSAGE TYPE [ServiceBrokerExample/Example1/MessageType/Reply]
        ('<message>hello world</message>')

This sends the specified message type across the specified conversation from the initiator to the target. You can also specify the message you want to send in brackets after the message type, but this is not necessary unless the message type forces you to. Sometimes just having a message of a specified type is all the information you need.

Note that the initiator to target rule applies to any code executed in a standard session connected to the database that holds the initiating service. Retrieving a message from the queue, and sending messages from the target to the initiator will be covered in the next session about queues.

The last thing we need to do is close our conversation. To do this we call the END CONVERSATION @ConversationHandle statement. This sends a message using the inbuilt Microsoft message type http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog and closes the conversation from one end. To fully close the conversation you need to also call END CONVERSATION from the other end.

It’s important to always fully close conversations from both ends, otherwise they will hang around eating resources in your database. They may not be much individually, but over time they can add up significantly. A conversation does have a lifetime, specified in seconds, and you can set the lifetime when you create it, but if you don’t the default is 2,147,483,647 (max value of an INT) which is roughly 68 years.

That’s it for now, hopefully this stuff is starting to come together. The nest session will be all about Queues, how we retrieve these messages from them, how we process the messages once we have them, and how we can send messages back from our target to the initiator.

Custom backgrounds in Microsoft Teams

Just a quick blog post today, taking you through how to add a custom background to Microsoft Teams.

Instead of doing something through Teams itself, you need to copy the custom image to the C:\Users\YOURUSERNAME\AppData\Roaming\Microsoft\Teams\Backgrounds\Uploads folder, obviously replacing YOURUSERNAME with your actual username. Note that if you try to navigate to this folder in file explorer, AppData is a hidden folder so you need to click on View and the Hidden Items checkbox before you can see it.

Once you have added the image to that folder, you can select it in Teams by clicking on the ellipsis during a call and selecting Show Background Effects. You should see a number of possible images, with your custom images at the end.

Service Broker 101 Lesson 2: Service Broker components and how they fit together

Last lesson I gave a very quick introduction to Service Broker, and outlined a couple of scenarios where it might be useful. This time I want to talk through the different components that make up Service Broker, how they fit together, and what part they play in sending and managing messages.

There are 4 main components that Service Broker needs you to create in SQL Server:

Message Type

Any message you send will have two components, the message type and the message. The CREATE MESSAGE TYPE syntax allows you to specify a message type, and what constitutes a valid message for that type.

CREATE MESSAGE TYPE [SBE/Example1/MessageType/Outgoing]
    VALIDATION = WELL_FORMED_XML;

In this case, we create a message type of ServiceBrokerExample/Example1/MessageType/Outgoing, that takes a well-formed XML message. Other values for VALIDATION are NONE, where the message can contain anything or be NULL; EMPTY, where there is no message; or VALID_XML WITH SCHEMA COLLECTION schema_collection_name, where the message has to be XML that conforms to the specified schema collection, which must already exist.

SQL Server already has a collection of message types it uses to signal various events, you can find them in the sys.service_message_types.

Queues

Queues are the most similar to SQL objects we may be used to. They are also the only Service Broker object to appear in the sys.objects table, and to be owned by schemas. There is also quite a bit to say about them, so I will largely leave them to Lesson 4.

Contracts

Contracts define the types of conversation that can be had between Queues. Specifically, each contract defines the message types that can be sent by the initiator and target queues. Each message type included in the contract can be sent by the initiator, the target, or both.

CREATE CONTRACT [SBE/Example1/Contract/Complicated]
    (
          [SBE/Example1/MessageType/Outgoing] SENT BY INITIATOR
        , [SBE/Example1/MessageType/Reply] SENT BY TARGET
        , [SBE/Example1/MessageType/Alert] SENT BY TARGET
        , [SBE/Example1/MessageType/Error] SENT BY ANY
    )

In this example, once a conversation has been opened under this contract, the initiator queue (the queue that sent the first message) can send Outgoing or Error message types, and the target queue can send reply, alert, and error messages. Note that once a conversation begins the initiator and target roles are locked for the purposes of that conversation.

Service

A service is the way a conversation connects to a queue. Each service sits above a single queue, although a queue can have multiple services. In order to be the target of a conversation, a service must also specify at least one conversation that can be used to target it.

CREATE SERVICE [SBE/Example1/Service/ServiceTarget]
    ON QUEUE dbo.TargetQueue
    (
          [SBE/Example1/Contract/Complicated]
        , [SBE/Example1/Contract/Emergency]
    )

CREATE SERVICE [SBE/Example1/Service/ServiceSource]
    ON QUEUE dbo.SourceQueue

CREATE SERVICE [SBE/Example1/Service/ServiceEmergencySource]
    ON QUEUE dbo.EmergencySourceQueue

In this example, the ServiceTarget service allows the TargetQueue to be the target of conversations using the Complicated or Emergency contracts. The ServiceSource service allows the SourceQueue to initiate conversations but not be the target of them, and the ServiceEmergencySource service does the same for the EmergencySourceQueue.

That’s it for this lesson, I know this is probably a little confusing at the moment but next lesson I’ll take us through opening a conversation and sending a message, and at that point things should get a little clearer.

Service Broker 101 Lesson 1: What is Service Broker?

If you don’t really know what Service Broker is, you’re not alone. I had probably heard the term a couple of times in my 14 years as a SQL developer, but had never come across anyone using it until I started my latest job. Even then, I only discovered it when I imported a database into an SSDT database project, and saw a Queue object appear.

I did a little investigation after that; and it seemed an interesting, if little used, piece of functionality. I didn’t really think anything more of it, but filed it away in the bit of my brain that stores marked “investigate someday” (that part of my brain gets pretty cluttered and seldom cleared out).

Then, recently, I had an issue where Service Broker seemed the perfect solution, so spent a weekend experimenting, coded the fix using Service Broker, and that release is making it’s way through UAT at the moment.

But what is Service Broker?

I can hear you thinking “yeah, yeah, get to the point already”, so I will.

Service Broker is a queueing system within SQL Server. It allows you to send a message from one queue to another, and handle what happens when that message arrives. That’s pretty much it.

So, why would I want that?

Well, that’s where the story I told you at the start comes in. The issue we had was that out client wanted to run a process during the day that usually gets run at night. This process is pretty long, and locks up some tables for a minute or more, including some tables it locks through an indexed view (that’s a whole other issue that I’ll maybe blog about some other day). At the same time, users are logging onto the application to do various things, including downloading one-use vouchers. The stored proc that sits behind that API reads the data ok, but wants to write the record that these vouchers have been viewed to one of the locked tables.

What I’ve done is shift the write to table from the stored procedure to a queue. Now when a user requests their vouchers the system selects them from the right table, and fires off a message to the queue with all the details of the vouchers they just viewed, and the queue adds them to the table whenever it has a chance.

So, that’s scenario #1, when you have a process that needs to do something but doesn’t need to do it immediately, you can split that part of the process out and use a queue to have it happen as soon as possible, allowing your main process to complete quicker. Typically this will be logging that something has happened, where you need to return the result of the thing immediately but the logging table might be blocked by either a big read/write, or lots of small reads/writes.

Scenario #2 is when you are running something like the ETL systems I’ve seen and heard getting built more and more. These systems work off queues in an case, typically built as tables, where you have an overarching system that dynamically decides what happens and in what order.

As an example, you start by importing 3 files, so you add those imports to the queue and they all start executing. Once the first one is finished, you add a validation for that to the queue and that starts processing. File 2 finishes importing but that’s flagged as not needing validation so you check what data loads you can run with just that file and add them to the queue. File 3 takes a long time to load, and File 1 finishes validation in that time but there’s nothing that can be populated from File 1 until File 3 is finished loading so nothing gets added to the queue at that point.

If you have a system that wants to work like that, where you are executing a set of modules in a dynamic order, then Service Broker may be useful as a way of managing that order.

I was going to post a bit more here about the different components that make up Service Broker, but I’ve gone on for longer than I expected just on this, so I think I’ll leave that for the next post.

Lessons

I’m trying something new with my blog post this week. I want to start doing different series on subjects I think people would benefit from a deep dive into. I’m starting with Service Broker, a topic I knew nothing about until a few months ago. Other possible topics inclued:

  • Different components of SQL Server
  • SSAS
  • MDS
  • DQS
  • SQLCLR

Some of these I know quite a bit about, others I don’t have much of a clue at the moment and that’s part of the reason I want to write about them, to force myself to learn. These posts will be titled something like [topic] 101 Lesson 1: [lesson subject]. Depending on how this goes I might add some 201 or 301 series in the future, but for now the idea is to assume no knowledge from the reader and try to get to a point where they can not only (in this case) code a simple service broker solution, but also understand what they are doing.

Anyway, the first service broker post goes up today. Enjoy.

TSQL Tuesday #126: Folding@Home

This month’s T-SQL Tuesday comes from Glenn Berry, and is all about what you are doing to help during the ongoing Coronavirus crisis.

He links to Folding@Home, which allows you to use your personal machine(s) to do complex protein folding calculations to help with medical research. I don’t pretend to understand all of what they’re doing, but essentially it’s taking a problem, breaking it down into lots of mini problems, and sending these mini problems out to individual computers to find solutions. I’ve signed up to that now, and joined the Tech Nottingham team, so that’s one thing I’m doing. Plus I’m trying to get my old desktop working again, so it can run on there as well as my newer machine, and I can fold double the things.

I’m finding this a hard blog to write though, because I don’t feel like I’m doing that much apart from that. The main thing I’ve done is to set up a Microsoft Teams organisation for my local creative writing group, Nottingham Writers Collective. We’ve run a few meetups on there and it’s been a fun way to keep in touch with some people who are very important to me. We’ve also set up a number of things in the team to help people share work, and set challenges for themselves, but so far nobody has really used them. They’re there though, if anyone does feel a need, and I hope as the lockdown continues we will take advantage of them a bit more.

I think this is one of the little things that a lot of us are probably in a position to do. So much of our lives are suddenly being lived online, and it’s easy for us as techie people to forget how daunting a lot of this stuff is. So, make more of an effort to be patient with your family as they ask you for the 10th time how Skype works, or want to know why they can’t see everyone on a Zoom chat. Look out for things that can help keep people connected, and tell everyone about them.

I’ve discovered tabletop simulator, humbe bundle’s offer on Asmodee games, and board game arena’s free to play browser version of many tabletop games, and managed to organise a gaming session with some of my friends a few weekends ago. I think it helped us feel a bit less alone, and I will try to organise similar things in the future.

This was an odd blog to write, I’ve not really written anything on here until now that wasn’t very SQL/tech focussed. I hope anyone who reads this is keeping well and coping ok with everything.

AND and OR interactions

I’ve been working through a particularly nasty bug recently, and when I eventually found the cause it turned out to be a mistake in a WHERE clause including several ANDs and ORs. I thought it’d make an interesting topic to dive into for a quick blog post.

The basic issue looked something like this:

INSERT INTO dbo.TargetTable
    (
          TableGUID
        , Column1
        , Column2
    )
SELECT DISTINCT
      TableGUID
    , Column1
    , Column2
FROM dbo.SourceTable
WHERE SourceTable.StatusColumn = 'A'
    OR (SourceTable.StatusColumn = 'B' AND SourceTable.StatusDate IS NULL)
    AND SourceTable.TableGUID NOT IN 
        (SELECT TableGUID FROM dbo.TargetTable)

The problem was we wanted to apply the last AND predicate every time, but the interactions between the ANDs and the OR meant that wasn’t happening. To see exactly what I mean, here’s a couple of simplified versions of the code where I’ve used brackets to make it clearer what is happening:

SELECT 1 -- returns successfully
WHERE 1 = 1
    OR 2 = 2
    AND 2 = 1 -- we want it to not return because of this
    
SELECT 1 -- this is what is actually happening
WHERE 1 = 1
    OR (2 = 2 AND 2 = 1)

SELECT 1 -- this is what we should have done
WHERE (1 = 1 OR 2 = 2)
    AND 2 = 1

So, basically, the OR treats everything after it as being part of the OR, so when the first predicate returns true it doesn’t matter what the rest of the predicates are because they’re all on the other side of the OR. At this point we have a diagnosis, and the solution seems pretty clear: re-write the code with some brackets to tell the query engine what to do.

INSERT INTO dbo.TargetTable
    (
          TableGUID
        , Column1
        , Column2
    )
SELECT DISTINCT
      TableGUID
    , Column1
    , Column2
FROM dbo.SourceTable
WHERE (SourceTable.StatusColumn = 'A'
    OR (SourceTable.StatusColumn = 'B' AND SourceTable.StatusDate IS NULL))
    AND SourceTable.TableGUID NOT IN 
        (SELECT TableGUID FROM dbo.TargetTable)

That gives us a functionally correct solution, but to me there’s another issue. We have re-written the code to clarify things for the query engine, but I’d argue we haven’t made it particularly clear for the next developer who has to edit this code (this is all part of the same insane block of code I wrote about in my code noise post a couple of weeks ago), and that can lead to all kinds of issues further down the line.

I have a particular approach whenever I’m writing a set of predicates connected with both ANDs and ORs. I effectively layer the predicates, starting with a top layer of either ANDs or ORs, then moving to the second layer which will be the opposite. Each sub-layer is wrapped in brackets and indented, and I usually keep each predicate on a different line. For example, this is how I would lay out the code we started this post with:

INSERT INTO dbo.TargetTable
    (
          TableGUID
        , Column1
        , Column2
    )
SELECT DISTINCT
      TableGUID
    , Column1
    , Column2
FROM dbo.SourceTable
WHERE 1 = 1
    AND (SourceTable.StatusColumn = 'A' -- top layer of ANDs
         OR (SourceTable.StatusColumn = 'B' -- second layer of ORs
             AND SourceTable.StatusDate IS NULL)) -- third layer of ANDs
    AND SourceTable.TableGUID NOT IN
        (SELECT TableGUID FROM dbo.TargetTable)

This makes it quite clear that the last AND needs to be evaluated separately to the rest of the WHERE clause.

Now you might be wondering where the 1 = 1 came from. That’s something I like to include in all of my code to make it easier to debug by allowing you to comment out the first predicate easily. Without that, if you want to comment out the first predicate and keep the second you end up having to do something awkward like this:

FROM dbo.SourceTable
WHERE --(SourceTable.StatusColumn = 'A'
       --OR (SourceTable.StatusColumn = 'B'
           --AND SourceTable.StatusDate IS NULL))
    --AND 
    SourceTable.TableGUID NOT IN
        (SELECT TableGUID FROM dbo.TargetTable)

But with the 1 = 1 you can do this instead:

FROM dbo.SourceTable
WHERE 1 = 1
    --AND (SourceTable.StatusColumn = 'A'
         --OR (SourceTable.StatusColumn = 'B'
             --AND SourceTable.StatusDate IS NULL))
    AND SourceTable.TableGUID NOT IN (SELECT TableGUID FROM dbo.TargetTable)

Which saves you from messing about with the last AND predicate at all.

Now, if your query is largely ORs so you want that to be your top layer, you can’t do quite the same thing because the OR means the WHERE always comes back as TRUE. So, what you use instead is 1 = 2, which achieves the same thing as far as ease of debugging is concerned:

FROM dbo.SourceTable
WHERE 1 = 2
    OR (SourceTable.StatusColumn = 'A'
        AND SourceTable.TableGUID NOT IN
            (SELECT TableGUID FROM dbo.TargetTable))
    OR (SourceTable.StatusColumn = 'B'
        AND SourceTable.StatusDate IS NULL
        AND SourceTable.TableGUID NOT IN
            (SELECT TableGUID FROM dbo.TargetTable))

This isn’t the neatest way of writing the code, because we have to repeat the NOT IN across the different OR predicates, but it does the same thing as the rest of the code we’ve been looking at. I suppose for consistency, I should include the 1 = 1 or 1 = 2 in the bracketed predicates as well, and that would help when it comes to debugging, but it would also clutter the code more than a little as we can see:

FROM dbo.SourceTable
WHERE 1 = 2
    OR (1 = 1
        AND SourceTable.StatusColumn = 'A'
        AND SourceTable.TableGUID NOT IN
            (SELECT TableGUID FROM dbo.TargetTable))
    OR (1 = 1
        AND SourceTable.StatusColumn = 'B'
        AND SourceTable.StatusDate IS NULL
        AND SourceTable.TableGUID NOT IN
            (SELECT TableGUID FROM dbo.TargetTable))

Having said that, I do quite like the way that looks. In particular, I like the way each new AND block is clearly defined with the 1 = 1. These kind of standards are something to discuss with your team, if possible, and work together to standardise the way you write code.

Finally, here’s a made up example with several layers to show how this can look with very complex statements. The numeric predicates (1 = 1 and 1 = 2) are there to allow the commenting out of other predicates, and everything else is there as a stand-in for actual query logic:

SELECT
      1
WHERE 1 = 2 -- false or false or true or false = true
    OR (1 = 1        -- true and true and false = false
        AND 'A' = 'A'  -- true
        AND 'B' = 'B'  -- true
        AND (1 = 2     -- false or false = false
            OR 'AB' = 'AC' -- false
            OR 'AB' IN     -- false
                ('AD', 'A', 'AA', 'ABA')))
    OR (1 = 1        -- true and true and false = false
        AND (1 = 2     -- true or false = true
            OR 'X' = 'X'  -- true
            OR 'Z' = 'A') -- false
        AND (1 = 2     -- true or false = true
            OR 'G' = 'G'  -- true
            OR 'F' = 'W') -- false
        AND (1 = 2     -- false or false = false
            OR 'F' = 'G'   -- false
            OR 'H' = 'I')) -- false
    OR (1 = 1        -- true and true = true
        AND (1 = 2     -- false or true = true
            OR (1 = 1     -- false and true = false
                AND 'E' = 'F'  -- false
                AND 'G' = 'G') -- true
            OR 'A' = 'A') -- true
        AND 'B' = 'B') -- true
    OR NOT 'A' = 'A' -- not true = false

So, in conclusion, if you include an OR in your code, be aware that anything after the OR should be treated as being bracketed together. And ideally write your code with explicit brackets and style it in a way to make it clear what is going on.

Values blocks

Values blocks are a really useful little bit of code in SQL Server. Basically, they are a block of defined values that you can use pretty much like any other data set.

The main place you may have encountered them before is as a source for an input. Often when people need to add a set of records to a table I see something like this:

INSERT INTO dbo.Table1
    (
          Column1
        , Column2
    )
SELECT 1, 'some value'
UNION
SELECT 2, 'some other value';

Or even worse:

INSERT INTO dbo.Table1 (Column1, Column2)
SELECT 1, 'some value';
INSERT INTO dbo.Table1 (Column1, Column2)
SELECT 2, 'some other value';

The first attempt is ok, but it’s bulky and unnecessary, and it needs you to keep typing UNION over and over. The second attempt is actively inefficient, as each row is inserted individually instead of inserting everything as a set.

The cleaner way, that uses a VALUES block, is:

INSERT INTO dbo.Table1
    (
          Column1
        , Column2
    )
VALUES
      (1, 'some value')
    , (2, 'some other value');

This saves you from typing out the UNION all the time, and in my opinion looks neater on the page and makes your block of values easier to read.

The basic rules for a values block are:

  1. The data in each row is comma separated
  2. Each row of data is wrapped in a set of brackets
  3. The rows themselves are also separated by commas
  4. Each row has the same number of values (2 in the example above)
  5. Each value position has to hold data of the same type for every row (int, varchar in the example above)
  6. NULLs are allowed for any value

This use case is useful to know about by itself, but I think the more powerful use of values blocks comes when you start using them in other queries. To do this you need to treat them as a subquery, like in this example:

SELECT
      val.Column2
    , tbl.Column4
FROM dbo.Table2 AS tbl
INNER JOIN
    (
        VALUES
              (1, 'some value')
            , (2, 'some other value')
    ) AS val(Column1, Column2)
    ON tbl.Column1 = val.Column1;

So, all you have to do is wrap the values block in brackets, alias it , and name the columns, and you can use it like any other subquery. The only thing that’s different here is the need to name those columns when you do the aliasing, but you do that simply by listing the names in brackets after the alias.

Interestingly, this is something you can do with regular subqueries as well. It’s probably not something you will use very often as you’re more likely to rename the column in the subquery, but it never hurts to know about these things.

Finally, if I’m using a big values block in a SQL statement, it can be a bit unwieldy to have it in a subquery. It can dominate the rest of the statement, and if it’s long enough you won’t be able to see all of the statement on the screen, and the individual values rarely add much to your understanding of the code. That’s why I will often put the values block in a common table expression at the start of the statement. That also allows you to reuse it if you need to refer to it more than once. Example below:

WITH val AS
(
    SELECT
          val.Column1
        , val.Column2
    FROM
        (
            VALUES
                  (1, 'some value')
                , (2, 'some other value')
        ) AS val(Column1, Column2)
)
SELECT
      val.Column2
    , tbl.Column4
FROM dbo.Table2 AS tbl
INNER JOIN val
    ON tbl.Column1 = val.Column1;

Another option here might be to put the values into a table variable or temp table, but those won’t be as efficient for the query to process. The only time I’d consider that is if there were several statements that wanted to access the same data set, and the data set was small.

So, that’s pretty much everything I know about values blocks. Hope that was useful.

Code noise

I love the term code noise. It’s one of those terms that succinctly encapsulates a quite complex topic in a couple of words, and is instantly recognisable to anyone who’s encountered it even if they had never heard the term before.

Basically code noise is anything that pulls your attention away from what the code is supposed to be doing, or obscures the true nature of the code in some way. It’s not something we consider enough when writing T-SQL code, but I think there is a lot to be said for writing code the next person will be able to read.

As a small example, I was debugging something recently and found that all of the insert statements had ORDER BY clauses. I couldn’t work out why these were making me so angry, after all it’s not doing anything to hurt performance, and in fact isn’t doing anything at all, until one of the other devs in the office pointed out that it’s one example of the code noise that the whole code base is filled with.

More extreme examples are the tendency some developers have to load data into temporary tables after temporary table, or to write nested subqueries 5 layers deep. Both of these things largely just hide where the actual logic of the code is, and make it a nightmare to debug. Here’s an example:

WITH WorkOrderRoutingPlusLocation
AS
    (
        SELECT
              WrkOrdrRtng.WorkOrderID
            , WrkOrdrRtng.ProductID
            , WrkOrdrRtng.ActualCost
            , WrkOrdrRtng.PlannedCost
            , WrkOrdrRtng.OperationSequence
            , Loc.Name AS LocationName
        FROM Production.WorkOrderRouting AS WrkOrdrRtng
        INNER JOIN Production.[Location] AS Loc
            ON WrkOrdrRtng.LocationID = Loc.LocationID
    )
SELECT
      WrkOrdr.WorkOrderID
    , WrkOrdr.ScrappedQty
    , WrkOrdrRtng.OperationSequence
    , Prod.Name AS ProductName
    , WrkOrdrRtng.LocationName AS LocationMovedTo
    , PrevWrkOrdrRtng.LocationName AS LocationMovedFrom
    , WrkOrdrRtng.PlannedCost
    , WrkOrdrRtng.ActualCost
    , CASE
          WHEN WrkOrdrRtng.ActualCost
              > WrkOrdrRtng.PlannedCost
          THEN 'Over budget'
          WHEN WrkOrdrRtng.ActualCost
              < WrkOrdrRtng.PlannedCost
          THEN 'Under budget'
          WHEN WrkOrdrRtng.ActualCost
              = WrkOrdrRtng.PlannedCost
          THEN 'On budget'
          ELSE NULL
      END AS BudgetStatus
FROM Production.WorkOrder AS WrkOrdr
INNER JOIN WorkOrderRoutingPlusLocation AS WrkOrdrRtng
    ON WrkOrdr.WorkOrderID = WrkOrdrRtng.WorkOrderID
INNER JOIN Production.Product AS Prod
    ON WrkOrdrRtng.ProductID = Prod.ProductID
LEFT JOIN WorkOrderRoutingPlusLocation AS PrevWrkOrdrRtng
    ON WrkOrdrRtng.WorkOrderID = PrevWrkOrdrRtng.WorkOrderID
    AND WrkOrdrRtng.ProductID = PrevWrkOrdrRtng.ProductID
    AND WrkOrdrRtng.OperationSequence 
        = PrevWrkOrdrRtng.OperationSequence + 1
INNER JOIN
    (
        SELECT
              WrkOrdrRtng.WorkOrderID
            , SUM(WrkOrdrRtng.PlannedCost) AS TotalPlannedCost
            , SUM(WrkOrdrRtng.ActualCost) AS TotalActualCost
        FROM Production.WorkOrderRouting AS WrkOrdrRtng
        GROUP BY
              WrkOrdrRtng.WorkOrderID
    ) AS sq_WrkOrdrRtngTotals
    ON WrkOrdr.WorkOrderID = sq_WrkOrdrRtngTotals.WorkOrderID
WHERE 1 = 1
    AND sq_WrkOrdrRtngTotals.TotalActualCost
        > sq_WrkOrdrRtngTotals.TotalPlannedCost
ORDER BY
      WrkOrdr.WorkOrderID
    , WrkOrdrRtng.OperationSequence

I think this is fairly self-explanatory code, even without any comments. There’s not much here that isn’t necessary, just the 1 = 1 in the WHERE clause, but that’s to help with debugging. The CTE is there because we use these tables joined together more than once in the query, and one of those times is the right side of a left join. The subquery is there because we genuinely want to look at things at a different level of aggregation to the main query. Everything else is joined together very logically.

Under other circumstances I would have formatted it slightly different, but to make it fit well on the blog post I’ve tried to make it as thin as possible. To that end I’ve done things like splitting predicates across multiple lines that I wouldn’t ordinarily do, but I don’t think that affects the readability of the code too much.

Now, consider this alternative way of writing this query:

SELECT
      Sub3.WorkOrderID
    , WrkOrdr.ScrappedQty
    , Sub3.OperationSequence
    , Prod.Name AS ProductName
    , LNam AS LocationMovedTo
    , L.[Name] AS LocationMovedTo
FROM
    (
        SELECT
              Sub2.WorkOrderID
            , ProductID
            , BudgetStatus
            , OperationSequence
        FROM
            (
                SELECT
                      WorkOrderID
                    , ScrappedQty
                    , SUM(PlannedCost) AS TotalPlannedCost
                    , SUM(ActualCost) AS TotalActualCost
                FROM
                    (
                        SELECT DISTINCT
                              WrkOrdr.WorkOrderID
                            , ScrappedQty
                            , PlannedCost
                            , ActualCost
                        FROM Production.WorkOrder AS WrkOrdr
                        INNER JOIN Production.WorkOrderRouting AS WrkOrdrRtng
                            ON WrkOrdr.WorkOrderID
                                = WrkOrdrRtng.WorkOrderID
                    ) AS Sub1
                GROUP BY
                      WorkOrderID
                    , ScrappedQty
            ) AS Sub2
        INNER JOIN
            (
                SELECT
                      CASE
                          WHEN ActualCost > PlannedCost
                          THEN 'Over budget'
                          WHEN ActualCost < PlannedCost
                          THEN 'Under budget'
                          WHEN ActualCost = PlannedCost
                          THEN 'On budget'
                          ELSE NULL
                      END AS BudgetStatus
                    , WorkOrderID
                    , ProductID
                    , OperationSequence
                FROM Production.WorkOrderRouting AS WrkOrdrRtng
            ) AS Sub21
            ON Sub2.WorkOrderID = Sub21.WorkOrderID
        WHERE 1 = 1
            AND TotalPlannedCost < TotalActualCost
    ) AS Sub3
INNER JOIN Production.Product AS Prod
    ON Sub3.ProductID = Prod.ProductID
INNER JOIN Production.WorkOrder AS WrkOrdr
    ON Sub3.WorkOrderID = WrkOrdr.WorkOrderID
INNER JOIN Production.WorkOrderRouting AS WrkOrdrRtng
    ON Sub3.WorkOrderID = WrkOrdrRtng.WorkOrderID
    AND Sub3.ProductID = WrkOrdrRtng.ProductID
    AND Sub3.OperationSequence = WrkOrdrRtng.OperationSequence
INNER JOIN
    (
        SELECT
              L.LocationID
            , L.[Name] AS LNam
        FROM Production.[Location] AS L
    ) AS Sub4
    ON WrkOrdrRtng.LocationID= Sub4.LocationID
LEFT JOIN
    (
        SELECT
              WorkOrderID
            , ProductID
            , OperationSequence
            , LocationID
        FROM Production.WorkOrderRouting
    ) SubWOR
    ON WrkOrdrRtng.WorkOrderID = SubWOR.WorkOrderID
    AND WrkOrdrRtng.ProductID = SubWOR.ProductID
    AND WrkOrdrRtng.OperationSequence
        = SubWOR.OperationSequence + 1
LEFT JOIN Production.[Location] AS L
    ON SubWOR.LocationID = L.LocationID

Now, I think these two statements do the same thing, although to be honest I got a little lost writing this second one. It should be obvious, however, that the second statement is not nearly as clear, that there is a lot of extra code around it making a lot of noise.

Obviously this is a made up example, but it is similar to a lot of real-world examples I’ve seen. In particular, the overuse of subqueries (and subqueries inside subqueries inside subqueries) to filter or join data. The danger here, apart from it looking ugly, is that another developer comes along, can’t read the intention behind the original code because of all the noise, and just hacks something else onto the existing mess. You can see this has happened on line 63 when someone has added the WorkOrderRouting table to the query, because they need to join from it to the Location table. The WorkOrderRouting table is already part of this query, in Sub21 inside of Sub3, but the new developer hasn’t been able to figure this out, or maybe they’re not sure about how to bubble up the LocationID through all of the subqueries (especially as WorkOrderRouting also exists in Sub1 inside Sub2 inside Sub3 but can’t be bubbled up because there’s some aggregation along the way). Instead they’ve just hacked a new join to the table onto the existing mess and everything has gotten that much harder to understand.

Another thing that’s obviously bad about this statement is the inconsistent naming standards. Sub1 is always a terrible name, you need to alias anything, but especially a subquery, with something meaningful. I like to prefix any subquery aliases with sq_ so when you reference it elsewhere in the query you know you’re referencing a subquery. You also need to make all column names 2 part. Where does LNam in the outer SELECT come from? Or OperationSequence in Sub3? Without 2 part names for all columns, this can be a nightmare to figure out.

I want to end with an example of really bad code noise I found yesterday in my actual work. Table and column names are changed, but the rest of the code is as is:

DELETE FROM TriggerTable_A
WHERE A_GUID IN (SELECT tmp_Sync7.A_GUID
                        FROM tmp_Sync7
                        INNER JOIN TriggerTable_A a on tmp_Sync7.A_GUID = a.A_GUID
                        WHERE tmp_Sync7.UpdatedDate = a.UpdatedDate)

This looks a bit odd, but the main question you have looking at it is what’s that tmp_Sync7 table, right? Well, almost 300 lines of code previously we have this little code snippet (and I checked a few times and there is nothing in those 300 lines of code that does anything to tmp_Sync7):

IF EXISTS(SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'tmp_Sync7') 
    DROP TABLE tmp_Sync7

SELECT * INTO tmp_Sync7 from TriggerTable_A

My brain pretty much exploded when I saw this. What we’re saying here is DELETE from TriggerTable_A if A_GUID is in TriggerTable_A joined to TriggerTable_A. Basically, DELETE FROM TriggerTable_A. As a bonus, you don’t have that nasty permanent tmp_Sync7 taking up space on your database, and you improve performance because you’re doing a lot less.

Bottom line is code noise makes it harder for other developers to read your code. It makes it harder for you to read your code when you come back to it in 6 months to fix some crazy bug that’s only just started showing up. It can make it harder for the query optimiser to read your code, meaning it takes longer to come up with an execution plan, and has more chance of hitting the time-out limit and returning a sub-optimal plan. Both of these things hurt performance, but ultimately your drive to eliminate code noise shouldn’t just be driven by that. You should want to keep your code clean, simple, and elegant, so that when your fellow developer come to build on what you’ve done in 6 months or a year, they can easily understand what your code is doing and make clean, simple, elegant changes to it themselves.