Skip to content

T-SQL Tuesday #137: Using Notebooks Every Day

This is my 8th TSQLTuesday, and this month Steve Jones has invited us to write about Jupyter Notebooks.

I’ve been aware of Notebooks for a little while, but didn’t really know what they were or how to get started with them. Then I attended a talk a few years ago by Rich Benner on Azure Data Studio, which was really interesting in its own right, but also included the ability to create and edit notebooks.

After that I got a bit more interested, and had a bit of a play and did a bit of research. I could see a few use cases for them that other people had written about, things like creating a notebook to pass to the DBAs/admins to run on live and save with the results in as a way to easily do some basic troubleshooting or analysis on a production system

This seemed really appealing at the time, as I was working somewhere where devs were expected to troubleshoot issues on live databases without being able to access them. However, the organisation I was part of moved very slowly and the chances of notebooks or ADS being adopted any time soon was pretty slim so that was, unfortunately, a bit of a non-starter.

Since then I’ve continued to have a play every once in a while, but I’ve never been anywhere where the people around me were much interested, and without that it’s quite hard to get traction with this sort of thing.

I have found one use, however, and that’s as a set of notes for myself.

Recently I was looking to explain to someone about some of the newer syntax that’s available in T-SQL (and when I say newer I mean anything post 2008). I did a quick bit of research and realised there was plenty that I’d forgotten about and didn’t use even when it could be useful for me, so I set up a notebook to very briefly explain the new functionality and include some basic examples. It’s not complete, but you can find it in my GitHub repo.

Going forward I plan on adding a few more notebooks there to help me keep track of any other bits of syntax that I either forget exist or have to research every time I need to use it. I’m thinking one for XML and another for JSON might be really handy, as well as one for all the aggregation/windowing function options. If I can get motivated enough (and that can be a big if) this will hopefully grow into kind of a mini books online, but personalised for me to help me remember the things I need to know.

Error message: “Adding a value to a ‘date’ column caused an overflow”

This is an error I ran across a little while ago, and thought it was worth a quick blog post (as a reminder to myself as much as anything).

The error came about because I was working with a date dimension, and I knew I would encounter cases where the date wasn’t exactly known but was known to be some point in the future. To try and handle these cases so that any WHERE SYSDATETIME() < DateColumn predicates would work correctly, I included a row for the 31st December 9999 in the date dimension. I ran some data into the table and tested a few simple queries and everything worked, but soon after a lot of other queries started to fail with the “Adding a value to a ‘date’ column caused an overflow” error.

This happened because I had some queries where I was joining to the date dimension and trying to return the next date with a DATEADD() function. Even though I was only running this on tables where the dates were in the past, SQL Server was starting the query execution by calculating the ‘tomorrow’ date against every row in the date table before joining the results to the main fact table. This resulted in the query trying to add a day to 9999-12-31, which would put the date out of range of the DATETYPE2 data type (and any data type).

Once I realised what was happening I removed the row from the date dimension and shifted to using NULL for the ‘sometime in the future’ DateID. This meant putting a bit of extra work into the query, but everything runs now.

As a bonus, it’s clearer what is happening. NULL is the correct marker to use in these scenarios, because it represents an unknown value. You may be able to get away with using a default value instead, but there’s always the danger that some future developer won’t realise that’s what’s happening and write code that will treat the default as if it’s a valid value.

SSDT database projects: deployment error messages

Today I’d like to talk about three error messages you might see when deploying an SSDT database project, either through Visual Studio or via a dacpac and script. I’m going to focus here on what you see from inside of Visual Studio, but you will see similar errors returned when you deploy using a script and the reasons behind them will be the same.

Chances are whatever the error is, you will start off with something like:

This appears in the Data Tools Operations tab, usually at the bottom of the Visual Studio window

From here, you need to click on the View Results link inside Visual Studio, which brings up a tab with the release script in the main window, and a Message tab in the results window. The message tab is what we’re interested in, it will show you the feedback from all the print commands, as well as the actual error that came back when the publish process tried to run the script against an actual database.

Regular errors

These will almost always be something to do with aggregation. For some reason the IntelliSense inside of Visual Studio doesn’t catch aggregation issues, and the build process lets them through as well.

The Message tab will probably look something like this:

And what you need to do is scroll right to see the end of that long line I’ve cut off in that snapshot:

As expected, I’ve included something in a SELECT and not in the corresponding GROUP BY, and IntelliSense hasn’t spotted that until we’re actually trying to deploy the code. Easy fix here, add the column to the GROUP BY or, more generally, fix the T-SQL error.

The included file does not exist

This is more usually something you will find when building the project, but sometimes it can sneak in at deployment stage as well. It’s something that happens when you’ve used a file reference in a pre/post deployment file and the referenced file doesn’t exist. This should just appear in your error window like this:

This really is a top quality error message, it tells you the file that’s missing, it tells you the file that contains the reference, and even the line in the file with that reference. Simple solution here, figure out what’s going on with the reference and either make sure the file is there or remove the reference.

Changes to [X] may introduce runtime errors in [Y]

This error occurs because you have an object [Y] in your target database that you have not included in your database project for whatever reason. [Y] includes a reference to another object, [X], that you have included in your database project and that also exists in your target database.

The current deployment will make a change to [X], and the deployment process can see that there in an object [Y] in the target database that may be affected by the change. It cannot, however, see if [Y] will still be in a valid state after the change to [X], because [Y] is not in the database project.

This is not something you can turn off, or work around. The only way to fix this is to bring the project and the target database in line, either by deleting [Y] on the server or bringing [Y] into your project.

This originally started out as a couple of slides in my New Stars of Data talk, but got cut due to time constraints. I was sad to get rid of them because there’s some useful stuff there, but they didn’t quite fit the theme of the talk and so were dropped in favour of something that did. Anyway, I thought they might make a useful blog post for someone.

New Stars of Data feedback

As mentioned in my last post, I gave my first ever conference talk at New Stars of Data last Friday. Recordings of all of the talks (including mine) are now available on YouTube, and all of the slide decks are available in the NSoD GitHub, or you can find mine here. I’ve not yet been able to listen to mine, but I will manage it someday (my voice always sounds so strange when it’s recorded).

I thought the session went very well, and I had a few nice comments at the end, but I was still more than a little nervous about the feedback form.

As it happens I did pretty well…

If you can’t see them in the picture, they are:

Demos: 4.3, Overall Session: 4.71, Presentation Style: 4.57, Session Matched Abstract: 4.77, Session Met Expectations: 4.69, and Speaker’s Knowledge: 4.93

I don’t want to brag, but I’m ridiculously happy with these scores. The Speaker’s Knowledge and Overall Session scores in particular just blow me away.

Then, looking at the comments…

Again, really couldn’t be happier.

A few highlights:

This session blew my mind. I did not know that these things are possible with SSDT even though I played with it before. Thank you Chris!

Awesome job Chris! I love it when I can learn new things and you knocked it out of the park. Very well done. You have great pace, tone, & answered questions well. Keep it up and I look forward to more sessions from you.

Excellent session. Interesting topic with clear understanding, clearly expressed.

Feedback

Clearly there’s one person who wanted more demos, and it feels like they wanted in general more of an introduction to Visual Studio database projects. I feel bad for them because I want everyone to get something out of the talk, but it’s not an introduction session. I’m seriously considering putting together an introduction session as a companion to this one, but this session can largely stay as it is. The main thing I might tweak is the abstract, to make it clearer that this is an advanced dive into the topic.

That might seem a bit harsh, but I only have an hour to get everything across. I did want to include some demos in the presentation but that would have meant cutting out 10-15 minutes of talking, which maybe means I don’t get to the really good stuff. Some conferences only have 50 minute slots for speakers so I may need to cut this talk down a bit if I want to try and speak at them in the future (spoiler alert: I definitely do want to try and speak at some of them, speaking was awesome and I very much have the bug now).

How I built the talk in the first place

Part of the reason I know I can’t do everything is because of the way this talk evolved in the first place. I wanted to give a talk at my local user group a couple of years ago, and settled on a general overview of best practices, or things we can learn from the .NET world. This was largely because I’d been working with .NET developers for a few years, and had noticed a lot of ways in which their best practices seemed more embedded than they did in the SQL Server world. I’m talking about things like source control, unit tests, versioning, all kinds of stuff.

Anyway, that first talk was fairly well received even if I was a stuttering mess throughout, and I decided I wanted to do another. This time I figured I’d follow up on what I’d done before and talk in more detail about Visual Studio database projects as an option for source control, as well as a few of the other interesting things you can do with them.

That went a bit better, but I still felt like it wasn’t great. Then I heard about New Stars of Data, and checking the entry requirements I saw that having spoken at a user group before didn’t disqualify you (I guess they were looking for anyone who hadn’t spoken at conference level). I applied with an abstract about what you could do with database projects besides just putting your project under source control, with the aim being to build on the previous talk.

Long story short, I got accepted, and worked really hard with my mentor to try and make a good talk. At the start I figured I’d cut out some of the stuff from the last talk and add a demo, but after Rob saw my slides he said something along the lines of “We have a lot of work to do” and I knew I’d need something a bit more drastic. We ended up with something that looked completely different to the previous presentation, with at least half the content gone, and I needed to cut that. If I hadn’t, if I’d tried to give the introduction talk, I wouldn’t have had the time I needed to really dive into the good stuff that ended up impressing people. As it was, I could definitely have spoken for longer on some of those advanced features, and would love to give the 2-4 hour version that one of the commenters requested! Unfortunately conferences don’t often have slots that long, so I have to pick and choose.

I remain incredibly grateful to the conference organisers, to Rob, to everyone who showed up to listen to me, and to everyone who left feedback. My hope is I can kick on and make this the start of my speaker journey.

New Stars of Data

I gave my first ever conference talk yesterday at New Stars of Data. This is a conference specifically for new speakers, where you get paired with a mentor who gives you advice and works with you to make your talk better over the months before the conference. It started out as a new speaker track at DataGrillen, but with the main event being cancelled due to Coronavirus it’s shifted to being a separate conference for the last couple of years.

I can honestly say the experience was fantastic. My mentor was great (Rob Sewell, also known as Beardy McBeardface or DBAWithABeard), and I learned a huge amount from him over the couple of months we were working on the talk together. I also got some really positive feedback from some of the more experienced presenters watching, so that’s really good as well.

I can’t recommend enough that you apply for this if you’re thinking about getting into speaking. Hopefully next year the DataGrillen conference will happen, and I think at that point this will be folded back into the main conference, but you will still have the support and mentoring I had. I’m definitely going to try and speak at more events, and I can’t think of a better way to get started on my speaking journey than this. Huge thank you to everyone involved.

T-SQL Tuesday #135: The outstanding tools of the trade that make your job awesome

This image has an empty alt attribute; its file name is tsqltues.png

This is my 7th TSQLTuesday, and this month we’re all invited to blog about the tools we use in our job that we are particularly attached to.

As it happens, I’m giving a talk in just over a month at New Stars of Data on a tool I think a lot of the community overlook: Visual Studio Database Projects.

I’m not going to regurgitate the whole talk here, that would take far too long, but I will give a quick preview of what’s covered on a couple of slides that cover what I like to call the “quality of life” improvements, small things that just become easier when coding through a Visual Studio project rather than in Management Studio or similar.

Renaming and moving objects

Usually we avoid renaming anything in SQL Server. You don’t know what other bit of code might be lurking out there that depends on the object you just renamed, or when that code might be called, and so spelling errors end up being left in for years because remembering it’s the Customet table not the Customer table is easier than having the application fall over at 3am because we missed a reference somewhere in our code base.

Database projects gets rid of that fear. If you rename an object, or move it from one schema to another, Visual Studio automatically propagates that change through your solution. And even if there’s some reference somewhere it didn’t recognise as needing updating, the intellisense-style validation will cause the next build to fail and tell you the database is no longer fully functional until you fix the issue. You do have to remember to rename through the interface rather than in the scripts, but in exchange you get the peace of mind that comes with knowing you’ve not broken anything with your change and you can finally query the Customer table.

All of your scripts are object creation scripts

This is a bit of an odd point but bear with me, it leads to something good.

Every script you write in a database project is the script to create the object from scratch. It isn’t a generated script from the object, which is how you view objects in Management Studio.

The reason this is significant is it means you can add comments to objects you wouldn’t normally be able to add comments to, like tables. This means you can add a comment next to a column name and give a decent explanation of what it means, or add a description at the top of the table script to say exactly what each row in this table represents, and why the table exists.

If you’re encrypting your code, which you might do if you are selling a managed service that gets deployed to the client’s servers, you can also store the unencrypted code in your database project. If you try to retrieve the code from Management Studio you won’t get anything (that’s the point of encrypted objects) but Visual Studio just holds the create script, not the object itself.

F12 to jump to object

Finally, if you highlight an object in your code and hit F12, you will open another tab with the code for that object. No more hunting around in the database for the view definition, now you can just open it up instantly. I found this incredibly useful once I got into the habit of it, but I have to warn you that it only works if the code you’re currently in is error free.

So there you have it, three nice little features in Visual Studio database projects that can make a real difference to your coding. I hope this was helpful to someone, and I’d encourage anyone reading this who isn’t a TSQLTuesday regular to give it a go next month.

TSQLTuesday #134: Give me a break

This image has an empty alt attribute; its file name is tsqltues.png

This is my 6th TSQLTuesday, and it’s a subject close to my heart this month: travel.

Travelling is awesome

As I wrote about in a previous TSQLTuesday blog, I really like to travel. I have a big list of places I want to see, as a big part of my epic quest of awesome, and I know I’m never going to get to see everything I want to. Obviously money and time are factors in how often I can get away, and so after the Everest trip in 2019 I’d set myself the fairly modest goal in 2020 of visiting Rome, to tick off the Colosseum, Vatican City, and take a day trip to see the Leaning Tower of Pisa. I also wanted to visit Stonehenge, as it’s in the same country which means I can make it a weekend trip and see some friends in Bristol at the same time.

Reassessing plans

Clearly the Rome trip never happened. It became quite obvious fairly early in the year that travel to Italy was a bad idea, and I came to terms with that pretty quickly. I held out some hopes for the Stonehenge trip, but in the end the situation never felt safe enough for that kind of frivolous holiday.

Instead I spent the year getting some things sorted in my home life, and that included a big think about my plans for the future. I plan on writing a more detailed post (or maybe several) about the process I went through to do that at some point, but I won’t go into it too much detail here. One of the things that came out of that was I had a serious look at my finances for the first time in a long while, and looked at the types of holiday I want to do, and how regularly I could afford to do them.

Categorising holidays

As a data person, obviously I needed to split the holidays up into categories:

  • Weekend trips – things like the trip to Stonehenge, usually in the UK but could be in Europe, little to no time off work and relatively inexpensive. I can probably absorb the cost of one of those into a month’s budget (or maybe spread across 2 months) by cutting down on other expenses like meals out/takeaway and not making any other big purchases. These are holidays I don’t have to plan for.
  • City breaks – these are trips up to a week long, where I travel to another city (probably in Europe) and see stuff. They obviously need a bit of time off work, won’t come with too much of a plan, but will need some organisation and saving beforehand. The Rome trip I planned was a good example of one of those.
  • Adventures – these are trips like the trek to Everest base camp, or the Vietnam experience I talked about in my other blog post. They last anywhere between 1-4 weeks, will be booked as a tour with a set itinerary, usually involve changes of location, and often need me to buy quite a bit of gear. These obviously come with a fair price tag and require some significant organisation beforehand.
  • Crazy adventures – Never done one of these yet, but it would be a combination of city breaks and adventures strung together in one holiday.

Next trip

Having taken stock, through about some of the holidays I wanted to do soon, and mentally assigned them some categories, I decided I could probably bump the next holiday I take up from a city break to an adventure. There were a couple of factors in that, but the main one was that the extra year of not doing anything was an extra year to save and get myself in a good financial position to absorb a holiday like that into my budget.

I had a think about what exactly I wanted out of the next holiday, especially given what had happened on the Everest trek. One of the things I decided was it needed to be a physical holiday, I think I need to prove something to myself that I am still capable of these more physical holidays. I also decided I wanted to go somewhere warm, after the cold of the Everest trek. With those criteria in mind I settled on a trek through the Jordan desert to Petra and the Dead Sea. There are a few different options for that type of trek, but this one looks pretty representative of what you get.

The whole trip looks very exciting, and as a bonus I get to pretend I’m in the Last Crusade.

Other next trips

I also plan on doing Stonehenge this year, provided things get better in time. I know a few people who may want to come with me, and I have friends in Bristol so I can catch up with them at the same time.

Beyond that I might try to fit in a city break in Berlin to see my sister. A few people I know rave about the city so I’d like to see what it is that makes it so special, and obviously I want to spend some time with my sister after not being able to see her for so long.

Grand plans for the future

For a long time I’ve had a crazy adventure in mind. I want to start somewhere in South America, maybe Brazil, and travel north. Start off with something like an adventure tour in the Amazon, maybe one where you canoe down the river for a week, and see some of Rio. Then travel through different countries to see places like Machu Picchu, Chichen Itza, and the Paricutin volcano, eventually entering the US. From there I’d maybe rent a car and spend a couple of months visiting different cities in the US and seeing places like Death Valley and the Grand Canyon, before finally heading north to Canada where I’d see the Northern Lights.

This is probably never going to happen, and may well end up being broken down into several different holidays, but it’s definitely a nice dream to have.

Why plans for the future matter

I think it’s important to have these types of plans, they give you something to look forward to and something to strive for. These things are especially important in the strange times we live in now, but they were also important before and will continue to matter when the world gets back to normal. I know that when work gets bad (as it does sometimes for all of us) and I feel like I’m not making any difference, I can remind myself that even if I’m not achieving stuff at work I am getting myself closer to my next epic adventure.

TSQLTuesday #132: How are you coping with the pandemic?

This image has an empty alt attribute; its file name is tsqltues.png

This is my 5th TSQLTuesday, and it’s a little rushed because I only saw the invite this morning.

This isn’t the usual type of blog post that I write. I’m not sure if I have much in the way of a readership out there, but if I do they are probably used to technical articles, or some long post about best practices with lots of code examples sprinkled in. This, obviously, won’t be one of those. It may also meander a little bit, but bear with me, it will all come together in the end.

Travel

Me in Washington DC

I like to travel, and I’ve not done it nearly enough over the last few years. When I was younger I found myself flying to different parts of Europe quite a bit. A former partner was from Luxembourg so I flew there several times to visit her. A friend I made at Uni was from Switzerland, so I flew out there once, and we visited Germany while I was there. Another former partner’s parents had a holiday home in France so we went there a few times. I also ended up in the US a couple of times, once in Florida for Disneyland, and once in Washington DC for my sister’s wedding. I got pretty comfortable on planes from these trips, and came to like airports and long flights, and travel in general.

This is fine…

I also had a few more epic adventures. I spent a couple of months in Southern Africa, living on a nature reserve and doing a bit of backpacking. I also got to white water raft on the Zambezi river below Victoria Falls.

#Epic

I didn’t do another epic adventure for a while, but 5 years ago I went on a 4 week trip to Vietnam, with a few days in Bangkok at the start. It was pretty epic, and I got to see some amazing things, including Ha Long Bay which remains one of the most beautiful places I’ve ever been. I came back from that holiday determined to do more epic travelling, which leads us to the crucial bit of the story…

Everest

I’d settled on my next destination pretty early. I wanted to do the trek to Everest base camp. It isn’t cheap, and needs a lot of fitness, and for various reasons I didn’t make it happen for a few years. Finally, last year, I had the money and the fitness and I booked it. I made sure I had all of the equipment, and I was confident I was fit enough (I ran a half marathon a couple of weeks before going).

We walked across those bridges.

The trip itself was awesome, but very hard at the same time. The days were long, the nights were cold, and the food was dull. But, the scenery was amazing, the people were amazing, and the whole thing felt like I was stepping into another world.

I loved the trek, and I loved the feeling of accomplishment that came with every day. Every so often we would pass markers that said how many hours and kilometers we were from base camp, and I think everyone got more and more excited as we got closer.

Then I got ill. It came on really suddenly. I was pretty much ok at the start of the day, and could barely walk by the end of it. The next morning my fingertips and lips had turned blue, and they had to call a helicopter for me. I spent 3 days in hospital in Kathmandu recovering. It turned out I had what’s called a High Altitude Pulmonary Edema. This is where the decreased oxygen levels cause the heart to beat faster to move oxygen around the body faster, but the increased blood pressure causes fluid to leak from the blood vessels into the lungs further reducing the oxygen levels. It’s fatal if you don’t get to lower altitude in time, but once you do you’re basically going to be ok.

PTSD

For some time after this I thought I was fine. Physically I felt a bit beaten up, but nothing a bit of heathy eating and exercise wouldn’t get me through. It wasn’t until the increased stresses from work started to pile up that I started to really struggle. The tipping point was when I saw a video of someone with an oxygen monitor on their finger. I basically became completely useless for days after that, and had to book some time off work to get through it. I spoke to doctors, and ended up with a PTSD diagnosis, a course of therapy, and some time off work.

I fought my way back to health through a combination of therapy and routines. I’m not going to detail the therapy here, but it forced me to confront what had happened, and in doing that my brain processed it and moved on. The routines I put in place involved waking up at particular times, planning my week pretty rigidly on a Sunday, making sure I got some exercise, going for walks, yoga, a big variety of things.

When it came time to go back to work, I felt like I was ready. I think I needed the structure work provided. I had some conversations with my boss, and we agreed that I would only work my contracted hours, and take a 1.5 hour lunch break. This meant 8-5 with lunch usually starting at 12 and finishing at 1.30. This was all to make sure I had time to keep up the routines and coping mechanisms I had in place.

This hasn’t lasted particularly well. Last week I had to work late most days, which threw my evening plans out and has disrupted my routine a lot. I’ve spoken to my manager a few times, and we are taking steps to fix the issues. One thing we agreed today is I will do 9-6 instead, as that way emergencies late in the day won’t force me to go over my hours. This might even be better in some ways, it may give me some time in the morning to work on this blog or do some other personal project.

Is there a point to this story?

I’m not entirely sure. I wanted to share something about my lockdown experience, and this kind of dominates everything about not just lockdown but the last year for me. I certainly feel like I’ve come out of the whole thing with a better understanding of myself. I know my mental limits a bit better, and understand what I can do to keep myself mentally fit a lot better than I did. I’ve also experienced failure, on a scale that I don’t think I had before, and I’m still working on coming to terms with the idea that it wasn’t my fault, and that I failed despite doing everything right.

I guess the message for people reading this is to make sure you prioritise self-care. If you start to feel anxious every day, if you can’t focus, if you dread starting work, these are signs something is wrong. You don’t necessarily have to change jobs to fix it (right now I think we’re all pretty lucky if we have jobs) but maybe change your relationship to your job in some ways. Set some boundaries, and experiment a bit with what you can do outside of work to let that stress go. I recommend walking and yoga.

Avoiding coding traps

Brent Ozar put up a post last week called If Your Trigger Uses UPDATE(), It’s Probably Broken. In it he talks about the UPDATE() function and how it can cause unexpected behaviour in your code.

For anyone not aware, UPDATE() is a function that can be used in triggers, and which returns TRUE if an INSERT or UPDATE action happens on the column passed into the function. So, something like this:

CREATE TABLE dbo.Primarch
    (
          PrimarchID INT IDENTITY(1, 1) NOT NULL
        , PrmarchName VARCHAR(200) NOT NULL
        , ChatpterName VARCHAR(200) NULL
        , LastKnownStatus VARCHAR(100) NOT NULL
        , LastKnownLocation VARCHAR(100) NULL
        , YearsAtLastKnownLocation VARCHAR(100) NULL
        , NumberOfSuccessorChapters INT NOT NULL
    );
GO

CREATE TRIGGER Primarch_Update
    ON dbo.Primarch
    AFTER UPDATE
AS
    IF UPDATE(LastKnownLocation)
        UPDATE Prim
        SET
             YearsAtLastKnownLocation = 0
        FROM dbo.Primarch AS Prim
        INNER JOIN inserted AS ins
            ON Prim.PrimarchID = ins.PrimarchID

The idea here is to set the YearsAnLastKnownLocation to 0 if the LastKnownLocation is changed, with the logic being we are receiving real-time information on the last known location of the primarchs (this example is a bit contrived but go with it). However, if you want to run a full update on a primarch you might have a bit of code like this:

CREATE PROCEDURE dbo.UpdatePrimarchInformation
    (
          @PrimarchID INT
        , @LastKnownStatus VARCHAR(100)
        , @LastKnownLocation VARCHAR(100)
        , @NumberOfSuccessorChapters INT
    )
AS
    UPDATE Prim
    SET
          LastKnownStatus = @LastKnownStatus
        , LastKnownLocation = @LastKnownLocation
        , NumberOfSuccessorChapters = @NumberOfSuccessorChapters
    FROM dbo.Primarch AS Prim
    WHERE 1 = 1
        AND Prim.PrimarchID = @PrimarchID

Now, when you run this and input the current LastKnownLocation and LastKnownStatus, and a new NumberOfSuccessorChapters, you still trigger the update of YearsAtLastKnownLocation to 0.

The exact reasons why aren’t important, and Brent does a typically great job of explaining the issue on his blog. The reason for this post is I wanted to look at these types of errors a bit more, and a coding principal I like to adopt that helps to avoid them.

Coding traps, not bugs

Brent calls this a bug at one point, and while I agree with him that it’s bad practice to code like this I wouldn’t call it a bug myself. The reasoning behind that is without the second bit of code (the stored proc) the application functions as it was intended to. It’s not until someone codes in something that might update LastKnownLocation without changing the value that you start to have undesirable behaviour in the app.

It is, however, absolutely a trap for any future developer working on the app. The stored proc above is an entirely reasonable bit of code to write, most developers I know won’t check every table their code hits for triggers, and similarly won’t read every bit of documentation on a table before writing a simple bit of code against it.

Some people might say that testing should save you, and in theory it should but things do get missed in testing, and that’s assuming you’re working somewhere that encourages testing in the first place. And in any case, even if it does get caught in testing, that’s another bit of work for the dev to do to figure out why this update is causing the YearsAtLastKnownLocation column to always set to 0.

That’s extra time to do the work, extra stress on the dev, and all because of a bit of code in the trigger that could be written better.

Coding to the existing code

So, this is the principal I want to talk about, that I try to work to as much as I can. Essentially it means you code to what is possible within the existing code base, not what people tell you the existing code base will be used for.

One great example of this is always coding around NULLs if they are allowed in a column. It doesn’t matter if the business assures you there will never be a NULL in that column, you should always assume there will be. As an example, the following code assumes the LastKnownLocation will never be NULL:

SELECT
      Prim.PrmarchName 
        + ', primarch of the ' 
        + Prim.ChatpterName 
        + ' chapter. Current status: ' 
        + Prim.LastKnownStatus 
        + '. Last seen at ' 
        + Prim.LastKnownLocation 
        + ',  ' 
        + CAST(Prim.YearsAtLastKnownLocation AS VARCHAR(100)) 
        + ' years ago.'
FROM dbo.Primarch AS Prim
WHERE 1 = 1
    AND Prim.PrimarchID = @PrimarchID

If the LastKnownLocation is NULL, because we don’t have a last known location for the Primarch, the whole string returns as NULL. You may have a guarantee from the project manager, or the business sponsor, that we will always have a value for LastKnownLocation, but the code says something different. If you follow the business steer in these circumstances, and code as if the column will never be NULL, you are introducing a trap for developers working on this software in the future. One of them will see the column allows NULLs and will write some code that can add a NULL into the column, and then this bit of display code somewhere else in the app will break.

Another example that may be familiar to people who’ve written a lot of ETL processes is when you come to create an import table. In a lot of cases you are importing from files that don’t necessarily remain the same every time, so you have a process that reads the column headers from the file and creates a table to hold them all, something a bit like this:

DECLARE @ColumnList AS TABLE
    (
          ColumnName VARCHAR(255)
        , OrdinalPosition SMALLINT
    );
INSERT INTO @ColumnList
    (
          ColumnName
        , OrdinalPosition
    )
VALUES
      ('PrmarchName', 1)
    , ('ChatpterName', 2)
    , ('LastKnownStatus', 3)
    , ('LastKnownLocation', 4)
    , ('YearsAtLastKnownLocation', 5)
    , ('NumberOfSuccessorChapters', 6);

DECLARE
      @Counter AS INT
    , @SQLScript AS NVARCHAR(MAX) = N'CREATE TABLE dbo.PrimarchImport(RowNumber INT IDENTITY(1, 1) NOT NULL';

SELECT @Counter = MIN(OrdinalPosition) FROM @ColumnList;

WHILE @Counter <= (SELECT MAX(OrdinalPosition) FROM @ColumnList)
BEGIN
    SELECT
          @SQLScript = @SQLScript + N', ' + ColLst.ColumnName + N' NVARCHAR(MAX) NULL'
    FROM @ColumnList AS ColLst
    WHERE 1 = 1
        AND ColLst.OrdinalPosition = @Counter;

    SET @Counter = @Counter + 1;
END

SET @SQLScript = @SQLScript + N');';

EXEC sp_executesql
      @stmt = @SQLScript;

This is a bit of a cut down version as I’m just adding the column names direct to the @ColumnList table variable rather than generating them with C# code or something else, but it works for our purposes. This particular block of code will run just fine, but what happens when we get a column name in the source file with a space, or a column, or anything else that will cause us issues with our create table script? And again, we may be promised that this will never happen, but why take the risk? All we need to do to fix it is escape the column names in our dynamic SQL by wrapping them in [].

In both of these examples the application may work as expected for years, or even forever. But it may not, and if it it does break the root cause of the problem will be you not coding for a scenario the existing code told you was possible.

Another fun thing with NULLs

I saw this on a post at SQLServerCentral and wanted to share.

The poster had a CASE statement and was wondering why it didn’t work as expected. Essentially they were doing something like:

CASE WHEN @a = @b OR (@a IS NULL AND @b IS NULL) THEN 1 ELSE 0

CASE WHEN NOT(@a = @b OR (@a IS NULL AND @b IS NULL)) THEN 1 ELSE 0

And they wanted to know why both were returning 0 when @a or @b were set to NULL. The issue here is that any normal predicate involving NULL returns an unknown. They had tried to compensate with the OR, which got them the result they wanted in the first statement, but didn’t understand why it did that.

To fully understand I want to break down what each bit returns here, assuming @a IS NULL and @b is some value

First, @a = @b returns UNKNOWN, represented in T-SQL as NULL. This is because you are comparing unknown (@a) to an actual value (@b), and the SQL engine can’t say whether that is TRUE or FALSE.

Second, @a IS NULL returns TRUE, and @B IS NULL returns FALSE. Put together, @a IS NULL AND @b IS NULL returns FALSE as an AND requires both parts to be true to return TRUE. If you replace the AND with an OR it would return TRUE, as OR only requires one part to be true.

So, what we have here is UNKNOWN OR (TRUE AND FALSE). This simplifies to UNKNOWN OR FALSE when we solve the bracket. And, logically, that has to simplify to UNKNOWN. After all, what we are asking here is “are either this unknown result or this false result true?” The false result is obviously not true, but the unknown result might be.

This brings us to the essential issue in the original question, which is how an expression and the reverse of that expression could both return the same value. Taking the first CASE statement, that is saying when the expression is TRUE return 1, else return 0. The expression returns UNKNOWN, but the CASE statement can’t just say “I don’t know”, it has to return 1 or 0, and with the way predicates work in SQL Server it will only return 1 if the expression returns TRUE.

Taking the second CASE statement, that is saying when the reverse of the expression is TRUE return 1, else return 0. But the expression returns UNKNOWN, so what’s the reverse. It’s helpful here to think of UNKNOWN as more of a placeholder than a proper response. The logic will always behave as if there is a TRUE or FALSE response that would be returned if we had all of the facts. Therefore, what it wants to return is the reverse of that TRUE or FALSE, but it doesn’t know which is the correct one, so the NOT(expression) also returns UNKNOWN when the inner expression returns UNKNOWN. This obviously leads to the CASE statement returning 0 again, as it only returns 1 when the predicate returns TRUE.

An alternative way of thinking about it is to imagine someone asks you if my eyes are brown. Clearly there is an answer to that question, but you as a reader of this blog who’s never met me won’t know what it is. Now what if someone asks you if my eyes are not brown? Again, the answer is “I don’t know.” Now what if the question is “do any of these 10 people have brown eyes?” At this point we’re getting into the territory of IN/NOT IN predicates, which are another area where NULLs cause issues. If I’m one of those 10 people, then even if you know all of the other 9 don’t have brown eyes, you can’t give a definite yes or no to that question, and therefore can’t give a definite yes or no to the reverse either. Then, because in T-SQL (as in most coding languages) predicates only trigger on a definitive yes/TRUE, the predicate logic will behave as if the expression returned a FALSE result, but for the reasons outlined in this post it’s important to know that it actually returned UNKNOWN.