This is just a quick post about a development principle that I think applies across pretty much all development languages.
Always believe the code over what someone else tells you.
A good example in databases is NULLable columns. You will sometimes get the situation where you are selecting from a NULLable column, but someone in the business tells you there can’t ever be NULLs in there because of how the application works. I get very skeptical of this, because users can always find a way to make applications do strange things, because applications change over time and just because it’s impossible now doesn’t mean it always has been, and because it’s always possible to change the data in other ways that bypass the particular application being discussed (a manual data fix, for example).
The first thing to do in that scenario is check if there are any NULLs in there, because if there are you can just go back with the evidence. However, sometimes there won’t be, and sometimes you’ll even run into people who will double down on their assertion that there can’t be NULLs in that column despite the evidence. In that case you have two options: either you code to account for the possibility of NULLs anyway, which may involve making some assumptions about what the business wants to replace them with; or you code as if the column is not NULLable, but if you do that I strongly advise getting the assurances that the data will never be NULL in writing.
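As a concrete sketch of both steps (the table and column names here are made up, and ‘Unknown’ is an assumed business default rather than anything from a real scenario):

-- Step 1: gather evidence. If this returns a non-zero count, the "there can't
-- be NULLs" claim is already disproved and you can take the numbers back.
SELECT COUNT(*) AS NullRowCount
FROM dbo.CustomerOrder
WHERE DeliveryRegion IS NULL;

-- Step 2: if you decide to code defensively anyway, make the assumption about
-- what a NULL should become explicit and easy to find later.
SELECT
    OrderID
    , COALESCE(DeliveryRegion, 'Unknown') AS DeliveryRegion -- assumed default
FROM dbo.CustomerOrder;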
Which route you end up taking is very much down to your situation at the company, the seniority of the person telling you not to code for NULLs, how serious it would be if there are NULLs, and a whole load of other factors that fall under the broad heading of ‘office politics’ (and just typing that makes me feel a bit dirty inside).
It’s also important to note that this won’t just be limited to NULLability, that’s just a convenient example. I had a case once where we had a complaint that related to one of our core processes. I investigated and found a fairly serious flaw that would have major implications for a lot of the work we had done for this client. I took my findings to the person pushing this investigation, who was also the CEO. He flat out refused to believe what I was saying and insisted the problem had to be elsewhere. In this case, I stuck to my assertion that this was what the code was doing, and ended up being pulled off the investigation and replaced with a developer who was more likely to go along with the CEO (that developer privately admitted to me that the process was known to not work correctly). It wasn’t the ideal result for the company, but it didn’t do my standing in the company any real harm, and meant I was kept away from any fallout that might ensue from the investigation.
In this case the biggest factor in me sticking to my guns was the seriousness of the issue. I definitely didn’t want anyone to be able to say I gave bad advice on this, or misrepresented the code, and a small loss of standing with the CEO was a price I was willing to pay to either get the right result or get me off the project.
This is my 11th TSQLTuesday, and this month Damien Jones has given the prompt: “What’s on your wishlist for the partnership between Microsoft and OpenAI?”
This is probably going to be quite a short blog, both because I’m pressed for time today, and also because unlike most people I haven’t spent much time looking at the new shiny AI stuff that’s come out recently. There was someone at my company who gave a talk a few weeks ago about how he sees AI being able to write the simpler code that we currently end up repeating over and over again (as a BI developer, merging into a slowly changing dimension springs to mind here), and I think there is something to talk about there, but that’s not what I’m going to talk about today.
Instead, the main hope I have for the Microsoft/OpenAI partnership is that it will produce something to make ingesting files easier.
I have spent a lot of my career writing ETL processes to ingest data into a data warehouse, usually via some sort of staging layer. Inevitably, some of that data will come from files rather than other databases, and that is a problem. The reasons I’ve needed to import file data have varied. Sometimes my employer dealt with third parties who were only willing to send the data in file format. Sometimes we needed ways for other people in the business to manage some of the data, and for whatever reason nobody was willing/able to build a proper master data system. One company’s entire business model was managing data for other companies, so we needed to ingest data from both our clients and the third parties they had contracts with. The one thing all of these scenarios had in common was that the files did not remain consistent. By this, I mean everything could change. Some examples off the top of my head:
- Header text could change
- Column order could change
- Data types in the same column could change
- Headers could be on the wrong columns (always fun)
- New columns could appear
- The number of rows before the data started could change
- Values might be wrapped in “” one month and not the next
- In Excel, sheet names could change
- In Excel, region names could change if we were picking only the data in a particular region
- I even had one case where row delimiters would sometimes change
Needless to say, this is always extremely frustrating to deal with. I’ve come up with various ways to mitigate some of these issues over the years, usually involving something like checking inside the file, building a basic import table based off that, and dynamically building an import and cleanse process that would hopefully produce the columns we needed at the end of it.
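As a very rough illustration of that approach (a minimal sketch only: the file path and staging schema are hypothetical, it assumes STRING_SPLIT and STRING_AGG are available, and a real version needs far more validation):

-- Pull just the first line of the file so we can see what columns it claims
-- to have. (Assumes the header row contains no tab characters, so the default
-- field terminator leaves the whole line in one column.)
DROP TABLE IF EXISTS #RawHeader;
CREATE TABLE #RawHeader (HeaderLine NVARCHAR(MAX));

BULK INSERT #RawHeader
FROM 'C:\Import\SupplierFile.csv' -- hypothetical path
WITH (FIRSTROW = 1, LASTROW = 1, ROWTERMINATOR = '0x0a');

DECLARE @Header NVARCHAR(MAX) = (SELECT TOP (1) HeaderLine FROM #RawHeader);

-- Build a basic import table where every column arrives as NVARCHAR(MAX);
-- typing and cleansing happen in a later, dynamically built step.
-- (STRING_AGG doesn't guarantee column order here; a production version would
-- preserve the ordinal position of each header.)
DECLARE @CreateSql NVARCHAR(MAX) =
    'CREATE TABLE staging.SupplierFileRaw ('
    + (SELECT STRING_AGG(QUOTENAME(LTRIM(RTRIM(value))) + ' NVARCHAR(MAX) NULL', ', ')
       FROM STRING_SPLIT(@Header, ','))
    + ');';

EXEC sys.sp_executesql @CreateSql;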
This really feels like something an AI might be able to do, from my largely unqualified vantage point. The import is largely repetitive, but with some changes every time, so maybe you would need to train it on a set of files you have already, but I feel like it could learn the particular quirks unique to a data supplier and the best ways of dealing with them when they happen. I feel like I’m being quite unambitious here, when talking about a tech that might change the world, but I’ve been at a company where the files they were receiving were so numerous, and so important to the business, that they employed half a dozen contractors just to write and troubleshoot file imports.
This is a little quirk of SQL Server that I ran into a while ago.
We’ve probably all seen or used a SELECT... INTO script before, but just in case, a SELECT... INTO statement typically looks something like:
SELECT
*
INTO #Test1
FROM dbo.Test1;
And it will create a new table (#Test1) from the data selected in the SELECT statement, with the column definitions being pulled from the columns in the SELECT, in the same way that the engine defines data types etc. for a view.
But what happens when one of the columns being selected is an identity column? To answer that, let’s run a quick test:
DROP TABLE IF EXISTS #Test1;
DROP TABLE IF EXISTS dbo.Test2;
DROP TABLE IF EXISTS dbo.Test3;
CREATE TABLE #Test1
(
Test1ID INT IDENTITY(1, 1) NOT NULL
, SomeData VARCHAR(50) NOT NULL
);
INSERT INTO #Test1
(
SomeData
)
VALUES
('Some data')
, ('Some more data')
, ('Even more data')
, ('And a bit more data');
SELECT
*
INTO dbo.Test2
FROM #Test1;
SELECT
Test1ID
INTO dbo.Test3
FROM #Test1;
What we’re doing here is creating a temp table (#Test1) with an identity column, then running a couple of SELECT... INTO statements that include that identity column to create new permanent tables in the dbo schema (Test2 and Test3). The reason I’m creating permanent tables is that it makes it easier for us to look at their definitions, which are:
CREATE TABLE [dbo].[Test2](
[Test1ID] [int] IDENTITY(1,1) NOT NULL,
[SomeData] [varchar](50) NOT NULL
);
CREATE TABLE [dbo].[Test3](
[Test1ID] [int] IDENTITY(1,1) NOT NULL
);
So, we can see in this case the SELECT... INTO has created the Test1ID column as an IDENTITY column, the same as the source. The next question is: what happens when the IDENTITY column exists at a different grain to the results?
DROP TABLE IF EXISTS #Test1;
DROP TABLE IF EXISTS #Test2;
DROP TABLE IF EXISTS dbo.Test3;
CREATE TABLE #Test1
(
Test1ID INT IDENTITY(1, 1) NOT NULL
, SomeData VARCHAR(50) NOT NULL
);
CREATE TABLE #Test2
(
Test2ID INT IDENTITY(1, 1) NOT NULL
, Test1ID INT NOT NULL
, SomeOtherData VARCHAR(50) NOT NULL
);
SET IDENTITY_INSERT #Test1 ON;
INSERT INTO #Test1
(
Test1ID
, SomeData
)
VALUES
(1, 'Some data')
, (2, 'Some more data')
, (3, 'Even more data')
, (4, 'And a bit more data');
SET IDENTITY_INSERT #Test1 OFF;
INSERT INTO #Test2
(
Test1ID
, SomeOtherData
)
VALUES
(1, 'DataData')
, (1, 'DataDtaData')
, (2, 'DataDataDataData')
, (2, 'Too much Data')
, (3, 'Not enough data')
, (4, 'I also like data')
, (4, 'Feed me data')
, (4, 'I hunger for data');
SELECT
T1.Test1ID
, T1.SomeData
, T2.Test2ID
, T2.SomeOtherData
INTO dbo.Test3
FROM #Test1 AS T1
INNER JOIN #Test2 AS T2
ON T1.Test1ID = T2.Test1ID;
So in this case we have 2 tables joined, both have IDENTITY columns that essentially make up the primary key, but the result is:
CREATE TABLE [dbo].[Test3](
[Test1ID] [int] NOT NULL,
[SomeData] [varchar](50) NOT NULL,
[Test2ID] [int] NOT NULL,
[SomeOtherData] [varchar](50) NOT NULL
);
So no IDENTITY column in the newly created table. In this case the engine can’t see which of these can be IDENTITY columns. But what if we feed it a bit more information?
DROP TABLE IF EXISTS dbo.Test2;
DROP TABLE IF EXISTS dbo.Test1;
DROP TABLE IF EXISTS dbo.Test3;
DROP TABLE IF EXISTS dbo.Test4;
CREATE TABLE dbo.Test1
(
Test1ID INT IDENTITY(1, 1) NOT NULL PRIMARY KEY CLUSTERED
, SomeData VARCHAR(50) NOT NULL
);
CREATE TABLE dbo.Test2
(
Test2ID INT IDENTITY(1, 1) NOT NULL PRIMARY KEY CLUSTERED
, Test1ID INT NOT NULL
, SomeOtherData VARCHAR(50) NOT NULL
, CONSTRAINT FK_Test2_Test1ID
FOREIGN KEY (Test1ID)
REFERENCES dbo.Test1(Test1ID)
);
SET IDENTITY_INSERT dbo.Test1 ON;
INSERT INTO dbo.Test1
(
Test1ID
, SomeData
)
VALUES
(1, 'Some data')
, (2, 'Some more data')
, (3, 'Even more data')
, (4, 'And a bit more data');
SET IDENTITY_INSERT dbo.Test1 OFF;
INSERT INTO dbo.Test2
(
Test1ID
, SomeOtherData
)
VALUES
(1, 'DataData')
, (1, 'DataDtaData')
, (2, 'DataDataDataData')
, (2, 'Too much Data')
, (3, 'Not enough data')
, (4, 'I also like data')
, (4, 'Feed me data')
, (4, 'I hunger for data');
SELECT
T1.Test1ID
, T1.SomeData
, T2.Test1ID AS FKTest1ID
, T2.Test2ID
, T2.SomeOtherData
INTO dbo.Test3
FROM dbo.Test2 AS T2
LEFT JOIN dbo.Test1 AS T1
ON T1.Test1ID = T2.Test1ID;
SELECT
Test2ID
, SomeOtherData
INTO dbo.Test4
FROM dbo.Test2;
So here I’ve made it explicit that Test2ID is the primary key of dbo.Test2, added in a foreign key to dbo.Test1 to also make it explicit that the join won’t be introducing any duplicate rows, and even turned the join into a LEFT JOIN to make explicit to the engine that we’re returning every row from dbo.Test2, but the answer is still:
CREATE TABLE [dbo].[Test3](
[Test1ID] [int] NOT NULL,
[SomeData] [varchar](50) NOT NULL,
[Test2ID] [int] NOT NULL,
[SomeOtherData] [varchar](50) NOT NULL
);
CREATE TABLE [dbo].[Test4](
[Test2ID] [int] IDENTITY(1,1) NOT NULL,
[SomeOtherData] [varchar](50) NOT NULL
);
So we can see from dbo.Test4 that we can get Test2ID to be recognised as an IDENTITY column, but only if we don’t join to anything else.
This is mainly a bit of fun for me, going down a bit of a rabbit hole to see what’s at the end, but it does have some real world implications. If you’re using SELECT... INTO anywhere in your scripts then you do need to be aware of when it will and won’t create IDENTITY columns and account for that in future interactions with the created table. I became aware of this because I was editing an existing script and needed to add new data to a table created in this way, and I got errors because I was trying to insert into an IDENTITY column, so there are problems it can cause. You also need to be aware that if you’ve accounted for the IDENTITY column in the created table, but subsequently add a join into your SELECT, that column will no longer be created as an IDENTITY and you may need to do some editing downstream.
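If you want to take the guesswork out of it, one option (a sketch building on the tables above, not something from the original post) is to wrap the identity column in an expression, which stops the IDENTITY property being copied across:

-- Wrapping the source column in an expression (ISNULL, CAST, + 0 etc.) means
-- the engine treats it as a computed value, so dbo.Test5 gets a plain INT
-- column instead of inheriting the IDENTITY property. ISNULL also keeps it
-- NOT NULL, which a plain CAST would not.
SELECT
    ISNULL(Test2ID, 0) AS Test2ID
    , SomeOtherData
INTO dbo.Test5
FROM dbo.Test2;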
In general I don’t particularly like SELECT... INTO: I feel that I lose some control as a developer when I use it rather than explicitly creating the table, and I lose some of the self-documenting nature of the code, and this just gives me another reason to avoid it. But I also know that at a minimum I will end up debugging someone else’s code that uses it in the future, so it’s a good idea to know about these quirks.
Just a quick post today, to highlight one of the quirks of SSDT database projects.
Usually when you hit publish, or publish through your automated pipeline, or generate a script, or do whatever else you do to push the project out onto a server, the process is pretty simple. There’s some sort of comparison engine that runs under the hood, and it does some technical wizardry to detect any differences between your project and the database you’re deploying to. It generates a script based on those changes, and runs that script against the database to bring it in line with the project. Simple enough, right?
Mostly, yes. But there are some cases where it isn’t that simple. I’ve blogged before about how you need to write any check constraints in a specific way, and today I’m going to look at another of these quirks.
The scenario
Recently in my job we were looking at tidying up the cases of some description fields in our dimensions. One of the procedures that generates the dimension data was generating a description field as ‘YES’ and ‘NO’, and we felt that ‘Yes’ and ‘No’ were better options. This was all being generated through a CASE expression (CASE WHEN Column = 1 THEN 'YES' WHEN Column = 0 THEN 'NO' END), so we thought it would be an easy fix to just change the YES to a Yes etc. We were wrong.
When we came to deploy, we found that the changes weren’t deploying to any of our environments. As they were part of a larger change set, we were really puzzled about why it was just these procedures that weren’t deploying, until we realised that it was because the only difference between the project version and the database version of these procedures was the case of the labels. I did some looking into it, and it turns out this is expected behaviour. I had a look around the publish options, and couldn’t find any ‘detect changes in case’ type option, and the internet suggested the only way to make the publish process work the way we wanted was to update the collation of the database to be case sensitive.
This was definitely a pain, and the answer in the end was to add a comment to the procedures to trick the comparison engine into thinking they’d changed, but the more I thought about it the more it made sense. By setting the database to be case insensitive, we had told SQL Server that we didn’t care about the case of anything in that database. Why, in that case, would the comparison engine waste time altering a proc when the only difference was something we had called out as unimportant?
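For anyone hitting the same thing, the workaround really was as blunt as it sounds; something like this (the procedure here is entirely made up, but the shape of the change is accurate):

-- In the project, the only real change was 'YES'/'NO' becoming 'Yes'/'No'.
-- Adding a comment gives the comparison engine a difference it will act on,
-- so the procedure gets redeployed even under a case insensitive collation.
CREATE PROCEDURE dbo.usp_LoadCustomerDim -- hypothetical procedure
AS
BEGIN
    -- comment added purely to force this procedure to redeploy
    SELECT
        CustomerID
        , CASE WHEN IsActive = 1 THEN 'Yes' WHEN IsActive = 0 THEN 'No' END AS IsActiveDescription
    FROM dbo.Customer;
END;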
More generally, I think this speaks to a principle of doing the right thing in the right place. If your database is case insensitive then don’t try to do case sensitive stuff in there. Instead, worry about the case in whatever layer you have that does care about the case of your labels. And if that layer is your database then make it explicit by making the database case sensitive.
As I mentioned in my blog last week, I’ve not posted for a long time on here. In fact, before last week, you had to go back to May 2021 for my last post, so that’s almost a 2 year gap. This gap wasn’t really by choice, but was a result of external factors sweeping in and blowing all of my plans out the window. But before I get to what happened, I want to go through where I was professionally in 2021, to give a bit of context.
2021
So, 2021, the pandemic had been going for a while at this point, and I was generally coping ok. I’d come through some nasty mental health issues in 2020, and was feeling a lot better about a lot of things. I’d managed to blog at least once a week for 3 months in 2020, and had set myself the goal of doing that for 6 months in 2021, as well as regularly contributing to T-SQL Tuesdays. I also applied to present at New Stars of Data, and was accepted, which meant getting some mentoring from the awesome DBA with a beard. The presentation itself went really well, and off the back of it I got accepted to give a quick talk at DataMinutes. I also applied to give the New Stars of Data talk at GroupBy, and got accepted. This was a huge deal, because I’d be putting my work in front of a couple of hundred people, and it really felt like between the conferences and this blog I was taking the first steps to becoming known in the community.
Why this didn’t happen
Obviously, this didn’t happen. Instead I got a call one day from one of my aunts to say my mum had suffered a stroke. Immediately every plan I had was thrown out, personally and professionally. On the personal side, I ended up moving to be close to my mum when she was released back home, so I could help her through the transition back to whatever life she’d have. Professionally I obviously stopped blogging, I cancelled the talk at GroupBy (they were great about it by the way), and ended up still giving the DataMinutes talk (they were also great about it, and were happy for me to either give the talk or not), although I probably shouldn’t have, and I don’t think it was very good (one of the big names in the SQL world was watching and he called out a major mistake).
I also had to take a break from work to deal with the incredible stress all of this put on me. In the early days covid meant nobody could visit in the hospital so the only communication we had with doctors was over the phone, and they didn’t always have the time to explain things. The only communication I had with my mum was via video chat, which needed the nurses to know how to make that happen, and they weren’t always technically savvy (not a criticism, knowing how to use Zoom was not a job requirement and they were doing awesome work under horrible circumstances). When I came back to work, they were starting to bring people back into the office and I was planning to move away for an unknown amount of time, so I ended up changing jobs as part of the move.
Where I am now
Since then, things have been difficult but we’ve all come through. My mum is home, and largely independent, and I’m starting to look at picking up my life from where I left it in 2021. Professionally I feel like I’ve lost the momentum I had in 2021, so that is something I need to work on getting back. That means picking up this blog, so I’ve again set the target of posting at least once a week for 6 months in 2023, as well as trying to do some speaking and maybe picking up some new skills. My post last week had a lot about my goals for the year, and these are all on there.
I also took a holiday at the end of the year, and that helped a lot. I spent almost 2 weeks in Bali, having fun, relaxing by the pool, drinking cocktails out of coconuts, doing yoga and meditation, and generally putting a bit of distance between me and everything that was going on. It gave me a chance to think, and to put things into perspective, and I’ve come back feeling a lot more optimistic about my future.
This isn’t a post for sympathy, just an explanation of why I’ve been away. I don’t think I really have a regular readership, but I wanted to explain for anyone who was curious about why I vanished. I also wanted to remind people that these things happen, and they will knock you off course. All you can really do is pick yourself up at the end of it, see where you are, and start working on making progress again. And when you are in the middle of something like this, remember to look after yourself. It’s something I wasn’t always good at, but if you’re not ok then you can’t help anyone else.
This is my 10th TSQLTuesday, and this month Deepthi Goguri has given the double prompt to talk about our favourite new features in SQL Server 2022 and our New Year’s resolutions.
Now, for various reasons, I’ve not been posting on this blog for a while. Like, a long while. Like the last post was TSQLTuesday #138. I was going to do a post about why that was, but I haven’t got it written yet and I really want to post on this topic so that might come next week.
Anyway, the same things that kept me away from the blog (and a lot of other things) also stopped me from looking into SQL Server 2022 in any great detail. I did a post in April 2021 about the GREATEST and LEAST functions that were present in Azure and were coming to on prem at some point in the future, but apart from that I haven’t really been paying attention. With all of that, I guess the thing I’m looking forward to the most is the thing I don’t know is there yet. It’s always fun to watch talks and read blogs on new features, and think about how you can apply them to the big issues at your workplace and make life easier, so that’s what I’m hoping I’ll get a chance to do over the next few months.
New Year’s goals
Many years ago I stumbled on Brent Ozar’s epic life quest. Reading around that led me to Steve Kamb’s Epic Quest of Awesome, and I decided I wanted to do something similar. I write about this in more detail here, and list all of my goals here, but the basic idea is to make a list of things you want to do in your life, then focus on ticking them off. I really struggle with keeping focus on things for a long period of time, so having something like this allows me to take some time and set down what’s important, and refer back to it later as a guide to what I should be working on.
Some years after I did this I decided I was done with the idea of New Year’s Resolutions, because they always end up being a bit vague, and you get to the end of the year and you can’t really say if you did them, and they just tend to fade from my attention after a bit. I decided that what I wanted to do instead was set myself some concrete, S.M.A.R.T. goals, and to do that I largely pulled from my epic quest, along with a few other things. The last couple of years have been really bad when it came to achieving anything, because of the pandemic and other reasons, but this year I’m feeling a lot better, and I have some catching up to do, so I’ve been ambitious and set myself quite a few goals. The plan is to check in every 3 months and see how I’m doing, and keep all of them in mind as I go through the year. So, goals for 2023 are:
- Weigh under 95kg every day for 3 months. This has been something I’ve tried for a couple of years before now, but I think I’m in a better place to tackle it this year. The idea is if I can get to that weight and maintain it for 3 months it will set some healthier habits in place rather than immediately putting it all back on again.
- Deadlift 200kg. Current tested max is 165kg, but I think I’d be closer to 180kg if I tested it today.
- Deadlift 250kg. This is definitely more ambitious, and probably won’t happen until near the end of the year if at all.
- I’ve entered a novice strongman comp that’s coming up in about a month, so I’ve set myself the challenge of completing every event. That means getting at least one rep on the ‘for reps’ events, hitting all of the weights in the ‘progressive’ events (things like deadlift x weight then y then z, I want to get all the way to z), and completing the carry and pull medley in the time (carry 2 sandbags then pull a sled a set distance in 60 seconds).
- Finish a half marathon. I’ve not done much running the last couple of years, but I’ve entered the Nottingham half, which is the end of September, and if I train right I can definitely complete it.
- Paint a Warhammer 40k combat patrol. This was nearly completed last year, but I didn’t quite make it so I want to finish it off this year.
- Paint a Warhammer 40k incursion force. This will incorporate the combat patrol, but double the size of it. I really enjoy painting these models, so this is about having fun and reducing my ‘to be painted’ pile.
- Paint a gang/team/squad/something else for one of the Games Workshop boxed games. Same as above, painting relaxes me and gives me a feeling of achieving something.
- Related to painting, I had an idea for a simple app to track all the paints I currently own, and what paints I used on each model, to help me repeat existing paint jobs on similar models in the future. I’ve not done app development before, or really anything outside of databases, so I’m thinking I’ll set myself the challenge of building this in Azure, and hopefully learn some stuff along the way.
- Blog every week for 6 months. This blog is really neglected and I need to show it some love.
- Speak at another SQL Server user group. I spoke at the Nottingham user group a couple of times, and was just starting to get accepted to conferences when things got difficult. I’d like to do more speaking, and both the Cardiff and Bristol groups aren’t that far from me so I’m keeping an eye out for when they start up again this year.
- Sort out a group tent for larp. If you don’t know what larp is, this won’t make a lot of sense, but essentially I want to get a tent and decorate it like it’s from a magical fantasy kingdom.
- Book another big holiday. I may not get to go on one this year but I’d like to at least have one lined up for 2024 by the end of the year. I went to Bali over New Year and it was really good for me, and I feel like I need to do those trips more often.
- Have 5 smaller adventures this year. So far I’ve booked a running festival in the summer, and a new larp event, so I need to find another 3. I’m thinking some weekends in London to catch some shows.
Anyway, that is it (apart from one goal I’m keeping secret for personal reasons, but I’ll post if it happens). Like I said, some are off my epic quest and others are just things I thought would be good for me this year, but having the list to refer back to every time I have a spare half a day helps keep me focussed on chasing the things that matter more to me.
This is my 9th TSQLTuesday, and this month Andy Leonard has given the prompt “How Do You Respond When Technology Changes Underneath You?”
I’ve largely worked at places where we’ve used pretty stable products. Despite my best efforts, most of the teams I’ve been part of have been reluctant to adopt much in the way of cutting edge technologies. There are things I could write about like the change in SSIS from 2008 to 2012, and the associated relearning I had to do to keep myself current, but I think I want to talk about something a little different:
SQL Server version upgrades
I’m not 100% sure this is what Andy had in mind with his prompt, but it feels on topic enough that I’m going to go with it anyway.
SQL Server version upgrades are something most developers and DBAs will have to confront at one point or another, and I’ve found myself dealing with a few significant ones in the last few years. All have been from 2008R2, including the one my company is currently in the middle of (I wasn’t kidding when I said they don’t tend to adopt the most cutting edge technologies), and they’ve all gone a similar way. I’m not going into all of the details (one blog post is way too short to write a complete how to on this), but there are a few things I want to share about my experiences.
Data Migration Assistant
The SQL Server Data Migration Assistant is an amazing tool. You download it, point it at a SQL instance, tell it what database(s) you want to upgrade, and tell it the target version. It then tells you a lot of what you need to know in terms of incompatible code, code that might behave differently, code that might perform differently, features marked for deprecation etc. Some of these things are easy fixes, for instance using the no longer valid syntax for calling RAISERROR. Others require a bit more work, like any old style outer joins using the *= syntax; in that case you will need to re-write the statement, but it should be pretty obvious how you need to do that. Still others may need some more involved work, for instance one of our upgrades was using the old version of Database Mail. The fix for that meant setting up and configuring the target to use the up to date version, and changing every proc to call the new system procedures instead of the old.
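To make the *= case concrete (the tables here are hypothetical, and the commented-out lines show the sort of pre-upgrade syntax the assistant flags):

-- Old-style outer join syntax, which no longer parses under newer
-- compatibility levels:
--   SELECT C.CustomerID, O.OrderID
--   FROM dbo.Customer AS C, dbo.CustomerOrder AS O
--   WHERE C.CustomerID *= O.CustomerID;

-- The equivalent rewrite using explicit join syntax:
SELECT
    C.CustomerID
    , O.OrderID
FROM dbo.Customer AS C
LEFT JOIN dbo.CustomerOrder AS O
    ON C.CustomerID = O.CustomerID;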
As you might be gathering, there can be quite a bit of work involved even with the Data Migration Assistant’s help. But that’s only half the story.
Things the Assistant can’t help with
There are some design patterns you might adopt that make these upgrades harder, because they put code in places the migration assistant can’t see. The obvious example here is dynamic SQL: the migration assistant can’t see the code you will eventually generate, because it’s not held directly in any object, and therefore can’t check what obsolete syntax might be lurking.
Similar principles apply to any code held in tables, or held in SSIS packages, or generated in scripts in SSIS packages, or run from the application side, and so on. Basically, any T-SQL code you put anywhere other than directly in a database object will cause you some issues here.
Obviously the first bit of advice I’m going to give you is “don’t do that”. If you keep your code inside your database, limit calls from external sources like SSIS or applications to just executing stored procedures, and resist the urge to write any dynamic SQL, you should be fine. But these things do exist for a reason, and can be useful, and as I’ve written about before, I am actually a fan of dynamic SQL when it’s used properly.
If you do choose to use some of these design patterns, you will do yourself a big favour by keeping the numbers to a minimum, and heavily documenting them somewhere. That way, when you come to upgrade you should have a ready list of trouble spots in your code base, and if you’ve kept them to a manageable number you should be able to check their upgrade compatibility fairly simply.
One way to perform those checks, that we are currently experimenting with, is to just generate stored procedures from those dynamic scripts. Run through the different possible paths (which you should have documented), generate a script for each of them, and wrap them in CREATE PROCEDURE statements. Once you’ve done that, the migration assistant should pick them up the same as anything else. Another option is just to test them thoroughly in your migration test environment. You should ideally have some re-runnable unit and end-to-end tests for this functionality, so do your test migration and run them.
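As a rough sketch of what I mean (the schema, procedure name and body are all made up; the point is only that the generated code ends up inside an object the assistant can scan):

-- One throwaway procedure per documented path of the dynamic script.
-- These exist only so the Data Migration Assistant can analyse the generated
-- code; they can be dropped again once the upgrade assessment is done.
CREATE PROCEDURE dma_check.usp_DynamicReport_Path1 -- hypothetical
AS
BEGIN
    -- paste in the script the dynamic SQL would generate for this path
    SELECT
        SomeColumn
    FROM dbo.SomeTable
    WHERE SomeFilter = 'Value1';
END;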
As I said at the start, there’s a lot more to write about SQL Server version upgrades but this post is probably long enough already so I’m leaving it there. Thanks for the prompt, I hope I wasn’t too far off topic.
Today I want to talk a bit about my goals in life, and how I organise them.
Some time ago I came across the idea of epic life quests. I originally heard about them from Brent Ozar, who in turn had heard about them from Steve Kamb, and I liked the idea behind them. I started to work on my own and about 5 years ago I started writing it down. The full list of everything on my quest is here. I realise I’m probably never going to achieve everything on this list, but it helps me keep a certain amount of focus on things that matter to me long-term.
I broke my goals up into various sections, including travel (because I love to travel and have a lot of places I want to see), and hobbies (because it’s important for me to keep doing the things I enjoy and not just working), as well as some work goals.
A couple of years after that, when it was coming close to the end of December, I started to think about what my New Year’s Resolutions would be that year. I was looking at the usual stuff, like lose weight, exercise more, learn something new, when I realised those goals sucked. They weren’t measurable, they never motivated me, they usually couldn’t be broken down into smaller chunks, and largely they were just vague aspirations to be better at some stuff. I decided that year I wasn’t going to do resolutions, I was going to set myself some New Year’s Goals.
I did that, and pulled some of them from my epic life quest and made up a couple more that were just things I needed to get done in the year to stop myself from backsliding, and came up with 6 goals. I hit 3 of them at the end of the year, and although a 50% success rate wasn’t ideal, it was dramatically better than my usual 0% success rate on resolutions. I decided to do that again in the next year, and set myself 10 goals. Unfortunately that was last year, and we all know what a dumpster fire 2020 turned out to be, but I still achieved some of them which was, again, better than nothing.
So, this year I’ve carried that on. So far I’m not doing great but that’s because the New Stars of Data talk happened and interrupted a lot of my planning. I still feel like I’m on course to achieve quite a few of these:
- Tidy up and create some ‘branding’ for my professional presence e.g. this blog, potential talks etc.
  - I think the content on this blog is alright, but it could probably use a facelift
  - The about me page needs an update, and I need to create a couple more pages as well
  - When I update the look of the blog it would be good to update the presentations I give to have a similar look
- Have a painted 40k combat patrol
  - One of my nerd hobbies, I’ve always loved the Games Workshop games and now seems like a good time to get back into them as I live just down the road from their HQ.
- Have a painted blood bowl team/underworlds warband
  - More Games Workshop stuff
- Blog once a week for 6 months solid
  - Last year I did this for 3 months, this year I’m pushing for 6
- Weigh under 95kg every day for 3 months
- Sort out the garden
  - Not on my epic quest
  - Garden is a state, and I need to properly attack it at some point or I will get in trouble with the letting agency
  - Does potentially get me closer to having my ‘garden party’ garden, which is on my quest, but only if I get the full garden party setup before I move
- Earn the Azure Data Engineer certification
  - This is almost certainly getting dropped to make room for the New Stars of Data talk
- Finish my novel
- Host a proper Halloween party
  - This means everyone dresses up, appropriately spooky decorations and snacks
- Frame and hang all holiday pictures
  - I’ve got quite a few paintings etc. from different holidays and I want to get them all framed and hung up around the house
  - Not on my epic quest
- Stretch goal if Covid allows – Jordan desert trek
  - Not happening this year
  - Plan was to trek through the Jordan desert to Petra and the Dead Sea
  - New plan is to do a combined Egypt/Jordan holiday next spring when that will hopefully be more practical
  - This will tick a few goals off the holiday quests
I came across an interesting quirk of database projects a few days ago. More specifically, a quirk when you come to publish a project that includes check constraints.
I had a few tables with check constraints added to different columns to specify that only particular values could be used in those columns. It looked something like the following:
CREATE TABLE [CheckConstraintExamples].Customer
(
[CustomerID] INT NOT NULL,
[Forename] VARCHAR(50) NOT NULL,
[Surname] VARCHAR(50) NOT NULL,
[Email] VARCHAR(200) NOT NULL,
[CustomerType] VARCHAR(20) NOT NULL,
[CustomerLoyalty] VARCHAR(20) NOT NULL,
[CustomerValue] VARCHAR(20) NOT NULL,
CONSTRAINT [PK_Customer] PRIMARY KEY ([CustomerID]),
CONSTRAINT [CK_Customer_CustomerType] CHECK
(CustomerType IN (
'Basic'
, 'Premium'
, 'Platinum')),
CONSTRAINT [CK_Customer_CustomerLoyalty] CHECK
(CustomerLoyalty IN (
'Low'
, 'Medium'
, 'High'
, 'Lapsed')),
CONSTRAINT [CK_Customer_CustomerValue] CHECK
(CustomerValue IN (
'Low'
, 'Medium'
, 'High'
, 'Superspender'
, 'None'))
);
That all looks ok, and everything published fine, and kept on publishing fine until I happened to check the generated script for some of the publishes (I was in the early stages of development at this point so deployments were just being done through Visual Studio direct to my machine).
When I did, I noticed that every deployment was generating code like this:
GO
PRINT
N'Dropping [CheckConstraintExamples].[CK_Customer_CustomerLoyalty]...';
GO
ALTER TABLE [CheckConstraintExamples].[Customer]
DROP CONSTRAINT [CK_Customer_CustomerLoyalty];
GO
PRINT
N'Dropping [CheckConstraintExamples].[CK_Customer_CustomerType]...';
GO
ALTER TABLE [CheckConstraintExamples].[Customer]
DROP CONSTRAINT [CK_Customer_CustomerType];
GO
PRINT
N'Dropping [CheckConstraintExamples].[CK_Customer_CustomerValue]...';
GO
ALTER TABLE [CheckConstraintExamples].[Customer]
DROP CONSTRAINT [CK_Customer_CustomerValue];
GO
PRINT
N'Creating [CheckConstraintExamples].[CK_Customer_CustomerLoyalty]...';
GO
ALTER TABLE [CheckConstraintExamples].[Customer]
WITH NOCHECK
ADD CONSTRAINT [CK_Customer_CustomerLoyalty]
CHECK (CustomerLoyalty IN (
'Low'
, 'Medium'
, 'High'
, 'Lapsed'));
GO
PRINT
N'Creating [CheckConstraintExamples].[CK_Customer_CustomerType]...';
GO
ALTER TABLE [CheckConstraintExamples].[Customer]
WITH NOCHECK
ADD CONSTRAINT [CK_Customer_CustomerType]
CHECK (CustomerType IN (
'Basic'
, 'Premium'
, 'Platinum'));
GO
PRINT
N'Creating [CheckConstraintExamples].[CK_Customer_CustomerValue]...';
GO
ALTER TABLE [CheckConstraintExamples].[Customer]
WITH NOCHECK
ADD CONSTRAINT [CK_Customer_CustomerValue]
CHECK (CustomerValue IN (
'Low'
, 'Medium'
, 'High'
, 'Superspender'
, 'None'));
GO
PRINT
N'Checking existing data against newly created constraints';
GO
USE [$(DatabaseName)];
GO
ALTER TABLE [CheckConstraintExamples].[Customer]
WITH CHECK
CHECK CONSTRAINT [CK_Customer_CustomerLoyalty];
ALTER TABLE [CheckConstraintExamples].[Customer]
WITH CHECK
CHECK CONSTRAINT [CK_Customer_CustomerType];
ALTER TABLE [CheckConstraintExamples].[Customer]
WITH CHECK
CHECK CONSTRAINT [CK_Customer_CustomerValue];
That’s dropping all my constraints, recreating them using WITH NOCHECK, and then using WITH CHECK to check the existing data. Obviously all the data will pass those checks, but that’s going to add some time to my deployments, and as the size of the data and the number of checks both increase, these checks will take more and more time.
So, what’s going on? The answer was pretty simple really: when you create a table with check constraints like the ones I’ve used, SQL Server doesn’t use the IN when generating the table definition. Instead it breaks that out into a series of OR predicates, and when you deploy again the deployment process doesn’t recognise that the OR predicates are the same as the single IN predicate. Because it thinks the constraints are different, it drops the existing one and creates a new one based on the code in your project. Then the SQL Server engine transforms the IN predicate in the new constraint into a series of OR predicates and the cycle begins again.
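You can see this for yourself by querying the stored definitions; something like this (run against the example table above) shows the IN expanded into ORs:

-- The definition column shows what the engine actually stored for each
-- constraint, which is a chain of OR predicates rather than the original IN.
SELECT
    name
    , definition
FROM sys.check_constraints
WHERE parent_object_id = OBJECT_ID('CheckConstraintExamples.Customer');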
The only solution is to re-write the check constraints to use OR instead of IN, like this:
CREATE TABLE [CheckConstraintExamples].Customer
(
[CustomerID] INT NOT NULL,
[Forename] VARCHAR(50) NOT NULL,
[Surname] VARCHAR(50) NOT NULL,
[Email] VARCHAR(200) NOT NULL,
[CustomerType] VARCHAR(20) NOT NULL,
[CustomerLoyalty] VARCHAR(20) NOT NULL,
[CustomerValue] VARCHAR(20) NOT NULL,
CONSTRAINT [PK_Customer] PRIMARY KEY ([CustomerID]),
CONSTRAINT [CK_Customer_CustomerType] CHECK
(
CustomerType = 'Basic'
OR CustomerType = 'Premium'
OR CustomerType = 'Platinum'
),
CONSTRAINT [CK_Customer_CustomerLoyalty] CHECK
(
CustomerLoyalty = 'Low'
OR CustomerLoyalty = 'Medium'
OR CustomerLoyalty = 'High'
OR CustomerLoyalty = 'Lapsed'
),
CONSTRAINT [CK_Customer_CustomerValue] CHECK
(
CustomerValue = 'Low'
OR CustomerValue = 'Medium'
OR CustomerValue = 'High'
OR CustomerValue = 'Superspender'
OR CustomerValue = 'None'
)
);
Once I’d done that I thought everything would be fine, so I generated another deployment script. Turns out there was one more little thing. When the SQL Server engine generated the object definition for those checks, it ordered the different predicates according to where they were in the original IN predicate, from last to first, which gives definitions like this:
ALTER TABLE [CheckConstraintExamples].[Customer]
WITH CHECK ADD
CONSTRAINT [CK_Customer_CustomerLoyalty] CHECK
((
[CustomerLoyalty]='Lapsed'
OR [CustomerLoyalty]='High'
OR [CustomerLoyalty]='Medium'
OR [CustomerLoyalty]='Low'
))
GO
ALTER TABLE [CheckConstraintExamples].[Customer]
WITH CHECK ADD
CONSTRAINT [CK_Customer_CustomerType] CHECK
((
[CustomerType]='Platinum'
OR [CustomerType]='Premium'
OR [CustomerType]='Basic'
))
GO
ALTER TABLE [CheckConstraintExamples].[Customer]
WITH CHECK ADD
CONSTRAINT [CK_Customer_CustomerValue] CHECK
((
[CustomerValue]='None'
OR [CustomerValue]='Superspender'
OR [CustomerValue]='High'
OR [CustomerValue]='Medium'
OR [CustomerValue]='Low'
))
GO
Whereas if you check my definitions in the previous code block, I’d defined them the other way around. I suspected this meant the deployment engine still thought they were different check constraints, and to test it I deployed my revised project and generated another script. Sure enough, this final script wasn’t dropping and recreating those check constraints any more, and when I regenerated the table definition in management studio, it showed all of the predicates in the order I specified.
I said above that the only way to fix the issue I could see is to re-write the check constraints using a list of OR predicates. That’s not quite true. The other thing you can do is write your constraints like that to begin with, because you’re aware of the issue, and hopefully this blog means a few more of you will do that and not have to go through the re-write task I now have ahead of me.
So, this is pretty cool. Microsoft have brought GREATEST and LEAST functions to SQL Azure, and will be bringing them to on-premises versions in a later release (presumably that means the next release).
These work like scalar function versions of the MIN and MAX aggregations: you write something like GREATEST(1, 3, 4, 2, 6, 7, 5, 9, 13) and it returns the greatest of the values entered (13 in this case). The article gives a few use cases, and there is full documentation for both GREATEST and LEAST in Microsoft Docs.
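As a quick illustration of why this is nice (the table and columns are made up, and this assumes you’re on a version that has the new functions; the CASE version shows the sort of thing we used to write for just two values):

SELECT
    OrderID
    -- New: one call across several columns; NULL arguments are ignored.
    , GREATEST(QuotedDate, ConfirmedDate, ShippedDate) AS LatestDate
    -- Old: a CASE expression per comparison, which gets unwieldy fast.
    , CASE
          WHEN QuotedDate >= ConfirmedDate THEN QuotedDate
          ELSE ConfirmedDate
      END AS LaterOfQuoteAndConfirm
FROM dbo.CustomerOrder;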
This is something that’s been wanted for a long time, and in its absence we’ve often resorted to some complicated CASE statements that we’d probably all like to forget, so I’m really happy it’s there in Azure and coming soon to on prem. Now I just have to convince my company that we should probably migrate off SQL 2008 at some point…