Skip to content

INSERT INTO with IDENTITY columns

March 11, 2023

This is a little quirk of SQL Server that I ran into a while ago.

We’ve probably all seen or used an INSERT... INTO script before, but just in case:

An INSERT... INTO script typically looks something like

SELECT
      *
INTO #Test1
FROM dbo.Test1;

And it will create a new table (#Test1) from the data selected in the SELECT statement, with the column definitions being pulled from the columns in the SELECT in the same way that the engine defines data types etc. for a view.

But what happens when one of the columns being selected is an identity column. To answer that, let’s run a quick test:

DROP TABLE IF EXISTS #Test1;
DROP TABLE IF EXISTS dbo.Test2;
DROP TABLE IF EXISTS dbo.Test3;

CREATE TABLE #Test1
    (
          Test1ID INT IDENTITY(1, 1) NOT NULL
        , SomeData VARCHAR(50) NOT NULL
    );

INSERT INTO #Test1
    (
          SomeData
    )
VALUES
      ('Some data')
    , ('Some more data')
    , ('Even more data')
    , ('And a bit more data');

SELECT
      *
INTO dbo.Test2
FROM #Test1;

SELECT
      Test1ID
INTO dbo.Test3
FROM #Test1;

What we’re doing here is creating a temp table (#Test1) with an identity column, then running a couple of SELECT... INTO statements that include that identity column to create new permanent tables in the dbo schema (Test2 and Test3). The reason I’m creating permanent tables is it make it easier for us to look at their definitions, which are:

CREATE TABLE [dbo].[Test2](
	[Test1ID] [int] IDENTITY(1,1) NOT NULL,
	[SomeData] [varchar](50) NOT NULL
);

CREATE TABLE [dbo].[Test3](
	[Test1ID] [int] IDENTITY(1,1) NOT NULL
);

So, we can see in this case the SELECT... INTO has created the Test1ID column as an IDENTITY column, the same as the source. The next question is what happens when the IDENTITY column exists at a different grain to the results?

DROP TABLE IF EXISTS #Test1;
DROP TABLE IF EXISTS #Test2;
DROP TABLE IF EXISTS dbo.Test3;

CREATE TABLE #Test1
    (
          Test1ID INT IDENTITY(1, 1) NOT NULL
        , SomeData VARCHAR(50) NOT NULL
    );

CREATE TABLE #Test2
    (
          Test2ID INT IDENTITY(1, 1) NOT NULL
        , Test1ID INT NOT NULL
        , SomeOtherData VARCHAR(50) NOT NULL
    );

SET IDENTITY_INSERT #Test1 ON;

INSERT INTO #Test1
    (
          Test1ID
        , SomeData
    )
VALUES
      (1, 'Some data')
    , (2, 'Some more data')
    , (3, 'Even more data')
    , (4, 'And a bit more data');

SET IDENTITY_INSERT #Test1 OFF;

INSERT INTO #Test2
    (
          Test1ID
        , SomeOtherData
    )
VALUES
      (1, 'DataData')
    , (1, 'DataDtaData')
    , (2, 'DataDataDataData')
    , (2, 'Too much Data')
    , (3, 'Not enough data')
    , (4, 'I also like data')
    , (4, 'Feed me data')
    , (4, 'I hunger for data');

SELECT
      T1.Test1ID
    , T1.SomeData
    , T2.Test2ID
    , T2.SomeOtherData
INTO dbo.Test3
FROM #Test1 AS T1
INNER JOIN #Test2 AS T2
    ON T1.Test1ID = T2.Test1ID;

So in this case we have 2 tables joined, both have IDENTITY columns that essentially make up the primary key, but the result is:

CREATE TABLE [dbo].[Test3](
	[Test1ID] [int] NOT NULL,
	[SomeData] [varchar](50) NOT NULL,
	[Test2ID] [int] NOT NULL,
	[SomeOtherData] [varchar](50) NOT NULL
);

So no IDENTITY column in the newly created table. In this case the engine can’t see which of these can be IDENTITY columns. But what if we feed it a bit more information.

DROP TABLE IF EXISTS dbo.Test2;
DROP TABLE IF EXISTS dbo.Test1;
DROP TABLE IF EXISTS dbo.Test3;
DROP TABLE IF EXISTS dbo.Test4;

CREATE TABLE dbo.Test1
    (
          Test1ID INT IDENTITY(1, 1) NOT NULL PRIMARY KEY CLUSTERED
        , SomeData VARCHAR(50) NOT NULL
    );

CREATE TABLE dbo.Test2
    (
          Test2ID INT IDENTITY(1, 1) NOT NULL PRIMARY KEY CLUSTERED
        , Test1ID INT NOT NULL
        , SomeOtherData VARCHAR(50) NOT NULL
        , CONSTRAINT FK_Test2_Test1ID
            FOREIGN KEY (Test1ID)
            REFERENCES dbo.Test1(Test1ID)
    );

SET IDENTITY_INSERT dbo.Test1 ON;

INSERT INTO dbo.Test1
    (
          Test1ID
        , SomeData
    )
VALUES
      (1, 'Some data')
    , (2, 'Some more data')
    , (3, 'Even more data')
    , (4, 'And a bit more data');

SET IDENTITY_INSERT dbo.Test1 OFF;

INSERT INTO dbo.Test2
    (
          Test1ID
        , SomeOtherData
    )
VALUES
      (1, 'DataData')
    , (1, 'DataDtaData')
    , (2, 'DataDataDataData')
    , (2, 'Too much Data')
    , (3, 'Not enough data')
    , (4, 'I also like data')
    , (4, 'Feed me data')
    , (4, 'I hunger for data')

SELECT
      T1.Test1ID
    , T1.SomeData
    , T2.Test1ID AS FKTest1ID
    , T2.Test2ID
    , T2.SomeOtherData
INTO dbo.Test3
FROM dbo.Test2 AS T2
LEFT JOIN dbo.Test1 AS T1
    ON T1.Test1ID = T2.Test1ID;

SELECT
      Test2ID
    , SomeOtherData
INTO dbo.Test4
FROM dbo.Test2;

So here I’ve made it explicit that Test2ID is the primary key of dbo.Test2, added in a foreign key to dbo.Test1 to also make it explicit that the join won’t be introducing any duplicate rows, and even turned the join into a LEFT JOIN to make explicit to the engine that we’re returning every row from dbo.Test2, but the answer is still:

CREATE TABLE [dbo].[Test3](
	[Test1ID] [int] NOT NULL,
	[SomeData] [varchar](50) NOT NULL,
	[Test2ID] [int] NOT NULL,
	[SomeOtherData] [varchar](50) NOT NULL
);

CREATE TABLE [dbo].[Test4](
	[Test2ID] [int] IDENTITY(1,1) NOT NULL,
	[SomeOtherData] [varchar](50) NOT NULL
);

So we can see from dbo.Test4 that we can get Test2ID to be recognised as an IDENTITY column but only if we don’t join to anything else.

This is mainly a bit of fun for me, going down a bit of a rabbit hole to see what’s at the end, but it does have some real world implications. If you’re using SELECT... INTO anywhere in your scripts then you do need to be aware of when it will and won’t create IDENTITY columns and account for that in future interactions with the created table. I became aware of this because I was editing an existing script and needed to add new data to a table created in this way, and I got errors because I was trying to insert into an IDENTITY column, so there are problems it can cause. You also need to be aware that if you’ve accounted for the IDENTITY column in the created table, but subsequently add a join into your SELECT, that column will no longer be created as an IDENTITY and you may need to do some editing downstream.

In general I don’t particularly like SELECT... INTO, I feel that I lose some control as a developer when I use it rather than explicitly creating the table, and I lose some of the self-documenting nature of the code, and this just gives me another reason to avoid it. But I also know that at a minimum I will end up debugging someone else’s code that uses it in the future, so it’s a good idea to know about these quirks.

From → Uncategorized

Leave a Comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: