Converting IMAGE to VARBINARY(max) on SQL Server

With Microsoft SQL Server 2005 the IMAGE and (N)TEXT large data types were deprecated in favor of VARBINARY(MAX) and (N)VARCHAR(MAX). Some of our customers still run older database schemas that use the IMAGE type, so I am preparing a note to guide the migration.

There is a simple ALTER TABLE command to convert existing tables, but it is not clear what the impact of this conversion is, so I took a deeper look. I had noticed that converting a VARBINARY(MAX) column back to IMAGE caused a lot of I/O and log usage. In the other direction (IMAGE to VARBINARY(MAX)) this does not seem to happen, and I cannot reproduce the problem.

Nevertheless I tried a few scenarios with type conversion and table option changes. With IMAGE the default is to store the data out of row. This is controlled by the 'text in row' table option, which also allows defining a size limit up to which data is inlined.
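As a small sketch of how that option behaves: sp_tableoption accepts ON (which gives the default limit of 256 bytes), OFF, or an explicit limit between 24 and 7000 bytes. The table name below is a hypothetical example and not one of the test tables used later.

```sql
-- Hypothetical table, only used to illustrate the option.
CREATE TABLE tableTextInRowDemo (cID INT IDENTITY(1,1) PRIMARY KEY, cImage IMAGE);

-- 'ON' enables inlining with the default limit of 256 bytes ...
EXEC sp_tableoption 'tableTextInRowDemo', 'text in row', 'ON';
-- ... while a number (24 to 7000) sets an explicit limit in bytes.
EXEC sp_tableoption 'tableTextInRowDemo', 'text in row', '1000';

-- text_in_row_limit shows the effective limit (0 means disabled).
SELECT name, text_in_row_limit FROM sys.tables WHERE name = 'tableTextInRowDemo';

DROP TABLE tableTextInRowDemo;
```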

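The following statements create 4 tables with different settings for images, two intended as heap tables and two clustered. (Note that a PRIMARY KEY constraint creates a clustered index by default, so the "heap" tables would need PRIMARY KEY NONCLUSTERED to be real heaps. This fits the identical page counts for the heap and clustered variants in the results below.)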

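IF OBJECT_ID('tableHeapWithImageDefault', 'U') IS NOT NULL DROP TABLE tableHeapWithImageDefault;
IF OBJECT_ID('tableHeapWithImageInline', 'U') IS NOT NULL DROP TABLE tableHeapWithImageInline;
IF OBJECT_ID('tableClusteredWithImageDefault', 'U') IS NOT NULL DROP TABLE tableClusteredWithImageDefault;
IF OBJECT_ID('tableClusteredWithImageInline', 'U') IS NOT NULL DROP TABLE tableClusteredWithImageInline;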

CREATE TABLE tableHeapWithImageDefault (cID INT identity(1,1) PRIMARY KEY NOT NULL, cImage IMAGE);
CREATE TABLE tableClusteredWithImageDefault (cID INT identity(1,1) PRIMARY KEY CLUSTERED NOT NULL, cImage IMAGE);

CREATE TABLE tableHeapWithImageInline (cID INT identity(1,1) PRIMARY KEY NOT NULL, cImage IMAGE);
EXEC sp_tableoption 'tableHeapWithImageInline', 'text in row', 'ON';

CREATE TABLE tableClusteredWithImageInline (cID INT identity(1,1) PRIMARY KEY CLUSTERED NOT NULL, cImage IMAGE);
EXEC sp_tableoption 'tableClusteredWithImageInline', 'text in row', 'ON';

If I query the details I see the following:

select name, type_desc, text_in_row_limit, large_value_types_out_of_row from sys.tables where name like 'table%'

name                             type_desc     text_in_row_limit large_value_types_out_of_row
tableClusteredWithImageDefault   USER_TABLE    0                 0
tableClusteredWithImageInline    USER_TABLE    256               0
tableHeapWithImageDefault        USER_TABLE    0                 0
tableHeapWithImageInline         USER_TABLE    256               0

The large_value_types_out_of_row option cannot be modified as long as the table does not contain one of the new large value types.

So let's fill the tables with smaller and larger values for the cImage column (total blob size per table of (36000+180)*50000 bytes, about 1.8 GB) and see what space usage the allocation units have:

declare @x int = 0; -- outer loop counter, was missing its declaration
while @x < 10 BEGIN
  set @x = @x + 1
  select @x
  BEGIN TRANSACTION
  declare @i int = 0;
  while @i < 5000 BEGIN
    INSERT INTO tableHeapWithImageDefault(cImage) VALUES(replicate(cast(newID() as VARCHAR(max)),5)) -- 180
    INSERT INTO tableHeapWithImageDefault(cImage) VALUES(replicate(cast(newID() as VARCHAR(max)),1000)) -- 36000
    INSERT INTO tableHeapWithImageInline(cImage) VALUES(replicate(cast(newID() as VARCHAR(max)),5))
    INSERT INTO tableHeapWithImageInline(cImage) VALUES(replicate(cast(newID() as VARCHAR(max)),1000))
    INSERT INTO tableClusteredWithImageDefault(cImage) VALUES(replicate(cast(newID() as VARCHAR(max)),5)) -- 180
    INSERT INTO tableClusteredWithImageDefault(cImage) VALUES(replicate(cast(newID() as VARCHAR(max)),1000)) -- 36000
    INSERT INTO tableClusteredWithImageInline(cImage) VALUES(replicate(cast(newID() as VARCHAR(max)),5))
    INSERT INTO tableClusteredWithImageInline(cImage) VALUES(replicate(cast(newID() as VARCHAR(max)),1000))
    SET @i = @i + 1
  END
  COMMIT TRANSACTION
END
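The sizes noted in the comments above can be double-checked with a quick query. This is only a sanity check of the arithmetic plus a look at what actually ended up in one of the tables. It assumes the fill loop has completed.

```sql
-- NEWID() cast to a string is 36 characters long.
SELECT LEN(CAST(NEWID() AS VARCHAR(max))) AS guidLength,                 -- 36
       36 * 5 AS smallValueBytes,                                         -- 180
       36 * 1000 AS largeValueBytes,                                      -- 36000
       CAST(36 * 5 + 36 * 1000 AS BIGINT) * 50000 AS totalBytesPerTable;  -- 1809000000

-- Count the rows per stored value length in one of the test tables.
SELECT DATALENGTH(cImage) AS valueBytes, COUNT(*) AS numRows
FROM tableHeapWithImageDefault
GROUP BY DATALENGTH(cImage);
```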

The following query looks at the three possible types of allocation units (especially IN_ROW_DATA for data stored in the table and LOB_DATA for large external data; the third type, ROW_OVERFLOW_DATA, is not used in this example):

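select o.name AS tablename, au.type_desc, CEILING(au.used_pages * 8 /1024.0) as usedMB,
       au.data_pages, au.used_pages, au.total_pages, p.partition_number, p.rows,
       au.used_pages * 8 * 1024 / NULLIF(p.rows, 0) as [bytes/row] -- NULLIF avoids a division by zero on empty tables
FROM sys.allocation_units AS au
  -- IN_ROW_DATA (1) and ROW_OVERFLOW_DATA (3) reference hobt_id, LOB_DATA (2) references partition_id
  JOIN sys.partitions AS p ON (au.type IN (1, 3) AND au.container_id = p.hobt_id)
                           OR (au.type = 2 AND au.container_id = p.partition_id)
  JOIN sys.objects AS o ON p.object_id = o.object_id
WHERE o.name like 'table%';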

The result shows, as expected, where the data is stored (keep in mind that the bytes/row average covers both the long and the short rows).

tablename                       type_desc       usedMB    data_pages  used_pages   total_pages  rows  bytes/row
tableHeapWithImageDefault       IN_ROW_DATA          5    582         586          587          100000       48
tableHeapWithImageDefault       LOB_DATA          1778    0           227574       227611       100000    18642
tableHeapWithImageInline        IN_ROW_DATA         16    1961        1971         1971         100000      161
tableHeapWithImageInline        LOB_DATA          1758    0           225003       225035       100000    18432
tableClusteredWithImageDefault  IN_ROW_DATA          5    582         586          587          100000       48
tableClusteredWithImageDefault  LOB_DATA          1778    0           227574       227611       100000    18642
tableClusteredWithImageInline   IN_ROW_DATA         16    1961        1971         1971         100000      161
tableClusteredWithImageInline   LOB_DATA          1758    0           225003       225043       100000    18432

Now we can also record the pages used by the allocation units, so that after the conversion we can verify that the data has actually not been touched:

select o.name, au.allocation_unit_id, au.type_desc, au.total_pages, au.first_iam_page, au.first_page, au.root_page
from sys.system_internals_allocation_units au
  JOIN sys.system_internals_partitions AS p ON au.container_id = p.partition_id
  JOIN sys.objects AS o ON p.object_id = o.object_id
WHERE o.name like 'table%';

With the following result:

name                           allocation_unit_id type_desc    total   first_iam_page first_page     root_page
tableHeapWithImageDefault      72057594197966848  IN_ROW_DATA  587     0x3DB302000100 0x3CB302000100 0x748707000100
tableHeapWithImageDefault      72057594198032384  LOB_DATA     227611  0x3BB302000100 0x38B302000100 0x38B302000100
tableHeapWithImageInline       72057594198097920  IN_ROW_DATA  1971    0xB91507000100 0x2E8B06000100 0x3FDA00000100
tableHeapWithImageInline       72057594198163456  LOB_DATA     225035  0x478707000100 0x418707000100 0x418707000100
tableClusteredWithImageDefault 72057594198228992  IN_ROW_DATA  587     0x4F8707000100 0x4E8707000100 0x768707000100
tableClusteredWithImageDefault 72057594198294528  LOB_DATA     227611  0x4D8707000100 0x4C8707000100 0x4C8707000100
tableClusteredWithImageInline  72057594198360064  IN_ROW_DATA  1971    0x558707000100 0x548707000100 0x4FDA00000100
tableClusteredWithImageInline  72057594198425600  LOB_DATA     225043  0x578707000100 0x568707000100 0x568707000100

And now, finally, we can alter the columns:

SET STATISTICS TIME ON
ALTER TABLE tableHeapWithImageDefault ALTER COLUMN cImage VARBINARY(MAX);
ALTER TABLE tableHeapWithImageInline ALTER COLUMN cImage VARBINARY(MAX);
ALTER TABLE tableClusteredWithImageDefault ALTER COLUMN cImage VARBINARY(MAX);
ALTER TABLE tableClusteredWithImageInline ALTER COLUMN cImage VARBINARY(MAX);
SET STATISTICS TIME OFF

This returns with elapsed time = 0ms and the following new data layout. The first visible change is that empty ROW_OVERFLOW_DATA allocation units have been generated for all tables (this might indicate that the conversion could behave differently if the rows are (nearly) full, which is not the case for the narrow tables in this experiment).

name                           allocation_unit_id type_desc          total first_iam_page first_page     root_page
tableHeapWithImageDefault      72057594197966848  IN_ROW_DATA          587 0x3DB302000100 0x3CB302000100 0x748707000100
tableHeapWithImageDefault      72057594198032384  LOB_DATA          227611 0x3BB302000100 0x38B302000100 0x38B302000100
tableHeapWithImageDefault      72057594198491136  ROW_OVERFLOW_DATA      0 0x000000000000 0x000000000000 0x000000000000
tableHeapWithImageInline       72057594198097920  IN_ROW_DATA         1971 0xB91507000100 0x2E8B06000100 0x3FDA00000100
tableHeapWithImageInline       72057594198163456  LOB_DATA          225035 0x478707000100 0x418707000100 0x418707000100
tableHeapWithImageInline       72057594198556672  ROW_OVERFLOW_DATA      0 0x000000000000 0x000000000000 0x000000000000
tableClusteredWithImageDefault 72057594198228992  IN_ROW_DATA          587 0x4F8707000100 0x4E8707000100 0x768707000100
tableClusteredWithImageDefault 72057594198294528  LOB_DATA          227611 0x4D8707000100 0x4C8707000100 0x4C8707000100
tableClusteredWithImageDefault 72057594198622208  ROW_OVERFLOW_DATA      0 0x000000000000 0x000000000000 0x000000000000
tableClusteredWithImageInline  72057594198360064  IN_ROW_DATA         1971 0x558707000100 0x548707000100 0x4FDA00000100
tableClusteredWithImageInline  72057594198425600  LOB_DATA          225043 0x578707000100 0x568707000100 0x568707000100
tableClusteredWithImageInline  72057594198687744  ROW_OVERFLOW_DATA      0 0x000000000000 0x000000000000 0x000000000000

At second glance we notice that neither the allocation unit IDs nor the page addresses have changed for any of the IN_ROW_DATA or LOB_DATA units. So the change from IMAGE to VARBINARY(MAX) is low impact for the tested cases.
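Instead of comparing the two listings by eye, the layout could also be snapshotted into a temp table and diffed. The following is only a sketch: #auBefore is a hypothetical name, it relies on the same undocumented system_internals views as above, and it would not report allocation units that have disappeared.

```sql
-- Run BEFORE the ALTER TABLE statements: remember the current layout.
SELECT o.name, au.allocation_unit_id, au.type_desc, au.total_pages,
       au.first_iam_page, au.first_page, au.root_page
INTO #auBefore
FROM sys.system_internals_allocation_units AS au
  JOIN sys.system_internals_partitions AS p ON au.container_id = p.partition_id
  JOIN sys.objects AS o ON p.object_id = o.object_id
WHERE o.name LIKE 'table%';

-- Run AFTER the ALTER TABLE statements: list every allocation unit that is
-- new or whose page pointers or size differ from the snapshot.
SELECT o.name, au.allocation_unit_id, au.type_desc,
       CASE WHEN b.allocation_unit_id IS NULL THEN 'new' ELSE 'changed' END AS status
FROM sys.system_internals_allocation_units AS au
  JOIN sys.system_internals_partitions AS p ON au.container_id = p.partition_id
  JOIN sys.objects AS o ON p.object_id = o.object_id
  LEFT JOIN #auBefore AS b ON b.allocation_unit_id = au.allocation_unit_id
WHERE o.name LIKE 'table%'
  AND (b.allocation_unit_id IS NULL
       OR b.first_iam_page <> au.first_iam_page
       OR b.first_page <> au.first_page
       OR b.root_page <> au.root_page
       OR b.total_pages <> au.total_pages);
```

For the experiment above I would expect this to list only the four new, empty ROW_OVERFLOW_DATA units.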

When turning on the 'large value types out of row' setting for the table(Heap/Clustered)WithImageDefault tables (which had no inline data before) the situation does not change. The elapsed time is 0ms and the result is unchanged:

SET STATISTICS TIME ON
exec sp_tableoption 'tableHeapWithImageDefault', 'large value types out of row', '1';
exec sp_tableoption 'tableClusteredWithImageDefault', 'large value types out of row', '1';
SET STATISTICS TIME OFF

This results in the following (ROW_OVERFLOW_DATA rows omitted):

name                           allocation_unit_id type_desc   total_pages first_iam_page first_page     root_page
tableHeapWithImageDefault      72057594197966848  IN_ROW_DATA 587         0x3DB302000100 0x3CB302000100 0x748707000100
tableHeapWithImageDefault      72057594198032384  LOB_DATA    227611      0x3BB302000100 0x38B302000100 0x38B302000100
tableClusteredWithImageDefault 72057594198228992  IN_ROW_DATA 587         0x4F8707000100 0x4E8707000100 0x768707000100
tableClusteredWithImageDefault 72057594198294528  LOB_DATA    227611      0x4D8707000100 0x4C8707000100 0x4C8707000100

Running the same option change on the tables that previously had inlining enabled (elapsed 15ms):

SET STATISTICS TIME ON
exec sp_tableoption 'tableClusteredWithImageInline', 'large value types out of row', '1';
exec sp_tableoption 'tableHeapWithImageInline', 'large value types out of row', '1';
SET STATISTICS TIME OFF

Again the pages are unchanged:

name                          allocation_unit_id type_desc   total_pages first_iam_page first_page     root_page
tableHeapWithImageInline      72057594198097920  IN_ROW_DATA 1971        0xB91507000100 0x2E8B06000100 0x3FDA00000100
tableHeapWithImageInline      72057594198163456  LOB_DATA    225035      0x478707000100 0x418707000100 0x418707000100
tableClusteredWithImageInline 72057594198360064  IN_ROW_DATA 1971        0x558707000100 0x548707000100 0x4FDA00000100
tableClusteredWithImageInline 72057594198425600  LOB_DATA    225043      0x578707000100 0x568707000100 0x568707000100

So even that change has no immediate impact. According to MSDN the reason is that existing LOB values are not converted right away; their storage only changes when they are subsequently updated. So this is also quite safe.
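If the new storage setting should apply to the existing values right away, the commonly suggested approach is to update the column with itself. Unlike the option change, this touches every row and is logged, so on large tables it is probably better done in batches. A minimal sketch, assuming the already converted tableHeapWithImageInline from above; the batch size of 10000 is an arbitrary assumption:

```sql
-- Rewrite the values in batches, walking the table by cID range.
DECLARE @from INT = 1, @batch INT = 10000, @max INT;
SELECT @max = MAX(cID) FROM tableHeapWithImageInline;

WHILE @from <= @max BEGIN
  UPDATE tableHeapWithImageInline
     SET cImage = cImage
   WHERE cID >= @from AND cID < @from + @batch;
  SET @from = @from + @batch;
END
```

Rerunning the allocation unit queries afterwards should show whether the in-row pages have actually shrunk.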

The same should be true for the other direction; however, I have seen cases where it took much longer and used a lot of log file space. Maybe you have an idea how I can recreate this scenario?
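One way to at least quantify the effect when it does show up might be to measure how much log a single conversion writes. The sketch below assumes it is run in the test database by a login with VIEW SERVER STATE permission. It wraps the reverse ALTER in a transaction, reads the log usage of that transaction from the DMVs and then rolls back (which can itself take a while if a lot of log was written).

```sql
BEGIN TRANSACTION;

ALTER TABLE tableHeapWithImageDefault ALTER COLUMN cImage IMAGE;

-- Log bytes and log records written so far by the current transaction in this database.
SELECT dt.database_transaction_log_bytes_used,
       dt.database_transaction_log_record_count
FROM sys.dm_tran_current_transaction AS ct
  JOIN sys.dm_tran_database_transactions AS dt
    ON dt.transaction_id = ct.transaction_id
WHERE dt.database_id = DB_ID();

-- Undo the conversion so the experiment can be repeated.
ROLLBACK TRANSACTION;
```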