Converting IMAGE to VARBINARY(max) on SQL Server
With Microsoft SQL Server 2005 the IMAGE and (N)TEXT large data types have been deprecated in favor of VARBINARY(MAX) and (N)VARCHAR(max). We have some older database schema at customers who still use the IMAGE type so I am preparing a note to guide the migration.
There is an easy ALTER TABLE command to convert the existing tables, however it is not clear what the impact of this conversion is. So I took a deeper look. I noticed that converting a VARBINARY(max) back to IMAGE caused a lot of I/O and log usage. The other way around this seems to not happen and I can not repeat the problem.
Nevertheless I tried a few scenarios with type and parameter conversion. With IMAGE the default is to not inline data and it is configured with the 'text in row' table option which allows to define a limit of data which is inlined.
The following statement creates 4 tables with different settings for images, two heap tables and two clustered:
DROP TABLE tableHeapWithImageDefault; DROP TABLE tableHeapWithImageInline; DROP TABLE tableClusteredWithImageDefault; DROP TABLE tableClusteredWithImageInline; CREATE TABLE tableHeapWithImageDefault (cID INT identity(1,1) PRIMARY KEY NOT NULL, cImage IMAGE); CREATE TABLE tableClusteredWithImageDefault (cID INT identity(1,1) PRIMARY KEY CLUSTERED NOT NULL, cImage IMAGE); CREATE TABLE tableHeapWithImageInline (cID INT identity(1,1) PRIMARY KEY NOT NULL, cImage IMAGE); EXEC sp_tableoption 'tableHeapWithImageInline', 'text in row', 'ON'; CREATE TABLE tableClusteredWithImageInline (cID INT identity(1,1) PRIMARY KEY CLUSTERED NOT NULL, cImage IMAGE); EXEC sp_tableoption 'tableClusteredWithImageInline', 'text in row', 'ON';
If I query the details I see the following:
select name, type_desc, lob_data_space_id, text_in_row_limit, large_value_types_out_of_row from sys.tables where name like 'table%' name type_desc text_in_row_limit large_value_types_out_of_row tableClusteredWithImageDefault USER_TABLE 0 0 tableClusteredWithImageInline USER_TABLE 256 0 tableHeapWithImageDefault USER_TABLE 0 0 tableHeapWithImageInline USER_TABLE 256 0
The large_value_type_out_of_row cannot be modified as long as the table does not contain a new large type.
So lets fill the tables with smaller and larger values (total blob size of (36000+180)*50000=1.6GB) for the cImage column and see what space usage the allocation units have:
while @x < 10 BEGIN set @x = @x + 1 select @x BEGIN TRANSACTION declare @i int = 0; while @i < 5000 BEGIN INSERT INTO tableHeapWithImageDefault(cImage) VALUES(replicate(cast(newID() as VARCHAR(max)),5)) -- 180 INSERT INTO tableHeapWithImageDefault(cImage) VALUES(replicate(cast(newID() as VARCHAR(max)),1000)) -- 36000 INSERT INTO tableHeapWithImageInline(cImage) VALUES(replicate(cast(newID() as VARCHAR(max)),5)) INSERT INTO tableHeapWithImageInline(cImage) VALUES(replicate(cast(newID() as VARCHAR(max)),1000)) INSERT INTO tableClusteredWithImageDefault(cImage) VALUES(replicate(cast(newID() as VARCHAR(max)),5)) -- 180 INSERT INTO tableClusteredWithImageDefault(cImage) VALUES(replicate(cast(newID() as VARCHAR(max)),1000)) -- 36000 INSERT INTO tableClusteredWithImageInline(cImage) VALUES(replicate(cast(newID() as VARCHAR(max)),5)) INSERT INTO tableClusteredWithImageInline(cImage) VALUES(replicate(cast(newID() as VARCHAR(max)),1000)) SET @i = @i + 1 END COMMIT TRANSACTION END
The following query looks at the three possible types of allocation units (especially IN_ROW_DATA for data stored in the table and LOB_DATA for large external data. The third type, ROW_OVERFLOW_DATA, is not used in this example):
select o.name AS tablename, au.type_desc, CEILING(au.used_pages * 8 /1024.0) as usedMB, au.data_pages, au.used_pages, au.total_pages, p.partition_number, p.rows, au.used_pages * 8 * 1024 / p.rows as [bytes/row] FROM sys.allocation_units AS au JOIN sys.partitions AS p ON au.container_id = p.hobt_id JOIN sys.objects AS o ON p.object_id = o.object_id WHERE o.name like 'table%';
The result shows as expected where the data is stored (keep in mind the average length for the rows includes long and short rows).
tablename type_desc usedMB data_pages used_pages total_pages rows bytes/row tableHeapWithImageDefault IN_ROW_DATA 5 582 586 587 100000 48 tableHeapWithImageDefault LOB_DATA 1778 0 227574 227611 100000 18642 tableHeapWithImageInline IN_ROW_DATA 16 1961 1971 1971 100000 161 tableHeapWithImageInline LOB_DATA 1758 0 225003 225035 100000 18432 tableClusteredWithImageDefault IN_ROW_DATA 5 582 586 587 100000 48 tableClusteredWithImageDefault LOB_DATA 1778 0 227574 227611 100000 18642 tableClusteredWithImageInline IN_ROW_DATA 16 1961 1971 1971 100000 161 tableClusteredWithImageInline LOB_DATA 1758 0 225003 225043 100000 18432
So now we can also record the data pages used for the allocation units so we can verify after conversion that the data has actually not been touched (idea):
select o.name, au.allocation_unit_id, au.type_desc, au.total_pages, au.first_iam_page, au.first_page, au.root_page from sys.system_internals_allocation_units au JOIN sys.system_internals_partitions AS p ON au.container_id = p.partition_id JOIN sys.objects AS o ON p.object_id = o.object_id WHERE o.name like 'table%';
With the following result:
name allocation_unit_id type_desc total first_iam_page first_page root_page tableHeapWithImageDefault 72057594197966848 IN_ROW_DATA 587 0x3DB302000100 0x3CB302000100 0x748707000100 tableHeapWithImageDefault 72057594198032384 LOB_DATA 227611 0x3BB302000100 0x38B302000100 0x38B302000100 tableHeapWithImageInline 72057594198097920 IN_ROW_DATA 1971 0xB91507000100 0x2E8B06000100 0x3FDA00000100 tableHeapWithImageInline 72057594198163456 LOB_DATA 225035 0x478707000100 0x418707000100 0x418707000100 tableClusteredWithImageDefault 72057594198228992 IN_ROW_DATA 587 0x4F8707000100 0x4E8707000100 0x768707000100 tableClusteredWithImageDefault 72057594198294528 LOB_DATA 227611 0x4D8707000100 0x4C8707000100 0x4C8707000100 tableClusteredWithImageInline 72057594198360064 IN_ROW_DATA 1971 0x558707000100 0x548707000100 0x4FDA00000100 tableClusteredWithImageInline 72057594198425600 LOB_DATA 225043 0x578707000100 0x568707000100 0x568707000100
So and now finally we can alter the columns:
SET STATISTICS TIME ON ALTER TABLE tableHeapWithImageDefault ALTER COLUMN cImage VARBINARY(MAX); ALTER TABLE tableHeapWithImageInline ALTER COLUMN cImage VARBINARY(MAX); ALTER TABLE tableClusteredWithImageDefault ALTER COLUMN cImage VARBINARY(MAX); ALTER TABLE tableClusteredWithImageInline ALTER COLUMN cImage VARBINARY(MAX); SET STATISTICS TIME OFF
And this returns in elapsed time = 0ms with the following new data layout. What is visible at first is that it has generated empty ROW_OVERFLOW_DATA allocation units for all rows (this might indicate that the conversion could differ if the rows are (nearly) full, which is not the case for our narrow tables in the experiment).
name allocation_unit_id type_desc total first_iam_page first_page root_page tableHeapWithImageDefault 72057594197966848 IN_ROW_DATA 587 0x3DB302000100 0x3CB302000100 0x748707000100 tableHeapWithImageDefault 72057594198032384 LOB_DATA 227611 0x3BB302000100 0x38B302000100 0x38B302000100 tableHeapWithImageDefault 72057594198491136 ROW_OVERFLOW_DATA 0 0x000000000000 0x000000000000 0x000000000000 tableHeapWithImageInline 72057594198097920 IN_ROW_DATA 1971 0xB91507000100 0x2E8B06000100 0x3FDA00000100 tableHeapWithImageInline 72057594198163456 LOB_DATA 225035 0x478707000100 0x418707000100 0x418707000100 tableHeapWithImageInline 72057594198556672 ROW_OVERFLOW_DATA 0 0x000000000000 0x000000000000 0x000000000000 tableClusteredWithImageDefault 72057594198228992 IN_ROW_DATA 587 0x4F8707000100 0x4E8707000100 0x768707000100 tableClusteredWithImageDefault 72057594198294528 LOB_DATA 227611 0x4D8707000100 0x4C8707000100 0x4C8707000100 tableClusteredWithImageDefault 72057594198622208 ROW_OVERFLOW_DATA 0 0x000000000000 0x000000000000 0x000000000000 tableClusteredWithImageInline 72057594198360064 IN_ROW_DATA 1971 0x558707000100 0x548707000100 0x4FDA00000100 tableClusteredWithImageInline 72057594198425600 LOB_DATA 225043 0x578707000100 0x568707000100 0x568707000100 tableClusteredWithImageInline 72057594198687744 ROW_OVERFLOW_DATA 0 0x000000000000 0x000000000000 0x000000000000
And on the second glance we notice that neither the AU unit ID nor the page addresses has changed for any of the IN_ROW_DATA or LOB_DATA. So this means the change from IMAGE to VARBINARY(max) is low impact for the tested cases.
When turning on the out_of_row setting for the table(Heap/Clustered)WithDefault tables (which had no inline data before) the situation does not change. The elapsed time = 0ms and the result is unchanged:
SET STATISTICS TIME ON exec sp_tableoption 'tableHeapWithImageDefault', 'large value types out of row', '1'; exec sp_tableoption 'tableClusteredWithImageDefault', 'large value types out of row', '1'; SET STATISTICS TIME OFF
Results in this (removed overflows):
name allocation_unit_id type_desc total_pages first_iam_page first_page root_page tableHeapWithImageDefault 72057594197966848 IN_ROW_DATA 587 0x3DB302000100 0x3CB302000100 0x748707000100 tableHeapWithImageDefault 72057594198032384 LOB_DATA 227611 0x3BB302000100 0x38B302000100 0x38B302000100 tableClusteredWithImageDefault 72057594198228992 IN_ROW_DATA 587 0x4F8707000100 0x4E8707000100 0x768707000100 tableClusteredWithImageDefault 72057594198294528 LOB_DATA 227611 0x4D8707000100 0x4C8707000100 0x4C8707000100
Running the same option (elasped 15ms) change on the previously inlined-enabled tables:
SET STATISTICS TIME ON exec sp_tableoption 'tableClusteredWithImageInline', 'large value types out of row', '1'; exec sp_tableoption 'tableHeapWithImageInline', 'large value types out of row', '1'; SET STATISTICS TIME OFF
With the unchanged pages:
name allocation_unit_id type_desc total_pages first_iam_page first_page root_page tableHeapWithImageInline 72057594198097920 IN_ROW_DATA 1971 0xB91507000100 0x2E8B06000100 0x3FDA00000100 tableHeapWithImageInline 72057594198163456 LOB_DATA 225035 0x478707000100 0x418707000100 0x418707000100 tableClusteredWithImageInline 72057594198360064 IN_ROW_DATA 1971 0x558707000100 0x548707000100 0x4FDA00000100 tableClusteredWithImageInline 72057594198425600 LOB_DATA 225043 0x578707000100 0x568707000100 0x568707000100
So even that change has no impact. According to MSDN the reason for this is that the LOBs are only changed when updated. So this is also quite safe.
The same should be true for the other way around, however I have seen cases, where it took much longer with lots of logfile usage. Maybe you have an idea how I can recreate this scenario?