With Microsoft SQL Server 2005 the IMAGE and (N)TEXT large data types have been deprecated in favor of VARBINARY(MAX) and (N)VARCHAR(max). We have some older database schema at customers who still use the IMAGE type so I am preparing a note to guide the migration.
There is an easy ALTER TABLE command to convert the existing tables, however it is not clear what the impact of this conversion is. So I took a deeper look. I noticed that converting a VARBINARY(max) back to IMAGE caused a lot of I/O and log usage. The other way around this seems to not happen and I can not repeat the problem.
Nevertheless I tried a few scenarios with type and parameter conversion. With IMAGE the default is to not inline data and it is configured with the 'text in row' table option which allows to define a limit of data which is inlined.
The following statement creates 4 tables with different settings for images, two heap tables and two clustered:
DROP TABLE tableHeapWithImageDefault;
DROP TABLE tableHeapWithImageInline;
DROP TABLE tableClusteredWithImageDefault;
DROP TABLE tableClusteredWithImageInline;
CREATE TABLE tableHeapWithImageDefault (cID INT identity(1,1) PRIMARY KEY NOT NULL, cImage IMAGE);
CREATE TABLE tableClusteredWithImageDefault (cID INT identity(1,1) PRIMARY KEY CLUSTERED NOT NULL, cImage IMAGE);
CREATE TABLE tableHeapWithImageInline (cID INT identity(1,1) PRIMARY KEY NOT NULL, cImage IMAGE);
EXEC sp_tableoption 'tableHeapWithImageInline', 'text in row', 'ON';
CREATE TABLE tableClusteredWithImageInline (cID INT identity(1,1) PRIMARY KEY CLUSTERED NOT NULL, cImage IMAGE);
EXEC sp_tableoption 'tableClusteredWithImageInline', 'text in row', 'ON';
If I query the details I see the following:
select name, type_desc, lob_data_space_id, text_in_row_limit, large_value_types_out_of_row from sys.tables where name like 'table%'
name type_desc text_in_row_limit large_value_types_out_of_row
tableClusteredWithImageDefault USER_TABLE 0 0
tableClusteredWithImageInline USER_TABLE 256 0
tableHeapWithImageDefault USER_TABLE 0 0
tableHeapWithImageInline USER_TABLE 256 0
The large_value_type_out_of_row cannot be modified as long as the table does not contain a new large type.
So lets fill the tables with smaller and larger values (total blob size of (36000+180)*50000=1.6GB) for the cImage column and see what space usage the allocation units have:
while @x < 10 BEGIN
set @x = @x + 1
select @x
BEGIN TRANSACTION
declare @i int = 0;
while @i < 5000 BEGIN
INSERT INTO tableHeapWithImageDefault(cImage) VALUES(replicate(cast(newID() as VARCHAR(max)),5)) -- 180
INSERT INTO tableHeapWithImageDefault(cImage) VALUES(replicate(cast(newID() as VARCHAR(max)),1000)) -- 36000
INSERT INTO tableHeapWithImageInline(cImage) VALUES(replicate(cast(newID() as VARCHAR(max)),5))
INSERT INTO tableHeapWithImageInline(cImage) VALUES(replicate(cast(newID() as VARCHAR(max)),1000))
INSERT INTO tableClusteredWithImageDefault(cImage) VALUES(replicate(cast(newID() as VARCHAR(max)),5)) -- 180
INSERT INTO tableClusteredWithImageDefault(cImage) VALUES(replicate(cast(newID() as VARCHAR(max)),1000)) -- 36000
INSERT INTO tableClusteredWithImageInline(cImage) VALUES(replicate(cast(newID() as VARCHAR(max)),5))
INSERT INTO tableClusteredWithImageInline(cImage) VALUES(replicate(cast(newID() as VARCHAR(max)),1000))
SET @i = @i + 1
END
COMMIT TRANSACTION
END
The following query looks at the three possible types of allocation units (especially IN_ROW_DATA for data stored in the table and LOB_DATA for large external data. The third type, ROW_OVERFLOW_DATA, is not used in this example):
select o.name AS tablename, au.type_desc, CEILING(au.used_pages * 8 /1024.0) as usedMB,
au.data_pages, au.used_pages, au.total_pages, p.partition_number, p.rows,
au.used_pages * 8 * 1024 / p.rows as [bytes/row]
FROM sys.allocation_units AS au
JOIN sys.partitions AS p ON au.container_id = p.hobt_id
JOIN sys.objects AS o ON p.object_id = o.object_id
WHERE o.name like 'table%';
The result shows as expected where the data is stored (keep in mind the average length for the rows includes long and short rows).
tablename type_desc usedMB data_pages used_pages total_pages rows bytes/row
tableHeapWithImageDefault IN_ROW_DATA 5 582 586 587 100000 48
tableHeapWithImageDefault LOB_DATA 1778 0 227574 227611 100000 18642
tableHeapWithImageInline IN_ROW_DATA 16 1961 1971 1971 100000 161
tableHeapWithImageInline LOB_DATA 1758 0 225003 225035 100000 18432
tableClusteredWithImageDefault IN_ROW_DATA 5 582 586 587 100000 48
tableClusteredWithImageDefault LOB_DATA 1778 0 227574 227611 100000 18642
tableClusteredWithImageInline IN_ROW_DATA 16 1961 1971 1971 100000 161
tableClusteredWithImageInline LOB_DATA 1758 0 225003 225043 100000 18432
So now we can also record the data pages used for the allocation units so we can verify after conversion that the data has actually not been touched (idea):
select o.name, au.allocation_unit_id, au.type_desc, au.total_pages, au.first_iam_page, au.first_page, au.root_page
from sys.system_internals_allocation_units au
JOIN sys.system_internals_partitions AS p ON au.container_id = p.partition_id
JOIN sys.objects AS o ON p.object_id = o.object_id
WHERE o.name like 'table%';
With the following result:
name allocation_unit_id type_desc total first_iam_page first_page root_page
tableHeapWithImageDefault 72057594197966848 IN_ROW_DATA 587 0x3DB302000100 0x3CB302000100 0x748707000100
tableHeapWithImageDefault 72057594198032384 LOB_DATA 227611 0x3BB302000100 0x38B302000100 0x38B302000100
tableHeapWithImageInline 72057594198097920 IN_ROW_DATA 1971 0xB91507000100 0x2E8B06000100 0x3FDA00000100
tableHeapWithImageInline 72057594198163456 LOB_DATA 225035 0x478707000100 0x418707000100 0x418707000100
tableClusteredWithImageDefault 72057594198228992 IN_ROW_DATA 587 0x4F8707000100 0x4E8707000100 0x768707000100
tableClusteredWithImageDefault 72057594198294528 LOB_DATA 227611 0x4D8707000100 0x4C8707000100 0x4C8707000100
tableClusteredWithImageInline 72057594198360064 IN_ROW_DATA 1971 0x558707000100 0x548707000100 0x4FDA00000100
tableClusteredWithImageInline 72057594198425600 LOB_DATA 225043 0x578707000100 0x568707000100 0x568707000100
So and now finally we can alter the columns:
SET STATISTICS TIME ON
ALTER TABLE tableHeapWithImageDefault ALTER COLUMN cImage VARBINARY(MAX);
ALTER TABLE tableHeapWithImageInline ALTER COLUMN cImage VARBINARY(MAX);
ALTER TABLE tableClusteredWithImageDefault ALTER COLUMN cImage VARBINARY(MAX);
ALTER TABLE tableClusteredWithImageInline ALTER COLUMN cImage VARBINARY(MAX);
SET STATISTICS TIME OFF
And this returns in elapsed time = 0ms with the following new data layout. What is visible at first is that it has generated empty ROW_OVERFLOW_DATA allocation units for all rows (this might indicate that the conversion could differ if the rows are (nearly) full, which is not the case for our narrow tables in the experiment).
name allocation_unit_id type_desc total first_iam_page first_page root_page
tableHeapWithImageDefault 72057594197966848 IN_ROW_DATA 587 0x3DB302000100 0x3CB302000100 0x748707000100
tableHeapWithImageDefault 72057594198032384 LOB_DATA 227611 0x3BB302000100 0x38B302000100 0x38B302000100
tableHeapWithImageDefault 72057594198491136 ROW_OVERFLOW_DATA 0 0x000000000000 0x000000000000 0x000000000000
tableHeapWithImageInline 72057594198097920 IN_ROW_DATA 1971 0xB91507000100 0x2E8B06000100 0x3FDA00000100
tableHeapWithImageInline 72057594198163456 LOB_DATA 225035 0x478707000100 0x418707000100 0x418707000100
tableHeapWithImageInline 72057594198556672 ROW_OVERFLOW_DATA 0 0x000000000000 0x000000000000 0x000000000000
tableClusteredWithImageDefault 72057594198228992 IN_ROW_DATA 587 0x4F8707000100 0x4E8707000100 0x768707000100
tableClusteredWithImageDefault 72057594198294528 LOB_DATA 227611 0x4D8707000100 0x4C8707000100 0x4C8707000100
tableClusteredWithImageDefault 72057594198622208 ROW_OVERFLOW_DATA 0 0x000000000000 0x000000000000 0x000000000000
tableClusteredWithImageInline 72057594198360064 IN_ROW_DATA 1971 0x558707000100 0x548707000100 0x4FDA00000100
tableClusteredWithImageInline 72057594198425600 LOB_DATA 225043 0x578707000100 0x568707000100 0x568707000100
tableClusteredWithImageInline 72057594198687744 ROW_OVERFLOW_DATA 0 0x000000000000 0x000000000000 0x000000000000
And on the second glance we notice that neither the AU unit ID nor the page addresses has changed for any of the IN_ROW_DATA or LOB_DATA. So this means the change from IMAGE to VARBINARY(max) is low impact for the tested cases.
When turning on the out_of_row setting for the table(Heap/Clustered)WithDefault tables (which had no inline data before) the situation does not change. The elapsed time = 0ms and the result is unchanged:
SET STATISTICS TIME ON
exec sp_tableoption 'tableHeapWithImageDefault', 'large value types out of row', '1';
exec sp_tableoption 'tableClusteredWithImageDefault', 'large value types out of row', '1';
SET STATISTICS TIME OFF
Results in this (removed overflows):
name allocation_unit_id type_desc total_pages first_iam_page first_page root_page
tableHeapWithImageDefault 72057594197966848 IN_ROW_DATA 587 0x3DB302000100 0x3CB302000100 0x748707000100
tableHeapWithImageDefault 72057594198032384 LOB_DATA 227611 0x3BB302000100 0x38B302000100 0x38B302000100
tableClusteredWithImageDefault 72057594198228992 IN_ROW_DATA 587 0x4F8707000100 0x4E8707000100 0x768707000100
tableClusteredWithImageDefault 72057594198294528 LOB_DATA 227611 0x4D8707000100 0x4C8707000100 0x4C8707000100
Running the same option (elasped 15ms) change on the previously inlined-enabled tables:
SET STATISTICS TIME ON
exec sp_tableoption 'tableClusteredWithImageInline', 'large value types out of row', '1';
exec sp_tableoption 'tableHeapWithImageInline', 'large value types out of row', '1';
SET STATISTICS TIME OFF
With the unchanged pages:
name allocation_unit_id type_desc total_pages first_iam_page first_page root_page
tableHeapWithImageInline 72057594198097920 IN_ROW_DATA 1971 0xB91507000100 0x2E8B06000100 0x3FDA00000100
tableHeapWithImageInline 72057594198163456 LOB_DATA 225035 0x478707000100 0x418707000100 0x418707000100
tableClusteredWithImageInline 72057594198360064 IN_ROW_DATA 1971 0x558707000100 0x548707000100 0x4FDA00000100
tableClusteredWithImageInline 72057594198425600 LOB_DATA 225043 0x578707000100 0x568707000100 0x568707000100
So even that change has no impact. According to MSDN the reason for this is that the LOBs are only changed when updated. So this is also quite safe.
The same should be true for the other way around, however I have seen cases, where it took much longer with lots of logfile usage. Maybe you have an idea how I can recreate this scenario?