August 02, 2022

Introduction

Ever ran into a situation while writing SQL where you need to split a column into multiple vales or join a column from multiple rows together? There are multiple ways of doing this. One of the most common ways I have run into is using XML functions. Alternatively, SQL Sever has STRING_AGGR and STRING_SPLIT which seem to be lesser known. Below we will take a look at a few examples of these and then we can look and see if we can see any performance differences between the different methods.

The examples I am displaying below were written and ran in SQL Server 2019 Developer Edition. The functions that I am writing about here may not work on older versions of SQL Server.

Splitting A Delimited List

First, we will take a look at using XML to split a comma delimited list of colors. This tends to be the most common way of splitting a comma separated list into individual rows. In order for this to work, we have to create an XML document in memory and then manipulate it.

-- For sake of running this multiple times for testing, we drop the temp table called #Colors
DROP TABLE IF EXISTS #Colors;

-- Create our comma separated list
DECLARE @colors NVARCHAR(MAX) = N'White,Yellow,Blue,Red,Green,Black,Brown,Azure,Ivory,Teal,Silver,Purple,Navy blue,Pea green,Gray,Orange,Maroon,Charcoal,Hunter green,Aquamarine,Coral,Fuchsia,Wheat,Lime,Crimson,Khaki,Hot pink,Magenta,Olden,Plum,Olive,Cyan,Forest Green';

-- In order to use XML functions, we need to create some XML. 
DECLARE @sql_xml XML = Cast('<root><C>'+ Replace(@colors, ',', '</C><C>')+ '</C></root>' AS XML);

/* This will create XML structured something similar to the XML below. I broke the XML out into multiple lines to make it more readable
<root>
	<C>White</C>
	<C>Yellow</C>
	<C>Blue</C>
	<C>Red</C>
	<C>Green</C>
	<C>Black</C>
	...
</root>
*/
 
-- Display the values from the nodes called C in the root object of the XML
SELECT f.x.value('.', 'VARCHAR(MAX)') AS [color]
FROM @sql_xml.nodes('/root/C') f(x);

Now we will take a look at using STRING_SPLIT to accomplish the same task. This is a very basic example.

-- For sake of running this multiple times for testing, we drop the temp table called #Colors
DROP TABLE IF EXISTS #Colors;

-- Create our comma separated list
DECLARE @colors NVARCHAR(MAX) = N'White,Yellow,Blue,Red,Green,Black,Brown,Azure,Ivory,Teal,Silver,Purple,Navy blue,Pea green,Gray,Orange,Maroon,Charcoal,Hunter green,Aquamarine,Coral,Fuchsia,Wheat,Lime,Crimson,Khaki,Hot pink,Magenta,Olden,Plum,Olive,Cyan,Forest Green';

-- Display the values from the list splitting the list on commas
SELECT value AS [color]
FROM STRING_SPLIT(@colors, ',');

Now that looks easy enough. What if we are pulling data from a table and need to split out a column into multiple rows?

-- For sake of running this multiple times for testing, we drop the temp table called #Colors
DROP TABLE IF EXISTS #ColorTable;

-- Create our comma separated list
DECLARE @colors NVARCHAR(MAX) = N'White,Yellow,Blue,Red,Green,Black,Brown,Azure,Ivory,Teal,Silver,Purple,Navy blue,Pea green,Gray,Orange,Maroon,Charcoal,Hunter green,Aquamarine,Coral,Fuchsia,Wheat,Lime,Crimson,Khaki,Hot pink,Magenta,Olden,Plum,Olive,Cyan,Forest Green';

-- Create the temp table to emulate a table holding values
SELECT 
	  'Colors' AS [category]
	, @colors AS [options]
INTO #ColorTable;

-- All we need to do is CROSS APPLY the temp table to the column that holds the list of values we are wanting split
SELECT 
	  [category]
	, value AS [color]
FROM #ColorTable
	CROSS APPLY STRING_SPLIT([options], ',');

Measuring the timing on this small of a data set is rather difficult. However, I went ahead and tried it anyways just out of curiosity. Using the XML method, I ran the script 10 times and recorded the times.

00:00:00.046
00:00:00.046
00:00:00.046
00:00:00.062
00:00:00.062
00:00:00.062
00:00:00.062
00:00:00.077
00:00:00.062
00:00:00.062

You can see it maxed at 77 milliseconds and averaged at about 58.7 milliseconds. Considering SQL Server has to create an XML document and parse the XML, this is pretty quick.

Taking a look at the STRING_SPLIT function, I followed the same method of running it 10 times and recording the times.

00:00:00.046
00:00:00.062
00:00:00.062
00:00:00.062
00:00:00.062
00:00:00.062
00:00:00.046
00:00:00.046
00:00:00.062
00:00:00.062

Using this method we maxed out at 62 milliseconds and averaged at 57.2 milliseconds. While this is a little faster in this test, it is not monumental by any means.

In practice it is not likely you will encounter large lists in a single column of data. Either method would work in most situations. I personally prefer the STRING_SPLIT function due to it is a simpler syntax.

Creating A Delimited List From Multiple Rows

Now that we have seen multiple ways to split a string, we will take a look at how to join that data back together into a single column. Just like splitting a string, there are multiple ways to join it back together as well. In order to have some data to play with, I am going to use the code in the previous section to create a temp table for us to use.

DROP TABLE IF EXISTS #Colors;

SELECT value AS [color]
INTO #Colors
FROM STRING_SPLIT( N'White,Yellow,Blue,Red,Green,Black,Brown,Azure,Ivory,Teal,Silver,Purple,Navy blue,Pea green,Gray,Orange,Maroon,Charcoal,Hunter green,Aquamarine,Coral,Fuchsia,Wheat,Lime,Crimson,Khaki,Hot pink,Magenta,Olden,Plum,Olive,Cyan,Forest Green', ',');

We will first look at creating a string using STUFF and FOR XML PATH.

SELECT STUFF(( 
			SELECT  ', ' + [color]
			FROM ( 
				SELECT DISTINCT [color]
				FROM #Colors
			) x
			FOR XML PATH('')
		), 1, 2, ''
	) AS [Colors]

Now to take a look at STRING_AGG().

SELECT STRING_AGG(CONVERT(NVARCHAR(max), [color]), ', ') AS colors 
FROM #Colors;

I locally tested the speeds of these and on this scale the results are negligible. However, I personally worked with some code where switching from FOR XML PATH to STRING_AGG() reduced a query by minutes. It wasn't a very complicated task, but STRING_AGG() was just much faster.

In that situation, the previous developer was using the FOR XML PATH in a sub query and joining to multiple tables. Swapping in STRING_AGG() not only increased the performance of the query, it also greatly reduced the syntax of the query making it much easier to read.

Conclusion

There are multiple ways to work with lists in a SQL Server database. The most common way of working with them that I have seen is using XML functions inside of SQL Server to manipulate the lists. I only recent became familiar with STRING_SPLIT() and STRING_AGG(), but they are a very effective alternative to XML.

I have been very impressed with the performance of them in some real-world situations and I find the syntax much easier to work with than the XML functions.

I don't have a comments section yet, so feel free to send me feedback on this blog.


Kevin Williams

Kevin is a data engineer and is the Business Intelligence Practice Lead at Software Design Partners specializing in data warehousing. He is a father, an occasional gamer, and lover of many different types of music.


The opinions expressed on this site are my own and may not represent my employer's view.
Share this post
About this blog...

This post takes a look at using STRING_AGG() and STRING_SPLIT() as alternatives to XML functions to work with lists In SQL Server

Archives


An error has occurred. This application may no longer respond until reloaded. Reload 🗙