Yes, another string splitting UDF from a guy who's obvioiusly become obsessed with TSQL string splitting. This time we delve into a mysterious world that I call, "Tokenization."
So what is Tokenization? It's a word I made up for this problem.
But what is it, really? It's splitting up a string based on a delimiter -- in this case, a comma -- but being wary of substring delimiters. In this case, that's a pair of apostrophes, because that's what TSQL uses for strings.
I think this is best illustrated with an example string:
DECLARE @Tokens VARCHAR(50)
SET @Tokens = 'a, ''b'', ''''c'', ''d'', ''e'''', f, ''1,2,3,4'''
The basic split string function that you can find will produce the following output:
SELECT *
FROM dbo.SplitString(@Tokens, ',')
OutParam
-------------
a
'b'
''c'
'd'
'e''
f
'1
2
3
4'
Well, that's wrong. Because what I want to do is maintain the substrings (or, "tokens," as I like to call them -- thus, Tokenization!)
The output I desire is:
Token
--------
a
'b'
''c', 'd', 'e''
f
'1,2,3,4'
Notice that substrings -- delimited with apostrophes -- should be maintained.
And here's how I've solved this problem...
CREATE FUNCTION dbo.Tokenize
(
@Input NVARCHAR(2000)
)
RETURNS @Tokens TABLE
(
TokenNum INT IDENTITY(1,1),
Token NVARCHAR(2000)
)
AS
BEGIN
DECLARE @i INT SET @i = 0
DECLARE @StartChar INT SET @StartChar = 1
DECLARE @Quote INT SET @Quote = 0
DECLARE @Chars TABLE
(
CharNum INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
TheChar CHAR(1),
TheCount INT,
StartChar INT
)
SET @Input = ' , ' + @Input + ' , '
INSERT @Chars (TheChar)
SELECT SUBSTRING(@Input, n.Number, 1)
FROM Numbers n
WHERE n.Number > 0
AND n.Number <= LEN(@Input)
ORDER BY n.Number
UPDATE Chars SET
@i = Chars.TheCount =
CASE
WHEN Chars1.TheChar = ','
AND @Quote % 2 = 0 THEN 0
ELSE @i + 1
END,
@Quote =
CASE
WHEN Chars1.TheChar = '''' THEN @Quote + 1
WHEN @i = 0 THEN 0
ELSE @Quote
END,
@StartChar = Chars.StartChar =
CASE
WHEN @i = 1 THEN Chars1.CharNum - 1
WHEN @i = 0 THEN @StartChar + 1
ELSE @StartChar
END
FROM @Chars Chars
JOIN @Chars Chars1 ON Chars1.CharNum = Chars.CharNum + 1
INSERT @Tokens(Token)
SELECT
RTRIM(LTRIM(SUBSTRING(@Input, StartChar, CharNum - StartChar + 1)))
FROM (
SELECT StartChar, CharNum
FROM @Chars
WHERE TheCount = 0
UNION ALL
SELECT
MAX
(
CASE TheCount
WHEN 0 THEN CharNum
ELSE 0
END
) + 1,
MAX(CharNum)
FROM @Chars
) x
WHERE RTRIM(LTRIM(SUBSTRING(@Input, StartChar, CharNum - StartChar + 1))) NOT IN ('', ',')
ORDER BY x.StartChar
RETURN
END
A word of warning: This UDF uses the undocumented -- and unsupported -- "aggregate update" functionality. I've tested thoroughly in this case and believe it works perfectly (and it sure is handy!), but I would advise you to not use it in your own projects without extensive testing! MS doesn't support this one, so handle with care.
And by the way, you need a numbers table to use this thing. Of course.
As for using this thing, it's pretty easy:
DECLARE @Tokens VARCHAR(50)
SET @Tokens = 'a, ''b'', ''''c'', ''d'', ''e'''', f, ''1,2,3,4'''
SELECT Token
FROM dbo.Tokenize(@Tokens)
Token
--------
a
'b'
''c', 'd', 'e''
f
'1,2,3,4'
... and it even appears to work properly!
Enjoy... and application for this and other strange things I've been posting recently coming very, very soon.