Saturday, February 25, 2012

Reporting on delimited fields.... best practice? suggestions?

Hi guys... I need to do some statistical reporting on the answers to registration questions. The answers are stored in pipe-delimited varchar fields to allow for one-or-many answers to a single question... here's an example:

Q: What are your fav colors?
A: [checkbox list of colors]

Now the data is stored like this:

Reg001
----
red|blue|green|yellow

Now I need to determine how many people selected each individual answer, so like, how many people picked red, how many people picked yellow... to report like this:

Red - 12
Blue - 23
Green - 15
Yellow - 9

What is the best, or suggested way, to go about splitting this data up and reporting on it? Can it be done in T-SQL, or will it need to be a combination of T-SQL and .Net language?

Thanks!

-e::The answers are stored in pipe-delimited varchar fields to allow for one-or-many answers
::to a single question...

First step: shoot the person who designed the database. If he survives, get him get a boko about sql databases for beginners.

Ok, now:

* Any normal SQL is just off. Sadly. You need to preprocess this field before accumulating the values.
* This is string manipulation. T-SQL is not particularly nice in this.

Is the number of possible answers known? I mean, the list which is stored is defined, right?

If this is the case:
* one statement can work. Accumulate (count) or whatever over a computed column comparing for a defined value.

Basically the function in the field definition can check whether the string field is like - for example - '%red%, hitting on all lines containing red. You need to run one query per color then you get the data.

Main problem: makes my toes come up - this is terribly inefficient from any SQL Lover's point of view. But I fear there simply is no other way to process this. From a relational point of view this way of storing multiple choices is a horrible aberration, and as a result SQL is not really efficient in handling it.

For using a .NET language you have (right now - before yukon) to transport all the strings to the client. Not sure this is more efficient - the bandwith requirement can simply kill it off. Processing is earsier, though - just run coutners against color values stored in a hashtable (with the current count as value).

No comments:

Post a Comment