用于在列中显示词频的sql server函数

Sql server function for displaying word frequency in a column(用于在列中显示词频的sql server函数)
本文介绍了用于在列中显示词频的sql server函数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个表格,其中列出了一项调查中的自由文本输入,允许参与者输入他们的回答(关于他们希望在婚礼中使用的颜色)

I have a table that lists a freet text input from a survey where enterents were allowed to enter their responses (regarding colours they would like to have in their wedding)

我想写一个sql函数,把这个列的所有信息都收集起来,orders统计每个词出现的频率,按照这个计数对结果集进行排序.

I would like to write a sql function that gathers all the information from this column, and orders counts the frequency of each word, ordering the result set by this count.

Response
--------
Red and White
green
White and blue
Blue
Dark blue

我希望上表的顺序如下

Response  Frequency
--------  ---------
Blue      3
White     2
And       2
Red       1
Green     1

我可以在函数运行后去掉所有像and"这样的垃圾词.有谁知道产生这种行为的任何好的功能?

I can strip all the rubbish words like "and" after the function has run. Does anyone know any good functions that produce this behaviour?

推荐答案

好的,这是一种享受.首先是一个分离值的函数......

Okay this works a treat. Firstly a function to separate the values...

Alter Function dbo.SeparateValues    

(    
 @data VARCHAR(MAX),    
 @delimiter VARCHAR(10)     
)     
RETURNS     
@tbldata TABLE(col VARCHAR(MAX))    
As    
--Declare @data VARCHAR(MAX) ,@delimiter VARCHAR(10)     
--Declare @tbldata TABLE(col VARCHAR(10))    
--Set @data = 'hello,how,are,you?,234234'    
--Set @delimiter = ','    
--DECLARE @tbl TABLE(col VARCHAR(10))    
Begin    
DECLARE @pos INT    
DECLARE @prevpos INT    
SET @pos = 1     
SET @prevpos = 0    

WHILE @pos > 0     
BEGIN    
SET @pos = CHARINDEX(@delimiter, @data, @prevpos+1)    
if @pos > 0     
INSERT INTO @tbldata(col) VALUES(LTRIM(RTRIM(SUBSTRING(@data, @prevpos+1, @pos-@prevpos-1))))    
else    
INSERT INTO @tbldata(col) VALUES(LTRIM(RTRIM(SUBSTRING(@data, @prevpos+1, len(@data)-@prevpos))))    
SET @prevpos = @pos     
End    

RETURN       
END    

然后我就将它应用到我的桌子上...

then I just apply it to my table...

Select Count(*), sep.Col FROM (
        Select * FROM (
            Select value = Upper(RTrim(LTrim(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(response, ',', ' '), '.', ' '), '!', ' '), '+', ' '), ':', ' '), '-', ' '), ';', ' '), '(', ' '), ')', ' '), '/', ' '), '&', ''), '?', ' '), '  ', ' '), '  ', ' ')))) FROM Responses
        ) easyValues
        Where value <> '' 
    ) actualValues 
    Cross Apply dbo.SeparateValues(value, ' ') sep
    Group By sep.Col
    Order By Count(*) Desc

好的,所以我使用嵌套表进行了 OTT,但我已经去除了所有废话字符,分离了值并保留了最常用单词的运行总数.

Okay, so I went OTT with my nested tables, but I've stripped out all the crap characters, separated the values and kept a running total of the most frequently used words.

这篇关于用于在列中显示词频的sql server函数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!

本站部分内容来源互联网,如果有图片或者内容侵犯您的权益请联系我们删除!

相关文档推荐

Execute complex raw SQL query in EF6(在EF6中执行复杂的原始SQL查询)
Hibernate reactive No Vert.x context active in aws rds(AWS RDS中的休眠反应性非Vert.x上下文处于活动状态)
SSIS: Model design issue causing duplications - can two fact tables be connected?(SSIS:模型设计问题导致重复-两个事实表可以连接吗?)
SQL Server Graph Database - shortest path using multiple edge types(SQL Server图形数据库-使用多种边类型的最短路径)
Invalid column name when using EF Core filtered includes(使用EF核心过滤包括时无效的列名)
How should make faster SQL Server filtering procedure with many parameters(如何让多参数的SQL Server过滤程序更快)