计算表列中的单词出现次数

Counting word occurrences in a table column(计算表列中的单词出现次数)
本文介绍了计算表列中的单词出现次数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个带有 varchar(255) 字段的表.我想(通过查询、函数或 SP)获取该表中一组行中每个单词的出现次数.

如果有 2 行包含这些字段:

<前>我喜欢吃香蕉"我不喜欢像猴子一样吃东西"

我想得到

<前>词 |数数()---------------喜欢 3吃2到 2我 21

有什么想法吗?我使用的是 MySQL 5.2.

解决方案

@Elad Meidar,我喜欢你的问题,我找到了解决方案:

SELECT SUM(total_count) as total, value从 (SELECT count(*) AS total_count, REPLACE(REPLACE(REPLACE(REPLACE(x.value,'?',''),'.',''),'!','') 作为值从 (SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(t.sentence, ' ', n.n), ' ', -1) 值FROM table_name t CROSS JOIN(选择 a.N + b.N * 10 + 1 n从(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9),(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b按 n 排序) nWHERE n.n <= 1 + (LENGTH(t.sentence) - LENGTH(REPLACE(t.sentence, ' ', '')))按值排序) 作为 x按 x.value 分组) 作为按值分组

这是完整的工作小提琴:http://sqlfiddle.com/#!2/17481a/1

首先,我们进行查询以提取所有单词,如@peterm 所述here(按照他的说明(如果您想自定义处理的总字数).然后我们将其转换为子查询,然后我们 COUNTGROUP BY 每个单词的值,然后在其上对 GROUP 进行另一个查询BY 未将可能存在伴随标志的单词分组.即:你好=你好!带有 REPLACE

I have a table with a varchar(255) field. I want to get (via a query, function, or SP) the number of occurences of each word in a group of rows from this table.

If there are 2 rows with these fields:

"I like to eat bananas"
"I don't like to eat like a monkey"

I want to get

    word | count()
    ---------------
    like  3
    eat   2
    to    2
    i     2
    a     1

Any idea? I am using MySQL 5.2.

解决方案

@Elad Meidar, I like your question and I found a solution:

SELECT SUM(total_count) as total, value
FROM (

SELECT count(*) AS total_count, REPLACE(REPLACE(REPLACE(x.value,'?',''),'.',''),'!','') as value
FROM (
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(t.sentence, ' ', n.n), ' ', -1) value
  FROM table_name t CROSS JOIN 
(
   SELECT a.N + b.N * 10 + 1 n
     FROM 
    (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
   ,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
    ORDER BY n
) n
 WHERE n.n <= 1 + (LENGTH(t.sentence) - LENGTH(REPLACE(t.sentence, ' ', '')))
 ORDER BY value

) AS x
GROUP BY x.value

) AS y
GROUP BY value

Here is the full working fiddle: http://sqlfiddle.com/#!2/17481a/1

First we do a query to extract all words as explained here by @peterm(follow his instructions if you want to customize the total number of words processed). Then we convert that into a sub-query and then we COUNT and GROUP BY the value of each word, and then make another query on top of that to GROUP BY not grouped words cases where accompanied signs might be present. ie: hello = hello! with a REPLACE

这篇关于计算表列中的单词出现次数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!

本站部分内容来源互联网,如果有图片或者内容侵犯您的权益请联系我们删除!

相关文档推荐

Hibernate reactive No Vert.x context active in aws rds(AWS RDS中的休眠反应性非Vert.x上下文处于活动状态)
Bulk insert with mysql2 and NodeJs throws 500(使用mysql2和NodeJS的大容量插入抛出500)
Flask + PyMySQL giving error no attribute #39;settimeout#39;(FlASK+PyMySQL给出错误,没有属性#39;setTimeout#39;)
auto_increment column for a group of rows?(一组行的AUTO_INCREMENT列?)
Sort by ID DESC(按ID代码排序)
SQL/MySQL: split a quantity value into multiple rows by date(SQL/MySQL:按日期将数量值拆分为多行)