Problem description
I have a table that stores information about a large number of Twitter tweets, including the tweet text and the screen name of the user who posted each tweet. The tweets contain hashtags (starting with #), and I want to count the number of hashtags that a specific user has tweeted:
tweet_id | tweet_text | screen_name |
--------------------------------------------------------------------------------------------
1 | #hashtag1 #otherhashtag2 #hashtag3 some more text | tweeter_user_1 |
2 | some text #hashtag1 #hashtag4 more text | tweeter_user_2 |
3 | #hashtag5 #hashtag1 @not a hashtag some#nothashtag | tweeter_user_1 |
4 | #hashtag1 with more text | tweeter_user_3 |
5 | #otherhashtag2 #hashtag3,#hashtag4 more text | tweeter_user_1 |
If I count the hashtags of tweeter_user_1, the result I expect is 8; for tweeter_user_3 it should return 1. How can I do this, assuming my table is named tweets?
I tried this: SELECT COUNT( * ) FROM tweets WHERE( LENGTH( REPLACE( tweet_text, '#%', '@') = 0 ) ) AND screen_name = 'tweeter_user_1'
but it didn't work.
I would also be happy if the result for tweeter_user_1 were 9 :D
Recommended answer
This should give you a list of screen names and the total number of hashtags each one has used. It counts every # character, so some#nothashtag is included and tweeter_user_1 comes out as 9 rather than 8.
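SELECT foo.screen_name, SUM(foo.counts) FROM
(
    -- Removing every '#' shortens the text by one per '#',
    -- so the length difference is the number of '#' in the tweet.
    SELECT screen_name,
           LENGTH(tweet_text) - LENGTH(REPLACE(tweet_text, '#', '')) AS counts
    FROM tweets
) AS foo
GROUP BY foo.screen_name;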
However, the all-users query is a nasty one if the table is huge, because it has to process every row. If you only need the count for a single user, I would filter on that user in the inner select, like this:
SELECT foo.screen_name, SUM(foo.counts) FROM
(
    -- Filter first so only this user's tweets are measured.
    SELECT screen_name,
           LENGTH(tweet_text) - LENGTH(REPLACE(tweet_text, '#', '')) AS counts
    FROM tweets
    WHERE screen_name = 'tweeter_user_1'
) AS foo
GROUP BY foo.screen_name;
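If the stricter result of 8 for tweeter_user_1 is needed, one option is to count in application code instead of SQL. The sketch below is not the answer's method: it assumes a hashtag is a # followed by word characters and not immediately preceded by a word character, which is a working definition for this example rather than Twitter's official rule. The name strict_hashtag_counts is hypothetical.

```python
import re
from collections import Counter

# Sample rows copied from the table in the question.
ROWS = [
    (1, "#hashtag1 #otherhashtag2 #hashtag3 some more text", "tweeter_user_1"),
    (2, "some text #hashtag1 #hashtag4 more text", "tweeter_user_2"),
    (3, "#hashtag5 #hashtag1 @not a hashtag some#nothashtag", "tweeter_user_1"),
    (4, "#hashtag1 with more text", "tweeter_user_3"),
    (5, "#otherhashtag2 #hashtag3,#hashtag4 more text", "tweeter_user_1"),
]

# Assumed definition: '#' plus one or more word characters,
# with no word character directly before the '#'.
HASHTAG = re.compile(r"(?<!\w)#\w+")


def strict_hashtag_counts(rows):
    """Return {screen_name: number of hashtags} under the assumed definition."""
    totals = Counter()
    for _tweet_id, text, screen_name in rows:
        totals[screen_name] += len(HASHTAG.findall(text))
    return dict(totals)


if __name__ == "__main__":
    for name, count in sorted(strict_hashtag_counts(ROWS).items()):
        print(name, count)
```

With this definition some#nothashtag is skipped, while #hashtag4 after a comma still counts, so tweeter_user_1 totals 8.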