TSQL Time Series Pattern Data Mining

Problem Description

Take a SQL table with the following 4 fields:

Id,TimeStamp,Item,UserId

I would like to determine the most common sequences of Items for a UserId within a session. A session would simply be defined by a time threshold (i.e. if there are no entries for X minutes, any later entries would be grouped into a new session).

Ideally, the sequence of Items could have a sort of fuzzy grouping where one or two differences in the sequence could still be counted as the same and grouped together.
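
One common way to make "one or two differences" concrete (a technique swapped in here, not something the question specifies) is the Levenshtein edit distance: the minimum number of single-item insertions, deletions or substitutions that turn one sequence into another. The Python sketch below assumes the per-session sequences and their counts have already been computed; `edit_distance`, `fuzzy_groups` and `max_diff` are hypothetical names. Sequences are visited from most to least frequent, and each one joins the first existing group whose representative is within `max_diff` edits, otherwise it starts a new group.

```python
from typing import List, Sequence, Tuple


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute, or match for free
        prev = curr
    return prev[-1]


def fuzzy_groups(counts: List[Tuple[Tuple[int, ...], int]], max_diff: int = 2) -> List[list]:
    """Hypothetical greedy grouping: each group is [representative, total_count, members]."""
    groups: List[list] = []
    # Most frequent sequences first, so they become the group representatives.
    for seq, n in sorted(counts, key=lambda pair: -pair[1]):
        for group in groups:
            if edit_distance(seq, group[0]) <= max_diff:
                group[1] += n
                group[2].append(seq)
                break
        else:  # no representative close enough: start a new group
            groups.append([seq, n, [seq]])
    return groups


print(edit_distance((1, 2, 3), (1, 2, 1, 3, 4)))  # 2
```

Greedy grouping like this depends on processing order and is not transitive (A close to B and B close to C does not put A and C together), so the groups are best read as a rough summary rather than a clustering with guarantees.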

Anyone know how I might tackle this problem in SQL?

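Update:
To clarify, let's assume Items are grocery store aisles. I have a month's worth of people going to the grocery store. The basic question is which aisles people use and in what order. Do they most often go 1,2,3 or 1,2,1,3,4?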

(Right now I am curious about users' paths on our sites, but you know, a grocery store is more visual.)

Update 2:
Here is a simple case:

CREATE Table #StoreActivity
(
    id int,
    CreationDate datetime ,
    Isle int,
    UserId int
)

-- ISO 8601 literals with a 'T' separator are read the same way regardless of SET DATEFORMAT / language
INSERT INTO #StoreActivity (id, CreationDate, Isle, UserId)
VALUES
    (1, CAST('2011-12-01T03:10:01' AS datetime), 1, 2222),
    (2, CAST('2011-12-01T03:10:07' AS datetime), 1, 1111),
    (3, CAST('2011-12-01T03:10:12' AS datetime), 2, 2222),
    (4, CAST('2011-12-01T04:10:01' AS datetime), 1, 2222),
    (5, CAST('2011-12-01T04:10:23' AS datetime), 2, 2222)

Select * from #StoreActivity
DROP Table #StoreActivity

/* With the above data, if a session (visit) is declared over after one minute with no activity, there are 2 distinct sequences: `1,2` (with a count of 2) and `1` (with a count of 1). */
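
The one-minute rule can be checked outside the database. This Python sketch assumes that '12-1-2011' means 1 December 2011 and that a session ends when the gap to the previous row is strictly longer than the threshold; `ROWS` and `session_sequences` are hypothetical names. It reproduces the counts stated in the comment above.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# (id, CreationDate, Isle, UserId) copied from the #StoreActivity example.
ROWS = [
    (1, datetime(2011, 12, 1, 3, 10, 1), 1, 2222),
    (2, datetime(2011, 12, 1, 3, 10, 7), 1, 1111),
    (3, datetime(2011, 12, 1, 3, 10, 12), 2, 2222),
    (4, datetime(2011, 12, 1, 4, 10, 1), 1, 2222),
    (5, datetime(2011, 12, 1, 4, 10, 23), 2, 2222),
]


def session_sequences(rows, gap: timedelta) -> List[Tuple[int, ...]]:
    """Split each user's rows into sessions wherever the time since the
    previous row exceeds `gap`, and return each session's Isle sequence."""
    by_user: Dict[int, list] = {}
    for row_id, ts, isle, user in rows:
        by_user.setdefault(user, []).append((ts, row_id, isle))
    sequences = []
    for events in by_user.values():
        events.sort()  # by timestamp, then id, like ORDER BY CreationDate, id
        current = [events[0][2]]
        for (prev_ts, _, _), (ts, _, isle) in zip(events, events[1:]):
            if ts - prev_ts > gap:  # idle too long: close the current session
                sequences.append(tuple(current))
                current = []
            current.append(isle)
        sequences.append(tuple(current))
    return sequences


counts = Counter(session_sequences(ROWS, timedelta(minutes=1)))
print(counts.most_common())  # [((1, 2), 2), ((1,), 1)]
```

In SQL Server 2012 and later, the same split is commonly written with LAG() to compare each row's timestamp with the previous one, a flag for gaps over the threshold, and a running SUM() OVER (PARTITION BY ... ORDER BY ... ROWS UNBOUNDED PRECEDING) of those flags to number the sessions; SQL Server 2017 and later can then build each session's sequence with STRING_AGG(...) WITHIN GROUP (ORDER BY ...).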

Recommended Answer

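WITH    q AS
        (
        SELECT  *,
                -- rn: position of the row in the user's whole history
                ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY TimeStamp, Id) AS rn,
                -- rnd: position of the row among the user's rows for the same Item
                ROW_NUMBER() OVER (PARTITION BY UserId, Item ORDER BY TimeStamp, Id) AS rnd
        FROM    mytable
        )
-- rnd - rn stays constant while the same Item repeats consecutively
SELECT  *,
        rnd - rn AS sequence
FROM    q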

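The sequence column is constant across each run of consecutive records with the same Item for a given UserId. Different Items can produce the same value, so group on UserId, Item and sequence together to identify each run, or use it however you like. On its own, this marks runs of repeated Items; it does not apply the X-minute session threshold.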
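
To see why rnd - rn works, here is a small Python sketch of the same arithmetic for a single user whose rows are already ordered by TimeStamp, Id. The function name `islands` is hypothetical; it returns (Item, rnd - rn, run length) for each group, mimicking a GROUP BY on Item and sequence.

```python
from typing import Dict, List, Tuple


def islands(items: List[int]) -> List[Tuple[int, int, int]]:
    """Hypothetical illustration of the answer's rnd - rn trick for one user."""
    per_item: Dict[int, int] = {}  # running count per Item, playing the role of rnd
    keyed = []
    for rn, item in enumerate(items, start=1):  # rn: position in the whole history
        per_item[item] = per_item.get(item, 0) + 1
        keyed.append((item, per_item[item] - rn))  # (Item, rnd - rn)
    # Count rows per (Item, sequence), as GROUP BY Item, sequence would.
    runs: Dict[Tuple[int, int], int] = {}
    for key in keyed:
        runs[key] = runs.get(key, 0) + 1
    return [(item, seq, n) for (item, seq), n in runs.items()]


print(islands([1, 1, 2, 2, 1]))  # [(1, 0, 2), (2, -2, 2), (1, -2, 1)]
```

The value -2 appears for both the run of Item 2 and the later run of Item 1, which is why Item has to be part of the grouping key.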
