Problem description
I am having trouble transposing a large BigQuery table (1.5 billion rows) from rows to columns. I can do it for a small amount of data by hardcoding the columns, but not at this scale. A snapshot of the table looks like this:
+-----------------------------+
| CustomerID  Feature  Value  |
+-----------------------------+
| 1           A123     3      |
| 1           F213     7      |
| 1           F231     8      |
| 1           B789     9.1    |
| 2           A123     4      |
| 2           U123     4      |
| 2           B789     12     |
| ..          ..       ..     |
| ..          ..       ..     |
| 400000      A123     8      |
| 400000      U123     7      |
| 400000      R231     6      |
+-----------------------------+
There are approximately 400,000 distinct CustomerIDs and 3000 features. Not every CustomerID has the same features: some may have 2000 features while others have all 3000. The end result I would like is a table in which each row represents one distinct CustomerID, with 3000 columns, one per feature. Like this:
CustomerID Feature1 Feature2 ... Feature3000
So some of the cells may have missing values.
Does anyone have an idea how to do this in BigQuery or SQL?
Thanks in advance.
Recommended answer
STEP #1
In the query below, replace yourTable with the real name of your table and run it. Note that it is written in BigQuery legacy SQL (GROUP_CONCAT_UNQUOTED, STRING, and + for string concatenation):
SELECT 'SELECT CustomerID, ' +
GROUP_CONCAT_UNQUOTED(
'MAX(IF(Feature = "' + STRING(Feature) + '", Value, NULL))'
)
+ ' FROM yourTable GROUP BY CustomerID'
FROM (SELECT Feature FROM yourTable GROUP BY Feature)
The result is a single string containing the text of the pivot query, with one MAX(IF(...)) expression per distinct Feature. That string is used in the next step.
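To make the shape of that generated string concrete, here is a minimal Python sketch that builds the same kind of query text on the client side. It is only an illustration, not part of the original answer: the function name build_pivot_query, the f_ column-alias prefix, and the alphanumeric check are all assumptions introduced here, and the real Step 1 does this work inside BigQuery instead.

```python
def build_pivot_query(features, table="yourTable"):
    """Return the text of a pivot query with one column per distinct feature.

    Hypothetical helper: it mirrors what the Step 1 query produces, but
    assembles the string in Python rather than in BigQuery.
    """
    columns = []
    for feature in sorted(set(features)):
        # Assumed safety check: each feature code is pasted into the SQL
        # text, so only plain alphanumeric codes are accepted here.
        if not feature.isalnum():
            raise ValueError(f"unsupported feature name: {feature!r}")
        # The f_ prefix is an assumption so that every alias is a valid
        # column name even if a feature code starts with a digit.
        columns.append(
            f"MAX(IF(Feature = '{feature}', Value, NULL)) AS f_{feature}"
        )
    return (
        "SELECT CustomerID, "
        + ", ".join(columns)
        + f" FROM {table} GROUP BY CustomerID"
    )


# Duplicate feature codes collapse into a single column.
query = build_pivot_query(["A123", "B789", "A123"])
print(query)
```

With 3000 distinct features the generated text simply contains 3000 such MAX(IF(...)) expressions, which is why it is generated rather than written by hand.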
STEP #2
Take the string you got from Step 1 and execute it as a query.
The output is the pivot you asked for in the question.
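The two steps can be tried end to end on a small scale. The sketch below is an assumption-laden stand-in, not the BigQuery workflow itself: it uses Python's built-in sqlite3 module with an in-memory table holding the sample rows from the question, and SQLite's group_concat and || play the roles of GROUP_CONCAT_UNQUOTED and + from the legacy SQL query. The column aliases and the ORDER BY clauses are additions made here for readability.

```python
import sqlite3

# In-memory SQLite table standing in for the BigQuery table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (CustomerID INTEGER, Feature TEXT, Value REAL)"
)
rows = [
    (1, "A123", 3), (1, "F213", 7), (1, "F231", 8), (1, "B789", 9.1),
    (2, "A123", 4), (2, "U123", 4), (2, "B789", 12),
]
conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", rows)

# Step 1: let the database generate the text of the pivot query,
# one MAX(CASE ...) expression per distinct Feature.
generator_sql = """
SELECT 'SELECT CustomerID, '
       || group_concat(
              'MAX(CASE WHEN Feature = ''' || Feature
              || ''' THEN Value END) AS "' || Feature || '"',
              ', ')
       || ' FROM yourTable GROUP BY CustomerID ORDER BY CustomerID'
FROM (SELECT Feature FROM yourTable GROUP BY Feature ORDER BY Feature)
"""
pivot_sql = conn.execute(generator_sql).fetchone()[0]

# Step 2: execute the generated string as a query.
cursor = conn.execute(pivot_sql)
columns = [description[0] for description in cursor.description]
pivot_rows = cursor.fetchall()

# Index the result by CustomerID and column name; features a customer
# does not have come back as None (SQL NULL).
result = {row[0]: dict(zip(columns[1:], row[1:])) for row in pivot_rows}

print(columns)
for customer_id, features in result.items():
    print(customer_id, features)
```

Customer 1 has no U123 row and customer 2 has no F213 or F231 rows, so those cells are empty in the pivoted output, which matches the missing values the question anticipates.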