How to use a subquery for dbtable option in jdbc data source?

Problem description

I want to use Spark to process some data from a JDBC source. But to begin with, instead of reading original tables from JDBC, I want to run some queries on the JDBC side to filter columns and join tables, and load the query result as a table in Spark SQL.

The following syntax to load raw JDBC table works for me:

df_table1 = sqlContext.read.format('jdbc').options(
    url="jdbc:mysql://foo.com:3306",
    dbtable="mydb.table1",
    user="me",
    password="******",
    driver="com.mysql.jdbc.Driver" # mysql JDBC driver 5.1.41
).load() 
df_table1.show() # succeeded

According to Spark documentation (I'm using PySpark 1.6.3):

dbtable: The JDBC table that should be read. Note that anything that is valid in a FROM clause of a SQL query can be used. For example, instead of a full table you could also use a subquery in parentheses.

So just for experiment, I tried something simple like this:

df_table1 = sqlContext.read.format('jdbc').options(
    url="jdbc:mysql://foo.com:3306",
    dbtable="(SELECT * FROM mydb.table1) AS table1",
    user="me",
    password="******",
    driver="com.mysql.jdbc.Driver"
).load() # failed

It threw the following exception:

com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'table1 WHERE 1=0' at line 1

I also tried a few other variations of the syntax (add / remove parentheses, remove 'as' clause, switch case, etc) without any luck. So what would be the correct syntax? Where can I find more detailed documentation for the syntax? Besides, where does this weird "WHERE 1=0" in error message come from? Thanks!
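As for the "WHERE 1=0" part: as far as I can tell, Spark's JDBC reader resolves the schema of `dbtable` by running it inside a query that returns zero rows, substituting the string you pass verbatim. A minimal sketch of the strings involved (the helper name here is illustrative, not a Spark API):

```python
# Sketch of where "WHERE 1=0" comes from: Spark's JDBC reader discovers the
# column types of `dbtable` by wrapping it in a zero-row probe query. Whatever
# string you pass is substituted verbatim, so a malformed subquery surfaces as
# a MySQL syntax error near "... WHERE 1=0".
def schema_probe(dbtable):
    """Approximate the zero-row query Spark issues to discover the schema."""
    return "SELECT * FROM {} WHERE 1=0".format(dbtable)

# A plain table name:
probe_plain = schema_probe("mydb.table1")
# A parenthesized subquery with an alias, as the docs describe:
probe_sub = schema_probe("(SELECT * FROM mydb.table1) AS table1")
```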

Recommended answer

For reading data from JDBC source using sql query in Spark SQL, you can try something like this:

val df_table1 = sqlContext.read.format("jdbc").options(Map(
    "url" -> "jdbc:postgresql://localhost:5432/mydb",
    "dbtable" -> "(select * from table1) as table1",
    "user" -> "me",
    "password" -> "******",
    "driver" -> "org.postgresql.Driver"
)).load()

I tried it using PostgreSQL. You can modify it according to MySQL.
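For reference, a hypothetical PySpark adaptation of the snippet above, switched back to the MySQL connection details from the question (host, credentials, and table name are the question's placeholders, not a tested setup):

```python
# Hypothetical PySpark/MySQL adaptation of the Scala answer above. The
# connection details are the placeholders from the question, not a live setup.
options = {
    "url": "jdbc:mysql://foo.com:3306",
    "dbtable": "(SELECT * FROM mydb.table1) AS table1",  # parenthesized + aliased
    "user": "me",
    "password": "******",
    "driver": "com.mysql.jdbc.Driver",
}

# With a SQLContext available (as in the question), the read would be:
# df_table1 = sqlContext.read.format("jdbc").options(**options).load()
```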
