Problem description
I have the following code that prints to PDF (and it works), and I only use Google Chrome for printing.
import json

def send_devtools(driver, command, params=None):
    # pylint: disable=protected-access
    if params is None:
        params = {}
    resource = "/session/%s/chromium/send_command_and_get_result" % driver.session_id
    url = driver.command_executor._url + resource
    body = json.dumps({"cmd": command, "params": params})
    resp = driver.command_executor._request("POST", url, body)
    return resp.get("value")

def export_pdf(driver):
    command = "Page.printToPDF"
    params = {"format": "A4"}
    result = send_devtools(driver, command, params)
    data = result.get("data")
    return data
As you can see, I use Page.printToPDF to print to base64, and I pass "A4" as format in the params argument.
Unfortunately, this parameter seems to be ignored. I saw some Puppeteer code that uses it (format: A4), and I thought that might help me.
Even with a hardcoded width and height (see below), I had no luck.
"paperWidth": 8.27, # inches
"paperHeight": 11.69, # inches
With the code above, is it possible to set the page to A4 format?
Recommended answer
UPDATE 07-17-2021
I decided to use the Python package pdfminer.six to verify the output of the original code:
from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfpage import PDFPage

with open('test_1.pdf', 'rb') as pdf_file:
    parser = PDFParser(pdf_file)
    doc = PDFDocument(parser)
    # print the MediaBox [x0, y0, x1, y1] of each page, in points
    for page in PDFPage.create_pages(doc):
        print(page.mediabox)
# output
[0, 0, 612, 792]
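When I converted these point sizes to inches, I was surprised: the page is 8.5 x 11 inches (US Letter), not the A4 paper size of 8.27 x 11.69 inches. When I saw this, I decided to dig further into the issue by looking at the chromium and selenium source code.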
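The conversion is a division by 72, because a PDF point is 1/72 inch (assuming the page does not set a non-default UserUnit). A minimal sketch; the helper name mediabox_to_inches is hypothetical:

```python
# PDF user space is measured in points; by default 1 point = 1/72 inch.
POINTS_PER_INCH = 72.0


def mediabox_to_inches(mediabox):
    """Return (width, height) in inches for a [x0, y0, x1, y1] MediaBox."""
    x0, y0, x1, y1 = mediabox
    return (x1 - x0) / POINTS_PER_INCH, (y1 - y0) / POINTS_PER_INCH


# The MediaBox printed for test_1.pdf above
print(mediabox_to_inches([0, 0, 612, 792]))  # (8.5, 11.0)
```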
In the chromium source code, the Page.printToPDF command is implemented in the file page_handler.cc:
void PageHandler::PrintToPDF(Maybe<bool> landscape,
                             Maybe<bool> display_header_footer,
                             Maybe<bool> print_background,
                             Maybe<double> scale,
                             Maybe<double> paper_width,
                             Maybe<double> paper_height,
                             Maybe<double> margin_top,
                             Maybe<double> margin_bottom,
                             Maybe<double> margin_left,
                             Maybe<double> margin_right,
                             Maybe<String> page_ranges,
                             Maybe<bool> ignore_invalid_page_ranges,
                             Maybe<String> header_template,
                             Maybe<String> footer_template,
                             Maybe<bool> prefer_css_page_size,
                             Maybe<String> transfer_mode,
                             std::unique_ptr<PrintToPDFCallback> callback)
This function accepts paper_width and paper_height parameters, both of type double.
In C++, double is a double-precision floating-point type, so it can hold both fractional values (such as 8.27) and whole numbers (such as 11).
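Because the receiving side expects a double, the JSON type of each value matters once the params dict is serialized (send_devtools above does this with json.dumps). A minimal sketch with hypothetical params, showing that Python numbers become JSON numbers while strings such as '8.27' stay quoted strings; how Chrome treats a string in that position is not covered here:

```python
import json

# Hypothetical params: the first dict uses numbers, the second uses strings.
as_numbers = {"paperWidth": 8.27, "paperHeight": 11.69}
as_strings = {"paperWidth": "8.27", "paperHeight": "11.69"}

# Python float/int serialize as JSON numbers; str serializes as a JSON string.
print(json.dumps(as_numbers))  # {"paperWidth": 8.27, "paperHeight": 11.69}
print(json.dumps(as_strings))  # {"paperWidth": "8.27", "paperHeight": "11.69"}
```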
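These parameters have default values, defined in the Chrome DevTools Protocol:
- paperWidth: paper width in inches. Defaults to 8.5 inches.
- paperHeight: paper height in inches. Defaults to 11 inches.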
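Since both defaults describe US Letter, A4 has to be requested explicitly. A minimal sketch that derives the A4 values from the ISO 216 size of 210 x 297 mm instead of hardcoding 8.27 x 11.69; the helper name a4_print_params is hypothetical, and the returned dict is meant to be passed as the params of Page.printToPDF (for example through driver.execute_cdp_cmd):

```python
# ISO 216 A4 is 210 mm x 297 mm; the protocol expects inches.
MM_PER_INCH = 25.4


def a4_print_params():
    """Build hypothetical Page.printToPDF params for A4 paper."""
    return {
        "paperWidth": 210 / MM_PER_INCH,   # about 8.27 in
        "paperHeight": 297 / MM_PER_INCH,  # about 11.69 in
    }


params = a4_print_params()
print(round(params["paperWidth"], 2), round(params["paperHeight"], 2))  # 8.27 11.69
```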
Note the difference in parameter naming between the chromium source code and the Chrome DevTools Protocol:
- chromium source code: paper_width
- Chrome DevTools Protocol: paperWidth
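If parameters are written with the C++ snake_case names, they have to be renamed to the protocol's camelCase names before sending. A minimal sketch, assuming a mechanical snake_case to camelCase rule; the helpers snake_to_camel and to_cdp_params are hypothetical, and acronyms break the rule (the protocol spells prefer_css_page_size as preferCSSPageSize), so that case is handled by an explicit override:

```python
# Names whose protocol spelling does not follow the mechanical rule.
OVERRIDES = {"prefer_css_page_size": "preferCSSPageSize"}


def snake_to_camel(name):
    """Convert a snake_case name (chromium C++ style) to camelCase (protocol style)."""
    if name in OVERRIDES:
        return OVERRIDES[name]
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def to_cdp_params(params):
    """Rename every key of a params dict from snake_case to camelCase."""
    return {snake_to_camel(key): value for key, value in params.items()}


print(to_cdp_params({"paper_width": 8.27, "paper_height": 11.69}))
# {'paperWidth': 8.27, 'paperHeight': 11.69}
```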
According to the chromium source code, the Page.printToPDF command is sent with SendCommandAndGetResultWithTimeout:
Status WebViewImpl::PrintToPDF(const base::DictionaryValue& params,
                               std::string* pdf) {
  // https://bugs.chromium.org/p/chromedriver/issues/detail?id=3517
  if (!browser_info_->is_headless) {
    return Status(kUnknownError,
                  "PrintToPDF is only supported in headless mode");
  }
  std::unique_ptr<base::DictionaryValue> result;
  Timeout timeout(base::TimeDelta::FromSeconds(10));
  Status status = client_->SendCommandAndGetResultWithTimeout(
      "Page.printToPDF", params, &timeout, &result);
  if (status.IsError()) {
    if (status.code() == kUnknownError) {
      return Status(kInvalidArgument, status);
    }
    return status;
  }
  if (!result->GetString("data", pdf))
    return Status(kUnknownError, "expected string 'data' in response");
  return Status(kOk);
}
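On the Python side, the same "data" check can be applied to the dict returned by execute_cdp_cmd or send_devtools before the file is written. A minimal sketch; decode_print_result is a hypothetical helper, and the "%PDF-" header check is an extra sanity test that is not part of the chromium code above:

```python
import base64


def decode_print_result(result):
    """Mirror WebViewImpl::PrintToPDF: the result must carry a 'data' string."""
    data = result.get("data") if isinstance(result, dict) else None
    if not isinstance(data, str):
        raise ValueError("expected string 'data' in response")
    pdf_bytes = base64.b64decode(data)
    # Extra check (assumption): a PDF file begins with the "%PDF-" header.
    if not pdf_bytes.startswith(b"%PDF-"):
        raise ValueError("decoded data does not look like a PDF")
    return pdf_bytes


# A fake response, standing in for what Page.printToPDF returns
sample = {"data": base64.b64encode(b"%PDF-1.4\n%%EOF\n").decode("ascii")}
print(decode_print_result(sample)[:5])  # b'%PDF-'
```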
In my original answer I used send_command_and_get_result, which is similar to the SendCommandAndGetResultWithTimeout command:
// stub_devtools_client.h
Status SendCommandAndGetResult(
    const std::string& method,
    const base::DictionaryValue& params,
    std::unique_ptr<base::DictionaryValue>* result) override;
Status SendCommandAndGetResultWithTimeout(
    const std::string& method,
    const base::DictionaryValue& params,
    const Timeout* timeout,
    std::unique_ptr<base::DictionaryValue>* result) override;
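After reviewing the selenium source code, it was unclear how to correctly call send_command_and_get_result or send_command_and_get_result_with_timeout with these parameters.
I noticed this function in the selenium webdriver source code: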
def execute_cdp_cmd(self, cmd, cmd_args):
    """
    Execute Chrome Devtools Protocol command and get returned result
    The command and command args should follow chrome devtools protocol domains/commands, refer to link
    https://chromedevtools.github.io/devtools-protocol/
    :Args:
     - cmd: A str, command name
     - cmd_args: A dict, command args. empty dict {} if there is no command args
    :Usage:
        driver.execute_cdp_cmd('Network.getResponseBody', {'requestId': requestId})
    :Returns:
        A dict, empty dict {} if there is no result to return.
        For example to getResponseBody:
        {'base64Encoded': False, 'body': 'response body string'}
    """
    return self.execute("executeCdpCommand", {'cmd': cmd, 'params': cmd_args})['value']
After some research and testing, I found that this function can be used for your use case.
import base64
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfpage import PDFPage
chrome_options = Options()
chrome_options.add_argument("--disable-infobars")
chrome_options.add_argument("--disable-extensions")
chrome_options.add_argument("--disable-popup-blocking")
chrome_options.add_argument('--headless')
browser = webdriver.Chrome('/usr/local/bin/chromedriver', options=chrome_options)
browser.get('http://www.google.com')
# you can define additional parameters if needed
params = {'landscape': False,
'paperWidth': 8.27,
'paperHeight': 11.69}
# call the function "execute_cdp_cmd" with the command "Page.printToPDF" with
# parameters defined above
data = browser.execute_cdp_cmd("Page.printToPDF", params)
# save the output to a file.
with open('file_name.pdf', 'wb') as file:
    file.write(base64.b64decode(data['data']))
browser.quit()
# verify the page size of the PDF file created
with open('file_name.pdf', 'rb') as pdf_file:
    parser = PDFParser(pdf_file)
    doc = PDFDocument(parser)
    for page in PDFPage.create_pages(doc):
        print(page.mediabox)
# output
[0, 0, 594.95996, 840.95996]
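The output is in points and needs to be converted to inches (1 inch = 72 points):
- 594.95996 points equals 8.263332777783 inches
- 840.95996 points equals 11.6799994445 inches
8.263332777783 x 11.6799994445 inches is, within a fraction of a millimeter, the A4 paper size.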
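The measured size is a fraction of a point smaller than the exact ISO 216 A4 size (about 595.28 x 841.89 points), so a check with a small tolerance is more robust than exact equality. A minimal sketch; is_a4 is a hypothetical helper and the 1-point tolerance is an assumption:

```python
# A4 is 210 x 297 mm; 1 inch = 25.4 mm = 72 points.
A4_POINTS = (210 / 25.4 * 72, 297 / 25.4 * 72)  # about (595.28, 841.89)


def is_a4(mediabox, tolerance_pt=1.0):
    """Return True if a [x0, y0, x1, y1] MediaBox matches portrait A4."""
    x0, y0, x1, y1 = mediabox
    width, height = x1 - x0, y1 - y0
    return (abs(width - A4_POINTS[0]) <= tolerance_pt
            and abs(height - A4_POINTS[1]) <= tolerance_pt)


print(is_a4([0, 0, 594.95996, 840.95996]))  # True  (execute_cdp_cmd output)
print(is_a4([0, 0, 612, 792]))              # False (US Letter)
```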
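ORIGINAL POST 07-13-2021
You can pass several parameters when calling Page.printToPDF.
Two of these parameters are:
- paper width
- paper height
The following code passes these parameters to Page.printToPDF: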
import json
import base64
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def send_devtools(driver, command, params=None):
    if params is None:
        params = {}
    resource = "/session/%s/chromium/send_command_and_get_result" % driver.session_id
    url = driver.command_executor._url + resource
    body = json.dumps({"cmd": command, "params": params})
    resp = driver.command_executor._request("POST", url, body)
    return resp.get("value")

def create_pdf(driver, file_name):
    command = "Page.printToPDF"
    # Note (see the update above): these snake_case keys are not the protocol
    # names paperWidth/paperHeight; this version produced the 8.5 x 11 default.
    params = {'paper_width': '8.27', 'paper_height': '11.69'}
    result = send_devtools(driver, command, params)
    save_pdf(result, file_name)

def save_pdf(data, file_name):
    with open(file_name, 'wb') as file:
        file.write(base64.b64decode(data['data']))
    print('PDF created')

chrome_options = Options()
chrome_options.add_argument("--disable-infobars")
chrome_options.add_argument("--disable-extensions")
chrome_options.add_argument("--disable-popup-blocking")
chrome_options.add_argument('--headless')

browser = webdriver.Chrome('/usr/local/bin/chromedriver', options=chrome_options)
browser.get('http://www.google.com')
create_pdf(browser, 'test_pdf_1.pdf')
----------------------------------------
My system information
----------------------------------------
Platform: macOS
OS Version: 10.15.7
Python Version: 3.9
Selenium: 3.141.0
pdfminer.six: 20201018
----------------------------------------