Web Scraping in a Google Chrome Extension (JavaScript + Chrome APIs)

2022-07-14, Front-end Questions, 得得之家

Problem description

What are the best options for scraping a page that is not currently open in a tab, from within a Google Chrome extension, using JavaScript and whatever other technologies are available? Other JavaScript libraries are also acceptable.

The important thing is to mask the scraping so that it behaves like a normal web request. There should be no indications of AJAX or XMLHttpRequest, such as the X-Requested-With: XMLHttpRequest or Origin headers.

The scraped content must be accessible from JavaScript for further manipulation and presentation within the extension, most probably as a string.

Are there any hooks in any WebKit/Chrome-specific APIs that can be used to make a normal web request and get the results for manipulation?

var pageContent = getPageContent(url); // TODO: Implement
var items = $(pageContent).find('.item');
// Display items with further selections

Bonus points for making this work from a local file on disk, for initial debugging. But if that is the only point stopping a solution, then disregard the bonus points.

Recommended answer

Attempt to use XHR2 with responseType = "document", and fall back on (new DOMParser).parseFromString(responseText, getResponseHeader("Content-Type")) with my text/html patch (https://gist.github.com/1129031). See https://gist.github.com/1138724 for an example of how I detect responseType = "document" support (synchronously checking response === null on an object URL created from a text/html blob).
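The approach above could be sketched roughly as follows. This is not the code from the linked gists: the names getPageDocument, documentFromXhr and mimeTypeOf are hypothetical, and the sketch assumes it runs in an extension page or background page where XMLHttpRequest and DOMParser exist, with host permission for the target URL. It also assumes a DOMParser that already handles text/html, so the patch is not needed, and it checks responseType after assignment instead of using the blob-based detection.

```javascript
// MIME types that DOMParser.parseFromString accepts.
const PARSEABLE = [
  "text/html", "text/xml", "application/xml",
  "application/xhtml+xml", "image/svg+xml",
];

// Reduce a Content-Type header (possibly with "; charset=...") to a bare
// MIME type DOMParser accepts, defaulting to text/html.
function mimeTypeOf(contentType) {
  const mime = (contentType || "").split(";")[0].trim().toLowerCase();
  return PARSEABLE.includes(mime) ? mime : "text/html";
}

// Return the parsed document from a finished XHR. If responseType "document"
// was accepted, use the native result; otherwise parse responseText.
function documentFromXhr(xhr) {
  if (xhr.responseType === "document") {
    return xhr.response; // null when the browser could not parse the body
  }
  const mime = mimeTypeOf(xhr.getResponseHeader("Content-Type"));
  return new DOMParser().parseFromString(xhr.responseText, mime);
}

// Fetch a URL and resolve with a Document (hypothetical helper).
function getPageDocument(url) {
  return new Promise((resolve, reject) => {
    const xhr = new XMLHttpRequest();
    xhr.open("GET", url);
    xhr.responseType = "document"; // ignored where unsupported
    xhr.onload = () => {
      const doc = documentFromXhr(xhr);
      if (doc) {
        resolve(doc);
      } else {
        reject(new Error("Could not parse " + url));
      }
    };
    xhr.onerror = () => reject(new Error("Request failed: " + url));
    xhr.send();
  });
}
```

With this in place, the getPageContent(url) placeholder from the question could become something like getPageDocument(url).then((doc) => doc.querySelectorAll(".item")), which returns DOM nodes rather than a string.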

Use the Chrome webRequest API to hide headers such as X-Requested-With.
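A minimal sketch of the header stripping, under several assumptions: a Manifest V2 extension with the "webRequest" and "webRequestBlocking" permissions plus host permission for the target site, and a background page as the place where the listener is registered. The helper name stripHeaders, the blocked-header list, and the URL pattern are illustrative and not from the answer. Newer Chrome versions may also require "extraHeaders" in the last argument before the Origin header becomes visible to the listener.

```javascript
// Header names to drop, compared case-insensitively (assumed list).
const BLOCKED_HEADERS = ["x-requested-with", "origin"];

// Pure helper: return a copy of the header list without the blocked names.
function stripHeaders(requestHeaders, blocked = BLOCKED_HEADERS) {
  return requestHeaders.filter(
    (header) => !blocked.includes(header.name.toLowerCase())
  );
}

// Register the listener only when the extension API is actually present.
if (typeof chrome !== "undefined" && chrome.webRequest) {
  chrome.webRequest.onBeforeSendHeaders.addListener(
    (details) => ({ requestHeaders: stripHeaders(details.requestHeaders) }),
    { urls: ["https://example.com/*"] }, // hypothetical target site
    ["blocking", "requestHeaders"]
  );
}
```

Keeping the filtering in a pure function makes it easy to check outside the browser, and limiting the URL filter to the scraped site avoids altering headers on unrelated requests.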
