How to use XMLHttpRequest to download an HTML page in the background and extract a text element from it?

Question

I want to make a Greasemonkey script that, while you are on URL_1, parses the whole HTML page of URL_2 in the background in order to extract a text element from it.

To be specific, I want to download the whole page's HTML code (a Rotten Tomatoes page) in the background, store it in a variable, and then use getElementsByClassName("critic_consensus")[0] to extract the text I want from the element with that class name.


I found the MDN article "HTML in XMLHttpRequest", so I ended up with this code, which unfortunately does not work:

var xhr = new XMLHttpRequest();
xhr.onload = function() {
  // With responseType "document", responseXML holds the parsed HTML document.
  alert(this.responseXML.getElementsByClassName("critic_consensus")[0].innerHTML);
};
xhr.open("GET", "http://www.rottentomatoes.com/m/godfather/", true);
xhr.responseType = "document";
xhr.send();

It shows this error message when I run it in Firefox Scratchpad:

Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at http://www.rottentomatoes.com/m/godfather/. This can be fixed by moving the resource to the same domain or enabling CORS.
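
The Same Origin Policy compares the origin (scheme, host and port) of the page the script runs on with the origin of the requested URL; when they differ and the server does not send a permissive CORS header, the browser withholds the response from the script. A minimal sketch of that comparison, using the standard URL class (isSameOrigin is a hypothetical helper name, not part of any browser API):

```javascript
// Hypothetical helper: true when both URLs share scheme, host and port,
// which is the comparison the Same Origin Policy is based on.
function isSameOrigin(pageUrl, requestUrl) {
  // URL.prototype.origin serializes scheme + host + (non-default) port,
  // e.g. "http://www.rottentomatoes.com".
  return new URL(pageUrl).origin === new URL(requestUrl).origin;
}

// A script running on stackoverflow.com asking for a Rotten Tomatoes page:
console.log(isSameOrigin("http://stackoverflow.com/questions/1",
                         "http://www.rottentomatoes.com/m/godfather/")); // false
// The same request made from a Rotten Tomatoes page would be same-origin:
console.log(isSameOrigin("http://www.rottentomatoes.com/",
                         "http://www.rottentomatoes.com/m/godfather/")); // true
```

This only illustrates why the request above is classified as cross-origin; whether the browser then allows the read still depends on the CORS headers the server sends.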


PS. I'm not using the Rotten Tomatoes API because they have removed the critics' consensus from it.

Recommended answer

For cross-origin requests, where the fetched site has not helpfully set a permissive CORS policy, Greasemonkey provides the GM_xmlhttpRequest() function. (Most other userscript engines also provide this function.)

GM_xmlhttpRequest is expressly designed to allow cross-origin requests.
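
GM_xmlhttpRequest reports its result through callbacks (onload, onerror, onabort, ontimeout). If Promises are more convenient, here is a minimal sketch of a wrapper. gmFetchText is a hypothetical name, and the request function is passed in as a parameter, an assumption made so the sketch can be exercised without a userscript manager; inside a userscript you would pass GM_xmlhttpRequest itself.

```javascript
// Hypothetical helper: wraps a GM_xmlhttpRequest-style function in a Promise
// that resolves with the response text.
function gmFetchText(url, request) {
  return new Promise(function (resolve, reject) {
    request({
      method: "GET",
      url: url,
      onload: function (response) {
        // Design choice of this sketch: treat non-2xx statuses as failures.
        if (response.status >= 200 && response.status < 300) {
          resolve(response.responseText);
        } else {
          reject(new Error("HTTP " + response.status + " for " + url));
        }
      },
      onerror: function () { reject(new Error("Network error for " + url)); },
      onabort: function () { reject(new Error("Request aborted for " + url)); },
      ontimeout: function () { reject(new Error("Request timed out for " + url)); }
    });
  });
}

// Usage inside a userscript that declares "@grant GM_xmlhttpRequest":
// gmFetchText("http://www.rottentomatoes.com/m/godfather/", GM_xmlhttpRequest)
//     .then(function (html) { console.log(html.length); })
//     .catch(function (err) { console.error(err); });
```

Rejecting on HTTP error statuses is a choice made here, not something the callback API does by itself: onload is the handler that receives completed responses, so the status check is left to the caller.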

To get your target information, parse the response text with a DOMParser. Do not use jQuery methods such as jQuery.parseHTML() for this, as they can cause extraneous images, scripts, and objects to load, slowing things down or even crashing the page.
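
A minimal sketch of the DOMParser step as a reusable function. extractTextByClass is a hypothetical helper name, and the optional parser argument is an assumption added so the function can be exercised with a stand-in outside the browser; in a userscript you would omit it and the browser's DOMParser is used. DOMParser builds a separate, inert document: its scripts are not executed and it is not rendered into the current page.

```javascript
// Hypothetical helper: parse an HTML string and return the trimmed text of
// the first element with the given class name, or null if there is none.
function extractTextByClass(html, className, parser) {
  // Default to the browser's DOMParser; a stand-in may be passed instead.
  parser = parser || new DOMParser();
  var doc = parser.parseFromString(html, "text/html");
  var el = doc.getElementsByClassName(className)[0];
  // Returning null (instead of throwing) lets the caller handle layout changes.
  return el ? el.textContent.trim() : null;
}
```

Inside a GM_xmlhttpRequest onload handler this would be called as extractTextByClass(response.responseText, "critic_consensus"); a null result means the element was not found in the fetched page.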

Here's a complete script that illustrates the process:

// ==UserScript==
// @name        _Parse Ajax Response for specific nodes
// @include     http://stackoverflow.com/questions/*
// @require     http://ajax.googleapis.com/ajax/libs/jquery/2.1.0/jquery.min.js
// @grant       GM_xmlhttpRequest
// ==/UserScript==

GM_xmlhttpRequest ( {
    method: "GET",
    url:    "http://www.rottentomatoes.com/m/godfather/",
    onload: function (response) {
        var parser  = new DOMParser ();
        /* IMPORTANT!
            1) For Chrome, see
            https://developer.mozilla.org/en-US/docs/Web/API/DOMParser#DOMParser_HTML_extension_for_other_browsers
            for a work-around.

            2) jQuery.parseHTML() and similar are bad because they cause images, etc., to be loaded.
        */
        var doc         = parser.parseFromString (response.responseText, "text/html");
        var consensus   = doc.getElementsByClassName ("critic_consensus")[0];
        if ( ! consensus) {
            console.error ('**** "critic_consensus" element not found');
            return;
        }
        var criticTxt   = consensus.textContent.trim ();

        // .text() inserts criticTxt as plain text, not as HTML markup.
        $("<h1>").text (criticTxt).prependTo ("body");
    },
    onerror: function (e) {
        console.error ('**** error ', e);
    },
    onabort: function (e) {
        console.error ('**** abort ', e);
    },
    ontimeout: function (e) {
        console.error ('**** timeout ', e);
    }
} );
