Referer missing in HTTP header of Selenium request

Problem description


I'm writing some tests with Selenium and noticed that the Referer header is missing from the requests. I wrote the following minimal example to test this with https://httpbin.org/headers:

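import selenium.webdriver
import selenium.webdriver.support.ui  # explicit import so WebDriverWait below is reliably available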

options = selenium.webdriver.FirefoxOptions()
options.add_argument('--headless')

profile = selenium.webdriver.FirefoxProfile()
profile.set_preference('devtools.jsonview.enabled', False)

driver = selenium.webdriver.Firefox(firefox_options=options, firefox_profile=profile)
wait = selenium.webdriver.support.ui.WebDriverWait(driver, 10)

driver.get('http://www.python.org')
assert 'Python' in driver.title

url = 'https://httpbin.org/headers'
driver.execute_script('window.location.href = "{}";'.format(url))
wait.until(lambda driver: driver.current_url == url)
print(driver.page_source)

driver.close()

Which prints:

<html><head><link rel="alternate stylesheet" type="text/css" href="resource://content-accessible/plaintext.css" title="Wrap Long Lines"></head><body><pre>{
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", 
    "Accept-Encoding": "gzip, deflate, br", 
    "Accept-Language": "en-US,en;q=0.5", 
    "Connection": "close", 
    "Host": "httpbin.org", 
    "Upgrade-Insecure-Requests": "1", 
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:64.0) Gecko/20100101 Firefox/64.0"
  }
}
</pre></body></html>

So there is no Referer. However, if I browse to any page and manually execute

window.location.href = "https://httpbin.org/headers"

in the Firefox console, Referer does appear as expected.
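
When this check is repeated across browsers, it can help to test for the header programmatically instead of reading the printed page. The following is a minimal sketch, assuming the page source has the shape shown above (httpbin's JSON wrapped in a single <pre> element); the helper name has_referer is hypothetical and not part of Selenium.

```python
import html
import json
import re


def has_referer(page_source: str) -> bool:
    """Return True if the httpbin /headers JSON in page_source lists a Referer."""
    # Assumption: the browser renders the JSON response inside one <pre> element.
    match = re.search(r"<pre[^>]*>(.*?)</pre>", page_source, re.DOTALL)
    if match is None:
        raise ValueError("no <pre> element found in page source")
    # Undo HTML escaping (e.g. &amp;) before parsing the JSON payload.
    payload = json.loads(html.unescape(match.group(1)))
    # HTTP header names are case-insensitive, so compare in lower case.
    return any(name.lower() == "referer" for name in payload["headers"])
```

In the test above, print(driver.page_source) could then be replaced by something like assert has_referer(driver.page_source).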


As pointed out in the comments below, when using

driver.get("javascript: window.location.href = '{}'".format(url))

instead of

driver.execute_script("window.location.href = '{}';".format(url))

the request does include Referer. Also, when using Chrome instead of Firefox, both methods include Referer.

So the main question still stands: Why is Referer missing in the request when sent with Firefox as described above?

Solution

Referer as per the MDN documentation

The Referer request header contains the address of the previous web page from which a link to the currently requested page was followed. The Referer header allows servers to identify where people are visiting them from and may use that data for analytics, logging, or optimized caching, for example.

Important: Although this header has many innocent uses it can have undesirable consequences for user security and privacy.

Source: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Referer


However:

A Referer header is not sent by browsers if:

  • The referring resource is a local "file" or "data" URI.
  • An unsecured HTTP request is used and the referring page was received with a secure protocol (HTTPS).

Source: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Referer
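
The two rules quoted above can be sketched as a small check. This is only an illustration of those two rules, not a model of what a real browser does (browsers also take the referrer policy into account); the function name referer_suppressed is hypothetical.

```python
from urllib.parse import urlsplit


def referer_suppressed(referring_url: str, target_url: str) -> bool:
    """Return True if one of the two quoted rules says no Referer is sent."""
    source = urlsplit(referring_url).scheme.lower()
    target = urlsplit(target_url).scheme.lower()
    # Rule 1: the referring resource is a local "file" or "data" URI.
    if source in ("file", "data"):
        return True
    # Rule 2: an unsecured HTTP request made from a page received over HTTPS.
    if source == "https" and target == "http":
        return True
    return False
```

Neither rule applies to the navigation in the question, which goes from https://www.python.org/ to https://httpbin.org/headers, so these rules alone do not explain the missing header.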


Privacy and security concerns

There are some privacy and security risks associated with the Referer HTTP header:

The Referer header contains the address of the previous web page from which a link to the currently requested page was followed, which can be further used for analytics, logging, or optimized caching.

Source: https://developer.mozilla.org/en-US/docs/Web/Security/Referer_header:_privacy_and_security_concerns#The_referrer_problem


Addressing the security concerns

From the perspective of the Referer header, the majority of security risks can be mitigated with the following steps:

  • Referrer-Policy: Use the Referrer-Policy header on your server to control what information is sent through the Referer header. A directive of no-referrer omits the Referer header entirely.
  • The referrerpolicy attribute on HTML elements that are in danger of leaking such information (such as <img> and <a>). This can for example be set to no-referrer to stop the Referer header being sent altogether.
  • The rel attribute set to noreferrer on HTML elements that are in danger of leaking such information (such as <img> and <a>).
  • The Exit Page Redirect technique: The only method that should currently work without flaw is to have an exit page that you don't mind appearing in the Referer header. Many websites implement this method, including Google and Facebook. If implemented correctly, the referrer data no longer shows private information; it only shows the website that the user came from. Instead of appearing as http://example.com/user/foobar, the new referrer data appears as http://example.com/exit?url=http%3A%2F%2Fexample.com. The method works by having all external links on your website go to an intermediary page that then redirects to the final page. For a link to the website example.com, the full URL is URL-encoded and added to the url parameter of the exit page.
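
The URL handling behind the exit page technique can be sketched as follows. This is a minimal illustration, assuming an exit page at http://example.com/exit as in the example URL above; the names EXIT_PAGE, exit_link and redirect_target are hypothetical.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Hypothetical exit page on your own site, matching the example URL in the text.
EXIT_PAGE = "http://example.com/exit"


def exit_link(external_url: str) -> str:
    """Wrap an external URL so the click goes through the exit page first."""
    # safe="" also percent-encodes ":" and "/", giving http%3A%2F%2F...
    return f"{EXIT_PAGE}?url={quote(external_url, safe='')}"


def redirect_target(exit_url: str) -> str:
    """What the exit page would do: recover the final URL to redirect to."""
    return parse_qs(urlsplit(exit_url).query)["url"][0]
```

Here exit_link("http://example.com") yields the URL from the text, and redirect_target() recovers the original address. A real exit page would typically also validate the target before redirecting, so that it cannot be abused as an open redirect.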

Sources:

  • https://developer.mozilla.org/en-US/docs/Web/Security/Referer_header:_privacy_and_security_concerns#How_can_we_fix_this
  • https://geekthis.net/post/hide-http-referer-headers/#exit-page-redirect
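
The first mitigation in the list, the Referrer-Policy response header, can be sketched with Python's standard library. This is an illustrative server only, assuming a single static page; the class name NoReferrerHandler and the helper make_server are hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class NoReferrerHandler(BaseHTTPRequestHandler):
    """Serve a small page whose outgoing links should carry no Referer."""

    def do_GET(self):
        body = b'<a href="https://example.com/">external link</a>'
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        # Ask the browser to omit the Referer header on requests made from this page.
        self.send_header("Referrer-Policy", "no-referrer")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_server(port: int = 0) -> HTTPServer:
    """Create the server on localhost; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), NoReferrerHandler)
```

Calling make_server(8000).serve_forever() would serve the page locally; a browser that honors the policy should then follow the link without sending a Referer.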

This use case

I have executed your code with both the GeckoDriver/Firefox and the ChromeDriver/Chrome combination:

Code block:

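from selenium.webdriver.support.ui import WebDriverWait

# driver is an already created Firefox or Chrome WebDriver instance
driver.get('http://www.python.org')
assert 'Python' in driver.title

url = 'https://httpbin.org/headers'
# Navigate from JavaScript so that python.org is the referring page
driver.execute_script('window.location.href = "{}";'.format(url))
WebDriverWait(driver, 10).until(lambda driver: driver.current_url == url)
print(driver.page_source)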

Observation:

  • Using GeckoDriver/Firefox, the Referer: "https://www.python.org/" header was missing, as follows:

        {
          "headers": {
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", 
            "Accept-Encoding": "gzip, deflate, br", 
            "Accept-Language": "en-US,en;q=0.5", 
            "Host": "httpbin.org", 
            "Upgrade-Insecure-Requests": "1", 
            "User-Agent": "Mozilla/5.0 (Windows NT 6.2; Win64; x64; rv:67.0) Gecko/20100101 Firefox/67.0"
          }
        }
    

  • Using ChromeDriver/Chrome, the Referer: "https://www.python.org/" header was present, as follows:

        {
          "headers": {
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3", 
            "Accept-Encoding": "gzip, deflate, br", 
            "Accept-Language": "en-US,en;q=0.9", 
            "Host": "httpbin.org", 
            "Referer": "https://www.python.org/", 
            "Upgrade-Insecure-Requests": "1", 
            "User-Agent": "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.80 Safari/537.36"
          }
        }
    

Conclusion:

It seems to be an issue with GeckoDriver/Firefox in handling the Referer header.


Outro

Referrer Policy
