Question
I have a partner who has created some content for me to scrape.
I can access the page with my browser, but when I try to use file_get_contents, I get a 403 Forbidden.
I've tried using stream_context_create, but that's not helping; it might be because I don't know what should go in there.
1) Is there any way for me to scrape the data?
2) If not, and if my partner is not allowed to configure the server to give me access, what can I do then?
The code I tried:
$opts = array(
    'http' => array(
        'user_agent' => 'My company name',
        'method'     => "GET",
        'header'     => implode("\r\n", array(
            'Content-type: text/plain;'
        ))
    )
);
$context = stream_context_create($opts);
// Get the page content
$_header = file_get_contents($partner_url, false, $context);
Recommended answer
This is not a problem in your script; it's a feature of your partner's web server security.
It's hard to say exactly what is blocking you; most likely it is some sort of protection against scraping. If your partner has access to the web server's configuration, that might help pinpoint the cause.
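One way to narrow down the cause from the client side is to look at the status line, headers, and body of the 403 response. The following is a minimal sketch, not a definitive diagnostic. The helper names last_status_code and fetch_with_headers are hypothetical, as are the 10-second timeout and the placeholder URL. It relies on the ignore_errors HTTP context option, which makes file_get_contents return the body even for error responses, and on the $http_response_header variable that PHP's HTTP wrapper fills in.

```php
<?php
// Hypothetical helper: return the last HTTP status code found in the header
// lines reported by PHP's HTTP wrapper (redirects produce several status lines).
function last_status_code(array $headerLines): ?int
{
    $code = null;
    foreach ($headerLines as $line) {
        if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Hypothetical helper: fetch a URL without treating 4xx/5xx as a failure,
// so that the error response can be inspected.
function fetch_with_headers(string $url): array
{
    $context = stream_context_create([
        'http' => [
            'method'        => 'GET',
            'ignore_errors' => true, // return the body even on a 403
            'timeout'       => 10,   // assumed value, in seconds
        ],
    ]);
    $body = @file_get_contents($url, false, $context);
    // The HTTP wrapper populates $http_response_header in the local scope.
    $headers = $http_response_header ?? [];

    return [
        'status'  => last_status_code($headers),
        'headers' => $headers,
        'body'    => $body === false ? null : $body,
    ];
}

// Usage (placeholder URL):
// $result = fetch_with_headers('https://example.com/');
// var_dump($result['status'], $result['headers']);
```

The error body and headers such as Server sometimes hint at which component rejected the request, for example a firewall or an anti-bot rule, though that depends entirely on how the partner's server is set up.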
What you could do is "fake a web browser" by setting the User-Agent header so that your request imitates a standard web browser.
I would recommend using cURL for this; good documentation for it is easy to find.
// create curl resource
$ch = curl_init();
// set url
curl_setopt($ch, CURLOPT_URL, "example.com");
//return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
// $output contains the output string
$output = curl_exec($ch);
// close curl resource to free up system resources
curl_close($ch);
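The same idea can also be tried with file_get_contents and stream_context_create, as in the question. This is only a sketch under assumptions: the helper name browser_like_options, the Accept and Accept-Language values, and the timeout are illustrative choices, not values the partner's server is known to require. The Content-type header from the question is left out because a GET request has no body.

```php
<?php
// Hypothetical helper: build HTTP stream-context options that send
// browser-like request headers. The header values are illustrative only.
function browser_like_options(string $userAgent): array
{
    return [
        'http' => [
            'method'     => 'GET',
            'user_agent' => $userAgent,
            // Multiple header lines are joined with CRLF.
            'header'     => implode("\r\n", [
                'Accept: text/html,application/xhtml+xml;q=0.9,*/*;q=0.8',
                'Accept-Language: en-US,en;q=0.5',
            ]),
            'timeout'    => 10, // assumed value, in seconds
        ],
    ];
}

// Same User-Agent string as in the cURL example.
$userAgent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13';
$context = stream_context_create(browser_like_options($userAgent));

// $partner_url is the variable from the question:
// $html = file_get_contents($partner_url, false, $context);
```

If the server still answers 403 with these headers, the block is probably based on something other than the User-Agent, and the partner would need to check the server configuration.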