
pyspider Exception: HTTP 599: Resolving timed out after 20000 milliseconds

Problem description

I'm new to pyspider. While writing a demo program, it kept throwing the following exception:

[E 170504 16:08:57 base_handler:203] HTTP 599: Resolving timed out after 20000 milliseconds
    Traceback (most recent call last):
      File "d:\application\python36-32bit\lib\site-packages\pyspider\libs\base_handler.py", line 196, in run_task
        result = self._run_task(task, response)
      File "d:\application\python36-32bit\lib\site-packages\pyspider\libs\base_handler.py", line 175, in _run_task
        response.raise_for_status()
      File "d:\application\python36-32bit\lib\site-packages\pyspider\libs\response.py", line 172, in raise_for_status
        six.reraise(Exception, Exception(self.error), Traceback.from_string(self.traceback).as_traceback())
      File "d:\application\python36-32bit\lib\site-packages\six.py", line 685, in reraise
        raise value.with_traceback(tb)
      File "d:\application\python36-32bit\lib\site-packages\pyspider\fetcher\tornado_fetcher.py", line 378, in http_fetch
        response = yield gen.maybe_future(self.http_client.fetch(request))
      File "d:\application\python36-32bit\lib\site-packages\tornado\httpclient.py", line 102, in fetch
        self._async_client.fetch, request, **kwargs))
      File "d:\application\python36-32bit\lib\site-packages\tornado\ioloop.py", line 458, in run_sync
        return future_cell[0].result()
      File "d:\application\python36-32bit\lib\site-packages\tornado\concurrent.py", line 238, in result
        raise_exc_info(self._exc_info)
      File "<string>", line 4, in raise_exc_info
    Exception: HTTP 599: Resolving timed out after 20000 milliseconds
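
HTTP 599 is the catch-all code tornado uses for client-side (non-HTTP) errors, and the "Resolving timed out" text indicates the request died during DNS resolution, before anything was sent to the server. A quick way to check whether DNS resolution itself is broken on the machine, independent of pyspider (a minimal diagnostic sketch; the hostname is just the one from the demo below):

import socket

# Try to resolve the same host the spider is failing on. If this hangs
# or raises socket.gaierror, DNS resolution on this machine/network is
# the problem, not pyspider.
try:
    infos = socket.getaddrinfo('xdzhcs.me', 80)
    print('resolved:', sorted({info[4][0] for info in infos}))
except socket.gaierror as e:
    print('DNS resolution failed:', e)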

Here is the demo code:

#!/usr/bin/env python
# -*- encoding: utf-8 -*-
# Created on 2017-05-03 22:30:09
# Project: xdzhcs_me

from pyspider.libs.base_handler import *


class Handler(BaseHandler):
    crawl_config = {

    }

    @every(minutes=24 * 60)  # re-run the entry point once a day
    def on_start(self):
        self.crawl('http://xdzhcs.me/', callback=self.index_page)

    @config(age=10 * 24 * 60 * 60)  # a crawled page stays valid for 10 days
    def index_page(self, response):
        for each in response.doc('a[href^="http"]').items():
            self.crawl(each.attr.href, callback=self.detail_page)

    @config(priority=2)  # schedule detail pages ahead of index pages
    def detail_page(self, response):
        return {
            "url": response.url,
            "title": response.doc('title').text(),
        }

I searched Baidu and Google for ages without finding a solution. Some posts blamed DNS, but on my campus network changing the DNS servers just broke my connection outright; others suggested disabling IPv6, which made no difference either. In desperation I opened QQ, found a pyspider group, and asked. A veteran there said to just add a proxy... So I added the circumvention proxy I happened to have running (the error had nothing to do with the firewall; I simply had one at hand), and the crawl worked immediately:

crawl_config = {
    'proxy': '127.0.0.1:1080',
}
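
For completeness: besides the global crawl_config setting, pyspider also accepts a proxy on individual requests as a self.crawl argument (per the pyspider docs, only HTTP proxies are currently supported, in username:password@hostname:port format). A sketch of the same fix applied per request, using the demo's start URL and the same assumed local proxy address:

    @every(minutes=24 * 60)
    def on_start(self):
        # A per-request proxy takes precedence over any crawl_config
        # default; 127.0.0.1:1080 assumes a local HTTP proxy on port 1080.
        self.crawl('http://xdzhcs.me/', callback=self.index_page,
                   proxy='127.0.0.1:1080')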