Scrapy Signals

Signals provide a mechanism for calling callback functions when events occur, for example when a spider opens or when an Item is scraped. You can connect them to callbacks with the crawler.signals.connect() method. Scrapy has 11 signals in total, and perhaps the easiest way to understand them is to watch them in action. To do that, we create a spider project whose sole purpose is to log every method call. The spider itself is very simple: it yields two Items and then raises an exception, and the Item Pipeline raises a DropItem exception while processing the second Item:

def parse(self, response):
    for i in range(2):
        item = HooksasyncItem()
        item['name'] = "Hello %d" % i
        yield item
    raise Exception("dead")
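
The pipeline code itself is not shown in this excerpt, but a minimal sketch matching the description above might look like this (the class name HooksasyncPipeline and the exact drop condition are assumptions for illustration, not code from the original project):

from scrapy.exceptions import DropItem

class HooksasyncPipeline(object):
    """Hypothetical pipeline: drops the second Item so that item_dropped fires."""

    def process_item(self, item, spider):
        # "Hello 1" is the second Item yielded by parse() above
        if item['name'] == "Hello 1":
            raise DropItem("dropping the second Item on purpose")
        return item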

The complete spider project can be found here.

Using this project, we can get a much better sense of when each signal is sent. Take a look at the following run output, and pay attention to the comments between the log lines:

$ scrapy crawl test
... many lines ...
# First we get those two signals...
INFO: Extension, signals.spider_opened fired
INFO: Extension, signals.engine_started fired
# Then for each URL we get a request_scheduled signal
INFO: Extension, signals.request_scheduled fired
... # when download completes we get response_downloaded
INFO: Extension, signals.response_downloaded fired
INFO: DownloaderMiddleware process_response called for example.com
# Work between response_downloaded and response_received
INFO: Extension, signals.response_received fired
INFO: SpiderMiddleware process_spider_input called for example.com
# here our parse() method gets called... and then SpiderMiddleware used
INFO: SpiderMiddleware process_spider_output called for example.com
# For every Item that goes through pipelines successfully...
INFO: Extension, signals.item_scraped fired
# For every Item that gets dropped using the DropItem exception...
INFO: Extension, signals.item_dropped fired
# If your spider throws something else...
INFO: Extension, signals.spider_error fired
# ... the above process repeats for each URL
# ... till we run out of them. then...
INFO: Extension, signals.spider_idle fired
# by hooking spider_idle you can schedule further Requests. If you don't
# the spider closes.
INFO: Closing spider (finished)
INFO: Extension, signals.spider_closed fired
# ... stats get printed
# and finally engine gets stopped.
INFO: Extension, signals.engine_stopped fired
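
The "Extension, signals.* fired" lines above come from an extension that connects a logging callback to every signal via crawler.signals.connect(). A minimal sketch of how such an extension could be wired up (class and helper names are illustrative, not the original project's code):

import logging

from scrapy import signals

logger = logging.getLogger(__name__)

SIGNAL_NAMES = ('engine_started', 'spider_opened', 'request_scheduled',
                'response_downloaded', 'response_received', 'item_scraped',
                'item_dropped', 'spider_error', 'spider_idle',
                'spider_closed', 'engine_stopped')

class HooksasyncExtension(object):
    """Illustrative extension that logs a line whenever a signal fires."""

    def __init__(self, crawler):
        self._handlers = []
        for name in SIGNAL_NAMES:
            handler = self._make_handler(name)
            # connect() keeps weak references, so hold on to the closures
            self._handlers.append(handler)
            crawler.signals.connect(handler, signal=getattr(signals, name))

    @classmethod
    def from_crawler(cls, crawler):
        # Enable with e.g. EXTENSIONS = {'myproject.extensions.HooksasyncExtension': 100}
        return cls(crawler)

    @staticmethod
    def _make_handler(name):
        def handler(*args, **kwargs):
            logger.info("Extension, signals.%s fired" % name)
        return handler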

Having only 11 signals may seem limiting, but all of Scrapy's default middlewares are implemented with them, so 11 signals are quite enough. Note that from a handler for any signal except spider_idle, spider_error, request_scheduled, response_received, and response_downloaded, you can return a Deferred object instead of an actual value.
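
As an illustration of that last point, here is a hedged sketch of an extension whose item_scraped handler returns a Deferred that the engine waits on before continuing (the class name and the deferLater()-based stand-in for real asynchronous work are assumptions):

from twisted.internet import reactor
from twisted.internet.task import deferLater

from scrapy import signals

class DeferredHandlerExtension(object):
    """Hypothetical extension: its item_scraped handler returns a Deferred."""

    @classmethod
    def from_crawler(cls, crawler):
        ext = cls()
        # item_scraped is not on the exception list above, so its
        # handler may return a Deferred for the engine to wait on
        crawler.signals.connect(ext.item_scraped, signal=signals.item_scraped)
        return ext

    def item_scraped(self, item, response, spider):
        # Stand in for real asynchronous work with a 0.1 s delay
        return deferLater(reactor, 0.1, spider.logger.info,
                          "async work done for %s" % item['name'])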