Signals provide a mechanism that lets a callback be invoked when a certain event occurs, for example when a spider opens or when an Item is scraped. You can hook them up to callback functions with the crawler.signals.connect() method. Scrapy has 11 signals in total, and perhaps the easiest way to understand them is to watch them in action. For that purpose, a spider project was created whose sole purpose is to log every method call. The spider itself is simple: it yields two Items and then raises an exception, while the Item Pipeline raises a DropItem exception when processing the second Item:
def parse(self, response):
    for i in range(2):
        item = HooksasyncItem()
        item['name'] = "Hello %d" % i
        yield item
    raise Exception("dead")
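The Item Pipeline that does the dropping is not shown in the excerpt above. A minimal sketch of what it might look like follows; the class name HooksasyncPipeline and the exact drop condition are assumptions for illustration, not the project's actual code:

from scrapy.exceptions import DropItem

class HooksasyncPipeline:
    def process_item(self, item, spider):
        if item['name'] == 'Hello 1':
            # Dropping the second item triggers signals.item_dropped
            raise DropItem("dropping 'Hello 1' on purpose")
        # The first item passes through and triggers signals.item_scraped
        return item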
You can find the complete spider project here.
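The logging in such a project is done by hooking callbacks to signals with crawler.signals.connect(). A condensed sketch of the pattern follows; HooksasyncExtension is a hypothetical name, and only two of the 11 signals are wired up here:

from scrapy import signals

class HooksasyncExtension:
    @classmethod
    def from_crawler(cls, crawler):
        ext = cls()
        # crawler.signals.connect() binds a callback to a signal
        crawler.signals.connect(ext.spider_opened, signal=signals.spider_opened)
        crawler.signals.connect(ext.item_scraped, signal=signals.item_scraped)
        return ext

    def spider_opened(self, spider):
        spider.logger.info("Extension, signals.spider_opened fired")

    def item_scraped(self, item, response, spider):
        spider.logger.info("Extension, signals.item_scraped fired")

An extension like this is activated by listing it in the EXTENSIONS setting in settings.py.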
Using this project, we can get a better feel for when each signal is sent. Take a look at the following run, and note the comments between the log lines:
$ scrapy crawl test
... many lines ...
# First we get those two signals...
INFO: Extension, signals.spider_opened fired
INFO: Extension, signals.engine_started fired
# Then for each URL we get a request_scheduled signal
INFO: Extension, signals.request_scheduled fired
...
# when download completes we get response_downloaded
INFO: Extension, signals.response_downloaded fired
INFO: DownloaderMiddleware process_response called for example.com
# Work between response_downloaded and response_received
INFO: Extension, signals.response_received fired
INFO: SpiderMiddleware process_spider_input called for example.com
# here our parse() method gets called... and then SpiderMiddleware is used
INFO: SpiderMiddleware process_spider_output called for example.com
# For every Item that goes through pipelines successfully...
INFO: Extension, signals.item_scraped fired
# For every Item that gets dropped using the DropItem exception...
INFO: Extension, signals.item_dropped fired
# If your spider throws something else...
INFO: Extension, signals.spider_error fired
# ... the above process repeats for each URL
# ... till we run out of them. Then...
INFO: Extension, signals.spider_idle fired
# By hooking spider_idle you can schedule further Requests;
# if you don't, the spider closes.
INFO: Closing spider (finished)
INFO: Extension, signals.spider_closed fired
# ... stats get printed
# and finally engine gets stopped.
INFO: Extension, signals.engine_stopped fired
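As the spider_idle comment above suggests, a handler for that signal can keep the spider alive by scheduling further Requests and raising DontCloseSpider. A rough sketch follows, assuming a hypothetical KeepAliveExtension and a placeholder URL; note that recent Scrapy versions expect engine.crawl() to be called with the request only, without the spider argument:

from scrapy import Request, signals
from scrapy.exceptions import DontCloseSpider

class KeepAliveExtension:
    def __init__(self, crawler):
        self.crawler = crawler

    @classmethod
    def from_crawler(cls, crawler):
        ext = cls(crawler)
        crawler.signals.connect(ext.spider_idle, signal=signals.spider_idle)
        return ext

    def spider_idle(self, spider):
        # Schedule one more Request and tell Scrapy not to close the spider
        self.crawler.engine.crawl(Request("http://example.com", dont_filter=True), spider)
        raise DontCloseSpider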
Having only 11 signals may seem limiting, but all of Scrapy's default middlewares are implemented with them, so 11 signals are enough. Note that, with the exception of the spider_idle, spider_error, request_scheduled, response_received, and response_downloaded signals, you can return Deferred objects instead of actual values from any signal handler.
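For example, a spider_closed handler, one of the signals that does accept Deferreds, can return one to make the engine wait for asynchronous cleanup before shutting down. A minimal sketch, using a hypothetical AsyncCleanupExtension that simply waits one second in place of real cleanup work:

from twisted.internet import defer, reactor
from scrapy import signals

class AsyncCleanupExtension:
    @classmethod
    def from_crawler(cls, crawler):
        ext = cls()
        crawler.signals.connect(ext.spider_closed, signal=signals.spider_closed)
        return ext

    def spider_closed(self, spider, reason):
        # Returning a Deferred makes the engine wait for it to fire
        # before finishing shutdown (e.g. flushing an async resource).
        d = defer.Deferred()
        reactor.callLater(1.0, d.callback, None)  # stand-in for real async cleanup
        return d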