admin 2025/6/17 1:57:30 [news]
Recently, while using scrapy to crawl images, I needed to take a page URL and extract the image URLs from the response text with a regex. But no matter how many times I ran it, the response contained nothing but the code below, and the extracted URL list came back empty. Very frustrating.
<!DOCTYPE html>
<html lang="zh-CN">
<head><meta charset="utf-8"><title>百度安全验证</title><meta http-equiv="Content-Type" content="text/html; charset=utf-8"><meta name="apple-mobile-web-app-capable" content="yes"><meta name="apple-mobile-web-app-status-bar-style" content="black"><meta name="viewport" content="width=device-width, user-scalable=no, initial-scale=1.0, minimum-scale=1.0, maximum-scale=1.0"><meta name="format-detection" content="telephone=no, email=no"><link rel="shortcut icon" href="https://www.baidu.com/favicon.ico" type="image/x-icon"><link rel="icon" sizes="any" mask href="https://www.baidu.com/img/baidu.svg"><meta http-equiv="X-UA-Compatible" content="IE=Edge"><meta http-equiv="Content-Security-Policy" content="upgrade-insecure-requests"><link rel="stylesheet" href="https://ppui-static-wap.cdn.bcebos.com/static/touch/css/api/mkdjump_0635445.css" />
</head>
<body><div class="timeout hide"><div class="timeout-img"></div><div class="timeout-title">网络不给力,请稍后重试</div><button type="button" class="timeout-button">返回首页</button></div><div class="timeout-feedback hide"><div class="timeout-feedback-icon"></div><p class="timeout-feedback-title">问题反馈</p></div><script src="https://wappass.baidu.com/static/machine/js/api/mkd.js"></script>
<script src="https://ppui-static-wap.cdn.bcebos.com/static/touch/js/mkdjump_fbb9952.js"></script>
</body>
</html>
After lying down for the night, I happened to stumble on a blog post about "real vs. fake" URLs, and only then understood where the problem was.

Solution:
At first I used link No. 1, and that is where the problem above came from. As soon as I switched to link No. 2 (the acjson interface URL that appears in the spider code below), the problem was solved!
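The difference is easy to check in code: the "fake" URL answers with Baidu's security-verification HTML (the page shown above), while the "real" acjson interface URL returns JSON text containing "thumbURL" fields that the regex can pick up. A minimal sketch with made-up sample bodies (no live requests; the strings are illustrative only):

```python
import re

# Illustrative response bodies (not live requests):
# what the blocked page looks like vs. what the acjson interface returns
blocked_body = '<html><head><title>百度安全验证</title></head></html>'
json_body = '{"data":[{"thumbURL":"https://img0.baidu.com/it/u=1.jpg"}]}'

def is_blocked(text):
    """Detect Baidu's security-verification page by its title."""
    return "百度安全验证" in text

print(is_blocked(blocked_body))                       # True  -> wrong URL
print(re.findall('"thumbURL":"(.*?)"', json_body))    # image URLs -> right URL
```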
Storing the images with an item pipeline
import scrapy
import re
import os
from ..items import BdimgItem


class BaiduimageSpider(scrapy.Spider):
    name = 'baiduimage'
    # allowed_domains = ['xxx']
    start_urls = ['https://image.baidu.com/search/acjson?tn=resultjson_com&logid=11464945410389410565&ipn=rj&ct=201326592&is=&fp=result&fr=ala&word=%E7%8C%AB%E5%92%AA&queryWord=%E7%8C%AB%E5%92%AA&cl=2&lm=-1&ie=utf-8&oe=utf-8&adpicid=&st=&z=&ic=&hd=&latest=&copyright=&s=&se=&tab=&width=&height=&face=&istype=&qc=&nc=&expermode=&nojc=&isAsync=&pn=30&rn=30&gsm=1e&1641168838251=']
    # page template: pn is the result offset, filled in by parse() in steps of 30
    page_url = "https://image.baidu.com/search/acjson?tn=resultjson_com&logid=11464945410389410565&ipn=rj&ct=201326592&is=&fp=result&fr=ala&word=%E7%8C%AB%E5%92%AA&queryWord=%E7%8C%AB%E5%92%AA&cl=2&lm=-1&ie=utf-8&oe=utf-8&adpicid=&st=&z=&ic=&hd=&latest=&copyright=&s=&se=&tab=&width=&height=&face=&istype=&qc=&nc=&expermode=&nojc=&isAsync=&pn={}&rn=30&gsm=5a&1641254836978="
    num = 0
    page_num = 0

    def parse(self, response):
        # print(response.text)
        img_urls = re.findall('"thumbURL":"(.*?)"', response.text)
        # print(img_urls)
        for index, img_url in enumerate(img_urls):
            yield scrapy.Request(img_url, callback=self.get_img)
        self.page_num += 1
        if self.page_num == 4:  # stop after 4 pages
            return
        page_url = self.page_url.format(self.page_num * 30)
        yield scrapy.Request(page_url, callback=self.parse)

    def get_img(self, response):
        img_data = response.body  # binary image data
        # Saving directly here also works, but the pipeline below is cleaner:
        # if not os.path.exists(r"F:\dj\django20\bdimg\dirspider"):
        #     os.mkdir(r"F:\dj\django20\bdimg\dirspider")
        # filename = r"F:\dj\django20\bdimg\dirspider\%s.jpg" % str(self.num)
        # self.num += 1
        # with open(filename, "wb") as f:
        #     f.write(img_data)
        item = BdimgItem()
        item["img_data"] = img_data
        yield item
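The extraction and paging logic in parse() can be exercised outside Scrapy. A small sketch with an illustrative response snippet and a simplified URL template (both made up for the demo, not real responses):

```python
import re

# Illustrative fragment of an acjson response body (not a live request)
sample = '{"thumbURL":"https://a/1.jpg"},{"thumbURL":"https://a/2.jpg"}'
img_urls = re.findall('"thumbURL":"(.*?)"', sample)
print(img_urls)  # ['https://a/1.jpg', 'https://a/2.jpg']

# Paging as in parse(): pn advances in steps of 30, stopping after page_num == 4,
# so the follow-up requests use pn = 30, 60, 90
page_url = "https://example.com/acjson?pn={}&rn=30"  # simplified template
pages = [page_url.format(n * 30) for n in range(1, 4)]
print(pages[0])  # https://example.com/acjson?pn=30&rn=30
```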
pipelines.py

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html

# useful for handling different item types with a single interface
from itemadapter import ItemAdapter
import os


class BdimgPipeline:
    num = 0  # running counter used to name the saved files

    def process_item(self, item, spider):
        if not os.path.exists(r"F:\dj\django20\bdimg\dirspider_猫咪"):
            os.mkdir(r"F:\dj\django20\bdimg\dirspider_猫咪")
        filename = r"F:\dj\django20\bdimg\dirspider_猫咪\%s.jpg" % str(self.num)
        self.num += 1
        with open(filename, "wb") as f:
            f.write(item["img_data"])
        return item
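As the comment above says, the pipeline only runs if it is registered in settings.py. A minimal fragment, assuming the project package is named bdimg (as the relative import from ..items suggests):

```python
# settings.py (fragment, assumed project name "bdimg"):
# register the pipeline so Scrapy calls process_item for every yielded item
ITEM_PIPELINES = {
    "bdimg.pipelines.BdimgPipeline": 300,  # 300 is the pipeline's priority order
}
```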
items.py

# Define here the models for your scraped items
#
# See documentation in:
# https://docs.scrapy.org/en/latest/topics/items.html
import scrapy


class BdimgItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    img_data = scrapy.Field()  # raw image bytes produced by the spider