admin, 2025/6/15 15:42:51 [news]

day4-selenium

I. Selenium basics

from selenium.webdriver import Chrome

1. Create a browser object

b = Chrome()

2. Open a web page (open the URL of whichever page holds the data you want to scrape)

b.get('https://movie.douban.com/top250?start=0&filter=')

3. Get the page source (note: however the page content gets updated, page_source is updated along with it)

print(b.page_source)        # Page source of the Douban Movie Top 250 page

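II. Paging with selenium

There are two ways to fetch multiple pages of data with selenium:

1. Find the pattern in how the URL changes from page to page, then request each page in a loop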

# Request each results page in turn (start = 0, 25, 50, 75)
b = Chrome()

for x in range(0, 76, 25):
    b.get(f'https://movie.douban.com/top250?start={x}&filter=')
    print(b.page_source)
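The URL pattern above can also be made explicit in a small helper. This is a minimal sketch using only the standard library; the name page_urls and the page_size parameter are hypothetical, and the offset of 25 per page is taken from the loop above.

```python
from urllib.parse import urlencode

BASE_URL = 'https://movie.douban.com/top250'


def page_urls(page_count, page_size=25):
    """Build the URL of each results page from its start offset."""
    urls = []
    for page in range(page_count):
        # start = 0, 25, 50, ... ; filter stays empty, as in the original URL
        query = urlencode({'start': page * page_size, 'filter': ''})
        urls.append(f'{BASE_URL}?{query}')
    return urls


for url in page_urls(4):
    print(url)
```

Each returned URL can be passed straight to b.get(...). Keeping the URL construction separate from the browser calls makes the pattern easy to check without opening a browser.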

2. Click the next-page button to refresh the page content, then read the page source after the refresh

from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By

b = Chrome()
b.get('https://movie.douban.com/top250?start=0&filter=')

for _ in range(5):
    print(b.page_source)
    # Find the next-page button
    next_btn = b.find_element(By.CLASS_NAME, 'next')
    # Click the button
    next_btn.click()
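A fixed range(5) keeps clicking even if there is no further page. The sketch below is one possible variation, not the method used in these notes: it stops as soon as no next-page button is found. The function name collect_pages and its parameters are hypothetical. The string 'class name' is the value of selenium's By.CLASS_NAME constant, written out so the function does not need to import selenium itself; whether a site actually removes the button on its last page is an assumption that has to be checked per site.

```python
import time


def collect_pages(driver, max_pages=5, pause=1.0):
    """Collect page_source for up to max_pages pages, clicking 'next' in between."""
    sources = []
    while len(sources) < max_pages:
        sources.append(driver.page_source)
        if len(sources) == max_pages:
            break  # enough pages collected, no need to click again
        # find_elements returns an empty list instead of raising when nothing matches
        buttons = driver.find_elements('class name', 'next')
        if not buttons:
            break  # no next-page button: assume this was the last page
        buttons[0].click()
        time.sleep(pause)  # crude wait for the new content to load
    return sources
```

With the browser object from above, this would be called as collect_pages(b, max_pages=5).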

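Concepts used in method 2

1. Getting tags with selenium

browser_object.find_element(locator strategy, value) - returns the first tag that matches; the result is a tag object
browser_object.find_elements(locator strategy, value) - returns all tags that match; the result is a list whose elements are tag objects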

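1) Locator strategies:

By.ID - find a tag by its id attribute value
By.CLASS_NAME - find a tag by its class attribute value
By.CSS_SELECTOR - find a tag by a CSS selector
By.LINK_TEXT - find an a tag by its full link text
By.PARTIAL_LINK_TEXT - find an a tag by part of its link text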
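The strategies in this list overlap: an id or class lookup can also be written as a CSS selector. The sketch below illustrates that relationship with a hypothetical helper, as_css_selector, which is not part of selenium. It assumes simple id and class values that need no CSS escaping, and it uses the plain strings 'id', 'class name' and 'css selector', which are the values of By.ID, By.CLASS_NAME and By.CSS_SELECTOR.

```python
def as_css_selector(by, value):
    """Rewrite an 'id' or 'class name' lookup as the equivalent CSS selector."""
    if by == 'id':
        return f'#{value}'
    if by == 'class name':
        return f'.{value}'
    if by == 'css selector':
        return value
    # Link-text lookups match on tag content, which CSS selectors cannot express
    raise ValueError(f'no CSS equivalent for locator strategy: {by!r}')


print(as_css_selector('id', 'PageNext'))      # prints #PageNext
print(as_css_selector('class name', 'next'))  # prints .next
```

So b.find_element(By.CLASS_NAME, 'next') and b.find_element(By.CSS_SELECTOR, '.next') should locate the same tag.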

2. Operating on tags

1) Type text into an input box: input_tag.send_keys('text')
2) Click a tag: tag_object.click()

III. Fetching data from CNKI (中国知网)

from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By
import time
from bs4 import BeautifulSoup


def analysis_data(html):
    soup = BeautifulSoup(html, 'lxml')
    digest = soup.select_one('#ChDivSummary')
    # Skip detail pages that have no abstract block
    if digest is not None:
        print(digest.text)


def get_net_data():
    # 1. Create the browser
    b = Chrome()
    # 2. Open CNKI
    b.get('https://www.cnki.net/')
    # 3. Find the search box and type "数据分析" (data analysis); '\n' submits the search
    search = b.find_element(By.ID, 'txt_SearchText')
    search.send_keys('数据分析\n')
    time.sleep(1)
    for _ in range(3):
        # 4. Get the title tags of all papers in the search results
        titles = b.find_elements(By.CLASS_NAME, 'fz14')
        for x in titles:
            # Click one search result
            x.click()
            time.sleep(1)
            # Switch tabs so the browser object points at the detail page
            b.switch_to.window(b.window_handles[-1])
            # Get the detail page data and parse it
            # print(b.page_source)
            analysis_data(b.page_source)
            # Close the current window
            b.close()
            # Switch back to the first tab
            b.switch_to.window(b.window_handles[0])
        print('-------------------- one page of data fetched --------------------------')
        b.find_element(By.ID, 'PageNext').click()
        time.sleep(4)
    # Keep the browser open until Enter is pressed
    input()


if __name__ == '__main__':
    get_net_data()
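The switch, scrape, close, switch-back sequence in the inner loop can be wrapped so that the tab is closed even if parsing raises. This is a sketch of one way to do it; the name newest_tab is hypothetical. It relies only on driver attributes that selenium provides (window_handles, current_window_handle, switch_to.window, close) and assumes the click has opened exactly one new tab.

```python
from contextlib import contextmanager


@contextmanager
def newest_tab(driver):
    """Point the driver at the most recently opened tab, then close it and go back."""
    original = driver.current_window_handle
    newest = driver.window_handles[-1]
    if newest == original:
        # Nothing new was opened; closing here would close the results page
        raise RuntimeError('no new tab was opened')
    driver.switch_to.window(newest)
    try:
        yield driver
    finally:
        driver.close()                     # close the detail tab
        driver.switch_to.window(original)  # return to the results tab
```

Inside the loop this would read: x.click(), a short sleep, then "with newest_tab(b): analysis_data(b.page_source)".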

IV. Scrolling

from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By
import time

b = Chrome()
b.get('https://search.jd.com/Search?keyword=%E7%94%B5%E9%A5%AD%E9%94%85&enc=utf-8&wq=%E7%94%B5%E9%A5%AD%E9%94%85&pvid=058303d3cd58499fb8f5f3459afd4d6b')
time.sleep(2)

# ----------------------- Scroll the browser from code --------------------------
# JavaScript for scrolling the page: window.scrollBy(x offset, y offset)
# b.execute_script('window.scrollBy(0, 8000)')
for x in range(10):
    # Scroll in small steps so lazily loaded items have time to appear
    b.execute_script('window.scrollBy(0, 800)')
    time.sleep(1)

time.sleep(2)
result = b.find_elements(By.CSS_SELECTOR, '#J_goodsList>ul>li')
print(len(result))

input('End:')
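Ten scrolls of 800 pixels is a guess about how long the page is. A possible alternative, sketched below, is to keep scrolling until the number of loaded items stops growing. The function scroll_until_stable and its parameters are hypothetical; the scroll action and the item counter are passed in as callables so the loop itself does not depend on selenium. Treating one unchanged count as "fully loaded" is an assumption and may stop too early on a slow connection.

```python
import time


def scroll_until_stable(scroll, count_items, max_rounds=10, pause=1.0):
    """Scroll until the item count stops growing or max_rounds is reached."""
    last = count_items()
    for _ in range(max_rounds):
        scroll()
        time.sleep(pause)  # give lazily loaded items time to appear
        current = count_items()
        if current == last:
            break  # nothing new loaded after this scroll
        last = current
    return last
```

With the browser object above, scroll could be lambda: b.execute_script('window.scrollBy(0, 800)') and count_items could be lambda: len(b.find_elements(By.CSS_SELECTOR, '#J_goodsList>ul>li')).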

V. Homework:

Scrape JD.com (rice cooker) data

from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import csv
import os
import time

b = Chrome()
b.get('https://search.jd.com/Search?keyword=%E7%94%B5%E9%A5%AD%E9%94%85&enc=utf-8&wq=%E7%94%B5%E9%A5%AD%E9%94%85&pvid=20d97125d00a409fb95d2735aeb0a7c6')


def goods_information(writer):
    for page in range(2):
        # Parse the current page; re-parse on every pass so page 2 is not read from page 1's HTML
        soup = BeautifulSoup(b.page_source, 'lxml')
        # Product list of the current page
        li_list = soup.select('#J_goodsList>ul>li')
        # Detail-page links, assumed to be in the same order as li_list
        links = b.find_elements(By.CSS_SELECTOR, '#J_goodsList>ul>li .gl-i-wrap>.p-img>a')
        for x, link in zip(li_list, links):
            price = x.select_one('.gl-i-wrap>.p-price>strong>i').text
            type_name = x.select_one('.gl-i-wrap>.p-name>a>em').text
            commit = x.select_one('.gl-i-wrap>.p-commit>strong>a').text
            shop = x.select_one('.gl-i-wrap>.p-shop>span').text
            writer.writerow({'商品名': type_name, '价格': price, '店铺': shop, '评论': commit})

            # Click through to this product's detail page
            link.click()
            time.sleep(1)
            b.switch_to.window(b.window_handles[-1])
            # Get the detail page data
            # print(b.page_source)
            soup_1 = BeautifulSoup(b.page_source, 'lxml')
            detail_information_list = soup_1.select('.parameter2>li')
            # Collect the other rice-cooker details, given as "name：value" items
            detail_information = []
            for item in detail_information_list:
                de_x = item.text.split('：', 1)
                # dict() needs exactly two parts per entry, so skip items without a separator
                if len(de_x) == 2:
                    detail_information.append(de_x)
            detail_dict = dict(detail_information)
            writer.writerow(detail_dict)

            # First 5 reviews
            # goods_commits_ = b.find_element(By.PARTIAL_LINK_TEXT, '商品评价')
            # goods_commits_.click()
            # input()

            # Close the current window
            b.close()
            # Switch back to the first tab
            b.switch_to.window(b.window_handles[0])

        # Click the next-page button
        next_page_button = b.find_element(By.CSS_SELECTOR, '#J_bottomPage>span>.pn-next>em')
        next_page_button.click()
        time.sleep(2)
        print('-------------------------- next page -------------------------')


if __name__ == '__main__':
    os.makedirs('files', exist_ok=True)
    with open('files/京东数据.csv', 'w', encoding='utf-8', newline='') as f:
        # Column names match the labels on the JD pages; extrasaction='ignore' drops
        # detail fields that are not listed here instead of raising ValueError
        w2 = csv.DictWriter(
            f,
            ['商品名', '价格', '店铺', '评论', '商品名称', '商品编号', '商品毛重', '商品产地',
             '操控方式', '容量', '适用人数', '加热方式', '功能', '货号'],
            extrasaction='ignore',
        )
        w2.writeheader()
        # w2.writerow(['商品名', '价格', '店铺', '评论'])
        goods_information(w2)
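The detail step of the homework turns a list of "name：value" strings into a dict and writes it with csv.DictWriter. The sketch below isolates that step so it can be tried without a browser. The helper parse_details and the sample strings are hypothetical and are not real JD data. It writes to an in-memory buffer, and it uses extrasaction='ignore' so that a field missing from the header is dropped rather than raising ValueError.

```python
import csv
import io


def parse_details(lines):
    """Turn 'name：value' strings into a dict, skipping lines without a separator."""
    details = {}
    for line in lines:
        # partition splits on the first full-width colon only
        name, sep, value = line.partition('：')
        if sep:
            details[name.strip()] = value.strip()
    return details


# Hypothetical sample items, standing in for the text of the '.parameter2>li' tags
sample = ['商品名称：示例电饭锅', '容量：4L', '颜色：白色', '无分隔符']

buffer = io.StringIO()
writer = csv.DictWriter(buffer, ['商品名称', '容量'], extrasaction='ignore')
writer.writeheader()
writer.writerow(parse_details(sample))
print(buffer.getvalue())
```

The item '颜色：白色' is parsed but not written because '颜色' is not in the header, and '无分隔符' is skipped because it has no separator.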