失效链接处理 |
融合ChatGPT的智能化 Selenium|络爬虫设计与实?PDF 下蝲
相关截图Q?/strong>
![]() 主要内容Q?/strong>
2.2 自动化爬虫系l的设计
下面是该pȝ实现自动化爬虫功能的详细步骤?/span>
相应?nbsp;Python 代码?/span>
导入E序中所用到?nbsp;Python 标准库以及第三方
库代码说?/span>Q?/span>
SeleniumQ?/span>用于自动化浏览器操作Q?/span>可以模拟?/span>
户在览器中的各U行?/span>Q?/span>如点?/span>?/span>输入{?/span>Q?/span>常用?/span>
爬虫?/span>试和自动化d?/span>
SSLQ?/span>用于处理 SSL 证书Q?/span>通过 ssl._create_default_
https_context = ssl._create_unverified_context 解决 SSL
证书问题报错?/span>
keyboardQ?/span>用于监听热键Q?/span>实现功能函数的实?/span>
调用?/span>
threadingQ?/span>用于创徏U程Q?/span>实现多线E执?/span>?/span>
reQ?/span>Python 的正则表辑ּ?/span>Q?/span>用于字符串的模式
匚w和处?/span>?/span>
bs4Q?/span>BeautifulSoupQ:用于解析 HTML ?nbsp;XML
文档Q?/span>方便C|页中提取数?/span>?/span>
timeQ?/span>用于旉相关的操?/span>Q?/span>比如{待?/span>计时{?/span>?/span>
undetected_chromedriverQ?/span>是对 selenium 的扩?/span>Q?/span>
用于l过自动化试的脚本而运?/span>Chrome览?/span>?/span>
KeysQ?/span>Selenium 中的模块Q?/span>用于模拟键盘按键?/span>
WebDriverWait ?nbsp;expected_conditionsQ?/span>selenium
中的模块Q?/span>用于{待面元素加蝲?/span>
sysQ?/span>Python 标准?/span>Q?/span>提供?nbsp;Python q行时环?/span>
的访?/span>?/span>
atexitQ?/span>用于注册在程序退Z前执行的函数?/span>
coloramaQ?/span>一个用于在l端输出中添加颜色的?/span>Q?/span>
可以让输出更加丰富和醒目
库的导入部分代码如图 2 所C?/span>?/span>
?2 库的导入
??nbsp;Python ?nbsp;Selenium 库扩?nbsp;undetected_
chromedriver 来启?nbsp;Chrome 览?/span>Q?/span>q监听热?/span>Q?/span>F8
?/span>Q?/span>来触发一个功能回调函?/span>Q?/span>hotkey_callbackQ?/span>
同时Q?/span>注册一个在E序退出时关闭览器的回调函数
Q?/span>close_browserQ?/span>
代码说明Q?/span>使用 Python ?nbsp;Selenium 库的扩展
undetected_chromedriver 来启?nbsp;Chrome 览器以?/span>
现绕q爬虫目标网늚“反爬虫机?/span>”[6]?/span>
通过 driver.get() Ҏ(gu)D到百度网?/span>Q?/span>https://
www.baidu.comQ?/span>方便用户操作?/span>
使用 keyboard ?/span>Q?/span>前提是已l导?nbsp;keyboard ?/span>Q?/span>
监听热键 F8Q?/span>q在按下 F8 时触?nbsp;hotkey_callback ?/span>
调函?/span>Q?/span>主要功能函数Q?/span>获取?/span>解析|页源码以及
ChatGPT 交互{功能都包含其中Q?/span>下文详细介绍Q?/span>
创徏 close_browser 函数用于关闭览?/span>?/span>在程?/span>
退出时Q?/span>通过 atexit.register() Ҏ(gu)注册 close_browser
函数Q?/span>保在程序退出前关闭览?/span>?/span>
输出提示信息Q?/span>告知用户?nbsp;F8 键开始工?/span>Q?/span>q?/span>
提醒不要手动关闭E序的窗?/span>Q?/span>因ؓ览器会在需?/span>
时自动退?/span>?/span>
创徏字典Q?/span>命名?nbsp;window_dictQ?/span>来存储已打开
标签늚标题和句?/span>Q?/span>方便标签늚控制切换?/span>
接下来是一?nbsp;while 循环Q?/span>该@环会持箋q行Q?/span>?/span>
臛_调函数因错误中断Q,用于监测新标{N的操?/span>?/span>
跛_循环?/span>Q?/span>代码通过 time.sleep(1) {待 1 U?/span>Q?/span>
然后调用 sys.exit() 来退出程?/span>?/span>
具体代码如图 3 所C?/span>?/span>
|