I typed in the code exactly as the book shows it, but it keeps raising an error
Posted in: Python Book Q&A
2021-03-20
Book: 《Python网络爬虫从入门到实践》 (Python Web Crawling: From Beginner to Practice), Chapter 5, "The requests module", page 94
import requests
from lxml import etree
import pandas as pd

ip_list = []  # 'ip:port' strings collected from every page


def get_ip(usl, fl):
    # Fetch one listing page and pull the proxy rows out of it
    response = requests.get(usl, headers=fl)
    response.encoding = 'utf-8'
    if response.status_code == 200:
        html = etree.HTML(response.text)
        li_all = html.xpath('//li[@class="f-list col-lg-12 col-md-12 col-sm-12 col-xs-12"]')
        for j in li_all:
            # xpath() returns a list of text nodes here, not a single string
            ip = j.xpath('span[@class="f-address"]/text()')
            port = j.xpath('span[@class="f-port"]/text()')
            # list + str raises TypeError; this is most likely the reported error
            ip_list.append(ip + ':' + port)
            print('Proxy IP:', ip, 'Port:', port)


headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                         'AppleWebKit/537.36 (KHTML, like Gecko) '
                         'Chrome/89.0.4389.82 Safari/537.36'}

if __name__ == '__main__':
    ip_table = pd.DataFrame(columns=['ip'])
    for i in range(1, 5):
        url = 'https://www.dieniao.com/FreeProxy/{page}.html'.format(page=i)
        get_ip(url, headers)
    ip_table['ip'] = ip_list
    ip_table.to_excel('ip.xlsx', sheet_name='data')

Could someone please help me sort this out?
Edited on 2021-03-20 09:10:45
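The post does not include a traceback, so this is only a guess at the cause. In lxml, an `xpath()` query that ends in `text()` returns a list of strings, so the expression `ip + ':' + port` tries to add a list to a string and fails. The sketch below reproduces that with hand-written lists standing in for the `xpath()` results, and shows one possible fix. The helper name `join_proxy` and the sample address are made up for illustration and are not from the book.

```python
def join_proxy(ip_parts, port_parts):
    """Build an 'ip:port' string from two lists of text nodes.

    Both arguments are expected to look like what lxml's xpath() returns
    for a text() query: a list of strings, possibly empty.
    Returns None when either list is empty, so the caller can skip the row.
    """
    if not ip_parts or not port_parts:
        return None
    # Take the first text node of each list and trim surrounding whitespace
    return ip_parts[0].strip() + ':' + port_parts[0].strip()


# Hypothetical stand-ins for the xpath() results of a single <li> row
ip = [' 203.0.113.5 ']
port = ['8080']

try:
    ip + ':' + port  # the expression used in the posted code
except TypeError as exc:
    print('original expression fails:', exc)

print(join_proxy(ip, port))  # 203.0.113.5:8080
```

Inside the `for j in li_all:` loop, the `append` call could then become `proxy = join_proxy(ip, port)` followed by `if proxy: ip_list.append(proxy)`. If the script still fails after that, the next things to check are whether the page's class names still match the XPath and whether `openpyxl` is installed, since `to_excel` needs it to write `.xlsx` files.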








