The first attempt did not succeed: no price information could be found on the page.
Information about a single stock
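As the figure shows, the stock codes sit in the href of the corresponding a tags, so a regular expression can pick them out. Note that r'[zh|sh]\d{6}' does not do this: square brackets define a character class, which matches only a single character. The pattern should be r's[hz]\d{6}', which the code further down writes as r'[s][hz]\d{6}'.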
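To make the difference between the two patterns concrete, here is a minimal sketch that runs both against a few href values. The sample URLs are invented for illustration and are not taken from the real page.

```python
import re

# "s", then "h" or "z", then six digits
STOCK_CODE = re.compile(r's[hz]\d{6}')

# Hypothetical href values, made up for this example
hrefs = [
    'http://quote.eastmoney.com/sh600000.html',
    'http://quote.eastmoney.com/sz000001.html',
    'http://quote.eastmoney.com/center/index.html',
]

codes = []
for href in hrefs:
    match = STOCK_CODE.search(href)
    if match:  # hrefs without a stock code are simply skipped
        codes.append(match.group())

print(codes)  # ['sh600000', 'sz000001']

# The character-class version matches only ONE leading character,
# so the exchange prefix is cut short:
broken = re.findall(r'[zh|sh]\d{6}', 'sh600000')
print(broken)  # ['h600000']
```

The second result shows why the bracketed form fails: the class matches a single character out of z, h, | and s, so the leading "s" is lost.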
    import re
    import traceback  # used later by getStockInfo for traceback.print_exc()

    import requests
    from bs4 import BeautifulSoup


    # Fetch the HTML text of a page
    def getHTMLText(url):
        return ''


    # Collect the codes of all stocks
    def getStockList(lst, stockURL):
        return ''


    # Fetch the price information of every stock
    def getStockInfo(lst, stockURL, fpath):
        return ''


    def main():
        stock_list_url = 'http://quote.eastmoney.com/stocklist.html'  # page that lists all stock codes
        stock_info_url = 'https://gupiao.baidu.com/stock/'  # base URL of the per-stock price pages
        output_file = 'D://SpiderTest//BaiduStockInfo.txt'  # path of the output file
        slist = []
        getStockList(slist, stock_list_url)  # the list page, not the info page
        getStockInfo(slist, stock_info_url, output_file)


    main()
Step 2: fill in each function, one at a time
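    def getHTMLText(url):
        try:
            r = requests.get(url)
            r.raise_for_status()  # raise an exception on 4xx/5xx responses
            r.encoding = r.apparent_encoding  # use the encoding detected from the content
            return r.text
        except requests.RequestException:
            return ''  # an empty string signals "nothing fetched" to the caller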
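There is a pattern worth learning here: collect every a tag first, then screen them one by one. If a tag's href contains a stock code, append the code to the list; otherwise skip to the next iteration of the loop. The try/except statement does the skipping, so it is worth getting comfortable with.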
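The collect-then-filter idea can be tried out without any network access. The sketch below is only an illustration: it swaps BeautifulSoup for the standard library's html.parser so that it runs with no third-party packages, and the HTML fragment and the AnchorCollector class are made up for the example.

```python
import re
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects the attribute dict of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.anchors.append(dict(attrs))


# Hypothetical HTML fragment, invented for this example
sample_html = '''
<a href="http://quote.eastmoney.com/sh600000.html">A</a>
<a name="top">no href here</a>
<a href="http://quote.eastmoney.com/center/index.html">B</a>
<a href="http://quote.eastmoney.com/sz000001.html">C</a>
'''

collector = AnchorCollector()
collector.feed(sample_html)

codes = []
for attrs in collector.anchors:
    try:
        href = attrs['href']  # KeyError when the tag has no href
        # [0] raises IndexError when the href holds no stock code
        codes.append(re.findall(r's[hz]\d{6}', href)[0])
    except (KeyError, IndexError):
        continue  # not a stock link, move on to the next tag

print(codes)  # ['sh600000', 'sz000001']
```

Catching only KeyError and IndexError, rather than using a bare except, keeps the two expected failure cases quiet while still letting unrelated bugs surface.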
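    def getStockList(lst, stockURL):
        html = getHTMLText(stockURL)
        soup = BeautifulSoup(html, 'html.parser')
        a = soup.find_all('a')  # every <a> tag on the list page
        for i in a:
            try:
                href = i.attrs['href']  # KeyError when the tag has no href
                # append the code itself, not the list returned by findall;
                # [0] raises IndexError when the href holds no stock code
                lst.append(re.findall(r'[s][hz]\d{6}', href)[0])
            except (KeyError, IndexError):
                continue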
    import traceback  # required by traceback.print_exc() below


    def getStockInfo(lst, stockURL, fpath):
        count = 0
        for stock in lst:  # iterate over every stock code
            url = stockURL + stock + '.html'  # build the URL of this stock's page
            html = getHTMLText(url)
            try:
                if html == '':  # nothing was fetched, so skip this stock
                    continue
                infoDict = {}  # holds the field name -> value pairs for this stock
                soup = BeautifulSoup(html, 'html.parser')
                # locate the enclosing block by tag name plus class attribute
                stockInfo = soup.find('div', attrs={'class': 'stock-bets'})
                name = stockInfo.find_all(attrs={'class': 'bets-name'})[0]
                infoDict.update({'股票名称': name.text.split()[0]})  # key means "stock name"
                keyList = stockInfo.find_all('dt')
                valueList = stockInfo.find_all('dd')
                for i in range(len(keyList)):
                    key = keyList[i].text
                    val = valueList[i].text
                    infoDict[key] = val
                with open(fpath, 'a', encoding='utf-8') as f:
                    f.write(str(infoDict) + '\n')
            except:
                traceback.print_exc()
                continue
    # locate the enclosing block by tag name plus class attribute
    stockInfo = soup.find('div', attrs={'class': 'stock-bets'})

    name = stockInfo.find_all(attrs={'class': 'bets-name'})[0]
    infoDict.update({'股票名称': name.text.split()[0]})

    keyList = stockInfo.find_all('dt')
    valueList = stockInfo.find_all('dd')
    for i in range(len(keyList)):
        key = keyList[i].text
        val = valueList[i].text
        infoDict[key] = val

    with open(fpath, 'a', encoding='utf-8') as f:
        f.write(str(infoDict) + '\n')

    except:
        traceback.print_exc()
        continue
    count = count + 1
    print('\r Current progress: {:.2f}%'.format(count * 100 / len(lst)), end="")
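'\r' moves the output cursor back to the start of the line, so each new message overwrites the previous one. Passing end="" stops print from appending a newline, which would otherwise push the next message onto a fresh line.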
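The effect is easiest to see in a terminal. Here is a small sketch, assuming a made-up total of five items and a short sleep standing in for the real download work; progress_line is a helper name introduced only for this example.

```python
import time


def progress_line(done, total):
    """Build one progress message that starts with a carriage return."""
    return '\rCurrent progress: {:.2f}%'.format(done * 100 / total)


total = 5  # hypothetical number of stocks to process
for done in range(1, total + 1):
    time.sleep(0.1)  # stand-in for fetching and parsing one page
    # end='' keeps the cursor on the same line; flush=True shows the text at once
    print(progress_line(done, total), end='', flush=True)

print()  # finish with a newline once the loop is done
```

Run in a terminal, this shows a single line that counts up from 20.00% to 100.00%. Some IDE consoles do not honour '\r' and print every message one after another instead.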