Encoding Problems in Output

Problem Description

A fellow student ran into garbled text while scraping the page http://www.ikanchai.com/, as shown below:


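import requests
from bs4 import BeautifulSoup as bs

# Fetch the page; requests guesses the text encoding from the HTTP headers
web_data = requests.get('http://www.ikanchai.com/')
soup = bs(web_data.text, 'lxml')
# Select the links in the channel navigation bar
dates = soup.select('div.sort.channel.clearfix > ul > li > a')

for data in dates:
    print(data)
-----Output-----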
<a href="http://www.ikanchai.com/view/">观点</a>
<a href="http://www.ikanchai.com/start/">创投</a>
<a href="http://www.ikanchai.com/evaluation/">评测</a>
<a href="http://www.ikanchai.com/vr/">VR</a>
<a href="http://www.ikanchai.com/push/">ä¸“æ </a>
<a href="http://app.ikanchai.com/roll.php">动态</a>

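This is an encoding problem. requests picks the encoding for web_data.text from the HTTP response headers, and when the Content-Type header carries no charset it falls back to ISO-8859-1, which is most likely what happened here. Adding the line web_data.encoding = 'utf-8' before reading web_data.text fixes it. The correct encoding to use is declared near the top of the HTML document.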
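To see why the wrong codec produces this kind of garbage, here is a minimal offline sketch. It assumes the page bytes are UTF-8 and that they were decoded as ISO-8859-1; the variable names are made up for illustration, and no network access is needed.

```python
# The text as it should appear, and the raw bytes a server would send for it.
original = "专栏"
raw_bytes = original.encode("utf-8")

# Decoding UTF-8 bytes as ISO-8859-1 maps every single byte to one character,
# so each 3-byte Chinese character becomes 3 unrelated characters (mojibake).
garbled = raw_bytes.decode("iso-8859-1")
print(repr(garbled))

# Decoding the same bytes with the right codec restores the text.
restored = raw_bytes.decode("utf-8")
print(restored)
```

The bytes never change; only the codec used to interpret them does. That is why setting the encoding on the response is enough and the page does not need to be downloaded again.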


Solution

The corrected code:

import requests
from bs4 import BeautifulSoup as bs

web_data = requests.get('http://www.ikanchai.com/')
# Override the guessed encoding before reading .text
web_data.encoding = 'utf-8'
soup = bs(web_data.text, 'lxml')
dates = soup.select('div.sort.channel.clearfix > ul > li > a')

for data in dates:
    print(data)

-----Output-----
<a href="http://www.ikanchai.com/view/">观点</a>
<a href="http://www.ikanchai.com/start/">创投</a>
<a href="http://www.ikanchai.com/evaluation/">评测</a>
<a href="http://www.ikanchai.com/vr/">VR</a>
<a href="http://www.ikanchai.com/push/">专栏</a>
<a href="http://app.ikanchai.com/roll.php">动态</a>
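Hard-coding 'utf-8' works for this site, but other pages may declare a different encoding. Since the encoding is stated near the top of the HTML document, one option is to read it from there. The following is only a rough sketch: declared_charset is a hypothetical helper, not part of requests or BeautifulSoup, and the regular expression handles only the common meta tag forms.

```python
import re

# Hypothetical helper (not a library function): matches both
# <meta charset="utf-8"> and
# <meta http-equiv="Content-Type" content="text/html; charset=gb2312">
META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([\w-]+)', re.IGNORECASE)


def declared_charset(raw_html: bytes, default: str = "utf-8") -> str:
    """Return the charset named in a <meta> tag near the top of the page."""
    # The declaration is expected early in the document, so only scan the start.
    match = META_CHARSET.search(raw_html[:2048])
    if match is None:
        return default
    return match.group(1).decode("ascii").lower()


sample = b'<html><head><meta charset="UTF-8"><title>demo</title></head></html>'
print(declared_charset(sample))
```

With the script above, this could be used as web_data.encoding = declared_charset(web_data.content), because web_data.content holds the undecoded bytes. requests also offers web_data.apparent_encoding, which guesses the encoding from the content itself.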
