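Python Crawler Problems and Solutions (1)

Problems I ran into while writing crawlers, and how to solve them:

Garbled Chinese text (mojibake):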

from bs4 import BeautifulSoup
import requests

url = 'http://www.biqu6.com/1_1257/'
# Send a browser-like User-Agent with the request
uaheader = {'User-Agent': 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50'}
html = requests.get(url, headers=uaheader)
# Encoding of the target page; requests uses it when decoding html.text
html.encoding = 'utf-8'
# Pass the response body (html.content), not the Response object itself
bsob = BeautifulSoup(html.content, 'lxml')
print(bsob)

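As in the code above, add the line html.encoding = 'utf-8', where utf-8 is the encoding of the page being crawled. This setting controls how requests decodes html.text; html.content is the raw, undecoded bytes, and when given bytes BeautifulSoup detects the encoding on its own.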
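To show why the encoding matters, here is a minimal offline sketch (no network access). The sample HTML and the sniff_charset helper are hypothetical and only for illustration. It decodes the same bytes with a wrong and a right codec, then reads the charset declared in the page's meta tag, which is one way to find out what value to assign to html.encoding. ISO-8859-1 is used as the wrong codec because requests may fall back to it when the server's Content-Type header declares no charset.

```python
import re

# Bytes as a server might send them: UTF-8 encoded Chinese text (sample data).
raw = '<html><head><meta charset="utf-8"></head><body>第一章</body></html>'.encode('utf-8')

# Decoding with the wrong codec does not fail; it silently produces mojibake.
garbled = raw.decode('iso-8859-1')
# Decoding with the page's real encoding restores the text.
correct = raw.decode('utf-8')


def sniff_charset(body: bytes, default: str = 'utf-8') -> str:
    """Hypothetical helper: return the charset declared in a <meta> tag, if any."""
    match = re.search(rb'charset=["\']?([\w-]+)', body[:2048], re.IGNORECASE)
    return match.group(1).decode('ascii').lower() if match else default


print('第一章' in garbled)   # False
print('第一章' in correct)   # True
print(sniff_charset(raw))    # utf-8
```

If the declared charset turns out to be something other than utf-8 (gbk is common on Chinese sites), that value is what should be assigned to html.encoding.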

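However, you may then hit the error TypeError: object of type 'Response' has no len(). It appears when the Response object itself is passed to BeautifulSoup, which expects a string or bytes.

To fix it, pass html.content instead of html, as in the line bsob = BeautifulSoup(html.content, 'lxml') in the code above.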
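The error message suggests that len() was called on the response object. The sketch below reproduces that situation without any network access or third-party library. FakeResponse is a hypothetical stand-in for requests.Response, and extract_markup is a hypothetical guard, not part of requests or bs4. It shows one way to make sure only str or bytes ever reaches the parser.

```python
class FakeResponse:
    """Hypothetical stand-in for requests.Response, used only for illustration."""

    def __init__(self, content: bytes):
        self.content = content  # raw body bytes, like Response.content


def extract_markup(obj):
    """Hypothetical guard: return something a parser can accept (str or bytes)."""
    if isinstance(obj, (str, bytes)):
        return obj
    if hasattr(obj, 'content'):
        return obj.content
    raise TypeError(f'cannot extract markup from {type(obj).__name__}')


resp = FakeResponse('<p>第一章</p>'.encode('utf-8'))

try:
    len(resp)  # the response object itself has no length
except TypeError as exc:
    error_message = str(exc)
    print(error_message)  # object of type 'FakeResponse' has no len()

markup = extract_markup(resp)  # the body bytes, safe to hand to a parser
print(markup.decode('utf-8'))  # <p>第一章</p>
```

With the real libraries, the corresponding call would be BeautifulSoup(extract_markup(html), 'lxml'), which for a Response is the same as passing html.content directly.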