
Some modules you need when working with web pages

The urlparse module splits a URL into its components

from urlparse import urlparse


print urlparse("https://www.youtube.com/results?sp=CAM%253D&q=metasploitable+2")

Result:

ParseResult(scheme='https', netloc='www.youtube.com', path='/results', params='', query='sp=CAM%253D&q=metasploitable+2', fragment='')
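The fields of the ParseResult can also be read by name, and the query string can be broken down further. Here is a minimal sketch of that, using the same URL as above. The use of parse_qs and the try/except import are additions not found in the original snippet; the fallback import is there so the sketch also runs on Python 3, where the urlparse module was renamed to urllib.parse.

```python
try:
    from urlparse import urlparse, parse_qs  # Python 2
except ImportError:
    from urllib.parse import urlparse, parse_qs  # Python 3

url = "https://www.youtube.com/results?sp=CAM%253D&q=metasploitable+2"
parts = urlparse(url)

# Each component is available as a named attribute.
print(parts.scheme)
print(parts.netloc)
print(parts.path)

# parse_qs turns the query string into a dict of lists and decodes
# percent-escapes and "+" signs along the way.
query = parse_qs(parts.query)
print(query["q"])
print(query["sp"])
```

Every value is a list because a parameter may appear more than once in a query string.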

urllib2

The urllib2 module handles opening URLs

import urllib2


f = urllib2.urlopen("http://cuiqingcai.com/1319.html")
print f.getcode() # HTTP status code
print f.geturl() # URL the page was actually fetched from (after redirects)
print f.info() # meta information (response headers)
print f.read() # page content
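urlopen raises an exception when a request fails, so real scripts usually wrap the call. The following is only a sketch of one way to do that: the helper name fetch and its return convention are made up for illustration, and the fallback import is there so it also runs on Python 3, where urllib2 was split into urllib.request and urllib.error.

```python
try:
    import urllib2 as request_mod              # Python 2
    from urllib2 import HTTPError, URLError
except ImportError:
    import urllib.request as request_mod       # Python 3
    from urllib.error import HTTPError, URLError


def fetch(url, timeout=10):
    """Hypothetical helper: return (status, text_or_body).

    status is None when no HTTP response was received at all.
    """
    try:
        f = request_mod.urlopen(url, timeout=timeout)
    except HTTPError as e:
        # The server answered, but with an error status such as 404 or 500.
        # HTTPError is a subclass of URLError, so it must be caught first.
        return e.code, str(e)
    except URLError as e:
        # No usable response: DNS failure, refused connection, unknown scheme...
        return None, str(e.reason)
    try:
        return f.getcode(), f.read()
    finally:
        f.close()


# An unsupported scheme fails without touching the network.
print(fetch("nosuchscheme://example.invalid/"))
```

Calling fetch("http://cuiqingcai.com/1319.html") would exercise the success path, provided the site is reachable.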

Some examples

HTTP authentication

import urllib2
# Create an OpenerDirector with support for Basic HTTP Authentication...
auth_handler = urllib2.HTTPBasicAuthHandler()
auth_handler.add_password(realm='PDQ Application',
                          uri='https://mahler:8092/site-updates.py',
                          user='klem',
                          passwd='kadidd!ehopper')
opener = urllib2.build_opener(auth_handler)
# ...and install it globally so it can be used with urlopen.
urllib2.install_opener(opener)
urllib2.urlopen('http://www.example.com/login.html')
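HTTPBasicAuthHandler only sends the credentials after the server answers with a 401 challenge for a matching realm. To see what ends up on the wire, here is a rough sketch that builds the Basic Authorization header by hand and attaches it to a request up front. The helper name basic_auth_header is hypothetical, the credentials are the sample ones from above, and nothing is actually sent.

```python
import base64

try:
    from urllib2 import Request          # Python 2
except ImportError:
    from urllib.request import Request   # Python 3


def basic_auth_header(user, passwd):
    """Hypothetical helper: build the value of a Basic Authorization header."""
    raw = ("%s:%s" % (user, passwd)).encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


req = Request("http://www.example.com/login.html")
req.add_header("Authorization", basic_auth_header("klem", "kadidd!ehopper"))
print(req.get_header("Authorization"))
```

Base64 is an encoding, not encryption, so Basic authentication is only reasonable over HTTPS.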

Adding HTTP headers

import urllib2
req = urllib2.Request('http://www.example.com/')
req.add_header('Referer', 'http://www.python.org/')
# Customize the default User-Agent header value:
req.add_header('User-Agent', 'urllib-example/0.1 (Contact: . . .)')
r = urllib2.urlopen(req)
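The headers can be checked on the Request object before anything is sent. A small sketch, assuming the same URL and headers as above with a shortened User-Agent value; the fallback import is only there so it also runs on Python 3.

```python
try:
    from urllib2 import Request          # Python 2
except ImportError:
    from urllib.request import Request   # Python 3

req = Request("http://www.example.com/")
req.add_header("Referer", "http://www.python.org/")
req.add_header("User-Agent", "urllib-example/0.1")

# add_header normalizes keys with str.capitalize(), so "User-Agent"
# is stored as "User-agent".
print(sorted(req.header_items()))
print(req.get_header("User-agent"))

# get_header does not normalize its argument, so this lookup misses.
print(req.get_header("User-Agent"))
```

The capitalization only affects lookups on the Request object; HTTP header names themselves are case-insensitive.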

Beautiful Soup

The Beautiful Soup library is used to parse web pages

Reference: http://cuiqingcai.com/1319.html
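As a quick illustration, here is a minimal sketch that pulls the title and the links out of a small page. It assumes the third-party beautifulsoup4 package (imported as bs4) is installed; the HTML string is invented for the example and stands in for whatever urlopen(...).read() would return.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Made-up sample page standing in for a downloaded document.
html = """<html><head><title>Sample page</title></head>
<body>
<a href="http://www.example.com/a">First</a>
<a href="http://www.example.com/b">Second</a>
</body></html>"""

# "html.parser" selects the parser from the standard library,
# so no extra parser package is required.
soup = BeautifulSoup(html, "html.parser")

title = soup.title.string
links = [a.get("href") for a in soup.find_all("a")]

print(title)
print(links)
```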
