A .NET Web Crawler

Preparing the third-party library

HtmlAgilityPack is available on NuGet. Install the package and reference it in your project.

Code steps

Fetch the page's HTML from the target site

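// Requires: using System.IO; using System.Net; using System.Text;
// On .NET Core / .NET 5+, call Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) first so "gb2312" resolves.
WebRequest request = WebRequest.Create("http://www.hnzbcg.com.cn/hnzbcg/cgxx/cggg/A080302index_1.htm");
string s;
// Decode the response explicitly as GB2312; the using blocks release the connection and the stream.
using (WebResponse response = request.GetResponse())
using (StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.GetEncoding("gb2312")))
    s = reader.ReadToEnd();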
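WebRequest is marked obsolete from .NET 6 onward, so on newer runtimes the same fetch can be written with HttpClient. What follows is only a sketch, assuming a .NET 5+ target; the class name PageFetcher and its two methods are hypothetical helpers, not part of the original code. It downloads the raw bytes and decodes them as GB2312 itself instead of relying on whatever charset the server declares.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper, introduced only for illustration.
public static class PageFetcher
{
    // Decodes raw response bytes as GB2312. Registering the code-pages
    // provider is needed on .NET Core / .NET 5+ and is safe to repeat.
    public static string DecodeGb2312(byte[] bytes)
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding("gb2312").GetString(bytes);
    }

    // Downloads the page as bytes, then decodes them explicitly.
    public static async Task<string> FetchAsync(HttpClient client, string url)
    {
        byte[] bytes = await client.GetByteArrayAsync(url);
        return DecodeGb2312(bytes);
    }
}
```

Called from an async method, `string s = await PageFetcher.FetchAsync(client, url);` yields the same kind of string as the ReadToEnd call above. Reusing one HttpClient instance across requests is generally preferable to creating a new one per page.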

Pass the resulting HTML string to HtmlAgilityPack. For details on using the library, see http://www.cnblogs.com/GmrBrian/p/6201237.html

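// Requires: using System.Collections.Generic; using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(s);
// First <a> element whose class is "GrayLink12" (null if there is none)
HtmlNode div = doc.DocumentNode.SelectSingleNode("//a[@class='GrayLink12']");
// All matching <a> elements; SelectNodes returns null, not an empty collection, when nothing matches
HtmlNodeCollection hrefList = doc.DocumentNode.SelectNodes("//a[@class='GrayLink12']");
// Results collected from the matched links go here
List<string> list = new List<string>();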
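The snippet above stops at an empty list. One plausible next step, shown here only as a sketch, is to fill it with the href of each matched link. This assumes the goal is to collect link addresses, which the original does not state; the class LinkExtractor, the method ExtractLinks, and the baseUri parameter are hypothetical names introduced for illustration. SelectNodes, GetAttributeValue, and Uri.TryCreate are used as they exist in HtmlAgilityPack and the .NET base library.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper, introduced only for illustration.
public static class LinkExtractor
{
    // Returns the absolute URL of every <a class="GrayLink12"> in the HTML.
    public static List<string> ExtractLinks(string html, Uri baseUri)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var list = new List<string>();
        HtmlNodeCollection hrefList =
            doc.DocumentNode.SelectNodes("//a[@class='GrayLink12']");
        if (hrefList == null)
        {
            return list; // SelectNodes yields null when nothing matches
        }

        foreach (HtmlNode a in hrefList)
        {
            // Falls back to the empty string when the href attribute is absent.
            string href = a.GetAttributeValue("href", "");
            if (href.Length == 0)
            {
                continue;
            }

            // Resolve relative links against the page address; skip malformed ones.
            if (Uri.TryCreate(baseUri, href, out Uri absolute))
            {
                list.Add(absolute.AbsoluteUri);
            }
        }
        return list;
    }
}
```

With the page from the earlier step, a call might look like `LinkExtractor.ExtractLinks(s, new Uri("http://www.hnzbcg.com.cn/hnzbcg/cgxx/cggg/A080302index_1.htm"))`. Passing the page address as the base lets relative hrefs resolve to full URLs that can be fetched in turn.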