Scraping Novels with C#

Date: 2019-02-15 18:59:46

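Recently a friend needed to scrape novels from a website, so I looked into how to do it. When scraping comes up, most people think of Python first, but I had never used Python and the schedule was tight, so I went straight to C#.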

The principle is simple: use an HttpWebRequest object to fetch the HTML from the site, then parse it.

 // Requires: using System.Net;
 HttpWebRequest httpReq = (HttpWebRequest)WebRequest.Create(httpURL);
 httpReq.Method = "GET";
 httpReq.ContentType = "text/html;charset=utf-8";

 HttpWebResponse httpResp = (HttpWebResponse)httpReq.GetResponse();
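The original does not show the parsing step. As a minimal sketch only, assuming the chapter text sits inside a single `<div>` with a known `id` (the `ChapterParser` class, the `ExtractText` method, and the `content` id are all hypothetical), the text could be pulled out with regular expressions:

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class ChapterParser
{
    // Hypothetical helper: returns the plain text inside <div id="containerId">...</div>.
    // Assumes the container holds no nested <div>; real pages may need a proper HTML parser.
    public static string ExtractText(string html, string containerId)
    {
        string pattern = "<div[^>]*id=\"" + Regex.Escape(containerId) + "\"[^>]*>(.*?)</div>";
        Match m = Regex.Match(html, pattern, RegexOptions.Singleline | RegexOptions.IgnoreCase);
        if (!m.Success)
        {
            return string.Empty;
        }

        string inner = m.Groups[1].Value;
        // Turn <br> tags into line breaks, then drop any remaining tags.
        inner = Regex.Replace(inner, @"<br\s*/?>", "\n", RegexOptions.IgnoreCase);
        inner = Regex.Replace(inner, "<[^>]+>", string.Empty);
        // Decode entities such as &amp; into their characters.
        return WebUtility.HtmlDecode(inner).Trim();
    }
}
```

Regular expressions are fragile against nested or malformed markup, so this only suits pages with a simple, predictable layout.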


A few problems came up in practice, so I am recording them here.

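1. The returned HTML came back garbled. There are two fixes for this. First, make sure the encoding passed to StreamReader matches the encoding the site actually uses, as follows: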

respStreamReader = new StreamReader(respStream, Encoding.UTF8);
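Hard-coding UTF-8 only works when the site really serves UTF-8. One possible approach, sketched here under the assumption that the server declares its charset correctly, is to take the charset name from the response (for example `httpResp.CharacterSet`) and fall back to a default when it is missing or unknown. The `CharsetHelper` class and `ResolveEncoding` method are hypothetical names for illustration:

```csharp
using System;
using System.Text;

public static class CharsetHelper
{
    // Hypothetical helper: maps a charset name (e.g. from httpResp.CharacterSet)
    // to an Encoding, returning the fallback when the name is empty or unsupported.
    public static Encoding ResolveEncoding(string charset, Encoding fallback)
    {
        if (string.IsNullOrWhiteSpace(charset))
        {
            return fallback;
        }

        try
        {
            // Some servers wrap the charset name in quotes; strip them before the lookup.
            return Encoding.GetEncoding(charset.Trim().Trim('"'));
        }
        catch (ArgumentException)
        {
            // Unknown or unsupported charset name.
            return fallback;
        }
    }
}
```

A reader could then be created with `new StreamReader(respStream, CharsetHelper.ResolveEncoding(httpResp.CharacterSet, Encoding.UTF8))`. On .NET Core and later, legacy encodings such as GBK are only available after calling `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)`; without that, this sketch falls back to the default.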

Second, check whether the server compressed the response stream with gzip. If it did, the stream has to be decompressed before it can be read:

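// Requires: using System; using System.IO; using System.IO.Compression; using System.Text;
string header = httpResp.GetResponseHeader("Content-Encoding");

StreamReader respStreamReader;
if (string.Equals(header, "gzip", StringComparison.OrdinalIgnoreCase))
{
    // The body is gzip-compressed: decompress it before decoding the text
    respStreamReader = new StreamReader(new GZipStream(respStream, CompressionMode.Decompress), Encoding.UTF8);
}
else
{
    // Not compressed: read the stream directly
    respStreamReader = new StreamReader(respStream, Encoding.UTF8);
}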
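The two fixes can be combined into one small routine. The following is a sketch rather than the original author's code: `ResponseBodyReader` and `ReadBody` are hypothetical names, and the method assumes the caller passes in the response stream, the `Content-Encoding` header value, and the text encoding to use.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class ResponseBodyReader
{
    // Hypothetical helper: reads the whole body as text, decompressing first
    // when the Content-Encoding header value is "gzip" (case-insensitive).
    public static string ReadBody(Stream respStream, string contentEncoding, Encoding encoding)
    {
        Stream source = respStream;
        if (string.Equals(contentEncoding, "gzip", StringComparison.OrdinalIgnoreCase))
        {
            source = new GZipStream(respStream, CompressionMode.Decompress);
        }

        // Disposing the reader also disposes the underlying stream(s).
        using (var reader = new StreamReader(source, encoding))
        {
            return reader.ReadToEnd();
        }
    }
}
```

It could be called as `ResponseBodyReader.ReadBody(httpResp.GetResponseStream(), httpResp.GetResponseHeader("Content-Encoding"), Encoding.UTF8)`. As an alternative, setting `httpReq.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate` before sending the request asks the framework to decompress the body itself, which may make the manual check unnecessary.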


Original: https://www.cnblogs.com/dimg/p/10384936.html
