
C# Web Scraping

Date: 2014-07-22 23:09:25
        // Needed at the top of the file:
        // using System.Collections.Generic;
        // using System.Text.RegularExpressions;

        /// <summary>
        /// Returns the distinct values captured by the named group "key",
        /// in order of first appearance.
        /// </summary>
        /// <param name="rex">Regular expression containing a named group called "key"</param>
        /// <param name="urlValue">Input string to search</param>
        /// <returns>Array of unique captured values</returns>
        private string[] rexID(string rex, string urlValue)
        {
            List<string> results = new List<string>();
            HashSet<string> seen = new HashSet<string>();
            Regex r = new Regex(rex, RegexOptions.IgnoreCase);
            foreach (Match m in r.Matches(urlValue))
            {
                // Read the named group straight from the match. The original re-ran the
                // pattern on the matched text without IgnoreCase, which could fail.
                string dataStr = m.Groups["key"].Value;
                // Filter out duplicates. The original compared the full match against the
                // stored "key" values, so duplicates were never detected.
                if (seen.Add(dataStr))
                {
                    results.Add(dataStr);
                }
            }
            return results.ToArray();
        }

Source: http://www.cnblogs.com/vienna/p/3514856.html
