
Python Crawler Study (2): Scraping Qiushibaike Jokes and Storing Them in a MySQL Database

Posted: 2016-08-12 20:00:37
import pymysql
import requests
from bs4 import BeautifulSoup

# Connect to MySQL through pymysql (credentials are the post author's)
conn = pymysql.connect(host='127.0.0.1', unix_socket='/tmp/mysql.sock',
                       user='root', passwd='19950311', db='mysql',
                       charset='utf8')
cur = conn.cursor()
cur.execute("USE scraping")

def store(title, content):
    """Store one joke's title and content."""
    # Leave the %s placeholders unquoted; pymysql escapes and quotes them
    cur.execute("INSERT INTO pages (title, content) VALUES (%s, %s)",
                (title, content))
    cur.connection.commit()

class QiuShi(object):
    def __init__(self, start_url):
        self.url = start_url

    def crawling(self):
        # Fetch the page; return empty bytes if the connection fails
        try:
            html = requests.get(self.url)
            return html.content
        except requests.exceptions.ConnectionError:
            return b''

    def extract(self, html_content):
        if len(html_content) > 0:
            bsobj = BeautifulSoup(html_content, 'lxml')
            # These selectors match the 2016 markup of qiushibaike.com
            jokes = bsobj.find_all('div', {'class': 'article block untagged mb15'})
            for j in jokes:
                title = j.find('h2').text
                content = j.find('div', {'class': 'content'}).string
                if title is not None and content is not None:
                    # The connection charset is utf8, so plain strings store fine
                    store(title, content)
                    print(title, content)
                    print('-' * 78)

    def main(self):
        text = self.crawling()
        self.extract(text)

try:
    qiushi = QiuShi('http://www.qiushibaike.com/')
    qiushi.main()
finally:
    # Close the cursor and the connection
    cur.close()
    conn.close()
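The script assumes that a `scraping` database with a `pages` table already exists; the original post never shows the schema. Below is a minimal one-time setup sketch, again via pymysql. Only the names `scraping`, `pages`, `title`, and `content` come from the INSERT above; the column types, sizes, and the extra `id` and `created` columns are assumptions.

import pymysql

# One-time setup sketch: create the database and table the crawler expects.
# Types and sizes are guesses, not taken from the original post.
conn = pymysql.connect(host='127.0.0.1', unix_socket='/tmp/mysql.sock',
                       user='root', passwd='19950311', charset='utf8')
cur = conn.cursor()
cur.execute("CREATE DATABASE IF NOT EXISTS scraping DEFAULT CHARACTER SET utf8")
cur.execute("""
    CREATE TABLE IF NOT EXISTS scraping.pages (
        id INT NOT NULL AUTO_INCREMENT,
        title VARCHAR(200),
        content TEXT,
        created TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (id)
    ) DEFAULT CHARACTER SET utf8
""")
conn.commit()
cur.close()
conn.close()

After a crawl, the stored rows can be spot-checked with SELECT title FROM scraping.pages LIMIT 5; in the mysql client.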



Original: http://www.cnblogs.com/yunwuzhan/p/5765963.html
