Scraping a WeChat Official Account's Message History with Python

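Prerequisites:

  • The request URL for the official account's message history
  • A cookie

Two ways to obtain the URL and cookie:

  1. Open the official account's message history page in a browser and use the developer tools to read the request URL and cookie
  2. Capture the request URL and cookie with a packet-capture tool such as Fiddler or Charles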
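
Once a request URL and a cookie string have been captured, it can help to take them apart before reusing them. The following is only a minimal sketch using the Python standard library; the helper names `with_offset` and `cookie_to_dict` are made up for illustration, and the URL and cookie in the usage lines are shortened placeholders rather than real captured values.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_offset(url, offset):
    """Return a copy of `url` whose `offset` query parameter is replaced."""
    parts = urlsplit(url)
    # keep_blank_values=True preserves empty parameters such as "pass_ticket="
    params = parse_qsl(parts.query, keep_blank_values=True)
    params = [(k, str(offset) if k == "offset" else v) for k, v in params]
    return urlunsplit(parts._replace(query=urlencode(params)))


def cookie_to_dict(cookie_header):
    """Split a raw Cookie header ("a=1; b=2") into a name -> value dict."""
    result = {}
    for item in cookie_header.split(";"):
        item = item.strip()
        if "=" in item:
            # Split on the first "=" only, since values may contain "="
            name, value = item.split("=", 1)
            result[name] = value
    return result


# Placeholder values (assumptions, not real captured data)
captured_url = "https://mp.weixin.qq.com/mp/profile_ext?action=getmsg&offset=0&count=10&pass_ticket="
print(with_offset(captured_url, 20))
print(cookie_to_dict("lang=zh_CN; devicetype=Windows7"))
```

Note that `urlencode` percent-encodes characters such as `=` inside parameter values, so the rebuilt URL may not be byte-for-byte identical to the captured one even though the parameters are the same.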

The source code is as follows:

# -*- coding: utf-8 -*-
import csv
import json

import jsonpath
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36",
    "Host": "mp.weixin.qq.com",
    "Referer": "https://mp.weixin.qq.com",
    # Set your own cookie here
    "Cookie": "RK=c+zMAuktP8; ptcz=005f33a36542502454b119382853de0d9ea6aa693367c6ae312a1c34c0dcfebe; pgv_pvi=4737076224; ptui_loginuin=1254428526; pgv_pvid=3106019708; wxuin=1411706915; devicetype=Windows7; version=62070152; lang=zh_CN; pass_ticket=FyE/xFBG3nyqQokgb6OoN9VFXaZVJPK53op9NWOsmqB2HZm8CUhy5Hz9+fgVo+PA; wap_sid2=CKPgk6EFElxuY2VlWndUVWJjT1d2YnVzcXMxTk4xcldfQ3hVQUYzUnB1LTVZTDlyQkVCb2ZPZHQ2S3hXbUdsMEJ0VkNVdDBZUmJXUC0wb0ZUb1N6U0JVSlYybHdGZ29FQUFBfjCJsKjuBTgNQJVO",
}

# Request URL template; "offset" selects the page (10 messages per request)
url_template = "https://mp.weixin.qq.com/mp/profile_ext?action=getmsg&__biz=MjM5Mjg3MTIzMQ==&f=json&offset={}&count=10&is_ok=1&scene=124&uin=777&key=777&pass_ticket=&wxtoken=&appmsg_token=1034_N22Qb3TiIEjqcdGLa-1KO9dkAZgO1e2zBcGB5w~~&x5=0&f=json"

for i in range(10):
    # Build the request URL for this page
    url = url_template.format(i * 10)
    response = requests.get(url, headers=headers)
    res = response.json()

    # Parse according to the actual JSON structure:
    # "general_msg_list" is itself a JSON-encoded string
    json_res = json.loads(res["general_msg_list"])
    # jsonpath.jsonpath returns False when nothing matches, so fall back to []
    title_list = jsonpath.jsonpath(json_res, "$..title") or []
    url_list = jsonpath.jsonpath(json_res, "$..content_url") or []

    # Append each title/link pair as one CSV row; csv.writer quotes titles
    # that contain commas, and errors="replace" avoids a crash on characters
    # that GBK cannot encode
    with open("info.csv", "a+", encoding="gbk", errors="replace", newline="") as f:
        writer = csv.writer(f)
        for title, content_url in zip(title_list, url_list):
            writer.writerow([title, content_url])

The code saves the title and link of each history message to a CSV file.
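
The parsing step can be tried offline, without a cookie or a network request. The sketch below uses only the standard library: `find_all` is a hypothetical helper that walks the decoded JSON recursively, similar in spirit to the `$..title` JSONPath query, and `sample_response` is invented data. Only the `general_msg_list`, `title`, and `content_url` keys come from the code above; the `list` and `app_msg_ext_info` nesting is an assumption about the payload and should be checked against a real response.

```python
import json


def find_all(node, key):
    """Recursively collect every value stored under `key` in nested dicts/lists."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            found.extend(find_all(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_all(item, key))
    return found


# Invented sample: the inner message list is a JSON string inside the response
sample_response = {
    "general_msg_list": json.dumps({
        "list": [
            {"app_msg_ext_info": {"title": "First post", "content_url": "https://example.com/1"}},
            {"app_msg_ext_info": {"title": "Second post", "content_url": "https://example.com/2"}},
        ]
    })
}

# Decode the inner JSON string, then pair titles with links
inner = json.loads(sample_response["general_msg_list"])
rows = list(zip(find_all(inner, "title"), find_all(inner, "content_url")))
print(rows)
```

Pairing the two lists with `zip` assumes every message that has a title also has a link; if a real response breaks that assumption, the pairs would be misaligned, so it may be safer to extract both fields from each message object together.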