(八)爬虫之js调试(登陆知乎)

Wesley13
• 阅读 868

  上次爬取网易云音乐,折腾js调试了好久,难受。。。。今天继续练练手,研究下知乎登陆,让痛苦更猛烈些。

1.简单分析

  很容易就发现登陆的url=“https://www.zhihu.com/api/v3/oauth/sign\_in”,post方法提交,需要的请求头和表单数据如下两图,请求头中有一个特殊的x-xsrftoken,表单数据为加密后的一长串字符窜,因此需要构造这两个值即可。

(八)爬虫之js调试(登陆知乎)

 (八)爬虫之js调试(登陆知乎)

2. 获取 x-xsrftoken值

   首先是这个特殊的x-xsrftoken,发现通过访问url="https://www.zhihu.com/",返回的cookies里面能拿到(会自动重定向,需要禁止重定向拿到requests.get(url,headers=headers,allow\_redirects=False)),代码如下:

headers={"User-Agent":"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 Safari/537.36",}def get_xsrf(headers):
    url="https://www.zhihu.com/"
    r = requests.get(url,headers=headers,allow_redirects=False)  #禁止重定向时,cookies里面有xsrf参数
    xsrf = r.cookies["_xsrf"]

3. 构造表单数据

  然后就是表单数据的构造和加密了, 根据前辈们的提示,提交url为url=“https://www.zhihu.com/api/v3/oauth/sign\_in”,一般表单会和末尾部分/oauth/sign\_in有关系,于是在js文件里面搜索sign\_in时,发现了下面的API,通过添加断点,点击登录,然后执行到断点处时,看到了如下的表单数据。变换不同账号试了下,发现只有captcha,signature,timestamp三个值在改变,其他参数不变。很明显,captcha为验证码,timestamp为时间戳(注意是13位),signature也像一个加密值,接下来就是寻找这几个值构造表单了。

表单数据:

(八)爬虫之js调试(登陆知乎)

  3.1 构造signature

    对于signature,在js文件中搜索signature,发现了下面的signature关键字,同样打断点,发现signature是采用hmac对四个数据加密后的结果;加密方  法为sha1,salt值如下,然后加密的参数依次是e=“password”, u="c3cef7c66a1843f8b3a9e6a1e3160e20"(就是clientId), source="com.zhihu.web", n为13为  时间戳。用python实现代码如下:

def get_signature(grantType,clientId,source,timestamp):
    h = hmac.new("d1b964811afb40118a12068ff74a12f4","",hashlib.sha1)
    h.update(grantType+clientId+source+str(timestamp))
    return h.hexdigest()

(八)爬虫之js调试(登陆知乎)

  3.2 构造captcha

    然后是处理验证码captcha参数,发现有三种情况:

    1. 不需要验证码,captcha=""

    2. 请求验证码的url为"https://www.zhihu.com/api/v3/oauth/captcha?lang=cn",返回为汉字图片,需要点击图片中倒立的汉字,captcha为坐标值

               3.请求验证码的url为"https://www.zhihu.com/api/v3/oauth/captcha?lang=en",返回英文字母图片,输入图片中英文字符即可,captcha为英文字符

   验证码的请求和处理流程如下:

    首先向上述两个验证码请求url中任一个发送get请求,如果返回{show_captcha:False},不需要验证码,captcha="",直接返回即可;如果返回{show_captcha:True},则需要验证码,继续向该url发送put请求(需要第一步的cookie),服务器会返回base64编码的验证码图片,利用base64解码写入文件即为验证码图片。打开图片,根据要求输入验证码或点击图片即为captcha的值,这里需要先携带cookie和验证码值,向服务器发送post请求,返回success才表示验证成功。比较特殊的是中文验证码处理,验证码的值为几组坐标值,如下第二张图片所示,可以利用matplotlib.pyplot模块来获取图片点击的坐标值(注意提交结果为实际点击坐标的一半)。

验证码:

(八)爬虫之js调试(登陆知乎)

 验证码结果返回:

(八)爬虫之js调试(登陆知乎)

验证码处理的代码如下:

(八)爬虫之js调试(登陆知乎) (八)爬虫之js调试(登陆知乎)

def get_captcha(lang,headers):
    if lang=="cn":
        api = "https://www.zhihu.com/api/v3/oauth/captcha?lang=cn"
    else:
        api = "https://www.zhihu.com/api/v3/oauth/captcha?lang=en"
        
    ret = requests.get(api,headers=headers)
    cookies = ret.cookies
    show_captcha = re.search("true",ret.text)
    captcha=""
    if show_captcha:
        img_res = requests.put(api,headers=headers,cookies=cookies)  #得带上第一步的cookie,否则返回,{u'code': 120002, u'name': u'ERR_CAPSION_TICKET_NOT_FOUND'}
        img_json = json.loads(img_res.text)
        img_data = img_json["img_base64"].replace("\n","")
        with open("captcha.jpg","wb") as i:
            i.write(base64.b64decode(img_data))
        img = Image.open("captcha.jpg")
        if lang=="cn":
            plt.imshow(img)
            print("点击图片中所有倒立的汉字,在命令行中按回车键提交")
            points = plt.ginput(7) #阻塞点击七次后返回(或者中途点击回车键返回),返回包含坐标组的列表,格式:[(44.661290322580641, 49.951612903225794)]
            captcha = json.dumps({
                "img_size":[200,44],
                "input_points":[[i[0]/2,i[1]/2] for i in points]  #获取的坐标得除2
            })
            
        else:
            img_thread = threading.Thread(target=img.show)
            img_thread.setDaemon(True)
            img_thread.start()
            captcha = raw_input("请输入图片里的验证码:")  #python 2.7
        r = requests.post(api,headers=headers,data={"input_text":captcha},cookies=cookies)  #先提交验证码结果
        print(r.text)
    return captcha,cookies

验证码处理

4. 加密表单数据

  拿到上述表单需要的值后,剩下的就是对表单数据进行加密了,搜索了下encrypt,找到了如下的js代码,通过打断点,看到了下图中e的值,和表单中的参数一模一样,可以确定为加密方法,简单研究了下js代码,实在看不懂。。。。谷歌了下大佬们的解决方案(见文末参考),发现需要将加密方法(28853行大括号截止处,对应function)拷贝出来,利用execjs模块在python中执行js代码即可。需要注意的是,拷贝出来的加密方法是在浏览器中运行的,需要去掉window,document等对象处理成node.js环境下运行的js代码,然后安装node.js,将execjs模块的运行环境设置为node.js即可以运行了。下面为处理后的encrypt代码和python加密方法:

(八)爬虫之js调试(登陆知乎) (八)爬虫之js调试(登陆知乎)

function s(e) {
    return (s = "function" == typeof Symbol && "symbol" == typeof Symbol.t ? function(e) {
        return typeof e
    }
    : function(e) {
        return e && "function" == typeof Symbol && e.constructor === Symbol && e !== Symbol.prototype ? "symbol" : typeof e
    }
    )(e)
}
function i() {}
function h(e) {
    this.s = (2048 & e) >> 11,
    this.i = (1536 & e) >> 9,
    this.h = 511 & e,
    this.A = 511 & e
}
function A(e) {
    this.i = (3072 & e) >> 10,
    this.A = 1023 & e
}
function n(e) {
    this.n = (3072 & e) >> 10,
    this.e = (768 & e) >> 8,
    this.a = (192 & e) >> 6,
    this.s = 63 & e
}
function e(e) {
    this.i = e >> 10 & 3,
    this.h = 1023 & e
}
function a() {}
function c(e) {
    this.n = (3072 & e) >> 10,
    this.e = (768 & e) >> 8,
    this.a = (192 & e) >> 6,
    this.s = 63 & e
}
function o(e) {
    this.A = (4095 & e) >> 2,
    this.s = 3 & e
}
function r(e) {
    this.i = e >> 10 & 3,
    this.h = e >> 2 & 255,
    this.s = 3 & e
}
function k(e) {
    this.s = (4095 & e) >> 10,
    this.i = (1023 & e) >> 8,
    this.h = 1023 & e,
    this.A = 63 & e
}
function B(e) {
    this.s = (4095 & e) >> 10,
    this.n = (1023 & e) >> 8,
    this.e = (255 & e) >> 6
}
function f(e) {
    this.i = (3072 & e) >> 10,
    this.A = 1023 & e
}
function u(e) {
    this.A = 4095 & e
}
function C(e) {
    this.i = (3072 & e) >> 10
}
function b(e) {
    this.A = 4095 & e
}
function g(e) {
    this.s = (3840 & e) >> 8,
    this.i = (192 & e) >> 6,
    this.h = 63 & e
}
function G() {
    this.c = [0, 0, 0, 0],
    this.o = 0,
    this.r = [],
    this.k = [],
    this.B = [],
    this.f = [],
    this.u = [],
    this.C = !1,
    this.b = [],
    this.g = [],
    this.G = !1,
    this.Q = null,
    this.R = null,
    this.w = [],
    this.x = 0,
    this.D = {
        0: i,
        1: h,
        2: A,
        3: n,
        4: e,
        5: a,
        6: c,
        7: o,
        8: r,
        9: k,
        10: B,
        11: f,
        12: u,
        13: C,
        14: b,
        15: g
    }
}
Object.defineProperty(exports, "__esModule", {
    value: !0
});
var t = "1.1"
  , __g = {};
i.prototype.M = function(e) {
    e.G = !1
}
,
h.prototype.M = function(e) {
    switch (this.s) {
    case 0:
        e.c[this.i] = this.h;
        break;
    case 1:
        e.c[this.i] = e.k[this.A]
    }
}
,
A.prototype.M = function(e) {
    e.k[this.A] = e.c[this.i]
}
,
n.prototype.M = function(e) {
    switch (this.s) {
    case 0:
        e.c[this.n] = e.c[this.e] + e.c[this.a];
        break;
    case 1:
        e.c[this.n] = e.c[this.e] - e.c[this.a];
        break;
    case 2:
        e.c[this.n] = e.c[this.e] * e.c[this.a];
        break;
    case 3:
        e.c[this.n] = e.c[this.e] / e.c[this.a];
        break;
    case 4:
        e.c[this.n] = e.c[this.e] % e.c[this.a];
        break;
    case 5:
        e.c[this.n] = e.c[this.e] == e.c[this.a];
        break;
    case 6:
        e.c[this.n] = e.c[this.e] >= e.c[this.a];
        break;
    case 7:
        e.c[this.n] = e.c[this.e] || e.c[this.a];
        break;
    case 8:
        e.c[this.n] = e.c[this.e] && e.c[this.a];
        break;
    case 9:
        e.c[this.n] = e.c[this.e] !== e.c[this.a];
        break;
    case 10:
        e.c[this.n] = s(e.c[this.e]);
        break;
    case 11:
        e.c[this.n] = e.c[this.e]in e.c[this.a];
        break;
    case 12:
        e.c[this.n] = e.c[this.e] > e.c[this.a];
        break;
    case 13:
        e.c[this.n] = -e.c[this.e];
        break;
    case 14:
        e.c[this.n] = e.c[this.e] < e.c[this.a];
        break;
    case 15:
        e.c[this.n] = e.c[this.e] & e.c[this.a];
        break;
    case 16:
        e.c[this.n] = e.c[this.e] ^ e.c[this.a];
        break;
    case 17:
        e.c[this.n] = e.c[this.e] << e.c[this.a];
        break;
    case 18:
        e.c[this.n] = e.c[this.e] >>> e.c[this.a];
        break;
    case 19:
        e.c[this.n] = e.c[this.e] | e.c[this.a]
    }
}
,
e.prototype.M = function(e) {
    e.r.push(e.o),
    e.B.push(e.k),
    e.o = e.c[this.i],
    e.k = [];
    for (var t = 0; t < this.h; t++)
        e.k.unshift(e.f.pop());
    e.u.push(e.f),
    e.f = []
}
,
a.prototype.M = function(e) {
    e.o = e.r.pop(),
    e.k = e.B.pop(),
    e.f = e.u.pop()
}
,
c.prototype.M = function(e) {
    switch (this.s) {
    case 0:
        e.C = e.c[this.n] >= e.c[this.e];
        break;
    case 1:
        e.C = e.c[this.n] <= e.c[this.e];
        break;
    case 2:
        e.C = e.c[this.n] > e.c[this.e];
        break;
    case 3:
        e.C = e.c[this.n] < e.c[this.e];
        break;
    case 4:
        e.C = e.c[this.n] == e.c[this.e];
        break;
    case 5:
        e.C = e.c[this.n] != e.c[this.e];
        break;
    case 6:
        e.C = e.c[this.n];
        break;
    case 7:
        e.C = !e.c[this.n]
    }
}
,
o.prototype.M = function(e) {
    switch (this.s) {
    case 0:
        e.o = this.A;
        break;
    case 1:
        e.C && (e.o = this.A);
        break;
    case 2:
        e.C || (e.o = this.A);
        break;
    case 3:
        e.o = this.A,
        e.Q = null
    }
    e.C = !1
}
,
r.prototype.M = function(e) {
    switch (this.s) {
    case 0:
        for (var t = [], n = 0; n < this.h; n++)
            t.unshift(e.f.pop());
        e.c[3] = e.c[this.i](t[0], t[1]);
        break;
    case 1:
        for (var r = e.f.pop(), o = [], i = 0; i < this.h; i++)
            o.unshift(e.f.pop());
        e.c[3] = e.c[this.i][r](o[0], o[1]);
        break;
    case 2:
        for (var a = [], c = 0; c < this.h; c++)
            a.unshift(e.f.pop());
        e.c[3] = new e.c[this.i](a[0],a[1])
    }
}
,
k.prototype.M = function(e) {
    switch (this.s) {
    case 0:
        e.f.push(e.c[this.i]);
        break;
    case 1:
        e.f.push(this.h);
        break;
    case 2:
        e.f.push(e.k[this.A]);
        break;
    case 3:
        e.f.push(e.g[this.A])
    }
}
,
B.prototype.M = function(t) {
    switch (this.s) {
    case 0:
        var s = t.f.pop();
        t.c[this.n] = t.c[this.e][s];
        break;
    case 1:
        var i = t.f.pop()
          , h = t.f.pop();
        t.c[this.e][i] = h;
        break;
    case 2:
        var A = t.f.pop();
        if(A === 'window') {
            A = {
                encodeURIComponent: function (url) {
                    return encodeURIComponent(url)
                }
            }
        } else if (A === 'navigator') {
            A = {
                'userAgent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
                    '(KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36'
            }
        }
        t.c[this.n] = eval(A)
    }
}
,
f.prototype.M = function(e) {
    e.c[this.i] = e.g[this.A]
}
,
u.prototype.M = function(e) {
    e.Q = this.A
}
,
C.prototype.M = function(e) {
    throw e.c[this.i]
}
,
b.prototype.M = function(e) {
    var t = this
      , n = [0];
    e.k.forEach(function(e) {
        n.push(e)
    });
    var r = function(r) {
        var o = new G;
        return o.k = n,
        o.k[0] = r,
        o.J(e.b, t.A, e.g, e.w),
        o.c[3]
    };
    r.toString = function() {
        return "() { [native code] }"
    }
    ,
    e.c[3] = r
}
,
g.prototype.M = function(e) {
    switch (this.s) {
    case 0:
        for (var t = {}, n = 0; n < this.h; n++) {
            var r = e.f.pop();
            t[e.f.pop()] = r
        }
        e.c[this.i] = t;
        break;
    case 1:
        for (var o = [], i = 0; i < this.h; i++)
            o.unshift(e.f.pop());
        e.c[this.i] = o
    }
}
,
G.prototype.v = function(e) {
    for (var t = Buffer.from(e, 'base64').toString('binary'), n = [], r = 0; r < t.length - 1; r += 2)
        n.push(t.charCodeAt(r) << 8 | t.charCodeAt(r + 1));
    this.b = n
}
,
G.prototype.y = function(e) {
    for (var t = Buffer.from(e, 'base64').toString('binary'), n = 66, r = [], o = 0; o < t.length; o++) {
        var i = 24 ^ t.charCodeAt(o) ^ n;
        r.push(String.fromCharCode(i)),
        n = i
    }
    return r.join("")
}
,
G.prototype.F = function(e) {
    var t = this;
    this.g = e.map(function(e) {
        return "string" == typeof e ? t.y(e) : e
    })
}
,
G.prototype.J = function(e, t, n) {
    for (t = t || 0,
    n = n || [],
    this.o = t,
    "string" == typeof e ? (this.F(n),
    this.v(e)) : (this.b = e,
    this.g = n),
    this.G = !0,
    this.x = Date.now(); this.G; ) {
        var r = this.b[this.o++];
        if ("number" != typeof r)
            break;
        var o = Date.now();
        if (500 < o - this.x)
            return;
        this.x = o;
        try {
            this.M(r)
        } catch (e) {
            if (this.R = e,
            !this.Q)
                throw "execption at " + this.o + ": " + e;
            this.o = this.Q
        }
    }
}
,
G.prototype.M = function(e) {
    var t = (61440 & e) >> 12;
    new this.D[t](e).M(this)
}
,
(new G).J("4AeTAJwAqACcAaQAAAAYAJAAnAKoAJwDgAWTACwAnAKoACACGAESOTRHkQAkAbAEIAMYAJwFoAASAzREJAQYBBIBNEVkBnCiGAC0BjRAJAAYBBICNEVkBnDGGAC0BzRAJACwCJAAnAmoAJwKoACcC4ABnAyMBRAAMwZgBnESsA0aADRAkQAkABgCnA6gABoCnA+hQDRHGAKcEKAAMQdgBnFasBEaADRAkQAkABgCnBKgABoCnBOhQDRHZAZxkrAUGgA0QJEAJAAYApwVoABgBnG6sBYaADRAkQAkABgCnBegAGAGceKwGBoANECRACQAnAmoAJwZoABgBnIOsBoaADRAkQAkABgCnBugABoCnByhQDRHZAZyRrAdGgA0QJEAJAAQACAFsB4gBhgAnAWgABIBNEEkBxgHEgA0RmQGdJoQCBoFFAE5gCgFFAQ5hDSCJAgYB5AAGACcH4AFGAEaCDRSEP8xDzMQIAkQCBoFFAE5gCgFFAQ5hDSCkQAkCBgBGgg0UhD/MQ+QACAIGAkaBxQBOYGSABoAnB+EBRoIN1AUCDmRNJMkCRAIGgUUATmAKAUUBDmENIKRACQIGAEaCDRSEP8xD5AAIAgYCRoHFAI5gZIAGgCcH4QFGgg3UBQQOZE0kyQJGAMaCRQ/OY+SABoGnCCEBTTAJAMYAxoJFAY5khI/Nk+RABoGnCCEBTTAJAMYAxoJFAw5khI/Nk+RABoGnCCEBTTAJAMYAxoJFBI5khI/Nk+RABoGnCCEBTTAJAMYBxIDNEEkB3JsHgNQAA==", 0, ["BRgg", "BSITFQkTERw=", "LQYfEhMA", "PxMVFBMZKB8DEjQaBQcZExMC", "", "NhETEQsE", "Whg=", "Wg==", "MhUcHRARDhg=", "NBcPBxYeDQMF", "Lx4ODys+GhMC", "LgM7OwAKDyk6Cg4=", "Mx8SGQUvMQ==", "SA==", "ORoVGCQgERcCAxo=", "BTcAERcCAxo=", "BRg3ABEXAgMaFAo=", "SQ==", "OA8LGBsP", "GC8LGBsP", "Tg==", "PxAcBQ==", "Tw==", "KRsJDgE=", "TA==", "LQofHg4DBwsP", "TQ==", "PhMaNCwZAxoUDQUeGQ==", "PhMaNCwZAxoUDQUeGTU0GQIeBRsYEQ8=", "Qg==", "BWpUGxkfGRsZFxkbGR8ZGxkHGRsZHxkbGRcZG1MbGR8ZGxkXGRFpGxkfGRsZFxkbGR8ZGxkHGRsZHxkbGRcZGw==", "ORMRCyk0Exk8LQ==", "ORMRCyst"]);
var Q = function(e) {
    return __g._encrypt(e)
};

encrypt.js

(八)爬虫之js调试(登陆知乎) (八)爬虫之js调试(登陆知乎)

#准备表单数据
    timestamp = int(1000*time.time())
    data_dict = {
        "captcha": "",
        "client_id": "c3cef7c66a1843f8b3a9e6a1e3160e20",
        "grant_type": "password",
        "lang": "en",
        "password": "你的密码",
        "ref_source": "homepage",
        "signature": "",
        "source": "com.zhihu.web",
        "timestamp": timestamp,
        "username": "你的用户名",
        "utm_source": "",
    }
#将表单数据加密
    with open("encrypt.js",'r') as f:
        #os.environ["EXECJS_RUNTIME"] = "Node"
        # os.environ["NODE_PATH"] = r"D:\nodejs\node_modules"
        #print execjs.get().name
        js = execjs.compile(f.read().decode("utf-8"))  #传入unicode字符
        data = js.call(u'Q',urlencode(data_dict)) #data_dict为表单数据

表单加密

(八)爬虫之js调试(登陆知乎)

5. 加密数据提交

  拿到所有数据后,可以提交post请求了,需要注意的有三个地方:

    1.表单中参数的大小写和拼写要注意了 (我开始将client_id写成了clientId,报错找不到client_id参数)

    2. 请求头headers必须需要"content-type":",'x-zse-83',"x-xsrftoken"三个参数

    3. 需要带上cookie,最主要的是cookie中的cookies["capsion_ticket"]不能少,可以利用获取验证码时返回的cookie

  最后完整代码如下:

(八)爬虫之js调试(登陆知乎) (八)爬虫之js调试(登陆知乎)

#coding:utf-8

#登陆并爬取知乎

import requests
import time 
import hmac
import hashlib
from urllib import urlencode
import execjs #安装PyExecJS模块
import os
import json
import re
import base64
from PIL import Image
import matplotlib.pyplot as plt
import threading



def get_signature(grantType,clientId,source,timestamp):
    h = hmac.new("d1b964811afb40118a12068ff74a12f4","",hashlib.sha1)
    h.update(grantType+clientId+source+str(timestamp))
    return h.hexdigest()

def get_captcha(lang,headers):
    if lang=="cn":
        api = "https://www.zhihu.com/api/v3/oauth/captcha?lang=cn"
    else:
        api = "https://www.zhihu.com/api/v3/oauth/captcha?lang=en"
        
    ret = requests.get(api,headers=headers)
    cookies = ret.cookies
    show_captcha = re.search("true",ret.text)
    captcha=""
    if show_captcha:
        img_res = requests.put(api,headers=headers,cookies=cookies)  #得带上第一步的cookie,否则返回,{u'code': 120002, u'name': u'ERR_CAPSION_TICKET_NOT_FOUND'}
        img_json = json.loads(img_res.text)
        img_data = img_json["img_base64"].replace("\n","")
        with open("captcha.jpg","wb") as i:
            i.write(base64.b64decode(img_data))
        img = Image.open("captcha.jpg")
        if lang=="cn":
            plt.imshow(img)
            print("点击图片中所有倒立的汉字,在命令行中按回车键提交")
            points = plt.ginput(7) #阻塞点击七次后返回(或者中途点击回车键返回),返回包含坐标组的列表,格式:[(44.661290322580641, 49.951612903225794)]
            captcha = json.dumps({
                "img_size":[200,44],
                "input_points":[[i[0]/2,i[1]/2] for i in points]  #获取的坐标得除2
            })
            
        else:
            img_thread = threading.Thread(target=img.show)
            img_thread.setDaemon(True)
            img_thread.start()
            captcha = raw_input("请输入图片里的验证码:")  #python 2.7
        r = requests.post(api,headers=headers,data={"input_text":captcha},cookies=cookies)  #先提交验证码结果
        print(r.text)
    return captcha,cookies
def get_xsrf(headers):
    url="https://www.zhihu.com/"
    r = requests.get(url,headers=headers,allow_redirects=False)  #禁止重定向时,cookies里面有xsrf参数
    xsrf = r.cookies["_xsrf"]    
    
def login(lang):
    headers={"User-Agent":"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 Safari/537.36",}
    
    #准备表单数据
    timestamp = int(1000*time.time())
    data_dict = {
        "captcha": "",
        "client_id": "c3cef7c66a1843f8b3a9e6a1e3160e20",
        "grant_type": "password",
        "lang": "en",
        "password": "xxxxx",
        "ref_source": "homepage",
        "signature": "",
        "source": "com.zhihu.web",
        "timestamp": timestamp,
        "username": "xxxxxxxx",
        "utm_source": "",
    }
    data_dict["signature"] = get_signature(data_dict["grant_type"],data_dict["client_id"],data_dict["source"],timestamp)
    data_dict["captcha"],cookies = get_captcha(lang,headers)
    
    #将表单数据加密
    with open("encrypt.js",'r') as f:
        #os.environ["EXECJS_RUNTIME"] = "Node"
        # os.environ["NODE_PATH"] = r"D:\nodejs\node_modules"
        #print execjs.get().name
        js = execjs.compile(f.read().decode("utf-8"))  #传入unicode字符
        data = js.call(u'Q',urlencode(data_dict)) #data_dict为表单数据
        print(data)
    
    #准备请求头
    xsrf = get_xsrf(headers)
    header={
        "content-type":"application/x-www-form-urlencoded",
        #"Referer":"https://www.zhihu.com/signin",
        'x-zse-83': '3_1.1',
        "x-xsrftoken":xsrf,    
    }
    headers.update(header)
    sign_url = "https://www.zhihu.com/api/v3/oauth/sign_in"
    response = requests.post(url=sign_url,headers=headers,data=data,cookies=cookies)   #cookies["capsion_ticket"]不能少
    print(response.status_code)
    print(response.text)
    
if __name__=="__main__":
    login("cn")  #也可以为en

知乎登陆

(八)爬虫之js调试(登陆知乎) (八)爬虫之js调试(登陆知乎)

function s(e) {
    return (s = "function" == typeof Symbol && "symbol" == typeof Symbol.t ? function(e) {
        return typeof e
    }
    : function(e) {
        return e && "function" == typeof Symbol && e.constructor === Symbol && e !== Symbol.prototype ? "symbol" : typeof e
    }
    )(e)
}
function i() {}
function h(e) {
    this.s = (2048 & e) >> 11,
    this.i = (1536 & e) >> 9,
    this.h = 511 & e,
    this.A = 511 & e
}
function A(e) {
    this.i = (3072 & e) >> 10,
    this.A = 1023 & e
}
function n(e) {
    this.n = (3072 & e) >> 10,
    this.e = (768 & e) >> 8,
    this.a = (192 & e) >> 6,
    this.s = 63 & e
}
function e(e) {
    this.i = e >> 10 & 3,
    this.h = 1023 & e
}
function a() {}
function c(e) {
    this.n = (3072 & e) >> 10,
    this.e = (768 & e) >> 8,
    this.a = (192 & e) >> 6,
    this.s = 63 & e
}
function o(e) {
    this.A = (4095 & e) >> 2,
    this.s = 3 & e
}
function r(e) {
    this.i = e >> 10 & 3,
    this.h = e >> 2 & 255,
    this.s = 3 & e
}
function k(e) {
    this.s = (4095 & e) >> 10,
    this.i = (1023 & e) >> 8,
    this.h = 1023 & e,
    this.A = 63 & e
}
function B(e) {
    this.s = (4095 & e) >> 10,
    this.n = (1023 & e) >> 8,
    this.e = (255 & e) >> 6
}
function f(e) {
    this.i = (3072 & e) >> 10,
    this.A = 1023 & e
}
function u(e) {
    this.A = 4095 & e
}
function C(e) {
    this.i = (3072 & e) >> 10
}
function b(e) {
    this.A = 4095 & e
}
function g(e) {
    this.s = (3840 & e) >> 8,
    this.i = (192 & e) >> 6,
    this.h = 63 & e
}
function G() {
    this.c = [0, 0, 0, 0],
    this.o = 0,
    this.r = [],
    this.k = [],
    this.B = [],
    this.f = [],
    this.u = [],
    this.C = !1,
    this.b = [],
    this.g = [],
    this.G = !1,
    this.Q = null,
    this.R = null,
    this.w = [],
    this.x = 0,
    this.D = {
        0: i,
        1: h,
        2: A,
        3: n,
        4: e,
        5: a,
        6: c,
        7: o,
        8: r,
        9: k,
        10: B,
        11: f,
        12: u,
        13: C,
        14: b,
        15: g
    }
}
Object.defineProperty(exports, "__esModule", {
    value: !0
});
var t = "1.1"
  , __g = {};
i.prototype.M = function(e) {
    e.G = !1
}
,
h.prototype.M = function(e) {
    switch (this.s) {
    case 0:
        e.c[this.i] = this.h;
        break;
    case 1:
        e.c[this.i] = e.k[this.A]
    }
}
,
A.prototype.M = function(e) {
    e.k[this.A] = e.c[this.i]
}
,
n.prototype.M = function(e) {
    switch (this.s) {
    case 0:
        e.c[this.n] = e.c[this.e] + e.c[this.a];
        break;
    case 1:
        e.c[this.n] = e.c[this.e] - e.c[this.a];
        break;
    case 2:
        e.c[this.n] = e.c[this.e] * e.c[this.a];
        break;
    case 3:
        e.c[this.n] = e.c[this.e] / e.c[this.a];
        break;
    case 4:
        e.c[this.n] = e.c[this.e] % e.c[this.a];
        break;
    case 5:
        e.c[this.n] = e.c[this.e] == e.c[this.a];
        break;
    case 6:
        e.c[this.n] = e.c[this.e] >= e.c[this.a];
        break;
    case 7:
        e.c[this.n] = e.c[this.e] || e.c[this.a];
        break;
    case 8:
        e.c[this.n] = e.c[this.e] && e.c[this.a];
        break;
    case 9:
        e.c[this.n] = e.c[this.e] !== e.c[this.a];
        break;
    case 10:
        e.c[this.n] = s(e.c[this.e]);
        break;
    case 11:
        e.c[this.n] = e.c[this.e]in e.c[this.a];
        break;
    case 12:
        e.c[this.n] = e.c[this.e] > e.c[this.a];
        break;
    case 13:
        e.c[this.n] = -e.c[this.e];
        break;
    case 14:
        e.c[this.n] = e.c[this.e] < e.c[this.a];
        break;
    case 15:
        e.c[this.n] = e.c[this.e] & e.c[this.a];
        break;
    case 16:
        e.c[this.n] = e.c[this.e] ^ e.c[this.a];
        break;
    case 17:
        e.c[this.n] = e.c[this.e] << e.c[this.a];
        break;
    case 18:
        e.c[this.n] = e.c[this.e] >>> e.c[this.a];
        break;
    case 19:
        e.c[this.n] = e.c[this.e] | e.c[this.a]
    }
}
,
e.prototype.M = function(e) {
    e.r.push(e.o),
    e.B.push(e.k),
    e.o = e.c[this.i],
    e.k = [];
    for (var t = 0; t < this.h; t++)
        e.k.unshift(e.f.pop());
    e.u.push(e.f),
    e.f = []
}
,
a.prototype.M = function(e) {
    e.o = e.r.pop(),
    e.k = e.B.pop(),
    e.f = e.u.pop()
}
,
c.prototype.M = function(e) {
    switch (this.s) {
    case 0:
        e.C = e.c[this.n] >= e.c[this.e];
        break;
    case 1:
        e.C = e.c[this.n] <= e.c[this.e];
        break;
    case 2:
        e.C = e.c[this.n] > e.c[this.e];
        break;
    case 3:
        e.C = e.c[this.n] < e.c[this.e];
        break;
    case 4:
        e.C = e.c[this.n] == e.c[this.e];
        break;
    case 5:
        e.C = e.c[this.n] != e.c[this.e];
        break;
    case 6:
        e.C = e.c[this.n];
        break;
    case 7:
        e.C = !e.c[this.n]
    }
}
,
o.prototype.M = function(e) {
    switch (this.s) {
    case 0:
        e.o = this.A;
        break;
    case 1:
        e.C && (e.o = this.A);
        break;
    case 2:
        e.C || (e.o = this.A);
        break;
    case 3:
        e.o = this.A,
        e.Q = null
    }
    e.C = !1
}
,
r.prototype.M = function(e) {
    switch (this.s) {
    case 0:
        for (var t = [], n = 0; n < this.h; n++)
            t.unshift(e.f.pop());
        e.c[3] = e.c[this.i](t[0], t[1]);
        break;
    case 1:
        for (var r = e.f.pop(), o = [], i = 0; i < this.h; i++)
            o.unshift(e.f.pop());
        e.c[3] = e.c[this.i][r](o[0], o[1]);
        break;
    case 2:
        for (var a = [], c = 0; c < this.h; c++)
            a.unshift(e.f.pop());
        e.c[3] = new e.c[this.i](a[0],a[1])
    }
}
,
k.prototype.M = function(e) {
    switch (this.s) {
    case 0:
        e.f.push(e.c[this.i]);
        break;
    case 1:
        e.f.push(this.h);
        break;
    case 2:
        e.f.push(e.k[this.A]);
        break;
    case 3:
        e.f.push(e.g[this.A])
    }
}
,
B.prototype.M = function(t) {
    switch (this.s) {
    case 0:
        var s = t.f.pop();
        t.c[this.n] = t.c[this.e][s];
        break;
    case 1:
        var i = t.f.pop()
          , h = t.f.pop();
        t.c[this.e][i] = h;
        break;
    case 2:
        var A = t.f.pop();
        if(A === 'window') {
            A = {
                encodeURIComponent: function (url) {
                    return encodeURIComponent(url)
                }
            }
        } else if (A === 'navigator') {
            A = {
                'userAgent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
                    '(KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36'
            }
        }
        t.c[this.n] = eval(A)
    }
}
,
f.prototype.M = function(e) {
    e.c[this.i] = e.g[this.A]
}
,
u.prototype.M = function(e) {
    e.Q = this.A
}
,
C.prototype.M = function(e) {
    throw e.c[this.i]
}
,
b.prototype.M = function(e) {
    var t = this
      , n = [0];
    e.k.forEach(function(e) {
        n.push(e)
    });
    var r = function(r) {
        var o = new G;
        return o.k = n,
        o.k[0] = r,
        o.J(e.b, t.A, e.g, e.w),
        o.c[3]
    };
    r.toString = function() {
        return "() { [native code] }"
    }
    ,
    e.c[3] = r
}
,
g.prototype.M = function(e) {
    switch (this.s) {
    case 0:
        for (var t = {}, n = 0; n < this.h; n++) {
            var r = e.f.pop();
            t[e.f.pop()] = r
        }
        e.c[this.i] = t;
        break;
    case 1:
        for (var o = [], i = 0; i < this.h; i++)
            o.unshift(e.f.pop());
        e.c[this.i] = o
    }
}
,
G.prototype.v = function(e) {
    for (var t = Buffer.from(e, 'base64').toString('binary'), n = [], r = 0; r < t.length - 1; r += 2)
        n.push(t.charCodeAt(r) << 8 | t.charCodeAt(r + 1));
    this.b = n
}
,
G.prototype.y = function(e) {
    for (var t = Buffer.from(e, 'base64').toString('binary'), n = 66, r = [], o = 0; o < t.length; o++) {
        var i = 24 ^ t.charCodeAt(o) ^ n;
        r.push(String.fromCharCode(i)),
        n = i
    }
    return r.join("")
}
,
G.prototype.F = function(e) {
    var t = this;
    this.g = e.map(function(e) {
        return "string" == typeof e ? t.y(e) : e
    })
}
,
G.prototype.J = function(e, t, n) {
    for (t = t || 0,
    n = n || [],
    this.o = t,
    "string" == typeof e ? (this.F(n),
    this.v(e)) : (this.b = e,
    this.g = n),
    this.G = !0,
    this.x = Date.now(); this.G; ) {
        var r = this.b[this.o++];
        if ("number" != typeof r)
            break;
        var o = Date.now();
        if (500 < o - this.x)
            return;
        this.x = o;
        try {
            this.M(r)
        } catch (e) {
            if (this.R = e,
            !this.Q)
                throw "execption at " + this.o + ": " + e;
            this.o = this.Q
        }
    }
}
,
G.prototype.M = function(e) {
    var t = (61440 & e) >> 12;
    new this.D[t](e).M(this)
}
,
(new G).J("4AeTAJwAqACcAaQAAAAYAJAAnAKoAJwDgAWTACwAnAKoACACGAESOTRHkQAkAbAEIAMYAJwFoAASAzREJAQYBBIBNEVkBnCiGAC0BjRAJAAYBBICNEVkBnDGGAC0BzRAJACwCJAAnAmoAJwKoACcC4ABnAyMBRAAMwZgBnESsA0aADRAkQAkABgCnA6gABoCnA+hQDRHGAKcEKAAMQdgBnFasBEaADRAkQAkABgCnBKgABoCnBOhQDRHZAZxkrAUGgA0QJEAJAAYApwVoABgBnG6sBYaADRAkQAkABgCnBegAGAGceKwGBoANECRACQAnAmoAJwZoABgBnIOsBoaADRAkQAkABgCnBugABoCnByhQDRHZAZyRrAdGgA0QJEAJAAQACAFsB4gBhgAnAWgABIBNEEkBxgHEgA0RmQGdJoQCBoFFAE5gCgFFAQ5hDSCJAgYB5AAGACcH4AFGAEaCDRSEP8xDzMQIAkQCBoFFAE5gCgFFAQ5hDSCkQAkCBgBGgg0UhD/MQ+QACAIGAkaBxQBOYGSABoAnB+EBRoIN1AUCDmRNJMkCRAIGgUUATmAKAUUBDmENIKRACQIGAEaCDRSEP8xD5AAIAgYCRoHFAI5gZIAGgCcH4QFGgg3UBQQOZE0kyQJGAMaCRQ/OY+SABoGnCCEBTTAJAMYAxoJFAY5khI/Nk+RABoGnCCEBTTAJAMYAxoJFAw5khI/Nk+RABoGnCCEBTTAJAMYAxoJFBI5khI/Nk+RABoGnCCEBTTAJAMYBxIDNEEkB3JsHgNQAA==", 0, ["BRgg", "BSITFQkTERw=", "LQYfEhMA", "PxMVFBMZKB8DEjQaBQcZExMC", "", "NhETEQsE", "Whg=", "Wg==", "MhUcHRARDhg=", "NBcPBxYeDQMF", "Lx4ODys+GhMC", "LgM7OwAKDyk6Cg4=", "Mx8SGQUvMQ==", "SA==", "ORoVGCQgERcCAxo=", "BTcAERcCAxo=", "BRg3ABEXAgMaFAo=", "SQ==", "OA8LGBsP", "GC8LGBsP", "Tg==", "PxAcBQ==", "Tw==", "KRsJDgE=", "TA==", "LQofHg4DBwsP", "TQ==", "PhMaNCwZAxoUDQUeGQ==", "PhMaNCwZAxoUDQUeGTU0GQIeBRsYEQ8=", "Qg==", "BWpUGxkfGRsZFxkbGR8ZGxkHGRsZHxkbGRcZG1MbGR8ZGxkXGRFpGxkfGRsZFxkbGR8ZGxkHGRsZHxkbGRcZGw==", "ORMRCyk0Exk8LQ==", "ORMRCyst"]);
var Q = function(e) {
    return __g._encrypt(e)
};

encrypt.js

参考: https://zhuanlan.zhihu.com/p/57375111

   https://zhuanlan.zhihu.com/p/34073256

点赞
收藏
评论区
推荐文章
blmius blmius
3年前
MySQL:[Err] 1292 - Incorrect datetime value: ‘0000-00-00 00:00:00‘ for column ‘CREATE_TIME‘ at row 1
文章目录问题用navicat导入数据时,报错:原因这是因为当前的MySQL不支持datetime为0的情况。解决修改sql\mode:sql\mode:SQLMode定义了MySQL应支持的SQL语法、数据校验等,这样可以更容易地在不同的环境中使用MySQL。全局s
Wesley13 Wesley13
3年前
java将前端的json数组字符串转换为列表
记录下在前端通过ajax提交了一个json数组的字符串,在后端如何转换为列表。前端数据转化与请求varcontracts{id:'1',name:'yanggb合同1'},{id:'2',name:'yanggb合同2'},{id:'3',name:'yang
皕杰报表之UUID
​在我们用皕杰报表工具设计填报报表时,如何在新增行里自动增加id呢?能新增整数排序id吗?目前可以在新增行里自动增加id,但只能用uuid函数增加UUID编码,不能新增整数排序id。uuid函数说明:获取一个UUID,可以在填报表中用来创建数据ID语法:uuid()或uuid(sep)参数说明:sep布尔值,生成的uuid中是否包含分隔符'',缺省为
待兔 待兔
6个月前
手写Java HashMap源码
HashMap的使用教程HashMap的使用教程HashMap的使用教程HashMap的使用教程HashMap的使用教程22
Jacquelyn38 Jacquelyn38
3年前
2020年前端实用代码段,为你的工作保驾护航
有空的时候,自己总结了几个代码段,在开发中也经常使用,谢谢。1、使用解构获取json数据let jsonData  id: 1,status: "OK",data: 'a', 'b';let  id, status, data: number   jsonData;console.log(id, status, number )
Stella981 Stella981
3年前
Python之time模块的时间戳、时间字符串格式化与转换
Python处理时间和时间戳的内置模块就有time,和datetime两个,本文先说time模块。关于时间戳的几个概念时间戳,根据1970年1月1日00:00:00开始按秒计算的偏移量。时间元组(struct_time),包含9个元素。 time.struct_time(tm_y
Wesley13 Wesley13
3年前
mysql设置时区
mysql设置时区mysql\_query("SETtime\_zone'8:00'")ordie('时区设置失败,请联系管理员!');中国在东8区所以加8方法二:selectcount(user\_id)asdevice,CONVERT\_TZ(FROM\_UNIXTIME(reg\_time),'08:00','0
Wesley13 Wesley13
3年前
00:Java简单了解
浅谈Java之概述Java是SUN(StanfordUniversityNetwork),斯坦福大学网络公司)1995年推出的一门高级编程语言。Java是一种面向Internet的编程语言。随着Java技术在web方面的不断成熟,已经成为Web应用程序的首选开发语言。Java是简单易学,完全面向对象,安全可靠,与平台无关的编程语言。
Wesley13 Wesley13
3年前
MySQL部分从库上面因为大量的临时表tmp_table造成慢查询
背景描述Time:20190124T00:08:14.70572408:00User@Host:@Id:Schema:sentrymetaLast_errno:0Killed:0Query_time:0.315758Lock_
Python进阶者 Python进阶者
1年前
Excel中这日期老是出来00:00:00,怎么用Pandas把这个去除
大家好,我是皮皮。一、前言前几天在Python白银交流群【上海新年人】问了一个Pandas数据筛选的问题。问题如下:这日期老是出来00:00:00,怎么把这个去除。二、实现过程后来【论草莓如何成为冻干莓】给了一个思路和代码如下:pd.toexcel之前把这