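Preface

It has been several days since my last blog post. I have had a lot on my plate lately: I need to prepare for the "Party Member Star" selection, and the Huacheng Hui (花城汇) project still has a lot of unfinished work.
In fact, the Pluralsight downloader took only three days to build, using the same approach as the Youth Voice (青年之声) crawler:
first find the relevant API endpoint, then crawl the data I need.
That finally settled an old regret. So if it was finished that early, why not write it up sooner?
Blame the Lynda downloader. To this day I have not managed to obtain the relevant login information with Python, and it has eaten up several days.
At this point I have made my peace with it and given up.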

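Crawling method

I could not crawl Pluralsight before, mainly because I assumed I had to scrape the web pages rather than the API data.
But the data on the pages is generated dynamically, either rendered by PHP or processed by JavaScript.
So unless you can simulate a user browsing the page, certain scripts never run.
The Youth Voice crawler ran into the same situation.
Its comments are loaded through an infinite-scroll (waterfall) layout: unless the page is scrolled, no further data is loaded, so the content cannot be scraped from the page.
An API endpoint is different. It returns data straight from the backend database, which skips the JavaScript processing and yields the raw data.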

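Pluralsight can be crawled with the same idea.
The first step is to find the relevant API endpoints.

The method is the same as for Youth Voice, so I will not take screenshots again.
Open the browser's developer tools, go to the Network tab, and look at the XHR requests.
XHR (XMLHttpRequest) is how dynamic pages transfer Ajax data; the data is usually fetched by JavaScript through Ajax calls.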

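https://app.pluralsight.com/library/courses/python-desktop-application-development/table-of-contents
This is a typical Pluralsight course URL. No login is required; anyone can visit it.
Here you can see that the page first requests a very important address:
https://app.pluralsight.com/learner/content/courses/python-desktop-application-development
This address returns the course's JSON data, also without login; after all, the course page is generated by JavaScript from this data.
Figure: course JSON data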
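The mapping from the course page URL to the JSON endpoint can be sketched in a few lines. This is a minimal sketch, assuming the course slug is always the path segment right after `courses`, as in the two URLs above; the function name `course_api_url` is my own.

```python
from urllib.parse import urlparse

API_BASE = "https://app.pluralsight.com/learner/content/courses"


def course_api_url(course_page_url):
    """Build the JSON endpoint for a course page URL (hypothetical helper)."""
    parts = urlparse(course_page_url).path.strip("/").split("/")
    # Assumed path shape: library/courses/<slug>/table-of-contents
    slug = parts[parts.index("courses") + 1]
    return "%s/%s" % (API_BASE, slug)


page = ("https://app.pluralsight.com/library/courses/"
        "python-desktop-application-development/table-of-contents")
print(course_api_url(page))
```

Parsing the path instead of matching the whole URL with a regular expression keeps the helper unaffected by query strings or a trailing slash.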
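The JSON returned by the page is not formatted, but that is fine: VS Code can format it easily.
Generally, press Ctrl + A to select everything, then press Ctrl + K followed by Ctrl + F, and the content is formatted automatically.
If an extension is needed, VS Code will prompt you.
JSON, however, is something VS Code can format out of the box.
Figure: formatted JSON data
This is much more comfortable to read.
Another advantage of JSON is that it can be handled with Python's json package, which makes extracting data easier and more accurate (most importantly, no regular expressions are needed).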
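To make the point about the json package concrete, here is a small sketch that walks a course document and lists its clips. The sample data is made up; only the key names (`title`, `modules`, `clips`, `clipId`) are taken from the code later in this post, and `list_clips` is a name I chose for illustration.

```python
import json

# Made-up sample response; only the key names come from the post's own code.
SAMPLE_COURSE_JSON = """
{"title": "Demo Course", "modules": [
  {"title": "Intro", "clips": [
    {"title": "Welcome", "clipId": "clip-001"},
    {"title": "Setup", "clipId": "clip-002"}]},
  {"title": "Basics", "clips": [
    {"title": "First Steps", "clipId": "clip-003"}]}]}
"""


def list_clips(course_json):
    """Return (module title, clip title, clipId) for every clip in the course."""
    course = json.loads(course_json)
    return [
        (module["title"], clip["title"], clip["clipId"])
        for module in course["modules"]
        for clip in module["clips"]
    ]


for row in list_clips(SAMPLE_COURSE_JSON):
    print(row)

# json.dumps with indent pretty-prints the data, much like the editor shortcut.
print(json.dumps(json.loads(SAMPLE_COURSE_JSON), indent=2))
```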

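After that, getting the data became much harder.
The biggest difficulty was how to obtain the user's login verification information.
The way to investigate is to log in, open a video, and then watch the requests made in the background.
Here I found the key URL:
https://app.pluralsight.com/player/api/graphql
However, this URL is requested several times, and the returned data is completely different each time.
Figure: the data returned first
Figure: the data returned next
This confused me a lot.
Testing with Postman showed that requesting the URL directly returns no data.
So I needed to find out what differed between the requests.
The first thing I noticed was that the request payloads differed.
Figure: Postman header information
Figure: Postman raw JSON information
After some searching, I figured out how to add the corresponding data in Postman.
Figure: data obtained after adding the cookie
Figure: PsJwt-production
But as the screenshot shows, I did not get the data I wanted; the server returned 403.
Why was that?
After a good deal of investigation, I found that the data comes back once the cookie information is included.
Figure: data obtained after adding the cookie
With that, all the video links were retrieved.
So the cookie must carry the user's login verification, which lets the server validate the request and return the data.
By deleting cookie entries and retesting, I finally found the key value: PsJwt-production.
Figure: PsJwt-production
So as long as I can get this value, I can get every video URL and download the videos.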
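The request that finally worked in Postman has two ingredients: a JSON payload and a `Cookie` header carrying `PsJwt-production`. The sketch below only builds such a request with the standard library and does not send it. The query string and the token are placeholders, and `build_graphql_request` is a hypothetical helper, not part of the downloader.

```python
import json
from urllib.request import Request

GRAPHQL_URL = "https://app.pluralsight.com/player/api/graphql"


def build_graphql_request(query, ps_jwt):
    """Build (but do not send) a POST request with the auth cookie attached."""
    # json.dumps takes care of the quoting that a hand-written payload string needs
    body = json.dumps({"query": query, "variables": {}}).encode("utf-8")
    return Request(
        GRAPHQL_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Cookie": "PsJwt-production=%s" % ps_jwt,
        },
    )


# Placeholder query and token, for illustration only
req = build_graphql_request("query { __typename }", "PLACEHOLDER_TOKEN")
print(req.get_method(), req.full_url)
print(req.get_header("Cookie"))
```

Without the `Cookie` header the same request is what produced the 403 described above.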
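Since this value is user information, it presumably comes from the login page.
So I went to the login page and found the login address:
https://app.pluralsight.com/id/
Figure: form data
The username and password are submitted as form data.
In the cookies you can see the cookie information returned by the server.
Figure: PsJwt
So I only need to capture this value with Python to download every course.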
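Picking one named cookie out of a server response can be sketched with the standard library. This is an illustration under assumptions: the header value below is invented, the cookie name follows the `PsJwt-production` key found earlier, and a real login response may spread cookies over several `Set-Cookie` headers.

```python
from http.cookies import SimpleCookie


def extract_cookie(set_cookie_header, name):
    """Return the value of the named cookie from one Set-Cookie header, or None."""
    jar = SimpleCookie()
    jar.load(set_cookie_header)
    morsel = jar.get(name)
    return morsel.value if morsel is not None else None


# Invented header value, for illustration only
header = "PsJwt-production=abc.def.ghi; Path=/"
print(extract_cookie(header, "PsJwt-production"))
```

Looking the cookie up by name is less fragile than relying on its position in the header.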


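Later, when the software was mostly written, I remembered that subtitles can be retrieved too.
That was something I dreamed of having back in my freshman year.
So I made up my mind to produce usable subtitle files.
The relevant address can be found in the same place as the video links:
https://app.pluralsight.com/transcript/api/v1/caption/json/5be6c21d-4d43-48b5-ad80-20cf9e0504c5/zh
Figure: subtitle address
The suffix selects the translation of the subtitles. The available suffixes can be found by querying https://app.pluralsight.com/player/api/graphql with a different payload.
The procedure is the same as above, only with a different payload, so I will not repeat it.
Figure: subtitle address
The main question is the ID in the subtitle URL. After comparing, I found that it is the clipId, which can be obtained from the course information.
Figure: course ID
With that, the subtitle data can be retrieved.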
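Putting the two pieces together, a subtitle address is the fixed prefix plus the clipId plus a language suffix. A tiny sketch follows; the helper name `caption_url` is mine, and `en` and `zh` are the two suffixes used in this post.

```python
CAPTION_BASE = "https://app.pluralsight.com/transcript/api/v1/caption/json"


def caption_url(clip_id, locale="en"):
    """Build the subtitle address for one clip and one language suffix."""
    return "%s/%s/%s" % (CAPTION_BASE, clip_id, locale)


print(caption_url("5be6c21d-4d43-48b5-ad80-20cf9e0504c5", "zh"))
```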
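The problem is that the subtitles come back as JSON, which cannot be imported into video software.
Figure: subtitle JSON
They need to be converted to the srt format.
Figure: srt format
The conversion is quite feasible, since the JSON data is easy to read.
The key is converting seconds into hours, minutes, and seconds.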
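The seconds-to-timestamp step is plain `divmod` arithmetic. Here is a minimal sketch, assuming the offsets are seconds as floats; it also emits the millisecond field (`HH:MM:SS,mmm`) that srt timestamps conventionally carry. The function names are mine.

```python
def srt_timestamp(seconds):
    """Convert a time offset in seconds to an srt timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3600000)
    minutes, rest = divmod(rest, 60000)
    secs, millis = divmod(rest, 1000)
    return "%02d:%02d:%02d,%03d" % (hours, minutes, secs, millis)


def srt_cue(index, start, end, text):
    """Format one subtitle cue: index line, time range line, text, blank line."""
    return "%d\n%s --> %s\n%s\n\n" % (index, srt_timestamp(start), srt_timestamp(end), text)


print(srt_cue(1, 3725.5, 3730.0, "Hello"))
```

Working in whole milliseconds avoids rounding surprises such as a seconds field of 60.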

import json
import os
import re

import requests

DownloadURL = "https://app.pluralsight.com/library/courses/angular-fundamentals/table-of-contents"

# Extract the course ID: the path segment right after "courses/"
reg = r'courses/(.*)'
courseIdReg = re.compile(reg)
courseId = re.findall(courseIdReg, DownloadURL)[0].split('/')[0]

url = "https://app.pluralsight.com/learner/content/courses/%s" % courseId

headers = {
    'Cache-Control': "no-cache",
    'Postman-Token': "ca8bb1e7-99b7-4b41-b7d2-2813bf834e2f"
}

response = requests.request("GET", url, headers=headers)

jsonFile = json.loads(response.text)

# The demo hard-codes a single clip: module index 3, clip index 8
clip = jsonFile['modules'][3]['clips'][8]
url = "https://app.pluralsight.com/transcript/api/v1/caption/json/%s/zh" % clip['clipId']

headers = {
    'content-type': "multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW",
    'Cache-Control': "no-cache",
    'Postman-Token': "9a592511-4c25-4a31-879b-eb3ae0ac6cca"
}

response = requests.request("GET", url, headers=headers)

CaptionFile = json.loads(response.text)


def offset_at(index):
    """Return the time offset of caption `index`, in seconds."""
    value = CaptionFile[index]['displayTimeOffset']
    # int is accepted as well as float: whole-second offsets arrive as int
    if isinstance(value, (int, float)):
        return value
    # The offset is null: use the average of the neighbouring offsets
    return (CaptionFile[index-1]['displayTimeOffset'] + CaptionFile[index+1]['displayTimeOffset'])/2


def to_srt_time(seconds):
    """Convert seconds to the srt timestamp form HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3600000)
    m, ms = divmod(ms, 60000)
    s, ms = divmod(ms, 1000)
    return "%02d:%02d:%02d,%03d" % (h, m, s, ms)


srtFile = ""
for captionCount, caption in enumerate(CaptionFile):
    startTime = to_srt_time(offset_at(captionCount))

    if len(CaptionFile) == captionCount+1:
        # Last caption: the JSON carries no end time, so use the clip
        # duration and let the caption run to the end of the video
        duration = float(re.findall(r"\d+\.?\d*", clip['duration'])[0])
        endTime = to_srt_time(duration)
    else:
        # Otherwise a caption ends where the next one starts
        endTime = to_srt_time(offset_at(captionCount+1))

    # srt cue numbers start at 1
    srtFile += "%s\n" % (captionCount+1)
    srtFile += "%s --> %s\n" % (startTime, endTime)
    srtFile += "%s\n\n" % caption['text']

dirname = os.path.dirname(os.path.abspath(__file__))
SRT_PATH = os.path.join(dirname, 'test.srt')

with open(SRT_PATH, 'w', encoding='UTF-8') as f:
    f.write(srtFile)

print("complete")

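The code should not have needed to be this complicated, but while fetching subtitles I found that some of them are buggy.
Some of the returned times are null, probably a bug on Pluralsight's side, and this brought the whole download to a halt.
To work around it, I had to write a good deal of extra code that checks for null and takes an average before everything finally worked.
The code above is only a subtitle-download demo; the downloader's source code follows.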
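Before the full source, the null-handling idea can be shown in isolation. This is a sketch rather than the downloader's exact logic: it averages the nearest known offsets on both sides, and as an assumption of mine it falls back to 0 at the start and to the clip duration at the end, so a null in the first or last position does not fail.

```python
def fill_missing_offsets(offsets, clip_duration):
    """Replace None offsets with the midpoint of the nearest known neighbours."""
    filled = []
    for i, value in enumerate(offsets):
        if value is not None:
            filled.append(float(value))
            continue
        # Previous known time: the last value already filled, or 0.0 at the start
        prev = filled[-1] if filled else 0.0
        # Next known time: the first non-null offset after i, or the clip duration
        nxt = next((v for v in offsets[i + 1:] if v is not None), clip_duration)
        filled.append((prev + nxt) / 2)
    return filled


print(fill_missing_offsets([0.0, None, 4.0, None], 10.0))
```

Filling the gaps in one pass up front keeps the later conversion loop free of type checks.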

from PyQt4.QtCore import *
from PyQt4.QtGui import *
from PyQt4 import uic
import sys
import re
import xlwt
import requests
import time
import os
import traceback
import random
import json
import subprocess

dirname = os.path.dirname(__file__)
UI_PATH = os.path.join(dirname, 'PluralsightDownloader.ui')
TOKEN_PATH = os.path.join(dirname, 'token.verify')
form_class, base_class = uic.loadUiType(UI_PATH)

class PluralsightDownloader(base_class, form_class):

    def __init__(self):
        super(PluralsightDownloader, self).__init__()
        self.setupUi(self)

        # self.Save_Location.setText(dirname)

        self.Login_Btn.clicked.connect(self.login)
        self.Svae_Btn.clicked.connect(self.browse_file)
        self.Download_Btn.clicked.connect(self.download)
        self.OpenDirectory_Btn.clicked.connect(self.openDirectory)
        self.DownloadComboBox.currentIndexChanged.connect(self.changeDownLoad)
        self.CaptionWidget.setVisible(False)
        self.DownloadWidget.setVisible(True)

    def changeDownLoad(self):
        # Show only the option panel that matches the selected download type
        self.Save_Location.setText("")
        type = self.DownloadComboBox.currentIndex()
        if type == 0:
            self.DownloadWidget.setVisible(True)
            self.CaptionWidget.setVisible(False)
        elif type == 3:
            self.DownloadWidget.setVisible(False)
            self.CaptionWidget.setVisible(True)
        else:
            self.DownloadWidget.setVisible(False)
            self.CaptionWidget.setVisible(False)

    def openDirectory(self):
        Save_Location = self.Save_Location.text()
        if Save_Location == "":
            Save_Location = os.getcwd()
        else:
            Save_Location = os.path.split(Save_Location)[0]

        if os.path.exists(Save_Location):
            # Windows only; passing a list keeps paths with spaces intact
            subprocess.call(["explorer", Save_Location])
        else:
            QMessageBox.warning(self, "Warning", "Path does not exist\nCheck that the path is correct")

    def login(self):
        Username = self.Username.text()
        Password = self.Password.text()

        url = "https://app.pluralsight.com/id/"

        # Hand-built multipart/form-data body carrying the username and password
        payload = "------WebKitFormBoundary7MA4YWxkTrZu0gW\r\nContent-Disposition: form-data; name=\"Username\"\r\n\r\n%s\r\n------WebKitFormBoundary7MA4YWxkTrZu0gW\r\nContent-Disposition: form-data; name=\"Password\"\r\n\r\n%s\r\n------WebKitFormBoundary7MA4YWxkTrZu0gW--" % (Username,Password)

        headers = {
            'content-type': "multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW",
            'Cache-Control': "no-cache",
            'Postman-Token': "cb0555f3-21dc-4833-8511-a64ee304df32"
        }

        response = requests.request("POST", url, data=payload, headers=headers)

        try:
            # Take the second cookie from the Cookie header of the final request;
            # this is expected to be the PsJwt entry and depends on cookie order
            PsJwt = response.request.headers['Cookie'].split('; ')[1]
        except Exception:
            traceback.print_exc()
            QMessageBox.warning(self, "Warning", "Failed to get the data\nCheck that the username and password are correct")
            return

        PsJwtEncrypt = ""
        randEncrpt = ""

        rand = random.randint(1,10000)

        # Simple obfuscation: shift every character of the token by rand,
        # and every digit of rand by 100
        for letter in PsJwt:
            PsJwtEncrypt += chr(ord(letter)+rand)
        for letter in str(rand):
            randEncrpt += chr(ord(letter)+100)

        with open(TOKEN_PATH,'w', encoding='UTF-8') as f:
            f.write(PsJwtEncrypt+"\n"+randEncrpt+"\nThis file is encrypted; do not modify it")

        QMessageBox.information(self, "Information", "Data written")

    def browse_file(self):
        type = self.DownloadComboBox.currentIndex()

        # Pick the save dialog that matches the selected download type
        if type == 0:
            save_file = QFileDialog.getExistingDirectory(self, caption="Save to", directory=".")
        elif type == 1:
            save_file = QFileDialog.getSaveFileName(self, caption="Save to", directory=".",filter="jpg (*.jpg)")
        elif type == 2:
            save_file = QFileDialog.getSaveFileName(self, caption="Save to", directory=".",filter="Excel (*.xls)")
        elif type == 3:
            save_file = QFileDialog.getExistingDirectory(self, caption="Save to", directory=".")

        self.Save_Location.setText(QDir.toNativeSeparators(save_file))

    def download(self):
        type = self.DownloadComboBox.currentIndex()
        PsJwtEncrypt = []
        try:
            with open(TOKEN_PATH,'r',encoding='UTF-8') as f:
                for line in f.readlines():
                    PsJwtEncrypt.append(line.strip('\n'))
        except Exception:
            traceback.print_exc()
            QMessageBox.warning(self, "Warning", "Failed to open the key file\nGenerate the key in the login settings first")
            return

        PsJwt = ""
        rand = ""

        try:
            # Line 2 of the file holds the shift value, each digit shifted by 100
            for letter in PsJwtEncrypt[1]:
                rand += chr(ord(letter)-100)

            # Line 1 holds the token, each character shifted by that value
            for letter in PsJwtEncrypt[0].split('\n')[0]:
                PsJwt += chr(ord(letter)-int(rand))

        except Exception:
            traceback.print_exc()
            QMessageBox.warning(self, "Warning", "Failed to decode the key file\nGenerate a new key")
            return

        Save_Location = self.Save_Location.text()
        DownloadURL = self.DownloadURL.text()

        if DownloadURL == "":
            QMessageBox.warning(self, "Warning", "Enter the page URL")
            return

        # Extract the course ID: the path segment right after "courses/"
        reg = r'courses/(.*)'
        courseIdReg = re.compile(reg)
        courseId = re.findall(courseIdReg,DownloadURL)[0].split('/')[0]

        url = "https://app.pluralsight.com/learner/content/courses/%s" % courseId

        headers = {
            'Cache-Control': "no-cache",
            'Postman-Token': "ca8bb1e7-99b7-4b41-b7d2-2813bf834e2f"
        }

        response = requests.request("GET", url, headers=headers)

        jsonFile = json.loads(response.text)

        if type == 0:
            self.Download_Speed.setText("")
            Save_Location = self.Save_Location.text()

            self.File_Path.setText(Save_Location)

            # Create the root directory, named after the course
            rstr = r"[\/\\\:\*\?\"\<\>\|]" # '/ \ : * ? " < > |'
            courseTitle = re.sub(rstr, "_", jsonFile['title']) # replace illegal characters with underscores
            courseTitle = courseTitle.strip()
            directory = os.path.join(Save_Location,courseTitle)
            if not os.path.exists(directory):
                os.mkdir(directory)

            # Download the course image
            ImgCheckBox = self.ImgCheckBox.isChecked()
            if ImgCheckBox:
                courseImageUrl = jsonFile['courseImageUrl']
                ImgDirectory = os.path.join(directory,courseTitle+".jpg")
                self.File_Name.setText(courseTitle+".jpg")
                self.ImgWrite(courseImageUrl,ImgDirectory)

            # Write the course information to Excel
            ExcelCheckBox = self.ExcelCheckBox.isChecked()
            if ExcelCheckBox:
                ExcelDirectory = os.path.join(directory,courseTitle+".xls")
                self.File_Name.setText(courseTitle+".xls")
                self.ExcelWrite(jsonFile,PsJwt,ExcelDirectory)

            # Download the course subtitles
            ZH_Caption = self.ZH_Caption.isChecked()
            EN_Caption = self.EN_Caption.isChecked()
            # Fixed: the original tested ZH_Caption twice, so an English-only choice was ignored
            if ZH_Caption or EN_Caption:
                self.CaptionWrite(jsonFile,directory,EN=EN_Caption,ZH=ZH_Caption)

            modules = jsonFile['modules']
            count = 0
            moduleCount = 0
            print("Download started")
            for module in modules:
                # Create the module folder
                moduleTitle = re.sub(rstr, "_", module['title']).strip()
                moduleDirectory = os.path.join(directory,"%s_%s" % (moduleCount+1,moduleTitle))
                if not os.path.exists(moduleDirectory):
                    os.mkdir(moduleDirectory)

                for clip in module['clips']:
                    start_time = time.time()
                    count += 1
                    clipTitle = re.sub(rstr, "_", clip['title']).strip()
                    clipName = "%s_%s%s" % (count,clipTitle,".mp4")
                    self.File_Name.setText(clipName)
                    fileDirectory = os.path.join(moduleDirectory,clipName)
                    if not os.path.exists(fileDirectory):
                        # Resolve the real download address.
                        # The module id is split on "|" into courseName, author, moduleName.
                        courseName = module['id'].split('|')[0]
                        author = module['id'].split('|')[1]
                        moduleName = module['id'].split('|')[2]
                        # Read the index from the clip itself; the original used a counter
                        # that fell out of step whenever an existing file was skipped
                        clipIndex = clip['ordering']

                        # Request the video links
                        url = "https://app.pluralsight.com/player/api/graphql"

                        # The cookie carries the account information
                        headers = {
                            'Content-Type': "application/json",
                            'Cookie': PsJwt,
                            'Cache-Control': "no-cache",
                            'Postman-Token': "6609e6d1-132d-4f7c-b4ab-d8b392f39f9b"
                        }

                        payload = "{\"query\":\"\\n query viewClip {\\n viewClip(input: {\\n author: \\\"%s\\\", \\n clipIndex: %s, \\n courseName: \\\"%s\\\", \\n includeCaptions: false, \\n locale: \\\"en\\\", \\n mediaType: \\\"mp4\\\", \\n moduleName: \\\"%s\\\", \\n quality: \\\"1024x768\\\"\\n }) {\\n urls {\\n url\\n cdn\\n rank\\n source\\n },\\n status\\n }\\n }\\n \",\"variables\":{}}" % (author,clipIndex,courseName,moduleName)

                        try:
                            response = requests.request("POST", url, data=payload, headers=headers)
                        except Exception:
                            traceback.print_exc()
                            QMessageBox.warning(self, "Warning", "Connection failed")
                            return

                        urlList = json.loads(response.text)

                        DownLoadChannel = self.DownLoadChannelComboBox.currentIndex()
                        r = requests.get(urlList['data']['viewClip']['urls'][DownLoadChannel]['url'],stream=True)

                        content_size = int(r.headers['content-length']) # total size of the body
                        downloadCount = 0
                        chunk_size = 1024
                        with open(fileDirectory, "wb") as video:
                            for chunk in r.iter_content(chunk_size=chunk_size):
                                if chunk:
                                    downloadCount += 1
                                    # Clamp the elapsed time to avoid dividing by zero
                                    dural_time = max(round(time.time() - start_time,2), 0.01)
                                    speed = downloadCount * (chunk_size/1024)/dural_time
                                    video.write(chunk)
                                    percent = downloadCount * chunk_size/content_size * 100
                                    print(int(percent))
                                    self.Download_Progress.setValue(int(percent))
                                    self.Download_Speed.setText(str(round(speed,2))+"kb/s")

                        print("%s - download complete" % clipName)

                    else:
                        print("%s video file already exists - skipped" % clipName)

                moduleCount += 1

            QMessageBox.information(self, "Information", "Download complete")
            print("Download complete")

        # Download the course image
        elif type == 1:
            Save_Location = self.Save_Location.text()
            self.File_Path.setText(os.path.split(Save_Location)[0])
            self.File_Name.setText(os.path.split(Save_Location)[1])
            courseImageUrl = jsonFile['courseImageUrl']
            self.ImgWrite(courseImageUrl,Save_Location)
            QMessageBox.information(self, "Information", "Download complete")

        # Save the course information to an xls file
        elif type == 2:
            Save_Location = self.Save_Location.text()
            self.Download_Speed.setText("")
            self.File_Path.setText(os.path.split(Save_Location)[0])
            self.File_Name.setText(os.path.split(Save_Location)[1])
            self.ExcelWrite(jsonFile,PsJwt,Save_Location)
            QMessageBox.information(self, "Information", "Data written")

        # Download the course subtitles
        elif type == 3:
            Save_Location = self.Save_Location.text()
            ZH_Caption_1 = self.ZH_Caption_1.isChecked()
            EN_Caption_1 = self.EN_Caption_1.isChecked()

            if not ZH_Caption_1 and not EN_Caption_1:
                QMessageBox.information(self, "Information", "Select at least one subtitle language to download")
                return

            self.CaptionWrite(jsonFile,Save_Location,EN=EN_Caption_1,ZH=ZH_Caption_1)
            QMessageBox.information(self, "Information", "Data written")

    def ImgWrite(self,courseImageUrl,Save_Location):
        if not os.path.exists(Save_Location):
            r = requests.get(courseImageUrl,stream=True)
            start_time = time.time()
            content_size = int(r.headers['content-length']) # total size of the body
            count = 0
            chunk_size = 1024
            print("Download started")
            with open(Save_Location, "wb") as img:
                for chunk in r.iter_content(chunk_size=chunk_size):
                    if chunk:
                        count += 1
                        # Clamp the elapsed time to avoid dividing by zero
                        dural_time = max(round(time.time() - start_time,2), 0.01)
                        speed = count * (chunk_size/1024)/dural_time
                        img.write(chunk)
                        percent = count * chunk_size/content_size * 100
                        print(int(percent))
                        self.Download_Progress.setValue(int(percent))
                        self.Download_Speed.setText(str(round(speed,2))+"kb/s")

            print("Download complete")
        else:
            print("%s image file already exists - skipped" % Save_Location)

    def ExcelWrite(self,jsonFile,PsJwt,Save_Location):
        if not os.path.exists(Save_Location):
            print("Writing started")

            # Font and alignment for the header row ('微软雅黑' is the Microsoft YaHei font)
            font0 = xlwt.Font()
            font0.name = '微软雅黑'

            alignment = xlwt.Alignment()
            alignment.horz = xlwt.Alignment.HORZ_CENTER
            alignment.vert = xlwt.Alignment.VERT_CENTER

            style0 = xlwt.XFStyle()
            style0.font = font0
            style0.alignment = alignment

            # Create the workbook
            book = xlwt.Workbook(encoding='utf-8', style_compression=0)
            # Create the sheet
            sheet = book.add_sheet('test', cell_overwrite_ok=True)

            row = 0
            column = 0

            sheet.write(row, column , "No." , style0 )
            sheet.write(row, column + 1 , "Video title" , style0 )
            sheet.write(row, column + 2 , "No. + video title" , style0 )
            sheet.write(row, column + 3 , "Player URL" , style0 )
            sheet.write(row, column + 4 , "Module title" , style0 )

            sheet.write(row, column + 5 , "Download channel 1" , style0 )
            sheet.write(row, column + 6 , "Download channel 2" , style0 )
            sheet.write(row, column + 7 , "Download channel 3" , style0 )
            sheet.write(row, column + 8 , "Download channel 4" , style0 )

            # Data rows are not centered
            alignment = xlwt.Alignment()
            style0.alignment = alignment

            modules = jsonFile['modules']

            # Total number of clips, used for the progress bar
            totalCount = sum(len(module['clips']) for module in modules)
            count = 0

            for module in modules:
                for clip in module['clips']:
                    count += 1
                    sheet.write(row + count, column , count , style0 )
                    sheet.write(row + count, column + 1 , clip['title'] , style0 )
                    sheet.write(row + count, column + 2 , "%s_%s" % (count,clip['title']) , style0 )
                    sheet.write(row + count, column + 3 , "https://app.pluralsight.com/%s" % clip['playerUrl'] , style0 )
                    sheet.write(row + count, column + 4 , clip['moduleTitle'] , style0 )

                    # Resolve the real download address.
                    # The module id is split on "|" into courseName, author, moduleName.
                    courseName = module['id'].split('|')[0]
                    author = module['id'].split('|')[1]
                    moduleName = module['id'].split('|')[2]
                    clipIndex = clip['ordering']

                    # Request the video links
                    url = "https://app.pluralsight.com/player/api/graphql"

                    # The cookie carries the account information
                    headers = {
                        'Content-Type': "application/json",
                        'Cookie': PsJwt,
                        'Cache-Control': "no-cache",
                        'Postman-Token': "6609e6d1-132d-4f7c-b4ab-d8b392f39f9b"
                    }

                    payload = "{\"query\":\"\\n query viewClip {\\n viewClip(input: {\\n author: \\\"%s\\\", \\n clipIndex: %s, \\n courseName: \\\"%s\\\", \\n includeCaptions: false, \\n locale: \\\"en\\\", \\n mediaType: \\\"mp4\\\", \\n moduleName: \\\"%s\\\", \\n quality: \\\"1024x768\\\"\\n }) {\\n urls {\\n url\\n cdn\\n rank\\n source\\n },\\n status\\n }\\n }\\n \",\"variables\":{}}" % (author,clipIndex,courseName,moduleName)

                    try:
                        response = requests.request("POST", url, data=payload, headers=headers)
                    except Exception:
                        traceback.print_exc()
                        # book.save(Save_Location)
                        QMessageBox.warning(self, "Warning", "Connection failed; skipping the rest of this module")
                        break

                    urlList = json.loads(response.text)

                    # Write up to four download channels; the server may return fewer
                    urls = urlList['data']['viewClip']['urls']
                    for i, item in enumerate(urls[:4]):
                        sheet.write(row + count, column + 5 + i , item['url'] , style0 )

                    percent = count * 100 / totalCount
                    self.Download_Progress.setValue(int(percent))
                    print(int(percent))

            book.save(Save_Location)
            print("Writing complete")
        else:
            print("%s Excel file already exists - skipped" % Save_Location)

    def CaptionWrite(self,jsonFile,Save_Location,EN=True,ZH=False):
        # Languages to fetch, as (URL suffix, output folder) pairs.
        # The original repeated the whole conversion once per language.
        languages = []
        if EN:
            languages.append(("en", "EN_caption"))
        if ZH:
            languages.append(("zh", "ZH_caption"))

        headers = {
            'content-type': "multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW",
            'Cache-Control': "no-cache",
            'Postman-Token': "c491c561-8e66-4cb8-b717-2bfd9b5b0d9a"
        }

        rstr = r"[\/\\\:\*\?\"\<\>\|]" # '/ \ : * ? " < > |'

        for suffix, folder in languages:
            # Create the output directory for this language
            CaptionDirectory = os.path.join(Save_Location,folder)
            if not os.path.exists(CaptionDirectory):
                os.mkdir(CaptionDirectory)

            count = 0
            for module in jsonFile['modules']:
                for clip in module['clips']:
                    count += 1
                    clipTitle = re.sub(rstr, "_", clip['title']).strip() # replace illegal characters with underscores
                    SRT_PATH = os.path.join(CaptionDirectory, '%s_%s.srt' % (count,clipTitle))

                    # Skip files that already exist
                    if os.path.exists(SRT_PATH):
                        print("%s srt file already exists - skipped" % SRT_PATH)
                        continue

                    url = "https://app.pluralsight.com/transcript/api/v1/caption/json/%s/%s" % (clip['clipId'],suffix)
                    try:
                        response = requests.request("GET", url, headers=headers)
                        CaptionFile = json.loads(response.text)
                    except Exception:
                        traceback.print_exc()
                        print("Request failed; skipping this clip")
                        continue

                    with open(SRT_PATH,'w', encoding='UTF-8') as f:
                        f.write(self.CaptionToSrt(CaptionFile,clip['duration']))

                    print('%s_%s.srt - written' % (count,clip['title']))

    def CaptionToSrt(self,CaptionFile,clipDuration):
        # Convert the caption JSON of one clip into srt text

        def offset_at(index):
            value = CaptionFile[index]['displayTimeOffset']
            if isinstance(value, (int, float)):
                return value
            # The offset is null: use the average of the neighbouring offsets
            return (CaptionFile[index-1]['displayTimeOffset'] + CaptionFile[index+1]['displayTimeOffset'])/2

        def srt_time(seconds):
            # srt timestamps have the form HH:MM:SS,mmm
            ms = int(round(seconds * 1000))
            h, ms = divmod(ms, 3600000)
            m, ms = divmod(ms, 60000)
            s, ms = divmod(ms, 1000)
            return "%02d:%02d:%02d,%03d" % (h, m, s, ms)

        srtFile = ""
        for captionCount, caption in enumerate(CaptionFile):
            startTime = srt_time(offset_at(captionCount))
            if len(CaptionFile) == captionCount+1:
                # Last caption: the JSON has no end time, so use the clip duration
                endTime = srt_time(float(re.findall(r"\d+\.?\d*",clipDuration)[0]))
            else:
                # Otherwise a caption ends where the next one starts
                endTime = srt_time(offset_at(captionCount+1))

            # srt cue numbers start at 1
            srtFile += "%s\n" % (captionCount+1)
            srtFile += "%s --> %s\n" % (startTime,endTime)
            srtFile += "%s\n\n" % caption['text']

        return srtFile




app = QApplication(sys.argv)
dl = PluralsightDownloader()
dl.show()
app.exec_()

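Figure: srt format
Figure: srt format
This is what the software looks like. After you enter your account and password, it generates a token.verify file.
Figure: srt format
The file is encrypted and stores the PsJwt value.
For the encryption method, see my source code; I will not go over it again. (Encryption reference)
Once the file is generated, the software reads it automatically and then starts downloading.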
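For readers who do not want to dig through the source, the token.verify scheme can be sketched as a round trip: every character of the token is shifted by a random amount, and the digits of that amount are shifted by 100 and stored on the next line. The function names below are mine. This is light obfuscation rather than real encryption, since the shift is stored next to the data.

```python
import random


def obfuscate(token, shift):
    """Shift each token character by `shift`; store the shift digits shifted by 100."""
    shifted_token = "".join(chr(ord(ch) + shift) for ch in token)
    shifted_key = "".join(chr(ord(ch) + 100) for ch in str(shift))
    return shifted_token + "\n" + shifted_key


def deobfuscate(blob):
    """Reverse obfuscate(): recover the shift from line 2, then the token from line 1."""
    shifted_token, shifted_key = blob.split("\n")[:2]
    shift = int("".join(chr(ord(ch) - 100) for ch in shifted_key))
    return "".join(chr(ord(ch) - shift) for ch in shifted_token)


# Placeholder token; the shift range matches random.randint(1, 10000) in the source
shift = random.randint(1, 10000)
blob = obfuscate("PsJwt-production=placeholder", shift)
print(deobfuscate(blob))
```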

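Summary

Lately I have been trying to crack Lynda and build a Lynda downloader, without success.
I have already found Lynda's verification information: it is a token value.
But I cannot obtain that token through Python. I have tried many approaches and am still stuck.
After all, the Python approach there has to go through a library-provided account rather than a personal account as with Pluralsight, so the verification is more cumbersome.
After about three days of research, I gave up.

Next week I am off to Shenzhen, and my internship will start soon. I am a little excited.
I hope to keep up the effort and earn the company's recognition.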