BilibiliDanmu2Ass: Converting Bilibili Danmu to ASS Subtitles

This article explains how Bilibili's danmu (bullet comments) work and how to convert them into ASS subtitles.

Preface

This article introduces Bilibili's danmu mechanism and a method for converting danmu into ASS subtitles.

Project repository: 🚀Github

Bilibili's Danmu Mechanism

Fetching the Danmu

Bilibili stores each video's danmu at https://comment.bilibili.com/<cid>.xml, where <cid> is the video's cid.

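On Bilibili, every video has its own av number, the so-called aid. All parts of a multi-part video share the same aid, however, which makes them hard to tell apart, so Bilibili also assigns every video (including each part of a multi-part collection) a unique cid.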

To get the cid, open the source code of the video's playback page and take the value that follows the cid field.

[Figure: locating the cid field in the page source in the browser]

import re

import requests

def get_cid(url):
    headers = {"user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.87 Safari/537.36"}
    page = requests.get(url, headers=headers).content.decode('utf-8')
    # The page source contains "cid":<digits>; \d+ refuses to match an empty value
    match = re.search(r'"cid":(\d+)', page)
    if match:
        return match.group(1)
    return "x"  # sentinel meaning "cid not found"

Danmu Format

[Figure: a danmu entry in the XML file]

Abstracting a single danmu entry one step further gives:

danmu = '<d p="appear time,mode,font size,color,Unix timestamp,pool,sender ID,rowID">text</d>'
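  • Appear time: when the danmu appears in the video, in seconds
  • Mode: 1-3 scrolling, 4 bottom, 5 top, 6 reverse scrolling, 7 precisely positioned, 8 advanced
  • Font size: 12 extremely small, 16 extra small, 18 small, 25 medium, 36 large, 45 very large, 64 extremely large
  • Color: the HTML six-digit hex color converted to decimal; for example, #FFFFFF is stored as 16777215
  • Unix timestamp: when the danmu was sent, in seconds, counted from 1970-01-01 08:00:00 Beijing time (the Unix epoch)
  • Pool: 0 normal pool, 1 subtitle pool, 2 special pool (note: the special pool is currently reserved for advanced danmu)
  • Sender ID: identifies the sender; used by the "block this sender" feature
  • rowID: the danmu's row ID in the danmu database; used by the "danmu history" feature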
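The field list above can be made concrete with a small parser. This is an illustrative sketch, not code from the project: DanmuAttrs, parse_p, color_to_html and send_time_beijing are hypothetical names, it assumes the timestamp field is in seconds, and the sample attribute string is made up.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple

class DanmuAttrs(NamedTuple):
    """Hypothetical container; field order follows the p attribute."""
    time: float      # appear time, seconds from the start of the video
    mode: int        # 1-3 scroll, 4 bottom, 5 top, 6 reverse, 7 positioned, 8 advanced
    size: int        # font size code, e.g. 25 = medium
    color: int       # decimal RGB, e.g. 16777215 == 0xFFFFFF
    timestamp: int   # send time as a Unix timestamp (assumed to be in seconds)
    pool: int        # 0 normal, 1 subtitle, 2 special
    sender: str      # sender ID
    row_id: str      # row ID in the danmu database

def parse_p(p: str) -> DanmuAttrs:
    """Split a p="..." value into typed fields; any fields past the eighth are ignored."""
    f = p.split(",")
    return DanmuAttrs(float(f[0]), int(f[1]), int(f[2]), int(f[3]),
                      int(f[4]), int(f[5]), f[6], f[7])

def color_to_html(color: int) -> str:
    """Decimal RGB -> '#RRGGBB'."""
    return "#%06X" % color

def send_time_beijing(timestamp: int) -> datetime:
    """Unix timestamp -> aware datetime in Beijing time (UTC+8)."""
    return datetime.fromtimestamp(timestamp, timezone(timedelta(hours=8)))

if __name__ == "__main__":
    attrs = parse_p("12.5,1,25,16777215,1500000000,0,abcd1234,100")  # made-up sample
    print(attrs.mode, color_to_html(attrs.color), send_time_beijing(attrs.timestamp))
```

The timezone-aware datetime also shows why the base time reads 08:00:00: the Unix epoch (00:00 UTC) is 08:00 in Beijing.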

Thanks to **@张书樵** for the analysis of the danmu data format: https://zhangshuqiao.org/2018-03/Bilibili%E5%BC%B9%E5%B9%95%E6%96%87%E4%BB%B6%E7%9A%84%E8%A7%A3%E6%9E%90/

Converting Danmu to ASS Subtitles

Data Structure

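To build subtitles, the data we actually need from each danmu are its appear time, mode, color, and text; the disappear time stored in the class is derived from the appear time.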

class Danmu:
    def __init__(self, appear_time, disappear_time, mode, color, text):
        self.appear_time = appear_time
        self.disappear_time = disappear_time
        self.mode = mode
        self.color = color
        if self.color == r"\c&HFFFFFF": # drop the default white color to keep the file small
            self.color = ""
        self.text = text
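One detail matters for the color field: ASS override colors are written as &HBBGGRR (blue first), while Bilibili stores the color as decimal RGB, so printing the decimal value straight as hex swaps red and blue for any non-gray color. A minimal sketch of the conversion, using a hypothetical rgb_to_ass helper:

```python
def rgb_to_ass(color: int) -> str:
    """Convert Bilibili's decimal RGB (0xRRGGBB) to an ASS \\c tag (&HBBGGRR)."""
    red = (color >> 16) & 0xFF
    green = (color >> 8) & 0xFF
    blue = color & 0xFF
    # ASS wants the bytes in blue, green, red order, each as two hex digits
    return r"\c&H%02X%02X%02X" % (blue, green, red)

if __name__ == "__main__":
    print(rgb_to_ass(16711680))  # pure red, #FF0000
    print(rgb_to_ass(16777215))  # white, #FFFFFF
```

White is symmetric under the swap and still becomes \c&HFFFFFF, so the default-white check in Danmu keeps working.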

Processing and Packaging the Data

import html
import re

import requests

headers = {"user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.87 Safari/537.36"}

def format_time(time_raw): # format seconds as an ASS timestamp, H:MM:SS.cc
    total_cs = int(round(time_raw * 100))  # centiseconds, so 59.999 s rolls over to 0:01:00.00
    hours, rem = divmod(total_cs, 360000)
    minutes, rem = divmod(rem, 6000)
    seconds, centis = divmod(rem, 100)
    return "%d:%02d:%02d.%02d" % (hours, minutes, seconds, centis)

def get_danmu_list(danmu_url):
    danmu_raw = requests.get(danmu_url, headers=headers).content.decode('utf-8')
    danmu_list = []
    # Each entry looks like <d p="time,mode,size,color,...">text</d>; size and the fields after color are unused
    for p, text in re.findall(r'<d p="([^"]*)">(.*?)</d>', danmu_raw):
        fields = p.split(',')
        appear = float(fields[0])
        rgb = int(fields[3])
        # ASS colors are &HBBGGRR, so swap the red and blue bytes of Bilibili's RGB value
        bgr = ((rgb & 0xFF) << 16) | (rgb & 0xFF00) | ((rgb >> 16) & 0xFF)
        # Each danmu stays on screen for 8 seconds; html.unescape turns &amp; etc. back into characters
        danmu_list.append(Danmu(format_time(appear), format_time(appear + 8), int(fields[1]),
                                r"\c&H%06X" % bgr, html.unescape(text)))
    return danmu_list

def generate_ass(danmu_list):
    ass = """[Script Info]
Title: Bilibili弹幕转ASS字幕
Original Script: 由 https://github.com/dreamwalkerxz 制作
ScriptType: v4.00+
Collisions: Normal
PlayResX: 560
PlayResY: 420
Timer: 100.0000

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Fix,Microsoft YaHei UI,25,&H66FFFFFF,&H66FFFFFF,&H66000000,&H66000000,0,0,0,0,100,100,0,0,1,1,0,2,20,20,2,0
Style: R2L,Microsoft YaHei UI,25,&H66FFFFFF,&H66FFFFFF,&H66000000,&H66000000,0,0,0,0,100,100,0,0,1,1,0,2,20,20,2,0

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
"""
    currentY = 1 # lane number, cycles through 1..16; the lane's Y coordinate is currentY * 25
    for danmu in danmu_list:
        if 1 <= danmu.mode <= 3:
            style = "R2L"  # scrolling danmu move from right to left
            startX = 620   # close to the average value
            endX = -(startX - 560)
        else:
            style = "Fix"  # must match the style name defined above
            startX = 280   # horizontal center of the 560-pixel-wide frame
            endX = 280
        Y = currentY * 25
        currentY += 1
        if currentY == 17:
            currentY = 1
        ass += "Dialogue: 0,%s,%s,%s,,20,20,2,,{\\move(%d,%d,%d,%d)%s}%s\n" % (danmu.appear_time, danmu.disappear_time, style, startX, Y, endX, Y, danmu.color, danmu.text)
    return ass
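The constants in generate_ass determine the motion. Both styles use Alignment 2, so \move positions the bottom center of the text; a scrolling danmu's anchor travels from x = 620 to x = -60 during the 8 seconds it is shown, and successive danmu cycle through 16 lanes spaced 25 pixels apart. The arithmetic as a small sketch (scroll_speed and lane_y are hypothetical names):

```python
PLAY_RES_X = 560   # PlayResX in [Script Info]
DURATION = 8       # seconds each danmu stays on screen
LANE_HEIGHT = 25   # pixels between lanes (matches the font size of 25)
LANES = 16         # currentY cycles through 1..16

def scroll_speed(start_x: int = 620) -> float:
    """Pixels per second for the R2L \\move, with end_x = -(start_x - PlayResX)."""
    end_x = -(start_x - PLAY_RES_X)
    return (start_x - end_x) / DURATION

def lane_y(index: int) -> int:
    """Y coordinate given to the index-th danmu (0-based) by the round-robin lanes."""
    return (index % LANES + 1) * LANE_HEIGHT

if __name__ == "__main__":
    print(scroll_speed())                       # 85.0
    print([lane_y(i) for i in (0, 1, 15, 16)])  # [25, 50, 400, 25]
```

Because lanes are assigned round-robin without looking at timing, the 17th danmu reuses lane 1 even if the first one is still on screen, so bursts of more than 16 danmu within 8 seconds can overlap; tracking when each lane becomes free would be one way to avoid that.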

Results

Project repository: 🚀Github

Main Program

import html
import re
import sys

import requests

headers = {"user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.87 Safari/537.36"}

class Danmu:
    def __init__(self, appear_time, disappear_time, mode, color, text):
        self.appear_time = appear_time
        self.disappear_time = disappear_time
        self.mode = mode
        self.color = color
        if self.color == r"\c&HFFFFFF": # drop the default white color to keep the file small
            self.color = ""
        self.text = text

def get_cid(url):
    page = requests.get(url, headers=headers).content.decode('utf-8')
    match = re.search(r'"cid":(\d+)', page)
    return match.group(1) if match else "x"  # "x" means no cid was found

def format_time(time_raw): # format seconds as an ASS timestamp, H:MM:SS.cc
    total_cs = int(round(time_raw * 100))  # centiseconds, so 59.999 s rolls over to 0:01:00.00
    hours, rem = divmod(total_cs, 360000)
    minutes, rem = divmod(rem, 6000)
    seconds, centis = divmod(rem, 100)
    return "%d:%02d:%02d.%02d" % (hours, minutes, seconds, centis)

def get_danmu_list(danmu_url):
    danmu_raw = requests.get(danmu_url, headers=headers).content.decode('utf-8')
    danmu_list = []
    # Each entry looks like <d p="time,mode,size,color,...">text</d>; size and the fields after color are unused
    for p, text in re.findall(r'<d p="([^"]*)">(.*?)</d>', danmu_raw):
        fields = p.split(',')
        appear = float(fields[0])
        rgb = int(fields[3])
        # ASS colors are &HBBGGRR, so swap the red and blue bytes of Bilibili's RGB value
        bgr = ((rgb & 0xFF) << 16) | (rgb & 0xFF00) | ((rgb >> 16) & 0xFF)
        # Each danmu stays on screen for 8 seconds; html.unescape turns &amp; etc. back into characters
        danmu_list.append(Danmu(format_time(appear), format_time(appear + 8), int(fields[1]),
                                r"\c&H%06X" % bgr, html.unescape(text)))
    return danmu_list

def generate_ass(danmu_list):
    ass = """[Script Info]
Title: Bilibili弹幕转ASS字幕
Original Script: 由 https://github.com/dreamwalkerxz 制作
ScriptType: v4.00+
Collisions: Normal
PlayResX: 560
PlayResY: 420
Timer: 100.0000

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Fix,Microsoft YaHei UI,25,&H66FFFFFF,&H66FFFFFF,&H66000000,&H66000000,0,0,0,0,100,100,0,0,1,1,0,2,20,20,2,0
Style: R2L,Microsoft YaHei UI,25,&H66FFFFFF,&H66FFFFFF,&H66000000,&H66000000,0,0,0,0,100,100,0,0,1,1,0,2,20,20,2,0

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
"""
    currentY = 1 # lane number, cycles through 1..16; the lane's Y coordinate is currentY * 25
    for danmu in danmu_list:
        if 1 <= danmu.mode <= 3:
            style = "R2L"  # scrolling danmu move from right to left
            startX = 620   # close to the average value
            endX = -(startX - 560)
        else:
            style = "Fix"  # must match the style name defined above
            startX = 280   # horizontal center of the 560-pixel-wide frame
            endX = 280
        Y = currentY * 25
        currentY += 1
        if currentY == 17:
            currentY = 1
        ass += "Dialogue: 0,%s,%s,%s,,20,20,2,,{\\move(%d,%d,%d,%d)%s}%s\n" % (danmu.appear_time, danmu.disappear_time, style, startX, Y, endX, Y, danmu.color, danmu.text)
    return ass

def main():
    url = input("url:")
    cid = get_cid(url)
    if cid == "x":
        print("cid not found")
        sys.exit(1)
    danmu_url = "https://comment.bilibili.com/" + cid + ".xml"
    danmu_list = get_danmu_list(danmu_url)
    ass = generate_ass(danmu_list)
    with open(cid + ".ass", "wb") as ass_file:
        ass_file.write(ass.encode("utf-8"))

if __name__ == "__main__":
    main()

Usage

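git clone https://github.com/DreamWalkerXZ/BilibiliDanmu2Ass.git
cd ./BilibiliDanmu2Ass
pip3 install requests # main.py depends on the third-party requests library
python3 main.py # input: a Bilibili video URL; output: a subtitle file named <cid>.ass in the current directory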
Licensed under CC BY-NC-SA 4.0