
Random Article Generation Based on Markov Chains

Feishiko
Author

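These past few days I have been playing Caves of Qud, a traditional roguelike set in a post-apocalyptic world where people trade with water as the universal currency and also perform water rituals, which improve your standing with the different factions. The contents of the books in the game are randomly generated. Someone in my group chat said this is done with a hidden Markov chain, so out of curiosity I looked up some material and tried generating text with a Markov chain myself. The following was generated by the model I wrote, using Soul Music as the corpus: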

You are certain if you can be an immortal and took the Death of his own, and took the Discworld, or Mort for the Discworld, on the Discworld, someone she’d known to become sixteen but she knew how to his own, and took the dimensions. But if it were, well, nothing against horses, or later had Eg and took the Death started to become real sorry. Or what, for the Death started to his home yet. Er. Got it!’ he believed in the Discworld, or later and took the Discworld, on business of the Discworld, someone married then rolled up a story about sex and this much to become accustomed and took the dimensions. And another tooth.'

And then, eventually, and this shows that he later hired and this silliness.'

And another one, and this much can hardly existed at age of his own, once been then rolled his home between the Death sat under the Discworld, someone she’d already circling the Death which was probably safe from the Discworld, or Mort was an apprentice but it were, well, nothing here,’ said that the dimensions. But first thing regardless then got up a story or Mort lost in the Death of his own, and still sitting there are interviewing or Mort was probably true. But the Discworld, or later take a story about memory. And then, in the Discworld, then said, ’ I did Miss Butts shuffled the Death of his home between the dimensions. And another direction. His brow or later take a story but it is a story about sex and took her feel better. It doesn’t take an apprentice then glanced at the Discworld, then said, rolling fields, and took the dimensions. But first thing regardless or later another river that you ran smoothly or later another direction.

It reads like complete nonsense, but if you ever need to generate some random nonsense articles, this post may help.

How It Works

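In a Markov chain, what happens today depends only on yesterday, and what happens tomorrow depends only on today. Say we start with the single word I. Many words can follow I, such as and or am. If we pick and, the text is now I and. The current word is now and, which can be followed by words such as you, him, or her; at this point the candidates no longer have anything to do with I. If we then pick you, the text becomes I and you, and we go on to look for words that can follow you.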
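Written as a formula, the assumption is that the next word depends only on the current one. The notation here is introduced just for this illustration: $X_n$ stands for the n-th word of the text and $w$ for any candidate next word.

```latex
P(X_{n+1} = w \mid X_n, X_{n-1}, \dots, X_1) = P(X_{n+1} = w \mid X_n)
```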

Next comes the corpus. To generate anything we need some existing material. Take the sentence "I am Feishiko, I like play games."; splitting it on spaces gives I/am/Feishiko,/I/like/play/games.

We then feed this to our model. Starting from I, it might generate: I like play games.
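The splitting step and the "which words can follow I" lookup can be sketched on this one sentence. This is a minimal illustration, not the model from the Code section: `buildFollowers` is a hypothetical helper, and it splits with `string.gmatch` and the `%S+` pattern instead of scanning character by character.

```lua
-- Split a sentence on whitespace and record, for every word, the words that follow it.
local function buildFollowers(sentence)
    local words = {}
    for word in string.gmatch(sentence, "%S+") do -- %S+ matches a run of non-space characters
        table.insert(words, word)
    end

    local followers = {}
    for i, word in ipairs(words) do
        followers[word] = followers[word] or {}
        if words[i + 1] ~= nil then
            table.insert(followers[word], words[i + 1])
        end
    end
    return words, followers
end

local words, followers = buildFollowers("I am Feishiko, I like play games.")
print(table.concat(words, "/"))          -- I/am/Feishiko,/I/like/play/games.
print(table.concat(followers["I"], "/")) -- am/like
```

Starting from I, the chain can therefore continue with either am or like, which is how "I like play games." becomes a possible output.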

Code

The examples are in Lua again, because that is what I wrote the model in.

First we create a function. It takes a source text used to build the corpus, the number of words to output, and the first keyword.

function ModelBuild(_source, _num, _keyword)

end

Next we split the source text into keywords and store them in the table listKey; in other words, every word becomes one element of a one-dimensional array.

function ModelBuild(_source, _num, _keyword)
    local listKey = {} -- every keyword in order, duplicates included
    local firstPlace = 1 -- start position of the word being read
    local secondPlace = 1 -- position of the character being examined
    local len = string.len(_source) -- length of the source text
    while secondPlace <= len do
        if string.sub(_source, secondPlace, secondPlace) == " " then -- split keywords on spaces
            table.insert(listKey, string.sub(_source, firstPlace, secondPlace - 1))
            firstPlace = secondPlace + 1 -- the next word starts right after the space
        end
        secondPlace = secondPlace + 1
    end
    if firstPlace <= len then -- the last word has no trailing space, so add it here
        table.insert(listKey, string.sub(_source, firstPlace, len))
    end
end

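The following code inserts the keywords into a new table, the equivalent of a dictionary in other languages: each key is a string and each value is an array whose elements are the words that can follow the key. (This code still goes inside the same function.)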

    local model = {}
    for i, v in ipairs(listKey) do -- i is the index, v is the value
        if model[v] == nil then
            model[v] = {}
        end
        if listKey[i + 1] ~= nil then
            table.insert(model[v], listKey[i + 1]) 
        end
    end
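Because the follower arrays keep duplicates, a uniform random pick from an array favors the words that followed the key most often in the corpus. The standalone sketch below (not part of ModelBuild; the `followers` list is made up for illustration) turns one such array into the probabilities it implies.

```lua
-- Hypothetical follower list, in the same shape the model table stores it
local followers = {"the", "the", "a", "the"}

-- Count how often each follower appears
local counts = {}
for _, word in ipairs(followers) do
    counts[word] = (counts[word] or 0) + 1
end

-- A uniform pick from the list selects each word with probability count / #followers
for word, count in pairs(counts) do
    print(word, count / #followers)
end
```

Here "the" would be chosen three times out of four on average, without the model ever storing an explicit probability.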

The following code generates the article from the model we just built from the corpus:

    local text = _keyword -- text holds the generated article
    local keyword = _keyword -- keyword is the current word
    for i = 1, _num, 1 do -- generate up to _num words
        if model[keyword] == nil then -- the current word never appears in the corpus, so stop generating
            break
        end

        local candidates = model[keyword] -- every word that followed keyword in the corpus
        local nextWord = nil
        if #candidates > 0 then -- math.random(0) raises an error on Lua 5.1 to 5.3, so skip empty arrays
            nextWord = candidates[math.random(#candidates)] -- #candidates is the length of the array
        end
        
        if nextWord == nil or nextWord == "." then -- no follower exists, or the follower is a lone ".": use a conjunction as the next word
            local pron = {"and", "but", "or", "then"}
            if string.sub(text, string.len(text) - 2, string.len(text) - 2) == "\'" or string.sub(text, string.len(text) - 2, string.len(text) - 2) == "." or string.sub(text, string.len(text), string.len(text)) == "." then -- capitalize where a new sentence starts
                pron = {"And", "But", "Or", "Then"}
            end
            nextWord = pron[math.random(4)]
        end
        text = text .. " " -- .. is Lua's string concatenation operator
        text = text .. nextWord
        keyword = nextWord
    end
    return text

That completes the function. Next we need to load our source text:

local file = assert(io.open("Soul Music.txt")) -- stop with a clear error if the file cannot be opened
fileText = file:read("*a") -- read the whole file into one string
file:close()

Then call the function:

math.randomseed(os.time()) -- seed with the system time so each run gives different output
function init()
    local text = ModelBuild(fileText, 300, "You") -- use fileText (the contents of Soul Music.txt) as the corpus, generate 300 words, start with "You"
    print(text)
end

-- Error handling: if init raises an error (for example, math.random(0) is an
-- error on Lua 5.1 to 5.3), call it again until a run succeeds.
while true do
    if pcall(init) then
       break
    end
end
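An unbounded retry loop never stops if the error happens on every run, and it discards the error message. One possible variation is a bounded retry that keeps the last error. This is a sketch under assumptions, not the code used for this post: `retry` and `maxAttempts` are hypothetical names.

```lua
-- Hypothetical helper: call fn up to maxAttempts times and report the last error on failure.
local function retry(fn, maxAttempts)
    local lastError
    for attempt = 1, maxAttempts do
        local ok, result = pcall(fn)
        if ok then
            return true, result
        end
        lastError = result -- on failure, pcall's second return value is the error
    end
    return false, lastError
end

-- Demo with a function that fails twice before succeeding
local calls = 0
local ok, value = retry(function()
    calls = calls + 1
    if calls < 3 then
        error("attempt " .. calls .. " failed")
    end
    return "done"
end, 5)
print(ok, value, calls) -- ok is true, value is "done", calls is 3
```

With the init function above, the call would look like retry(init, 10), followed by printing the returned error if the first result is false.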

Putting it all together, the complete code is:

local file = assert(io.open("Soul Music.txt")) -- stop with a clear error if the file cannot be opened
fileText = file:read("*a")
file:close()

function ModelBuild(_source, _num, _keyword)
    local listKey = {}
    local firstPlace = 1
    local secondPlace = 1
    local len = string.len(_source)
    while secondPlace <= len do
        if string.sub(_source, secondPlace, secondPlace) == " " then
            table.insert(listKey, string.sub(_source, firstPlace, secondPlace - 1))
            firstPlace = secondPlace + 1
        end
        secondPlace = secondPlace + 1
    end
    if firstPlace <= len then -- the last word has no trailing space
        table.insert(listKey, string.sub(_source, firstPlace, len))
    end

    local model = {}
    for i, v in ipairs(listKey) do
        if model[v] == nil then
            model[v] = {}
        end
        if listKey[i + 1] ~= nil then
            table.insert(model[v], listKey[i + 1]) 
        end
    end

    local text = _keyword
    local keyword = _keyword
    for i = 1, _num, 1 do
        if model[keyword] == nil then
            break
        end

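        local candidates = model[keyword]
        local nextWord = nil
        if #candidates > 0 then -- math.random(0) raises an error on Lua 5.1 to 5.3
            nextWord = candidates[math.random(#candidates)]
        end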
        
        if nextWord == nil or nextWord == "." then
            local pron = {"and", "but", "or", "then"}
            if string.sub(text, string.len(text) - 2, string.len(text) - 2) == "\'" or string.sub(text, string.len(text) - 2, string.len(text) - 2) == "." or string.sub(text, string.len(text), string.len(text)) == "." then
                pron = {"And", "But", "Or", "Then"}
            end
            nextWord = pron[math.random(4)]
        end
        text = text .. " "
        text = text .. nextWord
        keyword = nextWord
    end
    return text
end

math.randomseed(os.time())
function init()
    local text = ModelBuild(fileText, 300, "You")
    print(text)
end

while true do
    if pcall(init) then
       break
    end
end