The Will Will Web

Documenting Will's learning notes and technical sharing from the world of the web

The Schillace Laws of Semantic AI

I recently came across an article called Schillace Laws of Semantic AI, and it made me rethink where "software engineering" is heading. The combination of "semantic AI" and software engineering may well become a real trend, so I decided to translate the whole article, hoping to introduce more people to Semantic AI and help them understand this field a little better.

[Image: futuristic robot / artificial intelligence concept]

The Schillace Laws of Semantic AI

  1. Don't write code if the model can do it; the model will get better, but the code won't.

    The overall goal of the system is to build very high leverage programs using the LLM's capacity to plan and understand intent. It's very easy to slide back into a more imperative mode of thinking and write code for aspects of a program. Resist this temptation - to the degree that you can get the model to do something reliably now, it will be that much better and more robust as the model develops.

    Translator's note: "leverage" here refers to maximizing the value that the program produces.
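
    A minimal sketch of this idea, assuming a hypothetical llm callable (not part of the original article) that sends a prompt to whatever model you use and returns its text reply. Instead of hand-writing brittle parsing code, the task is handed to the model:

    ```python
    from typing import Callable

    def extract_meeting_time(message: str, llm: Callable[[str], str]) -> str:
        """Let the model interpret the free-form request instead of writing
        date-parsing code; the model keeps improving, the regex never will."""
        prompt = (
            "Extract the proposed meeting date and time from the message below.\n"
            "Answer with a single ISO-8601 timestamp and nothing else.\n\n"
            f"Message: {message}"
        )
        return llm(prompt).strip()

    # Usage (my_llm is whatever model client you already have):
    # extract_meeting_time("can we sync next Tuesday after lunch?", my_llm)
    ```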

  2. Trade leverage for precision; use interaction to mitigate.

    Related to the above, the right mindset when coding with an LLM is not "let's see what we can get the dancing bear to do," it's to get as much leverage from the system as possible. For example, it's possible to build very general patterns, like "build a report from a database" or "teach a year of a subject," that can be parameterized with plain text prompts to produce enormously valuable and differentiated results easily.
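
    A minimal sketch of such a general, plain-text-parameterized pattern ("build a report from a database"); the llm callable and every name here are illustrative assumptions, not code from the article:

    ```python
    from typing import Callable

    REPORT_PATTERN = (
        "You are a reporting assistant.\n"
        "Data (one JSON record per line):\n{records}\n\n"
        "Write a report that satisfies this request, for this audience:\n"
        "Request: {request}\n"
        "Audience: {audience}"
    )

    def build_report(records: list[str], request: str, audience: str,
                     llm: Callable[[str], str]) -> str:
        """One very general pattern, parameterized entirely with plain text."""
        prompt = REPORT_PATTERN.format(records="\n".join(records),
                                       request=request, audience=audience)
        return llm(prompt)

    # The same pattern yields very different, high-value results just by
    # changing the plain-text parameters:
    # build_report(rows, "monthly sales summary with anomalies", "the CFO", my_llm)
    # build_report(rows, "which items to restock this week", "warehouse staff", my_llm)
    ```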

  3. Code is for syntax and process; models are for semantics and intent.

    There are lots of different ways to say this, but fundamentally, the models are stronger when they are being asked to reason about meaning and goals, and weaker when they are being asked to perform specific calculations and processes. For example, it's easy for advanced models to write code to solve a sudoku generally, but hard for them to solve a sudoku themselves. Each kind of code has different strengths and it's important to use the right kind of code for the right kind of problem. The boundaries between syntax and semantics are the hard parts of these programs.
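
    A minimal sketch of that division of labour, assuming a hypothetical llm callable: the model resolves meaning and intent, while plain code performs the exact arithmetic the model is weak at:

    ```python
    import json
    from typing import Callable

    def answer_price_question(question: str, price_table: dict[str, float],
                              llm: Callable[[str], str]) -> float:
        """Model handles semantics (what is being asked); code handles process."""
        prompt = (
            "From the question below, identify the product name and the discount "
            "percentage being asked about. Reply with JSON only, e.g. "
            '{"product": "laptop", "discount_pct": 10}.\n\n'
            f"Question: {question}"
        )
        intent = json.loads(llm(prompt))                # semantics: the model
        price = price_table[intent["product"]]          # process: plain code
        return round(price * (1 - intent["discount_pct"] / 100), 2)
    ```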

  4. The system will be as brittle as its most brittle part.

    This goes for either kind of code. Because we are striving for flexibility and high leverage, it's important not to hard code anything unnecessarily. Put as much reasoning and flexibility into the prompts and use imperative code minimally to enable the LLM.
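
    A minimal sketch of keeping the reasoning in the prompt rather than hard-coding it; the category names and the llm callable are assumptions made for illustration:

    ```python
    from typing import Callable

    # The triage policy lives in the prompt, where it stays flexible and easy to
    # evolve, instead of being hard-coded as a keyword if/elif chain.
    TRIAGE_PROMPT = (
        "Classify the support message into exactly one of: billing, bug, account, other.\n"
        "Judge what the customer actually needs, not just which words appear.\n"
        "Reply with the single category word only.\n\n"
        "Message: {message}"
    )

    QUEUES = {"billing": "billing-queue", "bug": "engineering-queue",
              "account": "account-queue", "other": "human-review"}

    def route(message: str, llm: Callable[[str], str]) -> str:
        category = llm(TRIAGE_PROMPT.format(message=message)).strip().lower()
        return QUEUES.get(category, "human-review")   # only minimal glue code
    ```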

  5. Ask Smart to Get Smart.

    Emerging LLM AI models are incredibly capable and "well educated," but they lack context and initiative. If you ask them a simple or open-ended question, you will get a simple or generic answer back. If you want more detail and refinement, the question has to be more intelligent. This is an echo of "Garbage in, Garbage out" for the AI age.

  6. Uncertainty is an exception throw.

    Because we are trading precision for leverage, we need to lean on interaction with the user when the model is uncertain about intent. Thus, when we have a nested set of prompts in a program, and one of them is uncertain in its result ("One possible way..."), the correct thing to do is the equivalent of an "exception throw" - propagate that uncertainty up the stack until a level that can either clarify or interact with the user.
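
    A minimal sketch of treating uncertainty as an exception, assuming a hypothetical llm callable: the nested prompt raises, and only the top level talks to the user:

    ```python
    from typing import Callable

    class UncertainIntent(Exception):
        """Raised when a nested prompt cannot commit to a single interpretation."""

    def pick_recipient(instruction: str, contacts: list[str],
                       llm: Callable[[str], str]) -> str:
        prompt = (
            f"Contacts: {', '.join(contacts)}\n"
            "Which single contact does the instruction below refer to?\n"
            "If you are not sure, answer exactly UNSURE.\n\n"
            f"Instruction: {instruction}"
        )
        answer = llm(prompt).strip()
        if answer == "UNSURE" or answer not in contacts:
            # Do not guess - propagate the uncertainty up the stack.
            raise UncertainIntent(f"Ambiguous recipient in: {instruction!r}")
        return answer

    def handle_payment(instruction: str, contacts: list[str],
                       llm: Callable[[str], str]) -> str:
        try:
            return f"Paying {pick_recipient(instruction, contacts, llm)}"
        except UncertainIntent as exc:
            # This is the level that can clarify or interact with the user.
            return f"Need clarification from the user: {exc}"
    ```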

  7. Text is the universal wire protocol.

    Since the LLMs are adept at parsing natural language and intent as well as semantics, text is a natural format for passing instructions between prompts, modules and LLM based services. Natural language is less precise for some uses, and it is possible to use structured language like XML sparingly, but generally speaking, passing natural language between prompts works very well, and is less fragile than more structured language for most uses. Over time, as these model-based programs proliferate, this is a natural "future proofing" that will make disparate prompts able to understand each other, the same way humans do.
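
    A minimal sketch of plain text as the wire protocol between two prompt modules, assuming a hypothetical llm callable; neither module needs to agree on a shared schema:

    ```python
    from typing import Callable

    def research_step(topic: str, llm: Callable[[str], str]) -> str:
        """Emits its findings as plain prose, not a rigid schema."""
        return llm(f"List the three most important facts about: {topic}")

    def writing_step(notes: str, llm: Callable[[str], str]) -> str:
        """Consumes whatever prose the previous step produced."""
        return llm("Write a short newsletter paragraph based on these notes:\n" + notes)

    def pipeline(topic: str, llm: Callable[[str], str]) -> str:
        # Natural language is the only contract between the two modules.
        return writing_step(research_step(topic, llm), llm)
    ```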

  8. Hard for you is hard for the model.

    One common pattern when giving the model a challenging task is that it needs to "reason out loud." This is fun to watch and very interesting, but it's problematic when using a prompt as part of a program, where all that is needed is the result of the reasoning. However, using a "meta" prompt that is given the question and the verbose answer and asked to extract just the answer works quite well. This is a cognitive task that would be easier for a person (it's easy to imagine being able to give someone the general task of "read this and pull out whatever the answer is" and have that work across many domains where the user had no expertise, just because natural language is so powerful). So, when writing programs, remember that something that would be hard for a person is likely to be hard for the model, and breaking patterns down into easier steps often gives a more stable result.
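
    A minimal sketch of the "meta" prompt pattern described above, assuming a hypothetical llm callable: one easy step to reason out loud, one easy step to extract just the answer:

    ```python
    from typing import Callable

    def reason_out_loud(question: str, llm: Callable[[str], str]) -> str:
        """Step 1: let the model produce its verbose, step-by-step reasoning."""
        return llm("Think through this step by step, then give your answer:\n" + question)

    def extract_answer(question: str, verbose: str, llm: Callable[[str], str]) -> str:
        """Step 2: a 'meta' prompt sees the question plus the verbose answer
        and pulls out only the final answer."""
        return llm(
            "Below is a question and a long worked answer.\n"
            "Reply with only the final answer, nothing else.\n\n"
            f"Question: {question}\n\nWorked answer:\n{verbose}"
        ).strip()

    def answer(question: str, llm: Callable[[str], str]) -> str:
        # Two easier steps are more stable than one "answer directly" prompt.
        return extract_answer(question, reason_out_loud(question, llm), llm)
    ```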

  9. Beware "pareidolia of consciousness"; the model can be used against itself.

    It is very easy to imagine a "mind" inside an LLM. But there are meaningful differences between human thinking and the model. An important one that can be exploited is that the models currently don't remember interactions from one minute to the next. So, while we would never ask a human to look for bugs or malicious code in something they had just personally written, we can do that for the model. It might make the same kind of mistake in both places, but it's not capable of "lying" to us because it doesn't know where the code came from to begin with. This means we can "use the model against itself" in some places - it can be used as a safety monitor for code, a component of the testing strategy, a content filter on generated content, etc.
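
    A minimal sketch of using the model against itself as a reviewer of code it has just generated, assuming a hypothetical llm callable; the second call has no memory of the first:

    ```python
    from typing import Callable

    def generate_code(spec: str, llm: Callable[[str], str]) -> str:
        return llm("Write a Python function that does the following:\n" + spec)

    def review_code(code: str, llm: Callable[[str], str]) -> str:
        """A separate, memory-less call reviews the code as if a stranger wrote it."""
        return llm(
            "You are a security reviewer. List any bugs, injection risks or "
            "malicious behaviour in the code below. Answer OK if you find none.\n\n"
            + code
        )

    def generate_and_check(spec: str, llm: Callable[[str], str]) -> tuple[str, str]:
        code = generate_code(spec, llm)
        verdict = review_code(code, llm)   # the model is used against itself
        return code, verdict
    ```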

  10. Easy for you, easy for it; hard for you, hard for it.

    Engineers often want only the result or the answer, not the process or the intermediate text it produces, but LLMs are not built that way, and neither are humans. A person facing any problem has to think it through first, working in natural language, before arriving at an answer. The same goes for an LLM: the more you push for a "direct" answer, the more likely you are to get a wrong one. So what is "easy" for you is also easy for the LLM, and what is "hard" for you is correspondingly hard for it. When you have a big question to ask, it is better to break it into smaller pieces first, handle those simpler questions with separate skills (Skills), and then combine the results into a final, reliable answer.

    Translator's note: this is a 10th law that I heard the author, Sam Schillace, add himself in an interview video.
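
    A minimal sketch of splitting a big question into smaller ones and composing the results, assuming a hypothetical llm callable; the decomposition format is an illustrative assumption:

    ```python
    from typing import Callable

    def decompose(question: str, llm: Callable[[str], str]) -> list[str]:
        """Break the big question into a handful of smaller ones, one per line."""
        plan = llm("Break this question into 3 to 5 smaller questions, "
                   "one per line, with no numbering:\n" + question)
        return [line.strip() for line in plan.splitlines() if line.strip()]

    def answer_big_question(question: str, llm: Callable[[str], str]) -> str:
        partials = []
        for sub in decompose(question, llm):
            partials.append(f"{sub}\n{llm(sub)}")        # each small step is easy
        return llm(
            f"Combine these partial answers into one final, reliable answer to "
            f"{question!r}:\n\n" + "\n\n".join(partials)
        )
    ```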
