Deepthought 8B: Controlling the Reasoning of an LLM

A new paradigm of language model has recently become popular: models trained to reason.

Basically, these models lay out the solution to a problem step by step and then work through those steps, taking advantage of the fact that LLMs are autoregressive: each generated token is conditioned on everything generated before it.

Deepthought 8B works this way, which is nothing new. What changes is how it lays out its reasoning: in a structured form, using JSON. That lets us manipulate its reasoning easily by interfering with its "thoughts".

Unlike most LLMs, it needs at least two calls to the model to work.

Let's see how it works. One important warning: the model is very strict about its prompts, so follow them as closely as possible.

For this test we will use llama.cpp with the Q8 quantized version.

./llama-server -m ./deepthought-8b-llama-v0.01-alpha-Q8_0.gguf

The initial prompt is the following:

<|im_start|>system
You are a superintelligent AI system, capable of comprehensive reasoning. When provided with <reasoning>, you must provide your logical reasoning chain to solve the user query. Be verbose with your outputs.<|im_end|>
<|im_start|>user
{{User Query}}<|im_end|>
<|im_start|>reasoning
<reasoning>
[
  {
    "step": 1,

You can see that it uses the trick of starting the model's response ourselves, which forces the model to continue it.

In this first call the model generates the reasoning, with each step following this JSON structure:

  {
    "step": 1,
    "type": "problem_understanding",
    "thought": "I need to count the number of times the letter 'r' appears."
  }
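Because each step is a plain JSON object, it can be checked with Python's standard `json` module before it is sent back to the model. The following is a minimal sketch. The helper name `validate_steps` and the rule that steps must be numbered consecutively from 1 are assumptions made for illustration, not something the model documents.

```python
import json

REQUIRED_KEYS = {"step", "type", "thought"}

def validate_steps(steps_json):
    """Parse a JSON array of reasoning steps and check its basic shape.

    Hypothetical helper: the consecutive-numbering rule is an assumption.
    """
    steps = json.loads(steps_json)
    if not isinstance(steps, list):
        raise ValueError("expected a JSON array of steps")
    for expected, step in enumerate(steps, start=1):
        if not isinstance(step, dict):
            raise ValueError(f"step {expected} is not a JSON object")
        missing = REQUIRED_KEYS - step.keys()
        if missing:
            raise ValueError(f"step {expected} is missing keys: {sorted(missing)}")
        if step["step"] != expected:
            raise ValueError(f"expected step {expected}, found {step['step']}")
    return steps

example = '''[
  {
    "step": 1,
    "type": "problem_understanding",
    "thought": "I need to count the number of times the letter 'r' appears."
  }
]'''

steps = validate_steps(example)
print(steps[0]["type"])
```

A check like this could catch a malformed step before it reaches the second call, which matters given how strict the model is about its prompts.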

Once we have the reasoning in JSON format, we pass it back to the model with this prompt:

<|im_start|>reasoning
<reasoning>
{{Steps JSON}}
</reasoning><|im_end|>
<|im_start|>assistant

And we get the answer.

Let's look at an example with the typical question of how many letters "r" there are in "strawberrry" (it has an extra "r" on purpose).

The first prompt:

<|im_start|>system
You are a superintelligent AI system, capable of comprehensive reasoning. When provided with <reasoning>, you must provide your logical reasoning chain to solve the user query. Be verbose with your outputs.<|im_end|>
<|im_start|>user
How many letters "r" does the word strawberrry have?<|im_end|>
<|im_start|>reasoning
<reasoning>
[
  {
    "step": 1,

And then the second prompt, with all the reasoning steps:

<|im_start|>reasoning
<reasoning>
[
  {
    "step": 1,
    "type": "problem_understanding",
    "thought": "I need to count the number of times the letter 'r' appears in the word 'straberrry'."
  },
  {
    "step": 2,
    "type": "data_gathering",
    "thought": "The word to analyze is 'straberrry'. I'll need to examine each character in this word."
  },
  {
    "step": 3,
    "type": "hypothesis_formation",
    "thought": "Based on a quick glance, I hypothesize that there are 3 'r's in the word 'straberrry'."
  },
  {
    "step": 4,
    "type": "analysis",
    "thought": "I'll go through the word character by character: s (not r), t (not r), r (count: 1), a (not r), b (not r), e (not r), r (count: 2), r (count: 3), r (count: 4), y (not r)."
  },
  {
    "step": 5,
    "type": "calculation",
    "thought": "After analyzing each character, I've counted 4 occurrences of the letter 'r'."
  },
  {
    "step": 6,
    "type": "verification",
    "thought": "To verify, I'll recount: s-t-r-a-b-e-r-r-r-y. This confirms 4 'r's."
  },
  {
    "step": 7,
    "type": "error_checking",
    "thought": "I'll double-check for any missed 'r's or counting errors. No errors found."
  },
  {
    "step": 8,
    "type": "conclusion_drawing",
    "thought": "The letter 'r' appears 4 times in the word 'straberrry'."
  },
  {
    "step": 9,
    "type": "reflection",
    "thought": "My initial hypothesis was close but not exact. This reminds me to be careful about quick estimates and to always verify."
  },
  {
    "step": 10,
    "type": "implementation",
    "thought": "The final response should be the number 4, as this is the correct count of 'r's in 'straberrry'."
  }
]</reasoning><|im_end|>
<|im_start|>assistant

The Python code to run this against the llama.cpp server:

import requests
import json

llamaURL = "http://127.0.0.1:8080" 

def instruct(prompt, n_predict, temperature):
    #print("Asking llama.cpp")
    #print(prompt)
    
    response = requests.post(llamaURL+"/completion", json={
        'prompt': prompt,
        'n_predict': n_predict,
        'temperature': temperature,
        'cache_prompt': True
    })
    
    if response.status_code == 200:
        #print("Response llama.cpp")
        #print(response.json().get('content'))
        return response.json().get('content')
    else:
        raise Exception(f"Request failed with status code {response.status_code}")

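def promptResolve(steps):
    # Re-create the text that was pre-filled in promptSteps() and append the
    # model's continuation. No newline is added after '"step": 1,' so the
    # text matches exactly what the model saw in the first call.
    return '''<|im_start|>reasoning
<reasoning>
[
  {
    "step": 1,'''+steps+'''<|im_end|>
<|im_start|>assistant'''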

def promptSteps(query):
    return '''<|im_start|>system
You are a superintelligent AI system, capable of comprehensive reasoning. When provided with <reasoning>, you must provide your logical reasoning chain to solve the user query. Be verbose with your outputs.<|im_end|>
<|im_start|>user
'''+query+'''<|im_end|>
<|im_start|>reasoning
<reasoning>
[
  {
    "step": 1,'''


try:
    query = '''How many letters "r" does the word strawberrry have?'''    

    steps = instruct(promptSteps(query), 8000, 0.5)    
    result = instruct(promptResolve(steps), 8000, 0.5)
    print(promptResolve(steps))
    print(result)
except Exception as e:
    print(e)
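The completion returned by the first call starts right after the pre-filled `"step": 1,`, so it is not valid JSON on its own. To inspect the steps in Python, the pre-filled text has to be put back first. This is a minimal sketch, assuming the completion ends with the closing `]`, optionally followed by `</reasoning>`. The helper `rebuild_steps` and the sample completion are hypothetical.

```python
import json

# The text that promptSteps() pre-fills at the start of the model's answer.
PREFILL = '[\n  {\n    "step": 1,'

def rebuild_steps(completion):
    """Rebuild the full steps array from a completion that continues PREFILL."""
    text = PREFILL + completion
    # Drop the closing tag and anything after it, if the model emitted it (assumption).
    text = text.split("</reasoning>")[0]
    # Keep only the text up to the last ']' so trailing whitespace is ignored.
    end = text.rfind("]")
    if end == -1:
        raise ValueError("completion does not contain a closing ']'")
    return json.loads(text[: end + 1])

# Hand-written stand-in for what instruct(promptSteps(query), ...) might return.
sample_completion = '''
    "type": "problem_understanding",
    "thought": "I need to count the letter 'r'."
  },
  {
    "step": 2,
    "type": "conclusion_drawing",
    "thought": "The letter 'r' appears 4 times."
  }
]</reasoning>'''

steps = rebuild_steps(sample_completion)
for step in steps:
    print(step["step"], step["type"])
```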

Controlling the reasoning:

To control the reasoning we have to interfere with the creation of the steps. We do that by using "}" as a stop token. That way, after each reasoning step (in JSON, remember), the language model hands control back to our code, and we can modify the step that was just added or the next step we pass to the model.
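With "}" as the stop token, each call returns only the body of one step, without the closing brace. The following is a minimal sketch of how that fragment could be turned into a Python dict, edited, and serialized back. The helper names and the sample fragment are hypothetical, and the sketch assumes the fragment contains no "}" characters of its own.

```python
import json
import textwrap

def parse_step(step_number, fragment):
    """Turn the text generated for one step into a dict.

    `fragment` is what the model wrote after '"step": N,' and before the
    stop token '}', which is not included in the returned text.
    """
    return json.loads('{"step": ' + str(step_number) + "," + fragment + "}")

def dump_step(step):
    """Serialize a step using the same indentation as the prompt format."""
    return textwrap.indent(json.dumps(step, indent=2), "  ")

# Hand-written stand-in for one generated fragment.
fragment = '''
    "type": "hypothesis_formation",
    "thought": "Based on a quick glance, there are 3 'r's."
  '''

step = parse_step(3, fragment)
# Interfere with the "thought" before handing the step back to the model.
step["thought"] += " I must verify this by counting letter by letter."
print(dump_step(step))
```

Working on dicts rather than raw strings makes it harder to break the JSON by accident when a thought is rewritten.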

Here is an example that controls the number of steps the model generates.

import requests
import json

llamaURL = "http://127.0.0.1:8080" 

def instruct(prompt, n_predict, temperature):
    #print("Asking llama.cpp")
    #print(prompt)
    
    response = requests.post(llamaURL+"/completion", json={
        'prompt': prompt,
        'n_predict': n_predict,
        'temperature': temperature,
        'cache_prompt': True
    })
    
    if response.status_code == 200:
        #print("Response llama.cpp")
        #print(response.json().get('content'))
        return response.json().get('content')
    else:
        raise Exception(f"Request failed with status code {response.status_code}")


def generateStep(prompt, n_predict, temperature):
    #print("Asking llama.cpp")
    #print(prompt)
    
    response = requests.post(llamaURL+"/completion", json={
        'prompt': prompt,
        'n_predict': n_predict,
        'temperature': temperature,
        'cache_prompt': True,
        'stop': ['}']
    })
    
    if response.status_code == 200:
        #print("Response llama.cpp")
        #print(response.json().get('content'))
        return response.json().get('content')
    else:
        raise Exception(f"Request failed with status code {response.status_code}")


def promptStep(query, steps):
    return '''<|im_start|>system
You are a superintelligent AI system, capable of comprehensive reasoning. When provided with <reasoning>, you must provide your logical reasoning chain to solve the user query. Be verbose with your outputs.<|im_end|>
<|im_start|>user
'''+query+'''<|im_end|>
<|im_start|>reasoning
<reasoning>
'''+steps

def promptResolve(steps):
    return '''<|im_start|>reasoning
<reasoning>
'''+steps+'''
</reasoning><|im_end|>
<|im_start|>assistant'''

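def openStep(i):
    # Open step i and pre-fill its number; the model writes "type" and "thought".
    return '''  {
    "step": '''+str(i)+''',
'''

def closeStep(last=False):
    # The stop string '}' is not part of the returned text, so add it back.
    # The last step gets no trailing comma so the array stays valid JSON.
    return "}\n" if last else "},\n"


try:
    query = '''How many letters "r" does the word strawberry have?'''
    maxSteps = 10  # number of reasoning steps we allow the model to produce

    stepByStep = '''[
'''
    for i in range(1, maxSteps + 1):
        stepByStep += openStep(i)
        stepByStep += generateStep(promptStep(query, stepByStep), 8000, 0.5)
        stepByStep += closeStep(last=(i == maxSteps))

    stepByStep += ''']'''
    result = instruct(promptResolve(stepByStep), 8000, 0.5)
    print(promptResolve(stepByStep))
    print(result)
except Exception as e:
    print(e)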

Some ideas for what can be done with this:

Control the number of steps
Control the execution time
Remove steps
Add steps
Build a reasoning tree
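Several of these ideas reduce to editing a list of steps between calls. The following is a minimal sketch, assuming the steps have already been parsed into Python dicts with the `step`, `type` and `thought` keys. All helper names are hypothetical, and `generate_one` stands in for whatever function asks the model for the next step.

```python
import time

def renumber(steps):
    """Return a copy of the steps numbered consecutively from 1."""
    return [{**s, "step": i} for i, s in enumerate(steps, start=1)]

def remove_steps(steps, step_type):
    """Drop every step of the given type."""
    return renumber([s for s in steps if s["type"] != step_type])

def add_step(steps, position, step_type, thought):
    """Insert a hand-written step so that it becomes step number `position`."""
    new = {"step": position, "type": step_type, "thought": thought}
    return renumber(steps[: position - 1] + [new] + steps[position - 1 :])

def generate_within_budget(generate_one, max_seconds, max_steps):
    """Collect steps until the time budget or the step limit is reached."""
    deadline = time.monotonic() + max_seconds
    steps = []
    while len(steps) < max_steps and time.monotonic() < deadline:
        steps.append(generate_one(len(steps) + 1, steps))
    return steps

# Stand-in generator so the sketch runs without a server.
def fake_generate(number, previous):
    return {"step": number, "type": "analysis", "thought": f"thought {number}"}

steps = generate_within_budget(fake_generate, max_seconds=5.0, max_steps=3)
steps = add_step(steps, 1, "problem_understanding", "Count the letter 'r'.")
steps = remove_steps(steps, "analysis")
print(steps)
```

Note that the time budget is only checked between steps, so a step that is already being generated is not interrupted.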

You can watch what is explained here in the following video:

Controlling the reasoning:
