LangChain Basics (2) - Chat History

Docs: https://python.langchain.com/docs/use_cases/question_answering/chat_history/

Background

During QA, we sometimes need the model to answer based on the conversation so far, e.g. "Can you elaborate on the second point you just mentioned?". In such cases the model has to be given the background information.

Let's start with the most basic RAG chain, which does not involve contextual Q&A.

import bs4
from langchain import hub
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_community.llms import Ollama
from langchain_community.chat_models.ollama import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load, chunk and index the contents of the blog.
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

# split and save to chroma
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
vectorstore = Chroma.from_documents(documents=splits, embedding=OllamaEmbeddings(model="nomic-embed-text"))

# Retrieve and generate using the relevant snippets of the blog.
retriever = vectorstore.as_retriever()
prompt = hub.pull("rlm/rag-prompt")
llm = Ollama(model="llama3", keep_alive=-1)


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
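
A quick sanity check of the base chain before adding history (the question comes from the tutorial; the exact answer depends on the model):

rag_chain.invoke("What is Task Decomposition?")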

Contextualizing the question

To implement chat history, we first prepend a contextualizing chain. Its job is to reformulate the latest question into a standalone one that no longer depends on the conversation, so that a second chain can then answer it. Concretely, chat history is implemented by having the contextualizing chain rewrite the question; for example, we instruct the model:

Given a chat history and the latest user question which might reference context in the chat history, formulate a standalone question which can be understood without the chat history. Do NOT answer the question, just reformulate it if needed and otherwise return it as is.

The reformulated question returned this way can be answered on its own. It is then fed to the retriever to fetch the relevant docs; a debug sketch is shown in the history-aware retriever section below.

MessagesPlaceholder

Exactly what it sounds like: a placeholder for messages, filled in through the corresponding key. It will serve as the placeholder for the chat history.

from langchain_core.prompts import MessagesPlaceholder

prompt = MessagesPlaceholder("history")
prompt.format_messages() # raises KeyError

prompt = MessagesPlaceholder("history", optional=True)
prompt.format_messages() # returns empty list []

prompt.format_messages(
    history=[
        ("system", "You are an AI assistant."),
        ("human", "Hello!"),
    ]
)
# -> [
# SystemMessage(content="You are an AI assistant."),
# HumanMessage(content="Hello!"),
# ]
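
In practice MessagesPlaceholder usually sits inside a ChatPromptTemplate instead of being formatted directly; a minimal sketch (variable names here are illustrative):

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

template = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant."),
        MessagesPlaceholder("history"),
        ("human", "{input}"),
    ]
)
# the history is injected at format time through its key
template.format_messages(history=[("human", "Hi!")], input="What did I just say?")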

History-aware retriever

Source code

If there is history, the input goes through prompt → llm → parser and the rewritten question goes to the retriever; otherwise the input is handed straight to the retriever.

retrieve_documents: RetrieverOutputLike = RunnableBranch(
    (
        # Both empty string and empty list evaluate to False
        lambda x: not x.get("chat_history", False),
        # If no chat history, then we just pass input to retriever
        (lambda x: x["input"]) | retriever,
    ),
    # If chat history, then we pass inputs to LLM chain, then to retriever
    prompt | llm | StrOutputParser() | retriever,
).with_config(run_name="chat_retriever_chain")
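
RunnableBranch itself takes (condition, runnable) pairs followed by a default branch; a minimal, self-contained sketch of its semantics:

from langchain_core.runnables import RunnableBranch, RunnableLambda

branch = RunnableBranch(
    # every positional argument except the last is a (condition, runnable) pair
    (lambda x: x > 0, RunnableLambda(lambda x: "positive")),
    # the last argument is the default branch
    RunnableLambda(lambda x: "non-positive"),
)
branch.invoke(3)   # -> "positive"
branch.invoke(-1)  # -> "non-positive"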

Actual code

from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

contextualize_q_system_prompt = """Given a chat history and the latest user question \
which might reference context in the chat history, formulate a standalone question \
which can be understood without the chat history. Do NOT answer the question, \
just reformulate it if needed and otherwise return it as is."""
contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt
)
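
As a quick debug of what the contextualizing step does (the rewritten question below is illustrative, not a captured output):

from langchain_core.messages import AIMessage, HumanMessage

docs = history_aware_retriever.invoke(
    {
        "input": "What are common ways of doing it?",
        "chat_history": [
            HumanMessage(content="What is Task Decomposition?"),
            AIMessage(content="Task decomposition breaks a complex task into smaller steps."),
        ],
    }
)
# The LLM first rewrites the question into a standalone one, roughly
# "What are common ways of doing task decomposition?", and the retriever
# then fetches documents for that rewritten question.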

QA chain

With contextualizing done, we hook up the QA chain.

from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

qa_system_prompt = """You are an assistant for question-answering tasks. \
Use the following pieces of retrieved context to answer the question. \
If you don't know the answer, just say that you don't know. \
Use three sentences maximum and keep the answer concise.\

{context}"""
qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", qa_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)


question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)
# contextualizing_chain + QA_chain
rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

create_stuff_documents_chain simply wires up dictionary | prompt | llm | output_parser, where output_parser defaults to StrOutputParser when not provided.
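
A rough LCEL equivalent of what it builds (simplified; the real implementation also checks that the prompt declares a context variable):

from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# "stuff" = concatenate every retrieved doc into the prompt's {context} slot
stuff_chain = (
    RunnablePassthrough.assign(context=lambda x: format_docs(x["context"]))
    | qa_prompt
    | llm
    | StrOutputParser()
)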

Final result

The full flow

from langchain_core.messages import AIMessage, HumanMessage

chat_history = []

question = "What is Task Decomposition?"
ai_msg_1 = rag_chain.invoke({"input": question, "chat_history": chat_history})
# wrap the answer in AIMessage so it keeps the assistant role in the history
chat_history.extend([HumanMessage(content=question), AIMessage(content=ai_msg_1["answer"])])

second_question = "What are common ways of doing it?"
ai_msg_2 = rag_chain.invoke({"input": second_question, "chat_history": chat_history})

print(ai_msg_1["answer"])
print()
print(ai_msg_2["answer"])

Returning sources

This is handled by create_retrieval_chain, which returns the retrieved documents alongside the answer under the "context" key.

for document in ai_msg_2["context"]:
    print(document)
    print()
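
Besides "context", the output dict keeps everything that flowed through the chain (key order may vary):

print(ai_msg_2.keys())
# dict_keys(['input', 'chat_history', 'context', 'answer'])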

Persistence

So far the history has been maintained by manually extending a list; a real application will likely want this persisted and automated.

This is where RunnableWithMessageHistory comes in: everything before stays the same, and we wrap the chain at the end so the state is managed for us.

from langchain_core.chat_history import BaseChatMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

### Statefully manage chat history ###
store = {}

def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]


conversational_rag_chain = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    output_messages_key="answer",
)

conversational_rag_chain.invoke(
    {"input": "What is Task Decomposition?"},
    config={
        "configurable": {"session_id": "abc123"}
    },  # constructs a key "abc123" in `store`.
)["answer"]
# "Task decomposition is a technique that breaks down complex tasks into smaller and simpler steps. It involves decomposing big tasks into multiple manageable tasks, making it easier to understand the model's thinking process and interpret the results. This approach is often used in prompting techniques like Chain of Thought (CoT) to enhance model performance on complex tasks."


conversational_rag_chain.invoke(
    {"input": "What are common ways of doing it?"},
    config={"configurable": {"session_id": "abc123"}},
)["answer"]
# 'Task decomposition can be achieved through various methods, including:\n\n* Chain of Thought (CoT): Instructing the model to "think step by step" to decompose hard tasks into smaller and simpler steps.\n* Behavioral cloning over actions: Using a set of source policies trained for specific tasks to generate learning histories and distill them into a neural network.\n\nI don\'t know about other common ways.'

Incidentally, session_id identifies a conversation session: the chain remembers turns within the same session and nothing across sessions. The official example illustrates this well.
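
A minimal sketch of that session boundary (the second session id is made up; the exact answer will vary):

# a fresh session_id gets an empty history, so "it" has no referent
conversational_rag_chain.invoke(
    {"input": "What are common ways of doing it?"},
    config={"configurable": {"session_id": "def456"}},
)["answer"]
# Without the earlier turn, the model cannot tell that "it" means task decomposition.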