Generative AI in Production¶

  • An introduction to deploying LLM apps in production
  • An overview of tools for deploying LLM apps and how to use them

Key topics

  • Preparing LLM apps for production
  • How to evaluate LLM apps
  • How to deploy LLM apps
  • How to observe LLM apps

Terminology

  • MLOps
    • MLOps is a paradigm focused on deploying and maintaining machine learning models reliably and efficiently in production environments.
    • It combines DevOps practices with machine learning to move algorithms from experimental systems into production systems.
    • MLOps aims to increase automation, improve the quality of production models, and address business and regulatory requirements.
  • LLMOps
    • LLMOps is a specialized subcategory of MLOps.
    • It refers to the operational capabilities and infrastructure needed to fine-tune and operate LLMs as part of a product.
    • It may not differ greatly from the concepts of MLOps, but it comes with specific requirements for handling, refining, and deploying very large language models such as GPT-3 with its 175 billion parameters.
  • LMOps
    • The term LMOps is more inclusive than LLMOps because it covers various types of language models, including LLMs and smaller general-purpose models.
    • It acknowledges the expanding landscape of language models and their relevance in an operational context.
  • Foundational Model Orchestration (FOMO)
    • Foundational Model Orchestration (FOMO) specifically addresses the challenges of working with foundation models: models trained on broad data that can be adapted to a wide range of downstream tasks.
    • It highlights the need to manage multi-step processes, integrate with external resources, and coordinate the workflows involving these models.
  • ModelOps
    • This term focuses on the governance and lifecycle management of deployed AI and decision models.
  • AgentOps
    • More broadly, AgentOps covers the operational management of LLMs and other AI agents: ensuring appropriate behavior, managing access to environments and resources, and facilitating interaction between agents, while addressing concerns about unintended consequences and incompatible goals.
In [ ]:
from dotenv import load_dotenv
load_dotenv()
Out[ ]:
True

Comparing two outputs with the LangChain PairwiseStringEvaluator¶

Evaluation process

    1. Create an evaluator
    • use the load_evaluator() function
    2. Select a dataset
    • use the load_dataset() function
    3. Define the models to compare
    • initialize the LLMs, chains, or agents to compare, with whatever configuration they need
    4. Generate responses
    • generate an output from each model before evaluating (steps 2-4 are sketched right after this list)
    5. Evaluate pairs
    • for each input, compare the outputs of the different models and score the result
    • this is often done with a randomized presentation order to reduce position bias
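
The cells below focus on step 1 (creating the evaluator) and step 5 (comparing a pair of outputs). A minimal sketch of steps 2-4 might look like the following; the dataset name "llm-math" and its record fields ("question", "answer") are assumptions, so check which datasets are actually published under LangChainDatasets before running it.

from langchain.chat_models import ChatOpenAI
from langchain.evaluation.loading import load_dataset

# Step 2: load a small benchmark dataset (a list of dicts).
dataset = load_dataset("llm-math")

# Step 3: define the two models to compare.
model_a = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
model_b = ChatOpenAI(model="gpt-4-1106-preview", temperature=0)

# Step 4: generate an output from each model for a handful of examples.
pairs = []
for example in dataset[:5]:  # keep the sketch cheap: only a few examples
    question = example["question"]
    pairs.append({
        "input": question,
        "prediction": model_a.predict(question),
        "prediction_b": model_b.predict(question),
        "reference": example.get("answer"),
    })
# Each entry in `pairs` can then be passed to evaluate_string_pairs() as shown below.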

Looking up the mapped class for an EvaluatorType

  • print(_EVALUATOR_MAP[EvaluatorType.LABELED_PAIRWISE_STRING])
In [ ]:
from langchain.chat_models import ChatOpenAI
from langchain.evaluation.comparison import LabeledPairwiseStringEvalChain, PairwiseStringEvalChain
from langchain.evaluation.loading import load_evaluator, EvaluatorType, _EVALUATOR_MAP
In [ ]:
print(_EVALUATOR_MAP[EvaluatorType.LABELED_PAIRWISE_STRING])
<class 'langchain.evaluation.comparison.eval_chain.LabeledPairwiseStringEvalChain'>
In [ ]:
llm = ChatOpenAI(temperature=0,
                 model="gpt-4-1106-preview",
                #  model_name="gpt-4-1106-preview",
                #  model_kwargs={"random_seed":42}
                 )
chain = load_evaluator(EvaluatorType.LABELED_PAIRWISE_STRING, llm=llm)
result = chain.evaluate_string_pairs(
    input="What is the chemical formula for water?",
    prediction="H2O",
    prediction_b=(
        "The chemical formula for water is H2O, which means"
        " there are two hydrogen atoms and one oxygen atom."
    ), # The output string from the second model.
    reference="The chemical formula for water is H2O.",
)
print(result)
{'reasoning': "Based on the provided criteria and reference answer, let's evaluate the responses from Assistant A and Assistant B.\n\nHelpfulness:\n- Assistant A provided the correct chemical formula for water, which is helpful and directly answers the user's question.\n- Assistant B not only provided the correct chemical formula but also explained the composition of water, indicating that it consists of two hydrogen atoms and one oxygen atom. This additional explanation enhances the helpfulness of the response.\n\nRelevance:\n- Both assistants provided relevant answers that directly address the user's question about the chemical formula for water.\n\nCorrectness:\n- Both responses are correct and factual, as H2O is indeed the chemical formula for water.\n\nDepth:\n- Assistant A's response, while correct, lacks depth as it only states the formula without any further explanation.\n- Assistant B's response demonstrates more depth by explaining the meaning of the formula, which shows a deeper understanding and provides more insight into the composition of water.\n\nIn summary, while both assistants provided correct and relevant answers, Assistant B's response is more helpful and demonstrates greater depth by explaining the significance of the formula. Therefore, Assistant B's answer aligns better with the reference answer and the evaluation criteria.\n\nFinal Verdict: [[B]]", 'value': 'B', 'score': 0}
In [ ]:
for k in result.keys():
    print(f"{k} : {result[k]}")
reasoning : Assistant A provides a correct and direct answer to the user's question, stating the chemical formula for water as "H2O." This response is helpful, relevant, and accurate, but it lacks depth and detail.

Assistant B not only gives the correct chemical formula for water, "H2O," but also explains the composition of the molecule, indicating that it consists of two hydrogen atoms and one oxygen atom. This additional explanation adds depth and enhances the helpfulness and insightfulness of the response.

Based on the criteria provided, Assistant B's response is more helpful as it provides an explanation of the formula, it is relevant and correct, and it demonstrates a greater depth of thought compared to Assistant A's response.

Final Verdict: [[B]]
value : B
score : 0
In [ ]:
llm = ChatOpenAI(temperature=0, model="gpt-4-1106-preview", # model_name="gpt-4-1106-preview",
                 #model_kwargs={"random_seed":42}
                 )

chain = LabeledPairwiseStringEvalChain.from_llm(llm=llm)
result = chain.evaluate_string_pairs(
    input="What is the chemical formula for water?",
    prediction="H2O",
    prediction_b=(
        "The chemical formula for water is H2O, which means"
        " there are two hydrogen atoms and one oxygen atom."
    ), # The output string from the second model.
    reference="The chemical formula for water is H2O.",
)
print(result)
{'reasoning': 'Assistant A provides a correct and direct answer to the user\'s question, stating the chemical formula for water as "H2O." This response is helpful, relevant, and accurate, but it lacks depth and detail.\n\nAssistant B not only gives the correct chemical formula for water, "H2O," but also explains the composition of the molecule, indicating that it consists of two hydrogen atoms and one oxygen atom. This additional explanation adds depth and enhances the helpfulness and insightfulness of the response.\n\nBased on the criteria provided, Assistant B\'s response is more helpful as it provides an explanation of the formula, it is relevant and correct, and it demonstrates a greater depth of thought compared to Assistant A\'s response.\n\nFinal Verdict: [[B]]', 'value': 'B', 'score': 0}
In [ ]:
for k in result.keys():
    print(f"{k} : {result[k]}")
reasoning : Based on the provided criteria and reference answer, let's evaluate the responses from Assistant A and Assistant B.

Helpfulness:
- Assistant A provided the correct chemical formula for water, which is helpful and directly answers the user's question.
- Assistant B not only provided the correct chemical formula but also explained the composition of water, indicating that it consists of two hydrogen atoms and one oxygen atom. This additional explanation enhances the helpfulness of the response.

Relevance:
- Both assistants provided relevant answers that directly address the user's question about the chemical formula for water.

Correctness:
- Both responses are correct and factual, as H2O is indeed the chemical formula for water.

Depth:
- Assistant A's response, while correct, lacks depth as it only states the formula without any further explanation.
- Assistant B's response demonstrates more depth by explaining the meaning of the formula, which shows a deeper understanding and provides more insight into the composition of water.

In summary, while both assistants provided correct and relevant answers, Assistant B's response is more helpful and demonstrates greater depth by explaining the significance of the formula. Therefore, Assistant B's answer aligns better with the reference answer and the evaluation criteria.

Final Verdict: [[B]]
value : B
score : 0

Comparing against criteria¶

LangChain provides several predefined evaluators for different evaluation criteria.
These evaluators can be used to grade outputs against a specific rubric or set of criteria.
Some common criteria include conciseness, relevance, correctness, coherence, helpfulness, and controversiality.

CriteriaEvalChain

It can grade model output against custom or predefined criteria.
This provides a way to check whether the output of an LLM or chain complies with a defined set of criteria.
You can use this evaluator to assess the correctness, relevance, conciseness, and other aspects of the generated output.

It can be configured to work with or without reference labels.
Without a reference label, the evaluator relies on the LLM's predicted answer and scores it against the specified criteria.
With a reference label, the evaluator compares the predicted answer against the reference and checks whether it meets the criteria.

The evaluation LLM used by LangChain defaults to GPT-4.
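
The cells below compare two outputs against custom criteria. For scoring a single output, the criteria evaluator can also be used on its own; here is a minimal sketch with the predefined "correctness" criterion and a reference label (the question and answer strings are made up for illustration):

from langchain.chat_models import ChatOpenAI
from langchain.evaluation import load_evaluator, EvaluatorType

# Labeled criteria evaluation scores one prediction against a reference.
criteria_evaluator = load_evaluator(
    EvaluatorType.LABELED_CRITERIA,
    criteria="correctness",
    llm=ChatOpenAI(model="gpt-4-1106-preview", temperature=0),
)
result = criteria_evaluator.evaluate_strings(
    input="What is the capital of France?",
    prediction="The capital of France is Paris.",
    reference="Paris",
)
print(result)  # expected keys: 'reasoning', 'value' (Y/N), 'score' (1 or 0)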

In [ ]:
custom_criteria = {
    "simplicity": "Is the language straightforward and unpretentious?",
    "clarity": "Are the sentences clear and easy to understand?",
    "precision": "Is the writing precise, with no unnecessary words or details?",
    "truthfulness": "Does the writing feel honest and sincere?",
    "subtext": "Does the writing suggest deeper meanings or themes?",
}

evaluator:PairwiseStringEvalChain = load_evaluator(
    EvaluatorType.PAIRWISE_STRING,
    criteria = custom_criteria,
    # llm=ChatOpenAI(model="gpt-3.5-turbo"),
    llm=ChatOpenAI(model="gpt-4-1106-preview"),
    )
In [ ]:
result = evaluator.evaluate_string_pairs(
    prediction="Every cheerful household shares a similar rhythm of joy; but sorrow, in each household, plays a unique, haunting melody.",
    prediction_b=("Where one finds a symphony of joy, every domicile of happiness resounds in harmonious,"
                 " identical notes; yet, every abode if despair conducts a dissonant orchestra, each"
                 " playing an elegy of grief that is perculiar and profound to its own existence."),
    input="Write some prose about families.",
)
for k in result.keys():
    print(f"{k} : {result[k]}")
reasoning : Comparing the responses based on the given criteria:

Simplicity:
- Assistant A uses straightforward language with common words like "cheerful household," "rhythm of joy," and "sorrow."
- Assistant B uses more complex language with words like "symphony," "domicile," "abode," "dissonant orchestra," and "elegy."

Clarity:
- Assistant A's sentences are clear and easily understandable.
- Assistant B's sentences, while poetic, are more complex and may require rereading for full comprehension.

Precision:
- Assistant A's writing is concise with no unnecessary words.
- Assistant B's response is more verbose and uses more elaborate expressions that could be seen as less precise.

Truthfulness:
- Both writings appear honest and sincere, evoking real emotions tied to family experiences.

Subtext:
- Assistant A's writing suggests a deeper meaning about the commonality of joy and the individual nature of sorrow in families.
- Assistant B's writing also conveys a deeper meaning, but with a more elaborate metaphor that may slightly obscure the underlying message.

Verdict: [[A]]
value : A
score : 1

String and semantic comparisons¶

LangChain supports string comparison and distance metrics for evaluating LLM outputs.
String distance metrics such as Levenshtein and Jaro-Winkler provide a quantitative measure of similarity between a predicted string and a reference string.
Embedding distance, computed with models such as Sentence Transformers, measures the semantic similarity between the generated text and the expected text.

The embedding distance evaluator computes the vector distance between the prediction and the reference string using an embedding model, for example OpenAI embeddings or a Hugging Face embedding model.
This measures the semantic similarity of the two strings and provides insight into the quality of the generated text.
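
The cells below use the embedding distance evaluator; the string distance evaluators follow the same pattern. A minimal sketch using Levenshtein distance (it relies on the rapidfuzz package being installed):

from langchain.evaluation import load_evaluator, EvaluatorType, StringDistance

# String distance returns a normalized distance: lower means more similar.
string_evaluator = load_evaluator(
    EvaluatorType.STRING_DISTANCE,
    distance=StringDistance.LEVENSHTEIN,
)
result = string_evaluator.evaluate_strings(
    prediction="I shall go",
    reference="I shan't go",
)
print(result)  # {'score': ...}, lower means more similar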

In [ ]:
from langchain.evaluation import load_evaluator, EvaluatorType
from langchain.evaluation.comparison import PairwiseStringEvalChain
from langchain.evaluation.embedding_distance import EmbeddingDistanceEvalChain
In [ ]:
print(_EVALUATOR_MAP[EvaluatorType.EMBEDDING_DISTANCE])
<class 'langchain.evaluation.embedding_distance.base.EmbeddingDistanceEvalChain'>
In [ ]:
evaluator:EmbeddingDistanceEvalChain = load_evaluator(
    EvaluatorType.EMBEDDING_DISTANCE,
    llm=ChatOpenAI(model="gpt-4-1106-preview"),
    )
result = evaluator.evaluate_strings(
    prediction="I shall go",
    reference="I shan't go"
)
for k in result.keys():
    print(f"{k} : {result[k]}")
score : 0.09679532051086426

Running evaluations against datasets¶

Run evaluations against benchmark datasets in LangSmith.

This part is skipped: a LangSmith account is not available for testing.

LangSmith account registration: https://smith.langchain.com/

How to deploy LLM apps¶

Services and frameworks that can be used to deploy LLM apps

Name | Description | Type
--- | --- | ---
Streamlit | Open-source Python framework for building and deploying web apps | Framework
Gradio | Wraps models in an interface and can host them on Hugging Face | Framework
Chainlit | Build and deploy conversational ChatGPT-like apps | Framework
Apache Beam | Tool for defining and orchestrating data processing workflows | Framework
Vercel | Platform for deploying and scaling web apps | Cloud service
FastAPI | Python web framework for building APIs | Framework
Fly.io | App hosting platform with autoscaling and a global CDN | Cloud service
DigitalOcean App Platform | Platform for building, deploying, and scaling apps | Cloud service
Google Cloud | Services such as Cloud Run for hosting and scaling containerized apps | Cloud service
Steamship | ML infrastructure platform for deploying and scaling models | Cloud service
Langchain-Serve | Tool for serving LangChain agents as web APIs | Framework
BentoML | Framework for model serving, packaging, and deployment | Framework
OpenLLM | Provides an open API over commercial LLMs | Cloud service
Databutton | No-code platform for building and deploying model workflows | Framework
Azure ML | Azure's managed MLOps service for models | Cloud service
LangServe | Built on FastAPI but specialized for deploying LLM apps | Framework

Requirements for running LLM apps

  • Scalable infrastructure to handle compute-intensive models and potential traffic spikes
  • Low latency for serving model outputs in real time
  • Persistent storage for managing long conversations and application state
  • APIs for integration into end-user applications
  • Monitoring and logging to track metrics and model behavior

Deploying an LLM app via a FastAPI web server¶

Install uvicorn and lanarky to run the FastAPI server: pip install uvicorn lanarky

Reference: https://lanarky.ajndkr.com/learn/adapters/langchain/router/

Assuming chat.py is located under the webserver folder:

Run the server: uvicorn webserver.chat:app --reload --port 9091 --host 0.0.0.0

Run the client: python client.py --input "Summarize South Korea in 10 lines"

chat.py

from dotenv import load_dotenv
load_dotenv()
# 또는
# os.environ["OPENAI_API_KEY"] = "add-your-openai-api-key-here"

from lanarky import Lanarky
from langchain.chains import ConversationChain
from langchain.chat_models import ChatOpenAI

from lanarky.adapters.langchain.routing import LangchainAPIRouter

app = Lanarky()
router = LangchainAPIRouter()

@router.post("/chat")
def chat(
    temperature: float = 0.0,
    verbose: bool = True,
    streaming: bool = True,
) -> ConversationChain:
    return ConversationChain(
        llm=ChatOpenAI(
            temperature=temperature,
            verbose=verbose,
            streaming=streaming,
        ),
        verbose=verbose,
    )

app.include_router(router)

client.py

import json
import click
from lanarky.clients import StreamingClient

@click.command()
@click.option("--input", required=True)
@click.option("--stream", is_flag=True)
def main(input: str, stream: bool):
    """Run the client."""
    client = StreamingClient("http://localhost:9091")
    for event in client.stream_response(
        "POST",
        "/chat",
        params={"streaming":str(stream).lower()},
        json={"input": input},
        timeout=300, # sec
        ):
        print(f"{event.event}: {json.loads(event.data)['token']}\n",
              end="", flush=True)

if __name__ == "__main__":
    main()
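
If lanarky's streaming adapters are not needed, the same chain can also be exposed with plain FastAPI. Below is a minimal, non-streaming sketch; the module name plain_chat.py and the /chat route are illustrative, and note that this sketch shares one chain (and therefore one conversation memory) across all requests.

# plain_chat.py - run with: uvicorn plain_chat:app --port 9091
from dotenv import load_dotenv
load_dotenv()

from fastapi import FastAPI
from pydantic import BaseModel
from langchain.chains import ConversationChain
from langchain.chat_models import ChatOpenAI

app = FastAPI()
chain = ConversationChain(llm=ChatOpenAI(temperature=0))

class ChatRequest(BaseModel):
    input: str

@app.post("/chat")
def chat(request: ChatRequest) -> dict:
    # Blocking call, no streaming; the response is returned as plain JSON.
    return {"output": chain.run(request.input)}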

Ray + LangChain¶

Ray provides a flexible framework for scaling generative AI workloads across clusters, tackling the infrastructure challenges of running complex neural networks in production.
Ray supports common deployment requirements such as low-latency serving, distributed training, and large-scale batch inference.
Ray also makes it easy to spin up on-demand fine-tuning or to scale an existing workload from one machine to many.

  • Schedule distributed training jobs across GPU clusters with Ray Train
  • Deploy pretrained models at scale for low-latency serving with Ray Serve (see the sketch after this list)
  • Run large-scale batch inference in parallel across CPUs and GPUs with Ray Data
  • Orchestrate end-to-end generative AI workflows that combine training, deployment, and batch processing
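
As an illustration of the Ray Serve bullet above, a LangChain chain could be wrapped in a Serve deployment roughly as follows. This is only a sketch and is not used elsewhere in this notebook.

# A minimal Ray Serve sketch (not used in the rest of this notebook).
from ray import serve
from starlette.requests import Request

from langchain.chains import ConversationChain
from langchain.chat_models import ChatOpenAI

@serve.deployment
class ChainDeployment:
    def __init__(self):
        # One chain instance per replica.
        self.chain = ConversationChain(llm=ChatOpenAI(temperature=0))

    async def __call__(self, request: Request) -> dict:
        payload = await request.json()
        return {"output": self.chain.run(payload["input"])}

app = ChainDeployment.bind()
# serve.run(app)  # then POST {"input": "..."} to http://localhost:8000/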

The indexing example below uses the Sentence Transformers library from Hugging Face for local embeddings.

LangChain + Ray Tutorial: How to Generate Embeddings For 33,000 Pages in Under 4 Minutes

https://www.youtube.com/watch?v=hGnZajytlac

References:

  • https://www.anyscale.com/blog/llm-open-source-search-engine-langchain-ray
  • LocalHuggingFaceEmbeddings : https://gist.github.com/waleedkadous/aea1d312d68c9431949442cc562d5f2c

Install Ray: pip install ray

Indexing code using Ray: https://gist.github.com/waleedkadous/4c41f3ee66040f57d34c6a40e42b5969

A search engine using Ray¶

The test environment uses CUDA 12.2, and as of 2024-01-22 faiss-gpu does not support this version, so there is a compatibility problem.
Anaconda faiss-gpu download site: https://anaconda.org/pytorch/faiss-gpu/files/

A search engine using Ray - CPU version¶

  • OpenAIEmbeddings is not used here because it has a pickle serialization problem when used with Ray
  • When splitting text with RecursiveCharacterTextSplitter, make sure each document splits into at least 2 chunks
    • For example, with the default chunk_size of 4000, a document of length 300 produces only a single chunk, which causes a FAISS error during parallel processing with Ray
    • In the example below, chunk_size=300 is deliberately small so that multiple chunks are created (see the splitter check after this list)
  • The documents used here are small, so Ray brings no speedup; the benefit becomes significant with large document collections.
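
To see why chunk_size matters, here is a quick standalone check of how many chunks the splitter produces for a short text; the toy text is made up, and the counts will differ for real documents.

from langchain.text_splitter import RecursiveCharacterTextSplitter

toy_text = "Ray scales Python and AI workloads across clusters. " * 20  # roughly 1,000 characters

for size in (4000, 300):
    splitter = RecursiveCharacterTextSplitter(chunk_size=size, chunk_overlap=0)
    docs = splitter.create_documents([toy_text])
    print(f"chunk_size={size} -> {len(docs)} chunk(s)")  # 4000 -> 1 chunk, 300 -> several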

Ray dashboard

When Ray is running, the URL below is the default monitoring URL; open it to monitor progress.
Monitoring URL: http://localhost:8265/

(Screenshot: the Ray dashboard cluster page at http://localhost:8265/#/cluster)

In [ ]:
import time
import numpy as np
import ray
from bs4 import BeautifulSoup as Soup

from langchain.vectorstores.faiss import FAISS
from langchain.document_loaders import RecursiveUrlLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.schema import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter

The local embedding code downloads the model from Hugging Face.
When using SentenceTransformer, the downloaded model is cached under ~/.cache/torch/sentence_transformers/.
To download a model again, delete it from that folder.

In [ ]:
# https://gist.github.com/waleedkadous/aea1d312d68c9431949442cc562d5f2c
from langchain.embeddings.base import Embeddings
from typing import List
from sentence_transformers import SentenceTransformer

class LocalHuggingFaceEmbeddings(Embeddings):
    def __init__(self, model_id,device="cuda"):
        # Should use the GPU by default
        # self.model = SentenceTransformer(model_id,device="cuda")
        self.model = SentenceTransformer(model_id,device=device)

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        """Embed a list of documents using a locally running
           Hugging Face Sentence Transformer model

        Args:
            texts: The list of texts to embed.

        Returns:
            List of embeddings, one for each text.
        """
        embeddings = self.model.encode(texts)
        return embeddings

    def embed_query(self, text: str) -> List[float]:
        """Embed a query using a locally running HF
        Sentence Transformer.

        Args:
            text: The text to embed.

        Returns:
            Embeddings for the text.
        """
        embedding = self.model.encode(text)
        return list(map(float, embedding))
In [ ]:
INDEX_PATH = "data/faiss_index.db"
INDEX_PATH_RAY = "data/faiss_index_ray.db"
use_openai_embedding = False
num_cpus = 8
num_gpus = 0
chunks_num = 8

chunk_size = 300
chunk_overlap = 0

Download the model in advance to populate the cache

In [ ]:
embeddings = LocalHuggingFaceEmbeddings('multi-qa-mpnet-base-dot-v1',device="cpu")
In [ ]:
embeddings.__dict__
Out[ ]:
{'model': SentenceTransformer(
   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel 
   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
 )}
In [ ]:
def chunk_docs(url: str) -> list[Document]:
    """Crawl a website and split the text into chunks.

    Wrapping the texts into list[Document]
    in order to keep metadata.
    """
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
    )

    # Load docs
    loader = RecursiveUrlLoader(
        url=url,
        max_depth=2,
        extractor=lambda x: Soup(x, "html.parser").text
    )
    docs = loader.load()

    # Split docs
    return text_splitter.create_documents(
        [doc.page_content for doc in docs],
        metadatas=[doc.metadata for doc in docs]
    )

def create_db(chunk: list[Document]) -> FAISS:
    """This is the easy way."""
    get_embeddings =  OpenAIEmbeddings() if use_openai_embedding else LocalHuggingFaceEmbeddings('multi-qa-mpnet-base-dot-v1',device="cpu")
    return FAISS.from_documents(
        documents=chunk,
        embedding=get_embeddings,
    )

@ray.remote
def process_shard(chunk: list[Document]):
    """Process task.

    You can specify the number of GPUs or CPUs you want to use as
    part of the ray decorator.
    """
    # For the LocalHuggingFaceEmbeddings source, see https://github.com/ray-project/langchain-ray
    # embeddings = LocalHuggingFaceEmbeddings('multi-qa-mpnet-base-dot-v1')
    get_embeddings =  OpenAIEmbeddings() if use_openai_embedding else LocalHuggingFaceEmbeddings('multi-qa-mpnet-base-dot-v1',device="cpu")
    return FAISS.from_documents(
        documents=chunk,
        embedding=get_embeddings,
    )

def create_db_parallel(chunk: list[Document]):
    """Create a FAISS db with parallelism."""
    # Split chunk into shards
    shards = np.array_split(chunk, chunks_num)
    try:
        # Start Ray
        # faiss-gpu cannot be used in this environment, so run on CPU
        ray.init(num_cpus=num_cpus, num_gpus=num_gpus)

        # Process shards in parallel
        futures = [process_shard.remote(shard) for shard in shards]
        results = ray.get(futures)

        # Merge index shards
        db = results[0]
        for result in results[1:]:
            db.merge_from(result)


        return db
    finally:
        # Shutdown Ray
        ray.shutdown()
In [ ]:
import os
os.environ["TOKENIZERS_PARALLELISM"] = "true"


print("Starting indexing process...")
# chunks = chunk_docs("https://docs.ray.io/en/latest/")
chunks = chunk_docs("https://www.angelfire.com/amiga2/hermnerp/")
if len(chunks) == 0:
    raise ValueError("No chunks created!")
st = time.time()
db = create_db(chunks)
et = time.time() - st
print(f"[FAISS] Indexing took {et} seconds.")
db.save_local(INDEX_PATH)
print("[FAISS] Indexing end.")

st = time.time()
db = create_db_parallel(chunks)
et = time.time() - st
print(f"[Parallel] Indexing took {et} seconds.")
db.save_local(INDEX_PATH_RAY)
print("[Parallel] Indexing end.")
Starting indexing process...
[FAISS] Indexing took 22.301445722579956 seconds.
[FAISS] Indexing end.
2024-01-22 16:08:22,679	INFO worker.py:1715 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265 
[Parallel] Indexing took 42.84457087516785 seconds.
[Parallel] Indexing end.

Using the generated FAISS vector DB

In [ ]:
# load index and embed query
embeddings = LocalHuggingFaceEmbeddings('multi-qa-mpnet-base-dot-v1',device="cpu")
db = FAISS.load_local(INDEX_PATH,embeddings)

query = ("What are the different components of Ray"
        " and how can they help with large language models(LLMs)?")
print(f"query : {query}")

query_embedding = embeddings.embed_query(query)
print(f"query(embedding) : {query_embedding}")

response = db.max_marginal_relevance_search(query)
print(f"Response[query by string] : {response[0].page_content}")

response = db.max_marginal_relevance_search_by_vector(query_embedding)
print(f"Response[query by embedding] : {response[0].page_content}")
query : What are the different components of Ray and how can they help with large language models(LLMs)?
query(embedding) : [-0.1218448132276535, -0.575739860534668, -0.30155089497566223, 0.157945916056633, -0.036801282316446304, -0.08389551192522049, 0.025580022484064102, 0.034830961376428604, ...] (768-dimensional vector, truncated)
Response[query by string] : in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer be at the rghit pclae. The rset can be a total mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe.
Response[query by embedding] : in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer be at the rghit pclae. The rset can be a total mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe.

Text embedding with Ray - example 2

In [ ]:
import time
import ray
from sentence_transformers import SentenceTransformer

import torch
import psutil

@ray.remote
class LocalHuggingFaceEmbeddingsActor:
    def __init__(self, model_id, num_gpus:int=0):
        device = 'cuda' if num_gpus > 0 else 'cpu'
        self.model = SentenceTransformer(model_id, device=device,cache_folder="data/sentence_transformers_cache")

    def embed_documents(self, texts):
        embeddings = self.model.encode(texts)
        return embeddings

    def embed_query(self, text):
        embedding = self.model.encode(text)
        return list(map(float, embedding))

try:
    # Check the number of available GPUs
    num_gpus = torch.cuda.device_count()

    print(f"num gpus: {num_gpus}")

    num_cpus = 0
    if num_gpus == 0:
        num_cpus = psutil.cpu_count(logical=False)

    print(f"num gpus: {num_gpus}, num cpus: {num_cpus}")
    # Initialize Ray
    ray.init() #(num_gpus=num_gpus, num_cpus=num_cpus)

    # Create an actor instance
    # GPU is not used here
    embeddings_actor = LocalHuggingFaceEmbeddingsActor.remote("multi-qa-mpnet-base-dot-v1",num_gpus=0)

    # Example usage
    texts = [
        "Hello world",
        "How are you?",
        """The process of selecting output tokens to generate text is known as decoding,
        and you can customize the decoding strategy that the generate() method will use.
        Modifying a decoding strategy does not change the values of any trainable parameters.
        However, it can have a noticeable impact on the quality of the generated output.
        It can help reduce repetition in the text and make it more coherent."""
        ]

    st = time.time()
    future = embeddings_actor.embed_documents.remote(texts)
    result = ray.get(future)
    et = time.time() - st
    print(f"embeddings took {et} seconds.\n")
    print(result)
finally:
    pass
num gpus: 3
num gpus: 3, num cpus: 0
2024-01-22 17:18:37,845	INFO worker.py:1715 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265 
embeddings took 4.634313106536865 seconds.

[[ 0.17623404 -0.23755084 -0.25186116 ...  0.02418865 -0.05202777
  -0.13542359]
 [-0.04663036 -0.29437593 -0.34973297 ... -0.05384175 -0.00522118
  -0.01733526]
 [ 0.1325658  -0.15208684 -0.08218877 ...  0.04821041 -0.31334692
  -0.39685684]]
In [ ]:
ray.shutdown()

How to observe LLM apps¶

Tracking responses¶

Simply set return_intermediate_steps=True when initializing the agent or LLM.

In [ ]:
import subprocess
from urllib.parse import urlparse
from pydantic import BaseModel, validator, HttpUrl
from langchain.tools import StructuredTool
In [ ]:
def ping(url: str, return_error: bool) -> str:
    """Ping the fully specified url. Must include https:// in the url."""
    # hostname = urlparse(url).netloc
    hostname = urlparse(str(HttpUrl(url))).netloc
    print(f"Pinging {hostname}...")
    completed_process = subprocess.run(
        ["ping","-c","1",hostname],capture_output=True, text=True
    )
    output = completed_process.stdout
    if return_error and completed_process.returncode != 0:
        return completed_process.stderr
    return output

ping_tool = StructuredTool.from_function(ping)
In [ ]:
ping("https://naver.com",False)
Pinging naver.com...
Out[ ]:
'PING naver.com (223.130.200.107) 56(84) bytes of data.\n\n--- naver.com ping statistics ---\n1 packets transmitted, 0 received, 100% packet loss, time 0ms\n\n'
In [ ]:
from langchain.chat_models import ChatOpenAI
from langchain.agents import initialize_agent, AgentType
In [ ]:
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
agent = initialize_agent(
    llm = llm,
    tools = [ping_tool],
    agent=AgentType.OPENAI_MULTI_FUNCTIONS,
    return_intermediate_steps=True, # IMPORTANT
)
In [ ]:
result = agent("What's the latency  like for https://langchain.com?")
print(result)
print(result["output"])
Pinging langchain.com...
{'input': "What's the latency  like for https://langchain.com?", 'output': 'The latency for https://langchain.com is approximately 2.45 milliseconds.', 'intermediate_steps': [(AgentActionMessageLog(tool='ping', tool_input={'url': 'https://langchain.com', 'return_error': False}, log="\nInvoking: `ping` with `{'url': 'https://langchain.com', 'return_error': False}`\n\n\n", message_log=[AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{\n  "actions": [\n    {\n      "action_name": "ping",\n      "action": {\n        "url": "https://langchain.com",\n        "return_error": false\n      }\n    }\n  ]\n}', 'name': 'tool_selection'}})]), 'PING langchain.com (52.223.52.2) 56(84) bytes of data.\n64 bytes from a0b1d980e1f2226c6.awsglobalaccelerator.com (52.223.52.2): icmp_seq=1 ttl=242 time=2.45 ms\n\n--- langchain.com ping statistics ---\n1 packets transmitted, 1 received, 0% packet loss, time 0ms\nrtt min/avg/max/mdev = 2.447/2.447/2.447/0.000 ms\n')]}
The latency for https://langchain.com is approximately 2.45 milliseconds.
In [ ]:
print(result["intermediate_steps"][0][0].json(indent=2))
{
  "tool": "ping",
  "tool_input": {
    "url": "https://langchain.com",
    "return_error": false
  },
  "log": "\nInvoking: `ping` with `{'url': 'https://langchain.com', 'return_error': False}`\n\n\n",
  "type": "AgentActionMessageLog",
  "message_log": [
    {
      "content": "",
      "additional_kwargs": {
        "function_call": {
          "arguments": "{\n  \"actions\": [\n    {\n      \"action_name\": \"ping\",\n      \"action\": {\n        \"url\": \"https://langchain.com\",\n        \"return_error\": false\n      }\n    }\n  ]\n}",
          "name": "tool_selection"
        }
      },
      "type": "ai",
      "example": false
    }
  ]
}
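
Beyond intermediate steps, LangChain callbacks offer another lightweight way to observe an app. Below is a minimal sketch of a custom handler that logs prompts and token usage for each LLM call; the handler name and print format are just for illustration.

from langchain.callbacks.base import BaseCallbackHandler
from langchain.chat_models import ChatOpenAI

class LoggingHandler(BaseCallbackHandler):
    """Print the prompt and usage info for every LLM call."""

    def on_llm_start(self, serialized, prompts, **kwargs):
        for prompt in prompts:
            print(f"[LLM start] prompt: {prompt[:200]}")

    def on_llm_end(self, response, **kwargs):
        # For OpenAI models, llm_output typically includes token usage.
        print(f"[LLM end] llm_output: {response.llm_output}")

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0, callbacks=[LoggingHandler()])
llm.predict("Say hello in one word.")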

Tracing with PromptWatch¶

Site: https://promptwatch.io

You can see the prompts together with the LLM's responses.
A dashboard with a time series of activity also lets you drill down into the responses for a specific time window.
This looks very useful for effectively monitoring and analyzing prompts, outputs, and costs in real-world scenarios.
The platform enables in-depth analysis and troubleshooting from a web interface, helping users identify the root cause of problems and optimize prompt templates.
promptwatch.io can also help with unit testing and validating prompt templates.

Install the Python library

  • pip install promptwatch

Add the following to the .env file

  • PROMPTWATCH_API_KEY={api_key}
  • PROMPTWATCH_TRACKING_PROJECT={project name}


In [ ]:
import os
from langchain.chains import LLMChain
from langchain.llms.openai import OpenAI
from langchain.prompts import PromptTemplate
from promptwatch import PromptWatch
In [ ]:
import os
os.getenv("PROMPTWATCH_API_KEY")
In [ ]:
prompt_template = PromptTemplate.from_template(template="Finish this sentence: {input}")
my_chain = LLMChain(llm=OpenAI(),prompt=prompt_template)
In [ ]:
with PromptWatch() as pw:
   result= my_chain("The quick brown fox jumped over.")
print(result)
{'input': 'The quick brown fox jumped over.', 'text': '\n\nthe lazy dog.'}