bug: Different embedding token usage in Langfuse than in OpenAI #1871

Open
michalwelna0 opened this issue Apr 26, 2024 · 5 comments

Labels: question (Further information is requested), 🐞❔ unconfirmed bug

Comments

@michalwelna0
Describe the bug

I wanted to use LlamaParse to parse a set of documents (PDF/Doc/Docx) and index them so that I could ask custom questions against those documents. I created a dedicated (fresh) OpenAI API key because I wanted to monitor token usage and compare Langfuse with OpenAI. After performing a bunch of tests in which I simply parse documents with LlamaParse and run the indexing step with LlamaIndex, I encountered a mismatch between the embedding model's token usage in Langfuse and in OpenAI.

Model used: text-embedding-ada-002(-v2)
Langfuse token count: 32750
OpenAI token count: 33198
Difference: 448 tokens

Tests on the same set of documents (OpenAI - Langfuse - difference):
33249 - 32800 - 449

Tests on other documents:
40779 - 40328 - 451
40646 - 40327 - 319

OpenAI always counted more tokens than Langfuse. As you can see, when running tests on the same documents a pattern showed up: around 450 tokens were missing on the Langfuse side.

Have you ever experienced a similar issue? Or is this expected behaviour?
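
(For cross-checking independently of both dashboards: a minimal sketch, not part of the original test, that counts tokens locally with tiktoken, assuming the cl100k_base encoding used by text-embedding-ada-002.)

import tiktoken

# text-embedding-ada-002 uses the cl100k_base encoding
enc = tiktoken.get_encoding("cl100k_base")

def count_embedding_tokens(chunks: list[str]) -> int:
    # sum the token counts of all text chunks sent to the embeddings endpoint
    return sum(len(enc.encode(chunk)) for chunk in chunks)

# hypothetical chunks; in the real test these would be the node texts being embedded
print(count_embedding_tokens(["first document chunk", "second document chunk"]))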

To reproduce

from pathlib import Path
import os

from llama_parse import LlamaParse
from langfuse import Langfuse
from langfuse.llama_index import LlamaIndexCallbackHandler
from llama_index.core import Settings, VectorStoreIndex
from llama_index.core.callbacks import CallbackManager
from llama_index.core.node_parser import MarkdownElementNodeParser
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core.postprocessor import SimilarityPostprocessor

os.environ["LANGFUSE_PUBLIC_KEY"] = ""
os.environ["LANGFUSE_SECRET_KEY"] = ""
os.environ["LANGFUSE_HOST"] = ""
os.environ["OPENAI_API_KEY"] = ""
os.environ["LLAMA_CLOUD_API_KEY"] = ""

langfuse = Langfuse()

# register the Langfuse LlamaIndex callback handler so spans and
# token usage are captured (referenced below as langfuse_callback_handler)
langfuse_callback_handler = LlamaIndexCallbackHandler()
Settings.callback_manager = CallbackManager([langfuse_callback_handler])

folder_path = Path("path to local documents")
num_workers = len(list(folder_path.iterdir()))
parser = LlamaParse(
    result_type="markdown",
    verbose=True,
    language="en",
    num_workers=num_workers,  # should be the number of documents, limit 10
)


def index_docs(docs, trace):
    span = trace.span(
        name="index-docs",
    )

    Settings.llm = OpenAI(model="gpt-3.5-turbo")
    Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")
    node_parser = MarkdownElementNodeParser(
        llm=OpenAI(model="gpt-3.5-turbo"), num_workers=num_workers
    )

    langfuse_callback_handler.set_root(trace)

    nodes = node_parser.get_nodes_from_documents(documents=docs)
    base_nodes, objects = node_parser.get_nodes_and_objects(nodes=nodes)

    index = VectorStoreIndex(nodes=base_nodes + objects)

    engine = index.as_query_engine(
        similarity_top_k=15,
        node_postprocessors=[
            SimilarityPostprocessor(similarity_cutoff=0.4)
        ],
        verbose=True,
    )
    langfuse_callback_handler.set_root(None)
    span.end()
    return engine

def run():
    trace = langfuse.trace(
        name="Test trace",
        tags=["TEST"],
    )
    documents = parser.load_data([str(path) for path in folder_path.iterdir()])
    engine = index_docs(docs=documents, trace=trace)

run()

Additional information

I tried both Langfuse approaches, decorators (@observe()) and the low-level SDK. Both gave me the same result: the embedding token count did not match OpenAI's. A sketch of the decorator variant follows below.
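
(For reference, a minimal sketch of the decorator variant, simplified; it assumes the langfuse-python v2 decorator integration, where langfuse_context exposes a trace-scoped LlamaIndex handler. index_docs_decorated is a hypothetical name for illustration.)

from langfuse.decorators import observe, langfuse_context
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager

@observe()  # creates a trace for this function call
def index_docs_decorated(docs):
    # attach the trace-scoped LlamaIndex handler from the decorator context
    handler = langfuse_context.get_current_llama_index_handler()
    Settings.callback_manager = CallbackManager([handler])
    # ... same node-parsing / indexing steps as in the snippet above ...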

The reason I raise this issue is that I would like to use Langfuse (monitoring, token counting, price calculation, etc.) as a reliable source, and I want to be sure that it calculates the cost of token usage correctly, so that I can estimate the cost of each trace I execute.

I verified that LlamaParse is not responsible for the embedding token usage (I ran a test of the LlamaParse phase alone and monitored OpenAI token usage).

@marcklingen
Member

Do you run on Langfuse Cloud? If so, could you provide a trace_id where this issue occurred? Checking the logs for this request end-to-end would help debug the problem and identify its source.

@marcklingen
Member

Do your embedding generations include inputs/outputs? By default, Langfuse takes the token counts reported by LlamaIndex and does not attempt to tokenize them at the API level, as storing all embedded documents in Langfuse is usually not necessary.

https://github.com/langfuse/langfuse-python/blob/e77183bc0f69df1803fc33481e06e2fab83ec419/langfuse/llama_index/llama_index.py#L416
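
(For what it's worth, one way to pinpoint where the counts diverge is to count on the LlamaIndex side as well, e.g. with LlamaIndex's TokenCountingHandler; a sketch, assuming the tiktoken tokenizer for text-embedding-ada-002.)

import tiktoken
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler

# count embedding tokens with the same tokenizer the model uses
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("text-embedding-ada-002").encode
)
Settings.callback_manager = CallbackManager([token_counter])

# ... run the indexing step, then compare against the OpenAI dashboard
print("embedding tokens:", token_counter.total_embedding_token_count)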

@michalwelna0
Author

Hi @marcklingen, no, we do not run it on Langfuse Cloud; we self-host it in our environment, so you will not be able to check it.

The embedding generations include inputs only; I did not see any output tokens from OpenAI.
So (if I understand correctly) Langfuse only takes the token counts calculated by LlamaIndex and does not calculate them itself? If so, this may be on the LlamaIndex side.

@marcklingen
Member

Langfuse does both, but for LlamaIndex we try to get the counts via LlamaIndex and simply ingest them into Langfuse. If you have logs, you could check whether this event included token counts, to pinpoint the problem. If no token counts are provided and a known model is used (e.g., the OpenAI models you are using), then Langfuse tokenizes within the ingestion API.
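
(As an illustration, with the low-level v2 Python SDK the usage of a generation can also be reported explicitly at ingestion time, in which case Langfuse uses the provided numbers instead of tokenizing; a sketch with a hypothetical trace name and an example count.)

from langfuse import Langfuse

langfuse = Langfuse()
trace = langfuse.trace(name="embedding-debug")  # hypothetical trace name

# report exact client-side counts; Langfuse then skips its own tokenization
trace.generation(
    name="OpenAIEmbedding",
    model="text-embedding-ada-002",
    usage={"input": 40775, "total": 40775, "unit": "TOKENS"},  # example value
)
langfuse.flush()  # ensure the event batch is sent before the script exits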

@michalwelna0
Author

To visualize the problem: I ran the code snippet provided above on a sample of 5 documents (PDFs/Docx). OpenAI usage reported 41119 tokens used by the embedding model (text-embedding-ada-002-v2), while the Langfuse trace shows an OpenAIEmbedding generation with 40775 tokens used. Here are some screenshots of the trace in the Langfuse UI.

Full trace view:
[screenshot]

The same view, scrolled down:
[screenshot]

Opened EmbeddingGeneration in Langfuse:
[screenshot]

The difference between OpenAI and Langfuse is 344 tokens.
