Suggested functionality: Estimate by model_type #1

Somerandomguy10111 · 2023-10-19T11:31:27Z

First off: great tool. It saved me the headache of trying to trace the function tokens myself.
A final touch could be to introduce an option to have a token estimator class (tokenizer class?) which takes the model type as an attribute and then uses the tiktoken.encoding_for_model() function to retrieve the encoding.

That way, if OpenAI ever changes the encoding or uses a different encoding for newer models, the package can stay up to date.
On a side note, I think the following functions would also be useful, e.g. to prevent logging of huge inputs to the model:

# Number of tokens the_str encodes to.
def get_string_tokens(self, the_str: str) -> int:
    return len(self.encode(the_str))


# the_str truncated to at most max_tokens tokens.
def get_limited_string(self, the_str: str, max_tokens: int) -> str:
    encoded_str = self.encode(the_str)
    return self.decode(encoded_str[:max_tokens])
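
To make the idea concrete, here is a rough sketch of how such a class might look. The name TokenEstimator, the optional encoding argument, and the fallback to cl100k_base for unknown model names are all assumptions made for illustration; none of this is the package's current API. Only tiktoken.encoding_for_model(), tiktoken.get_encoding(), and the encoding's encode()/decode() are real tiktoken calls.

```python
from typing import Any, Optional


class TokenEstimator:
    """Hypothetical helper that picks the encoding from the model name."""

    def __init__(self, model: str, encoding: Optional[Any] = None) -> None:
        self.model = model
        if encoding is None:
            # Imported lazily so the class can also be used with a stub encoding.
            import tiktoken

            try:
                # Let tiktoken decide which encoding belongs to this model.
                encoding = tiktoken.encoding_for_model(model)
            except KeyError:
                # Assumption: fall back to cl100k_base when the model is unknown.
                encoding = tiktoken.get_encoding("cl100k_base")
        self.encoding = encoding

    def get_string_tokens(self, the_str: str) -> int:
        """Return the number of tokens the_str encodes to."""
        return len(self.encoding.encode(the_str))

    def get_limited_string(self, the_str: str, max_tokens: int) -> str:
        """Return the_str truncated to at most max_tokens tokens."""
        tokens = self.encoding.encode(the_str)
        return self.encoding.decode(tokens[:max_tokens])
```

Usage would then be something like TokenEstimator("gpt-4").get_limited_string(prompt, 200) before logging a prompt. One caveat: cutting a token list at an arbitrary position can split a multi-byte character, so the tail of the truncated string may contain a replacement character.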

Best
Somerandomguy10111

Somerandomguy10111 · 2023-10-19T11:31:59Z

If I get around to it, I will implement it and open a pull request myself.
