
Create a user friendly inference demo #532

Open
borisdayma opened this issue Mar 18, 2024 · 0 comments
borisdayma commented Mar 18, 2024

This is a feature request.

I like maxtext because it is very customizable and efficient for training.
The main issue I’m having is hacking together an inference function: the codebase is quite complex, so this isn’t straightforward.
The simple decode.py works, but it seems to be mainly experimental development aimed at streaming.

I think streaming will be really cool, but we would also benefit from an easy model.generate(input_ids, attention_mask, params) function (a rough sketch of what I mean follows the list):

  • it should allow prefill based on the length of input_ids (it is the user’s responsibility to limit the number of distinct shapes to avoid recompilation)
  • it should allow batched input, with left padding to support different input lengths
  • it should be compilable with jit/pjit
  • it should allow a few common sampling strategies: greedy, sampling (with temperature, top-k, top-p), and beam search
  • it should be usable without a separate engine/service, in case we want to make it part of a larger function that includes multiple models

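To make the request concrete, here is a minimal greedy-only sketch of the interface I have in mind. Everything in it is hypothetical, not an existing maxtext API: `apply_fn` stands in for the model forward pass, `EOS_ID` depends on the tokenizer, and there is no KV cache (each step re-runs the full prefix), so this only illustrates shapes and semantics, not an efficient implementation.

```python
import functools

import jax
import jax.numpy as jnp

EOS_ID = 2  # assumed end-of-sequence id; the real value depends on the tokenizer


@functools.partial(jax.jit, static_argnames=("apply_fn", "max_new_tokens"))
def generate(params, input_ids, attention_mask, apply_fn, max_new_tokens):
    """Greedy decoding over a left-padded batch.

    `apply_fn(params, ids, mask)` is a hypothetical stand-in for the model
    forward pass, assumed to return logits of shape (batch, seq, vocab).
    Left padding means every row's last prompt token sits at index
    prompt_len - 1, so a single scalar position works for the whole batch.
    """
    batch, prompt_len = input_ids.shape
    total_len = prompt_len + max_new_tokens

    # Pre-allocate fixed-size buffers so all shapes stay static under jit.
    ids = jnp.zeros((batch, total_len), input_ids.dtype)
    ids = ids.at[:, :prompt_len].set(input_ids)
    mask = jnp.zeros((batch, total_len), attention_mask.dtype)
    mask = mask.at[:, :prompt_len].set(attention_mask)
    done = jnp.zeros((batch,), dtype=bool)

    def step(carry, _):
        ids, mask, done, pos = carry
        logits = apply_fn(params, ids, mask)          # (batch, total_len, vocab)
        next_tok = jnp.argmax(logits[:, pos - 1, :], axis=-1)
        next_tok = jnp.where(done, EOS_ID, next_tok)  # freeze finished rows
        ids = ids.at[:, pos].set(next_tok)
        mask = mask.at[:, pos].set((~done).astype(mask.dtype))
        done = done | (next_tok == EOS_ID)
        return (ids, mask, done, pos + 1), None

    # scan always runs max_new_tokens steps; stopping early once the whole
    # batch hits EOS would need a lax.while_loop instead.
    init = (ids, mask, done, jnp.array(prompt_len))
    (ids, _, _, _), _ = jax.lax.scan(step, init, None, length=max_new_tokens)
    return ids
```

The other sampling strategies (temperature, top-k, top-p, beam search) would slot in where the argmax is, and a real version would of course prefill once and then decode with a KV cache.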
This PR looked interesting: #402
I think it was mainly for benchmarking, though, since it didn’t stop once the entire batch reached EOS, but it had nice prefill functionality.

@vipannalla vipannalla self-assigned this Mar 19, 2024