Batched & chunked prefill #234

Closed

Conversation

@lucasavila00 (Contributor) commented Apr 28, 2024

Continuing #219

Closes #216


Code Metrics Report
  ───────────────────────────────────────────────────────────────────────────────
Language                 Files     Lines   Blanks  Comments     Code Complexity
───────────────────────────────────────────────────────────────────────────────
Rust                        70     23339     1550       508    21281       1281
───────────────────────────────────────────────────────────────────────────────
Total                       70     23339     1550       508    21281       1281
───────────────────────────────────────────────────────────────────────────────
Estimated Cost to Develop 69,864
Estimated Schedule Effort 11.811066 months
Estimated People Required 5.038645
───────────────────────────────────────────────────────────────────────────────
Processed 768517 bytes, 0.769 megabytes (SI)
───────────────────────────────────────────────────────────────────────────────
  

@lucasavila00 mentioned this pull request Apr 28, 2024
f32::NEG_INFINITY
} else {
(0..u).map(move |j| {
if j + t + self.sliding_window.unwrap_or(tgt_len + 1) > i + u {
@lucasavila00 (Contributor, Author):

I'm not sure the sliding-window part is right.
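For reference, here is a minimal, self-contained sketch of the sliding-window causal masking rule the snippet above implements. The bool-matrix form, the function name `causal_mask`, and the `Vec<Vec<f32>>` return type are illustrative assumptions; the PR builds the mask as a candle `Tensor` with different index arithmetic (`t`, `u`, `i + u`):

```rust
/// Illustrative sketch (not the PR's code): position `i` (query) may attend
/// to position `j` (key) iff `j <= i` (causality) and `i - j < window`
/// (sliding window). Masked entries are `f32::NEG_INFINITY`; allowed are 0.0.
fn causal_mask(tgt_len: usize, sliding_window: Option<usize>) -> Vec<Vec<f32>> {
    (0..tgt_len)
        .map(|i| {
            (0..tgt_len)
                .map(|j| {
                    if j > i {
                        // Future position: masked out by causality.
                        f32::NEG_INFINITY
                    } else if sliding_window.map_or(false, |w| i - j >= w) {
                        // Too far in the past: outside the sliding window.
                        f32::NEG_INFINITY
                    } else {
                        0.0
                    }
                })
                .collect()
        })
        .collect()
}

fn main() {
    // With a window of 2, row 3 can only attend to positions 2 and 3.
    let m = causal_mask(4, Some(2));
    println!("{:?}", m[3]);
}
```

Checking the diff against a reference like this (especially the `unwrap_or` default when no window is set) is one way to settle the "is the sliding window part right" question.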

f32::NEG_INFINITY
} else {
(0..u).map(move |j| {
if j + t + self.sliding_window > i + u {
@lucasavila00 (Contributor, Author):

I'm not sure the sliding-window part is right.

Comment on lines +370 to +381
pub struct InputMetadata {
pub input: Tensor,
pub positions: Vec<usize>,
pub positions_kernel: Tensor, // [bs, seq len]
pub context_lens: Vec<usize>,
}

fn calculate_inputs_prompt_batched(
seq: &mut Sequence,
device: &Device,
chunk_size: usize,
) -> Result<Vec<InputMetadata>> {
@lucasavila00 (Contributor, Author):

Is it worth making InputMetadata public? Or should this function return ModelInputs?
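To make the shape of the per-chunk metadata concrete, here is a simplified sketch of what chunked prefill input preparation computes. `Vec<u32>` token ids stand in for the candle `Tensor`s, and the names `ChunkInputs` and `calculate_chunked_inputs` are hypothetical, not the PR's API:

```rust
/// Simplified sketch of chunked prefill (token ids instead of Tensors).
/// Each chunk carries its absolute positions and its context length
/// (tokens already processed plus this chunk), which the attention mask
/// and KV-cache offsets need per chunk.
#[derive(Debug, PartialEq)]
struct ChunkInputs {
    input: Vec<u32>,       // token ids for this chunk
    positions: Vec<usize>, // absolute positions within the prompt
    context_len: usize,    // tokens visible to this chunk
}

fn calculate_chunked_inputs(prompt: &[u32], chunk_size: usize) -> Vec<ChunkInputs> {
    prompt
        .chunks(chunk_size)
        .enumerate()
        .map(|(n, chunk)| {
            let offset = n * chunk_size;
            ChunkInputs {
                input: chunk.to_vec(),
                positions: (offset..offset + chunk.len()).collect(),
                context_len: offset + chunk.len(),
            }
        })
        .collect()
}

fn main() {
    // A 5-token prompt with chunk_size 2 yields chunks of 2, 2, and 1 tokens.
    let chunks = calculate_chunked_inputs(&[1, 2, 3, 4, 5], 2);
    println!("{:?}", chunks);
}
```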

Comment on lines +222 to +224
fn get_prefill_chunk_size(&self) -> usize {
512
}
@lucasavila00 (Contributor, Author):

Should this be configurable?

This value should be as large as possible for performance, under the constraint that it doesn't OOM the system.
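If the hardcoded 512 were made configurable, one plausible shape is a config field with 512 as the default. This is a hypothetical sketch (the `SchedulerConfig` name and builder method are assumptions, not the PR's API):

```rust
/// Hypothetical sketch of a configurable prefill chunk size, defaulting to
/// 512 (the hardcoded value in the PR). Larger chunks improve prefill
/// throughput but raise peak activation memory, so the cap is a
/// memory/speed trade-off chosen per deployment.
struct SchedulerConfig {
    prefill_chunk_size: usize,
}

impl Default for SchedulerConfig {
    fn default() -> Self {
        Self {
            prefill_chunk_size: 512,
        }
    }
}

impl SchedulerConfig {
    /// Builder-style override for systems with more (or less) free VRAM.
    fn with_prefill_chunk_size(mut self, size: usize) -> Self {
        self.prefill_chunk_size = size;
        self
    }
}

fn main() {
    let cfg = SchedulerConfig::default().with_prefill_chunk_size(1024);
    println!("prefill chunk size: {}", cfg.prefill_chunk_size);
}
```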

@@ -460,16 +460,12 @@ impl Model {
tgt_len: usize,
seqlen_offset: usize,
@lucasavila00 (Contributor, Author):

This function was already being passed seqlen_offset. I think it was already being calculated correctly for the current use case, but I'm not sure.

@@ -561,16 +561,12 @@ impl XLoraModel {
tgt_len: usize,
seqlen_offset: usize,
@lucasavila00 (Contributor, Author):

This function was already being passed seqlen_offset. I think it was already being calculated correctly for the current use case, but I'm not sure.

@lucasavila00 (Contributor, Author) commented:

@EricLBuehler I'd appreciate help on testing this.

It changed a lot of models, some of which I have never used.

I did test the quantized Llama model I usually run, though, both on mistral-bench and by talking to it in interactive mode.

@EricLBuehler (Owner) replied:

@lucasavila00, absolutely, I'll test the models out.

@lucasavila00 lucasavila00 changed the title Batched prefill Batched & chunked prefill Apr 29, 2024