the difference of your bleu and sacrebleu #558

Open

cooper12121 opened this issue Mar 7, 2024 · 1 comment

@cooper12121
What is the difference between your package's BLEU implementation and sacrebleu's? I computed the score both ways and got different results. The input is Chinese, and I passed sacrebleu's zh tokenizer.
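
For context, here is a minimal sketch of the comparison I mean, with placeholder Chinese sentences (not my real data); it assumes evaluate's bleu accepts a tokenizer callable (its docs list a `tokenizer` argument) and that sacrebleu's zh tokenizer is importable as below:

```python
import evaluate
from sacrebleu.metrics import BLEU
from sacrebleu.tokenizers.tokenizer_zh import TokenizerZh

# Placeholder Chinese example.
predictions = ["猫坐在垫子上。"]
references = [["猫坐在垫子上。"]]

# evaluate: pass sacrebleu's zh tokenizer as the tokenizer callable.
hf_bleu = evaluate.load("bleu")
print(hf_bleu.compute(predictions=predictions, references=references,
                      tokenizer=TokenizerZh()))

# sacrebleu: built-in zh tokenizer; note its [num_refs][num_sentences] reference layout.
sb_bleu = BLEU(tokenize="zh")
print(sb_bleu.corpus_score(predictions, [[refs[0] for refs in references]]))
```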

@shenxiangzhuang

I believe there are some differences between this implementation and sacrebleu's. Actually, testing with English shows the same problem.

evaluate

```python
import evaluate

predictions = ["hello there general kenobi", "foo bar foobar"]
references = [
    ["hello there general kenobi", "hello there !"],
    ["foo bar foobar"],
]

bleu = evaluate.load("bleu")
results = bleu.compute(predictions=predictions, references=references, smooth=False, max_order=4)
print(results)
```

got results:

```
{'bleu': 1.0, 'precisions': [1.0, 1.0, 1.0, 1.0], 'brevity_penalty': 1.0, 'length_ratio': 1.1666666666666667, 'translation_length': 7, 'reference_length': 6}
```
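
As far as I can tell, one concrete difference is the brevity-penalty bookkeeping: evaluate wraps the Google nmt compute_bleu script, which counts the shortest reference per sentence (hence reference_length = 6 above: min(4, 3) + 3), while sacrebleu follows mteval and picks the reference length closest to each hypothesis. A minimal sketch of the two conventions, using hypothetical helpers rather than either library's API:

```python
# Hypothetical helpers contrasting the two reference-length conventions;
# token counts are taken from the example above.
def ref_len_shortest(hyp_len, ref_lens):
    # nmt/evaluate style: always count the shortest reference
    return min(ref_lens)

def ref_len_closest(hyp_len, ref_lens):
    # sacrebleu/mteval style: the length closest to the hypothesis,
    # ties broken in favor of the shorter reference
    return min(ref_lens, key=lambda n: (abs(n - hyp_len), n))

hyp_lens = [4, 3]         # "hello there general kenobi", "foo bar foobar"
ref_lens = [[4, 3], [3]]  # per-sentence reference token counts

print(sum(ref_len_shortest(h, r) for h, r in zip(hyp_lens, ref_lens)))  # 6
print(sum(ref_len_closest(h, r) for h, r in zip(hyp_lens, ref_lens)))   # 7
```

That alone changes the length ratio (7/6 vs 7/7) and therefore when the brevity penalty kicks in.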

sacrebleu

```python
from sacrebleu.metrics import BLEU

predictions = ["hello there general kenobi", "foo bar foobar"]
references = [
    ["hello there general kenobi", "hello there !"],
    ["foo bar foobar"],
]

bleu = BLEU(smooth_method="none", max_ngram_order=4, tokenize='13a')
results = bleu.corpus_score(predictions, references)
print(results)
```

got results:

```
BLEU = 100.00 100.0/100.0/100.0/100.0 (BP = 1.000 ratio = 1.000 hyp_len = 4 ref_len = 4)
```
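
Note hyp_len = 4 here: that is the token count of the first hypothesis alone, which suggests only one sentence pair was actually scored. sacrebleu's corpus_score expects references transposed relative to evaluate, i.e. shape [num_refs][num_sentences], so the nested list above is read as two reference streams of unequal length and, in the version used here apparently, truncated to a single sentence instead of raising an error. A sketch of the call I believe is equivalent, padding the missing second reference with None (which sacrebleu 2.x accepts for a variable number of references, if I read the docs right):

```python
from sacrebleu.metrics import BLEU

predictions = ["hello there general kenobi", "foo bar foobar"]
# Transposed layout: references[j][i] is the j-th reference for sentence i.
# The second sentence has only one reference, so the second slot is padded
# with None (assumed to be supported in sacrebleu 2.x).
references = [
    ["hello there general kenobi", "foo bar foobar"],
    ["hello there !", None],
]

bleu = BLEU(smooth_method="none", max_ngram_order=4, tokenize='13a')
print(bleu.corpus_score(predictions, references))
```

Scored this way, both sentences count toward hyp_len, and whatever gap remains against evaluate's numbers should come down to implementation details such as the reference-length convention sketched above.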
