dbbench-std: Task Output Seems Correct But MD5 Mismatches #108

Open
wchen-github opened this issue Jan 24, 2024 · 1 comment
Labels: bug, help wanted

Comments

@wchen-github

I looked into one particular DbBench task. GPT-4 seems to have given the right answer, but the MD5 doesn't match.

Steps to reproduce the behavior:

  1. Run the task on line 106 of dbbench/standard.jsonl:
    {"description": "The film titled 'New Movie' will be added to the Filmography table with the lead actor role and a note of '-' for the year 2019.", "label": ["INSERT INTO Filmography (Year, Title, Role, Notes) VALUES ('2019', 'New Movie', 'Lead Actor', '-')"], "create": {"database": "fetaqa", "init": "fetaqa_init.sql"}, "table": {"table_name": "Filmography", "table_info": {"columns": [{"name": "Year", "type": "INT"}, {"name": "Title", "type": "TEXT"}, {"name": "Role", "type": "TEXT"}, {"name": "Notes", "type": "TEXT"}], "rows": [["1985", "Back to the Future", "Jennifer Parker", "-"], ["2008", "Still Waters Burn", "Laura Harper", "-"], ["2011", "Alien Armageddon", "Eileen Daly", "-"], ["2013", "You Are Not Alone", "Cristina's Mom", "Short film"], ["2013", "Max", "Mom", "Short film"], ["2014", "Starship: Rising", "Captain Savage", "-"], ["2015", "EP/Executive Protection", "Pam Travis", "-"], ["2015", "Back in Time", "Herself", "Back to the Future documentary"], ["2015", "Back to the 2015 Future", "Jennifer Parker", "Short film"], ["2017", "Vitals", "Margaret Parks", "-"], ["2018", "Groove Street", "Julie", "-"], ["1999", "The Matrix", "Trinity", "-"], ["2005", "Batman Begins", "Rachel Dawes", "-"], ["2010", "Inception", "Mal", "-"], ["2012", "The Avengers", "Black Widow/Natasha Romanoff", "-"], ["2014", "Interstellar", "Brand", "-"], ["2016", "La La Land", "Mia Dolan", "-"], ["2017", "Wonder Woman", "Wonder Woman/Diana Prince", "-"], ["2019", "Avengers: Endgame", "Black Widow/Natasha Romanoff", "-"], ["2021", "The Suicide Squad", "Harley Quinn", "-"], ["2022", "Black Panther: Wakanda Forever", "Okoye", "-"]]}}, "evaluation": "", "example": "", "type": ["INSERT"], "heads": ["Year", "Title", "Role", "Notes"], "add_description": "The name of this table is Filmography, and the headers of this table are Year,Title,Role,Notes.", "source": "fetaqa", "answer_md5": "[('ae2213ddbcb907c43fd757035b363328',)]"}

  2. Get the output SQL command and MD5 from the output/runs.jsonl file:

image

  3. Print out the modified table in dbbench.interaction.execute:

image

  4. Compare the MD5 from the dataset with the one in the output:

image

  • OS: Ubuntu 22.04
  • Python: 3.9

This is only one example I collected; there are many errors of a similar kind. Could you please help me identify the issue?
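For anyone reproducing step 1, the entry can be pulled out of the JSONL file by line number so that its label and answer_md5 fields can be inspected side by side. This is a minimal sketch, assuming the line number is 1-based; the helper name load_entry and the in-memory sample lines are hypothetical and only stand in for the real file.

```python
import json


def load_entry(lines, line_number):
    """Return the JSON object on the given 1-based line of a JSONL stream."""
    for current, line in enumerate(lines, start=1):
        if current == line_number:
            return json.loads(line)
    raise IndexError(f"no line {line_number} in input")


# Hypothetical two-line stand-in for a JSONL file (not real dataset content).
sample_lines = [
    '{"label": ["SELECT 1"], "answer_md5": "placeholder-a"}',
    '{"label": ["SELECT 2"], "answer_md5": "placeholder-b"}',
]

entry = load_entry(sample_lines, 2)
print(entry["label"], entry["answer_md5"])
```

With the real dataset, the same helper could be called on an open file handle, for example `with open("dbbench/standard.jsonl", encoding="utf-8") as f: entry = load_entry(f, 106)`, where the path depends on the local checkout.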

@zhc7
Collaborator

zhc7 commented Jan 25, 2024

Hi, @wchen-github. The answer MD5 is calculated from the label field of the data entry. As you can see, the correct answer is assumed to be INSERT INTO Filmography (Year, Title, Role, Notes) VALUES ('2019', 'New Movie', 'Lead Actor', '-'). The capitalization of 'Lead Actor' is probably what causes the difference in the hash: an insert that is semantically equivalent but cased differently produces a different table state and therefore a different MD5. We'll try to do better in data filtering and validation. There shouldn't be many similar exceptions. Thank you for your report!
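The effect of a case difference on a content hash can be illustrated with a small sketch. This is not the benchmark's actual hashing code: the function table_md5, the row-hash-then-sort scheme, and the lowercase 'lead actor' value are all assumptions made only to show that any byte-level difference in a single cell changes the final MD5.

```python
import hashlib


def table_md5(rows):
    """Order-independent digest: hash each row, sort the row hashes, hash the result."""
    row_hashes = sorted(
        hashlib.md5(",".join(str(value) for value in row).encode("utf-8")).hexdigest()
        for row in rows
    )
    return hashlib.md5("".join(row_hashes).encode("utf-8")).hexdigest()


# Two rows taken from the table in the issue, plus the newly inserted row.
base_rows = [
    (1985, "Back to the Future", "Jennifer Parker", "-"),
    (2018, "Groove Street", "Julie", "-"),
]
expected = base_rows + [(2019, "New Movie", "Lead Actor", "-")]
# Hypothetical agent output that differs only in capitalization.
actual = base_rows + [(2019, "New Movie", "lead actor", "-")]

print(table_md5(expected) == table_md5(actual))  # False
```

Under this scheme a case-insensitive comparison would require normalizing the text columns before hashing, which is a design choice for the evaluator rather than something the hash itself can absorb.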
