Describe the bug
Hi there, thanks for the great library! We have been using it a lot in torchtune and it's been a huge help for us.
Regarding the bug: the same call to load_dataset errors with ExpectedMoreSplits in 2.19.0 after working fine in 2.18.0. Full details given in the repro below.
Steps to reproduce the bug
On 2.18.0, things work fine:
# First clear the locally cached dataset
rm -r ~/.cache/huggingface/datasets/lvwerra___stack-exchange-paired
pip install "datasets==2.18.0"
python3
>>> from datasets import load_dataset
>>> dataset = load_dataset('lvwerra/stack-exchange-paired', split='train', data_dir='data/rl')
On 2.19.0, they do not:
# First clear the locally cached dataset
rm -r ~/.cache/huggingface/datasets/lvwerra___stack-exchange-paired
pip install "datasets==2.19.0"
python3
>>> from datasets import load_dataset
>>> dataset = load_dataset('lvwerra/stack-exchange-paired', split='train', data_dir='data/rl')
The stack trace I see from the 2.19.0 version of load_dataset can be seen here.
Perhaps unsurprisingly, if I do not delete the cache first, I am able to load the dataset successfully. Based on this, I suspect the cause is somewhere in the download logic.
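For convenience, the steps above can be sketched as a single script that can be run under each installed version. This is only a rough sketch: the helper names (`cache_dir_for`, `try_load`, `run_repro`) are made up for illustration, and the cache location assumes the default `~/.cache/huggingface/datasets` path used in the commands above (it will differ if the cache directory has been overridden).

```python
import shutil
from pathlib import Path


def cache_dir_for(repo_id, cache_root=None):
    """Return the cached dataset folder, assuming '/' maps to '___' as in the repro."""
    if cache_root is None:
        # Assumption: default cache location, matching the rm -r commands above.
        cache_root = Path.home() / ".cache" / "huggingface" / "datasets"
    return Path(cache_root) / repo_id.replace("/", "___")


def try_load(loader, clear_cache_at=None):
    """Optionally clear a cache folder, call loader(), and report 'ok' or the exception class name."""
    if clear_cache_at is not None:
        shutil.rmtree(clear_cache_at, ignore_errors=True)
    try:
        loader()
    except Exception as exc:  # broad on purpose: we only want to report what was raised
        return type(exc).__name__
    return "ok"


def run_repro():
    """Run the same load_dataset call as in the repro, starting from a cleared cache."""
    import datasets
    from datasets import load_dataset

    repo_id = "lvwerra/stack-exchange-paired"
    outcome = try_load(
        lambda: load_dataset(repo_id, split="train", data_dir="data/rl"),
        clear_cache_at=cache_dir_for(repo_id),
    )
    print(f"datasets {datasets.__version__}: {outcome}")
```

Calling `run_repro()` once after `pip install "datasets==2.18.0"` and once after `pip install "datasets==2.19.0"` should print the outcome for each version. Note that it downloads the dataset each time, since the cache is cleared first.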
Expected behavior
Download the dataset successfully :)
Environment info
datasets version: 2.19.0
huggingface_hub version: 0.22.2
fsspec version: 2024.3.1