Move `exceptions.py` to `utils/exceptions.py` #6296

mariosasko · 2023-10-11T18:28:00Z

I didn't notice the path while reviewing the PR yesterday :(

github-actions · 2023-10-11T18:35:55Z

PyArrow==8.0.0

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.006695 / 0.011353 (-0.004658)	0.004321 / 0.011008 (-0.006687)	0.084558 / 0.038508 (0.046050)	0.076290 / 0.023109 (0.053181)	0.312331 / 0.275898 (0.036433)	0.349854 / 0.323480 (0.026374)	0.004267 / 0.007986 (-0.003719)	0.003595 / 0.004328 (-0.000733)	0.065077 / 0.004250 (0.060826)	0.057461 / 0.037052 (0.020409)	0.314989 / 0.258489 (0.056500)	0.364767 / 0.293841 (0.070926)	0.031726 / 0.128546 (-0.096820)	0.008674 / 0.075646 (-0.066972)	0.288282 / 0.419271 (-0.130990)	0.052845 / 0.043533 (0.009312)	0.317501 / 0.255139 (0.062362)	0.333241 / 0.283200 (0.050041)	0.026412 / 0.141683 (-0.115271)	1.475648 / 1.452155 (0.023493)	1.551656 / 1.492716 (0.058939)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.276512 / 0.018006 (0.258506)	0.576350 / 0.000490 (0.575861)	0.009518 / 0.000200 (0.009318)	0.000280 / 0.000054 (0.000226)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.029332 / 0.037411 (-0.008079)	0.082904 / 0.014526 (0.068379)	0.102516 / 0.176557 (-0.074041)	0.159355 / 0.737135 (-0.577780)	0.104112 / 0.296338 (-0.192226)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.379144 / 0.215209 (0.163935)	3.785283 / 2.077655 (1.707629)	1.833753 / 1.504120 (0.329633)	1.667906 / 1.541195 (0.126711)	1.751551 / 1.468490 (0.283061)	0.480998 / 4.584777 (-4.103779)	3.533433 / 3.745712 (-0.212279)	3.343363 / 5.269862 (-1.926498)	2.094169 / 4.565676 (-2.471508)	0.056613 / 0.424275 (-0.367662)	0.007410 / 0.007607 (-0.000197)	0.455077 / 0.226044 (0.229033)	4.541380 / 2.268929 (2.272452)	2.269151 / 55.444624 (-53.175473)	1.955663 / 6.876477 (-4.920814)	2.227663 / 2.142072 (0.085591)	0.580597 / 4.805227 (-4.224630)	0.135034 / 6.500664 (-6.365630)	0.062091 / 0.075469 (-0.013378)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.276295 / 1.841788 (-0.565492)	20.072827 / 8.074308 (11.998519)	14.296462 / 10.191392 (4.105070)	0.164936 / 0.680424 (-0.515488)	0.018415 / 0.534201 (-0.515786)	0.390894 / 0.579283 (-0.188389)	0.415515 / 0.434364 (-0.018849)	0.462798 / 0.540337 (-0.077540)	0.650099 / 1.386936 (-0.736837)

PyArrow==latest

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.007218 / 0.011353 (-0.004135)	0.004246 / 0.011008 (-0.006763)	0.065818 / 0.038508 (0.027310)	0.087315 / 0.023109 (0.064206)	0.406449 / 0.275898 (0.130551)	0.442008 / 0.323480 (0.118528)	0.005752 / 0.007986 (-0.002233)	0.003624 / 0.004328 (-0.000704)	0.065349 / 0.004250 (0.061099)	0.062423 / 0.037052 (0.025371)	0.410099 / 0.258489 (0.151610)	0.448929 / 0.293841 (0.155088)	0.032498 / 0.128546 (-0.096048)	0.008877 / 0.075646 (-0.066770)	0.071611 / 0.419271 (-0.347661)	0.048038 / 0.043533 (0.004506)	0.407957 / 0.255139 (0.152818)	0.424045 / 0.283200 (0.140846)	0.025222 / 0.141683 (-0.116461)	1.496191 / 1.452155 (0.044037)	1.580765 / 1.492716 (0.088048)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.274798 / 0.018006 (0.256792)	0.581410 / 0.000490 (0.580920)	0.007302 / 0.000200 (0.007102)	0.000160 / 0.000054 (0.000106)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.034068 / 0.037411 (-0.003343)	0.096116 / 0.014526 (0.081590)	0.110234 / 0.176557 (-0.066323)	0.163246 / 0.737135 (-0.573889)	0.110250 / 0.296338 (-0.186089)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.442381 / 0.215209 (0.227172)	4.427061 / 2.077655 (2.349406)	2.361013 / 1.504120 (0.856893)	2.185048 / 1.541195 (0.643853)	2.312544 / 1.468490 (0.844054)	0.498347 / 4.584777 (-4.086430)	3.640839 / 3.745712 (-0.104873)	3.353405 / 5.269862 (-1.916457)	2.082038 / 4.565676 (-2.483638)	0.058786 / 0.424275 (-0.365489)	0.007403 / 0.007607 (-0.000205)	0.517894 / 0.226044 (0.291850)	5.184257 / 2.268929 (2.915329)	2.838467 / 55.444624 (-52.606157)	2.511116 / 6.876477 (-4.365361)	2.757816 / 2.142072 (0.615743)	0.644050 / 4.805227 (-4.161177)	0.136446 / 6.500664 (-6.364218)	0.062219 / 0.075469 (-0.013250)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.350916 / 1.841788 (-0.490872)	20.549280 / 8.074308 (12.474972)	14.697569 / 10.191392 (4.506177)	0.149818 / 0.680424 (-0.530606)	0.020187 / 0.534201 (-0.514014)	0.396008 / 0.579283 (-0.183275)	0.427535 / 0.434364 (-0.006829)	0.484544 / 0.540337 (-0.055794)	0.687076 / 1.386936 (-0.699860)

albertvillanova

I put the exceptions module at the root of the project on purpose because it is part of the public API.

Note that having the exceptions/errors module at the root is a common pattern followed in many open-source libraries, like numpy, pandas, pyarrow, requests...

mariosasko · 2023-10-12T12:38:06Z

I'd rather be consistent with huggingface_hub and have this module in utils/ with the exceptions exposed in utils/__init__.py ...
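
The two layouts under discussion can be sketched with a throwaway package. This is only an illustration under stated assumptions: the package names `demo_root` and `demo_utils` and the class `DemoError` are hypothetical and do not come from `datasets` or `huggingface_hub`. The sketch writes both layouts to a temporary directory and reports the module each exception class is defined in.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical exception class, used only for illustration.
ERROR_SOURCE = 'class DemoError(Exception):\n    """Hypothetical base error."""\n'

# Layout A: exceptions module at the package root.
ROOT_LAYOUT = {
    "__init__.py": "",
    "exceptions.py": ERROR_SOURCE,
}

# Layout B: module under utils/, re-exported from utils/__init__.py.
UTILS_LAYOUT = {
    "__init__.py": "",
    "utils/__init__.py": "from .exceptions import DemoError\n",
    "utils/exceptions.py": ERROR_SOURCE,
}


def write_package(base: Path, name: str, files: dict) -> None:
    """Create package `name` under `base` from a {relative path: source} mapping."""
    for rel_path, source in files.items():
        target = base / name / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(source)


def public_import_paths() -> dict:
    """Build both demo packages and return where DemoError is defined in each."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        write_package(base, "demo_root", ROOT_LAYOUT)
        write_package(base, "demo_utils", UTILS_LAYOUT)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            # Layout A: users import from <package>.exceptions.
            root_exc = importlib.import_module("demo_root.exceptions")
            # Layout B: users import from <package>.utils.
            utils_pkg = importlib.import_module("demo_utils.utils")
            return {
                "root": root_exc.DemoError.__module__,
                "utils": utils_pkg.DemoError.__module__,
            }
        finally:
            sys.path.remove(tmp)


if __name__ == "__main__":
    print(public_import_paths())
```

With layout A, user code would read `from demo_root.exceptions import DemoError`. With layout B it would read `from demo_utils.utils import DemoError`, and the defining module stays one level deeper.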

albertvillanova

Maybe we could ask huggingface_hub to align with the rest of open-source libraries and expose the errors/exceptions at the root of the library...

In [11]: requests.ConnectionError
Out[11]: requests.exceptions.ConnectionError

In [12]: pandas.errors.ClosedFileError
Out[12]: pandas.errors.ClosedFileError

In [13]: numpy.AxisError  # defined in numpy/exceptions.py
Out[13]: numpy.AxisError

In [14]: pyarrow.ArrowKeyError  # defined in pyarrow/error.pxi
Out[14]: pyarrow.lib.ArrowKeyError

mariosasko · 2023-10-16T14:57:50Z

Ok, I'll close this PR.

> Maybe we could ask huggingface_hub to align with the rest of open-source libraries and expose the errors/exceptions at the root of the library...

cc @Wauplin

It would be nice to have an HF style guide to ensure consistency across our libraries 🙂.

Wauplin · 2023-10-17T13:25:32Z

Yes, I can expose the exceptions at the root level.

About having guidelines and consistency, let's try to do our best but it's not really in the essence of HF to formalize stuff in libraries 😒

Move exceptions to utils/exceptions.py

02a0d7c

mariosasko requested a review from albertvillanova October 11, 2023 18:28

mariosasko changed the title ~~Move exceptions to utils/exceptions.py~~ Move exceptions.py to utils/exceptions.py Oct 11, 2023

albertvillanova reviewed Oct 12, 2023

Wauplin mentioned this pull request Feb 29, 2024

Define all errors in ./src/huggingface_hub/errors.py huggingface/huggingface_hub#2069

Open
