feat!: transliteration option for natural sorting #1053

Merged
merged 4 commits into from
May 26, 2024

Conversation

clispios
Contributor

This resolves #616.

To do this, I added a text file of transliteration mappings, which is loaded into a static HashMap exactly once via OnceLock. If we ever need more transliterations, we just add them to the transliteration.map file. The actual transliteration is carried out via the Transliterator trait, which is currently implemented on byte arrays.
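As a rough illustration of the one-time initialization described above, here is a minimal sketch. The `from=to` line format, the inline `MAP_SOURCE` stand-in for transliteration.map, and the function names are all assumptions made for this example, and it works on `char`s rather than on byte arrays as the PR does.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

// Hypothetical stand-in for the contents of `transliteration.map`;
// the real file's format is not shown in this thread.
const MAP_SOURCE: &str = "À=A\nÉ=E\nß=ss\nő=o";

// Returns the shared table, parsing the mapping text on first use only.
fn table() -> &'static HashMap<char, &'static str> {
    static TABLE: OnceLock<HashMap<char, &'static str>> = OnceLock::new();
    TABLE.get_or_init(|| {
        MAP_SOURCE
            .lines()
            .filter_map(|line| {
                // Skip any line that is not of the assumed `from=to` shape.
                let (from, to) = line.split_once('=')?;
                Some((from.chars().next()?, to))
            })
            .collect()
    })
}

// Replaces every mapped character and leaves the rest untouched.
fn transliterate(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match table().get(&c) {
            Some(replacement) => out.push_str(replacement),
            None => out.push(c),
        }
    }
    out
}
```

Under these assumptions, `transliterate("Straße")` yields `"Strasse"`, and the mapping text is parsed only on the first call.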

Users can set the sort_transliteration config variables to true to enable transliteration; they currently default to false. Transliteration is then applied to natural sorting only. We could extend it to alphabetical sorting as well, but I'm not sure that is something you'd want.
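To show why the option changes the resulting order, here is a small sketch that sorts by a transliterated key. It uses plain string ordering instead of natural sorting, and the `fold` mapping and `sort_names` helper are hypothetical names invented for this example.

```rust
// Hypothetical, deliberately tiny mapping used only for this illustration.
fn fold(c: char) -> char {
    match c {
        'Á' => 'A',
        'É' => 'E',
        'Ö' | 'Ő' => 'O',
        other => other,
    }
}

// Sorts in place, either by raw string order or by a transliterated key.
fn sort_names(names: &mut [&str], transliterate: bool) {
    if transliterate {
        // Each key is computed once per element, then compared.
        names.sort_by_cached_key(|n| n.chars().map(fold).collect::<String>());
    } else {
        names.sort();
    }
}
```

With the names `Zoltán`, `Ádám` and `Béla`, the raw order puts `Ádám` last because `Á` compares greater than any ASCII letter, while the transliterated order puts it first.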

This also addresses some issues with the sorting keymap commands: they were all forcing --dir-first rather than pulling the value from the user's config. Additionally, the --reverse parameter would persist as an override of the user's config setting. For example, if I set reverse = true in my config and then ran the ,n command, it would set my reverse value to false for the rest of the session and disregard the value in my config.
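One way to avoid that kind of sticky override is to treat every command-line flag as optional and fall back to the config value when it is absent. The sketch below only illustrates that idea; the `SortConfig` and `SortArgs` types and the `resolve` function are hypothetical and are not taken from the PR.

```rust
// Values loaded from the user's config file (hypothetical shape).
#[derive(Clone, Copy, Debug, PartialEq)]
struct SortConfig {
    reverse: bool,
    dir_first: bool,
}

// Flags passed to a sort command; `None` means "not specified".
#[derive(Default)]
struct SortArgs {
    reverse: Option<bool>,
    dir_first: Option<bool>,
}

// An explicit flag wins; otherwise the config value is kept.
fn resolve(config: SortConfig, args: &SortArgs) -> SortConfig {
    SortConfig {
        reverse: args.reverse.unwrap_or(config.reverse),
        dir_first: args.dir_first.unwrap_or(config.dir_first),
    }
}
```

Because `resolve` never writes back into the config, a command that omits `--reverse` keeps using whatever the config says.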

Here are some example screenshots (I know I still need to have some fun and make my theme better...):

Natural sorting (without transliteration):
Screenshot 2024-05-17 at 9 35 21 AM

Natural sorting (with transliteration):
Screenshot 2024-05-17 at 9 34 21 AM

Please let me know your thoughts. Thanks!

@sxyazi
Owner

sxyazi commented May 17, 2024

Thank you for the PR! Here's my change request:

Since most characters are contiguous, representing them as Rust arrays could avoid parsing while providing lookup efficiency, as they are contiguous in memory:

const DICT: &[&str] = &[
  "A",  // À 192
  "A",  // Á 193
  "A",  // Â 194
  // ...
  "h",  // ʯ 687
];
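A minimal sketch of the offset lookup this implies: subtract the first covered code point and index into the table. The table below is only an excerpt covering U+00C0 to U+00C7, and the replacement strings (for example "AE" for Æ) as well as the `BASE` and `lookup` names are assumptions made for illustration.

```rust
// First code point covered by the table (À, U+00C0).
const BASE: u32 = 0xC0;

// Illustrative excerpt only: covers U+00C0..=U+00C7.
const DICT: [&str; 8] = ["A", "A", "A", "A", "A", "A", "AE", "C"];

// Maps a character to its replacement, or `None` if it is outside the table.
fn lookup(c: char) -> Option<&'static str> {
    // `checked_sub` rejects code points below the table's start.
    let idx = (c as u32).checked_sub(BASE)? as usize;
    // `get` rejects code points past the table's end.
    DICT.get(idx).copied()
}
```

No parsing or hashing is involved: the lookup is one subtraction and one bounds-checked index.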

I think they can be represented by 3 arrays, À (U+00C0) ~ ʯ (U+02AF):

À 192
Á 193
Â 194
Ã 195
Ä 196
Å 197
Æ 198
Ç 199
È 200
É 201
Ê 202
Ë 203
Ì 204
Í 205
Î 206
Ï 207
Ð 208
Ñ 209
Ò 210
Ó 211
Ô 212
Õ 213
Ö 214
Ø 216
Ù 217
Ú 218
Û 219
Ü 220
Ý 221
Þ 222
ß 223
à 224
á 225
â 226
ã 227
ä 228
å 229
æ 230
ç 231
è 232
é 233
ê 234
ë 235
ì 236
í 237
î 238
ï 239
ð 240
ñ 241
ò 242
ó 243
ô 244
õ 245
ö 246
ø 248
ù 249
ú 250
û 251
ü 252
ý 253
þ 254
ÿ 255
Ā 256
ā 257
Ă 258
ă 259
Ą 260
ą 261
Ć 262
ć 263
Ĉ 264
ĉ 265
Ċ 266
ċ 267
Č 268
č 269
Ď 270
ď 271
Đ 272
đ 273
Ē 274
ē 275
Ĕ 276
ĕ 277
Ė 278
ė 279
Ę 280
ę 281
Ě 282
ě 283
Ĝ 284
ĝ 285
Ğ 286
ğ 287
Ġ 288
ġ 289
Ģ 290
ģ 291
Ĥ 292
ĥ 293
Ħ 294
ħ 295
Ĩ 296
ĩ 297
Ī 298
ī 299
Ĭ 300
ĭ 301
Į 302
į 303
İ 304
ı 305
Ĳ 306
ĳ 307
Ĵ 308
ĵ 309
Ķ 310
ķ 311
ĸ 312
Ĺ 313
ĺ 314
Ļ 315
ļ 316
Ľ 317
ľ 318
Ŀ 319
ŀ 320
Ł 321
ł 322
Ń 323
ń 324
Ņ 325
ņ 326
Ň 327
ň 328
ŉ 329
Ŋ 330
ŋ 331
Ō 332
ō 333
Ŏ 334
ŏ 335
Ő 336
ő 337
Œ 338
œ 339
Ŕ 340
ŕ 341
Ŗ 342
ŗ 343
Ř 344
ř 345
Ś 346
ś 347
Ŝ 348
ŝ 349
Ş 350
ş 351
Š 352
š 353
Ţ 354
ţ 355
Ť 356
ť 357
Ŧ 358
ŧ 359
Ũ 360
ũ 361
Ū 362
ū 363
Ŭ 364
ŭ 365
Ů 366
ů 367
Ű 368
ű 369
Ų 370
ų 371
Ŵ 372
ŵ 373
Ŷ 374
ŷ 375
Ÿ 376
Ź 377
ź 378
Ż 379
ż 380
Ž 381
ž 382
ſ 383
Ɓ 385
Ɔ 390
Ɗ 394
Ə 399
Ɛ 400
Ƒ 401
ƒ 402
Ƙ 408
ƙ 409
ƚ 410
ƛ 411
Ɲ 413
Ɵ 415
Ơ 416
ơ 417
Ư 431
ư 432
Ƴ 435
ƴ 436
Ƶ 437
ƶ 438
Ǆ 452
ǅ 453
ǆ 454
Ǉ 455
ǈ 456
ǉ 457
Ǌ 458
ǋ 459
ǌ 460
Ǎ 461
ǎ 462
Ǐ 463
ǐ 464
Ǒ 465
ǒ 466
Ǔ 467
ǔ 468
Ǖ 469
ǖ 470
Ǘ 471
ǘ 472
Ǚ 473
ǚ 474
Ǜ 475
ǜ 476
Ǧ 486
ǧ 487
Ǫ 490
ǫ 491
ǯ 495
ǰ 496
Ǳ 497
ǲ 498
ǳ 499
Ǵ 500
ǵ 501
Ǹ 504
ǹ 505
Ǻ 506
ǻ 507
Ǽ 508
ǽ 509
Ǿ 510
ǿ 511
Ș 536
ș 537
Ț 538
ț 539
Ȳ 562
ȳ 563
ȷ 567
Ƚ 573
ɐ 592
ɑ 593
ɒ 594
ɓ 595
ɔ 596
ɕ 597
ɖ 598
ɗ 599
ɘ 600
ə 601
ɚ 602
ɛ 603
ɜ 604
ɝ 605
ɞ 606
ɟ 607
ɠ 608
ɡ 609
ɣ 611
ɥ 613
ɦ 614
ɧ 615
ɨ 616
ɩ 617
ɫ 619
ɬ 620
ɭ 621
ɮ 622
ɯ 623
ɰ 624
ɱ 625
ɲ 626
ɳ 627
ɵ 629
ɶ 630
ɸ 632
ɹ 633
ɺ 634
ɻ 635
ɼ 636
ɽ 637
ɾ 638
ɿ 639
ʁ 641
ʂ 642
ʃ 643
ʄ 644
ʅ 645
ʆ 646
ʇ 647
ʈ 648
ʉ 649
ʊ 650
ʋ 651
ʌ 652
ʍ 653
ʎ 654
ʐ 656
ʑ 657
ʒ 658
ʓ 659
ʗ 663
ʘ 664
ʚ 666
ʛ 667
ʝ 669
ʞ 670
ʠ 672
ʣ 675
ʤ 676
ʥ 677
ʦ 678
ʧ 679
ʨ 680
ʩ 681
ʪ 682
ʫ 683
ʬ 684
ʮ 686
ʯ 687

And, Ḅ (U+1E04 - 7684) ~ ỹ (U+1EF9 - 7929):

Ḅ 7684
ḅ 7685
Ḍ 7692
ḍ 7693
Ḏ 7694
ḏ 7695
Ḓ 7698
ḓ 7699
Ḡ 7712
ḡ 7713
Ḥ 7716
ḥ 7717
Ḫ 7722
ḫ 7723
Ḳ 7730
ḳ 7731
Ḵ 7732
ḵ 7733
Ḷ 7734
ḷ 7735
Ḹ 7736
ḹ 7737
Ḻ 7738
ḻ 7739
Ḽ 7740
ḽ 7741
Ḿ 7742
ḿ 7743
Ṁ 7744
ṁ 7745
Ṃ 7746
ṃ 7747
Ṅ 7748
ṅ 7749
Ṇ 7750
ṇ 7751
Ṉ 7752
ṉ 7753
Ṋ 7754
ṋ 7755
Ṙ 7768
ṙ 7769
Ṛ 7770
ṛ 7771
Ṝ 7772
ṝ 7773
Ṟ 7774
ṟ 7775
Ṡ 7776
ṡ 7777
Ṣ 7778
ṣ 7779
Ṭ 7788
ṭ 7789
Ṯ 7790
ṯ 7791
Ṱ 7792
ṱ 7793
Ẁ 7808
ẁ 7809
Ẃ 7810
ẃ 7811
Ẅ 7812
ẅ 7813
Ẏ 7822
ẏ 7823
Ẓ 7826
ẓ 7827
Ẕ 7828
ẕ 7829
ẖ 7830
ẗ 7831
ẞ 7838
Ạ 7840
ạ 7841
Ả 7842
ả 7843
Ấ 7844
ấ 7845
Ầ 7846
ầ 7847
Ẩ 7848
ẩ 7849
Ẫ 7850
ẫ 7851
Ậ 7852
ậ 7853
Ắ 7854
ắ 7855
Ằ 7856
ằ 7857
Ẳ 7858
ẳ 7859
Ẵ 7860
ẵ 7861
Ặ 7862
ặ 7863
Ẹ 7864
ẹ 7865
Ẻ 7866
ẻ 7867
Ẽ 7868
ẽ 7869
Ế 7870
ế 7871
Ề 7872
ề 7873
Ể 7874
ể 7875
Ễ 7876
ễ 7877
Ệ 7878
ệ 7879
Ỉ 7880
ỉ 7881
Ị 7882
ị 7883
Ọ 7884
ọ 7885
Ỏ 7886
ỏ 7887
Ố 7888
ố 7889
Ồ 7890
ồ 7891
Ổ 7892
ổ 7893
Ỗ 7894
ỗ 7895
Ộ 7896
ộ 7897
Ớ 7898
ớ 7899
Ờ 7900
ờ 7901
Ở 7902
ở 7903
Ỡ 7904
ỡ 7905
Ợ 7906
ợ 7907
Ụ 7908
ụ 7909
Ủ 7910
ủ 7911
Ứ 7912
ứ 7913
Ừ 7914
ừ 7915
Ử 7916
ử 7917
Ữ 7918
ữ 7919
Ự 7920
ự 7921
Ỳ 7922
ỳ 7923
Ỵ 7924
ỵ 7925
Ỷ 7926
ỷ 7927
Ỹ 7928
ỹ 7929

And, ﬁ (U+FB01 - 64257) ~ ﬂ (U+FB02 - 64258):

ﬁ 64257
ﬂ 64258

@clispios
Contributor Author

Very good point! Should be an easy enough change on my end.

@sxyazi
Owner

sxyazi commented May 22, 2024

Please change it!

@clispios
Contributor Author

I'll be able to finish the change request in the next day or two. I haven't had enough free time over the last week. Thanks!

@clispios
Contributor Author

Okay, the change request is complete. Transliteration now uses 3 const arrays that return &'static str values.
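For readers following along, a three-table lookup could look roughly like the sketch below. The tables are tiny excerpts rather than the full ranges, and the table names, the `translit_char` function and the replacement strings are assumptions for this example, not the merged code.

```rust
// Tiny illustrative excerpts; real tables would span the full ranges.
const LATIN: &[&str] = &["A", "A", "A", "A"]; // U+00C0..=U+00C3
const LATIN_EXT: &[&str] = &["B", "b"]; // U+1E04..=U+1E05
const LIGATURES: &[&str] = &["fi", "fl"]; // U+FB01..=U+FB02

// Picks the table whose range contains `c`, then indexes by offset.
fn translit_char(c: char) -> Option<&'static str> {
    let cp = c as u32;
    let (table, base) = match cp {
        0x00C0..=0x00C3 => (LATIN, 0x00C0),
        0x1E04..=0x1E05 => (LATIN_EXT, 0x1E04),
        0xFB01..=0xFB02 => (LIGATURES, 0xFB01),
        // Anything outside the three ranges has no replacement.
        _ => return None,
    };
    table.get((cp - base) as usize).copied()
}

// Applies `translit_char` to a whole string, keeping unmapped characters.
fn transliterate(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match translit_char(c) {
            Some(replacement) => out.push_str(replacement),
            None => out.push(c),
        }
    }
    out
}
```

Splitting the data into three ranges keeps each table dense, so the large gaps between U+02AF, U+1E04 and U+FB01 cost no memory.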

@sxyazi
Owner

sxyazi commented May 26, 2024

Hi, I've made some simplifications and performance optimizations to the code and reviewed the unsafe parts twice. Everything looks good, so I think we can merge it now. Thank you so much for contributing this awesome feature!

@sxyazi sxyazi merged commit 53d658d into sxyazi:main May 26, 2024
6 checks passed
@sxyazi sxyazi changed the title feat: transliteration option for natural sorting feat!: transliteration option for natural sorting May 26, 2024
@clispios clispios deleted the feat/transliteration branch May 26, 2024 15:32
sxyazi added a commit that referenced this pull request May 26, 2024
Co-authored-by: sxyazi <sxyazi@gmail.com>
Development

Successfully merging this pull request may close these issues.

Sort_by function not work very well with hungarian letters
2 participants