Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fallback to language id 0 with FormatMessageA if the error messages are not available in system locale #3260

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

cppdev123
Copy link

@cppdev123 cppdev123 commented Dec 3, 2022

After this commit <system_error>: explicitly pass the system locale ID for system_categ… std::system_category used the locale system language with FormatMessageA to get the error message which fixed a problem and introduced another one. As reported here: std::system_category returns "unknown error" the error messages may not be available in the system locale, but they are available in en-US locale shipped with windows so FormatMessageA failed with ERROR_MUI_FILE_NOT_FOUND and std::system_category returned hard coded "unknown error". It was working befor because if the language id is 0 it will eventually fall back to en-US locale and return an error message in that locale.

I reproduced the problem on following windows installations:

  • Windows 10 and Windows 11 with locale set to ar-EG and the Arabic langauge pack installed but FormatMessageA failed with ERROR_MUI_FILE_NOT_FOUND. Seems that there are no error messages in the Arabic mui because windows apps, explorer and login screen can show properly in arabic and right to left layout
  • Windows 11 installed with locale en-GB set during installation by mistake then I removed the English UK keyboard and added English US and left the system locale set to en-GB. I was getting "unknown error" until I changed the locale back to "en-US"

This should fix #3254

@cppdev123 cppdev123 requested a review from a team as a code owner December 3, 2022 17:10
@StephanTLavavej StephanTLavavej added the bug Something isn't working label Dec 4, 2022
@StephanTLavavej StephanTLavavej added this to Initial Review in Code Reviews via automation Dec 4, 2022
@fsb4000
Copy link
Contributor

fsb4000 commented Dec 4, 2022

With your fix, the issue #2451 is back again:
(I changed my system language to English(United Kingdom) and I don't have the MUI pack and my user language is Russian, and the same issue happened: "For example, If your user language is Chinese but the system language is English, it can only returns “????”, because you can’t encode a Chinese string in ISO-8859-1." , only Russian instead of Chinese)
изображение

#include <clocale>
#include <windows.h>
#include <system_error>
#include <stdio.h>

int main()
{
    setlocale(LC_ALL, "");
    std::error_code ec(ERROR_DIR_NOT_EMPTY, std::system_category());
    auto s = ec.message();
    printf("message = '%s' (%lld)\n", s.c_str(), static_cast<long long>(_MSC_FULL_VER));
    for (auto c: s) {
        printf("%c = %d\n", c, (int)c);
    }
}
C:\Dev\STL\playground\test>main
message = '????? ?? ?????.' (193532019)
? = 63
? = 63
? = 63
? = 63
? = 63
  = 32
? = 63
? = 63
  = 32
? = 63
? = 63
? = 63
? = 63
? = 63
. = 46

изображение

Comment on lines +61 to +63
_Chars = FormatMessageA(
FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS, nullptr,
_Message_id, 0, reinterpret_cast<char*>(_Ptr_str), 0, nullptr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_Chars = FormatMessageA(
FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS, nullptr,
_Message_id, 0, reinterpret_cast<char*>(_Ptr_str), 0, nullptr);
const DWORD _US_lang_id = MAKELANGID(LANG_ENGLISH, SUBLANG_ENGLISH_US);
_Chars = FormatMessageA(
FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS, nullptr,
_Message_id, _US_lang_id, reinterpret_cast<char*>(_Ptr_str), 0, nullptr);

Could you test this? Does it work for you?
It works for me...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fails if the US english locale is not installed (which it isn't on localized windows installs)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you test it? What windows versions?
I believe this should work because:

  1. I tested it on a localized windows install(Russian)
  2. Look at https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-formatmessagea
 If you pass in zero, FormatMessage looks for a message for LANGIDs in the following order:

1. Language neutral
2. Thread LANGID, based on the thread's locale value
3. User default LANGID, based on the user's default locale value
4. System default LANGID, based on the system default locale value
5. US English

US English is the final fallback. This is why I think it should work.
Of course I may be wrong and more real tests are needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My program used to hardcode US english as well, but I had to change that because I would get failures on some systems (I tried a brand-new Japanese install of the OS and saw the issue, IIRC): TranslucentTB/TranslucentTB@2d1cd95

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to add a test to avoid regression?

@sylveon
Copy link
Contributor

sylveon commented Dec 4, 2022

Here's an idea: why not use only US English? There is certainly a precedent for this, for example all STL-thrown exceptions have a English message in std::exception::what, and std::generic_category only returns English messages as well.

My comment earlier mentions it failing without the US English but this isn't quite true. After more investigation, MUI (multilanguage) installs should always have US English available, but non-MUI installs (e.g. Windows 10 Single Language) do not have US English, in which case asking for the US English error message actually returns the localized message without error (I misremembered). But this won't be an issue, as the only valid codepage would be the locale matching said message anyways.

Alternatively, if this is not a valid suggestion, we should go with @fsb4000's suggestion of always falling back to US English instead of lang ID 0, to avoid regressing #2451.

@sylveon
Copy link
Contributor

sylveon commented Dec 4, 2022

Whichever solution we pick, the fix needs to be checked on a variety of configs (single language installs, MUI installs with only non-English language available, mixed user/system locale (both ways), etc.)

@cppdev123
Copy link
Author

@fsb4000 I am aware that this will break for users who does not have the messages in the system locale language and have display language different from system language, they are already getting "unknown error" because the current behaviour only uses system locale.

By the way the suggestion to try en-US if system locale fails seems good to go with

@strega-nil-ms strega-nil-ms added the decision needed We need to choose something before working on this label Dec 5, 2022
@cppdev123
Copy link
Author

Other options I thought of:

  • use FormatMessageW and convert the wide string to narrow using __std_fs_convert_wide_to_narrow which uses the current application code page instead of system's one and can be set using setlocale to utf-8 for example and it is the application responsibility to use the proper code page
  • check the active system code page GetACP and if it is utf-8 use language id of 0 since all utf-16 can be represented in utf-8 and otherwise use the system locale and if failed use the English locale

@strega-nil-ms

This comment was marked as outdated.

@Dave-Lowndes
Copy link

I appreciate it's probably not part of the standard, but would providing an alternate wide string version alleviate the problems of non-English (US) texts?
I'd expect that most Windows applications would be using wide-strings normally, so having a naturally wide-string version of this would probably work better - I know it would for my usage.

@sylveon
Copy link
Contributor

sylveon commented Jun 7, 2023

If you're a Windows application, you can easily do this out of the standard (pseudo code):

std::wstring error_message(const std::error_code &err)
{
	if (err.category() == std::system_category())
	{
		return FormatMessage(err.value()));
	}
	else if (err.category() == std::generic_category())
	{
		return _wcserror_s(err.value());
	}
	else
	{
		return L"Unknown error";
	}
}

@Dave-Lowndes
Copy link

I already have my own implementation, but everything I don't have to write myself is usually a better solution (though not in this case until this gets resolved) :)

@StephanTLavavej StephanTLavavej removed the decision needed We need to choose something before working on this label Jan 24, 2024
@StephanTLavavej StephanTLavavej moved this from Initial Review to Work In Progress in Code Reviews Jan 24, 2024
@frederick-vs-ja
Copy link
Contributor

I tried to implement the suggestion from @davemcincork in #3254 (comment).

diff --git a/stl/src/syserror_import_lib.cpp b/stl/src/syserror_import_lib.cpp
index e2f2d2b7..a70424ba 100644
--- a/stl/src/syserror_import_lib.cpp
+++ b/stl/src/syserror_import_lib.cpp
@@ -46,11 +46,48 @@ extern "C" {
     if (_Ret == 0) {
         _Lang_id = 0;
     }
-    const unsigned long _Chars =
+    const unsigned long _Chars_primary_path =
         FormatMessageA(FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
             nullptr, _Message_id, _Lang_id, reinterpret_cast<char*>(_Ptr_str), 0, nullptr);
 
-    return _CSTD __std_get_string_size_without_trailing_whitespace(*_Ptr_str, _Chars);
+    const WORD _Lang_id_primary = PRIMARYLANGID(_Lang_id);
+    if (_Chars_primary_path != 0 || _Lang_id_primary == LANG_NEUTRAL) {
+        return _CSTD __std_get_string_size_without_trailing_whitespace(*_Ptr_str, _Chars_primary_path);
+    }
+
+    const WORD _Lang_id_sub = SUBLANGID(_Lang_id);
+    if (_Lang_id_sub != SUBLANG_NEUTRAL && _Lang_id_sub != SUBLANG_DEFAULT) {
+        const DWORD _Adjusted_lang_id = MAKELANGID(_Lang_id_primary, SUBLANG_DEFAULT);
+        const unsigned long _Chars_sub_default_path =
+            FormatMessageA(FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
+                nullptr, _Message_id, _Adjusted_lang_id, reinterpret_cast<char*>(_Ptr_str), 0, nullptr);
+        if (_Chars_sub_default_path != 0) {
+            return _CSTD __std_get_string_size_without_trailing_whitespace(*_Ptr_str, _Chars_sub_default_path);
+        }
+    }
+
+    if (_Lang_id_sub != SUBLANG_NEUTRAL) {
+        const DWORD _Adjusted_lang_id = MAKELANGID(_Lang_id_primary, SUBLANG_NEUTRAL);
+        const unsigned long _Chars_sub_neutral_path =
+            FormatMessageA(FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
+                nullptr, _Message_id, _Adjusted_lang_id, reinterpret_cast<char*>(_Ptr_str), 0, nullptr);
+        if (_Chars_sub_neutral_path != 0) {
+            return _CSTD __std_get_string_size_without_trailing_whitespace(*_Ptr_str, _Chars_sub_neutral_path);
+        }
+    }
+
+    constexpr DWORD _En_us_lang_id = MAKELANGID(LANG_ENGLISH, SUBLANG_ENGLISH_US);
+    const unsigned long _Chars_en_us_path =
+        FormatMessageA(FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
+            nullptr, _Message_id, _En_us_lang_id, reinterpret_cast<char*>(_Ptr_str), 0, nullptr);
+    if (_Chars_en_us_path != 0) {
+        return _CSTD __std_get_string_size_without_trailing_whitespace(*_Ptr_str, _Chars_en_us_path);
+    }
+
+    const unsigned long _Chars_neutral_path =
+        FormatMessageA(FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
+            nullptr, _Message_id, 0, reinterpret_cast<char*>(_Ptr_str), 0, nullptr);
+    return _CSTD __std_get_string_size_without_trailing_whitespace(*_Ptr_str, _Chars_neutral_path);
 }
 
 void __CLRCALL_PURE_OR_STDCALL __std_system_error_deallocate_message(char* const _Str) noexcept {

Does this work? @cppdev123 @fsb4000 @sylveon

@davemcatcisco
Copy link

I tried to implement the suggestion from @davemcincork in #3254 (comment).

diff --git a/stl/src/syserror_import_lib.cpp b/stl/src/syserror_import_lib.cpp
index e2f2d2b7..a70424ba 100644
--- a/stl/src/syserror_import_lib.cpp
+++ b/stl/src/syserror_import_lib.cpp
@@ -46,11 +46,48 @@ extern "C" {
     if (_Ret == 0) {
         _Lang_id = 0;
     }
-    const unsigned long _Chars =
+    const unsigned long _Chars_primary_path =
         FormatMessageA(FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
             nullptr, _Message_id, _Lang_id, reinterpret_cast<char*>(_Ptr_str), 0, nullptr);
 
-    return _CSTD __std_get_string_size_without_trailing_whitespace(*_Ptr_str, _Chars);
+    const WORD _Lang_id_primary = PRIMARYLANGID(_Lang_id);
+    if (_Chars_primary_path != 0 || _Lang_id_primary == LANG_NEUTRAL) {
+        return _CSTD __std_get_string_size_without_trailing_whitespace(*_Ptr_str, _Chars_primary_path);
+    }
+
+    const WORD _Lang_id_sub = SUBLANGID(_Lang_id);
+    if (_Lang_id_sub != SUBLANG_NEUTRAL && _Lang_id_sub != SUBLANG_DEFAULT) {
+        const DWORD _Adjusted_lang_id = MAKELANGID(_Lang_id_primary, SUBLANG_DEFAULT);
+        const unsigned long _Chars_sub_default_path =
+            FormatMessageA(FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
+                nullptr, _Message_id, _Adjusted_lang_id, reinterpret_cast<char*>(_Ptr_str), 0, nullptr);
+        if (_Chars_sub_default_path != 0) {
+            return _CSTD __std_get_string_size_without_trailing_whitespace(*_Ptr_str, _Chars_sub_default_path);
+        }
+    }
+
+    if (_Lang_id_sub != SUBLANG_NEUTRAL) {
+        const DWORD _Adjusted_lang_id = MAKELANGID(_Lang_id_primary, SUBLANG_NEUTRAL);
+        const unsigned long _Chars_sub_neutral_path =
+            FormatMessageA(FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
+                nullptr, _Message_id, _Adjusted_lang_id, reinterpret_cast<char*>(_Ptr_str), 0, nullptr);
+        if (_Chars_sub_neutral_path != 0) {
+            return _CSTD __std_get_string_size_without_trailing_whitespace(*_Ptr_str, _Chars_sub_neutral_path);
+        }
+    }
+
+    constexpr DWORD _En_us_lang_id = MAKELANGID(LANG_ENGLISH, SUBLANG_ENGLISH_US);
+    const unsigned long _Chars_en_us_path =
+        FormatMessageA(FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
+            nullptr, _Message_id, _En_us_lang_id, reinterpret_cast<char*>(_Ptr_str), 0, nullptr);
+    if (_Chars_en_us_path != 0) {
+        return _CSTD __std_get_string_size_without_trailing_whitespace(*_Ptr_str, _Chars_en_us_path);
+    }
+
+    const unsigned long _Chars_neutral_path =
+        FormatMessageA(FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
+            nullptr, _Message_id, 0, reinterpret_cast<char*>(_Ptr_str), 0, nullptr);
+    return _CSTD __std_get_string_size_without_trailing_whitespace(*_Ptr_str, _Chars_neutral_path);
 }
 
 void __CLRCALL_PURE_OR_STDCALL __std_system_error_deallocate_message(char* const _Str) noexcept {

Does this work? @cppdev123 @fsb4000 @sylveon

I can't be certain of the following because you've just provided a diff which is a little difficult to visualise in context...

I think your proposal will leak. As in my suggestion, you're calling FormatMessageA with the FORMAT_MESSAGE_ALLOCATE_BUFFER which causes it to allocate the buffer. But unlike my suggestion, you don't appear to be calling LocalFree.

Just curious: why did you move away from the loop-based approach that I used?

@frederick-vs-ja
Copy link
Contributor

I think your proposal will leak. As in my suggestion, you're calling FormatMessageA with the FORMAT_MESSAGE_ALLOCATE_BUFFER which causes it to allocate the buffer. But unlike my suggestion, you don't appear to be calling LocalFree.

I think when FormatMessageA fails (i.e. returns 0), it will behave as-if the allocation is not done. Although this is somehow unclear in the documentation. If the statement doesn't hold, we can insert LocalFree calls later.

Just curious: why did you move away from the loop-based approach that I used?

I think it would be clearer to expand the loop to exactly see which lang-ids are tried.

@davemcatcisco
Copy link

I think your proposal will leak. As in my suggestion, you're calling FormatMessageA with the FORMAT_MESSAGE_ALLOCATE_BUFFER which causes it to allocate the buffer. But unlike my suggestion, you don't appear to be calling LocalFree.

I think when FormatMessageA fails (i.e. returns 0), it will behave as-if the allocation is not done. Although this is somehow unclear in the documentation. If the statement doesn't hold, we can insert LocalFree calls later.

I was referring to what happens in the case where the API call succeeds. Keep in mind that because you're just showing a diff here, it is difficult for me to see the full context of this change, so perhaps the allocation will be freed in the calling function.

Just curious: why did you move away from the loop-based approach that I used?

I think it would be clearer to expand the loop to exactly see which lang-ids are tried.

That's probably subjective so fair enough.

@jovibor
Copy link
Contributor

jovibor commented Apr 27, 2024

I was referring to what happens in the case where the API call succeeds.

I believe that freeing should be done by the call site with the __std_system_error_deallocate_message method, that is defined right after.

@davemcatcisco
Copy link

I was referring to what happens in the case where the API call succeeds.

I believe that freeing should be done by the call site with the __std_system_error_deallocate_message method, that is defined right after.

Yes, after I posted my last comment above, I looked at the entire source file so I could see your proposed changed in context, and I saw the __std_system_error_deallocate_message function. Thanks.

@b-spencer
Copy link

b-spencer commented May 7, 2024

It does seem like the "last chance" call to FormatMessageA() should use a LANGID of 0, and thus benefit from the built-in hunting algorithm of that function (which tries "en-US" as a last resort anyway, according to its documentation). The "Files Changed" changes that in this review appear to restore the behaviour from VS 17.2.x (i.e. before #2669) when the system language doesn't have strings installed, and that would be a great help, and couldn't come soon enough!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: Work In Progress
Code Reviews
Work In Progress
Development

Successfully merging this pull request may close these issues.

Visual Studio 2022 std::system_category returns "unknown error" if system locale is not en-US