Converting RTF with mixed Urdu and English text to HTML causes messed up characters #9758
Comments
I assume you got the "unsupported code page" warning? This is the same issue as #9683. We can't really support all the legacy code pages; maybe there's a way to convert your document to Unicode prior to passing it to pandoc?
I got no warning when I converted the document. If there is no mixed text, the conversion runs fine.
OK, I jumped to conclusions. Actually it's ansicp1252, which we support, so the problem lies elsewhere...
Hm, cp1252 just has Latin characters: https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT There's
In any case this might be out of scope, if it requires large lookup tables corresponding to fonts (see discussion of the other linked issue).
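To illustrate the point about cp1252 containing only Latin characters, here is a small Python sketch of how the same bytes read under two different Windows code pages. The choice of cp1256 (the Windows Arabic code page) and the two sample bytes are assumptions made for illustration; they are not taken from the attached RTF file.

```python
# Two sample bytes, chosen for illustration only (not taken from the issue's file).
raw = bytes([0xC7, 0xD3])

# Decoded with the document-level code page (cp1252), the bytes become Latin letters.
as_cp1252 = raw.decode("cp1252")

# Decoded with an Arabic code page (cp1256, an assumption here), they become Arabic letters.
as_cp1256 = raw.decode("cp1256")

print(as_cp1252)  # Latin capital C with cedilla, Latin capital O with acute
print(as_cp1256)  # Arabic letter alef, Arabic letter seen
```

If the RTF file stores the Urdu text as single bytes tied to a font-specific character set, decoding them with the document-wide cp1252 setting would plausibly give this kind of Latin mojibake.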
Thanks for looking into my problem. I can give you a working example if this helps. I am not very familiar with RTF.
The one that works contains Unicode escapes to back up the single-byte font characters; that's why it works. I think there might be programs that will unicodify an existing RTF document -- maybe Word can do this? You could look into it.
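As a rough sketch of what "unicodifying" could look like, the following Python function rewrites RTF `\'xx` hex escapes as `\uN?` Unicode escapes. It is not part of pandoc or Word. The function name `unicodify_hex_escapes` and the default code page cp1256 are hypothetical choices for illustration, and the sketch assumes the default `\uc1` setting (one fallback character after each `\uN`). It applies one code page to every hex escape and ignores font switches, so in a truly mixed document it would also rewrite Latin escapes that belong to a cp1252 font.

```python
import re

# Matches RTF hex escapes such as \'c7
HEX_ESCAPE = re.compile(r"\\'([0-9a-fA-F]{2})")


def unicodify_hex_escapes(rtf_text: str, codepage: str = "cp1256") -> str:
    """Replace \\'xx escapes with \\uN? escapes, decoding xx in `codepage`.

    Hypothetical helper: the code page is an assumption, and font-specific
    character sets are not tracked.
    """

    def repl(match: re.Match) -> str:
        byte = bytes([int(match.group(1), 16)])
        try:
            char = byte.decode(codepage)
        except UnicodeDecodeError:
            return match.group(0)  # leave undecodable bytes untouched
        code = ord(char)
        if code < 128:
            return match.group(0)  # ASCII needs no Unicode escape
        if code > 32767:
            code -= 65536  # RTF \uN takes a signed 16-bit decimal value
        return "\\u%d?" % code  # "?" is the single fallback character

    return HEX_ESCAPE.sub(repl, rtf_text)


print(unicodify_hex_escapes(r"{\f1 \'c7\'d3} plain text"))
```

A more careful tool would read the font table and pick the code page per font, which is the kind of lookup discussed above as possibly out of scope for pandoc itself.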
Explain the problem.
Converting from RTF to HTML produces messed up characters. I am not sure whether this is similar to #9683.
I use this command line:
pandoc.exe input.rtf --metadata title=" " -f rtf -t html -s -o output.html
RTF (Input)
HTML (Output)
Pandoc version?
Pandoc 3.2 on Windows Server 2016