Inconsistent and Incorrect conversion of text "NA" to unknown in the File widget #6808

Open
Newbrie opened this issue May 18, 2024 · 4 comments
Labels
bug A bug confirmed by the core team

Newbrie commented May 18, 2024

What's wrong?

When importing a standard CSV data file with a categorical column, Orange sometimes converts the category text "NA" to an unknown value (shown as "?"), but not always.

My current workaround is to avoid "NA" and rename it to "NX", which imports fine.
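The renaming workaround above can be scripted before the file reaches Orange. This is a minimal sketch using only Python's standard `csv` module; the helper name `rename_na` and the sample rows are hypothetical, and the column name "PD" is taken from the reproduction steps in this report. The real Test2.csv has different contents.

```python
import csv
import io


def rename_na(src, dst, column="PD", old="NA", new="NX"):
    """Copy CSV rows from src to dst, replacing `old` with `new` in one column."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Only exact matches are renamed; values such as "NAN" are left alone.
        if row[column] == old:
            row[column] = new
        writer.writerow(row)


# Hypothetical sample data standing in for the real file.
sample = io.StringIO("ID,PD\n1,NA\n2,LOW\n3,NA\n")
out = io.StringIO()
rename_na(sample, out)
result = out.getvalue()
```

For files on disk, pass objects opened with `open(path, newline="")` in place of the `StringIO` buffers.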

How can we reproduce the problem?

Filebug.ows.zip

Test2.csv

Instructions:
1 - Check the contents of Test2.csv to see the innocuous use of "NA" as a category value in the "PD" column.

2 - Open Filebug.ows in Orange3 and use the File widget to load Test2.csv.

3 - Open the Data Table to see how the data has been loaded. Scroll down to where you expect to see the "NA" text value and notice that precisely the rows that use "NA" have been modified, with "NA" replaced by "?".

The behaviour is inconsistent: if you create a smaller table that uses the category value "NA", the import works fine.

What's your environment?

  • Operating system: macOS Sonoma 14.4.1
  • Orange version: 3.36
  • How you installed Orange: Orange DMG
@Newbrie added the "bug report" label (Bug is reported by user, not yet confirmed by the core team) May 18, 2024
@processo

I can confirm this.

"File" and "CSV File Import" both do this, and neither cares whether NA is enclosed in quotation marks. The only difference is that "File" still converts NA to "?" even if the "text" type is chosen, whereas "CSV File Import" does not.

@markotoplak added the "bug" label (A bug confirmed by the core team) and removed the "bug report" label (Bug is reported by user, not yet confirmed by the core team) May 21, 2024

Newbrie commented May 26, 2024

Thanks, I hadn't noticed that. So I could use CSV File Import and choose text as a workaround.

@processo

@Newbrie Yes, if you put an Edit Domain after import you can even convert it into categorical.

@janezd self-assigned this May 31, 2024

janezd commented Jun 6, 2024

To sum up:

  • If the data set is smaller, Orange's auto-detection recognizes PD as a text attribute. We can tweak these rules, but it's guesswork, so it will never be correct in every case.
  • Categorical and text variables use different sets of symbols for missing values. I don't think we can do anything here: we cannot prohibit the value "NA" in text and convert it to missing.
  • The File widget uses the rules for text variables even if the user manually changes the type to categorical. I suppose this happens because the type is converted after the data is read.

I suppose the last point is why @markotoplak marked this as a bug (and I agree).
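The interaction between the points above can be illustrated with a small standalone sketch. It does not use Orange's code: the symbol sets and the `read_column` helper are hypothetical, and Orange's actual lists of missing-value symbols may differ. It only shows why converting the type after reading cannot undo a decision the reader already made, using the case reported earlier in this thread (column detected as categorical, user then picks text).

```python
# Hypothetical symbol sets for illustration; Orange's real lists may differ.
MISSING_CATEGORICAL = {"", "?", "NA"}
MISSING_TEXT = {""}


def read_column(values, col_type):
    """Parse raw strings, mapping the type's missing-value symbols to None."""
    missing = MISSING_CATEGORICAL if col_type == "categorical" else MISSING_TEXT
    return [None if v in missing else v for v in values]


raw = ["NA", "LOW", "HIGH", "NA"]

# If the type is known while reading, each type applies its own rules.
as_text = read_column(raw, "text")                # "NA" survives
as_categorical = read_column(raw, "categorical")  # "NA" becomes missing

# If the type is changed only after reading, the rules of the detected type
# have already been applied, so the user's choice comes too late.
detected_type = "categorical"  # what auto-detection guessed
parsed = read_column(raw, detected_type)
user_type = "text"             # what the user selects afterwards
converted = parsed             # a later conversion cannot restore the lost "NA"
```

Under these assumptions, applying the user's chosen type before parsing (rather than after) would be enough to keep "NA" in a text column.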
