[embedded] Add a mechanism to opt-in for using String, and for using Unicode data tables #73682

Closed

kubamracek
Contributor

Building on top of the previous parts of porting String to Embedded Swift (#70446, #73585), this PR adds a mechanism to select whether an Embedded Swift program uses (dynamic) Strings at all, and whether it uses the Unicode data tables that are needed for operations like comparing strings for equality or using strings as keys in a dictionary.

Background + motivation:

  • Today, we only have StaticString in Embedded Swift. Some meaningful programs can be written without Swift.String, but it's a big limitation -- it's desirable to be able to produce, concatenate, interpolate, print strings, etc.
  • While Swift.String, which is dynamic and Unicode correct, might not be the best solution everywhere, it would fit the needs of some Embedded Swift users who don't mind the dynamism (heap usage) and the codesize/data cost, as long as the program still fits into the Flash/RAM memory of their embedded device.
  • This means we can't view Swift.String as "one size fits all": some users will be willing to accept the cost, while others will want to stay away from Swift.String.
  • The codesize/data cost is roughly the following (this isn't super rigorous data, but just a summary of some trial observations):
Level                                                                  Extra cost
Level 0: No String, only StaticString                                  0 kB
Level 1: "Basic" String usage, print, interpolation, append            ~30 kB (codesize)
Level 2: "Almost full" String usage, comparisons, hashing              ~30 kB (codesize) + ~35 kB (NFC/NFD data tables)
Level 3: "Pathological" String usage (querying character properties)   ~30 kB (codesize) + ~635 kB (all Unicode data tables)
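
As a rough illustration of how these figures add up, the sketch below models the four levels and picks the highest one whose extra cost fits into a given amount of spare flash. The `StringLevel` enum and the `highestLevel(fittingIn:)` helper are hypothetical names invented for this example, and the numbers are the approximate trial observations quoted above, not measured guarantees.

```swift
// Approximate extra cost per level in kB, copied from the trial observations above.
// `StringLevel` is a hypothetical type used only for this illustration.
enum StringLevel: Int, CaseIterable {
    case staticStringOnly = 0   // Level 0
    case basic                  // Level 1
    case almostFull             // Level 2
    case pathological           // Level 3

    // Code size added by the String implementation itself.
    var codeKB: Int { self == .staticStringOnly ? 0 : 30 }

    // Size of the Unicode data tables pulled in at this level.
    var dataTablesKB: Int {
        switch self {
        case .staticStringOnly, .basic: return 0
        case .almostFull: return 35      // NFC/NFD tables
        case .pathological: return 635   // all Unicode data tables
        }
    }

    var totalKB: Int { codeKB + dataTablesKB }
}

// Hypothetical helper: the highest level whose extra cost fits into the spare flash.
func highestLevel(fittingIn spareFlashKB: Int) -> StringLevel {
    StringLevel.allCases.last(where: { $0.totalKB <= spareFlashKB }) ?? .staticStringOnly
}
```

With these approximate numbers, level 2 costs about 65 kB in total and level 3 about 665 kB, so a device with 100 kB of spare flash could afford level 2 but not level 3.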

Approach in this PR

  • The idea is to make sure users are aware of the cost of String, and that they can use compilation flags to choose which "level" they want their program to be at. While compilation flags might not be the best solution (perhaps some "import ..." statements could be used instead?) I would like to consider going forward with the compiler flags as a starting point.
  • This PR implements an availability-based solution, where by default we're at level 0, and the compiler rejects uses of String and asks the user to opt in by passing -enable-strings:
let s = "abc" // (!) ERROR: String has large associated codesize cost; only available in embedded Swift by explicitly passing -enable-strings
  • It's not clear to me whether that's the best default behavior, but given the experimental status of Embedded Swift, it should be a good starting point and we can change the default behavior later based on user feedback.
  • Similarly, even in level 1, the compiler will flag attempts to rely on String APIs that need the NFC/NFD data tables, for example comparing two strings:
let eq = string1 == string2 // (!) ERROR: This operation on String needs Unicode data tables; pass -enable-string=full-unicode-data-tables to acknowledge the associated codesize cost
  • Passing -enable-string=full-unicode-data-tables gets us into level 2.
  • Level 3 is about asking Characters for their name ("LATIN SMALL LETTER A") and other properties, and I consider it to be rare and "pathological" enough (apologies for using this word) that we don't need a separate flag for it, but it would be easy to extend the scheme to this level too. The program just needs to stay away from using Unicode.Scalar.Properties.
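
To make the levels concrete, here is a minimal sketch of which operations the description assigns to each level. It uses only standard library APIs and runs in regular (non-embedded) Swift; the variable names are illustrative, and the level comments restate the PR description rather than verified compiler behavior.

```swift
// Level 1: "basic" String usage (produce, interpolate, append, print).
let name = "world"
var greeting = "Hello, \(name)"   // interpolation
greeting.append("!")              // append
print(greeting)

// Level 2: comparisons and hashing, which per the description need the
// NFC/NFD data tables. String equality is based on canonical equivalence,
// so two different scalar sequences for the same text compare equal.
let composed = "caf\u{E9}"        // "é" as one precomposed scalar
let decomposed = "cafe\u{301}"    // "e" followed by a combining acute accent
let sameText = composed == decomposed

var counts: [String: Int] = [:]
counts[composed, default: 0] += 1
counts[decomposed, default: 0] += 1   // equal strings share one dictionary key

// Level 3: querying character properties through Unicode.Scalar.Properties,
// which per the description needs all Unicode data tables.
let scalarName = ("a" as Unicode.Scalar).properties.name
```

The level 2 lines show why the tables are needed at all: deciding that `composed` and `decomposed` are equal, and hashing them to the same key, requires normalization data rather than a byte-wise comparison.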

@@ -894,6 +894,13 @@ def no_allocations : Flag<["-"], "no-allocations">,
  Flags<[FrontendOption, HelpHidden]>,
  HelpText<"Diagnose any code that needs to heap allocate (classes, closures, etc.)">;

def enable_strings : Flag<["-"], "enable-strings">,
Contributor

Would -enable-embedded-strings work? It's not a big deal, but -enable-strings seems a bit overly generic to me at first glance, unless we can think of other scenarios where String would be restricted outside of -embedded mode (and we could always introduce a more generic option later).

Contributor

I don't think we want flags unique to embedded if we can avoid it.

Contributor

Strings are implicitly available by default, so unless you are in embedded mode this flag doesn't make sense. In this phrasing, how can it not be specific to embedded? In order to make it generic, I think it would have to be -disable-strings, which could theoretically work in any configuration (but it's unclear why you'd use it outside of embedded).

extension Character: Equatable {
  @inlinable @inline(__always)
  @_effects(readonly)
  @_needsUnicodeDataTablesInEmbedded
Contributor

In order for this to be intuitive for library authors, I think we should probably aim to support lexical scoping for this attribute, so that you don't have to repeat it on each declaration like this.

@phausler
Member

I have some serious concerns about the split here; having four levels of configuration is going to have a severe impact on libraries and shared code built above this split. It feels like we are kinda punting a design problem out to our users. Perhaps we should take some time to review this one-on-one, to figure out how we can tackle it more ergonomically and perhaps isolate some areas that need improvement by other teams.

@kubamracek
Contributor Author

👍

@kubamracek added the "embedded" (Embedded Swift) label on May 26, 2024
@kubamracek
Contributor Author

I'm going to shelve/abandon this approach, and avoid introducing any splits or levels, at least for now.

But I'm going to proceed with porting Swift.String to Embedded Swift (#70446) as that's currently the largest missing feature compared to full Swift. I'll also start some more discussions about strings and Swift.String, and also about "opt-in" and "opt-out" or other possibilities to enforce usage/non-usage of strings in embedded contexts.

@kubamracek closed this on May 26, 2024