Not portable #17

Open
yamiteru opened this issue Feb 13, 2022 · 5 comments

Comments

@yamiteru

This format is neither very portable nor small. It could be made much, much smaller.

@acoreyj

acoreyj commented Jan 23, 2023

It's portable in the sense that data defined this way can be used by a variety of consumers in a way that Markdown and HTML can't easily be: not just web pages, but also native apps, backends, etc.

@yamiteru
Author

I understand that, but it could be made smaller, meaning more efficient.

@kmelve
Member

kmelve commented Mar 6, 2023

Interesting feedback. As stated, portability has much more to do with the application and integrations.

Do you have use cases where the data size would be an issue or thoughts about how you would make it smaller?

@yamiteru
Author

yamiteru commented Mar 7, 2023

I see people don't like my feedback or find it funny. So let me clarify.

This is how the format looks right now:

{
  "style": "normal",
  "_type": "block",
  "children": [
    {
      "_type": "span",
      "marks": ["a-key", "emphasis"],
      "text": "some text"
    }
  ],
  "markDefs": [
    {
      "_key": "a-key",
      "_type": "markType",
      "extraData": "some data"
    }
  ]
}

You can see there are quite a lot of repeated keys. It looks like every _type has its own fixed set of keys and data.

So instead of objects we could use tuples (arrays) like this:

["block", "normal", [
  ["span", ["a-key", "emphasis"], "some text"]
], [
  ["markType", "a-key", "some data"]
]]

This would be possible only if my assumption is correct. The assumption is that each _type has a fixed structure. Something like this:

type Block = [type: "block", style: string, children: Type[], markDefs: Def[]];
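
To make the object-to-tuple idea concrete, here is a minimal TypeScript sketch. It relies on the assumption stated above, that every _type has a fixed set of keys. All type and function names (`SpanTuple`, `toTuple`, `fromTuple`, and so on) are made up for illustration and are not part of any Portable Text library. Real mark definitions may carry other fields, so the single `extraData` slot is an assumption too.

```typescript
// Hypothetical keyed shapes, assuming each _type has a fixed set of keys.
interface Span { _type: "span"; marks: string[]; text: string }
interface MarkDef { _key: string; _type: string; extraData: string }
interface Block { _type: "block"; style: string; children: Span[]; markDefs: MarkDef[] }

// Positional equivalents: the key names move into the type, not the data.
type SpanTuple = [type: "span", marks: string[], text: string];
type MarkDefTuple = [type: string, key: string, extraData: string];
type BlockTuple = [type: "block", style: string, children: SpanTuple[], markDefs: MarkDefTuple[]];

// Drop the keys and rely on position instead.
function toTuple(block: Block): BlockTuple {
  return [
    block._type,
    block.style,
    block.children.map((s): SpanTuple => [s._type, s.marks, s.text]),
    block.markDefs.map((d): MarkDefTuple => [d._type, d._key, d.extraData]),
  ];
}

// Rebuild the keyed object from positions.
function fromTuple([, style, children, markDefs]: BlockTuple): Block {
  return {
    _type: "block",
    style,
    children: children.map(([, marks, text]) => ({ _type: "span" as const, marks, text })),
    markDefs: markDefs.map(([_type, _key, extraData]) => ({ _key, _type, extraData })),
  };
}

const exampleBlock: Block = {
  style: "normal",
  _type: "block",
  children: [{ _type: "span", marks: ["a-key", "emphasis"], text: "some text" }],
  markDefs: [{ _key: "a-key", _type: "markType", extraData: "some data" }],
};

console.log(JSON.stringify(toTuple(exampleBlock)));
// ["block","normal",[["span",["a-key","emphasis"],"some text"]],[["markType","a-key","some data"]]]
```

The conversion is lossless only as long as both sides agree on the position of every field, which is the trade-off compared with self-describing keys.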

Also if types can be known ahead of time we could use enums for them. The same applies to data. This is how it'd look:

enum Type {
  block,
  span,
  mark
}

enum BlockStyle {
  normal
}

[Type.block, BlockStyle.normal, [
  [Type.span, ["a-key", "emphasis"], "some text"],
], [
  [Type.mark, "a-key", "some data"]
]]

Which would output this:

[0, 0, [
  [1, ["a-key", "emphasis"], "some text"]
], [
  [2, "a-key", "some data"]
]]
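
A small sketch of the enum step, assuming TypeScript numeric enums: `JSON.stringify` writes enum members as plain numbers, and the reverse mapping that TypeScript generates for numeric enums lets a reader turn a number back into a name. The tuple type names (`EnumSpan`, `EnumMark`, `EnumBlock`) are hypothetical.

```typescript
enum Type { block, span, mark }
enum BlockStyle { normal }

// Hypothetical tuple shapes, one per Type.
type EnumSpan = [Type.span, string[], string];
type EnumMark = [Type.mark, string, string];
type EnumBlock = [Type.block, BlockStyle, EnumSpan[], EnumMark[]];

const encoded: EnumBlock = [Type.block, BlockStyle.normal, [
  [Type.span, ["a-key", "emphasis"], "some text"],
], [
  [Type.mark, "a-key", "some data"],
]];

// Numeric enum members serialize as their numeric values.
const json = JSON.stringify(encoded);
console.log(json);
// [0,0,[[1,["a-key","emphasis"],"some text"]],[[2,"a-key","some data"]]]

// Numeric enums also get a reverse mapping from value to name.
console.log(Type[encoded[0]]); // block
```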

Now, if it doesn't have to be human-readable (and thanks to enums it already isn't very human-readable), we could leverage the fact that types always have the same structure and encode/decode them to and from binary.

How to encode:

  • enums -> uint8 -> 1 byte
  • tuples -> since they have a predefined length they don't take any space -> 0 bytes
  • strings -> string.length + 1 uint8s -> (string.length + 1) * 1 byte
  • arrays -> array.length + 1 uint8s -> (array.length + 1) * 1 byte
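
The rules above can be sketched roughly as follows. This is one possible byte layout, not a defined format: it assumes every enum value, string length (in UTF-8 bytes), and array length fits in a single uint8, and the names `encodeBlock`, `decodeBlock`, `SpanT`, `MarkT`, and `BlockT` are hypothetical. Under this particular layout the example comes to 48 bytes; the exact figure depends on which length prefixes are counted.

```typescript
// Hypothetical tuple shapes; type tags and styles are already numeric (enum values).
type SpanT = [type: number, marks: string[], text: string];
type MarkT = [type: number, key: string, data: string];
type BlockT = [type: number, style: number, children: SpanT[], markDefs: MarkT[]];

const utf8Encoder = new TextEncoder();
const utf8Decoder = new TextDecoder();

// Every enum value and length prefix must fit in one unsigned byte.
function pushU8(out: number[], n: number): void {
  if ((n & 0xff) !== n) throw new RangeError(`${n} does not fit in a uint8`);
  out.push(n);
}

// Strings: one length byte, then the UTF-8 bytes.
function pushString(out: number[], s: string): void {
  const bytes = utf8Encoder.encode(s);
  pushU8(out, bytes.length);
  bytes.forEach((b) => out.push(b));
}

// Tuples add no bytes of their own; arrays add one length byte.
function encodeBlock([type, style, children, markDefs]: BlockT): Uint8Array {
  const out: number[] = [];
  pushU8(out, type);
  pushU8(out, style);
  pushU8(out, children.length);
  for (const [spanType, marks, text] of children) {
    pushU8(out, spanType);
    pushU8(out, marks.length);
    for (const mark of marks) pushString(out, mark);
    pushString(out, text);
  }
  pushU8(out, markDefs.length);
  for (const [markType, key, data] of markDefs) {
    pushU8(out, markType);
    pushString(out, key);
    pushString(out, data);
  }
  return new Uint8Array(out);
}

// Reads the fields back in exactly the order they were written.
function decodeBlock(buf: Uint8Array): BlockT {
  let i = 0;
  const byte = (): number => buf[i++];
  const str = (): string => {
    const n = byte();
    const s = utf8Decoder.decode(buf.subarray(i, i + n));
    i += n;
    return s;
  };
  const type = byte();
  const style = byte();
  const children: SpanT[] = [];
  for (let n = byte(); n > 0; n--) {
    const spanType = byte();
    const marks: string[] = [];
    for (let m = byte(); m > 0; m--) marks.push(str());
    children.push([spanType, marks, str()]);
  }
  const markDefs: MarkT[] = [];
  for (let n = byte(); n > 0; n--) markDefs.push([byte(), str(), str()]);
  return [type, style, children, markDefs];
}

const numericBlock: BlockT = [0, 0, [[1, ["a-key", "emphasis"], "some text"]], [[2, "a-key", "some data"]]];
const binary = encodeBlock(numericBlock);
console.log(binary.length); // 48 under this layout
```

A single-byte length prefix caps strings and arrays at 255 entries, so a real format would likely need a variable-length integer for longer text.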

Size comparison:

  • Current solution -> ~290 bytes
  • Proposed solution -> 46 bytes

Difference is ~80%.
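
For reference, the JSON sizes of the three textual forms can be measured with a few lines of TypeScript. The helper name `byteLength` is hypothetical. The ~290 bytes quoted above seems to correspond to the indented JSON as displayed; minified, the same object is 185 bytes, so the saving relative to minified JSON is smaller than the headline figure.

```typescript
const objectForm = {
  style: "normal",
  _type: "block",
  children: [{ _type: "span", marks: ["a-key", "emphasis"], text: "some text" }],
  markDefs: [{ _key: "a-key", _type: "markType", extraData: "some data" }],
};
const tupleForm = ["block", "normal", [["span", ["a-key", "emphasis"], "some text"]], [["markType", "a-key", "some data"]]];
const enumForm = [0, 0, [[1, ["a-key", "emphasis"], "some text"]], [[2, "a-key", "some data"]]];

// Size in UTF-8 bytes of the JSON text, optionally indented.
const byteLength = (value: unknown, indent?: number): number =>
  new TextEncoder().encode(JSON.stringify(value, null, indent)).length;

console.log(byteLength(objectForm)); // 185 (minified objects)
console.log(byteLength(tupleForm));  // 97  (tuples with string type names)
console.log(byteLength(enumForm));   // 70  (tuples with numeric enums)
console.log(byteLength(objectForm, 2)); // indented objects, noticeably larger
```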

Additionally, in most cases encoding/decoding binary is much faster than JSON.parse/JSON.stringify, so it would save not only size but also time.

@CanRau

CanRau commented Nov 1, 2023

I think the negative emojis happened because people were lacking context; your last comment makes it much clearer what exactly you're talking about 🤗

Personally, I like the idea of tuples, possibly enums, and maybe even binary. Although I currently have no idea what this would entail, I think it's definitely interesting to think about these things to lower bandwidth and storage, and as a small contribution to making things more eco-friendly 🌱
