Add new builtin: @typeId #19858

Open
ikskuh opened this issue May 4, 2024 · 15 comments

Comments

@ikskuh
Contributor

ikskuh commented May 4, 2024

Add a new builtin called @typeId:

@typeId(comptime T: type) usize

This builtin returns a unique integer for each type passed, and will return the same integer for the same type.

The return value need not be consistent between builds, so a second build might return completely different numbers for the same types.

An alternative variant might return u32 or u64 to have a stable interface between different platforms.

Use cases

  • Type erasure
  • Additional programmatic type safety
  • Variadic in-memory serialization

Prior art:

User-land implementation

The following version is runtime-only, as we can't perform @intFromPtr at compile time:

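const TypeId = enum(usize) { _ }; // opaque ID type (assumed definition; not shown in the original post)

fn typeId(comptime T: type) TypeId {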
    const Tag = struct {
        var name: u8 = @typeName(T)[0]; // must depend on the type somehow!
        inline fn id() TypeId {
            return @enumFromInt(@intFromPtr(&name));
        }
    };
    return Tag.id();
}
@silversquirl
Contributor

One small request: it would be really nice if this returned u32 or another smaller integer type, instead of usize

@sno2
Contributor

sno2 commented May 4, 2024

It could also take inspiration from the @intFromEnum behavior and return an integer declared as anytype, which may be the smallest integer type possible. This leaves room for shrinking the return type in the future, because I'm not sure how (or whether) this would affect the incremental compilation story.

@SuperAuguste
Sponsor Contributor

SuperAuguste commented May 4, 2024

There is actually an at-comptime solution for typeId that works today but it is absolutely horrible:

fn typeId(comptime T: type) u32 {
    return @intFromError(@field(anyerror, @typeName(T)));
}

Bad status quo solutions have helped back changes such as this one before so just wanted to share. :)

@mlugg
Member

mlugg commented May 4, 2024

@sno2 Using RLS here is a bit tricky. There are two options:

  • The same integer value is returned regardless of the type. In this case, that integer must fit in some minimum size (e.g. u32), and so there is no difference to just returning that size integer!
  • The integer value differs based on the result type. I presume this is what you intend. The issue here is that it's a bit of a footgun; if you accidentally pass the result type as u32 and upcast to u64, or vice versa, you might accidentally get a different ID to a different part of your codebase! This would probably lead to quite tricky bugs.

If this proposal is accepted, I definitely think the returned integer should have a fixed size (probably 32 bits). 32 bits is a sweet spot: 16 is too few to represent the number of types which might exist in a large application, but 64 (or usize) is very excessive (in fact, the canonical compiler implementation won't let you have more than 2^32 distinct types today, and I am beyond certain nobody will ever hit this limitation).

@sno2
Contributor

sno2 commented May 4, 2024

@sno2 Using RLS here is a bit tricky. There are two options:

  • The same integer value is returned regardless of the type. In this case, that integer must fit in some minimum size (e.g. u32), and so there is no difference to just returning that size integer!

I was thinking more of the compiler counting how many types we have defined and taking log2 of that count to pick the integer type used for TypeId. However, this now seems like a hard task with many edge cases depending on when you call @typeId, so I think using u32, as you said, is the best option.

Also, possibly useless side note but Rust's type id uses a u128 but I wasn't able to find any reasoning or investigations into shrinking it anywhere.

@rohlem
Contributor

rohlem commented May 5, 2024

32 bits is a sweet spot: 16 is too few to represent the number of types which might exist in a large application, [...]

I can't think of a use case I would ever use this builtin for (it's a bit at odds with my fundamental design philosophy), but for everyone here who seems to have use cases:
Do you expect to use it for all/most types that appear anywhere (including intermediately) in your entire program?
Think of every

  • integer type, signed and unsigned,
  • enum, union
  • pointer with and without mutability, for every alignment
  • arrays with different lengths
  • optional

That multiplies into a really big number.

I would expect many use cases actually only use this builtin when serializing a rather small set of types (and maybe their fields' types, recursively) over particular interfaces.
Therefore it might be less work for the compiler to only assign ids to types that have been passed to @typeId.
Doing that, maybe a u8 would be enough for some use cases, and we can guarantee only code which uses this feature "pays" for it (assuming this isn't something we already get for basically free via the intern pool - which it might be).

(Then again, maybe this is more of an ergonomics feature than a performance-oriented one?
Plus, deduplicating ids in userland is also possible, even if it poses some of the same global-type-list challenges this proposal fundamentally tries to move into the compiler.)
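
One way to picture the "small set of types" case is a closed, caller-supplied list where a type's ID is simply its position. This is only a sketch: `idIn` and `wire_types` are hypothetical names, not part of the proposal, and the list has to be passed to every piece of code that needs the IDs.

```zig
const std = @import("std");

/// Hypothetical helper: the ID of `T` is its index in a caller-supplied list.
/// Only listed types get an ID, so a u8 is enough for up to 256 types.
fn idIn(comptime types: []const type, comptime T: type) u8 {
    return comptime blk: {
        for (types, 0..) |U, i| {
            if (U == T) break :blk i;
        }
        @compileError(@typeName(T) ++ " is not in the registered type list");
    };
}

// Assumed example list; a real program would list its own serializable types.
const wire_types = [_]type{ u32, f64, []const u8 };

test "an ID is the position in the closed list" {
    try std.testing.expectEqual(@as(u8, 0), idIn(&wire_types, u32));
    try std.testing.expectEqual(@as(u8, 2), idIn(&wire_types, []const u8));
}
```

Unlike the proposed builtin, these IDs stay stable across builds as long as the list order does not change, but any type missing from the list is a compile error.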

@likern

likern commented May 5, 2024

Could someone give a solid use case for this? I have nothing in my head, and I've never come across a situation where I need this even remotely.

@pfgithub
Contributor

pfgithub commented May 5, 2024

The one reason I've wanted it in the past is for safety on *anyopaque. Assuming @typeName is guaranteed to be unique, it can be used instead.

const AnyPtr = struct {
    type_id: usize, // alternatively, [*:0]u8
    ptr: *anyopaque,
    pub fn from(comptime T: type, ptr: *T) AnyPtr {
        return .{
            .type_id = @typeId(T), // alternatively @typeName(T).ptr
            .ptr = @ptrCast(@alignCast(ptr)),
        };
    }
    pub fn readAs(item: AnyPtr, comptime T: type) *T {
        if(item.type_id != @typeId(T)) unreachable; // alternatively `item.type_id != @typeName(T).ptr`
        return @ptrCast(@alignCast(item.ptr));
    }
};
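
Since @typeId does not exist yet, here is a rough sketch of the same idea that should compile today, using the runtime-only user-land typeId from the first post in place of the builtin. The `TypeId` definition and the choice to return null on a mismatch (instead of hitting unreachable) are assumptions made for the example.

```zig
const std = @import("std");

/// Opaque runtime identifier (assumed definition).
const TypeId = enum(usize) { _ };

/// Runtime-only: the address of a per-type global serves as the ID.
fn typeId(comptime T: type) TypeId {
    const Tag = struct {
        // Must depend on T so that every type gets its own variable.
        var byte: u8 = @typeName(T)[0];
    };
    return @enumFromInt(@intFromPtr(&Tag.byte));
}

const AnyPtr = struct {
    type_id: TypeId,
    ptr: *anyopaque,

    pub fn from(comptime T: type, ptr: *T) AnyPtr {
        return .{ .type_id = typeId(T), .ptr = ptr };
    }

    /// Returns null when the stored type does not match T.
    pub fn readAs(self: AnyPtr, comptime T: type) ?*T {
        if (self.type_id != typeId(T)) return null;
        const typed: *T = @ptrCast(@alignCast(self.ptr));
        return typed;
    }
};

test "AnyPtr round-trips the right type and rejects the wrong one" {
    var value: u32 = 42;
    const any = AnyPtr.from(u32, &value);
    any.readAs(u32).?.* = 7; // writes through the recovered pointer
    try std.testing.expectEqual(@as(u32, 7), value);
    try std.testing.expect(any.readAs(f32) == null);
}
```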

@ikskuh
Contributor Author

ikskuh commented May 5, 2024

Could someone give a solid use case for this? I have nothing in my head, and I've never come across a situation where I need this even remotely.

Basically, type checking when doing type erasure (see the linked any-pointer project). Also see #19859, where you need to store a user-defined type in a non-generic data structure (think HashMap(TypeId, *anyopaque)); you can implement RTTI with it, ...

@SuperAuguste
Sponsor Contributor

As @MasterQ32 indicated in the original issue, his any-pointer is a great example of an exact usecase, but another explicit example of where @typeId would also be useful is in an ECS. The hack I shared above was actually created while attempting to solve an enum identification system with @slimsag where values could be detached and reattached from their respective enum types to identify components and events and store their identities easily.

Of course, all of these issues can be solved with userspace hacks but:

  • The runtime tricks are of course runtime-only, which didn't solve the ECS problem above
  • The comptime tricks are all awful hacks
  • The runtime tricks are also hacky, albeit less awful

To not use any hacks while obtaining unique type identifiers, you can do something like this, but:

  • It would require passing it to all code (perhaps this is acceptable considering Zig's dislike of global state)
  • It requires tying all code that requires shared type ID together, including dependencies/dependents, which is a huge pain if you care about extensibility (and I'm not sure if it's even possible as you can't store this comptime-only type globally and I can't think of a way to effectively pass it around to dependencies/dependents 😅)
  • The performance/memory usage is extremely questionable, especially considering (unless this has changed recently) that the compiler does not garbage collect unused data

In my opinion, any sort of RTTI-ish solution would greatly benefit from this builtin. I imagine Felix sees it the same way, thus why he opened this issue.

About implementation details @rohlem, check out my PR to see how easy it is to implement from the InternPool. In short, the InternPool stores types (and other deduplication-dependent data like default values, memoized calls, etc. though this is not important for this explanation) by inserting them into a std.AutoArrayHashMapUnmanaged(void, void) which produces a single, unique InternPool.Index (a u32-backed enum) which we can then reuse for @typeId. If, understandably, the compiler folks wouldn't like exposing InternPool indices directly, we could simply create a second map, a std.AutoArrayHashMapUnmanaged(InternPool.Index, void), which would also be relatively inexpensive.
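
To illustrate why interning gives a type ID almost for free, here is a toy sketch of the general technique. It is not the compiler's InternPool: it uses strings as stand-ins for types and the managed std.StringArrayHashMap rather than the unmanaged map mentioned above, and `intern` is a hypothetical name. The point is only that the insertion index of an array hash map is already a dense, unique, deduplicated integer.

```zig
const std = @import("std");

/// Toy interner: returns the same index every time an equal key is interned.
/// Keys must outlive the pool (string literals do).
fn intern(pool: *std.StringArrayHashMap(void), key: []const u8) !u32 {
    const gop = try pool.getOrPut(key);
    // For a new key this is the next free slot; for a known key, its old slot.
    const index: u32 = @intCast(gop.index);
    return index;
}

test "equal keys intern to the same index" {
    var pool = std.StringArrayHashMap(void).init(std.testing.allocator);
    defer pool.deinit();

    const a = try intern(&pool, "u32");
    const b = try intern(&pool, "[]const u8");
    try std.testing.expectEqual(a, try intern(&pool, "u32"));
    try std.testing.expect(a != b);
}
```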

@yzrmn

yzrmn commented May 6, 2024

I think #5459 is directly related (and solved by #19861).

@greytdepression

greytdepression commented May 6, 2024

@rohlem
Do you expect to use it for all/most types that appear anywhere (including intermediately) in your entire program? Think of every

  • integer type, signed and unsigned,
  • enum, union
  • pointer with and without mutability, for every alignment
  • arrays with different lengths
  • optional

This made me wonder about how the technical implementation would solve something like this (I'm pretending like @typeId just returns a usize for convenience here. You could insert any necessary @intFromEnum or whatever to make it work)

fn SelfReferentialStruct(comptime T: type) type {
    return struct {
        const Self = @This();
        const array: [@typeId(Self)] u32 = undefined;
    };
}

edit: Never mind. I just checked, and the compiler already has a check for similar transitive failures. :)

@likern

likern commented May 6, 2024

Am I correct that the idea is basically to split a pointer to T into a void pointer and a separately stored type, and to identify the type by its unique identifier?

If that's correct, that's a very interesting feature. But I would like to extend it even further: if the type is stored separately, we can save this information to disk and restore it later, but only if we have a stable guarantee not only within one build.

So I'd like this proposal to take that feature into account too.
Where might this be useful? I think in databases, where information about types is stored on disk, as in PostgreSQL, where the Oid type uniquely identifies almost any object in the database: types, attributes, tables, etc.

@SuperAuguste
Sponsor Contributor

SuperAuguste commented May 14, 2024

After sharing my first terrible comptime typeId hack, I'm back for more:

const std = @import("std");

pub fn typeId(comptime T: type) u32 {
    _ = T;
    const fn_name = @src().fn_name;
    return std.fmt.parseInt(u32, fn_name[std.mem.lastIndexOfScalar(u8, fn_name, '_').? + 1 ..], 10) catch unreachable;
}

This one even exposes the InternPool.Index of the memoized call - enjoy! :^) (this one is runtime only though :()

@silversquirl
Contributor

stable guarantee not only within one build

This is not practical or even really possible. You can already assign explicit IDs to types manually, through a variety of methods, which is a much better option for serialization usecases.
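
As one example of such a method, each serializable type can declare its own ID, which then survives rebuilds because it lives in the source. This is only a sketch of one possible convention; the `wire_id` declaration name, the `wireId` helper, and the example types are all made up for illustration.

```zig
const std = @import("std");

// Hypothetical convention: every serializable type declares a stable `wire_id`.
const Position = struct {
    x: f32,
    y: f32,
    pub const wire_id: u32 = 1;
};

const Health = struct {
    points: u16,
    pub const wire_id: u32 = 2;
};

/// Returns the manually assigned ID, or fails to compile if T has none.
fn wireId(comptime T: type) u32 {
    if (!@hasDecl(T, "wire_id")) {
        @compileError(@typeName(T) ++ " does not declare a wire_id");
    }
    return T.wire_id;
}

test "explicit IDs are fixed in the source" {
    try std.testing.expectEqual(@as(u32, 1), wireId(Position));
    try std.testing.expectEqual(@as(u32, 2), wireId(Health));
}
```

Uniqueness is the author's responsibility here; nothing in this sketch stops two types from declaring the same number.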


10 participants