Add new builtin: @typeId #19858

ikskuh · 2024-05-04T11:43:32Z

Add a new builtin called @typeId:

@typeId(comptime T: type) usize

This builtin returns a unique integer for each type passed, and will return the same integer for the same type.

The return value need not be consistent between builds, so a second build might return completely different numbers
for the same types.

An alternative variant might return u32 or u64 to have a stable interface between different platforms.

Use cases

Type erasure
Additional programmatic type safety
Variadic in-memory serialization

Prior art:

any-pointer

User-land implementation

The following version is runtime only, as we can't perform @intFromPtr at compile time:

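// Assumed definition: the snippet leaves TypeId undeclared; a non-exhaustive enum is one option.
const TypeId = enum(usize) { _ };

fn typeId(comptime T: type) TypeId {
    const Tag = struct {
        var name: u8 = @typeName(T)[0]; // must depend on the type somehow!
        inline fn id() TypeId {
            // The address of the per-type global serves as the id.
            return @enumFromInt(@intFromPtr(&name));
        }
    };
    return Tag.id();
}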


silversquirl · 2024-05-04T17:29:38Z

One small request: it would be really nice if this returned u32 or another smaller integer type, instead of usize

sno2 · 2024-05-04T18:25:05Z

It could also take inspiration from the @intFromEnum behavior and return an integer of an inferred type (as with anytype), which could be the smallest integer type possible. This leaves room for shrinking the return type in the future, although I'm not sure how/if this would affect the incremental compilation story.

SuperAuguste · 2024-05-04T18:26:30Z

There is actually an at-comptime solution for typeId that works today but it is absolutely horrible:

fn typeId(comptime T: type) u32 {
    return @intFromError(@field(anyerror, @typeName(T)));
}
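A minimal sketch of how the hack above behaves, assuming a Zig 0.12-era toolchain. The comptime block only checks that the same type maps to the same id and that two different types map to different ids. Note that, as far as I know, the global error set defaults to a 16-bit integer, so this trick would run out of ids long before a real u32 would.

```zig
const std = @import("std");

// Same hack as above: each distinct type name becomes a distinct error value.
fn typeId(comptime T: type) u32 {
    return @intFromError(@field(anyerror, @typeName(T)));
}

// Evaluated at compile time, which is the point of this variant.
comptime {
    std.debug.assert(typeId(u8) == typeId(u8));
    std.debug.assert(typeId(u8) != typeId(i8));
}

pub fn main() void {
    // The concrete number is an implementation detail and may change between builds.
    std.debug.print("id of u8: {d}\n", .{typeId(u8)});
}
```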

Bad status quo solutions have helped back changes such as this one before so just wanted to share. :)

mlugg · 2024-05-04T19:49:08Z

@sno2 Using RLS here is a bit tricky. There are two options:

The same integer value is returned regardless of the type. In this case, that integer must fit in some minimum size (e.g. u32), and so there is no difference to just returning that size integer!
The integer value differs based on the result type. I presume this is what you intend. The issue here is that it's a bit of a footgun; if you accidentally pass the result type as u32 and upcast to u64, or vice versa, you might accidentally get a different ID to a different part of your codebase! This would probably lead to quite tricky bugs.

If this proposal is accepted, I definitely think the returned integer should have a fixed size (probably 32 bits). 32 bits is a sweet spot: 16 is too few to represent the number of types which might exist in a large application, but 64 (or usize) is very excessive (in fact, the canonical compiler implementation won't let you have more than 2^32 distinct types today, and I am beyond certain nobody will ever hit this limitation).

sno2 · 2024-05-04T20:13:20Z

> @sno2 Using RLS here is a bit tricky. There are two options:
>
> The same integer value is returned regardless of the type. In this case, that integer must fit in some minimum size (e.g. u32), and so there is no difference to just returning that size integer!

I was more thinking of the compiler counting how many types we have defined and log2 that into an integer type as the TypeId integer type. Although, this is now seeming like a hard task with many different edge cases depending on when you call @typeId so I think using u32 as you said is the best option.

Also, ~~possibly useless~~ side note but Rust's type id uses a u128 but I wasn't able to find any reasoning or investigations into shrinking it anywhere.

rohlem · 2024-05-05T10:22:26Z

> 32 bits is a sweet spot: 16 is too few to represent the number of types which might exist in a large application, [...]

I can't think of a use case I would ever use this builtin for (it's a bit at odds with my fundamental design philosophy), but for everyone here who seems to have use cases:
Do you expect to use it for all/most types that appear anywhere (including intermediately) in your entire program?
Think of every

integer type, signed and unsigned,
enum, union
pointer with and without mutability, for every alignment
arrays with different lengths
optional

That multiplies into a really big number.

I would expect many use cases actually only use this builtin when serializing a rather small set of types (and maybe their fields' types, recursively) over particular interfaces.
Therefore it might be less work for the compiler to only assign ids to types that have been passed to @typeId.
Doing that, maybe a u8 would be enough for some use cases, and we can guarantee only code which uses this feature "pays" for it (assuming this isn't something we already get for basically free via the intern pool - which it might be).

(Then again, maybe this is more of an ergonomics feature than a performance-oriented one?
Plus, deduplicating ids in userland is also possible, even if it poses some of the same global-type-list challenges this proposal fundamentally tries to move into the compiler.)
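
The userland route mentioned here (ids only for an explicit list of types) could look roughly like the sketch below, assuming a Zig 0.12-era toolchain. `TypeTable`, `idOf`, and `Id` are hypothetical names, not part of the proposal or of std. The id is simply the type's index in the list, so the id type shrinks to the smallest unsigned integer that fits, and the ids stay stable across builds as long as the list order does.

```zig
const std = @import("std");

// Hypothetical helper: ids exist only for the types listed explicitly.
fn TypeTable(comptime types: []const type) type {
    if (types.len == 0) @compileError("TypeTable needs at least one type");
    return struct {
        // Smallest unsigned integer able to hold every index in the list.
        pub const Id = std.math.IntFittingRange(0, types.len - 1);

        pub fn idOf(comptime T: type) Id {
            inline for (types, 0..) |U, i| {
                if (T == U) return @intCast(i);
            }
            @compileError(@typeName(T) ++ " is not registered in this TypeTable");
        }
    };
}

const Table = TypeTable(&.{ u8, u32, f64 });

pub fn main() void {
    std.debug.print("u32 has id {d}, stored as {s}\n", .{ Table.idOf(u32), @typeName(Table.Id) });
}
```

The trade-off is the one described above: every module that shares ids has to agree on the same list.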

likern · 2024-05-05T12:34:04Z

Could someone give a solid use case for this? I have nothing in my head, and I have never come across a situation where I need this even remotely.

pfgithub · 2024-05-05T17:58:52Z

The one reason I've wanted it in the past is for safety on *anyopaque. Assuming @typeName is guaranteed to be unique, it can be used instead.

const AnyPtr = struct {
    type_id: usize, // alternatively, [*:0]const u8
    ptr: *anyopaque,
    // Constructor: takes no AnyPtr argument, only the type and the pointer to erase.
    pub fn from(comptime T: type, ptr: *T) AnyPtr {
        return .{
            .type_id = @typeId(T), // alternatively @typeName(T).ptr
            .ptr = @ptrCast(@alignCast(ptr)),
        };
    }
    pub fn readAs(item: AnyPtr, comptime T: type) *T {
        if (item.type_id != @typeId(T)) unreachable; // alternatively `item.type_id != @typeName(T).ptr`
        return @ptrCast(@alignCast(item.ptr));
    }
};
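
Since @typeId does not exist yet, the same idea can be tried today with the runtime-only userland trick from the issue description. The following is a sketch under that assumption, for a Zig 0.12-era toolchain. `TypeId`, `typeId`, and `tryReadAs` are illustrative names, and `tryReadAs` returns null on a mismatch instead of hitting unreachable so the check can be observed.

```zig
const std = @import("std");

// Stand-in for the proposed builtin: the address of a per-type global is the id.
const TypeId = enum(usize) { _ };

fn typeId(comptime T: type) TypeId {
    const Tag = struct {
        var byte: u8 = @typeName(T)[0]; // depends on T, so each type gets its own global
    };
    return @enumFromInt(@intFromPtr(&Tag.byte));
}

const AnyPtr = struct {
    type_id: TypeId,
    ptr: *anyopaque,

    pub fn from(comptime T: type, ptr: *T) AnyPtr {
        return .{ .type_id = typeId(T), .ptr = ptr };
    }

    // Checked downcast: null when the stored type differs from T.
    pub fn tryReadAs(self: AnyPtr, comptime T: type) ?*T {
        if (self.type_id != typeId(T)) return null;
        const typed: *T = @ptrCast(@alignCast(self.ptr));
        return typed;
    }
};

pub fn main() void {
    var value: u32 = 42;
    const erased = AnyPtr.from(u32, &value);
    std.debug.assert(erased.tryReadAs(u32).?.* == 42);
    std.debug.assert(erased.tryReadAs(f32) == null);
}
```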

ikskuh · 2024-05-05T17:59:06Z

> Could someone give a solid use case for this? I have nothing in my head, and I have never come across a situation where I need this even remotely.

Basically type checking when doing type erasure (see the linked any-pointer project). Then see #19859, where you need to store a user-defined type in a non-generic data structure (think HashMap(TypeId, *anyopaque)). You can also implement RTTI with it, ...
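
A rough sketch of the HashMap(TypeId, *anyopaque) idea, again using the runtime-only userland trick as a stand-in for the proposed builtin and assuming a Zig 0.12-era std. `Registry`, `put`, and `get` are hypothetical names chosen for illustration; the map holds at most one pointer per type.

```zig
const std = @import("std");

// Stand-in for the proposed builtin (runtime only).
const TypeId = enum(usize) { _ };

fn typeId(comptime T: type) TypeId {
    const Tag = struct {
        var byte: u8 = @typeName(T)[0]; // depends on T, so each type gets its own global
    };
    return @enumFromInt(@intFromPtr(&Tag.byte));
}

// Non-generic container that stores one erased pointer per type.
const Registry = struct {
    map: std.AutoHashMap(TypeId, *anyopaque),

    fn init(allocator: std.mem.Allocator) Registry {
        return .{ .map = std.AutoHashMap(TypeId, *anyopaque).init(allocator) };
    }

    fn deinit(self: *Registry) void {
        self.map.deinit();
    }

    fn put(self: *Registry, comptime T: type, ptr: *T) !void {
        try self.map.put(typeId(T), ptr);
    }

    // The key already proves the stored pointer is a *T, so the cast is safe.
    fn get(self: *const Registry, comptime T: type) ?*T {
        const erased = self.map.get(typeId(T)) orelse return null;
        const typed: *T = @ptrCast(@alignCast(erased));
        return typed;
    }
};

pub fn main() !void {
    var registry = Registry.init(std.heap.page_allocator);
    defer registry.deinit();

    var counter: u32 = 7;
    try registry.put(u32, &counter);
    std.debug.assert(registry.get(u32).?.* == 7);
    std.debug.assert(registry.get(u64) == null);
}
```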

SuperAuguste · 2024-05-05T18:43:44Z

As @MasterQ32 indicated in the original issue, his any-pointer is a great example of an exact usecase, but another explicit example of where @typeId would also be useful is in an ECS. The hack I shared above was actually created while attempting to solve an enum identification system with @slimsag where values could be detached and reattached from their respective enum types to identify components and events and store their identities easily.

Of course, all of these issues can be solved with userspace hacks but:

The runtime tricks are of course runtime-only, which didn't solve the ECS problem above
The comptime tricks are all awful hacks
The runtime tricks are also hacky, albeit less awful

To not use any hacks while obtaining unique type identifiers, you can do something like this, but:

It would require passing it to all code (perhaps this is acceptable considering Zig's dislike of global state)
It requires tying all code that requires shared type ID together, including dependencies/dependents, which is a huge pain if you care about extensibility (and I'm not sure if it's even possible as you can't store this comptime-only type globally and I can't think of a way to effectively pass it around to dependencies/dependents 😅)
The performance/memory usage is extremely questionable, especially considering (unless this has changed recently) that the compiler does not garbage collect unused data

In my opinion, any sort of RTTI-ish solution would greatly benefit from this builtin. I imagine Felix sees it the same way, thus why he opened this issue.

About implementation details @rohlem, check out my PR to see how easy it is to implement from the InternPool. In short, the InternPool stores types (and other deduplication-dependent data like default values, memoized calls, etc. though this is not important for this explanation) by inserting them into a std.AutoArrayHashMapUnmanaged(void, void) which produces a single, unique InternPool.Index (a u32-backed enum) which we can then reuse for @typeId. If, understandably, the compiler folks wouldn't like exposing InternPool indices directly, we could simply create a second map, a std.AutoArrayHashMapUnmanaged(InternPool.Index, void), which would also be relatively inexpensive.

yzrmn · 2024-05-06T12:35:59Z

I think #5459 is directly related (and solved by #19861).

greytdepression · 2024-05-06T15:19:15Z

@rohlem
> Do you expect to use it for all/most types that appear anywhere (including intermediately) in your entire program? Think of every
> * integer type, signed and unsigned,
> * enum, union
> * pointer with and without mutability, for every alignment
> * arrays with different lengths
> * optional

This made me wonder about how the technical implementation would solve something like this (I'm pretending like @typeId just returns a usize for convenience here. You could insert any necessary @intFromEnum or whatever to make it work)

fn SelfReferentialStruct(comptime T: type) type {
    return struct {
        const Self = @This();
        const array: [@typeId(Self)] u32 = undefined;
    };
}

edit: Never mind. I just checked and there already is a check for similar transitive failures in the compiler. :)

likern · 2024-05-06T17:50:02Z

Am I correct that the idea is basically to split a pointer to T into a void pointer and a type, stored separately, and to identify the type by its unique identifier?

If that's correct, that's a very interesting feature. But I would like to extend it even further. If the type is stored separately, we can save this information to disk and restore it later. But only if we have a stable guarantee not only within one build.

So I'd like this feature to be taken into account in this proposal too.
Where might this be useful? I think in databases, where information about types is stored on disk, like in PostgreSQL, which uses the Oid type to uniquely identify almost any object in the database: types, attributes, tables, etc.

SuperAuguste · 2024-05-14T08:27:35Z

After sharing my first terrible comptime typeId hack, I'm back for more:

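const std = @import("std");

pub fn typeId(comptime T: type) u32 {
    _ = T;
    const fn_name = @src().fn_name;
    // Parse the numeric suffix that follows the last '_' in the generated function name.
    return std.fmt.parseInt(u32, fn_name[std.mem.lastIndexOfScalar(u8, fn_name, '_').? + 1 ..], 10) catch unreachable;
}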

This one even exposes the InternPool.Index of the memoized call - enjoy! :^) (this one is runtime only though :()

silversquirl · 2024-05-14T08:35:14Z

> stable guarantee not only within one build

This is not practical or even really possible. You can already assign explicit IDs to types manually, through a variety of methods, which is a much better option for serialization usecases.

ikskuh mentioned this issue May 4, 2024

Build system: Add support for custom exports #19859

Open

SuperAuguste mentioned this issue May 4, 2024

Implement @typeId #19861

Draft
