Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AssetCheckSummaryRecord #21940

Merged
merged 1 commit into from
May 22, 2024
Merged

AssetCheckSummaryRecord #21940

merged 1 commit into from
May 22, 2024

Conversation

dpeng817
Copy link
Contributor

@dpeng817 dpeng817 commented May 17, 2024

Adds an "AssetCheckState" API; which functions similarly to an AssetRecord. The core conceit is to be able to query just based on asset check key a set of information about the "state" of the asset check (most recent run, for now), without having any references to the event log.

For now, it's actually powered by the event log, however.

How I tested this

  • Added a new semi-end-to-end test to our event log storage tests. This required me setting up an instance hooked up to an event log storage; which I think is nice to have anyway.
  • There's one failing test here that's confusing me; test_get_event_records_sqlite. I changed it to use the instance passed in by fixture, which should in theory not change the results? But it seems like it has. Probably missing something here.

There will be a follow up cloud pr.

Copy link
Contributor Author

dpeng817 commented May 17, 2024

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @dpeng817 and the rest of your teammates on Graphite Graphite

@dpeng817 dpeng817 requested review from prha and sryza and removed request for prha May 17, 2024 22:06
@dpeng817 dpeng817 changed the title Asset state instance API AssetCheckState May 17, 2024
@dpeng817 dpeng817 requested a review from johannkm May 20, 2024 21:38
prha
prha previously requested changes May 20, 2024
Copy link
Member

@prha prha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more cosmetic nits... I think the storage fixtures are confusing.

Bigger issue is to get a better handle on the API signature. I think @schrockn voiced some reservations about the "state" terminology. We can do the conservative thing and maybe do get_latest_asset_check_info or something?

python_modules/dagster/dagster/_core/instance/__init__.py Outdated Show resolved Hide resolved
@@ -164,6 +164,7 @@
from dagster._core.storage.daemon_cursor import DaemonCursorStorage
from dagster._core.storage.event_log import EventLogStorage
from dagster._core.storage.event_log.base import (
AssetCheckState,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will defer to @schrockn on AssetCheckState vs LatestAssetCheckInfo.

@@ -452,6 +454,13 @@ def event_log_storage(self, request):
def instance(self, request) -> Optional[DagsterInstance]:
return None

@pytest.fixture
def end_instance(self, instance):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is confusing to me... Why are they different?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the internal PR; but the essential thing is that in the graphql event log storage tests, the "instance" fixture is expected to be the host instance, rather than the user code instance. So we need a separate instance for the "user code instance" for the cases where they are different.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the reason this is confusing is that we are presumably testing the event log storage implementation.

But we don't make use of the fixtured storage at all in the test. We're effectively testing the implementation of DagsterInstance rather than any particular implementation of the EventLogStorage.

Instead, I would expect to see the following:

def test_get_asset_check_state(self, storage: EventLogStorage):
    ...
    events = _synthesize_events(...)
    for event in events:
        storage.store_event(event)
    latest_check_info = storage.get_latest_check_info(...)
    # assert things about the latest check info here

This way, we don't have to touch the instance at all, don't have to be aware that there is a storage class on the instance or anything.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nice thing about this is that we're really just testing the relationship between store_event and get_asset_check_state / get_latest_check_info / whatever we decide to call this.

@schrockn
Copy link
Member

Idea: We could call these "summary records". It is a rollup of information about a particular key at a given point in time.

One use case I am imagining is generating a log of these over time, so you could understand why the system made decisions when it did.

"State record" could also work. But in general I like the notion of also thinking of these as a varietal of record.

@dpeng817 dpeng817 changed the title AssetCheckState AssetCheckSummaryRecord May 21, 2024
Copy link
Member

@prha prha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay if the internal/oss test suites pass.

@dpeng817 dpeng817 merged commit 50a8675 into master May 22, 2024
1 check failed
@dpeng817 dpeng817 deleted the dpeng817/asset_state_api branch May 22, 2024 15:04
OwenKephart pushed a commit that referenced this pull request May 22, 2024
Adds an "AssetCheckState" API; which functions similarly to an
AssetRecord. The core conceit is to be able to query just based on asset
check key a set of information about the "state" of the asset check
(most recent run, for now), without having any references to the event
log.

For now, it's actually powered by the event log, however.

How I tested this
- Added a new semi-end-to-end test to our event log storage tests. This
required me setting up an instance hooked up to an event log storage;
which I think is nice to have anyway.
- There's one failing test here that's confusing me;
test_get_event_records_sqlite. I changed it to use the instance passed
in by fixture, which should in theory not change the results? But it
seems like it has. Probably missing something here.

There will be a follow up cloud pr.
nikomancy pushed a commit that referenced this pull request May 22, 2024
Adds an "AssetCheckState" API; which functions similarly to an
AssetRecord. The core conceit is to be able to query just based on asset
check key a set of information about the "state" of the asset check
(most recent run, for now), without having any references to the event
log.

For now, it's actually powered by the event log, however.

How I tested this
- Added a new semi-end-to-end test to our event log storage tests. This
required me setting up an instance hooked up to an event log storage;
which I think is nice to have anyway.
- There's one failing test here that's confusing me;
test_get_event_records_sqlite. I changed it to use the instance passed
in by fixture, which should in theory not change the results? But it
seems like it has. Probably missing something here.

There will be a follow up cloud pr.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants