feat(datasets): datasets feature #3167

anticorrelator · 2024-05-10T17:29:20Z

feature branch for #2017

mikeldking

Blocking as a feature branch

review-notebook-app · 2024-05-14T01:29:02Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

* parent 1238c1a
  author Mikyo King <mikyo@arize.com> 1715451959 -0600
  committer Mikyo King <mikyo@arize.com> 1715691349 -0600
  refactor(datasets): rename legacy datasets to inferences migrate python schema refactor the UI refactor tests more refactor more refactor more fix WIP pr comments more python changes more renaming more refactor more refactor fix directory refactor tests final
* gql
* Rename
* more refactor
* package.json
* final refactor
* fix comment
* prettier

* Rename model
* Add backend Dataset schema and `get` route for Datasets by ID
* Refactor app creation and test fixtures for API test harness
* Ruff 🐶
* Add timestamps to payload
* Use correct field names
* Ensure timestamps are serializable
* Flesh out test
* Use async client
* Add test for empty datasets
* Add another nontrivial example
* Ruff 🐶
* Resolve type annotation errors
* Update import
* Use default export path
* Create export directories in test env
* Try creating home directories?
* Do not mount UI route in test client
* Explicitly disable UI route on initialization
* Fix gnarly bug (properly skip postgresql tests)
* Push example count into model
* Move serialization into route handler
* Use global ids for REST API
* Refactor server instrumentation
* Break up engine creation and instrumentation for clarity
* Remove engine from session factory wrapper
* Use updated httpx client syntax
* Improve type specificity for db dialect property
* Use explicit kwarg for sqlalchemy join
* Flesh out OpenAPI response schema for GET /datasets/id route
* Implement list datasets with cursor-based pagination
* Check dialect type when creating session factory
* Move API tests to `rest_v1` subdirectory for clarity
* Use hybrid property
* Properly try and return a value from hybrid property
* Use inplace expression
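The "list datasets with cursor-based pagination" item above can be illustrated with a minimal sketch. This is not the route's actual implementation: the `list_page` helper, the descending-id ordering, and the use of a plain integer id as the cursor are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def list_page(
    ids: List[int], cursor: Optional[int], limit: int
) -> Tuple[List[int], Optional[int]]:
    """Hypothetical helper: return one page of ids plus the next cursor.

    Assumes newest-first (descending id) ordering and an inclusive cursor,
    i.e. the cursor is the id of the first row of the next page.
    """
    ordered = sorted(ids, reverse=True)
    if cursor is not None:
        # Resume from the row the cursor points at, skipping newer rows.
        ordered = [i for i in ordered if i <= cursor]
    page = ordered[:limit]
    # Peek one row past the page to decide whether another page exists.
    next_cursor = ordered[limit] if len(ordered) > limit else None
    return page, next_cursor
```

A client would call `list_page(ids, None, limit)` first and then keep passing the returned cursor back until it is `None`. Unlike offset pagination, the cursor stays stable when new rows are inserted ahead of it.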

* add form feat(datasets): create dataset form feat(datasets): create dataset form
* Update app/src/pages/datasets/CreateDatasetForm.tsx
  Co-authored-by: Xander Song <axiomofjoy@gmail.com>
* Update app/src/pages/datasets/CreateDatasetForm.tsx
  Co-authored-by: Xander Song <axiomofjoy@gmail.com>
---------
Co-authored-by: Xander Song <axiomofjoy@gmail.com>

- Adds `RevisionKind` enum type.
- Adds `DatasetExampleRevision` type.
- Adds a revisions field to the `DatasetExample` type.
- Updates behavior of `DatasetExample` node interface to pull data from latest non-delete revision.
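The "latest non-delete revision" behavior described above can be sketched as follows. This is an illustration only: the `Revision` dataclass, the `version_id` ordering key, and the `CREATE`/`PATCH`/`DELETE` member names are assumptions, and the actual resolver presumably does this in a database query rather than in memory.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, List, Optional


class RevisionKind(Enum):
    # Member names are assumed; only the enum's name appears in the PR text.
    CREATE = "CREATE"
    PATCH = "PATCH"
    DELETE = "DELETE"


@dataclass
class Revision:
    # Hypothetical in-memory stand-in for a stored example revision.
    version_id: int
    kind: RevisionKind
    payload: Dict[str, Any]


def latest_non_delete_revision(revisions: List[Revision]) -> Optional[Revision]:
    """Return the newest revision whose kind is not DELETE, or None."""
    candidates = [r for r in revisions if r.kind is not RevisionKind.DELETE]
    return max(candidates, key=lambda r: r.version_id, default=None)
```

Under these assumptions, an example that has been created, patched, and then deleted still resolves to the data in its patch revision.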

* feat(datasets): sort on version
* gql: build
* Update app/schema.graphql
  Co-authored-by: Xander Song <axiomofjoy@gmail.com>
* Update src/phoenix/server/api/input_types/DatasetVersionSort.py
  Co-authored-by: Xander Song <axiomofjoy@gmail.com>
---------
Co-authored-by: Xander Song <axiomofjoy@gmail.com>

* feat(experiments): gql resolver for experiments
* Update src/phoenix/server/api/queries.py
  Co-authored-by: Xander Song <axiomofjoy@gmail.com>
* Update src/phoenix/server/api/types/Dataset.py
  Co-authored-by: Xander Song <axiomofjoy@gmail.com>
* rerun gql
---------
Co-authored-by: Xander Song <axiomofjoy@gmail.com>

- Adds `span` resolver on `DatasetExample` type returning optional `Span`.
- Makes `span_rowid` a nullable foreign key with an index on the `dataset_examples` table.
- Defines a `sqlalchemy` relationship between dataset examples and the span the example came from.
- Adds corresponding tests.
- Optimizes a few pre-existing queries using `load_only`.

* Create datasets feature branch
* Spike out experiment routes
* Add status column to datasets migration
* Spike out experiment runner
* Ruff 🐶
* Add object payloads to experiment responses
* Spike out basic eval flow
* Migration and ORM fixes
* Ensure we default to the latest dataset version
* Always pull examples associated with correct version
* Flush, don't commit
* Clean up experiment creation route and remove experiment "sealing"
* Implement GET experiment route
* Flesh out experiment runs API
* Remove experiments module spike
* Flesh out evaluation route
* Update migration to use trace id instead of trace rowid
* Clean up API and add basic workflow test
* Fix list experiment runs bugs
* Remove unused exceptions
* Don't use `get` for path params
* Remove more "gets" from global id constructions for type clarity
* Remove accidental nullable trace-rowid addition
* Address PR feedback
* Only use get if we have a good fallback

mikeldking changed the title ~~feat: Add datasets~~ feat(datasets): datasets feature May 10, 2024

mikeldking added the `feature branch` label (a feature branch that consolidates multiple features into a single commit on main) May 10, 2024

mikeldking requested changes May 10, 2024

mikeldking force-pushed the datasets branch from 8ebf72b to 444cf35 Compare May 13, 2024 21:22

mikeldking force-pushed the datasets branch from 56bdad8 to a297fda Compare May 21, 2024 18:30

mikeldking force-pushed the datasets branch from 4f16dce to 9505c68 Compare June 1, 2024 15:06

anticorrelator and others added 23 commits June 6, 2024 11:34

Create datasets feature branch

1619dac

ci(datasets): enable ci

6eef6c7

feat: add dataset-related tables (#3169)

e7a8692

feat(datasets): dataset upload endpoint (plus fixtures) (#3183)

acb6cbf

refactor: use strawberry relay types (#3184)

5bd6f0d

feat(datasets): datasets graphql (#3192)

1c202a8

gql build gql build

feat(datasets): datasets page (#3172)

7681cdf

* WIP
* WIP
* fix
* remove console log
* remove console log

feat(datasets): gql dataset create (#3203)

09f05f4

* feat(datasets): gql dataset create
* PR comments
* add defaults

feat(datasets): expose the API playgrounds (#3204)

4ef2809

* feat(datasets): expose the API playgrounds
* move api up

fix(datasets): api spec for upload endpoint (#3213)

c2da353

feat(dataset): gql dataset versions connection (#3222)

b2ef63f

* wip
* feat(datasets): gql dataset versions connection
* gql

feat: add graphql resolver for adding spans to datasets (#3205)

0da7f28

refactor(datasets): break contents of schema into query and mutation sub-modules (#3239)

8086aee

refactor: use input and output types for create datasets mutation (#3242)

a950c6c

feat: gql resolver for dataset examples (#3238)

75fcca2

test: improve test case for add span to dataset (#3243)

b127047

test: pagination (#3245)

72e1fe3

- adds fixture for llama-index-rag
- adds test cases for pagination on Project spans resolvers

feat(datasets): multi-select on span / traces tables (#3236)

b5cd85f

fix(datasets): make tests pass with new client

fac7263

feat(datasets): download dataset as CSV text file (#3250)

fd237a8

RogerHYang and others added 23 commits June 6, 2024 11:34

refactor: make csv download more restful (#3320)

564c179

feat(datasets): delete examples mutation (#3324)

f423394

* feat(datasets): delete examples mutation
* cleanup the mutation to use joins
* abort mutation if delete revision already exists
* gql build

feat(datasets): JSON endpoint to get dataset versions (#3323)

a0979fb

feat(datasets): Delete dataset mutation (#3321)

8ea07ab

* Add delete_dataset gql mutation
* Dataset id is not optional for delete mutation
* Return dataset from delete mutation
* Tweak test naming
* gql build
* gql build

fix(datasets): filter examples by dataset in gql (#3330)

296e841

add support for DatasetExample to node interface (#3327)

5f3ea67

feat: dataset example slideover (#3325)

aa1e4b9

feat(datasets): Delete dataset UI (#3336)

02f237b

ensure that objects, not strings, are sent to addExampleToDataset mutation (#3345)

504b5bf

feat(datasets): synchronously upload dataset examples returning `dataset_id` in JSON (#3347)

a1a279e

feat(datasets): Delete examples (#3352)

d361a92

* feat(datasets): Delete examples
* cleanup

wip (#3362)

fcd3796

final touches fix casing

fix(datasets): json return payload for upload csv endpoint (#3364)

ed9cc55

* fix: json payload
* fix UI

feat: add patchDatasetExamples mutation (#3343)

ae247e9

fix: ensure patches are sorted in numeric patch order (#3379)

000b693

feat(datasets): Display latest version (#3373)

836a2db

* feat(datasets): Display latest version
* implement copy to clipboard
* make slide over smaller
* account for empty datasets
* Put examples in a tab
* cleanup layout

refactor(datasets): implement revision resolver on DatasetExample gql type (#3361)

ac2fb2f

Add OpenAPI spec and update get dataset examples route (#3382)

0d27004

feat: add experiment-related tables and migrations (#3381)

9f57d41

chore: add description column to experiment table (#3386)

c388f13

feat(datasets): UI to edit a dataset example (#3376)

7816c35

* work on progress for edit UI
* cleanup dismiss
* add mutation
* cleanup
* Update ErrorElement.tsx
  Co-authored-by: Xander Song <axiomofjoy@gmail.com>
---------
Co-authored-by: Xander Song <axiomofjoy@gmail.com>

mikeldking force-pushed the datasets branch from 69de464 to 7816c35 Compare June 6, 2024 18:34

mikeldking and others added 6 commits June 6, 2024 22:51

feat: add project resolver on span (#3406)

ea38e2c

feat(datasets): link to view source span (#3413)

a57d345

* feat(datasets): link to view source span
* add selectedSpanId

chore: export openapi schema (#3419)

5135e3d

- Exports OpenAPI schema as a YAML file inside new `schemas` directory.
- Adds a `hatch` command to accomplish this.
- Adds CI to ensure that the exported schema is up to date with the code.
