feat(datasets): datasets feature #3167

Draft: wants to merge 60 commits into main

@anticorrelator (Contributor) commented May 10, 2024

feature branch for #2017

@mikeldking changed the title from "feat: Add datasets" to "feat(datasets): datasets feature" May 10, 2024
@mikeldking added the "feature branch" label (a feature branch that consolidates multiple features into a single commit on main) May 10, 2024
@mikeldking (Contributor) left a comment

Blocking as a feature branch


anticorrelator and others added 23 commits June 6, 2024 11:34
* parent 1238c1a
author Mikyo King <mikyo@arize.com> 1715451959 -0600
committer Mikyo King <mikyo@arize.com> 1715691349 -0600

refactor(datasets): rename legacy datasets to inferences

migrate python

schema

refactor the UI

refactor tests

more refactor

more refactor

more fix

WIP

pr comments

more python changes

more renaming

more refactor

more refactor

fix directory

refactor tests

final

* gql

* Rename

* more refactor

* package.json

* final refactor

* fix comment

* prettier
* WIP

* WIP

* fix

* remove console log

* remove console log
* feat(datasets): gql dataset create

* PR comments

* add defaults
* feat(datasets): expose the API playgrounds

* move api up
* Rename model

* Add backend Dataset schema and `get` route for Datasets by ID

* Refactor app creation and test fixtures for API test harness

* Ruff 🐶

* Add timestamps to payload

* Use correct field names

* Ensure timestamps are serializable

* Flesh out test

* Use async client

* Add test for empty datasets

* Add another nontrivial example

* Ruff 🐶

* Resolve type annotation errors

* Update import

* Use default export path

* Create export directories in test env

* Try creating home directories?

* Do not mount UI route in test client

* Explicitly disable UI route on initialization

* Fix gnarly bug (properly skip PostgreSQL tests)

* Push example count into model

* Move serialization into route handler

* Use global ids for REST API

* Refactor server instrumentation

* Break up engine creation and instrumentation for clarity

* Remove engine from session factory wrapper

* Use updated httpx client syntax

* Improve type specificity for db dialect property

* Use explicit kwarg for sqlalchemy join

* Flesh out OpenAPI response schema for GET /datasets/id route

* Implement list datasets with cursor-based pagination

* Check dialect type when creating session factory

* Move API tests to `rest_v1` subdirectory for clarity

* Use hybrid property

* Properly try and return a value from hybrid property

* Use inplace expression
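The commits above mention global IDs for the REST API and cursor-based pagination for listing datasets, but this page does not show the implementation. A minimal sketch, assuming Relay-style global IDs (`base64("Dataset:<rowid>")`), newest-first ordering by row ID, and a cursor that points at the first row of the next page; none of these details are confirmed by the PR, and the in-memory list stands in for a database query:

```python
import base64
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Dataset:
    id: int  # integer row ID
    name: str


def to_global_id(type_name: str, node_id: int) -> str:
    # Relay-style global ID: base64("<TypeName>:<id>") (assumed encoding)
    return base64.b64encode(f"{type_name}:{node_id}".encode()).decode()


def from_global_id(global_id: str) -> Tuple[str, int]:
    type_name, _, node_id = base64.b64decode(global_id).decode().partition(":")
    return type_name, int(node_id)


def list_datasets(
    rows: List[Dataset], limit: int, cursor: Optional[str] = None
) -> Tuple[List[Dataset], Optional[str]]:
    """Return one page of datasets in descending ID order, plus the next cursor."""
    ordered = sorted(rows, key=lambda d: d.id, reverse=True)
    if cursor is not None:
        type_name, start_id = from_global_id(cursor)
        if type_name != "Dataset":
            raise ValueError(f"cursor does not refer to a Dataset: {type_name}")
        # Keyset condition: resume at the cursor row (inclusive)
        ordered = [d for d in ordered if d.id <= start_id]
    # Fetch one extra row to learn whether another page exists
    page = ordered[: limit + 1]
    next_cursor = to_global_id("Dataset", page[limit].id) if len(page) > limit else None
    return page[:limit], next_cursor
```

Against a database, the same keyset condition becomes a `WHERE id <= :cursor_id ORDER BY id DESC LIMIT :limit + 1` query, which avoids scanning skipped rows the way a deep `OFFSET` does.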
* add form

feat(datasets): create dataset form

* Update app/src/pages/datasets/CreateDatasetForm.tsx

Co-authored-by: Xander Song <axiomofjoy@gmail.com>

* Update app/src/pages/datasets/CreateDatasetForm.tsx

Co-authored-by: Xander Song <axiomofjoy@gmail.com>

---------

Co-authored-by: Xander Song <axiomofjoy@gmail.com>
* wip

* feat(datasets): gql dataset versions connection

* gql
- adds fixture for llama-index-rag
- adds test cases for pagination on Project spans resolvers
RogerHYang and others added 23 commits June 6, 2024 11:34
* feat(datasets): delete examples mutation

* cleanup the mutation to use joins

* abort mutation if delete revision already exists

* gql build
* Add delete_dataset gql mutation

* Dataset id is not optional for delete mutation

* Return dataset from delete mutation

* Tweak test naming

* gql build

* gql build
- Adds `RevisionKind` enum type.
- Adds `DatasetExampleRevision` type.
- Adds a revisions field to the `DatasetExample` type.
- Updates behavior of `DatasetExample` node interface to pull data from latest non-delete revision.
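The revision behavior described above, together with the earlier rule that the delete-examples mutation aborts if a delete revision already exists, can be sketched in plain Python. The `RevisionKind` members other than a delete kind, the field names, and both helper functions are hypothetical; the real types live in Phoenix's GraphQL schema and ORM, which this page does not show:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List, Optional


class RevisionKind(Enum):
    # Hypothetical members; the PR only implies that a delete kind exists.
    CREATE = "CREATE"
    PATCH = "PATCH"
    DELETE = "DELETE"


@dataclass
class DatasetExampleRevision:
    version_id: int  # dataset version that introduced this revision
    kind: RevisionKind
    input: Dict[str, Any] = field(default_factory=dict)


def latest_non_delete_revision(
    revisions: List[DatasetExampleRevision],
) -> Optional[DatasetExampleRevision]:
    """Return the revision with the highest version ID whose kind is not DELETE."""
    candidates = [r for r in revisions if r.kind is not RevisionKind.DELETE]
    return max(candidates, key=lambda r: r.version_id, default=None)


def delete_example(
    revisions: List[DatasetExampleRevision], new_version_id: int
) -> List[DatasetExampleRevision]:
    """Append a DELETE revision, aborting if the example was already deleted."""
    if any(r.kind is RevisionKind.DELETE for r in revisions):
        raise ValueError("a delete revision already exists for this example")
    return revisions + [DatasetExampleRevision(new_version_id, RevisionKind.DELETE)]
```

Deleting by appending a revision rather than removing rows keeps older dataset versions reproducible.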
* feat(datasets): Delete examples

* cleanup
final touches

fix casing
* feat(datasets): sort on version

* gql: build

* Update app/schema.graphql

Co-authored-by: Xander Song <axiomofjoy@gmail.com>

* Update src/phoenix/server/api/input_types/DatasetVersionSort.py

Co-authored-by: Xander Song <axiomofjoy@gmail.com>

---------

Co-authored-by: Xander Song <axiomofjoy@gmail.com>
* feat(datasets): Display latest version

* implement copy to clipboard

* make slide over smaller

* account for empty datasets

* Put examples in a tab

* cleanup layout
* work in progress for edit UI

* cleanup dismiss

* add mutation

* cleanup

* Update ErrorElement.tsx

Co-authored-by: Xander Song <axiomofjoy@gmail.com>

---------

Co-authored-by: Xander Song <axiomofjoy@gmail.com>
mikeldking and others added 6 commits June 6, 2024 22:51
* feat(experiments): gql resolver for experiments

* Update src/phoenix/server/api/queries.py

Co-authored-by: Xander Song <axiomofjoy@gmail.com>

* Update src/phoenix/server/api/types/Dataset.py

Co-authored-by: Xander Song <axiomofjoy@gmail.com>

* rerun gql

---------

Co-authored-by: Xander Song <axiomofjoy@gmail.com>
- Adds `span` resolver on `DatasetExample` type returning optional `Span`.
- Makes `span_rowid` a nullable foreign key with an index on the `dataset_examples` table.
- Defines a `sqlalchemy` relationship between dataset examples and the span the example came from.
- Adds corresponding tests.
- Optimizes a few pre-existing queries using `load_only`.
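The schema change above (a nullable, indexed `span_rowid` foreign key on `dataset_examples`, resolved as an optional span) can be illustrated with an in-memory SQLite sketch. The `spans` table layout, the index name, and the `source_span_id` helper are assumptions for illustration; the actual migration and SQLAlchemy relationship are not shown on this page:

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when this is on
conn.executescript(
    """
    CREATE TABLE spans (
        id INTEGER PRIMARY KEY,
        span_id TEXT NOT NULL UNIQUE
    );
    CREATE TABLE dataset_examples (
        id INTEGER PRIMARY KEY,
        -- Nullable foreign key: an example need not come from a span
        span_rowid INTEGER REFERENCES spans (id)
    );
    -- Index so lookups by source span stay cheap
    CREATE INDEX ix_dataset_examples_span_rowid ON dataset_examples (span_rowid);
    """
)


def source_span_id(conn: sqlite3.Connection, example_id: int) -> Optional[str]:
    """Resolve an example's optional source span; None if it has no span."""
    row = conn.execute(
        """
        SELECT spans.span_id
        FROM dataset_examples
        LEFT JOIN spans ON spans.id = dataset_examples.span_rowid
        WHERE dataset_examples.id = ?
        """,
        (example_id,),
    ).fetchone()
    return row[0] if row is not None else None
```

The `LEFT JOIN` mirrors the optional `span` resolver: examples without a source span still resolve, just to `None`.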
* feat(datasets): link to view source span

* add selectedSpanId
* Create datasets feature branch

* Spike out experiment routes

* Add status column to datasets migration

* Spike out experiment runner

* Ruff 🐶

* Add object payloads to experiment responses

* Spike out basic eval flow

* Migration and ORM fixes

* Ensure we default to the latest dataset version

* Always pull examples associated with correct version

* Flush, don't commit

* Clean up experiment creation route and remove experiment "sealing"

* Implement GET experiment route

* Flesh out experiment runs API

* Remove experiments module spike

* Flesh out evaluation route

* Update migration to use trace id instead of trace rowid

* Clean up API and add basic workflow test

* Fix list experiment runs bugs

* Remove unused exceptions

* Don't use `get` for path params

* Remove more "gets" from global id constructions for type clarity

* Remove accidental nullable trace-rowid addition

* Address PR feedback

* Only use get if we have a good fallback
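The experiment commits say experiments default to the latest dataset version and always pull the examples associated with the correct version. One way to express that, assuming an example's state at a version is its most recent revision at or before that version and that a delete revision hides it (an assumption, not confirmed by the PR), is the following sketch; the function names and the tuple layout are hypothetical:

```python
from typing import Dict, List, Optional, Tuple

# (example_id, version_id, kind) triples; kind strings are hypothetical
Revision = Tuple[int, int, str]


def resolve_version_id(version_ids: List[int], requested: Optional[int] = None) -> int:
    """Use the requested dataset version, defaulting to the latest one."""
    if not version_ids:
        raise ValueError("dataset has no versions")
    if requested is None:
        return max(version_ids)
    if requested not in version_ids:
        raise ValueError(f"unknown dataset version: {requested}")
    return requested


def example_ids_at_version(revisions: List[Revision], version_id: int) -> List[int]:
    """Examples visible at a version: latest revision at or before it, unless it is a DELETE."""
    latest: Dict[int, Tuple[int, str]] = {}
    for example_id, rev_version, kind in revisions:
        if rev_version > version_id:
            continue  # revision belongs to a later version
        if example_id not in latest or rev_version > latest[example_id][0]:
            latest[example_id] = (rev_version, kind)
    return sorted(eid for eid, (_, kind) in latest.items() if kind != "DELETE")
```

Pinning an experiment to a resolved version ID, rather than to "whatever is current", keeps its runs comparable even after the dataset is edited.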
- Exports OpenAPI schema as a YAML file inside new `schemas` directory.
- Adds a `hatch` command to accomplish this.
- Adds CI to ensure that the exported schema is up to date with the code.
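The last commit exports the OpenAPI schema to a YAML file under `schemas` and adds CI to catch a stale export. The `hatch` command and the CI job are not shown here; the sketch below only illustrates the comparison step, assuming the freshly generated YAML is already available as a string. The function names are hypothetical:

```python
import difflib
import sys
from pathlib import Path


def schema_diff(generated: str, committed_path: Path) -> str:
    """Unified diff between the committed schema and a fresh export ('' if identical)."""
    committed = committed_path.read_text() if committed_path.exists() else ""
    diff = difflib.unified_diff(
        committed.splitlines(keepends=True),
        generated.splitlines(keepends=True),
        fromfile=str(committed_path),
        tofile="generated",
    )
    return "".join(diff)


def check_schema(generated: str, committed_path: Path) -> int:
    """CI-style check: return 0 if the committed schema is current, 1 otherwise."""
    diff = schema_diff(generated, committed_path)
    if diff:
        sys.stderr.write(diff)
        sys.stderr.write("OpenAPI schema is out of date; re-export it and commit the result.\n")
        return 1
    return 0
```

A CI step could then run `sys.exit(check_schema(generated_yaml, path))`, where `path` points at the committed file in `schemas` (its exact name is not given here) and `generated_yaml` comes from the repository's export command.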
Labels: feature branch (a feature branch that consolidates multiple features into a single commit on main)
Projects: Status: 🔍 Needs Review
Development: no linked issues yet
Participants: 4