Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify VNet setup #41653

Closed
wants to merge 20 commits into from
Closed

Conversation

ravicious
Copy link
Member

After a conversation with Alan, I decided to try out a couple of improvements which could simplify what I've done in #41413.

In #41413, vnet.Setup and vnet.Run serve mostly as a big bags of stray functions that are needed to start VNet that I wanted to share between tsh and Connect. The main problem the split into Setup and Run was solving is that tsh can set up VNet and then start it while blocking the main thread, whereas Connect cannot do this. The RPC in Connect that sets up VNet needs to return after a successful setup and let the VNet manager and the admin subcommand run in the background.

This PR:

  • Explicitly throws blocking calls into errgroup to get rid of all the shenanigans I was doing with contexts in Improve handling of various lifecycle erorrs in VNet #41413.
  • Removes aggregation of errors from running the admin subcommand and VNet manager. If one fails we cancel the other and forward only the first error. This gets rid of a bunch of different shenanigans with cancel.

The setup code ends up being a little bit more elaborate, but I think it's easier to follow and doesn't raise as much red flags as the one I introduced in #41413 does.

@ravicious ravicious requested a review from nklaassen May 16, 2024 16:27
lib/teleterm/vnet/service.go Outdated Show resolved Hide resolved
lib/teleterm/vnet/service.go Show resolved Hide resolved
lib/vnet/setup.go Outdated Show resolved Hide resolved
lib/vnet/setup.go Outdated Show resolved Hide resolved
lib/vnet/setup.go Outdated Show resolved Hide resolved
lib/vnet/setup.go Outdated Show resolved Hide resolved
lib/vnet/setup.go Outdated Show resolved Hide resolved
lib/vnet/setup.go Show resolved Hide resolved
lib/vnet/vnet.go Outdated Show resolved Hide resolved
tool/tsh/common/vnet_darwin.go Outdated Show resolved Hide resolved
@nklaassen
Copy link
Contributor

i had a bunch of implementation suggestions but overall i do like this much better now

@ravicious
Copy link
Member Author

Didn't add anything new today besides the tests, just rebasing on top of my custom branch off of nklaassen/vnet6 to have an easier time merging.

lib/vnet/setup.go Show resolved Hide resolved
lib/vnet/setup.go Show resolved Hide resolved
@ravicious
Copy link
Member Author

@nklaassen After I rebased it on top of nklaassen/vnet6, something went wrong with terminating the admin subcommand and now it leaves leftovers in /etc/resolver. 😭

I think it's maybe because nklaassen/vnet6 is outdated or something, I swear it used to work when I had it rebased on top of vnet5 in 673bb43.

I'm signing off today, so I'll try to hunt it down tomorrow.

@ravicious
Copy link
Member Author

But generally it'd be nice if I could get this PR through reviews by the end of tomorrow, then I'd have Thursday and Friday to get other dependent PRs through.

@nklaassen
Copy link
Contributor

@nklaassen After I rebased it on top of nklaassen/vnet6, something went wrong with terminating the admin subcommand and now it leaves leftovers in /etc/resolver. 😭

I think it's maybe because nklaassen/vnet6 is outdated or something, I swear it used to work when I had it rebased on top of vnet5 in 673bb43.

I'm signing off today, so I'll try to hunt it down tomorrow.

This is the weirdest thing... I had the osascript shell command redirecting logs to a file for debugging, and when I removed that, the /etc/resolver cleanup broke. But I can't debug it without logs 😭 so I added back the log redirect to /var/log/vnet.log and now it's working again

@ravicious ravicious requested review from nklaassen and zmb3 May 22, 2024 11:03
@ravicious
Copy link
Member Author

@nklaassen @zmb3 Are there any other outstanding issues that I should address? It'd be great if I could split this tomorrow and merge it into #41413 and #41415.

@ravicious ravicious marked this pull request as ready for review May 22, 2024 15:18
@github-actions github-actions bot added size/sm tsh tsh - Teleport's command line tool for logging into nodes running Teleport. labels May 22, 2024

This comment was marked as outdated.

@ravicious ravicious added the no-changelog Indicates that a PR does not require a changelog entry label May 22, 2024
@nklaassen
Copy link
Contributor

@nklaassen @zmb3 Are there any other outstanding issues that I should address? It'd be great if I could split this tomorrow and merge it into #41413 and #41415.

no it seems good to me, a bit hard to review on top of #41413, i'll make sure to review after you merge it in there

ravicious added a commit that referenced this pull request May 23, 2024
Squashed from #41653 (only lib/vnet and tool/tsh).
@ravicious
Copy link
Member Author

Thanks for the reviews. I squashed lib/vnet and tool/tsh code from here into #41413 and lib/teleterm into #41415.

@ravicious ravicious closed this May 23, 2024
ravicious added a commit that referenced this pull request May 31, 2024
Squashed from #41653 (only lib/vnet and tool/tsh).
github-merge-queue bot pushed a commit that referenced this pull request May 31, 2024
* Split vnet.Run into vnet.Setup and vnet.Run

lib/teleterm cannot reuse this function due to different semantics
of contexts within an RPC vs a CLI command. It needs one
function that has request-response semantics and another that blocks
that can be run from a separate goroutine.

* tsh vnet: Select on ctx.Done too when awaiting TUN device

Without this extra select on ctx.Done, interrupting tsh vnet at that
moment would return an error about using a closed socket. That's because
when the context gets canceled, another goroutine closes the socket.

Instead, we should wait for either the cancelation of the context or
socket.AcceptUnix returning a value.

To test this, run `tsh vnet`. When the osascript prompt is shown, go
back to the terminal and Ctrl+C the command. The previous version would
show an error about using closed connection whereas this exits cleanly.

* Do not show context canceled errors to user

The previous version would show "context canceled" after sending SIGINT
to tsh vnet after the user entered the password in the prompt and  the
TUN device was set up.

* Fix handling of admin subcommand exiting prematurely with no err

If the socket file was removed, the admin subcommand stops. The previous
version of VNet would continue running. This version correctly
propagates the fact that the admin subcommand has quit prematurely with
no error and shuts down vnet.Manager as well.

* Correctly report admin subcommand quitting prematurely

Because of the changes from a couple of commits ago related to ignoring
context.Canceled errors, if the admin subcommand was killed or has
crashed, VNet would stop with no error message.

This commit fixes that and propagates the error from osascript.

* Avoid race condition when ctx gets canceled on awaiting TUN device

* Refactor detecting premature exit with no error

* Simplify shared VNet setup code

Squashed from #41653 (only lib/vnet and tool/tsh).

* Rename taskErr to expectedErr

* Remove special treatment of context.Canceled in tsh vnet

* Turn socket closer into critical task
github-actions bot pushed a commit that referenced this pull request May 31, 2024
Squashed from #41653 (only lib/vnet and tool/tsh).
github-merge-queue bot pushed a commit that referenced this pull request Jun 3, 2024
* Split vnet.Run into vnet.Setup and vnet.Run

lib/teleterm cannot reuse this function due to different semantics
of contexts within an RPC vs a CLI command. It needs one
function that has request-response semantics and another that blocks
that can be run from a separate goroutine.

* tsh vnet: Select on ctx.Done too when awaiting TUN device

Without this extra select on ctx.Done, interrupting tsh vnet at that
moment would return an error about using a closed socket. That's because
when the context gets canceled, another goroutine closes the socket.

Instead, we should wait for either the cancelation of the context or
socket.AcceptUnix returning a value.

To test this, run `tsh vnet`. When the osascript prompt is shown, go
back to the terminal and Ctrl+C the command. The previous version would
show an error about using closed connection whereas this exits cleanly.

* Do not show context canceled errors to user

The previous version would show "context canceled" after sending SIGINT
to tsh vnet after the user entered the password in the prompt and  the
TUN device was set up.

* Fix handling of admin subcommand exiting prematurely with no err

If the socket file was removed, the admin subcommand stops. The previous
version of VNet would continue running. This version correctly
propagates the fact that the admin subcommand has quit prematurely with
no error and shuts down vnet.Manager as well.

* Correctly report admin subcommand quitting prematurely

Because of the changes from a couple of commits ago related to ignoring
context.Canceled errors, if the admin subcommand was killed or has
crashed, VNet would stop with no error message.

This commit fixes that and propagates the error from osascript.

* Avoid race condition when ctx gets canceled on awaiting TUN device

* Refactor detecting premature exit with no error

* Simplify shared VNet setup code

Squashed from #41653 (only lib/vnet and tool/tsh).

* Rename taskErr to expectedErr

* Remove special treatment of context.Canceled in tsh vnet

* Turn socket closer into critical task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-changelog Indicates that a PR does not require a changelog entry size/sm tsh tsh - Teleport's command line tool for logging into nodes running Teleport.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants