forking breaks lwt-io if happens after Lwt_main.run #970

Open
ivg opened this issue Oct 28, 2022 · 3 comments
ivg commented Oct 28, 2022

If I use Lwt_unix.fork (or Unix.fork) after any successful invocation of Lwt_main.run (even Lwt_main.run (Lwt.return ())), then the lwt-io system does not work in the child processes of any subsequent fork. The peculiar thing is that this happens only if I do not perform any Lwt-specific work in the first fork [1] (i.e., that fork is entirely Lwt-independent). The version I am using is 5.6.1.

Here is the code to reproduce the problem; put it into run.ml:

open Lwt.Infix
open Lwt.Syntax

let fork_and_wait () =
  match Lwt_unix.fork () with
  | 0 -> Unix.sleep 1; exit 0
  | child ->
      match Unix.waitpid [] child with
      | _,WEXITED 0 -> ()
      | _ -> assert false

let fork_and_talk () =
  let input,output = Lwt_unix.pipe () in
  match Lwt_unix.fork () with
  | 0 ->
      Lwt_unix.close input >>= fun () ->
      let output = Lwt_io.of_fd ~mode:Output output in
      Lwt_io.write_value output "hello!" >>= fun () ->
      Lwt_io.flush output >>= fun () ->
      Lwt_io.close output >|= fun () ->
      exit 0
  | pid ->
      Lwt_unix.close output >>= fun () ->
      let input = Lwt_io.of_fd ~mode:Input input in
      let* hello = Lwt_io.read_value input in
      Lwt_io.close input >>= fun () ->
      Lwt_io.printl ("fork and talk: " ^ hello) >>= fun () ->
      Lwt_unix.waitpid [] pid >|= function
      | _,WEXITED 0 -> ()
      | _ -> assert false

let just_talk () =
  let input,output = Lwt_unix.pipe () in
  let output = Lwt_io.of_fd ~mode:Output output in
  Lwt_io.write_value output "hello!" >>= fun () ->
  Lwt_io.flush output >>= fun () ->
  Lwt_io.close output >>= fun () ->
  let input = Lwt_io.of_fd ~mode:Input input in
  let* hello = Lwt_io.read_value input in
  Lwt_io.close input >>= fun () ->
  Lwt_io.printl ("just talk: " ^ hello)

let () =
  Lwt_main.run (Lwt.return ());  (* all works if this line is removed *)
  fork_and_wait ();              (* or if this one is removed *)
  Lwt_main.run (fork_and_talk ());
  Lwt_main.run (just_talk ());

and the dune file for your convenience,

(executable
 (name run)
 (libraries lwt lwt.unix))

FWIW, I also tried the libev engine with the select, poll, and epoll backends; they all exhibit the same behavior.


[1] In terms of the above example, this means that if I remove fork_and_wait, I can call fork_and_talk as many times as I like. The problem arises only from the peculiar combination of running any Lwt_main.run (even the trivial one, which apparently should not have any observable side effects) and then doing a fork (even with Unix.fork) that doesn't touch anything Lwt-related. Unfortunately, this combination is quite common in large applications.


ChrisVine commented Nov 16, 2022

If you insert Lwt_unix.set_pool_size 0 ; before the first Lwt_main.run it will work. I imagine Lwt starts a worker thread, which will not exist in the child process after the fork. With Lwt < 5, Lwt_unix.(set_default_async_method Async_none) would also have done the trick.
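As a rough illustration of that suggestion, here is a trimmed version of the reproduction with the call added before the first Lwt_main.run. It is a sketch, not a verified fix: it assumes the same dune file as above, and the claim that it avoids the hang comes from this comment rather than from the Lwt documentation. The names oc and ic are introduced here only for readability.

```ocaml
(* run.ml: the reproduction with the suggested workaround applied.
   Build with the dune file above (libraries lwt lwt.unix). *)
open Lwt.Infix

(* An Lwt-independent fork, as in the original report. *)
let fork_and_wait () =
  match Lwt_unix.fork () with
  | 0 -> exit 0
  | child ->
      (match Unix.waitpid [] child with
       | _, Unix.WEXITED 0 -> ()
       | _ -> failwith "child failed")

(* Fork, then send one value from the child to the parent over a pipe. *)
let fork_and_talk () =
  let input, output = Lwt_unix.pipe () in
  match Lwt_unix.fork () with
  | 0 ->
      Lwt_unix.close input >>= fun () ->
      let oc = Lwt_io.of_fd ~mode:Lwt_io.Output output in
      Lwt_io.write_value oc "hello!" >>= fun () ->
      Lwt_io.close oc >|= fun () ->
      exit 0
  | pid ->
      Lwt_unix.close output >>= fun () ->
      let ic = Lwt_io.of_fd ~mode:Lwt_io.Input input in
      Lwt_io.read_value ic >>= fun (hello : string) ->
      Lwt_io.close ic >>= fun () ->
      Lwt_unix.waitpid [] pid >|= fun _ -> hello

let () =
  (* Suggested workaround: keep Lwt from using worker threads at all.
     Must run before the first Lwt_main.run. *)
  Lwt_unix.set_pool_size 0;
  Lwt_main.run (Lwt.return ());
  fork_and_wait ();
  let hello = Lwt_main.run (fork_and_talk ()) in
  print_endline ("fork and talk: " ^ hello)
```

One likely trade-off to keep in mind: with an empty pool, blocking system calls that Lwt would otherwise hand to a worker thread presumably run in the main thread and can stall the event loop while they block.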

More generally I have no idea why Lwt provides Lwt_unix.fork given that (i) Lwt starts worker threads on encountering blocking system calls and (ii) threads and fork don't mix (all you can do in the child process after fork in a multi-threaded program is call async-signal-safe functions and then exec). Note that Unix.execv* is not thread safe and cannot be used in multi-threaded programs either.

ChrisVine commented

Possibly Lwt_unix.fork is not thread safe either, because the documentation says that (as opposed to Unix.fork) "in the child process all pending jobs are canceled", and I doubt that this can be achieved using only async-signal-safe functions.


ivg commented Dec 5, 2022

> If you insert Lwt_unix.set_pool_size 0 ; before the first Lwt_main.run it will work. I imagine Lwt starts a worker thread, which will not exist in the child process after the fork. With Lwt < 5, Lwt_unix.(set_default_async_method Async_none) would also have done the trick.

Yes, our workaround was to use Lwt_unix.(set_default_async_method Async_none) in the child process.

> More generally I have no idea why Lwt provides Lwt_unix.fork given that (i) Lwt starts worker threads on encountering blocking system calls and (ii) threads and fork don't mix (all you can do in the child process after fork in a multi-threaded program is call async-signal-safe functions and then exec). Note that Unix.execv* is not thread safe and cannot be used in multi-threaded programs either.

Yep, in the end we had to switch to the lwt-parallel library instead of using fork directly. This library addresses the issue by creating a process snapshot before Lwt is used. This is probably the only safe way of using Lwt_unix.fork with Lwt. We pushed some new features to lwt-parallel (including explicit snapshots). See ocaml/opam-repository#22611
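The general idea of forking before Lwt is used can be sketched with plain Unix plus Lwt. This is not lwt-parallel's API or implementation; helper_loop, ask_helper, and the upper-casing "service" are hypothetical names made up for the illustration. The helper process is forked before the first Lwt_main.run and uses only blocking channel I/O, so no Lwt worker thread can exist at the time of the fork. Unix._exit is assumed to be available (OCaml 4.12 or later).

```ocaml
(* Hypothetical sketch: fork a helper process BEFORE the first Lwt_main.run,
   then talk to it from Lwt over a pair of pipes. *)
open Lwt.Infix

(* Runs in the child only. Plain blocking I/O, no Lwt.
   Replies to each request line with its upper-cased form. *)
let helper_loop req_r resp_w =
  let ic = Unix.in_channel_of_descr req_r in
  let oc = Unix.out_channel_of_descr resp_w in
  (try
     while true do
       let line = input_line ic in
       output_string oc (String.uppercase_ascii line ^ "\n");
       flush oc
     done
   with End_of_file -> ());
  (* Skip at_exit handlers in the child; output was flushed explicitly. *)
  Unix._exit 0

let ask_helper (request : string) : string =
  let req_r, req_w = Unix.pipe () in
  let resp_r, resp_w = Unix.pipe () in
  match Unix.fork () with
  | 0 ->
      (* Child: keep only the ends it reads requests from and writes to. *)
      Unix.close req_w;
      Unix.close resp_r;
      helper_loop req_r resp_w
  | pid ->
      (* Parent: close the child's ends, and only now start using Lwt. *)
      Unix.close req_r;
      Unix.close resp_w;
      let reply =
        Lwt_main.run begin
          let oc = Lwt_io.of_unix_fd ~mode:Lwt_io.Output req_w in
          let ic = Lwt_io.of_unix_fd ~mode:Lwt_io.Input resp_r in
          Lwt_io.write_line oc request >>= fun () ->
          Lwt_io.flush oc >>= fun () ->
          Lwt_io.read_line ic >>= fun reply ->
          (* Closing the request pipe gives the helper EOF, so it exits. *)
          Lwt_io.close oc >>= fun () ->
          Lwt_io.close ic >|= fun () -> reply
        end
      in
      ignore (Unix.waitpid [] pid);
      reply

let reply = ask_helper "hello"
let () = print_endline reply
```

A real application would fork such helpers once at startup and keep them around, since any fork performed after Lwt_main.run has been called runs into the problem described in this issue.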
