Elixir/Ports and external process wiring

== Context: controlling "rsync" ==
This exploration began when I wrote a simple library to run rsync from an Elixir program<ref>https://hexdocs.pm/rsync/Rsync.html</ref>, to transfer files in a background thread while monitoring progress.  I was hoping to learn how to interface with long-lived external processes, and I ended up learning more than I wished for.
 
Starting rsync and reading from it went very well, mostly thanks to the <code>--info=progress2</code> option which reports progress with a simple columnar format that can be easily parsed:<syntaxhighlight lang="text">
      3,342,336  33%    3.14MB/s    0:00:02
</syntaxhighlight>In case you're here to integrate with rsync, there's also a slightly different <code>--progress</code> option which reports statistics per file, and an option <code>--itemize-changes</code> which can be included to get information about the operations taken on each file, but in my case I care more about the overall transfer progress.
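As a sketch of how such a line could be consumed (the module and function names here are mine, not part of the library's public API), a regular expression handles the columnar format:<syntaxhighlight lang="elixir">
# Hypothetical helper: parse one --info=progress2 line into a map.
# Columns are bytes transferred, percentage, transfer rate, elapsed time.
defmodule ProgressParser do
  @line ~r/^\s*([\d,]+)\s+(\d+)%\s+(\S+)\s+(\d+:\d{2}:\d{2})/

  def parse(line) do
    case Regex.run(@line, line) do
      [_, bytes, pct, rate, elapsed] ->
        {:ok,
         %{
           bytes: bytes |> String.replace(",", "") |> String.to_integer(),
           percent: String.to_integer(pct),
           rate: rate,
           elapsed: elapsed
         }}

      nil ->
        :ignore
    end
  end
end

ProgressParser.parse("      3,342,336  33%    3.14MB/s    0:00:02")
# {:ok, %{bytes: 3342336, percent: 33, rate: "3.14MB/s", elapsed: "0:00:02"}}
</syntaxhighlight>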
 
On the terminal the progress line is updated in-place by restarting the line with the fun [[w:Carriage return|carriage return]] control character <code>0x0d</code> or <code>\r</code>.  The name comes from pushing the physical paper carriage of a typewriter back to the start of the line, and on a terminal it lets the current line be erased and written again!  But over a pipe we see this as a regular byte in the stream, like "<code>-old line-^M-new line-</code>".  Disagreements about carriage return vs. newline have caused eye-rolling since the dawn of personal computing, but we can double-check the rsync source code and see that it formats output using carriage return on every platform: <syntaxhighlight lang="c">
rprintf(FCLIENT, "\r%15s %3d%% %7.2f%s %s%s", ...);
</syntaxhighlight>
 
My library starts rsync using Elixir's low-level <code>Port</code> call, which maps directly to the base Erlang open_port<ref>https://www.erlang.org/doc/apps/erts/erlang.html#open_port/2</ref> implementation:<syntaxhighlight lang="elixir">
Port.open(
   {:spawn_executable, rsync_path},
   [
      # ...
   ]
)
</syntaxhighlight>This is where Erlang/OTP really starts to shine: by opening the port inside of a dedicated gen_server<ref>https://www.erlang.org/doc/apps/stdlib/gen_server.html</ref> we have a separate thread communicating with rsync, which receives an asynchronous message like <code>{:data, text_line}</code> for each progress line.  It's easy to parse the line, update some internal state and optionally send a progress summary to the code calling the library.
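A minimal sketch of that wiring (the module name and internal details are illustrative, not the library's actual implementation) might look like:<syntaxhighlight lang="elixir">
defmodule RsyncRunner do
  use GenServer

  # Start the server; `args` are additional rsync command-line arguments.
  def start_link(args), do: GenServer.start_link(__MODULE__, args)

  @impl true
  def init(args) do
    port =
      Port.open(
        {:spawn_executable, System.find_executable("rsync")},
        [:binary, :exit_status, args: ["--info=progress2" | args]]
      )

    {:ok, %{port: port, progress: nil}}
  end

  # Each chunk of rsync output arrives as an asynchronous message.
  @impl true
  def handle_info({port, {:data, text}}, %{port: port} = state) do
    # Parse `text` and keep the latest progress; parsing omitted here.
    {:noreply, %{state | progress: text}}
  end

  # The :exit_status option delivers a message when rsync terminates.
  def handle_info({port, {:exit_status, status}}, %{port: port} = state) do
    IO.puts("rsync exited with status #{status}")
    {:noreply, state}
  end
end
</syntaxhighlight>The <code>:exit_status</code> option is what lets the gen_server notice a normal rsync exit; without it the port closes silently.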


== Problem: runaway processes ==
This would have been the end of the story, but I'm a very flat-footed and iterative developer, and as I was calling my rsync library from my application under development, I would often kill the program abruptly by crashing or by typing <control>-C in the terminal.  Dozens of times.  What I found is that the rsync transfers would continue to run in the background even after Elixir had completely shut down.


That would have to change—leaving overlapping file transfers running unmonitored is exactly what I wanted to avoid by having Elixir control the process in the first place.  Once the BEAM stopped, there was no way to clearly identify and kill the sketchy rsyncing.
 
In fact, killing lower-level processes when a higher-level supervising process dies is central to the BEAM concept of supervisors,<ref>https://www.erlang.org/doc/system/sup_princ.html</ref> which has earned the virtual machine its reputation for being legendarily robust.  So why would some external processes stop and others not?  There seemed to be no way to send a signal or close the port to stop the process, either.
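For processes inside the BEAM, that guarantee is easy to demonstrate with a toy supervision tree (a throwaway sketch, not code from the library):<syntaxhighlight lang="elixir">
# The Task child sleeps forever, but it is terminated the moment its
# supervisor stops.  External OS processes get no such guarantee.
children = [
  {Task, fn -> Process.sleep(:infinity) end}
]

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)
[{_, child, _, _}] = Supervisor.which_children(sup)

Supervisor.stop(sup)
Process.alive?(child)
# => false: the child died with its supervisor
</syntaxhighlight>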


== Bad assumption: pipe-like processes ==
A straightforward use case for external processes would be to run a standard transformation such as compression or decompression.  A program like <code>gzip</code> or <code>cat</code> will stop once it detects that its input has ended, because the main loop usually makes a C system call to <code>read</code> like this:<syntaxhighlight lang="c">
ssize_t n_read = read (input_desc, buf, bufsize);
if (n_read < 0) { error... }
if (n_read == 0) { end of file... }
</syntaxhighlight>The manual for read<ref>https://man.archlinux.org/man/read.2</ref> explains that reading 0 bytes indicates the end of file, and a negative number indicates an error such as the input file descriptor already being closed.  If you think this sounds weird, I would agree: how do we tell the difference between a stream which is stalled and one which has ended?  Does the calling process yield control until input arrives?  How do we know if more than <code>bufsize</code> bytes are available?  If that word salad excites you, read more about <code>O_NONBLOCK</code><ref>https://man.archlinux.org/man/open.2.en#O_NONBLOCK</ref> and unix pipes<ref>https://man.archlinux.org/man/pipe.7.en</ref>.
 
But here we'll focus on how processes affect each other through pipes.  Surprising answer: not very much!  Try opening a "cat" in the terminal and then type <control>-d to "send" an end-of-file.  Oh no, you killed it!  You didn't actually send anything; the <control>-d is interpreted by the terminal's line discipline, which flushes the pending input so that cat's read returns 0 bytes, an end of file.  This is similar to how <control>-c does not send a character either: the terminal driver traps it and delivers an interrupt signal to the foreground process, completely independently of the data pipe.  My entry point to learning more is this stty webzine<ref>https://wizardzines.com/comics/stty/</ref> by Julia Evans.  Go ahead, try it: <code>stty -a</code>
 
Any special behavior at the other end of a pipe is the result of intentional programming decisions, and "end of file" (EOF) is more a convention than a real thing.  Now try opening "watch ls" or "sleep 60" and press <control>-d all you want—no effect.  You did signal an end of its input, but nobody cares because it wasn't listening anyway.
 
Back to the problem at hand: as it turns out, "rsync" is in this latter category of programs, seeing itself as a daemon which should continue even when its input is closed.  This makes sense enough, since rsync expects no user input and its output is just a side-effect of its main purpose.


BEAM assumes the connected process behaves like this, so nothing needs to be done to clean up a dangling external process because it will end itself as soon as the Port is closed or the BEAM exits.  If the external process is known to not behave this way, the recommendation is to wrap it in a shell script which converts a closed stdin into a kill signal.<ref>https://hexdocs.pm/elixir/main/Port.html#module-orphan-operating-system-processes</ref>
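Following that recommendation, the port can point at a small wrapper script instead of rsync itself.  The wrapper path below is hypothetical; its job (along the lines of the script in the Port documentation) is to launch its arguments in the background, block reading its own stdin, and kill the child once stdin closes:<syntaxhighlight lang="elixir">
# Hypothetical wrapper-based launch: "wrap_rsync.sh" is a script you
# provide which starts "$@" in the background, waits for its stdin to
# close (i.e. the BEAM exited or closed the port), then kills the child.
# `src` and `dest` are placeholders for the transfer endpoints.
port =
  Port.open(
    {:spawn_executable, Path.join(File.cwd!(), "wrap_rsync.sh")},
    [:binary, :exit_status, args: ["rsync", "--info=progress2", src, dest]]
  )
</syntaxhighlight>With this indirection in place, crashing the Elixir application finally takes the rsync transfer down with it.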