Elixir/Ports and external process wiring: Difference between revisions

Adamw (talk | contribs)
light c/e
Adamw (talk | contribs)
c/e
Line 9: Line 9:


{{Aside|text=<p>[[w:rsync|Rsync]] is the standard utility for file transfers, locally or over a network.  It can resume incomplete transfers and synchronize directories efficiently, and after almost 30 years of usage it can be trusted to handle any edge case.</p>
{{Aside|text=<p>[[w:rsync|Rsync]] is the standard utility for file transfers, locally or over a network.  It can resume incomplete transfers and synchronize directories efficiently, and after almost 30 years of usage it can be trusted to handle any edge case.</p>
<p>BEAM is a fairly unique ecosystem in which it's not considered deviant to reinvent a rounder wheel: it's common to port external dependencies into native Erlang—but the complexity of rsync and its dependence on a matching remote daemon makes it unlikely that it will be rewritten any time soon, which is why I've decided to wrap external command execution in a library.</p>}}
<p>BEAM is a fairly unique ecosystem in which it's not considered deviant to reinvent a rounder wheel: an external dependency like "cron" would often be ported into native Erlang—but the complexity of rsync and its dependence on a matching remote daemon makes it unlikely that it will be rewritten any time soon, which is why I've decided to wrap external command execution in a library.</p>}}


[[File:Monkey eating.jpg|alt=A Toque macaque (Macaca radiata) Monkey eating peanuts. Pictured in Bangalore, India|right|300x300px]]
[[File:Monkey eating.jpg|alt=A Toque macaque (Macaca radiata) Monkey eating peanuts. Pictured in Bangalore, India|right|300x300px]]
Line 71: Line 71:
Our Port captures output and each line is sent to the library's <code>handle_info</code> callback as <code>{:data, line}</code>.  After the transfer is finished we receive a conclusive <code>{:exit_status, status_code}</code> message.
Our Port captures output and each line is sent to the library's <code>handle_info</code> callback as <code>{:data, line}</code>.  After the transfer is finished we receive a conclusive <code>{:exit_status, status_code}</code> message.


As a first step, we extract the percent_done column and log any unrecognized output:
As a first step, we extract the percent_done column and flag any unrecognized output:
<syntaxhighlight lang="elixir">
<syntaxhighlight lang="elixir">
with terms when terms != [] <- String.split(line, ~r"\s", trim: true),
with terms when terms != [] <- String.split(line, ~r"\s", trim: true),
         percent_done_text when is_binary(percent_done_text) <- Enum.at(terms, 1),
         percent_done_text when percent_done_text != nil <- Enum.at(terms, 1),
         {percent_done, "%"} <- Float.parse(percent_done_text) do
         {percent_done, "%"} <- Float.parse(percent_done_text) do
       percent_done
       percent_done
Line 81: Line 81:
         {:unknown, line}
         {:unknown, line}
     end
     end
</syntaxhighlight>The <code>trim</code> is lifting more than its weight here: it lets us completely ignore spacing and newline trickery—and even a leading carriage return that we can see in the rsync source code,<ref>[https://github.com/RsyncProject/rsync/blob/797e17fc4a6f15e3b1756538a9f812b63942686f/progress.c#L129 rsync/progress.c] source code</ref>
</syntaxhighlight>The <code>trim: true</code> option is pulling more than its weight here: combined with the <code>\s</code> split pattern it lets us completely ignore spacing and newline trickery, even skipping the leading carriage return that can be seen in the rsync source code,<ref>[https://github.com/RsyncProject/rsync/blob/797e17fc4a6f15e3b1756538a9f812b63942686f/progress.c#L129 rsync/progress.c] source code</ref>
<syntaxhighlight lang="c">
<syntaxhighlight lang="c">
rprintf(FCLIENT, "\r%15s %3d%% %7.2f%s %s%s", ...);
rprintf(FCLIENT, "\r%15s %3d%% %7.2f%s %s%s", ...);
</syntaxhighlight>Carriage return <code>\r</code> deserves a special mention: this "control" character is just a byte in the binary data coming over the pipe from rsync, but its normal role is playing a control function because of how the terminal emulator responds to it.  On a terminal the effect is to rewind the cursor so that the current line can be overwritten!
</syntaxhighlight>Carriage return <code>\r</code> deserves special mention: this "control" character is just a byte in the binary data coming over the pipe from rsync, but its normal role is to control the terminal emulator, rewinding the cursor so that the current line can be overwritten!
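As a rough, self-contained sketch of the parsing step above, the snippet below wraps the same <code>with</code> chain in a function and feeds it a progress line that begins with <code>\r</code>.  The module name <code>ProgressSketch</code>, the catch-all <code>else</code> clause and the sample line are assumptions made for illustration; the sample only imitates the <code>rprintf</code> format shown above.
<syntaxhighlight lang="elixir">
defmodule ProgressSketch do
  # Hypothetical helper, not part of the library described in this article.
  # Returns the percentage as a float, or {:unknown, line} for anything else.
  def parse(line) do
    # ~r"\s" also matches "\r", and trim: true drops the empty strings
    # produced by the leading carriage return and the runs of spaces.
    with terms when terms != [] <- String.split(line, ~r"\s", trim: true),
         percent_done_text when percent_done_text != nil <- Enum.at(terms, 1),
         {percent_done, "%"} <- Float.parse(percent_done_text) do
      percent_done
    else
      # Assumed fallback: hand back whatever could not be parsed.
      _ -> {:unknown, line}
    end
  end
end

ProgressSketch.parse("\r      1,234,567  45%   12.34MB/s    0:00:10")
# => 45.0
ProgressSketch.parse("sending incremental file list")
# => {:unknown, "sending incremental file list"}
</syntaxhighlight>
The second call shows the unrecognized-output path: the second word is not a number followed by <code>%</code>, so the whole line is passed back for the caller to flag.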


A repeated theme in inter-process communication is that data and control are leaky categories.  We come to the more formal control side channels later.
A repeated theme in inter-process communication is that data and control are leaky categories.  We come to the more formal control side channels later.
Line 103: Line 103:


== OTP generic server ==
== OTP generic server ==
The Port API is convenient enough so far, but where Erlang/OTP really starts to shine is when we wrap each Port connection under a gen_server<ref>[https://www.erlang.org/doc/apps/stdlib/gen_server.html Erlang gen_server docs]</ref> module, giving us some properties for free: A dedicated thread coordinates with its rsync independent of anything else.  Input and output are asynchronous and buffered, but handled sequentially in a thread-safe way.  It holds internal state including the up-to-date completion percentage.  And the caller can either request updates manually, or it can listen for pushed statistics.
The Port API is convenient enough so far, but Erlang/OTP really starts to shine once we wrap each Port connection in a <code>gen_server</code><ref>[https://www.erlang.org/doc/apps/stdlib/gen_server.html Erlang gen_server docs]</ref> module, which gives us several properties for free.  A dedicated application thread (a lightweight BEAM process) coordinates with its rsync process independently of anything else.  Input and output are asynchronous and buffered, but handled sequentially in a thread-safe way.  The gen_server holds internal state, including the up-to-date completion percentage.  And the caller can request updates as needed, or it can listen for push messages with the parsed statistics.
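The properties listed above might look roughly like this in code.  This is only a minimal sketch under stated assumptions, not the library's actual implementation: the module name <code>PortServerSketch</code> and its functions are hypothetical, it handles the raw <code>{port, {:data, binary}}</code> and <code>{port, {:exit_status, status}}</code> messages that a Port sends to its owner, and it assumes that every chunk of data contains whole progress lines, which a real pipe does not guarantee.  Only the pull-style query is shown; pushed statistics are left out.
<syntaxhighlight lang="elixir">
defmodule PortServerSketch do
  use GenServer

  # Client API (hypothetical names).
  def start_link({executable, argv}), do: GenServer.start_link(__MODULE__, {executable, argv})
  def percent_done(server), do: GenServer.call(server, :percent_done)
  def exit_status(server), do: GenServer.call(server, :exit_status)

  @impl true
  def init({executable, argv}) do
    # The gen_server process becomes the owner of the Port and so
    # receives all of its messages in its own mailbox.
    port = Port.open({:spawn_executable, executable}, [:binary, :exit_status, args: argv])
    {:ok, %{port: port, percent_done: 0.0, exit_status: nil}}
  end

  @impl true
  def handle_call(:percent_done, _from, state), do: {:reply, state.percent_done, state}
  def handle_call(:exit_status, _from, state), do: {:reply, state.exit_status, state}

  @impl true
  def handle_info({port, {:data, data}}, %{port: port} = state) do
    # Progress lines are separated by "\r" or "\n"; keep the last value parsed.
    percent =
      data
      |> String.split(~r/[\r\n]+/, trim: true)
      |> Enum.reduce(state.percent_done, fn line, acc ->
        case parse(line) do
          {:unknown, _} -> acc
          percent_done -> percent_done
        end
      end)

    {:noreply, %{state | percent_done: percent}}
  end

  def handle_info({port, {:exit_status, status}}, %{port: port} = state) do
    {:noreply, %{state | exit_status: status}}
  end

  # Same parsing chain as earlier in the article, with an assumed fallback.
  defp parse(line) do
    with terms when terms != [] <- String.split(line, ~r"\s", trim: true),
         text when text != nil <- Enum.at(terms, 1),
         {percent_done, "%"} <- Float.parse(text) do
      percent_done
    else
      _ -> {:unknown, line}
    end
  end
end
</syntaxhighlight>
Because messages are handled one at a time, the state never needs a lock: a <code>percent_done/1</code> call is simply queued behind whatever output is already waiting in the mailbox.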


This gen_server should also be able to run under an [https://adoptingerlang.org/docs/development/supervision_trees/ OTP supervision tree] but this is where the dream falls apart, for the moment.  The Port can watch for rsync completion or failure and report this to its caller, but we fail at the second critical property of being able to shut down rsync if the calling code or our library module crashes.
This gen_server is also expected to run safely under an OTP supervision tree,<ref>[https://adoptingerlang.org/docs/development/supervision_trees/ "Supervision Trees"] chapter from [https://adoptingerlang.org/ Adopting Erlang]</ref> but this is where our dream falls apart for the moment.  The Port already watches for rsync completion or failure and reports it upwards to its caller, but we fail at the critical property of being able to propagate a termination downwards, shutting down rsync if the calling code or our library module crashes.


== Problem: runaway processes ==
== Problem: runaway processes ==
[[File:CargoNet Di 12 Euro 4000 Lønsdal - Bolna.jpg|thumb]]
[[File:CargoNet Di 12 Euro 4000 Lønsdal - Bolna.jpg|thumb]]
The unpleasant real-world consequence of this limitation is that rsync transfers would continue to run in the background even after Elixir had completely shut down, because the BEAM has no way of stopping the process.
The unpleasant real-world consequence is that rsync transfers will continue to run in the background even after Elixir kills our gen_server or shuts down, because the BEAM has no way of stopping the external process.


It might be possible to send a signal using unix "kill", but BEAM doesn't expose the child process ID and it doesn't include any built-in commands to send a signal.  Clearly we're expected to do this another way.  Another problem with "kill" is that we want the external process to stop no matter how badly the BEAM is damaged so we can't rely on stored data and on making a few last calls before crashing.
It's possible to send a signal by shelling out to Unix <code>kill PID</code>, but BEAM doesn't include any built-in functions to send a signal to an OS process, and the child process ID is only available indirectly, through <code>Port.info(port, :os_pid)</code>.  Clearly we're expected to do this another way.  Another problem with <code>kill</code> is that we want the external process to stop no matter how badly the BEAM is damaged, so we shouldn't rely on stored data or on running final clean-up logic before exiting.


To eliminate variable and to understand whether the failure to stop was specific to rsync, I tried the same Port command but spawning a <code>sleep 60</code>, and I found that it behaves exactly the same way, hanging until the sleep ends naturally regardless of what happened in Elixir or whether its pipes are still open.  This happens to have been a lucky choice, as I learned later that "sleep" is also unusual but its behavior is much simpler to reason about.
To debug what happens during <code>port_close</code> and to eliminate variables, I tried spawning <code>sleep 60</code> using the same Port command, and found that it behaves exactly the same way: it hangs until the sleep ends naturally, regardless of what happens in Elixir or whether its pipes are still open.  This turned out to be a lucky choice, as I learned later: <code>sleep</code> is unusual in the same way as rsync, but its behavior is much simpler to reason about.
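The experiment above can be approximated with a few lines.  This is a crude sketch rather than the exact commands used for this article: the module <code>RunawaySketch</code> is hypothetical, it assumes a Unix-like system with <code>sleep</code> and <code>pgrep</code> on the path, and it will also match any unrelated <code>sleep 60</code> that happens to be running.
<syntaxhighlight lang="elixir">
defmodule RunawaySketch do
  # Spawns `sleep <seconds>` through a Port, closes the Port, and then
  # asks the OS whether a matching process is still alive.
  def survives_close?(seconds \\ "60") do
    sleep = System.find_executable("sleep")
    port = Port.open({:spawn_executable, sleep}, [:binary, :exit_status, args: [seconds]])

    # Give the child a moment to start before closing its pipes.
    Process.sleep(200)
    true = Port.close(port)
    Process.sleep(200)

    # pgrep exits with status 0 when at least one process matches.
    {_output, status} = System.cmd("pgrep", ["-f", "sleep #{seconds}"])
    status == 0
  end
end

RunawaySketch.survives_close?("60")
</syntaxhighlight>
If the behavior described above holds, the function returns <code>true</code>, and the leftover process has to be cleaned up by hand, for example with <code>pkill -f "sleep 60"</code>.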


== Bad assumption: pipe-like processes ==
== Bad assumption: pipe-like processes ==