Elixir/Ports and external process wiring: Difference between revisions

Adamw (talk | contribs)
c/e, image, formatting and arrangement
Adamw (talk | contribs)
clarify
Line 11: Line 11:
System.shell("rsync -a source target")
System.shell("rsync -a source target")
</syntaxhighlight>
</syntaxhighlight>
This has a few shortcomings, starting with filename escaping so at a minimum we should use <code>System.cmd</code>:<syntaxhighlight lang="elixir">
This has a few shortcomings, such as the static filenames. It feels unsafe to even demonstrate how string interpolation like <code>#{source}</code> could be misused, so let's skip straight to the next tool, <code>System.cmd</code>, which passes its argument list to the program without shell expansion:<syntaxhighlight lang="elixir">
System.find_executable(rsync_path)
System.find_executable(rsync_path)
|> System.cmd(["-a", source, target])
|> System.cmd(["-a", source, target])
</syntaxhighlight>However this job would block until the transfer is finished and we get no feedback until completion.
</syntaxhighlight>This is safer, but the calling process blocks and gets no feedback until the transfer is complete.
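
To illustrate what "no shell expansion" means, here is a minimal sketch that uses <code>echo</code> as a stand-in for rsync. This is an assumption made only so the example runs on any system with a POSIX userland, and the module name <code>ArgvDemo</code> is hypothetical. Each list element reaches the program as exactly one argument, so spaces and characters like <code>$</code> need no escaping:

```elixir
# Hypothetical demo module; `echo` stands in for rsync purely for illustration.
defmodule ArgvDemo do
  # Runs `echo` with the given argument list and returns
  # {output_without_trailing_newline, exit_status}.
  # No shell is involved, so nothing in `args` is expanded or split.
  def run(args) do
    {output, exit_status} = System.cmd("echo", args)
    {String.trim_trailing(output, "\n"), exit_status}
  end
end
```

Calling <code>ArgvDemo.run(["my file.txt", "$HOME"])</code> passes both strings through untouched, whereas the same text handed to <code>System.shell</code> would be split on the space and have the variable substituted.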


Elixir's low-level <code>Port.open</code> maps directly to ERTS <code>open_port</code><ref>https://www.erlang.org/doc/apps/erts/erlang.html#open_port/2</ref> which provides flexibility.  Here we have a command turning some knobs:<syntaxhighlight lang="elixir">
To run an external process asynchronously we reach for Elixir's lowest-level <code>Port.open</code>, which maps directly to ERTS <code>open_port</code><ref>https://www.erlang.org/doc/apps/erts/erlang.html#open_port/2</ref>.  Ports are tremendously flexible; here we demonstrate turning a few knobs:<syntaxhighlight lang="elixir">
Port.open(
Port.open(
   {:spawn_executable, rsync_path},
   {:spawn_executable, rsync_path},
Line 35: Line 35:
</syntaxhighlight>
</syntaxhighlight>


Progress lines have a fairly self-explanatory format:
Progress lines come in with a fairly self-explanatory format:
<syntaxhighlight lang="text">
<syntaxhighlight lang="text">
       3,342,336  33%    3.14MB/s    0:00:02
       3,342,336  33%    3.14MB/s    0:00:02
Line 43: Line 43:
rsync has a variety of progress options.  We chose overall progress above, so the percentage means "overall percent complete".
rsync has a variety of progress options.  We chose overall progress above, so the percentage means "overall percent complete".


Here is the menu:
Here is the menu of alternatives:


; <code>--info=progress2</code> : report overall progress
; <code>--info=progress2</code> : report overall progress
Line 52: Line 52:
}}
}}


Each rsync output line is sent to the library callback <code>handle_info</code> as <code>{:data, line}</code>, and after transfer is finished it receives a conclusive <code>{:exit_status, status_code}</code>.
Each rsync output line is sent to the library's <code>handle_info</code> callback as <code>{:data, line}</code>, and after the transfer finishes we receive a conclusive <code>{:exit_status, status_code}</code>.


Here we extract the percent_done column and strictly reject any other output:
We extract the percent_done column and strictly reject any other output:
<syntaxhighlight lang="elixir">
<syntaxhighlight lang="elixir">
with terms when terms != [] <- String.split(line, ~r"\s", trim: true),
with terms when terms != [] <- String.split(line, ~r"\s", trim: true),
Line 64: Line 64:
         {:unknown, line}
         {:unknown, line}
     end
     end
</syntaxhighlight>The <code>trim</code> lets us ignore spacing and newline trickery—or the leading carriage return you can see in this line from rsync's source,
</syntaxhighlight>The <code>trim</code> lets us ignore spacing and newline trickery—or even a leading carriage return as you can see in the rsync source code,
<syntaxhighlight lang="c">
<syntaxhighlight lang="c">
rprintf(FCLIENT, "\r%15s %3d%% %7.2f%s %s%s", ...);
rprintf(FCLIENT, "\r%15s %3d%% %7.2f%s %s%s", ...);
Line 70: Line 70:


{{Aside|text=
{{Aside|text=
On the terminal, rsync progress lines are updated in-place by emitting the fun [[w:Carriage return|carriage return]] control character <code>0x0d</code> or <code>\r</code> as you see above.  The character seems to be named after pushing the physical paper carriage of a typewriter backwards without feeding a new line.  On the terminal this overwrites the current line!
On the terminal, rsync progress lines are updated in place by emitting a [[w:Carriage return|carriage return]] control character <code>0x0d</code> or <code>\r</code> as you see above.  The character seems to be named after pushing the physical paper carriage of a typewriter backwards without feeding a new line.  On the terminal this overwrites the current line!


[[w:https://en.wikipedia.org/wiki/Newline#Issues_with_different_newline_formats|Disagreements about carriage return]] vs. newline have caused eye-rolling since the dawn of personal computing.
[[w:https://en.wikipedia.org/wiki/Newline#Issues_with_different_newline_formats|Disagreements about carriage return]] vs. newline have caused eye-rolling since the dawn of personal computing.
}}
}}


One more comment about this carriage return: it's a byte in the binary data coming over the pipe from rsync, but it plays a "control" function because of how it will be interpreted by the tty.  A repeated theme is that data and control are leaky categories,
One more comment about this carriage return: the "control" character is just a byte in the binary data coming over the pipe from rsync, but it plays a control function because of how the tty interprets it.  Still, a repeated theme is that data and control are leaky categories.  We come to the more formal control side channels later.
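
The percent extraction described above can be sketched as a standalone function.  This is a minimal illustration rather than the library's actual code: the module name <code>ProgressParser</code> and the <code>{:progress, percent}</code> return shape are assumptions, and only the <code>{:unknown, line}</code> fallback is taken from the excerpt.  Because <code>\s</code> also matches <code>\r</code>, the leading carriage return disappears in the split:

```elixir
# Hypothetical helper, not the library's real implementation.
defmodule ProgressParser do
  # Returns {:progress, percent} for a line shaped like
  #   "      3,342,336  33%    3.14MB/s    0:00:02"
  # and {:unknown, line} for anything else.
  def parse(line) do
    with [_bytes, percent_field, _rate, _time | _rest] <-
           String.split(line, ~r"\s+", trim: true),
         # Integer.parse/1 returns {integer, remainder}; we require the
         # remainder to be exactly "%" so other columns are rejected.
         {percent, "%"} <- Integer.parse(percent_field) do
      {:progress, percent}
    else
      _ -> {:unknown, line}
    end
  end
end
```

Matching the remainder strictly against <code>"%"</code> keeps the parser from mistaking a stray number in some other rsync message for progress.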


This is where Erlang/OTP really starts to shine: by opening the port inside of a dedicated gen_server<ref>https://www.erlang.org/doc/apps/stdlib/gen_server.html</ref> we have a separate thread communicating with rsync, which receives an asynchronous message like <code>{:data, text_line}</code> for each progress line.  It's easy to parse the line, update some internal state and optionally send a progress summary to the code calling the library.
This is where Erlang/OTP really starts to shine: by opening the port inside of a dedicated gen_server<ref>https://www.erlang.org/doc/apps/stdlib/gen_server.html</ref> we have a separate thread communicating with rsync, which receives an asynchronous message like <code>{:data, text_line}</code> for each progress line.  It's easy to parse the line, update some internal state and optionally send a progress summary to the code calling the library.
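
As a rough sketch of that arrangement, the following gen_server owns a port and forwards each output line to a subscriber.  Everything named here (<code>PortWatcher</code>, the <code>{:line, ...}</code> and <code>{:done, ...}</code> messages, the subscriber pid) is hypothetical and chosen for illustration.  Note that the <code>{:line, 1024}</code> port option splits on newlines only, so rsync's carriage-return updates would need extra handling that is left out here:

```elixir
# Hypothetical module illustrating a gen_server that owns a port.
defmodule PortWatcher do
  use GenServer

  # `executable` must be a full path, e.g. from System.find_executable/1.
  def start_link({executable, args, subscriber}) do
    GenServer.start_link(__MODULE__, {executable, args, subscriber})
  end

  @impl true
  def init({executable, args, subscriber}) do
    port =
      Port.open({:spawn_executable, executable}, [
        :binary,
        :exit_status,
        {:line, 1024},
        args: args
      ])

    {:ok, %{port: port, subscriber: subscriber, lines: 0}}
  end

  # A complete line arrived from the external process.
  @impl true
  def handle_info({port, {:data, {:eol, line}}}, %{port: port} = state) do
    send(state.subscriber, {:line, line})
    {:noreply, %{state | lines: state.lines + 1}}
  end

  # A fragment longer than the line buffer; ignored in this sketch.
  def handle_info({port, {:data, {:noeol, _partial}}}, %{port: port} = state) do
    {:noreply, state}
  end

  # The external process exited; report and stop the server.
  def handle_info({port, {:exit_status, status}}, %{port: port} = state) do
    send(state.subscriber, {:done, status, state.lines})
    {:stop, :normal, state}
  end
end
```

The caller never blocks: it starts the server, carries on with other work, and handles <code>{:line, text}</code> and <code>{:done, status, count}</code> messages whenever they arrive.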