Elixir/Ports and external process wiring: Difference between revisions

Adamw (talk | contribs)
c/e and move out some asides
Adamw (talk | contribs)
c/e, image, formatting and arrangement
Line 1: Line 1:
This is a short programming adventure which goes into piping and signaling between processes.
This deceptively simple programming adventure veers unexpectedly into piping and signaling between unix processes.


== Context: controlling "rsync" ==
This exploration began with writing a library<ref>https://hexdocs.pm/rsync/Rsync.html</ref> to run rsync in order to transfer files in a background thread and monitor progress.  I hoped to learn how to interface with long-lived external processes, and I got more than I wished for.
{{Project|source=https://gitlab.com/adamwight/rsync_ex/|status=beta|url=https://hexdocs.pm/rsync/Rsync.html}}


Starting rsync would be as easy as calling out to a shell:<syntaxhighlight lang="elixir">
My exploration begins while writing a beta-quality rsync library for Elixir which transfers files in the background and can monitor progress.  I hoped to learn better how to interface with long-lived external processes—and I got more than I wished for.
System.shell("rsync -a src target")
 
[[File:Monkey eating.jpg|alt=A Toque macaque (Macaca radiata) Monkey eating peanuts. Pictured in Bangalore, India|right|400x400px]]
 
Starting rsync should be as easy as calling out to a shell:<syntaxhighlight lang="elixir">
System.shell("rsync -a source target")
</syntaxhighlight>
</syntaxhighlight>
This has a few shortcomings: filename escaping is hard to do safely so <code>System.cmd</code> should be used instead, and the job would block until the transfer is done so we get no feedback until completion.  Ending the shell command in an ampersand <code>&</code> is not enough, so the caller would have to manually start a new thread.
This has a few shortcomings, starting with filename escaping, so at a minimum we should use <code>System.cmd</code>:<syntaxhighlight lang="elixir">
# System.cmd expects a flat list of string arguments.
System.find_executable(rsync_path)
|> System.cmd(["-a", source, target])
</syntaxhighlight>However this job would block until the transfer is finished and we get no feedback until completion.
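
One way to keep the caller from blocking is to run the command inside a <code>Task</code>.  The following is only a minimal sketch: the module name <code>BackgroundCmd</code> is hypothetical and not part of the rsync library, and <code>echo</code> stands in for rsync so the example is self-contained.
<syntaxhighlight lang="elixir">
# Hypothetical helper for illustration only, not the library's API.
defmodule BackgroundCmd do
  # Runs an executable in a separate process so the caller is not blocked.
  def start(executable, args) do
    Task.async(fn -> System.cmd(executable, args) end)
  end
end

# "echo" stands in for rsync here.
task = BackgroundCmd.start(System.find_executable("echo"), ["hello"])
# ... the caller is free to do other work here ...

# Task.await defaults to a 5 second timeout; a long transfer needs :infinity.
{output, exit_status} = Task.await(task, :infinity)
</syntaxhighlight>
Even in this form the output only arrives once the command has exited, so there is still no progress feedback while the transfer runs.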


Elixir's low-level <code>Port</code> call maps directly to the base Erlang open_port<ref>https://www.erlang.org/doc/apps/erts/erlang.html#open_port/2</ref> and it gives much more flexibility:<syntaxhighlight lang="elixir">
Elixir's low-level <code>Port.open</code> maps directly to ERTS <code>open_port</code><ref>https://www.erlang.org/doc/apps/erts/erlang.html#open_port/2</ref> which provides flexibility.  Here we have a command turning some knobs:<syntaxhighlight lang="elixir">
Port.open(
Port.open(
   {:spawn_executable, rsync_path},
   {:spawn_executable, rsync_path},
Line 26: Line 33:
   ]
   ]
)
)
</syntaxhighlight>
Progress lines have a fairly self-explanatory format:
<syntaxhighlight lang="text">
      3,342,336  33%    3.14MB/s    0:00:02
</syntaxhighlight>
</syntaxhighlight>


{{Aside|text=
{{Aside|text=
If you're here for rsync, it includes a few alternatives for progress reporting:
rsync has a variety of progress options; we chose overall progress above, so the percentage means "overall percent complete".
 
Here is the menu:
 
; <code>--info=progress2</code> : report overall progress


; <code>--info=progress2</code> : reports overall progress
; <code>--progress</code> : report statistics per file
; <code>--progress</code> : reports statistics per file
; <code>--itemize-changes</code> : lists the operations taken on each file


Progress reporting uses a columnar format:
; <code>--itemize-changes</code> : list the operations taken on each file
<syntaxhighlight lang="text">
      3,342,336  33%    3.14MB/s    0:00:02
</syntaxhighlight>
}}
}}


{{Aside|text=
Each rsync output line is sent to the library callback <code>handle_info</code> as <code>{:data, line}</code>, and after transfer is finished it receives a conclusive <code>{:exit_status, status_code}</code>.
On the terminal the progress line is updated in-place by restarting the line with the fun [[w:Carriage return|carriage return]] control character <code>0x0d</code> or <code>\r</code>.  This is apparently named after pushing the physical paper carriage of a typewriter, and on a terminal it moves the cursor back to the start of the current line so it can be written again!  But over a pipe we see this as a regular byte in the stream, like "<code>-old line-^M-new line-</code>".  [[w:Newline#Issues with different newline formats|Disagreements]] about carriage return vs. newline have caused eye-rolling since the dawn of personal computing, but we can double-check the rsync source code and see that it formats output using carriage return on any platform: <syntaxhighlight lang="c">
 
Here we extract the percent_done column and strictly reject any other output:
<syntaxhighlight lang="elixir">
with terms when terms != [] <- String.split(line, ~r"\s", trim: true),
     percent_done_text when is_binary(percent_done_text) <- Enum.at(terms, 1),
     {percent_done, "%"} <- Float.parse(percent_done_text) do
  percent_done
else
  _ ->
    {:unknown, line}
end
</syntaxhighlight>The <code>trim</code> lets us ignore spacing and newline trickery—or the leading carriage return you can see in this line from rsync's source,
<syntaxhighlight lang="c">
rprintf(FCLIENT, "\r%15s %3d%% %7.2f%s %s%s", ...);
rprintf(FCLIENT, "\r%15s %3d%% %7.2f%s %s%s", ...);
</syntaxhighlight>
</syntaxhighlight>
{{Aside|text=
On the terminal, rsync progress lines are updated in-place by emitting the fun [[w:Carriage return|carriage return]] control character <code>0x0d</code> or <code>\r</code> as you see above.  The character seems to be named after pushing the physical paper carriage of a typewriter backwards without feeding a new line.  On the terminal this overwrites the current line!
[[w:Newline#Issues with different newline formats|Disagreements about carriage return]] vs. newline have caused eye-rolling since the dawn of personal computing.
}}
}}
One more comment about this carriage return: it's a byte in the binary data coming over the pipe from rsync, but it plays a "control" function because of how the tty interprets it.  A repeated theme is that data and control are leaky categories.
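
To make the "regular byte in the stream" point concrete, here is a hedged sketch of splitting a raw chunk on carriage returns and newlines before extracting the percentage.  The module <code>ProgressSketch</code> and its function names are hypothetical, and the second progress line in the usage example is made-up sample data in the same columnar format.
<syntaxhighlight lang="elixir">
# Hypothetical module for illustration only, not the library's code.
defmodule ProgressSketch do
  # Treats "\r" and "\n" as plain separators in the byte stream,
  # then parses each piece as a progress line.
  def parse_chunk(data) do
    data
    |> String.split(["\r", "\n"], trim: true)
    |> Enum.map(&parse_line/1)
  end

  # Returns the percentage from the second column, or {:unknown, line}.
  def parse_line(line) do
    with [_bytes, percent_text | _rest] <- String.split(line, ~r"\s+", trim: true),
         {percent, "%"} <- Float.parse(percent_text) do
      percent
    else
      _ -> {:unknown, line}
    end
  end
end

chunk =
  "\r      3,342,336  33%    3.14MB/s    0:00:02" <>
    "\r      6,684,672  66%    3.14MB/s    0:00:01"

ProgressSketch.parse_chunk(chunk)
# returns [33.0, 66.0]
</syntaxhighlight>
Anything that does not look like a progress line, such as a status message, comes back tagged as <code>{:unknown, line}</code> rather than being silently dropped.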


This is where Erlang/OTP really starts to shine: by opening the port inside of a dedicated gen_server<ref>https://www.erlang.org/doc/apps/stdlib/gen_server.html</ref> we have a separate thread communicating with rsync, which receives an asynchronous message like <code>{:data, text_line}</code> for each progress line.  It's easy to parse the line, update some internal state and optionally send a progress summary to the code calling the library.
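
As a rough illustration of that arrangement, the sketch below opens a port inside a GenServer and forwards what it receives.  Everything named here is an assumption for the example: the module <code>PortWatcher</code>, the <code>{:progress, data}</code> and <code>{:done, status}</code> messages, and the port options, which are not necessarily the ones the library uses.  Note that the raw port message arrives wrapped with the port itself, as <code>{port, {:data, data}}</code>.
<syntaxhighlight lang="elixir">
# Hypothetical module for illustration only, not the library's implementation.
defmodule PortWatcher do
  use GenServer

  # `executable` must be an absolute path; `notify` is the pid to report to.
  def start_link({executable, args, notify}) do
    GenServer.start_link(__MODULE__, {executable, args, notify})
  end

  @impl true
  def init({executable, args, notify}) do
    # Assumed options: binary data plus an exit status message at the end.
    port =
      Port.open({:spawn_executable, executable}, [
        :binary,
        :exit_status,
        args: args
      ])

    {:ok, %{port: port, notify: notify, chunks: 0}}
  end

  @impl true
  # Each chunk of output from the external process arrives asynchronously.
  def handle_info({port, {:data, data}}, %{port: port} = state) do
    send(state.notify, {:progress, data})
    {:noreply, %{state | chunks: state.chunks + 1}}
  end

  # Sent once after the external process exits.
  def handle_info({port, {:exit_status, status}}, %{port: port} = state) do
    send(state.notify, {:done, status})
    {:stop, :normal, state}
  end
end

# "echo" stands in for rsync so the example is self-contained.
{:ok, _pid} = PortWatcher.start_link({System.find_executable("echo"), ["hello"], self()})

receive do
  {:progress, data} -> IO.inspect(data, label: "progress")
after
  5_000 -> IO.puts("no output received")
end
</syntaxhighlight>
The caller never blocks on the external process: it simply receives messages whenever the server has something to report.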