Elixir/Ports and external process wiring: Difference between revisions

Line 1:

This ~~is a short~~ programming adventure ~~which goes~~ into piping and signaling between processes.

This deceptively simple programming adventure veers unexpectedly into piping and signaling between Unix processes.

== Context: controlling "rsync" ==

~~This exploration began with writing a library<ref>~~https://hexdocs.pm/rsync/Rsync.html~~</ref> to run rsync in order to transfer files in a background thread and monitor progress. I hoped to learn how to interface with long-lived external processes, and I got more than I wished for.~~

{{Project|source=https://gitlab.com/adamwight/rsync_ex/|status=beta|url=https://hexdocs.pm/rsync/Rsync.html}}

Starting rsync ~~would~~ be as easy as calling out to a shell:<syntaxhighlight lang="elixir">

My exploration began while writing a beta-quality rsync library for Elixir which transfers files in the background and can monitor progress. I hoped to learn more about interfacing with long-lived external processes, and I got more than I wished for.

System.shell("rsync -a ~~src~~ target")

[[File:Monkey eating.jpg|alt=A Toque macaque (Macaca radiata) Monkey eating peanuts. Pictured in Bangalore, India|right|400x400px]]

Starting rsync should be as easy as calling out to a shell:<syntaxhighlight lang="elixir">

System.shell("rsync -a source target")

</syntaxhighlight>

This has a few shortcomings: filename escaping ~~is hard to do safely~~ so <code>System.cmd</code> ~~should be used instead~~, ~~and the~~ job would block until the transfer is ~~done so~~ we get no feedback until completion~~. Ending the shell command in an ampersand <code>&</code> is not enough, so the caller would have to manually start a new thread~~.

This has a few shortcomings, starting with filename escaping, so at a minimum we should use <code>System.cmd</code>:<syntaxhighlight lang="elixir">

System.find_executable(rsync_path)

|> System.cmd(["-a", source, target])

</syntaxhighlight>However, this job blocks until the transfer is finished, so we get no feedback until completion.

Elixir's low-level <code>Port</code> ~~call~~ maps directly to ~~the base Erlang~~ open_port<ref>https://www.erlang.org/doc/apps/erts/erlang.html#open_port/2</ref> ~~and it gives much more~~ flexibility:<syntaxhighlight lang="elixir">

Elixir's low-level <code>Port.open</code> maps directly to ERTS <code>open_port</code><ref>https://www.erlang.org/doc/apps/erts/erlang.html#open_port/2</ref> which provides flexibility. Here we have a command turning some knobs:<syntaxhighlight lang="elixir">

Port.open(

{:spawn_executable, rsync_path},

Line 26:

Line 33:

]

)

</syntaxhighlight>

Progress lines have a fairly self-explanatory format:

<syntaxhighlight lang="text">
3,342,336  33%    3.14MB/s    0:00:02

</syntaxhighlight>

{{Aside|text=

~~If you're here for~~ rsync, ~~it includes a few alternatives for~~ progress ~~reporting~~:

rsync has a variety of progress options. We chose overall progress above, so the percentage means "overall percent complete".

Here is the menu:

; <code>--info=progress2</code> : report overall progress

~~; <code>--info=progress2</code> : reports overall progress~~

; <code>--progress</code> : report statistics per file

; <code>--progress</code> : ~~reports~~ statistics per ~~file~~

~~; <code>--itemize-changes</code> ; lists the operations taken on each~~ file

~~Progress reporting uses a columnar format:~~

; <code>--itemize-changes</code> : list the operations taken on each file

<~~syntaxhighlight lang="text"~~>

~~3,342,336 33% 3.14MB/s 0:00:02~~

</~~syntaxhighlight~~>

}}

~~{{Aside|text=~~

Each rsync output line is sent to the library callback <code>handle_info</code> as <code>{:data, line}</code>, and after transfer is finished it receives a conclusive <code>{:exit_status, status_code}</code>.

~~On the terminal the progress~~ line is ~~updated in-place by restarting~~ the ~~line with the fun [[w:Carriage return|carriage return]] control character~~ <code>~~0x0d~~</code> or <code>\r</code>~~. This~~ is ~~apparently named after pushing the physical paper carriage of a typewriter and on a terminal~~ it ~~will erases the current line so it can be written again! But over~~ a ~~pipe~~ we ~~see this as a regular byte in~~ the ~~stream~~, ~~like~~ "<~~code>~~-~~old line~~-^M-~~new~~ line-</code>~~". [[W:|Disagreements]] about~~ carriage return ~~vs. newline have caused eye-rolling since the dawn of personal computing but we~~ can ~~double-check the~~ rsync source ~~code and we see that it will format output using carriage return on any platform:~~ <syntaxhighlight lang="c">

Here we extract the percent_done column and strictly reject any other output:

<syntaxhighlight lang="elixir">
with terms when terms != [] <- String.split(line, ~r"\s", trim: true),
     percent_done_text when is_binary(percent_done_text) <- Enum.at(terms, 1),
     {percent_done, "%"} <- Float.parse(percent_done_text) do
  percent_done
else
  _ ->
    {:unknown, line}
end

</syntaxhighlight>Splitting on whitespace with <code>trim: true</code> discards the empty strings produced by repeated spaces, newlines, or the leading carriage return you can see in this line from rsync's source:

<syntaxhighlight lang="c">
rprintf(FCLIENT, "\r%15s %3d%% %7.2f%s %s%s", ...);

</syntaxhighlight>

{{Aside|text=

On the terminal, rsync progress lines are updated in-place by emitting the fun [[w:Carriage return|carriage return]] control character <code>0x0d</code> or <code>\r</code> as you see above. The character seems to be named after pushing the physical paper carriage of a typewriter backwards without feeding a new line. On the terminal this overwrites the current line!

[[w:Newline#Issues_with_different_newline_formats|Disagreements about carriage return]] vs. newline have caused eye-rolling since the dawn of personal computing.

}}

One more comment about this carriage return: it's a byte in the binary data coming over the pipe from rsync, but it plays a "control" function because of how it will be interpreted by the tty. A repeated theme is that data and control are leaky categories.
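
Because the carriage return is just another byte on the pipe, a single chunk read from the port may hold several progress updates back to back. Here is a minimal sketch of coping with that, assuming a hypothetical module name <code>ProgressChunk</code> that is not part of the library: split on both <code>\r</code> and <code>\n</code>, apply the same percent parsing as above, and keep only the newest value.

<syntaxhighlight lang="elixir">
defmodule ProgressChunk do
  # Hypothetical helper for illustration, not the library's API.
  # Returns the most recent percentage found in a raw chunk, or nil.
  def latest_percent(chunk) when is_binary(chunk) do
    chunk
    # Both control characters act as separators once we are off the tty.
    |> String.split(["\r", "\n"], trim: true)
    |> Enum.map(&percent/1)
    |> Enum.reject(&is_nil/1)
    |> List.last()
  end

  # The second whitespace-separated column is expected to look like "33%".
  defp percent(line) do
    with [_bytes, percent_text | _rest] <- String.split(line),
         {percent, "%"} <- Float.parse(percent_text) do
      percent
    else
      _ -> nil
    end
  end
end
</syntaxhighlight>

Lines that do not match the progress format, such as rsync's other status messages, simply contribute nothing, so a chunk without any progress update yields <code>nil</code>.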

This is where Erlang/OTP really starts to shine: by opening the port inside of a dedicated gen_server<ref>https://www.erlang.org/doc/apps/stdlib/gen_server.html</ref> we have a separate thread communicating with rsync, which receives an asynchronous message like <code>{:data, text_line}</code> for each progress line. It's easy to parse the line, update some internal state and optionally send a progress summary to the code calling the library.
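
A minimal sketch of that arrangement, not the library's actual code: the module name <code>PortWatcher</code>, the <code>{:output, chunk}</code> and <code>{:finished, status}</code> messages sent to the caller, and the choice of port options are all assumptions made for illustration. The messages a port owner receives are shaped <code>{port, {:data, chunk}}</code> and, when the <code>:exit_status</code> option is set, <code>{port, {:exit_status, status}}</code>.

<syntaxhighlight lang="elixir">
defmodule PortWatcher do
  # Hypothetical sketch: a GenServer that owns a port and relays its output.
  use GenServer

  # `executable` must be a full path, e.g. from System.find_executable/1.
  def start_link({executable, args, notify_pid}) do
    GenServer.start_link(__MODULE__, {executable, args, notify_pid})
  end

  @impl true
  def init({executable, args, notify_pid}) do
    port =
      Port.open({:spawn_executable, executable}, [
        :binary,
        :exit_status,
        args: args
      ])

    {:ok, %{port: port, notify: notify_pid}}
  end

  # Output arrives asynchronously as raw chunks, not necessarily whole lines.
  @impl true
  def handle_info({port, {:data, chunk}}, %{port: port} = state) do
    send(state.notify, {:output, chunk})
    {:noreply, state}
  end

  # Delivered once after the external process exits (requires :exit_status).
  def handle_info({port, {:exit_status, status}}, %{port: port} = state) do
    send(state.notify, {:finished, status})
    {:stop, :normal, state}
  end
end
</syntaxhighlight>

Starting it with a short-lived command, for example <code>PortWatcher.start_link({System.find_executable("sh"), ["-c", "printf hello; exit 3"], self()})</code>, should deliver the output and then <code>{:finished, 3}</code> to the caller, which stays free to do other work in the meantime.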

@@ Line 1: / Line 1: @@
-This is a short programming adventure which goes into piping and signaling between processes.
+This deceptively simple programming adventure veers unexpectedly into piping and signaling between Unix processes.
 == Context: controlling "rsync" ==
-This exploration began with writing a library<ref>https://hexdocs.pm/rsync/Rsync.html</ref> to run rsync in order to transfer files in a background thread and monitor progress.  I hoped to learn how to interface with long-lived external processes, and I got more than I wished for.
+{{Project|source=https://gitlab.com/adamwight/rsync_ex/|status=beta|url=https://hexdocs.pm/rsync/Rsync.html}}
-Starting rsync would be as easy as calling out to a shell:<syntaxhighlight lang="elixir">
+My exploration began while writing a beta-quality rsync library for Elixir which transfers files in the background and can monitor progress.  I hoped to learn more about interfacing with long-lived external processes, and I got more than I wished for.
-System.shell("rsync -a src target")
+[[File:Monkey eating.jpg|alt=A Toque macaque (Macaca radiata) Monkey eating peanuts. Pictured in Bangalore, India|right|400x400px]]
+Starting rsync should be as easy as calling out to a shell:<syntaxhighlight lang="elixir">
+System.shell("rsync -a source target")
 </syntaxhighlight>
-This has a few shortcomings: filename escaping is hard to do safely so <code>System.cmd</code> should be used instead, and the job would block until the transfer is done so we get no feedback until completion.  Ending the shell command in an ampersand <code>&</code> is not enough, so the caller would have to manually start a new thread.
+This has a few shortcomings, starting with filename escaping, so at a minimum we should use <code>System.cmd</code>:<syntaxhighlight lang="elixir">
+System.find_executable(rsync_path)
+|> System.cmd(["-a", source, target])
+</syntaxhighlight>However, this job blocks until the transfer is finished, so we get no feedback until completion.
-Elixir's low-level <code>Port</code> call maps directly to the base Erlang open_port<ref>https://www.erlang.org/doc/apps/erts/erlang.html#open_port/2</ref> and it gives much more flexibility:<syntaxhighlight lang="elixir">
+Elixir's low-level <code>Port.open</code> maps directly to ERTS <code>open_port</code><ref>https://www.erlang.org/doc/apps/erts/erlang.html#open_port/2</ref> which provides flexibility.  Here we have a command turning some knobs:<syntaxhighlight lang="elixir">
 Port.open(
    {:spawn_executable, rsync_path},
@@ Line 26: / Line 33: @@
    ]
 )
+</syntaxhighlight>
+Progress lines have a fairly self-explanatory format:
+<syntaxhighlight lang="text">
+3,342,336  33%    3.14MB/s    0:00:02
 </syntaxhighlight>
 {{Aside|text=
-If you're here for rsync, it includes a few alternatives for progress reporting:
+rsync has a variety of progress options.  We chose overall progress above, so the percentage means "overall percent complete".
+Here is the menu:
+; <code>--info=progress2</code> : report overall progress
-; <code>--info=progress2</code> : reports overall progress
+; <code>--progress</code> : report statistics per file
-; <code>--progress</code> : reports statistics per file
-; <code>--itemize-changes</code> ; lists the operations taken on each file
-Progress reporting uses a columnar format:
+; <code>--itemize-changes</code> : list the operations taken on each file
-<syntaxhighlight lang="text">
-3,342,336  33%    3.14MB/s    0:00:02
-</syntaxhighlight>
 }}
-{{Aside|text=
+Each rsync output line is sent to the library callback <code>handle_info</code> as <code>{:data, line}</code>, and after transfer is finished it receives a conclusive <code>{:exit_status, status_code}</code>.
-On the terminal the progress line is updated in-place by restarting the line with the fun [[w:Carriage return|carriage return]] control character <code>0x0d</code> or <code>\r</code>.  This is apparently named after pushing the physical paper carriage of a typewriter and on a terminal it will erases the current line so it can be written again!  But over a pipe we see this as a regular byte in the stream, like "<code>-old line-^M-new line-</code>".  [[W:|Disagreements]] about carriage return vs. newline have caused eye-rolling since the dawn of personal computing but we can double-check the rsync source code and we see that it will format output using carriage return on any platform: <syntaxhighlight lang="c">
+Here we extract the percent_done column and strictly reject any other output:
+<syntaxhighlight lang="elixir">
+with terms when terms != [] <- String.split(line, ~r"\s", trim: true),
+     percent_done_text when is_binary(percent_done_text) <- Enum.at(terms, 1),
+     {percent_done, "%"} <- Float.parse(percent_done_text) do
+  percent_done
+else
+  _ ->
+    {:unknown, line}
+end
+</syntaxhighlight>Splitting on whitespace with <code>trim: true</code> discards the empty strings produced by repeated spaces, newlines, or the leading carriage return you can see in this line from rsync's source:
+<syntaxhighlight lang="c">
 rprintf(FCLIENT, "\r%15s %3d%% %7.2f%s %s%s", ...);
 </syntaxhighlight>
+{{Aside|text=
+On the terminal, rsync progress lines are updated in-place by emitting the fun [[w:Carriage return|carriage return]] control character <code>0x0d</code> or <code>\r</code> as you see above.  The character seems to be named after pushing the physical paper carriage of a typewriter backwards without feeding a new line.  On the terminal this overwrites the current line!
+[[w:Newline#Issues_with_different_newline_formats|Disagreements about carriage return]] vs. newline have caused eye-rolling since the dawn of personal computing.
 }}
+One more comment about this carriage return: it's a byte in the binary data coming over the pipe from rsync, but it plays a "control" function because of how it will be interpreted by the tty.  A repeated theme is that data and control are leaky categories.
 This is where Erlang/OTP really starts to shine: by opening the port inside of a dedicated gen_server<ref>https://www.erlang.org/doc/apps/stdlib/gen_server.html</ref> we have a separate thread communicating with rsync, which receives an asynchronous message like <code>{:data, text_line}</code> for each progress line.  It's easy to parse the line, update some internal state and optionally send a progress summary to the code calling the library.