I just made the following request to the GNU coreutils team.
From Annada Behera <annada@tilde.green> to tilde.meta on Wed May 28 15:52:48 2025
Dear GNU Coreutils maintainers,
I am writing to propose a backward-compatible enhancement that could
improve modern scripting environments while maintaining complete
compatibility with existing workflows and without any impact on
performance.
PROBLEM:
Although the output of the coreutils is meant primarily to be human
readable, many scripts today pipe it to other commands for various
kinds of automation. This leads to brittle solutions involving
complex awk/sed/grep gymnastics that break when the output format
changes even slightly. While the "everything is text" philosophy has
served GNU/Unix/Linux well, structured data processing has become
important in modern computing.
Even Microsoft recognized this more than 20 years ago: PowerShell has
had built-in structured output from day one, eliminating text parsing
entirely. Cloud tools like Docker, kubectl, GitHub's gh, Google's
gcloud, and a growing number of other CLI tools provide JSON output
flags, and shells like Nushell have reimplemented most of the
coreutils to output structured data. This is not unprecedented in the
industry.
PROPOSAL: stdoutm and stderrm
I would like to propose adding two new optional machine-readable
output streams, alongside the already present human-readable streams:
- stdout (fd 1): human readable output
- stderr (fd 2): human readable errors
- stdoutm (fd 3): machine readable output (NEW)
- stderrm (fd 4): machine readable errors (NEW)
The machine-readable output format and conventions still need to be
established. JSON is the most obvious choice: its parsers and tools
are battle-tested and immediately available to the scripting
ecosystem. This could be implemented incrementally, starting with
"high-usage" commands (ls, ps, df, du) and gradually expanding
coverage.
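For illustration only (I am not proposing a specific schema here; the
field names below are hypothetical), an ls entry emitted on fd 3 might
look like:
[
  {"name": "notes.txt", "size": 2097152, "mode": "-rw-r--r--"},
  {"name": "src", "size": 4096, "mode": "drwxr-xr-x"}
]
A size field like this is what the jq/fx examples further down filter
on.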
If the structured output is generated only when fd 3/4 are open,
there should be no performance penalty and all existing behavior will
remain identical. It also doesn't require any new flags or arguments.
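As a rough sketch only (this is not coreutils code; the helper name
and the JSON fields are invented for illustration), a utility could
test whether the caller opened fd 3 and emit JSON there only in that
case:
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>

/* True if the descriptor was inherited open from the caller. */
static bool fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}

int main(void)
{
    /* Human readable output on stdout, exactly as today. */
    printf("example.txt\n");

    /* Machine readable output only if the caller opened fd 3,
       e.g. via `prog 3>out.json`; a closed fd means no extra work. */
    if (fd_is_open(3)) {
        FILE *m = fdopen(3, "w");
        if (m != NULL) {
            fprintf(m, "[{\"name\":\"example.txt\",\"size\":1024}]\n");
            fclose(m);
        }
    }
    return 0;
}
Run plainly, this behaves exactly as today; run as "prog 3>out.json",
the JSON lands in out.json.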
EXAMPLES:
# Traditional usage - UNCHANGED
ls -l
# Structured output
ls 3>metadata.json 1>/dev/null
# Structured output scripting
ls 3>&1 1>/dev/null | fx 'this.filter(x => x.size > 1048576)'
ls 3>&1 1>/dev/null | jq '.[] | select(.size > 1048576)'
# Traditional brittle approach (unreadable)
ls -la | grep -v '^d' | awk '$5 > 1048576 {print $9}'
# Structured error handling
find / -name "*.txt" 4>&1 1>/dev/null 2>/dev/null | jq '.[] | select(.error == "EACCES")'
This eliminates fragile regex-based approaches, provides structured
error handling, and integrates with already present tools like fx, jq
and Python, all while making sure existing scripts are not affected
at all (allowing a gradual transition to structured output).
Would the maintainer team be interested in discussing this further?
Thank you for your time and consideration.
Annada
--- Synchronet 3.20a-Linux NewsLink 1.2