I love awk as a language / framework. If it got an uplift to make it useful for more complex problems it would be an absolute winner for lots of basic data processing tasks.
I’ve often written incantations many lines long and even gone as far as writing actual awk scripts from time to time.
Once the penny drops with it, it’s great fun, but it’s absolutely useless once your problems reach any degree of sophistication.
I typically move into Python at this stage. Perl and Ruby are probably a more elegant fit here but those aren’t rows I want to have.
In this day and age, awk really needs CSV (RFC-4180) support and better semantic scoping and library support.
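For instance, naive field splitting falls over as soon as a quoted field contains the delimiter (a made-up record, just for illustration):

```shell
# RFC-4180 allows commas inside quoted fields; plain FS splitting does not:
printf '%s\n' 'name,"Doe, Jane",42' | awk -F',' '{ print NF }'
# prints 4 -- awk sees four fields where a CSV parser would see three
```

(Recent gawk did grow a --csv flag, if I remember right, but that's far from universally available.)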
I’d also think it would be neat as an embedded language for various data processing platforms but if we haven’t seen it yet I doubt we will ever see it.
EDIT: support for file formats beyond plain text would also be a winner.
Elfener
Technically an empty awk program is an implementation of cat(1) (before it came back from Berkeley waving flags, anyway).
Of course no awk will run an empty program, so the article's '1' or '"a"' or other truthy value is required.
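That is, a truthy pattern with no action defaults to { print }, so the program copies its input unchanged:

```shell
# `1` is true for every record; the omitted action defaults to { print }
printf 'a\nb\n' | awk '1'
# prints:
# a
# b
```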
sevensor
I’ve been appreciating awk more and more lately as a desktop calculator. There are things dc and bc can’t handle that awk will cheerfully compute for me. Of all the tools I can expect to be present on a Linux machine, it gives me the easiest way to compute a logarithm in a shell pipeline.
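For example, a base-10 logarithm is a one-liner:

```shell
# awk's log() is the natural log; change of base gives log10
awk 'BEGIN { print log(100) / log(10) }'
# prints 2
```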
mananaysiempre
Funny how exploiting uninitialized variables is evil in (nonstrict) JavaScript but good in Awk. I agree that’s true, mind you, I just can’t pinpoint what about the languages’ designs makes it work in one case but not the other.
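In awk's case it works because unset variables start life as 0 (or ""), which is exactly what accumulators want:

```shell
# `sum` is never declared; it starts at 0, and END prints the total
printf '1\n2\n3\n' | awk '{ sum += $1 } END { print sum }'
# prints 6
```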
ykonstant
This is overall a good guide, especially emphasizing the 'condition {action}' pattern; it is a very elegant and clear construct. However, some of the suggestions lean toward the "too clever" side and can make the code incomprehensible for the newcomer. For instance, if you do use snippets like `awk '!a[$0]++'`, make sure to comment their use for your sake and others'.
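For anyone who hasn't met it: that snippet is an order-preserving uniq, printing each line only the first time it appears:

```shell
# !seen[$0]++ : the count is 0 (falsy) the first time a line is seen, so
# !0 is true and the line prints; the post-increment makes later copies skip
printf 'a\nb\na\n' | awk '!seen[$0]++'
# prints:
# a
# b
```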
I wanted to like awk and tried really hard, but in the end I was disappointed by what I see as unnecessary complications or limitations in the language. For example, it has first-class support for regexes, but only for matching. You can’t do `s/foo/bar/`. I also found string manipulation with the string functions to be cumbersome. I would have expected a string-processing language to have better primitives for this. And function arguments/variables are just a mess; it’s hard to understand how they came up with that design. It’s also quirky and unintuitive in some places you would not expect. Take the non-working example from the article:
awk -v FS=';' -v OFS=',' 1
I expect this to change the separator in the output. Period. The “efficiency” argument for why it doesn’t work just doesn’t cut it for me. First, it’s very simple to do a one-time comparison of FS and OFS: if they are different, then you know you _have_ to perform the change, because the user is asking for it! If I do this in reality and it doesn’t work, I just switch over to sed or perl and call it a day.
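The standard workaround, for the record, is to touch a field so awk rebuilds $0 with the new OFS:

```shell
# assigning $1 to itself forces $0 to be reassembled using OFS
printf 'a;b;c\n' | awk -v FS=';' -v OFS=',' '{ $1 = $1 } 1'
# prints a,b,c
```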
All in all, perl -pe is a better awk. And for data processing I switched to miller. It has its idiosyncrasies as well, but it’s much better for working with structured records.
Seems like the link should be https://backreference.org/2010/02/10/idiomatic-awk/ instead of https://backreference.org/index.html