<?xml version="1.0"?>

<items>


        <item>
                <title>
GAWK is good
                </title>
                <apropos id="163" author="stuff" dob="1187662670" />
                <link>http://menzies.us/p/?bin=163</link>
                <category>gawk</category>
                <description>
                        <![CDATA[
            

<p>GAWK is good.</p>

<p>The language is simple.</p>

<p>Its programs are short.</p>

<p>Teaching GAWK is fast. For example, 
GAWK can be quickly taught to data mining students
and there will still be lots of time left over to explore
many mining methods.</p>

<p>But aren't there better scripting languages? 
Faster? Well, <a href="http://www.cs.columbia.edu/~sedwards/classes/2002/w4115/scripting.9up.pdf" title="Scripting languages (2002) by Stephen A. Edwards; Department of Computer Science; Columbia University;">maybe yes</a> and <a href="http://cm.bell-labs.com/cm/cs/who/bwk/interps/pap.html" title="Timing Trials, or, the Trials of Timing: Experiments with Scripting and User-Interface Languages  by Brian W. Kernighan and Christopher J. Van Wyk; ">maybe no</a>. </p>

<p>And GAWK is old (mid-70s). Aren't modern languages more productive? 
Well
again, maybe yes and maybe no. One measure of the productivity of a
language is how many lines of code are required to code up one business level
`function point'. 
Compared to many
popular languages, GAWK scores very highly:</p>

<pre><code>loc/fp   language
------   --------

    6,   excel 5
   13,   sql
   21,   awk       &lt;================
   21,   perl
   21,   eiffel
   21,   clos
   21,   smalltalk
   29,   delphi
   29,   visual basic 5
   49,   ada 95
   49,   ai shells
   53,   c++
   53,   java
   64,   lisp
   71,   ada 83
   71,   fortran 95
   80,   3rd generation default
   91,   ansi cobol 85
   91,   pascal
  107,   2nd generation default
  107,   algol 68
  107,   cobol
  107,   fortran
  128,   c
  320,   1st generation default
  640,   machine language
 3200,   natural language
</code></pre>

<p>Anyway, there are other considerations. GAWK is real succinct, simple
enough to teach, and easy enough to recode in C (if you want raw speed).
For example, here's the complete listing of someone's AWK spell-checking program.</p>

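<pre><code>BEGIN { 
    # load the dictionary; "&gt; 0" stops at end-of-file and also if the file cannot be read
    while ((getline &lt; "Usr.Dict.Words") &gt; 0) 
        dict[$0] = 1
}
{   if (!($1 in dict)) print $1   # report a first word that is not in the dictionary
}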
</code></pre>

<p>Sure, there's about a gazillion enhancements you'd like to make on this one but you gotta say, this is real succinct.</p>
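
<p>As a rough sketch of two of those enhancements (checking every word on a line, and ignoring case and punctuation), something like the following might do. It assumes the same <code>Usr.Dict.Words</code> file with one word per line; the variable names are made up for illustration:</p>

<pre><code>BEGIN {
    # load the dictionary in lower case, one word per line
    while ((getline word &lt; "Usr.Dict.Words") &gt; 0)
        dict[tolower(word)] = 1
}
{   for (i = 1; i &lt;= NF; i++) {
        w = tolower($i)
        gsub(/[^a-z']/, "", w)            # strip digits and punctuation, keep apostrophes
        if (w != "" &amp;&amp; !(w in dict))
            print NR ": " $i              # report the line number and the original token
    }
}
</code></pre>

<p>Still short, and still nowhere near a real spell checker (no suffix handling, no suggestions).</p>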

<p>For me, GAWK is like some Zen thing. If I don't know what I am doing,
the code gets dirty++. But when I get it, the GAWK code is clean (IMHO).</p>

<p>GAWK (and Prolog) are my tools in my private war against late execution
of software syndrome (a.k.a. LESS). The symptoms of LESS are a huge time
delay before a new idea is executable. In extreme cases, I can hack up in
days prototypes that it takes months to years to eternity for my students
to replicate in so-called better languages like C and JAVA and ...</p>

<p>Sure, I drool over the language features offered by more advanced
languages like pointers, generic iterators, continuations, etc etc. And
GAWK's lack of data structures (except num, string, and array) is a real
pest. So every year I take a break from GAWK and try the latest and greatest new language (Python, Ruby, etc etc).</p>

<p>But years of bitter experience have shown me that the cleverer I get, the
smaller my audience gets. If it is possible for me to explain something
succinctly in a simple language like GAWK, then it is also possible that
more folks will read my code.</p>
                        ]]>
                </description>
        </item>

   <item>
                <title>
GAWK for AI
                </title>
                <apropos id="164" author="stuff" dob="1187662714" />
                <link>http://menzies.us/p/?bin=164</link>
                <category>gawk</category>
                <description>
                        <![CDATA[

<p><em>R. Loui <a href="mailto:loui@ai.wustl.edu">loui@ai.wustl.edu</a>  is Associate Professor of Computer Science, at Washington University in St. Louis. He has published in AI Journal, Computational Intelligence, ACM SIGART, AI Magazine, AI and Law, the ACM Computing Surveys Symposium on AI, Cognitive Science, Minds and Machines, Journal of Philosophy.
He writes here about using GAWK to teach AI.</em></p>

<p>Most people are surprised when I tell them what language we use in
our undergraduate AI programming class. That's understandable. We
use GAWK. GAWK, Gnu's version of Aho, Weinberger, and Kernighan's old
pattern scanning language isn't even viewed as a programming language by
most people. Like PERL and TCL, most prefer to view it as a `scripting
language.' It has no objects; it is not functional; it does no built-in
logic programming. Their surprise turns to puzzlement when I confide
that (a) while the students are allowed to use any language they want;
(b) with a single exception, the best work consistently results from
those working in GAWK. (footnote: The exception was a PASCAL programmer
who is now an NSF graduate fellow getting a Ph.D. in mathematics at
Harvard.) Programmers in C, C++, and LISP haven't even been close (we
have not seen work in PROLOG or JAVA).</p>

<p>There are some quick answers that have to do with the pragmatics of
undergraduate programming. Then there are more instructive answers
that might be valuable to those who debate programming paradigms or to
those who study the history of AI languages. And there are some deep
philosophical answers that expose the nature of reasoning and symbolic
AI. I think the answers, especially the last ones, can be even more
surprising than the observed effectiveness of GAWK for AI.</p>

<p>First it must be confessed that PERL programmers can cobble together
AI projects well, too. Most of GAWK's attractiveness is reproduced
in PERL, and the success of PERL forebodes some of the success of
GAWK. Both are powerful string-processing languages that allow the
programmer to exploit many of the features of a UNIX environment. Both
provide powerful constructions for manipulating a wide variety of data
in reasonably efficient ways. Both are interpreted, which can reduce
development time. Both have short learning curves. The GAWK manual can
be consumed in a single lab session and the language can be mastered by
the next morning by the average student. GAWK's automatic initialization,
implicit coercion, I/O support and lack of pointers forgive many of the
mistakes that young programmers are likely to make. Those who have seen
C but not mastered it are happy to see that GAWK retains some of the
same sensibilities while adding what must be regarded as spoonsful of
syntactic sugar. Some will argue that PERL has superior functionality,
but for quick AI applications, the additional functionality is rarely
missed. In fact, PERL's terse syntax is not friendly when regular
expressions begin to proliferate and strings contain fragments of HTML,
WWW addresses, or shell commands. PERL provides new ways of doing things,
but not necessarily ways of doing new things.</p>

<p>In the end, despite minor differences, both PERL and GAWK minimize
programmer time. Neither really provides the programmer the setting in
which to worry about minimizing run-time.</p>

<p>There are further simple answers. Probably the best is the fact that
increasingly, undergraduate AI programming is involving the Web. Oren
Etzioni (University of Washington, Seattle) has for a while been arguing
that the "softbot" is replacing the mechanical engineers' robot as the
most glamorous AI testbed. If the artifact whose behavior needs to be
controlled in an intelligent way is the software agent, then a language
that is well-suited to controlling the software environment is the
appropriate language. That would imply a scripting language. If the robot
is KAREL, then the right language is <code>turn left; turn right.</code> If the
robot is Netscape, then the right language is something that can generate
<code>netscape -remote 'openURL(http://cs.wustl.edu/~loui)'</code> with elan.</p>

<p>Of course, there are deeper answers. Jon Bentley found two pearls in
GAWK: its regular expressions and its associative arrays. GAWK asks
the programmer to use the file system for data organization and the
operating system for debugging tools and subroutine libraries. There is
no issue of user-interface. This forces the programmer to return to the
question of what the program does, not how it looks. There is no time
spent programming a binsort when the data can be shipped to /bin/sort in
no time. (footnote: I am reminded of my IBM colleague Ben Grosof's advice
for Palo Alto: Don't worry about whether it's highway 101 or 280. Don't
worry if you have to head south for an entrance to go north. Just get
on the highway as quickly as possible.)</p>

<p>There are some similarities between GAWK and LISP that are
illuminating. Both provided a powerful uniform data structure
(the associative array implemented as a hash table for GAWK and the
S-expression, or list of lists, for LISP). Both were well-supported
in their environments (GAWK being a child of UNIX, and LISP being the
heart of lisp machines). Both have trivial syntax and find their power
in the programmer's willingness to use the simple blocks to build a
complex approach.</p>

<p>Deeper still, is the nature of AI programming. AI is about functionality
and exploratory programming. It is about bottom-up design and the building
of ambitions as greater behaviors can be demonstrated. Woe be to the
top-down AI programmer who finds that the bottom-level refinements, `this
subroutine parses the sentence,' cannot actually be implemented. Woe
be to the programmer who perfects the data structures for that heapsort
when the whole approach to the high-level problem needs to be rethought,
and the code is sent to the junkheap the next day.</p>

<p>AI programming requires high-level thinking. There have always been a
few gifted programmers who can write high-level programs in assembly
language. Most however need the ambient abstraction to have a higher
floor.</p>

<p>Now for the surprising philosophical answers. First, AI has discovered
that brute-force combinatorics, as an approach to generating intelligent
behavior, does not often provide the solution. Chess, neural nets, and
genetic programming show the limits of brute computation. The alternative
is clever program organization. (footnote: One might add that the former
are the AI approaches that work, but that is easily dismissed: those are
the AI approaches that work in general, precisely because cleverness is
problem-specific.) So AI programmers always want to maximize the content
of their program, not optimize the efficiency of an approach. They want
minds, not insects. Instead of enumerating large search spaces, they
define ways of reducing search, ways of bringing different knowledge
to the task. A language that maximizes what the programmer can attempt
rather than one that provides tremendous control over how to attempt it,
will be the AI choice in the end.</p>
 
<p>Second, inference is merely the expansion of notation. No matter whether
the logic that underlies an AI program is fuzzy, probabilistic, deontic,
defeasible, or deductive, the logic merely defines how strings can
be transformed into other strings. A language that provides the best
support for string processing in the end provides the best support
for logic, for the exploration of various logics, and for most forms of
symbolic processing that AI might choose to call "reasoning" instead of
"logic." The implication is that PROLOG, which saves the AI programmer
from having to write a unifier, saves perhaps two dozen lines of GAWK
code at the expense of strongly biasing the logic and representational
expressiveness of any approach.</p>

<p>I view these last two points as news not only to the programming language community, but also to much of the AI community that has not reflected on the past decade's lessons.</p>

<p>In the puny language, GAWK, which Aho, Weinberger, and Kernighan thought
not much more important than grep or sed, I find lessons in AI's trends,
AI's history, and the foundations of AI. What I have found not only
surprising but also hopeful, is that when I have approached the AI people
who still enjoy programming, some of them are not the least bit surprised.</p>
                        ]]>
                </description>
        </item>


    <item>
                <title>
Key GAWK Concepts

                </title>
                <apropos id="166" author="stuff" dob="1187662753" />
                <link>http://menzies.us/p/?bin=166</link>
                <category>gawk</category>
                <description>
                        <![CDATA[


<p>Imagine GAWK as a kind of a cut-down C language with four tricks: self-initializing variables, pattern-based programming, regular expressions, and associative arrays.</p>

<h2>Self-initializing variables.</h2>

<p>You don't need to define variables; they appear as you use them.</p>

<p>There are only three types: strings, numbers, and arrays.</p>

<p>To ensure a number is a number, add zero to it.</p>

<pre><code>x=x+0
</code></pre>

<p>To ensure a string is a string, add an empty string to it.</p>

<pre><code>x= x "" "the string you really want to add"
</code></pre>

<p>To ensure your variables aren't global, use them within a function and declare them as extra parameters in the function definition. For example, if a function is passed two variables, define it with two PLUS the local variables:</p>

<pre><code> function haslocals(passed1,passed2,         local1,local2,local3) {
        passed1=passed1+1  # scalars are passed by value, so the caller's copy is unchanged
        local1=7           # only changed locally
 }
</code></pre>

<p>Note that it's good practice to add white space between passed and local variables.</p>
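
<p>Here is a small sketch of what is and isn't visible outside a function. The names <code>demo</code>, <code>tmp</code>, and <code>Leak</code> are invented for this example. It relies on the usual AWK rule that scalar parameters are copied while array parameters are shared with the caller:</p>

<pre><code> function demo(n, arr,        tmp) {
        n = n + 1          # scalar parameter: a private copy, the caller's value is untouched
        arr["seen"] = 1    # array parameter: passed by reference, the caller sees this
        tmp = 7            # declared as an extra parameter, so local
        Leak = 8           # not declared anywhere, so it becomes a global
 }
 BEGIN {
        x = 1
        demo(x, a)
        print x              # 1
        print ("seen" in a)  # 1
        print "[" tmp "]"    # []  (tmp never existed outside demo)
        print Leak           # 8
 }
</code></pre>

<p>The last line is the one to worry about: forget to list a variable among the extra parameters and it quietly escapes into the global namespace.</p>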

<h2>Pattern-based programming</h2>

<p>GAWK programs can contain functions AND pattern/action pairs.</p>

<p>If the pattern is satisfied, the action is called.</p>

<pre><code> /^\.P1/ { if (p != 0) print ".P1 after .P1, line", NR;
           p = 1;
         }
 /^\.P2/ { if (p != 1) print ".P2 with no preceding .P1, line", NR;
           p = 0;
         }
 END     { if (p != 0) print "missing .P2 at end" }
</code></pre>

<p>Two magic patterns are BEGIN and END. These are true before and after all the input files are read. Use END for end actions (e.g. final reports) and BEGIN for start-up actions such as initializing default variables, setting the field separator, or resetting the seed of the random number generator:</p>

<pre><code> BEGIN {
        while ((getline &lt; "Usr.Dict.Words") &gt; 0) #slurp in dictionary 
                dict[$0] = 1
        FS=",";                            #set field separator
        srand();                           #reset random seed
        Round=10;                          #always start globals with U.C.
 }
</code></pre>

<p>The default action is {print $0}; i.e. print the whole line.</p>

<p>The default pattern is <code>1</code>; i.e. true.</p>

<p>Patterns are checked, top to bottom, in source-code order.</p>

<p>Patterns can contain regular expressions. In the above example <code>/^\.P1/</code> means "front of line followed by a full stop followed by P1". 
Regular expressions are important enough for their own section.</p>
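
<p>To see the default action and the default pattern at work, here is a minimal sketch (the counters <code>n</code> and <code>comments</code> are just illustrative names). Each rule is tried on every input line, top to bottom:</p>

<pre><code> NR % 2 == 0                  # pattern only: the default action prints the even-numbered lines
 { n++ }                      # action only: the default pattern is true, so this counts every line
 /^#/ { comments++ }          # both: count the lines that start with '#'
 END  { print n+0, "lines,", comments+0, "comment lines" }
</code></pre>

<p>The <code>+0</code> in the END rule is the "ensure a number is a number" trick from above, so an empty input still prints zeros.</p>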

<h2>Regular Expressions</h2>

<p>Do you know what these mean?</p>

<ul>
<li>/^[ \t\n]*/</li>
<li>/[ \t\n]*$/</li>

<li>/^[+-]?([0-9]+[.]?[0-9]*|[.][0-9]+)([eE][+-]?[0-9]+)?$/</li>
</ul>

<p>Well, the first two are leading and trailing blank spaces on a line and the last one is the definition of an IEEE-standard number written as a regular expression. Once we know that, we can do a bunch of common tasks like trimming away white space around a string:</p>

<pre><code>  function trim(s,     t) {
    t=s;
    sub(/^[ \t\n]*/,"",t);
    sub(/[ \t\n]*$/,"",t);
    return t
 }
</code></pre>

<p>or recognize something that isn't a number:</p>

<pre><code>if ( $i !~ /^[+-]?([0-9]+[.]?[0-9]*|[.][0-9]+)([eE][+-]?[0-9]+)?$/ ) 
    {print "ERROR: " $i " not a number"}
</code></pre>

<p>Regular expressions are an astonishingly useful tool supported
by many languages (e.g. Awk, Perl, Python, Java). The
following notes review the basics. For full details, see
<a href="http://www.gnu.org/manual/gawk-3.1.1/html_node/Regexp.html#Regexp">http://www.gnu.org/manual/gawk-3.1.1/html_node/Regexp.html#Regexp</a>.</p>

<p>Syntax: Here are the basic building blocks of regular expressions:</p>

<p><strong>c</strong> <br />
matches the character c (assuming c is a character with no special meaning in regexps).</p>

<p><strong>\c</strong> <br />

matches the literal character c; e.g. tabs and newlines are \t and \n respectively.</p>

<p><strong>.</strong> <br />
matches any single character (in GAWK, this includes newline).</p>

<p><strong>^</strong> <br />
matches the beginning of a line or a string.</p>

<p><strong>$</strong> <br />

matches the end of a line or a string.</p>

<p><strong>[abc...]</strong> <br />
matches any of the characters abc... (character class).</p>

<p><strong>[^abc...]</strong> <br />
matches any character except abc... (negated character class).</p>

<p><strong>r*</strong> <br />

matches zero or more r's.</p>

<p>And that's enough to understand our trim function shown above. The regular expression <strong>/[ \t\n]*$/</strong> means trailing whitespace; i.e. zero-or-more spaces, tabs, or newlines followed by the end of line.</p>

<h2>More Syntax:</h2>

<p>But that's only the start of regular expressions. There's lots more. For example:</p>

<p><strong>r+</strong> <br />

matches one or more r's.</p>

<p><strong>r?</strong> <br />
matches zero or one r's.</p>

<p><strong>r1|r2</strong> <br />
matches either r1 or r2 (alternation).</p>

<p><strong>r1r2</strong> <br />

matches r1, and then r2 (concatenation).</p>

<p><strong>(r)</strong> <br />
matches r (grouping).</p>

<p>Now we can read <strong>^[+-]?([0-9]+[.]?[0-9]*|[.][0-9]+)([eE][+-]?[0-9]+)?$</strong> like this:</p>

<p><strong>^[+-]? ...</strong> <br />

Numbers begin with zero or one plus or minus signs.</p>

<p><strong>...[0-9]+...</strong> <br />
Simple numbers are just one or more digits.</p>

<p><strong>...[.]?[0-9]*...</strong> <br />
which may be followed by a decimal point and zero or more digits.</p>

<p><strong>...|[.][0-9]+...</strong> <br />

Alternatively, a number can have no leading digits and just start with a decimal point.</p>

<p><strong>.... ([eE]...)?$</strong> <br />
Also, there may be an exponent added</p>

<p><strong>...[+-]?[0-9]+)?$</strong> <br />
and that exponent is a positive or negative bunch of digits.</p>
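
<p>To check that reading, the same pattern can be wrapped in a function and tried on a few strings. This is only a sketch; <code>isnum</code> and the test strings are made up for the example:</p>

<pre><code> function isnum(s) {
        return s ~ /^[+-]?([0-9]+[.]?[0-9]*|[.][0-9]+)([eE][+-]?[0-9]+)?$/
 }
 BEGIN {
        n = split("42 -3.14 .5 1e10 6.02E+23 abc 1.2.3 e5 +", test, " ")
        for (i = 1; i &lt;= n; i++)
                print test[i], (isnum(test[i]) ? "number" : "not a number")
 }
</code></pre>

<p>The first five strings are accepted. The last four are rejected: <code>abc</code> has no digits, <code>1.2.3</code> has a second decimal point, <code>e5</code> is an exponent with nothing in front of it, and <code>+</code> is a sign with no digits.</p>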

<h2>Associative arrays</h2>

<p>GAWK has arrays, but they are only indexed by strings. This can be very useful, but it can also be annoying. For example, we can count the frequency of words in a document (ignoring the icky part about printing them out):</p>

<pre><code>gawk '{for(i=1;i &lt;=NF;i++) freq[$i]++ }' filename
</code></pre>

<p>The array will hold an integer value for each word that occurred in
the file. Unfortunately, this treats <code>foo</code>, <code>Foo</code>, and
<code>foo,</code> as different words.</p>
<p>Oh well. How do we print out these
frequencies? GAWK has a special <code>for</code> construct that loops over
the indices of an array. This script is longer than most command lines,
so it will be expressed as an executable script:</p>

<pre><code> #!/usr/bin/awk -f
  {for(i=1;i &lt;=NF;i++) freq[$i]++ }
  END{for(word in freq) print word, freq[word]  }

</code></pre>
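
<p>One possible way around the case and punctuation problem is to normalize each word before counting it. The following is a sketch, not the only way to do it; note that it also strips apostrophes, so <code>don't</code> is counted as <code>dont</code>:</p>

<pre><code> #!/usr/bin/awk -f
 {  for (i = 1; i &lt;= NF; i++) {
        word = tolower($i)                 # Foo and foo become the same key
        gsub(/[^a-z0-9]/, "", word)        # drop punctuation, so "foo," becomes "foo"
        if (word != "") freq[word]++       # skip tokens that were all punctuation
    }
 }
 END { for (word in freq) print word, freq[word] }
</code></pre>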

<p>You can find out if an element exists in an array at a certain index with the expression:</p>

<pre><code>index in array
</code></pre>

<p>This expression tests whether or not the particular index exists,
    without the side effect of creating that element if it is not present.</p>

<p>You can remove an individual element of an array using the delete statement:</p>

<pre><code>delete array[index]
</code></pre>

<p>It is not an error to delete an element which does not exist.</p>
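
<p>The difference between testing with <code>in</code> and simply referencing an element is easy to miss, so here is a small sketch. The array <code>seen</code> and its keys are invented, and the example assumes GAWK's <code>length(array)</code>, which returns the number of elements:</p>

<pre><code> BEGIN {
        seen["apple"] = 1
        if ("pear" in seen) print "pear is present"      # not printed, and nothing is created
        print length(seen)                               # 1
        if (seen["pear"] == "") print "pear looks empty" # referencing it creates seen["pear"]
        print length(seen)                               # 2
        delete seen["pear"]
        delete seen["plum"]                              # deleting a missing element is not an error
        print length(seen)                               # 1
 }
</code></pre>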

<p>GAWK has a special kind of for statement for scanning an array:</p>

<pre><code> for (var in array)
        body
</code></pre>

<p>This loop executes body once for each different value that your program has previously used as an index in array, with the variable var set to that index.</p>

<p>The order in which the array is scanned is not defined.</p>

<p>To scan an array in some numeric order, you need to use keys 1,2,3,... and store somewhere that the array is N long. Then you can loop over the keys from 1 to N.</p>

<p>Here are some useful array functions. We begin with the usual stack stuff. These stacks have items 1,2,3,... and position 0 is reserved for the size of the stack:</p>

<pre><code> function top(a)        {return a[a[0]]}
 function push(a,x,  i) {i=++a[0]; a[i]=x; return i}
 function pop(a,   x,i) {
   i=a[0];
   if (!i) {return ""}    # empty stack: leave the stored size at zero
   x=a[i]; delete a[i]; a[0]=i-1; return x}
</code></pre>

<p>The pop function can be used in the usual way:</p>

<pre><code> BEGIN {push(a,1); push(a,2); push(a,3);
        while(x=pop(a)) print x }
 3
 2
 1
</code></pre>

<p>We can dump everything in an array into a string:</p>

<pre><code> function a2s(a,  i,s) {
        s=""; 
        for (i in a) {s=s " " i "= [" a[i]"]\n"}; 
        return s}

  BEGIN {push(L,1); push(L,2); push(L,3);
        print a2s(L);}
  0= [3]
  1= [1]
  2= [2]
  3= [3]

</code></pre>
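
<p>Since the scan order of <code>for (i in a)</code> is not defined, the lines printed by a dump like that can come out in any order. When the array follows the stack convention (items at 1,2,3,... and the size at position 0), a numeric loop gives a predictable order. A sketch, with <code>a2s_ordered</code> as a made-up name and <code>push</code> repeated so the example runs on its own:</p>

<pre><code> function push(a,x,  i) {i=++a[0]; a[i]=x; return i}
 function a2s_ordered(a,  i,s) {
        s = ""
        for (i = 1; i &lt;= a[0]; i++)      # a[0] holds the size, so walk 1..N in order
                s = s " " i "= [" a[i] "]\n"
        return s
 }
 BEGIN { push(L,"x"); push(L,"y"); push(L,"z")
         printf "%s", a2s_ordered(L) }
</code></pre>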

<p>And we can go the other way and convert a string into an array using the builtin split function. These pod files were built using a recursive include function that seeks patterns of the form:</p>

<p><em>^=include file</em></p>

<p>This function splits lines on space characters into the array `a' then looks for =include in a[1]. If found, it calls itself recursively on a[2]. Otherwise, it just prints the line:</p>

<pre><code> function rinclude (line,    x,a) {
   split(line,a,/ /);
   if ( a[1] ~ /^=include/ ) { 
     while ( ( getline x &lt; a[2] ) &gt; 0) rinclude(x);
     close(a[2])}
   else {print line}
 }

</code></pre>

<p>Note that the third argument of the split function can be any regular expression.</p>
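
<p>For instance, here is a sketch that splits on commas or semicolons and swallows any spaces around them (the input string and the array name <code>part</code> are invented):</p>

<pre><code> BEGIN {
        n = split("a, b;c ,d", part, /[ ]*[,;][ ]*/)
        for (i = 1; i &lt;= n; i++) print i, part[i]
 }
</code></pre>

<p>This prints the four pieces <code>a</code>, <code>b</code>, <code>c</code>, and <code>d</code>, numbered 1 to 4, because <code>split</code> returns the number of pieces and fills the array from index 1.</p>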

<p>By the way, here's a nice trick with arrays. To print the lines in a file in a random order:</p>

<pre><code> BEGIN {srand()}
       {Array[rand()]=$0}
 END   {for(I in Array) print Array[I]}
</code></pre>

<p>Short, heh? This is not a perfect solution. GAWK can only generate
1,000,000 different random numbers so the <a href="http://burtleburtle.net/bob/hash/birthday.html">birthday theorem</a> cautions
that there is a small chance that the lines will be lost when different
lines are written to the same randomly selected location. After some
experiments, I can report that you lose around one item after 1,000
inserts and 10 to 12 items after 10,000 random inserts. Nothing to write
home about really. But for larger item sets, the above three liner is not
what you want to use. For example 10,000 to 12,000 items (more than 10%)
are lost after 100,000 random inserts. Not good!</p>
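
<p>For larger inputs, one alternative that cannot lose lines is to store them under the keys 1..NR and then shuffle those keys in place. The following is a sketch of the standard Fisher-Yates shuffle, not the method used for the experiments above; <code>Line</code>, <code>j</code>, and <code>tmp</code> are illustrative names:</p>

<pre><code> BEGIN { srand() }
       { Line[NR] = $0 }                  # keys 1..NR are unique, so nothing is overwritten
 END   { for (i = NR; i &gt; 1; i--) {       # Fisher-Yates: swap slot i with a random slot 1..i
             j = int(rand() * i) + 1
             tmp = Line[i]; Line[i] = Line[j]; Line[j] = tmp
         }
         for (i = 1; i &lt;= NR; i++) print Line[i]
       }
</code></pre>

<p>It is longer than the three liner, but collisions between random numbers now only affect how the lines are mixed, never whether a line survives.</p>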
                        ]]>
                </description>
        </item>



        <item>
                <title>
Running GAWK
                </title>
                <apropos id="165" author="stuff" dob="1187662715" />
                <link>http://menzies.us/p/?bin=165</link>
                <category>gawk</category>
                <description>
                        <![CDATA[

There are four standard ways to run GAWK source code:
<ol>
<li> With a GAWK interpreter, using <tt>-f</tt>;

<li> With a "she-bang" line;
<li> As part of another script;
<li> With the GAWK debugging options.
</ol>

<h2>gawk -f</h2>

The most common way is to write the source in a file <tt>x.awk</tt>, and then ask the GAWK interpreter to run it.
E.g.
<pre>
gawk -f x.awk -f y.awk -f z.awk InputFile
</pre>

<p>Note that multiple files can be run using multiple <tt>-f</tt> flags.</p>

<h2>"She-bang"</h2>

<p>Another way, which involves less typing on the command line, is to include all your GAWK code in one file called, say,
<tt>all</tt>, then  add a "she-bang" to  the first line; e.g.</p>

<pre>
#!/usr/bin/gawk -f
# /* vim: set filetype=awk : */ -*- awk -*- 

.. rest of the awk code

</pre>

<p>Line one of this file tells the operating system to run this script using the interpreter <tt>/usr/bin/gawk</tt>.
Note that if you move this code to another machine then the first line must be changed to point to the GAWK
interpreter on that machine.</p>

<p>Line two of this file is optional and contains
some editor-specific commands that tell VIM and EMACS to highlight this code
as if it was GAWK syntax.</p>

<p>Once such an <tt>all</tt> file is made executable (with <tt>chmod +x all</tt>) then it can be run on the
command line like any other GAWK script:</p>

<pre>
./all InputFile
</pre>

<h2>As part of another script</h2>

<p>It is standard to use GAWK scripts as workers in some other scripting language. For example, a Unix
BASH
script could be:</p>

<pre>
#!/bin/bash
# /* vim: set filetype=sh : */ -*- sh -*- 
gawk -f x.awk -f y.awk -f z.awk Pass=1 "$1" Pass=2 "$1"  </pre>
(Note that many scripting languages like GAWK and BASH
support she-bang and the editor commands on lines one and two.)

This script runs some data file through GAWK in two passes (perhaps pass one collects some statistics
and pass two fills in missing values with the mean values).
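
<p>As a sketch of what the GAWK side of such a two-pass script might look like, suppose (purely for illustration) that the data is whitespace-separated, the value of interest is in column one, and missing values are marked with <code>?</code>. Because <code>Pass=1</code> and <code>Pass=2</code> are variable assignments on the command line, GAWK sets <code>Pass</code> just before it starts reading the file that follows each assignment:</p>

<pre><code> # pass one: collect the sum and count of the known values in column one
 Pass == 1 &amp;&amp; $1 != "?" { sum += $1; n++ }

 # pass two: replace each missing value with the mean, then print the line
 Pass == 2 { if ($1 == "?" &amp;&amp; n &gt; 0) $1 = sum / n
             print }
</code></pre>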

<h2> With the GAWK debugging options</h2>

It is very useful to add the following line to your <tt>$HOME/.bashrc</tt> file:
<pre>
export Audit="pgawk --profile=$HOME/tmp/awkprof.out 
                    --dump-variables=$HOME/tmp/awkvars.out 
                    --lint "
</pre>

<p>Then, if you run GAWK programs as follows, you will get a lot of debugging information about your
GAWK program:</p>

<pre>
$Audit -f x.awk -f y.awk -f z.awk InputFile
</pre>

<p>Specifically, the file <tt>$HOME/tmp/awkprof.out</tt>

will show how many times each line of the program was run while it processed <tt>InputFile</tt>. This can be
used to:</p>

<ul>
<li>Find bad conditionals. If a condition is run many times, but the condition body is never run, then
the condition is never satisfied.
<li>Find dead code. If a line of code is never used, then maybe it is unnecessary.
<li>Optimize your code. If some function gets called much more than the rest of  the code, then that
is a function that might need optimization.
</ul>

<p>Also, the file <tt>$HOME/tmp/awkvars.out</tt> will list all the global variables in your GAWK code.
I read <tt>awkvars.out</tt> looking for bad globals; i.e. variables that I forgot to declare as local
and so become globals. I've lost weeks of my life debugging functions that are failing because of bad globals.
From bitter experience, I've learned to:</p>

<ul>
<li>Name all my locals using lower case;
<li>Name all my globals in MixedCase;
<li>Run <tt>$Audit</tt> frequently, looking for lower case globals in <tt>awkvars.out</tt>. Such globals
are really local variables that have escaped from a function and should be declared local in their home
function.
</ul>

<p>Finally, running <tt>$Audit</tt> generates pages of lint warnings, most of which can be ignored. However,
some deserve your attention, such as a function that is called but never defined.</p>

                        ]]>
                </description>
        </item>



        <item>
                <title>
Alternatives to GAWK
                </title>
                <apropos id="167" author="stuff" dob="1187662753" />
                <link>http://menzies.us/p/?bin=167</link>
                <category>gawk</category>
                <description>
                        <![CDATA[



<p>Nice lecture notes comparing different scripting languages: 
<a href="http://www.cs.utk.edu/~plank/plank/classes/cs494/notes.html">http://www.cs.utk.edu/~plank/plank/classes/cs494/notes.html</a></p>

<p>A shoot-em-up between N languages, including GAWK: 
<a href="http://www.bagley.org/~doug/shootout/craps.shtml">http://www.bagley.org/~doug/shootout/craps.shtml</a></p>

<p>GAWK has some advantages over other scripting languages such as Perl:</p>

<ul>
<li>GAWK is simpler (especially important if deciding which to learn first). </li>

<li>GAWK syntax is far more regular (another advantage for the beginner, even without considering syntax-highlighting editors)</li>
<li>you may already know GAWK well enough for the task at hand</li>
<li>you may have only GAWK installed</li>
<li>GAWK can be smaller, thus much quicker to execute for small programs</li>
<li>GAWK variables don't have <code>$</code> in front of them :-)</li>
<li>Clear perl code is better than unclear GAWK code; but NOTHING comes close to unclear perl code</li>
<li>Tom Christiansen may have said it best: <em>GAWK is a venerable, powerful, elegant, and simple tool that everyone should know. Perl is a superset and child of AWK, but has much more power that comes at expense of sacrificing some of that simplicity.</em></li>

</ul>

<h2>Some Gawk vs PERL Samples</h2>

<p>Here are a few short programs that do the same thing in each language. When reading these examples, the question to ask is `how many language features do I need to understand in order to understand the syntax of these examples'.</p>

<p>Some of these are longer than they need to be since they don't exploit some (e.g.) command line trick to wrap the code in <code>for each line do X</code>. And that is the point- for teachability, the preferred language is the one you need to know LESS about before you can be useful in it.</p>

<h3>hello world</h3>

<p>PERL:</p>

<pre><code> print "hello world\n"
</code></pre>

<p>GAWK:</p>

<pre><code> BEGIN { print "hello world" }
</code></pre>

<h3>One plus one</h3>

<p>PERL</p>

<pre><code> $x= $x+1;
</code></pre>

<p>GAWK</p>

<pre><code> x= x+1
</code></pre>

<h3>Printing</h3>

<p>PERL</p>

<pre><code> print $x, $y, $z;
</code></pre>

<p>GAWK</p>

<pre><code> print x,y,z
</code></pre>

<h3>Printing the first field in a file</h3>

<p>PERL</p>

<pre><code> while (&lt;&gt;) { 
   @f = split(/ /);
   print "$f[0]\n" 
 }
</code></pre>

<p>GAWK</p>

<pre><code> { print $1 }
</code></pre>

<h3>Printing lines, reversing fields</h3>

<p>PERL</p>

<pre><code> while (&lt;&gt;) { 
  chomp;
  @f = split(/ /);
  print "$f[1] $f[0]\n" 
 }
</code></pre>

<p>GAWK</p>

<pre><code> { print $2, $1 }
</code></pre>

<h3>Concatenation of variables</h3>

<p>PERL</p>

<pre><code> $command = "cat $fname1 $fname2 &gt; $fname3";
</code></pre>

<p>GAWK</p>

<pre><code> command = "cat " fname1 " " fname2 " &gt; " fname3
</code></pre>

<h3>Looping</h3>

<p>PERL:</p>

<pre><code> for (1..10) { print $_,"\n" }
</code></pre>

<p>GAWK:</p>

<pre><code> BEGIN { 
  for (i=1; i&lt;=10; i++) print i
 }
</code></pre>

<h3>Pairs of numbers</h3>

<p>PERL:</p>

<pre><code> for (1..10) { print "$_ ",$_-1 }
 print "\n"
</code></pre>

<p>GAWK:</p>

<pre><code> BEGIN { 
  for (i=1; i&lt;=10; i++) printf i " " i-1
  print ""
 }
</code></pre>

<h3>List of words into a hash</h3>

<p>PERL</p>

<pre><code>  foreach $x ( split(/ /,"this is not stored linearly") ) 
  { print "$x\n" }
</code></pre>

<p>GAWK</p>

<pre><code> BEGIN { 
  split("this is not stored linearly",temp)
  for (i in temp) print temp[i]
 }
</code></pre>

<h3>Printing a hash in some key order</h3>


<p>PERL</p>

<pre><code> @w = split(/ /,"this is not stored linearly");
 $n = @w;
 for $i (0..$n-1) { print "$i $w[$i]\n" }
 print "\n";
 for $i (@w) { print ++$j," ",$i,"\n" }
</code></pre>

<p>GAWK</p>

<pre><code> BEGIN { 
  n = split("this is not stored linearly",temp)
  for (i=1; i&lt;=n; i++) print i, temp[i]
  print ""
  for (i in temp) print i, temp[i]
 }
</code></pre>

<h3>Printing all lines in a file</h3>

<p>PERL</p>

<pre><code> open file,"/etc/passwd";
 while (&lt;file&gt;) { print $_ }
</code></pre>

<p>GAWK</p>

<pre><code>BEGIN { 
      while ((getline line &lt; "/etc/passwd") &gt; 0) print line
 }

</code></pre>

<h3>Printing a string</h3>

<p>PERL</p>

<pre><code> $x = "this " . "that " . "\n";
 print $x
</code></pre>

<p>GAWK</p>

<pre><code> BEGIN {
  x = "this " "that " "\n" ; printf "%s", x
 }

</code></pre>

<h3>Building and printing an array</h3>

<p>PERL</p>

<pre><code> $assoc{"this"} = 4;
 $assoc{"that"} = 4;
 $assoc{"the other thing"} = 15;
 for $i (keys %assoc) { print "$i $assoc{$i}\n" }
</code></pre>

<p>GAWK</p>

<pre><code> BEGIN {
   assoc["this"] = 4
   assoc["that"] = 4
   assoc["the other thing"] = 15
   for (i in assoc) print i,assoc[i]
 }

</code></pre>

<h3>Sorting an array</h3>

<p>PERL</p>

<pre><code> @w = split(/ /,"this will be sorted once in an array");
 foreach $i (sort @w) { print "$i\n" }
</code></pre>

<p>GAWK</p>

<pre><code> BEGIN {
  split("this will be sorted once in an array",temp," ")
  for (i in temp) print temp[i] | "sort"
  close("sort")   # close the pipe so sort emits its sorted output
 }

</code></pre>

<h3>Sorting an array (#2)</h3>

<p>GAWK</p>

<pre><code> BEGIN {
  split("this will be sorted once in an array",temp," ")
  n=asort(temp)
  for (i=1;i&lt;=n;i++) print temp[i] 
 }
</code></pre>

<h3>Print all lines, vowels changed to stars</h3>

<p>PERL</p>

<pre><code> while (&lt;STDIN&gt;) {
  s/[aeiou]/*/g;
  print $_
 }
</code></pre>

<p>GAWK</p>

<pre><code> {gsub(/[aeiou]/,"*"); print }
</code></pre>

<h3>Report from file</h3>

<p>PERL</p>

<pre><code> #!/pkg/gnu/bin/perl
 # this is a comment
 #
 open(stream1,"w | ");
 while ($line = &lt;stream1&gt;) {
   ($user, $tty, $login, $junk) = split(/ +/, $line, 4);
   print "$user $login ",substr($line,49)
 }
</code></pre>

<p>GAWK</p>

<pre><code> #!/pkg/gnu/bin/gawk -f
 # this is a comment
 #
 BEGIN {
   while ("w" | getline) {
     user = $1; tty = $2; login = $3
     print user, login, substr($0,49)
   }
 }
</code></pre>

<h3>Web Slurping</h3>

<p>PERL</p>

<pre><code> open(stream1,"lynx -dump 'cs.wustl.edu/~loui' | ");
 while ($line = &lt;stream1&gt;) {
   if ($flag &amp;&amp; $line =~ /[0-9]/) { print $line }
   if ($line =~ /References/) { $flag = 1 }
 }
</code></pre>

<p>GAWK</p>

<pre><code> BEGIN {
  com = "lynx -dump 'cs.wustl.edu/~loui' 2&gt;&amp;1"
  while (com | getline line) {
    if (flag &amp;&amp; line ~ /[0-9]/) { print line }
    if (line ~ /References/) { flag = 1 }
  }
 }

</code></pre>
                        ]]>
                </description>
        </item>
   <item>
                <title>
Teaching GAWK

                </title>
                <apropos id="169" author="stuff" dob="1187663945" />
                <link>http://menzies.us/p/?bin=169</link>
                <category>gawk</category>
                <description>
                        <![CDATA[


<p>Whenever Ronald Loui teaches GAWK, he gives the students the choice of learning PERL instead. Ninety percent will choose GAWK after looking at a few simple examples of each language (samples shown below). Those who choose PERL do so because someone told them to learn PERL.</p>

<p>After one laboratory, more than half of the GAWK students are confident with their GAWK skills and can begin designing. Almost no student can become confident in PERL that quickly.</p>

<p>After a week, 90% of those who have attempted GAWK have mastered it, while fewer than 50% of PERL students attain similar facility with the language (it would be unfair to require one to `master' PERL).</p>

<p>By the end of the semester, over 90% of those who have attempted GAWK have succeeded, compared with about two-thirds of those who have attempted PERL.</p>

<p>To be fair, within a year, half of the GAWK programmers have also studied PERL. Most are doing so in order to read PERL and will not switch to writing PERL. No one who learns PERL migrates to GAWK.</p>

<p>PERL and GAWK appear to have similar programming, development, and debugging cycle times.</p>

<p>Finally, after a year, GAWK seems to hold a small advantage over PERL in the programmer's willingness to begin a new program. That is, both GAWK and PERL programmers tend to enjoy writing a lot of programs, but GAWK has the slight edge here.</p>

                        ]]>
                </description>
        </item>


</items>
