September 25, 2003

Quick Referral Check Script

Pipe Madness:

#!/usr/local/bin/bash

# Count outside referers in a gzipped access log, most frequent first.
zcat "$1" | cut -f11 -d" " | grep -v marginalia | grep -v search | sort |
uniq -c | sort -rn | more

For those unfamiliar with shell scripting and pipes: | takes the output of one command and passes it as input to the next. So this script pares down a referer log with lines like:

67.217.145.34 - - [02/Sep/2003:00:17:05 -0400] "GET /log/archives/i_accidentally_typed_wwwsmokingguncom_into.html HTTP/1.1" 200 5594 "http://search.yahoo.com/search?p=www.smokinggun.com&fr=ieas-dns&vm=i&n=20&fl=0&x=wrt" "Mozilla/4.0 (compatible; MSIE 5.01; MSNIA; Windows 98; YComp 5.0.2.5)"

to a list of (mostly) non-search outside referers, ordered by number of referrals.
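To see what each stage contributes, here is a rough sketch that runs the same kind of pipeline over a few made-up log lines instead of a gzipped file. The function names count_referers and sample_log and the example.org / example.net hosts are hypothetical, and the final sort -rn is my assumption for ordering by count on current systems.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read log lines on stdin, keep only field 11
# (the quoted referer), drop our own site and anything with "search"
# in it, then count what is left and list the biggest counts first.
count_referers() {
  cut -f11 -d" " | grep -v marginalia | grep -v search | sort | uniq -c | sort -rn
}

# Made-up sample lines in the same layout as the log line above.
sample_log() {
  cat <<'EOF'
10.0.0.1 - - [02/Sep/2003:00:17:05 -0400] "GET /a.html HTTP/1.1" 200 5594 "http://example.org/links" "Mozilla/4.0"
10.0.0.2 - - [02/Sep/2003:00:18:05 -0400] "GET /a.html HTTP/1.1" 200 5594 "http://search.yahoo.com/search?p=x" "Mozilla/4.0"
10.0.0.3 - - [02/Sep/2003:00:19:05 -0400] "GET /b.html HTTP/1.1" 200 1200 "http://example.org/links" "Mozilla/4.0"
10.0.0.4 - - [02/Sep/2003:00:20:05 -0400] "GET /b.html HTTP/1.1" 200 1200 "http://www.marginalia.org/log/" "Mozilla/4.0"
10.0.0.5 - - [02/Sep/2003:00:21:05 -0400] "GET /c.html HTTP/1.1" 200 800 "http://example.net/blog" "Mozilla/4.0"
EOF
}

# Prints the example.org referer with a count of 2, then the
# example.net referer with a count of 1 (padding varies by system).
sample_log | count_referers
```

Note that the counting only works because every log entry is a single line with single spaces between fields; cut is counting spaces, not parsing the log format.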

It’s things like this that make me love the command line.

Update: and here’s something that makes me hate the command line - the above was actually incorrect; there was an extra more | that snuck in there. Fixed.

Posted by Bill Stilwell at September 25, 2003 05:32 PM
Comments

That's also why people hate the command line :)
awk is also an effective tool for munging data, but one has to ask: where does it stop? Here's an interesting article: http://www.ercb.com/ddj/2000/ddj.0008.html

"This isn't just a personal opinion. Most of the scientists and engineers in the software engineering classes I teach at Los Alamos National Laboratory (LANL) have graduate degrees in theoretical physics, mechanical engineering, or similar disciplines. They're smart, and they're used to working hard. They would like to move their programs off specialized supercomputers and onto Linux-based clusters, but their efforts are being frustrated by the unnecessary complexity of configuring Bugzilla, deleting directories in CVS, quoting shell variables in make, and so on.

LANL has therefore set up a project called "Software Carpentry" (see http://www.software-carpentry.com/), the aim of which is to produce a new generation of tools that will be easier to use than those we have today. To make it easy to install these tools, write scripts for them, and make them talk to each other, the decision was made to use Python for all implementation work. (If there had been a simple, lightweight component system that ran on both Linux and Windows, it would probably have been chosen instead, but Linux implementations of COM have been hampered by patent issues, and CORBA is neither simple nor lightweight.)"

Posted by: Miles at September 28, 2003 10:38 AM

I dunno, I have to say I have little respect for people that think that they can make computers not hard for programmers. Is there unnecessary complexity? Sure, but you really think _shell variables_ are slowing people down that much?

Posted by: Bill Stilwell at September 30, 2003 12:07 AM