Filters and Regular Expressions

Client/Server Technology Part 1: The Unix Operating System - Filters & Regular Expressions

Selecting Fields with ‘cut’

• The cut command expects exactly one delimiter between two fields
• A run of several whitespace characters may confuse it

Example: Try to print only file size and name

    $ ls -l gnasl
    -rw-r--r--  1 hugo staff  2894 Feb 12 14:14 gnasl
    $ ls -l | cut -d ' ' -f 5,9
    staff 12
    $ _

The ‘awk’ Filter

• Strictly speaking, not just a filter but a programming language
• Without knowing the language, it’s still useful for some tasks

Example: Select fields from ls -l output with awk

    $ ls -l gnasl | awk '{ print $5, $9 }'
    2894 gnasl
    $ ls -l gnasl | awk '{ print $5, "\t", $9 }'
    2894 	 gnasl
    $ _
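The two ways of selecting fields can be compared side by side. The following is a minimal sketch, not part of the original slides: the sample line is an assumed `ls -l` line with padded columns, and `tr -s ' '` is one common way to squeeze repeated spaces so that cut sees a single delimiter.

```shell
#!/bin/sh
# Assumed sample line in the style of "ls -l" output, with padded columns.
line='-rw-r--r--   1 hugo     staff        2894 Feb 12 14:14 gnasl'

# Squeeze runs of spaces into one, so cut finds exactly one delimiter per gap.
with_cut=$(printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f 5,9)

# awk splits on any run of whitespace by default, no squeezing needed.
with_awk=$(printf '%s\n' "$line" | awk '{ print $5, $9 }')

echo "cut: $with_cut"
echo "awk: $with_awk"
```

Both variants select the size and the name. The awk version is shorter because its default field splitting already ignores the padding.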

Herbert Martin Dietze


Regular Expressions

• Regular Expressions can be used for describing Text Patterns
• Example: ^g matches text lines starting with a lowercase “g”
• Dialects differ, depending on the tools used

Basic Operators

These are understood by most tools supporting regular expressions:

    \            activates or deactivates an operator, example: "\\" produces a backslash
    [AaBbCc]     matches one character from the set {A, a, B, b, C, c}
    [a-z]        matches a range, here between “a” and “z”
    [^a-z]       matches one character that is not within the range specified here
    .            matches one character (any)
    *            matches zero or more occurrences of the preceding expression, example: " *" matches any number of space characters
    ^            matches the beginning of the current line
    $            matches the current line’s end
    \< and \>    match the beginning and the end of a word, example: "\<Hugo\>" matches “Hugo” as a whole word
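A few of the basic operators can be tried out with grep. This is a small sketch under assumptions: the sample lines are made up for illustration, and only operators that plain grep is generally expected to understand are used.

```shell
#!/bin/sh
# Hypothetical sample text, one entry per line.
sample=$(printf '%s\n' 'gnasl' 'Gnu' 'big  gap' 'log.txt' 'logs')

# ^g : lines starting with a lowercase "g"
starts_with_g=$(printf '%s\n' "$sample" | grep '^g')

# [Gg]n : an "n" preceded by either "G" or "g" (-c counts matching lines)
either_case=$(printf '%s\n' "$sample" | grep -c '[Gg]n')

# s$ : lines ending in "s"
ends_with_s=$(printf '%s\n' "$sample" | grep 's$')

# \. : the backslash deactivates the "any character" meaning of the dot
literal_dot=$(printf '%s\n' "$sample" | grep 'log\.')

# "g  *g" : "g", one space, zero or more further spaces, "g"
spaced=$(printf '%s\n' "$sample" | grep 'g  *g')

echo "$starts_with_g / $either_case / $ends_with_s / $literal_dot / $spaced"
```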


Example: ‘ls’ Output

Display only symbolic links:

    $ ls -l | grep "^l"
    lrwxrwxrwx 1 hugo staff 17 Jul 26 2001 foo -> bar
    lrwxrwxrwx 1 hugo staff 17 Sep 13 2001 x -> ../y
    $ _

Example: Log File

Select only the entries from the 28th and 29th of March 2001 in the Apache log file. Here’s the format from which we want to get the information:

    $ tail -1 access_log
    myhost [28/Mar/2001:16:19:07 +0200] "GET /a.html"
    $ _

This is the regular expression used for getting the entries:

    $ grep "2[89]/Mar/2001.*/.*\.html" access_log
    myhost [28/Mar/2001:16:19:07 +0200] "GET /a.html"
    [...]
    myhost [29/Mar/2001:17:00:12 +0200] "GET /b.html"
    [...]
    $ _
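To see which lines the log file expression accepts and which it rejects, it can be run against a small test file. The sketch below is an assumption-laden illustration: the four log lines are invented in the format shown above, and `mktemp` is used for the scratch file.

```shell
#!/bin/sh
# Hypothetical log excerpt in the format shown above.
log=$(mktemp)
cat > "$log" <<'EOF'
myhost [27/Mar/2001:09:01:44 +0200] "GET /a.html"
myhost [28/Mar/2001:16:19:07 +0200] "GET /a.html"
myhost [28/Mar/2001:16:19:09 +0200] "GET /logo.png"
myhost [29/Mar/2001:17:00:12 +0200] "GET /b.html"
EOF

# 2[89] accepts the 28th or 29th, "\." is a literal dot,
# ".*" skips whatever lies in between.
pattern='2[89]/Mar/2001.*/.*\.html'
matches=$(grep "$pattern" "$log")
count=$(grep -c "$pattern" "$log")

printf '%s\n' "$matches"
echo "matching entries: $count"
rm -f "$log"
```

The entry from the 27th fails on `2[89]`, and the request for `/logo.png` fails on `\.html`, so two entries remain.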


The ‘sed’ Filter

• sed stands for Stream Editor
• It can be used to manipulate text in a data stream
• Like grep, sed can use regular expressions
• We concentrate on the substitute command here
• More than one expression can be specified using “-e”

Example: Evaluate a configuration file

    $ cat config.conf
    # Configuration file
    set A b
    set B c
    $ grep -v "^ *#" config.conf | sed 's/^set *//' \
    > | sed 's/  */=/'
    A=b
    B=c
    $ grep -v "^ *#" config.conf \
    > | sed -e 's/^set *//' -e 's/  */=/'
    A=b
    B=c
    $ eval `grep -v "^ *#" config.conf \
    > | sed -e 's/^set *//' -e 's/  */=/'`
    $ echo $A
    b
    $ _

Note the two spaces in 's/  */=/': one space followed by zero or more spaces. With a single space the pattern would also match the empty string at the start of the line.
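The same steps can be collected in a small script. This is a sketch under assumptions: the configuration file is recreated in a temporary file so that the script is self-contained, and the variable name `assignments` is introduced only for illustration.

```shell
#!/bin/sh
# Hypothetical config file with the same layout as config.conf above.
conf=$(mktemp)
cat > "$conf" <<'EOF'
# Configuration file
set A b
set B c
EOF

# Drop comment lines, strip the "set" keyword, then turn the first run
# of spaces into "=" (two spaces before the star: one or more spaces).
assignments=$(grep -v '^ *#' "$conf" | sed -e 's/^set  *//' -e 's/  */=/')
printf '%s\n' "$assignments"

# eval runs the text as shell code, so it should only see trusted input.
eval "$assignments"
echo "A=$A B=$B"
rm -f "$conf"
```

Keeping the generated assignments in a variable makes it possible to inspect them before they are passed to eval.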


More ‘sed’

• The substitute command can take options: ignore case: “i”, and global replace: “g” (replace not only the first match); “i” is an extension that is available e.g. in GNU sed
• They get appended to the expression: 's/foo/bar/gi'
• What if the source or destination pattern contains slashes?
• Escape the slashes with backslashes (can be difficult if the pattern is a variable’s content) or use a different separator; almost any character is allowed!

Example: Remove double slashes in path specs

    $ echo /usr//local/bin:/home/herbert///data \
    > | sed 's|//*|/|g'
    /usr/local/bin:/home/herbert/data
    $ _

• We can also reference matches from the search pattern
• \( and \) address a subpattern in the search field
• \1 selects the first, \2 the second etc. in the replace field

Example:

    $ echo "Hugo <hugo@hotmail.com>" \
    > | sed "s/[^<]*<\(.*\)@\(.*\)>.*/\1 at \2/"
    hugo at hotmail.com
    $ _
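Both ideas, alternative separators and references to subpatterns, can be sketched in a few lines. The values of `old` and `new` and the sample name are hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical values; both contain slashes, so "/" is a poor separator.
old=/usr/local
new=/opt

# With "|" as the separator, the slashes in the variables need no escaping.
path=$(echo "/usr/local/bin:/usr/local/sbin" | sed "s|$old|$new|g")
echo "$path"

# \( and \) capture two subpatterns; \2 \1 puts them back in swapped order.
swapped=$(echo "Doe, Jane" | sed 's/\(.*\), \(.*\)/\2 \1/')
echo "$swapped"
```

The separator trick only helps as long as the variable content does not contain the chosen separator itself.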


Extended Regular Expressions

• Some tools understand more than just the basic operators
• Such tools are e.g. perl and egrep
• Other tools may support them: use “\” to activate!

    ?              matches zero or one occurrence of the preceding pattern
    +              matches one or more occurrences of the preceding pattern
    {n}            matches exactly n occurrences of the preceding pattern
    {n,m}          matches n to m occurrences of the preceding pattern
    {n,}           matches at least n occurrences of the preceding pattern
    text1|text2    matches text containing either text1 or text2
    (text)         bundles “text” to a unit for repetition operators (“*”, “+” etc.), and it can now be selected by “\1”, “\2” etc.

Example:

    $ ls -l | egrep "hugo|harry"
    -rw-r--r-- 1 harry staff 1315 Feb 14 11:05 annab
    -rw-r--r-- 1 hugo staff 2894 Feb 12 14:14 gnasl
    $ _
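The extended operators can be explored with `grep -E`, which is the POSIX spelling of egrep. The word list below is made up for illustration, and `-c` is used so that each line of the sketch yields a simple count.

```shell
#!/bin/sh
# Hypothetical sample words, one per line.
words=$(printf '%s\n' color colour colouur ab abab abababab hugo harry herbert)

# u? : zero or one "u" (color, colour)
optional=$(printf '%s\n' "$words" | grep -E -c '^colou?r$')

# u+ : one or more "u" (colour, colouur)
one_or_more=$(printf '%s\n' "$words" | grep -E -c '^colou+r$')

# (ab){2,3} : the unit "ab" repeated two or three times (abab only)
repeated=$(printf '%s\n' "$words" | grep -E -c '^(ab){2,3}$')

# hugo|harry : either alternative
either=$(printf '%s\n' "$words" | grep -E -c 'hugo|harry')

echo "$optional $one_or_more $repeated $either"
```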


A better ‘sed’ using ‘perl’

• The perl interpreter can be used like sed
• Advantage: no escaping of extended syntax necessary!
• Also: perl can work on more than one line!
• Syntax: perl -pe 's/source/destination/'

Example:

    $ echo "Hugo <hugo@hotmail.com>" \
    > | sed "s/[^<]*<\(.*\)@\(.*\)>.*/\1 at \2/"
    hugo at hotmail.com
    $ echo "Hugo <hugo@hotmail.com>" | perl \
    > -pe "s/[^<]*<(.*)\@(.*)>.*/\1 at \2/"
    hugo at hotmail.com

Longer Example: Generate HTML from Inline Comments

The problem:

• It is always nice to keep module descriptions in one place
• So why not generate HTML from the program sources?
• Convention:
    – Extract only comments starting with double hash
    – Ignore other comments and program code
    – Add tags for special elements (function, type, variable, ...)


Example source:

    $ cat example.sh
    #!/bin/sh
    ###############################################
    #
    ## @function hugo
    ## print a friendly message to stdout.
    ##
    ## This function print a "hello world" to
    ## stdout. Quite nice.
    #
    ###############################################
    hugo () {
        echo "hello world"
    }
    #
    ## @function main program
    ##
    ## The main program calls hugo and exits.
    #
    hugo
    $ _


Step 1: Discard unwanted lines

    $ egrep "^ *##" example.sh | egrep -v "^ *###"
    ## @function hugo
    ## print a friendly message to stdout.
    ##
    ## This function print a "hello world" to
    ## stdout. Quite nice.
    ## @function main program
    ##
    ## The main program calls hugo and exits.
    $ _

Step 2: Add HTML-Tags and remove hashes

    $ egrep "^ *##" example.sh | egrep -v "^ *###" \
    > | perl -pe 's/^ *## *$/<p>/; s/^ *## *//'
    @function hugo
    print a friendly message to stdout.
    <p>
    This function print a "hello world" to
    stdout. Quite nice.
    @function main program
    <p>
    The main program calls hugo and exits.
    $ _


Step 3: Translate pseudo-tags

    $ egrep "^ *##" example.sh | egrep -v "^ *###" \
    > | perl -pe 's/^ *## *$/<p>/; s/^ *## *//' \
    > -e '; s|\@function *(.*)|<h2>Function \1</h2>|'
    <h2>Function hugo</h2>
    print a friendly message to stdout.
    <p>
    This function print a "hello world" to
    stdout. Quite nice.
    <h2>Function main program</h2>
    <p>
    The main program calls hugo and exits.
    $ _

Last Step: Make it a HTML-File

    $ ( echo "<html><head><title>Program Documentation</title></head>
    > <body><h1>Program Documentation</h1>"
    > egrep "^ *##" example.sh | egrep -v "^ *###" \
    > | perl -pe 's/^ *## *$/<p>/; s/^ *## *//' \
    > -e '; s|\@function *(.*)|<h2>Function \1</h2>|'
    > echo "</body></html>")
    <html><head><title>Program Documentation</title></head>
    <body><h1>Program Documentation</h1>
    [...]
    </body></html>
    $ _
