Script Archive
It is sometimes necessary to remove comments (preceded by #) from these scripts, since
not every sed implementation accepts them.
I tried to classify these scripts according to their cleverness and interest when learning how
to use sed. Clever ones are not always faster; sometimes, more complicated
techniques are also slower. Sometimes, longer and more complex scripts are mostly boilerplate,
thus less interesting for the sed `student' and rated with fewer stars; readability
and documentation also influenced the rating.
(*) = very basic, little to learn
(**) = does some simple processing
(***) = shows some nice techniques
(****) = shows advanced techniques, such as lookup tables
(*****) = that's extreme sed!
Filename manipulation
- Lowercase filenames (filter) (***)
- Uppercase filenames (filter) (***)
-
Lowercase/uppercase a list of filenames supplied from STDIN. Makes a list of mv commands.
Example:
find /mnt/zeus/docs | tolower.sed | sh -x
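As a rough illustration of the filter idea (not the archived tolower.sed itself), the sketch below turns each path read from standard input into an mv command that lowercases only the last path component. The function name to_lower_mv is made up for this example, and it assumes that paths contain no double quotes or newlines.

```shell
# Hypothetical helper: reads one path per line, prints mv commands.
to_lower_mv() {
  LC_ALL=C sed '
    # Skip paths whose last component has no uppercase letter.
    \|[A-Z][^/]*$|!d
    # Save the original path in hold space.
    h
    # Reduce pattern space to the last component and lowercase it.
    s|.*/||
    y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
    # Pattern space becomes: original path, newline, lowercased name.
    x
    G
    # Path with a directory part: keep the directory in the target.
    s|^\(.*/\)\([^/]*\)\n\(.*\)$|mv "\1\2" "\1\3"|
    # Bare filename: nothing to keep in front of the new name.
    s|^\(.*\)\n\(.*\)$|mv "\1" "\2"|
  '
}
```

Piping the result to sh -x, as in the example above, would then perform the renames; inspecting the generated commands first is the safer habit.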
- Lowercase filenames (application) (***)
- Uppercase filenames (application) (***)
-
Lowercase/uppercase a list of filenames supplied as command line arguments. Again, makes a
list of mv commands. This version operates on files in the current directory only.
Example:
down *.HTM *.INC *.sed
- Print basename of files (**)
-
Remove the directory prefix from a file path, and print remaining element. Like Unix
basename, but reads data from a file or stdin. Could easily be adapted for
DOS conventions.
- Print path of files (**)
-
Remove the filename from a file path, and print remaining elements. Like Unix
dirname, but reads data from a file or stdin. Easily adapted to DOS
conventions.
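A minimal sketch of how the two path filters above can be written, assuming one path per line on standard input and no trailing slashes. The names sed_basename and sed_dirname are hypothetical, and the archived scripts may handle more cases.

```shell
# Hypothetical helper names; each reads one path per line on stdin.
sed_basename() {
  # Delete everything up to and including the last slash.
  sed 's|.*/||'
}

sed_dirname() {
  sed '
    # A path without any slash lives in the current directory.
    \|/|!s|.*|.|
    # A single leading slash followed by a name has "/" as its directory.
    s|^/[^/]*$|/|
    # If either substitution fired, the line is finished.
    t
    # Otherwise drop the last slash and the name after it.
    s|/[^/]*$||
  '
}
```

Adapting these to DOS conventions would mostly mean using a backslash as the separator in the patterns.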
File conversion
- Convert DOS files for UNIX and vice versa (*)
-
Changes DOS end-of-lines to UNIX end-of-lines (to be run under UNIX). Provided in a
single gzipped tar file so that the server does not mangle the control characters.
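The control character in question is the carriage return that DOS places before each newline. The sketch below is one possible way to write both directions; it generates the carriage return with printf so that the listing contains no raw control characters. The function names are invented for this example.

```shell
# Hypothetical helper names. The carriage return is produced by printf so
# that this listing itself contains no raw control characters.
dos2unix_sed() {
  cr=$(printf '\r')
  # Remove one carriage return at the end of each line.
  sed "s/${cr}\$//"
}

unix2dos_sed() {
  cr=$(printf '\r')
  # Append a carriage return to the end of each line.
  sed "s/\$/${cr}/"
}
```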
- Split digest (**)
-
Recreates original email messages from a list digest. The author says this should work
`at least for digests generated by Majordomo and #listserv, and FAQs using minimal
digest format.'
- rot13 (*)
-
The simplest symmetric cypher in the world...
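For reference, rot13 fits in a single y (transliterate) command. This is a sketch rather than the archived script, and the function name rot13 is chosen only for this example.

```shell
# Hypothetical helper: rotate every ASCII letter by 13 places.
rot13() {
  sed 'y/abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ/nopqrstuvwxyzabcdefghijklmNOPQRSTUVWXYZABCDEFGHIJKLM/'
}
```

Because the cipher is symmetric, running the output through rot13 a second time restores the original text.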
- TeX to XML converter (*****)
-
Changes TeX-like tags (abc{...}) to XML-like tags (<abc>...</abc>). An interesting
proof of concept script by Tilmann Bitterberg, supporting nested tags and much more.
- Expand quoted strings (*****)
-
This script takes a complex configuration file format (supporting almost every quoting
style in the Bourne shell) and encodes each value that the script defines with
"dangerous" characters properly escaped; full documentation is contained in the download.
This script by Nathan D. Ryan shows how to do complex conversions with sed.
HTML utilities
- Text -> HTML (*)
-
Converts preformatted text to HTML ready for viewing.
- Insert boldface/italic tags (***)
-
Takes input files with two different "toggle switches" such as the _underscore_ and
*asterisk*, and converts them into something like <i>italic</i> and <b>boldface</b>
in the output. A nice exercise would be to merge this with untroff.sed and obtain a nice
troff-to-HTML sed script.
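A simplified sketch of the toggle idea, assuming that both switches of a pair always sit on the same line (the archived script may well handle more than that). The function name toggle_markup is hypothetical.

```shell
# Hypothetical helper: convert _text_ and *text* pairs found on one line.
toggle_markup() {
  sed '
    s|_\([^_]*\)_|<i>\1</i>|g
    s|\*\([^*]*\)\*|<b>\1</b>|g
  '
}
```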
- ISO8859-1 -> HTML (*)
-
Convert ISO Latin 1 characters (eg: é, £, ¥, ½) to their equivalent HTML character
entities.
- HTML -> ISO8859-1 (*)
-
Convert HTML character entities to their ISO Latin 1 equivalent.
- Lowercase HTML tags (****)
- Uppercase HTML tags (****)
-
Change case of HTML tags, preserving attributes.
- Index HTML links (****)
-
This script, by Tilmann Bitterberg, adds an index of links to an HTML file: similar to
`lynx -dump', but preserving the HTML tags in the file.
- Strip HTML comments (**)
-
Remove all commented material from HTML.
- Extract URLs from HTML (***)
-
Print all URLs (even commented ones) and associated ALT comments found in an
HTML file, formatted as: URL|comment.
- Extract title from HTML (***)
-
Print the TITLE (or the first H[0-7] heading located) of an HTML document.
Text formatting
- Capitalise words 1/5 (**)
- Capitalises the first letter of each word.
- Capitalise words 2/5 (**)
- A first approach to doing it faster.
- Capitalise words 3/5 (***)
- A cleaner implementation of the idea in cflword2.sed.
- Capitalise words 4/5 (***)
- This gets weirdo!
- Capitalise words 5/5 (****)
-
- Formats text lines (***)
-
Formats text so that each line is shorter than 40 characters.
- Expand tabs to spaces (****)
-
Another masterpiece by Greg Ubben. The link above works with all sed implementations,
while this version only works with GNU sed 3.02.80 or ssed, but is more readable because
it does not contain control characters.
- Reverse text (**)
- Reverses the order of characters on each line of input.
- Reverse text (**)
-
A faster version.
- Reverse file (***)
-
Reverses the line order of a file, subject to the size of the hold buffer.
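The hold-buffer limit mentioned above comes from the usual way of doing this in sed: the whole file is accumulated in hold space, newest line first, and printed once at the end. A well-known way to write it is sketched below; reverse_file is a hypothetical name and this may differ from the archived script.

```shell
# Hypothetical helper: print the input lines in reverse order.
reverse_file() {
  sed -n '
    # From the second line on, append everything collected so far
    # after the current line.
    1!G
    # Store the result back in hold space.
    h
    # On the last line, the pattern space holds the reversed file.
    $p
  '
}
```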
- Join lines (*)
-
Joins all input on a single line.
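One common way to join lines is a small loop around the N command, sketched below. It assumes a single space as the separator, which the description above does not specify, and join_lines is a made-up name.

```shell
# Hypothetical helper: join all input lines, separated by one space.
join_lines() {
  # :a      loop label
  # $!N     append the next line unless this is already the last one
  # s///    turn the newline just added into a space
  # ta      repeat for as long as a newline was replaced
  sed -e ':a' -e '$!N' -e 's/\n/ /' -e 'ta'
}
```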
- Un-double-space lines (*)
-
Change double-spaced lines to single-spaced.
- Centre lines 1/2 (**)
- Centres lines for an 80-column device. Easily adapted to different widths.
- Centre lines 2/2 (*)
-
A different and more CPU-intensive approach.
- Squeeze blank lines (***)
-
Replace consecutive blank lines with one line, so that at most one empty line separates
two non-empty lines. Emulates cat -s.
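A minimal sketch of the squeezing technique, assuming that "blank" means completely empty (no spaces or tabs). The function name squeeze_blank is hypothetical and the archived script may be written differently.

```shell
# Hypothetical helper: collapse runs of empty lines into one empty line.
squeeze_blank() {
  sed '
    /^$/{
      # On an empty line, pull in the next line unless this is the last one.
      $!N
      # Two empty lines in a row: drop the first and re-examine the second.
      /\n$/D
    }
  '
}
```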
Beautifiers
- Intel assembler -> UNIX assembler (**)
-
Converts Intel 386 assembly (nasm) code to Unix 386 assembly (gas) code.
- Strip C comments (1/4) (**)
- This one is the first script in a series of scripts that do the same task in more and
more sophisticated ways. This handles multiline comments, but not multiple comments in a
line.
- Strip C comments (2/4) (***)
- This script, by Stewart Ravenhall, unlike the previous one, handles comments surrounded
by code.
- Strip C/C++ comments (3/4) (****)
-
This script, by Brian Hiles, handles C and C++ (//) comments and, unlike the previous
ones, correctly skips comments inside strings. It shows a very interesting trick to build
a line piecewise in hold space, which eases more complicated parsing tasks.
- Beautify directory listing (UNIX) (***)
-
Indents the output of ls -lR according to the depth of each directory. Makes
output far easier to read.
- Directory tree (UNIX) (**)
-
Indents the output of find -type d into a nice tree format. Thanks to Stewart Ravenhall.
- Commify numbers 1/3 (**)
- Formats numbers by placing commas before every 3 digits (eg: 1,200,573).
- Commify numbers 2/3 (**)
- A more compact script for versions of sed which recognise extended REs.
- Commify numbers 3/3 (**)
-
Compare with #1. This script expects 100% numeric input.
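For purely numeric input, commas can be inserted with a short loop that repeatedly splits off the last group of three digits not yet separated. This is a common idiom, shown here as a sketch rather than as any of the three archived scripts; commify is a hypothetical name.

```shell
# Hypothetical helper: expects one plain integer per line.
commify() {
  # Put a comma in front of the rightmost run of three digits that still
  # has a digit before it, and loop until nothing changes.
  sed -e ':a' -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/' -e 'ta'
}
```

Input containing a decimal part would get commas after the decimal point as well, so this sketch should only be fed integers.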
- File polisher (troff)
-
Very comprehensive suite of filters by Robert Marks that perform a large number
of beautifying operations on text files prior to processing by troff. These
scripts were used to produce camera-ready output for the Australian School of
Management between 1985 and 1995. You can download a gzipped tar archive of the
scripts, or individual scripts: polish0.sed, polish1.sed, polish2.sed, polish3.sed,
polish4.sed, polish5.sed, polish6.sed, polish7.sed, polish8.sed, polish9.sed, or
visit Robert's Web site.
- Horizontal banner (*)
-
Rotates the vertical output of banner to produce horizontal output. The script assumes a
screen size of 80x60. This could be overcome.
- Remove troff overstrikes
(***)
-
A script to convert troff output to pure text, replacing boldfaces with
"*...*" and underlines with "_..._". Also shows how to justify
text using sed.
- Number lines (*)
- A short script to display output lines preceded by line numbers. This is similar to the
UNIX nl command, or cat -n.
- Number lines (**)
-
This version demonstrates a technique for manually calculating numbers.
- Number non-empty lines (*)
- A short script to display output lines, preceding non-empty lines with a line number.
Empty lines affect the count. This is not the same as cat -bn, which does not count
empty lines.
- Number non-empty lines (**)
-
This version demonstrates a technique for manually calculating numbers; it emulates
cat -bn exactly.
Information extraction / tabulation
- Find subwords (**)
-
Search for dictionary words in a string.
- Extract regular expressions and print the context - by Greg Ubben (****)
-
Extract from a file the lines that contain a regular expression, printing the lines
containing the pattern and those that surround them.
- Extract regular expressions and print the context - thanks to Hartmut Schaefer (***)
-
Print all the occurrences of a regular expression in a file. Each occurrence is
printed on a separate line, isolated from the non-matching text (for example,
the regex \<[A-Za-z]*\> will yield all the words in the file, one per line).
- Find anagrams (****)
-
Search for anagrams in a list of words (one word per line).
- Indexer (****)
-
This script collates a list of references to produce an index suitable for a book or
magazine. A detailed description of the way it works, along with alternative versions of
the script, is available on the tutorials page. The script was used by the
Cornerstone magazine to create an index for a book after typesetting.
- Show make targets (***)
-
Extracts targets for a file from a makefile.
- Sort/delimit/number a list of names (*****)
-
- Display beginning of file (*)
-
Display first 10 lines of a file. Like head.
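The usual way to get this behaviour is to quit at line 10 with the q command, which prints the current line before exiting. A minimal sketch, with the hypothetical name sed_head:

```shell
# Hypothetical helper: print the first 10 lines of stdin, then stop reading.
sed_head() {
  sed 10q
}
```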
Miscellaneous
- Desktop calculator (*****)
-
- Add decimals (****)
-
This impressively short script adds a list of decimal numbers. It pulls this off by
transforming and concatenating units in each number into an analogue format, where a=1,
aa=2, aaa=3, etc, transforming the result back to decimal, and proceeding with the next
digit. Using lookup tables lets it do this with only 9 commands, with a 3-command inner
loop; to understand the idea better you might want to peek at an implementation of the
same algorithm without lookup tables.
- Sierpinski triangles 1/3 (***)
- Sierpinski triangles 2/3 (***)
-
These scripts generate Sierpinski's triangle. Pass them a line made
of many underscores and a single X, something like ______X.
- Sierpinski triangles 3/3, slow and portable (***)
- Sierpinski triangles 3/3, fast and less portable (***)
-
To get below 10 commands to do the same, I had to find out the
real rule behind Sierpinski's triangle (the other two attempts were
somewhat empirical). It turns out that Sierpinski's triangle is
actually Wolfram's rule-90 cellular automaton. :-)
- Increment a number (****)
-
Interesting script to increment numbers. This algorithm is the fastest I know of that
does not use both buffers.
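One way to increment a number using the pattern space alone is sketched below: trailing 9s are temporarily turned into underscores, the digit in front of them is bumped, and the underscores then become zeros. It is shown only as an illustration of a single-buffer approach and may differ from the archived script. The name increment is hypothetical, and the input is assumed to be one non-negative integer per line.

```shell
# Hypothetical helper: add 1 to each integer read from stdin.
increment() {
  sed '
    # Drop any line that is not purely digits.
    /^[0-9][0-9]*$/!d
    # Turn each trailing 9 into a placeholder underscore.
    :nines
    s/9\(_*\)$/_\1/
    t nines
    # Only placeholders left (the input was all 9s): add a leading 1.
    s/^\(_*\)$/1\1/
    t done
    # Otherwise bump the last real digit.
    s/8\(_*\)$/9\1/
    t done
    s/7\(_*\)$/8\1/
    t done
    s/6\(_*\)$/7\1/
    t done
    s/5\(_*\)$/6\1/
    t done
    s/4\(_*\)$/5\1/
    t done
    s/3\(_*\)$/4\1/
    t done
    s/2\(_*\)$/3\1/
    t done
    s/1\(_*\)$/2\1/
    t done
    s/0\(_*\)$/1\1/
    :done
    # The placeholders are the carried positions, which all become 0.
    y/_/0/
  '
}
```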
- Turing Machine in sed (****)
-
A totally useless but quite funny script by Christophe Blaess: a Turing Machine is able
to execute any computable task (albeit slowly and painfully)... so sed can perform any
computable task!!! Here is a description of the input file format, including a sample
automaton to increment binary numbers.
- sed sokoban by Aurelio Marinho Jargas (*****)
-
Yes, this is a full-featured 90-level sokoban game with color and animation!
Play with the arrow keys or with the classic vi keys hjkl (left, down, up, right).
- sed arkanoid by Aurelio Marinho Jargas (*****)
-
And yet another masterpiece from the author of the sed sokoban game.
You might like the shell script playsed which makes the ball move automatically.
- sed naughts and crosses (*****)
-
And now, here's a naughts and crosses game too.
- Brainf**k to C compiler (**)
-
This script converts programs written in the Brainf**k programming language to C, ready
to be compiled to machine code.
- Display month calendar (*****)
-
Display a simple calendar for the current month, à la the UNIX command cal.
Only date is required, math is done directly in sed.
- Display year calendar (****)
-
Display a simple calendar for the current year, very roughly based on the above script.
This time date computation is done with dc rather than date.
sed debuggers
- Python sed debugger by Aurelio Marinho Jargas
-
A Python script that reads a sed script from a file and generates another sed
script, this one with debug commands. So, it's NOT a sed interpreter, it
generates a sed debug file in sed! You can debug your sed files with your own
version of sed (DOS, Linux, HP-UX, ...). The debug file is saved with a .sedd
extension. You can also use it as a script beautifier (to insert and standardize
spacing) with the --indent option, which writes the beautified script on
standard output. Also, it can be used as an expert command analyzer, with the
--tokenize option, which gives all the command information you need.
- Korn shell debugger by Brian Hiles
-
This also instruments the debugged sed script. It implements spypoints on
conditional and unconditional criteria that can involve lines, regular expressions, or a
combination of both. A man page is embedded in the script. You can also download an
older version which runs in the Bourne Shell.
Updated 17 Nov 2003