Saturday, 5 July 2008

 

Extract Columns From Tabular Text - Cut

A quick way to extract one or more columns from tabular or character delimited data, such as Web pages or log files, is to use the GnuWin cut command.

Some examples:

The -f switch specifies the column to extract. By default, the delimiter is TAB and the -d switch specifies an alternative delimiter.

One limitation of cut is that you can't specify the columns relative to last column, unless you know the index of the last column. If your data has a varying number of columns, such as the path strings printed by the find . -type f command, such as the example below …

./Profiles/9ls0tqn1.default/blocklist.xml
./Profiles/9ls0tqn1.default/bookmarkbackups/bookmarks-2008-06-14.html

… you can't easily extract just the file name (the last column) in every line.

A related command is colrm, which removes character columns from the input. It's quite limited and does the opposite of what I expect, so I haven't used it.

Cut is a simple utility to extract columns of data, and it can't process the column data like a scripting language. I'll write a bit more about processing tabular data in future.

See Also

Labels:


Comments: Post a Comment
<< Home

This page is powered by Blogger. Isn't yours?