by Kam-Hung Soh (kamhung dot soh at gmail dot com) 2008/06/08 06:06:50
Here is a work-in-progress collection of methods for finding the longest line in a text file using GNU utilities (specifically GnuWin32) and several different programming languages. The same variable names are used in each sample to make it easier to compare different languages and approaches.
The first approach is to iterate through all lines in a file, keeping track of the longest line found so far and its length, then print that line at the end. With this approach, if several lines share the maximum length, only the first one found is printed.
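As a compact illustration of this first approach, here is a minimal Python sketch. The function name `longest_line` and the sample list are hypothetical; `maxLen`, `longest` and `s` follow the variable names used in the samples in this article. The strict `<` comparison is what makes the first of several equally long lines win.

```python
def longest_line(lines):
    """Return the first longest line in an iterable of strings."""
    maxLen = 0
    longest = None  # stays None when there is no non-empty line
    for s in lines:
        # Strict '<' keeps the earliest line when several share the maximum length.
        if maxLen < len(s):
            maxLen = len(s)
            longest = s
    return longest


sample = ["pear", "fig", "plum", "kiwi"]
print(longest_line(sample))
```

Only one line and one integer are kept at any time, so memory use does not grow with the size of the input.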
The second approach is to attach its length to each line, sort the collection by that length, select the line with the greatest length, and then strip the length from the line. This approach is also known as the "Decorate-Sort-Undecorate" idiom.
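The three steps of the idiom can be sketched in Python as follows. The function name `longest_line_dsu` and the sample data are hypothetical, and the decoration used here is a `(length, line)` tuple, which is only one of several possible choices.

```python
def longest_line_dsu(lines):
    """Decorate-Sort-Undecorate: return a longest line, or None for empty input."""
    # Decorate: pair each line with its length.
    decorated = [(len(s), s) for s in lines]
    if not decorated:
        return None
    # Sort: tuples compare by length first, then by the line text itself.
    decorated.sort()
    # Select the last (greatest) pair, then undecorate by dropping the length.
    maxLen, longest = decorated[-1]
    return longest


print(longest_line_dsu(["pear", "fig", "plum", "kiwi"]))
```

Because the whole tuple takes part in the comparison, equally long lines are ordered by their text, so this sketch returns the lexicographically last of the longest lines.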
The third approach is to sort all lines by length, then print the last line. In this approach, if there is more than one line that is the longest, the last one, in lexicographical order, is printed.
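A minimal Python sketch of sorting by length is shown here. The function name `longest_line_sorted` is hypothetical, and the secondary sort key (the line itself) is an assumption added to reproduce the lexicographic tie-break described above. With `key=len` alone, Python's stable sort would instead pick the last tied line in input order.

```python
def longest_line_sorted(lines):
    """Sort lines by length and return the last one, or None for empty input."""
    # The secondary key (the line itself) is an assumption made here so that
    # equally long lines end up in lexicographic order.
    ordered = sorted(lines, key=lambda s: (len(s), s))
    return ordered[-1] if ordered else None


sample = ["pear", "fig", "plum", "kiwi"]
print(longest_line_sorted(sample))   # tie broken lexicographically
print(sorted(sample, key=len)[-1])   # tie broken by input order (stable sort)
```

Other tools may order equal-length lines differently, so the tie-break is worth checking whenever it matters.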
The last two approaches can be inefficient because all lines have to be stored before they can be sorted, and all but the longest are then discarded since only the longest is printed.
using System;
namespace ConsoleApplication1 {
    class Program {
        static void Main(string[] args) {
            String s = null;
            String longest = null;
            int maxLen = 0;
            while ((s = Console.ReadLine()) != null) {
                if (maxLen < s.Length) {
                    maxLen = s.Length;
                    longest = s;
                }
            }
            Console.WriteLine(longest);
        }
    }
}
gawk "{ if (maxLen < length) { maxLen = length; longest = $0 } } END { print longest }" test.txt
gawk's length function returns the length of the current line, $0, when no argument is provided.
package longestline;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.IOException;
public class Main {
    public static void main(String[] args) {
        String s = null;
        String longest = null;
        int maxLen = 0;
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        try {
            while ((s = br.readLine()) != null) {
                if (maxLen < s.length()) {
                    longest = s;
                    maxLen = s.length();
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println(longest);
    }
}
perl -ne "if ($maxLen < length) { $maxLen = length; $longest = $_ } END { print $longest }" test.txt
The -n option makes Perl run an implicit while-loop for all lines in the input stream.
maxLen = 0
longest = ''  # avoids a NameError when the file is empty
for line in open(r'test.txt'):
    if maxLen < len(line):
        maxLen = len(line)
        longest = line
print(longest)
ruby -n -e "BEGIN { $maxLen = 0 }; if $maxLen < $_.length then $maxLen = $_.length; $longest = $_ end; END { p $longest }" < test.txt
The -n option makes the Ruby interpreter run an implicit while-loop for all lines in the input stream. Global variables start with a dollar symbol ($). $_ is the pre-defined variable that holds the last line read.
2008-06-08: Not pretty but that's more due to my lack of experience with Ruby at the moment.
wc -L test.txt
wc displays the length of the longest line, but not the line itself.
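For comparison, a rough Python counterpart that reports only the maximum length might look like the sketch below. The function name `max_line_length` is hypothetical. The sketch counts characters after stripping the line terminator, whereas GNU wc reports display width, so the two can disagree for lines containing tabs or wide characters.

```python
import io


def max_line_length(stream):
    """Return the length of the longest line in a text stream, 0 if empty."""
    # rstrip removes the trailing line terminator so it is not counted.
    return max((len(line.rstrip("\r\n")) for line in stream), default=0)


print(max_line_length(io.StringIO("pear\nfig\nbanana\n")))
```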
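gawk "{ printf("""%6d\t%s\n""", length, $0) }" test.txt | sort | cut -f2- | tail -1
The triple double-quotes escape the innermost quotes for Windows cmd.exe. gawk decorates each line with its length, padded to six columns so that sort, which compares text rather than numbers, orders lines shorter than a million characters by length. cut -f2- then undecorates each line by removing the length field while keeping any tabs inside the line itself.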
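To show why the padded width matters when lengths are sorted as text, here is a small Python sketch that mimics the pipeline on a list of strings. The function name `longest_line_text_dsu` is hypothetical, and Python compares strings by code point, which may not match the collation a particular sort program uses.

```python
def longest_line_text_dsu(lines):
    """Mimic the decorate | sort | undecorate pipeline on a list of strings."""
    # Decorate: right-align the length in 6 columns, then a tab, then the line.
    decorated = ["%6d\t%s" % (len(s), s) for s in lines]
    if not decorated:
        return None
    # Sort as plain text; the space padding keeps "     9" ahead of "    10".
    decorated.sort()
    # Undecorate: split at the first tab only, so tabs inside the line survive.
    return decorated[-1].split("\t", 1)[1]


lines = ["a" * 9, "b" * 10, "x\ty"]
print(longest_line_text_dsu(lines))
# Without padding, text order puts "9" after "10", which would be wrong here.
print(sorted(["9", "10"]))
```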
get-content test.txt | sort-object -property length | select-object -last 1
get-content emits each line as a string object, and sort-object uses the Length property of those strings as the sort key.
python -c "import sys; print(max((len(line), line) for line in sys.stdin)[1])" < test.txt
The Python command line doesn't have a special variable for the standard input stream, nor does it provide an implicit while-loop over all lines in the input stream, so you have to import the sys module to get the standard input stream (sys.stdin) and iterate through all lines using a generator expression. The max() function returns the tuple (len(line), line) with the greatest value at index 0, comparing the lines themselves when lengths are equal, then the index [1] selects the longest string from that tuple.
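The tuple comparison that max() relies on can be tried in isolation with the sketch below. The function name `longest_from_stream` is hypothetical, and an in-memory `io.StringIO` stands in for sys.stdin so the example runs without a file. Note that max() raises ValueError on empty input.

```python
import io


def longest_from_stream(stream):
    """Return the longest line of a text stream via max() on (length, line) tuples."""
    # The generator yields one tuple per line and max() keeps only the best
    # tuple seen so far, so the lines are not all held in memory at once.
    return max((len(line), line) for line in stream)[1]


# Tuples compare element by element: length first, then the text.
print((3, "abc") < (3, "abd") < (4, "a"))

stream = io.StringIO("pear\nfig\nplum\n")
print(repr(longest_from_stream(stream)))
```

Each line keeps its trailing newline when read from a stream, which is why the result is shown with repr().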