Find Longest Line in Different Programming Languages

by Kam-Hung Soh (kamhung dot soh at gmail dot com) 2008/06/08 06:06:50

Index Copyright About Blog

Introduction

Here is a work-in-progress collection of methods for finding the longest line in a text file using GNU utilities (specifically GnuWin32) and several programming languages. The same variable names are used in each sample to make it easier to compare languages and approaches.

The first approach is to iterate through all lines in a file, keeping track of the longest line found so far and its length, then print that line at the end. With this approach, if several lines share the greatest length, only the first one found is printed.
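As a minimal sketch of this first approach, written for Python 3: the function name longest_line and the sample data are made up for illustration, while maxLen, longest and s follow the naming used in the samples under Solutions. The strict comparison is what keeps the first of several equally long lines.

```python
def longest_line(lines):
    """Return the first longest line, or None if no non-empty line is seen."""
    longest = None
    maxLen = 0
    for s in lines:
        # Strict '<' means a later line of equal length never replaces
        # the one already stored, so the first longest line wins.
        if maxLen < len(s):
            maxLen = len(s)
            longest = s
    return longest


# "pear" and "plum" tie at four characters; the earlier one is kept.
print(longest_line(["pear", "fig", "plum"]))  # prints "pear"
```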

The second approach is to attach its length to each line, sort the collection by that length, select the line with the greatest length, and strip the length from it. This approach is also known as the "Decorate-Sort-Undecorate" idiom.
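The three steps can be sketched in Python 3 as follows. The function name longest_line_dsu and the sample data are hypothetical. Note that sorting (length, line) tuples compares the line text when lengths tie, so this sketch returns the lexicographically last of the longest lines.

```python
def longest_line_dsu(lines):
    """Decorate-Sort-Undecorate; raises IndexError on empty input."""
    # Decorate: pair each line with its length.
    decorated = [(len(s), s) for s in lines]
    # Sort: tuples compare by length first, then by the line text.
    decorated.sort()
    # Select the last (greatest) pair and undecorate it.
    maxLen, longest = decorated[-1]
    return longest


print(longest_line_dsu(["pear", "fig", "plum"]))  # prints "plum"
```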

The third approach is to sort all lines by length, then print the last line. In this approach, if more than one line is the longest, the last one in lexicographical order is printed.
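A possible Python 3 sketch of this third approach is shown here. The function name longest_line_sorted is made up for illustration. The sort key includes the line text as a secondary key, which is an assumption made to reproduce the lexicographical tie-breaking described above; with key=len alone, Python's stable sort would instead keep tied lines in input order.

```python
def longest_line_sorted(lines):
    """Sort by length and return the last line; raises IndexError if empty."""
    # Primary key: length. Secondary key: the text itself, so that tied
    # lines end up in lexicographical order.
    ordered = sorted(lines, key=lambda s: (len(s), s))
    return ordered[-1]


print(longest_line_sorted(["plum", "fig", "pear"]))  # prints "plum"
```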

The last two approaches can be inefficient because all lines have to be stored before they can be sorted, and then all but the longest are discarded.

Solutions

Approach 1

C#

using System;

namespace ConsoleApplication1 {
 class Program {
  static void Main(string[] args) {
   String s = null;
   String longest = null;
   int maxLen = 0;
   while ((s = Console.ReadLine()) != null) {
    if (maxLen < s.Length) {
     maxLen = s.Length;
     longest = s;
    }
   }
   Console.WriteLine(longest);
  }
 }
}

Gawk

gawk "{ if (maxLen < length) { maxLen = length; longest = $0 } } END { print longest }" test.txt

gawk's length function returns the length of the current line, $0, when no argument is provided.

Java

package longestline;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.IOException;

public class Main {
 public static void main(String[] args) {
  String s = null;
  String longest = null;
  int maxLen = 0;
  BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
  try {
   while ((s = br.readLine()) != null) {
    if (maxLen < s.length()) {
     longest = s;
     maxLen = s.length();
    }
   }
  } catch (IOException e) {
   e.printStackTrace();
  }
  System.out.println(longest);
 }
}

Perl

perl -ne "if ($maxLen < length) { $maxLen = length; $longest = $_ } END { print $longest }" test.txt

The -n option makes Perl run an implicit while-loop for all lines in the input stream.

Python

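maxLen = 0
longest = ''
with open(r'test.txt') as f:
 for line in f:
  if maxLen < len(line):
   maxLen = len(line)
   longest = line
# line still ends with its newline, so print must not add another
print(longest, end='')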

Ruby

ruby -n -e "BEGIN { $maxLen = 0 }; if $maxLen < $_.length then $maxLen = $_.length; $longest = $_ end; END { print $longest }" < test.txt

The -n option makes the Ruby interpreter run an implicit while-loop over all lines in the input stream. Global variables start with a dollar sign ($). $_ is the predefined variable holding the last line read.

2008-06-08: Not pretty but that's more due to my lack of experience with Ruby at the moment.

wc

wc -L test.txt

wc displays the length of the longest line, but not the line itself.
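For comparison, a rough Python 3 equivalent that reports only the maximum length might look like the sketch below. The function name max_line_length is hypothetical. It counts characters after stripping the newline, which is an assumption: GNU wc reports display width, so its result may differ for lines containing tabs or wide characters.

```python
def max_line_length(lines):
    """Return the length of the longest line, not counting the newline."""
    # default=0 covers empty input, where max() would otherwise raise.
    return max((len(s.rstrip("\n")) for s in lines), default=0)


print(max_line_length(["pear\n", "fig\n", "banana\n"]))  # prints 6
```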

Approach 2, Decorate-Sort-Undecorate Idiom

cut, gawk, sort, tail

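gawk "{ printf("""%6d\t%s\n""", length, $0) }" test.txt | sort | cut -f2- | tail -1

The triple double quotes escape the innermost quotes for Windows cmd.exe. gawk decorates each line with its length, right-aligned in a six-character field so that a textual sort also orders the lines by length. cut -f2- then undecorates the line by removing the length; the trailing dash keeps every field after the first, so a line that itself contains tabs is not truncated.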

PowerShell

get-content test.txt | sort-object -property length | select-object -last 1

get-content emits each line as a string object, so sort-object can sort directly on its length property.

Python

python -c "import sys; print(max((len(line), line) for line in sys.stdin)[1], end='')" < test.txt

Python's command line has no special variable for the standard input stream, nor does it provide an implicit while-loop over all lines in the input stream, so you have to import the sys module to get the standard input stream (sys.stdin) and iterate through all lines using a generator expression. The max() function returns the tuple (len(line), line) with the greatest value at index 0, comparing the lines themselves when lengths tie; the index [1] then extracts the longest string from that tuple.
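To make the tuple comparison concrete, here is a small Python 3 sketch with made-up sample data. It shows max() picking a tuple by length first and, when two lengths are equal, by the line text.

```python
lines = ["pear\n", "plum\n", "fig\n"]

# Decorate each line with its length (the newline is counted too).
decorated = [(len(line), line) for line in lines]

# Tuples compare element by element: (5, 'plum\n') beats (5, 'pear\n')
# because the lengths tie and 'plum\n' sorts after 'pear\n'.
best = max(decorated)
print(best)             # prints (5, 'plum\n')

# Index 1 undecorates the winner; the line already ends with a newline.
print(best[1], end="")  # prints plum
```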
