2010
01.20

LINQ to Objects profiling

LINQ (Language-Integrated Query) is a cool set of extensions to the .NET framework that lets developers use query syntax similar to SQL to create result sets in a way that can be more understandable than traditional, iterative methods.

I wanted to find out whether it is any faster than the old way of doing things and, while I was at it, see how it compares to filtering with another relatively new addition to C# known as Lambda Expressions.

War And Peace
To test, I decided to count the number of sentences in which a particular word occurred. The classic work War and Peace by Leo Tolstoy seemed a fitting candidate because of its sizable length (692 pages in a .pdf). I found a version freely available from Penn State in .pdf format, so I downloaded it and converted it to a .txt file for easier parsing. One of the main characters is named “Anna”, so I set out to find how many sentences her name occurs in.

There are a number of ways to solve this problem, but I have focused on three of them for the purpose of this profiling demonstration.

1) LINQ to Objects.
2) Lambda Expressions.
3) Traditional iterative method.

Just for fun, I also wanted to try the demo with regular expressions to see if that would make things any easier. The goal of each exercise is to end up with a List<string> that contains each sentence in which the word “Anna” occurs. For each case, I load the entire file into a string, then split that string into a List<string> that represents all the sentences of the book, then profile the parsing and filling of the final List<string> which holds the sentences where the word “Anna” occurs.

Notice that I don’t start the profiling timer until after the file has been read and the text put into a string and split into sentences.
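For anyone repeating the test, here is a minimal sketch of a timing helper built on System.Diagnostics.Stopwatch, which offers finer resolution than subtracting two DateTime.Now values. It is not part of the attached project: the Profiler class and its Time method are names made up for this illustration, and the timings reported in this post were taken with DateTime.Now.

```csharp
using System;
using System.Diagnostics;

public static class Profiler
{
    // Hypothetical helper (not in the original project): runs the given
    // action exactly once and returns the elapsed wall-clock time.
    public static TimeSpan Time(Action work)
    {
        Stopwatch timer = Stopwatch.StartNew();
        work();
        timer.Stop();
        return timer.Elapsed;
    }
}
```

Only the filtering step would go inside the action passed to Time, so that reading the file and splitting it into sentences stay outside the measurement.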

Code for each of the methods follows, along with execution times:

Class scope regex string:
string regexExp = @".*anna.*";
(I realize this will pick up things besides the proper name “Anna”, but it serves our purposes here.)
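If matching only the standalone name matters, one option is a word-boundary pattern. The sketch below is an assumption for illustration only: it is not the pattern behind the timings in this post, NameMatcher and MentionsAnna are made-up names, and the sentence count it produces may differ from the figure reported here.

```csharp
using System.Text.RegularExpressions;

public static class NameMatcher
{
    // Hypothetical stricter pattern: \b marks a word boundary, so "Anna"
    // must appear as a whole word. IgnoreCase removes the need for ToUpper().
    private static readonly Regex WholeWord =
        new Regex(@"\bAnna\b", RegexOptions.IgnoreCase);

    // Returns true when the sentence contains "Anna" as a separate word.
    public static bool MentionsAnna(string sentence)
    {
        return WholeWord.IsMatch(sentence);
    }
}
```

With this pattern a sentence containing only "Hannah" or "savannah" would no longer count as a match.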

LINQ to Objects with Regex – Duration 4.34 seconds

// Load the book and split it into sentences (not timed).
string sourceText = File.ReadAllText(@"C:\WarAndPeace.txt");
List<string> sourceWords = sourceText.Split(new char[] { '.', '?', '!' }).ToList();
Regex rx = new Regex(regexExp.ToUpper());
List<string> resultSet = new List<string>();

DateTime startTime = DateTime.Now;

// Query syntax: keep each sentence that matches the regex.
var matchingWords = from line in sourceWords
                    where rx.Matches(line.ToUpper()).Count > 0
                    select new
                    {
                        val = line
                    };

// The query is deferred, so it actually runs during this loop.
foreach (var v in matchingWords)
{
    resultSet.Add(v.val);
}

DateTime stopTime = DateTime.Now;
TimeSpan duration = stopTime - startTime;
MessageBox.Show("Duration: " + duration.ToString() + "\r\nCount: "
    + resultSet.Count.ToString());

Lambda Expressions with Regex – Duration 4.48 seconds

// Load the book and split it into sentences (not timed).
string sourceText = File.ReadAllText(@"C:\WarAndPeace.txt");
List<string> sourceWords = sourceText.Split(new char[] { '.', '?', '!' }).ToList();
Regex rx = new Regex(regexExp.ToUpper());

DateTime startTime = DateTime.Now;

// FindAll runs the lambda against every sentence and returns the matches.
List<string> resultSet = sourceWords.FindAll(aValue =>
  { return rx.IsMatch(aValue.ToUpper()); });

DateTime stopTime = DateTime.Now;
TimeSpan duration = stopTime - startTime;
MessageBox.Show("Duration: " + duration.ToString() + "\r\nCount: "
  + resultSet.Count.ToString());

Traditional iteration with Regex – Duration 4.34 seconds

// Load the book and split it into sentences (not timed).
string sourceText = File.ReadAllText(@"C:\WarAndPeace.txt");
List<string> sourceWords = sourceText.Split(new char[] { '.', '?', '!' }).ToList();
Regex rx = new Regex(regexExp.ToUpper());
List<string> resultSet = new List<string>();

DateTime startTime = DateTime.Now;

// Plain loop: test each sentence and collect the ones that match.
foreach (string val in sourceWords)
{
    if (rx.IsMatch(val.ToUpper()))
    {
        resultSet.Add(val);
    }
}

DateTime stopTime = DateTime.Now;
TimeSpan duration = stopTime - startTime;
MessageBox.Show("Duration: " + duration.ToString() + "\r\nCount: "
  + resultSet.Count.ToString());

So to see how much of a difference the Regex made, I took it out, and substituted a simple call to string.Contains(). It made a huge difference!
Times:

LINQ to Objects withOUT Regex – Duration 0.08 seconds

Lambda Expressions withOUT Regex – Duration 0.08 seconds

Traditional iteration withOUT Regex – Duration 0.08 seconds
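The string.Contains() variants are not listed in this post, so the following is only a sketch of how the three filters might look, assuming the same ToUpper() comparison as the Regex versions. The ContainsFilters class and its method names are invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ContainsFilters
{
    // LINQ to Objects (query syntax).
    public static List<string> WithLinq(List<string> sentences, string word)
    {
        string needle = word.ToUpper();
        var matches = from line in sentences
                      where line.ToUpper().Contains(needle)
                      select line;
        return matches.ToList();
    }

    // List<T>.FindAll with a lambda expression.
    public static List<string> WithLambda(List<string> sentences, string word)
    {
        string needle = word.ToUpper();
        return sentences.FindAll(s => s.ToUpper().Contains(needle));
    }

    // Traditional iteration.
    public static List<string> WithLoop(List<string> sentences, string word)
    {
        string needle = word.ToUpper();
        List<string> result = new List<string>();
        foreach (string s in sentences)
        {
            if (s.ToUpper().Contains(needle))
            {
                result.Add(s);
            }
        }
        return result;
    }
}
```

Each method takes the already-split sentence list and the search word, so only the call itself needs to sit between the start and stop timestamps. Like the Regex pattern, a plain substring test also matches longer words that happen to contain "anna".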

The conclusion is that for solving this particular problem, LINQ, Lambda Expressions and Traditional iteration are all about equally quick.

The surprise (to me) is how much longer the regular expressions took to find the strings: about 4.3 seconds versus 0.08 seconds, roughly 50 times slower. Granted, this example is over-simplified and the flexibility of Regex normally outweighs the performance penalty, but clearly for simple problems (like finding substrings), other methods should be used.

I find the Lambda Expressions more visually satisfying in this case (perhaps because I am a noob to LINQ), but LINQ is certainly a powerful addition to the language that will do things Lambda Expressions will not.

The code is attached in a .NET project along with the book text if you want to verify it. Just put the WarAndPeace.txt file in the same directory as the profile.exe.

In case you are wondering, “Anna” occurs in 284 sentences in this document. All the methods return this same number, which is a good thing.

Download C# LINQ profiling project .zip file

midniteblogger
