2010
01.20

LINQ to Objects profiling

LINQ (Language-Integrated Query) is a cool set of extensions to the .NET framework that lets developers use query syntax similar to SQL to create result sets in a way that can be more understandable than traditional, iterative methods.

I wanted to find out if it is any faster than the old way of doing things and while I’m at it, see how it compares to filtering using another relatively new addition to C# known as Lambda Expressions.

War And Peace
To test I decided to count the number of sentences in which a particular word occurred. The classic work War and Peace by Leo Tolstoy seemed a fitting candidate because of its sizable length (692 pages in a .pdf). I found a version freely available from Penn State in .pdf format so I downloaded it and converted it to a .txt file for easier parsing. One of the main characters is named “Anna” so I set out to find out in how many sentences her name occurred.



There are a number of ways to solve this problem, but I have focused on three of them for the purpose of this profiling demonstration.

1) LINQ to Objects.
2) Lambda Expressions.
3) Traditional iterative method.

Just for fun, I also wanted to try the demo with regular expressions to see if that would make things any easier. The goal of each exercise is to end up with a List<string> that contains each sentence in which the word “Anna” occurs. For each case, I load the entire file into a string, then split that string into a List<string> that represents all the sentences of the book, then profile the parsing and filling of the final List<string> which holds the sentences where the word “Anna” occurs.

Notice that I don’t start the profiling timer until after the file has been read and the text put into a string and split into sentences.

Code for each of the methods follows, along with execution times:

Class scope regex string:
string regexExp = @".*anna.*";
(I realize this will pick up things besides the proper name “Anna”, but it serves our purposes here.)

LINQ to Objects with Regex – Duration 4.34 seconds

string sourceText = File.ReadAllText(@"C:\WarAndPeace.txt");
List<string> sourceWords = sourceText.Split(new char[] { '.', '?', '!' }).ToList();
Regex rx = new Regex(regexExp.ToUpper());
List<string> resultSet = new List<string>();

DateTime startTime = DateTime.Now;

var matchingWords = from line in sourceWords
where rx.Matches(line.ToUpper()).Count > 0
select new
  {
    val = line
  };

foreach (var v in matchingWords)
{
	resultSet.Add(v.val);
}

DateTime stopTime = DateTime.Now;
TimeSpan duration = stopTime - startTime;
MessageBox.Show("Duration: " + duration.ToString() + "\r\nCount: "
    + resultSet.Count.ToString());


Lambda Expressions with Regex – Duration 4.48 Seconds

string sourceText = File.ReadAllText(@"C:\WarAndPeace.txt");
List<string> sourceWords = sourceText.Split(new char[] { '.', '?', '!' }).ToList();
Regex rx = new Regex(regexExp.ToUpper());

DateTime startTime = DateTime.Now;

List<string> resultSet = sourceWords.FindAll(aValue =>
  { return rx.IsMatch(aValue.ToUpper()); });

DateTime stopTime = DateTime.Now;
TimeSpan duration = stopTime - startTime;
MessageBox.Show("Duration: " + duration.ToString() + "\r\nCount: "
  + resultSet.Count.ToString());


Traditional iteration with Regex – Duration 4.34 Seconds

string sourceText = File.ReadAllText(@"C:\WarAndPeace.txt");
List<string> sourceWords = sourceText.Split(new char[] { '.', '?', '!' }).ToList();
Regex rx = new Regex(regexExp.ToUpper());
List<string> resultSet = new List<string>();

DateTime startTime = DateTime.Now;

foreach (string val in sourceWords)
{
	if (rx.IsMatch(val.ToUpper()))
	{
		resultSet.Add(val);
	}
}

DateTime stopTime = DateTime.Now;
TimeSpan duration = stopTime - startTime;
MessageBox.Show("Duration: " + duration.ToString() + "\r\nCount: "
  + resultSet.Count.ToString());



So to see how much of a difference the Regex made, I took it out, and substituted a simple call to string.Contains(). It made a huge difference!
Times:

LINQ to Objects withOUT Regex – Duration 0.08 seconds

Lambda Expressions withOUT Regex – Duration 0.08 Seconds

Traditional iteration withOUT Regex – Duration 0.08 Seconds
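
The non-regex code is not listed in this post, so here is a minimal sketch of what the three variants might look like. Treat the details as assumptions: the class and method names are made up for illustration, the upper-casing simply mirrors the regex versions above, and the timing uses Stopwatch rather than subtracting DateTime.Now values (Stopwatch generally offers finer resolution for short runs). A short stand-in string replaces the book text so the snippet runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

static class ContainsProfile
{
    // LINQ to Objects: query syntax with a plain substring test.
    public static List<string> WithLinq(List<string> sentences, string word)
    {
        string needle = word.ToUpper();
        var query = from line in sentences
                    where line.ToUpper().Contains(needle)
                    select line;
        return query.ToList();
    }

    // Lambda expression passed to List<T>.FindAll.
    public static List<string> WithLambda(List<string> sentences, string word)
    {
        string needle = word.ToUpper();
        return sentences.FindAll(s => s.ToUpper().Contains(needle));
    }

    // Traditional iteration.
    public static List<string> WithLoop(List<string> sentences, string word)
    {
        string needle = word.ToUpper();
        List<string> result = new List<string>();
        foreach (string s in sentences)
        {
            if (s.ToUpper().Contains(needle))
                result.Add(s);
        }
        return result;
    }

    public static void Main()
    {
        // Stand-in for File.ReadAllText(@"C:\WarAndPeace.txt")
        string sourceText = "Anna Pavlovna smiled. Pierre left! Where was Anna? He bowed.";
        List<string> sentences = sourceText.Split(new char[] { '.', '?', '!' }).ToList();

        // Start timing only after the text is loaded and split, as in the post.
        Stopwatch timer = Stopwatch.StartNew();
        List<string> resultSet = WithLinq(sentences, "Anna");
        timer.Stop();

        Console.WriteLine("Duration: " + timer.Elapsed + "  Count: " + resultSet.Count);
    }
}
```

Swapping WithLinq for WithLambda or WithLoop in Main times the other two approaches against the same input.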



The conclusion is that for solving this particular problem, LINQ, Lambda Expressions and Traditional iteration are all about equally quick.

The surprise (to me) is how much longer the addition of regular expressions took to find the strings. Granted this example is over-simplified and the flexibility of Regex normally outweighs the performance penalty, but clearly for simple problems (like finding substrings), other methods should be used.

I find the Lambda Expressions more visually satisfying in this case (perhaps because I am a noob to LINQ), but LINQ is certainly a powerful addition to the language that will do things Lambda Expressions will not.

The code is attached in a .NET project along with the book text if you want to verify it. Just put the WarAndPeace.txt file in the same directory as the profile.exe.

In case you are wondering, “Anna” occurs in 284 sentences in this document. All the methods return this same number, which is a good thing.

Download C# LINQ profiling project .zip file

midniteblogger

2009
11.18

Recently I needed to use functionality written in Python, but was restricted to doing it from a C# application. The Python code was legacy and not easily duplicated, so I wanted to leverage it if at all possible. IronPython, which is an implementation that runs under .NET, was a possibility, but seemed a bit much just to get some information out of existing Python code.
I ran across a great series on devshed.com about how to create a COM server in Python. Step 1 solved. The next question was how to call it from C#, since there was no way to create a typelib from Python. Good old late binding to the rescue.
Following is the code I ended up creating. This Python code just calls string.swapcase, but hopefully it is enough to give you the idea. (You need the win32com Python for Windows extensions installed first: http://sourceforge.net/projects/pywin32/)

Create TestServer.py with the following contents:

import win32com.server.register

class PythonUtil:
  _public_methods_ = [ 'SwapCaseString' ]
  _reg_progid_ = "PythonUtil.Utilities"
  # NEVER copy the following ID
  # Use "print pythoncom.CreateGuid()" to make a new one.
  _reg_clsid_ = "{691830C5-6322-48b0-B1D3-0055657757D5}"

  def SwapCaseString(self, val, item=None):
    import string
    if item != None: item = str(item)
    return string.swapcase(str(val))
    
	
# Add code so that when this script is run by
# Python.exe, it self-registers.
if __name__=='__main__':
  print "COM server being registered..."
  import win32com.server.register
  win32com.server.register.UseCommandLine(PythonUtil)
  

This .py file must be run once from the command line to register the COM server.

Awesome. Now my Python COM server is ready to go. How to call it from C#?

Since I don’t have a typelib that would allow Visual Studio to create a nice wrapper class, I have to revert to a late binding solution. Late binding is just the process of invoking a method at run-time without the compiler knowing about it ahead of time.
Late binding IDispatch calls have been around since COM was invented (I think?), so all I had to do was find how to do them in the huge .NET documentation marshmallow. It turns out that Activator is the class that will do the job.
Here’s the C# client side of the code:

using System.Runtime.InteropServices;
using System.Reflection;

// GetTypeFromProgID returns null when the ProgID is not registered,
// so check the type before trying to create an instance.
Type type = Type.GetTypeFromProgID("PythonUtil.Utilities");
if (type != null)
{
    object instance = Activator.CreateInstance(type);
    string swap = "Swap Case This String";
    object[] paramList = { swap };  //1 string in params

    // Late-bound call by method name; no typelib or wrapper class needed
    object objRet = type.InvokeMember("SwapCaseString", 
     BindingFlags.Default | BindingFlags.InvokeMethod, 
     null, instance, paramList);
    string myString = (string)objRet;  
    //myString should now hold "sWAP cASE tHIS sTRING"
}

Just make sure the ProgID strings are the same on both the Python and C# sides and everything should work great.
This is a simple example, but hopefully you can extend it to solve some problems you are facing.
midniteblogger.

2009
10.13

Another handy thing to be able to do is get a List<> of all files in a directory and its subdirectories. This is nice if you are doing something like programmatically registering a bunch of legacy COM components. Again, what I want is something like this:

File.GetFilesRecursively(string dir, List<string> result);
Since I couldn’t find this in the .NET framework I wrote a recursive one:

private static void GetFilesRecursively(string dir, List<string> result)
{
    string[] files = Directory.GetFiles(dir);

    foreach (string file in files)
    {
        result.Add(file);
    }

    foreach (string directory in Directory.GetDirectories(dir))
    {
        GetFilesRecursively(directory, result);
    }                        
}

Just pass in the directory name and a List that you have just new’ed and it will give you a recursive list of all files in that directory.
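
For what it is worth, the framework does ship an overload that covers the simple case: Directory.GetFiles accepts a search pattern plus SearchOption.AllDirectories (available since .NET 2.0). Below is a minimal sketch of a wrapper around it; the FileLister class name and the temporary directory layout are made up for the demonstration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class FileLister
{
    // Returns every file under dir, including files in nested subdirectories.
    public static List<string> GetFilesRecursively(string dir)
    {
        return new List<string>(
            Directory.GetFiles(dir, "*", SearchOption.AllDirectories));
    }

    public static void Main()
    {
        // Build a small throwaway tree so the example runs anywhere.
        string root = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        string sub = Path.Combine(root, "sub");
        Directory.CreateDirectory(sub);
        File.WriteAllText(Path.Combine(root, "a.txt"), "a");
        File.WriteAllText(Path.Combine(sub, "b.txt"), "b");

        foreach (string file in GetFilesRecursively(root))
            Console.WriteLine(file);

        Directory.Delete(root, true);
    }
}
```

One design difference to keep in mind: the built-in call stops at the first subdirectory it cannot read (it throws UnauthorizedAccessException), whereas the hand-written recursion can be adapted to catch the exception per directory and carry on.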

2009
10.13

Copy Directory Recursively

There are a few things in C# that are not as easy to do as they should be. For instance, copying a directory along with all of its subfolders and contained files. What I want is a function that looks like this:

Directory.Copy(string sourceDir, string destDir)
I couldn’t find such a function in the .NET framework so I wrote a recursive one myself:

using System.IO;
private static void CopyDirectory(string srcDir, string destDir)
{
    if (destDir[destDir.Length - 1] != Path.DirectorySeparatorChar)
        destDir += Path.DirectorySeparatorChar;

    if (!Directory.Exists(destDir))
        Directory.CreateDirectory(destDir);

    string[] entries = Directory.GetFileSystemEntries(srcDir);
    foreach (string entry in entries)
    {
        //Recursive CopyDirectory call here
        if (Directory.Exists(entry))
            CopyDirectory(entry, destDir + Path.GetFileName(entry));  
        else
            File.Copy(entry, destDir + Path.GetFileName(entry), true); 
    }
}

You just need to make sure the directories are not already contained within each other in some way or it could be copying files until your drive fills up!
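
One way to guard against that situation is to compare the two full paths before starting the copy. The helper below is a sketch only: the CopyGuard and IsSameOrInside names are hypothetical, the comparison is case-insensitive on the assumption of a typical Windows file system, and it does not resolve junctions or symbolic links.

```csharp
using System;
using System.IO;

static class CopyGuard
{
    // True if candidate is the same directory as parent, or nested inside it.
    public static bool IsSameOrInside(string parent, string candidate)
    {
        string p = Normalize(parent);
        string c = Normalize(candidate);
        return c.StartsWith(p, StringComparison.OrdinalIgnoreCase);
    }

    // Absolute path with a trailing separator, so that "C:\data" is not
    // mistaken for a parent of "C:\database".
    static string Normalize(string dir)
    {
        string full = Path.GetFullPath(dir);
        if (!full.EndsWith(Path.DirectorySeparatorChar.ToString()))
            full += Path.DirectorySeparatorChar;
        return full;
    }

    public static void Main()
    {
        string src = Path.Combine(Path.GetTempPath(), "data");
        Console.WriteLine(IsSameOrInside(src, Path.Combine(src, "backup")));  // True
        Console.WriteLine(IsSameOrInside(src, src + "base"));                 // False
    }
}
```

Calling IsSameOrInside(srcDir, destDir) before CopyDirectory and refusing to copy when it returns true avoids the runaway case where the destination sits inside the source.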

2009
05.28

NXCOMPAT?

Recently, while developing a .NET Winforms desktop application, I needed to integrate a legacy .OCX that did terminal emulation for a serial device.  I had been using XP to develop and had run into no problems until I tried to deploy to a Vista machine.  When running the app on Vista, I was greeted with this exception:

Unable to get the window handle for the ‘myControl’ control. Windowless ActiveX controls are not supported.

Not much help here.  The “inner exception” was equally uninformative:

Attempted to read or write protected memory. This is often an indication that other memory is corrupt.

The location of the exception was at ((System.ComponentModel.ISupportInitialize)(this.myControl)).EndInit();

Pretty worthless information.  After googling around for a while, I ran across Ed Maurer’s helpful blog (thanks Ed), where he discusses some breaking changes that .NET Framework 2.0 SP1 introduces (also applies to .NET 3.5 Framework).
Apparently in the header of a PE file, there is a flag called IMAGE_DLLCHARACTERISTICS_NX_COMPAT.  This flag tells Vista whether or not to enable something called DEP (Data Execution Prevention) in your process.  If this flag is set, and your legacy .OCX does not have the flag set (which it would not by default), Vista won’t allow your .OCX to load.
Ed’s blog says that Microsoft likes DEP so much that .NET Framework 2.0 SP1 emits binaries with the IMAGE_DLLCHARACTERISTICS_NX_COMPAT flag on by default (it was off before).  This is going to cause unexpected expenses and time commitments for developers who deploy new builds to customers only to find that suddenly their apps don’t run on Vista anymore.  In my opinion, it is very wrong-headed to release a service pack with this kind of breaking change.  Bad judgment, Microsoft.

Solutions?
-Disable DEP on the affected machines (run this line at a command prompt:  bcdedit.exe /set {current} nx AlwaysOff ). This is not a very good option for app developers for obvious reasons.
-Update the .OCX to build with a newer compiler (newer than ATL 7.1) and set the /NXCOMPAT linker flag (http://msdn.microsoft.com/en-us/library/ms235442(VS.80).aspx).  This is not an option with third-party .OCXes, however.
-Run a post-build script in the .NET app that sets NXCOMPAT to NO and redeploy your app.  This script worked for me:

call "$(DevEnvDir)..\..\VC\bin\vcvars32.bat"
call "$(DevEnvDir)..\..\VC\bin\editbin.exe" /NXCOMPAT:NO "$(TargetPath)"

This will let the app run and load the OCX without a problem, but the same exception is still thrown when you run the app under the debugger.
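
To confirm whether the flag is actually set on a given binary after the build, the PE header can be read directly. The sketch below assumes the documented PE layout: the offset of the "PE" signature is stored at 0x3C, a 20-byte COFF header follows the 4-byte signature, and DllCharacteristics sits at offset 70 of the optional header for both 32-bit and 64-bit images. The NxCompatCheck class name is made up for illustration.

```csharp
using System;
using System.IO;

static class NxCompatCheck
{
    const ushort IMAGE_DLLCHARACTERISTICS_NX_COMPAT = 0x0100;

    // Returns true if the PE image in the stream has the NX_COMPAT bit set.
    public static bool HasNxCompat(Stream pe)
    {
        BinaryReader reader = new BinaryReader(pe);

        // e_lfanew: file offset of the "PE\0\0" signature
        pe.Seek(0x3C, SeekOrigin.Begin);
        int peOffset = reader.ReadInt32();

        pe.Seek(peOffset, SeekOrigin.Begin);
        if (reader.ReadUInt32() != 0x00004550)  // "PE\0\0", little-endian
            throw new InvalidDataException("Not a PE file.");

        // Signature (4) + COFF header (20) + DllCharacteristics offset (70)
        pe.Seek(peOffset + 4 + 20 + 70, SeekOrigin.Begin);
        ushort characteristics = reader.ReadUInt16();
        return (characteristics & IMAGE_DLLCHARACTERISTICS_NX_COMPAT) != 0;
    }

    public static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("Usage: pass the path of an .exe or .dll");
            return;
        }
        using (FileStream fs = File.OpenRead(args[0]))
        {
            Console.WriteLine(HasNxCompat(fs)
                ? "NXCOMPAT is set"
                : "NXCOMPAT is not set");
        }
    }
}
```

Running it against the app's .exe before and after the editbin step should show the flag going from set to not set.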

TODOs for Microsoft:
-Don’t introduce this kind of breaking change in a “mandatory” service pack release.
-Provide better error info for the exception thrown.
-Document this somewhere easier to find than the back pages of an msdn blog.

Midniteblogger

2009
03.23

Visual Studio is incredibly slow to load and that makes it almost prohibitively hard to use.

Some examples:
-I am typing this post while waiting for Visual Studio to get done with whatever it is doing.  I merely opened an .aspx file and that was 11 minutes ago.  The application window is “Whited out” (in Vista) and the title bar says “Microsoft Visual Studio (Not Responding).”  The site has quite a few files, but is not incredibly large.  Hasn’t Microsoft ever heard of multithreading UIs?

-After the site finally loads, whenever I edit a page, there is a constant annoying delay that locks the UI for a couple of seconds after doing the simplest of things (like pasting some text).  The status bar continues to inform me that it is “Getting file ‘Web.config’ from the Web.”    How often does it need to do this?  Once would seem sufficient, but not for Visual Studio.

-Loading an .XML file.  OMG.  This can also take on the order of MINUTES to load a 20K file.  Meanwhile, Visual Studio is locked solid and “Whited out.”   What is it doing to these files, anyway?

I have really wanted to develop a new book site with Visual Studio, but am seriously considering switching to php because of the slowness of using Visual Studio.

Anyone else had this problem?  (Yes I have tried it on different machines, yes I have a fast connection, and yes, this machine is well-powered with a quad-core and 7 GB of memory).

At Tech-Ed 2007 I sat with a guy from the VS dev team on the bus and bent his ear for a few minutes about this problem (it was the same in Visual Studio 2005).  He sounded pretty clueless, but said he would look into it.   Didn’t look hard enough, I would say.  At least it sure didn’t get fixed.

Well, I can get back to my project now.  Total time to load one small aspx page: 13 minutes.

midniteblogger

2008
12.12

Recently I wanted to show a splash screen form in an application, loading the image file from the disk at run time instead of binding it into the executable. It’s easy enough to load the image with Image.FromFile, but if the file is not there, I wanted to fail gracefully and just close the form.
When I called Close() from the FormLoad event, however, the following exception greeted me: Value Close() cannot be called while doing CreateHandle().
What does that mean?
A little searching around coupled with my experience in traditional Windows programming led me to conclude that the .NET runtime doesn’t like to have its forms closed while they are being created. In the C++ days, we would just PostMessage with WM_CLOSE to the hWnd, and all would be well. Hmmm… What to do in C#?
In searching, I ran across the BeginInvoke method, which “Executes a delegate asynchronously on the thread that the control’s underlying handle was created on.” Sounds interesting.
What delegate to use, however, was the next question. MethodInvoker is the answer. According to the Microsoft docs, MethodInvoker “represents a delegate that can execute any method in managed code that is declared void and takes no parameters.” The intended use is for “when you need a simple delegate but do not want to define one yourself.” Exactly what I was looking for.
So I defined a simple function in my form class that would be used by the delegate:

private void CloseMe()
{
  Close();
}

In the CloseMe method, I simply do what I wanted to do in the first place – call Close() on the form.  Close() is now safe to call because we are no longer in the FormLoad event.

From the FormLoad event, if the image file was not found, simply call the asynchronous BeginInvoke method:

void FormLoad(object sender, EventArgs e)
{
  if(imageNotFound)
  {
    BeginInvoke(new MethodInvoker(CloseMe));
  }
}

It works great and no more exception!
midniteblogger

2008
09.23

Getting data across process boundaries can be tricky sometimes, especially when one of the processes is running managed code (.NET) and one is running a native executable.  It gets even more complicated if you want to do it on Windows CE.

In Win32, if I want to call from COM to .NET, it is pretty straightforward because the CLR will create a managed object and a COM Callable Wrapper (or CCW) that acts as a proxy for the COM object to talk to the managed object.  It’s a little clunky, but it works and is well documented on MSDN.

On Windows CE or Windows Mobile, however, (as best as I can determine), the CCW option does not exist.  The CE runtime won’t allow an unmanaged process to “host” the CLR so the CCW proxy can be created.  So how do you get data from unmanaged to managed code on Windows CE, then?

Good question.  I found the answer in the WM_COPYDATA message.  This cool little message lets you send data between processes, and even seems to work between managed and unmanaged code.

How to use it?

Let’s say that on the unmanaged side I want to send a string of data, “Hello”, to a managed .exe written in C#.  The unmanaged code is not COM, just regular old C++.
To set up the unmanaged code, declare a COPYDATASTRUCT, and fill out the members:

COPYDATASTRUCT cds;
cds.dwData = 0;
//Unicode chars. Don't need room for NULL on end
cds.cbData = 5 * sizeof(WCHAR);
//lpData is a non-const PVOID, so the string literal needs a cast
cds.lpData = (PVOID)L"Hello";

Then just do a SendMessage with the address of cds:

::SendMessage(hManagedWnd, WM_COPYDATA, (WPARAM)myHwnd, (LPARAM)&cds);

Whoops, what is hManagedWnd? It is the HWND of the managed window where you are sending the data. How did I get that handle? With a call to FindWindow, of course. Just fill in the name of the window you are looking for and you get the handle back.

HWND hManagedWnd= FindWindow(NULL, L"NameOfManagedWindow");

Awesome.
So how do I catch this on the managed side? Good question.
Just to make this a little more interesting, I create a little managed window that just sits and waits for a WM_COPYDATA message. The .NET Compact Framework has a handy class called MessageWindow (in the Microsoft.WindowsCE.Forms namespace) that “Provides the ability to send and receive Windows-based messages.” Just what I am looking for.
Declare the COPYDATASTRUCT type:

[StructLayout(LayoutKind.Sequential)]
struct COPYDATASTRUCT
{
public IntPtr dwData;
public int cbData;
public IntPtr lpData;
}

Derive a class from MessageWindow:

public class MsgWindow : MessageWindow

Define WM_COPYDATA:

public const int WM_COPYDATA = 0x004A;

Set the text of the window in the constructor:

public MsgWindow(Form1 msgform)
{
this.Text = "NameOfManagedWindow";
}

Override the WndProc (Yes, you can get to the WndProc in .NET):

protected override void WndProc(ref Message msg)
{
  switch (msg.Msg)
  {
    case WM_COPYDATA:
    {
      // Marshal the native COPYDATASTRUCT that LParam points to
      COPYDATASTRUCT cds = (COPYDATASTRUCT)Marshal.PtrToStructure
              (msg.LParam, typeof(COPYDATASTRUCT));
      if (cds.cbData > 0)
      {
         byte[] data = new byte[cds.cbData];
         Marshal.Copy(cds.lpData, data, 0, cds.cbData);
         UnicodeEncoding ue = new UnicodeEncoding();
         String str = ue.GetString(data, 0, data.Length);
         //At this point, str will contain "Hello"
      }
      break;  // C# does not allow falling out of a case section
    }
  }
  base.WndProc(ref msg);
}

Create an instance of MsgWindow in your form’s constructor:

public partial class Form1 : Form
{
  MsgWindow MsgWin;
  public Form1()
  {
    MsgWin = new MsgWindow(this);
  }
}

In this way, even though it is a bit awkward, you can get data across from unmanaged to managed code. Working with binary data would be a bit trickier, but will still work. Underneath it all, Windows creates a memory-mapped file to handle the actual data transfer, which saves you the trouble of doing it yourself.
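
As a rough illustration of the binary case, here is a sketch of unpacking a payload that carries more than a string. The layout is purely an assumption for the example (a 4-byte integer id followed by UTF-16 text), as are the PayloadCodec, Pack and Unpack names. The native sender would have to build the same layout into lpData and set cbData to the total byte count; Pack exists here only so the round trip can be run on its own.

```csharp
using System;
using System.Text;

static class PayloadCodec
{
    // Hypothetical layout: 4-byte id, then UTF-16LE text with no terminator.
    public static byte[] Pack(int id, string text)
    {
        byte[] idBytes = BitConverter.GetBytes(id);          // machine byte order
        byte[] textBytes = Encoding.Unicode.GetBytes(text);  // UTF-16LE
        byte[] payload = new byte[idBytes.Length + textBytes.Length];
        Buffer.BlockCopy(idBytes, 0, payload, 0, idBytes.Length);
        Buffer.BlockCopy(textBytes, 0, payload, idBytes.Length, textBytes.Length);
        return payload;
    }

    // Reverses Pack; in WndProc the input would be the byte[] copied from lpData.
    public static void Unpack(byte[] payload, out int id, out string text)
    {
        id = BitConverter.ToInt32(payload, 0);
        text = Encoding.Unicode.GetString(payload, 4, payload.Length - 4);
    }

    public static void Main()
    {
        byte[] payload = Pack(42, "Hello");

        int id;
        string text;
        Unpack(payload, out id, out text);
        Console.WriteLine(id + " " + text);  // prints "42 Hello"
    }
}
```

The dwData member of COPYDATASTRUCT, left at 0 in the example above, is a natural place to carry a message-type tag so the receiver knows which layout to expect.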
midniteblogger

2008
08.21

Malware everywhere

So, regarding my last post, perhaps I owe an apology to YouTube regarding malware distribution.  I have been chasing the problem for about a month now, and have finally narrowed it down to a virus known as “MPack” from Russia.  This is a really nasty one and appears to have several “ports of entry” into a web site.

 From what I can tell, the attacks came in through the following php that I innocently installed on my site (not this site), but have since removed:

  •  php Hamweather installation (nice app that gives weather in various locations).  Hacked through appending .include to the URL with a perl script reference on the end.  I notified them of the attack.  Hopefully they have corrected it by now.
  • php Mortgage calculator.  Supposed to give mortgage estimations at various rates.  Looks like it was malicious from the start (especially in some caching stuff with names like ‘smartie’).
  • WordPress blog – could be the ‘free’ template I downloaded.  Still not sure how they hacked this one.

What happened then, was that bad files got copied all over the site through the above-mentioned holes:

  • modsoap.php – This one periodically injected really evil JavaScript into the <head> section of the index.php page.  The script looks like the following:   <script language=JavaScript>function dc(x){var l=x.length,b=1024,i,j,r,p=0,s=0,w=0,t=Array(lots of numbers and more code here…);  Once executing, this script tries to download ActiveX controls and run .exes, exploiting IE weaknesses.  These guys did a good job of summarizing the details of the attack.
  • news.php – Looks like a bunch of random characters wrapped in a php base64_decode, unzip and eval calls.  I think it places modsoap.php elsewhere. 

Hopefully it’s gone now, but it will never be forgotten!

This is definitely a lesson learned in not trusting php scripts written by others, even if they are reputable folks.

 midniteblogger.

2008
08.05

Publishing and managing web sites is always an interesting challenge and it helps to be able to generate extra income from them from programs like AdSense.  Recently, Google has allowed its publishers to monetize YouTube views/clicks by putting some script on their sites.

That is great, or so I thought, until the other morning I got a couple of emails from Google saying my site was distributing “malware.”  Quite a surprise to me since I strive to keep my sites as clean as possible.  Sure enough, upon going to the site, my anti-virus protector went crazy, complaining about JS/Psyme.J virus, Bugnraw!generic virus (Bugnraw!generic was detected in …\UIIL.EXE), and Hopee!generic virus.  It kept trying to download something from golnanosat.com/in (213.155.0.242) and wanting me to approve an exe to run that was signed by HiPoint Ltd, S.A.

What is going on here??  After a couple of days of digging, I believe the source was the YouTube video strip that I was embedding in my pages.  I took it off the page, and the virus warnings stopped.

It’s nearly impossible to get Google to take the “malware” warning off your site, but how ironic that it might be caused by a company that they own.  I hope they will at least take a look at this problem and determine if someone is hacking their YouTube publisher scripts.

Anyone else experiencing the same problem?

midniteblogger