He's making a list and checking it twice, Gonna find out who's naughty and nice, Santa Claus is coming to town. Did this song creep you out as a kid? It did me! Now that you're all grown up (well, some of us anyway!), you may not worry about that list anymore.

Do you really want to take that chance though? I didn't think so! Read the article for some background information, then explore the code yourself. This application uses the BackgroundWorker and WebClient objects, and works with regular expressions through the Regex object. You can download the Visual Studio Express editions for free.

Background

An ancient Chinese proverb states that "The entries in your blog reveal your inner self."

Are you grumpy, happy, hopeful, crazy?

Chances are it's obvious from your blog posts. This application will take a URL, scan it for a list of "naughty" and "nice" words, and come up with a score of niceness. But beware: pages linked from the given URL will also be scanned and taken into account.

Downloading the contents of a URL can be accomplished by opening a socket, issuing an HTTP GET request, and reading back the resulting stream of bytes one at a time.

On the other hand, it can also be performed in two lines of code using the System.Net.WebClient object. For me, the choice was simple. The WebClient has a number of methods to download a file synchronously or asynchronously, either as a string, a byte array, or directly to a file. Once the object is instantiated, a single line of code does the actual work.

It's so easy, you'll be adding URL download functionality into every application before you know it! You won't get an error otherwise, it just won't appear to work right.
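As a minimal sketch of the two-line download described above, assuming a console program: the class name PageDownloader, the Download helper, and the placeholder URL are illustrative and are not taken from the sample application.

```csharp
using System;
using System.Net;

static class PageDownloader
{
    // Downloads the resource at the given URL and returns it as a string.
    // WebClient implements IDisposable, so it is wrapped in a using block.
    public static string Download(string url)
    {
        using (WebClient client = new WebClient())
        {
            return client.DownloadString(url);
        }
    }

    static void Main(string[] args)
    {
        // Placeholder address; pass a real URL on the command line.
        string url = args.Length > 0 ? args[0] : "http://example.com/";
        try
        {
            string body = Download(url);
            Console.WriteLine("Downloaded {0} characters.", body.Length);
        }
        catch (WebException ex)
        {
            // Raised for network failures and HTTP error responses.
            Console.WriteLine("Download failed: " + ex.Message);
        }
    }
}
```

DownloadString blocks until the whole response has arrived, which is one reason the work is better kept off the user interface thread in a Windows application.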

Searching for Patterns

Once DownloadString(url) has returned the contents of the URL as a string, you can do anything with it as with any other string. Perform IndexOf searches, save it to a database, or apply regular expressions.

Applying regular expressions is the use of interest for this application. Regular expression support is found in the System.Text.RegularExpressions namespace with the Regex object. Using the Regex object is pretty easy, but coming up with the expression itself can be a challenge.

If you haven't used regular expressions before, you may want to take a few minutes to read about them. A good place to start is the MSDN reference. Unfortunately, regular expressions aren't very intuitive at first (or ever, for some people!). Creating your own expression can be difficult, but you can often find pre-built ones online. Note that regular expressions are used for string verification, formatting, searching, and replacing.

This application will use three different expressions. One will be used to search for "nice" keywords, another for "naughty" keywords. The third expression will locate hyperlinks based on the href attribute of the a (anchor) element found in HTML. Creating a Regex object has some overhead, not only from object creation, but also from parsing the expression.

To minimize this, all three Regex objects are created when the application starts up. Linked pages are not scanned for additional links to avoid overload. As it is, some sites already take a full minute to process! The process of downloading a page and then performing the searches is contained in the DetermineScore method. To actually perform the search, invoke the Matches method.

This returns a MatchCollection object for iteration. If all you need is the number of matches, chain the call as Matches(body).Count: another single line of code performing lots of work! If you needed the information, you could then use the returned MatchCollection to determine specifically which words were found, along with much other information about each match. The expression for nice and naughty words is simple: the vertical bar (pipe) acts like a Boolean OR operator.
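To make the keyword search concrete, here is a rough sketch of counting matches with an alternation pattern. The word lists, the \b word-boundary anchors, and the class and method names are assumptions made for illustration; the sample application's actual expressions may differ.

```csharp
using System;
using System.Text.RegularExpressions;

static class KeywordCounter
{
    // Hypothetical word lists. The | separates alternatives, and the \b
    // anchors stop "nice" from matching inside a longer word like "nicest".
    // Both objects are created once and reused to avoid repeated parsing.
    static readonly Regex NiceWords = new Regex(
        @"\b(nice|happy|kind|generous)\b",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    static readonly Regex NaughtyWords = new Regex(
        @"\b(naughty|grumpy|mean|rude)\b",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static int CountNice(string body)
    {
        return NiceWords.Matches(body).Count;
    }

    public static int CountNaughty(string body)
    {
        return NaughtyWords.Matches(body).Count;
    }

    static void Main()
    {
        string body = "A happy, kind post about a grumpy cat.";
        // Prints "nice: 2, naughty: 1"
        Console.WriteLine("nice: {0}, naughty: {1}",
            CountNice(body), CountNaughty(body));
    }
}
```

Dropping the \b anchors would make the pattern a plain alternation of substrings, which counts more matches but also picks up words that merely contain a keyword.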

In other words, the regular expression parser will scan the entire input string (the downloaded HTML) and add a Match object each time one of the words is found. The third expression finds the links on the page so that the linked pages can be scored too. After all, the types of sites you link to also say something about how naughty or nice you probably are! That third regular expression is much more complex than the first two.
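As a rough idea of what a link-finding expression can look like, the sketch below uses a hypothetical pattern that only handles double-quoted href values. It is a stand-in for illustration, not the expression used by the sample application, and the LinkFinder and FindLinks names are likewise assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LinkFinder
{
    // Hypothetical pattern: an <a ...> tag whose href value is double-quoted.
    // In a C# verbatim string (@"..."), a literal double quote is written "".
    static readonly Regex HrefPattern = new Regex(
        @"<a\s[^>]*?href\s*=\s*""(?<url>[^""]*)""",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static List<string> FindLinks(string html)
    {
        List<string> links = new List<string>();
        foreach (Match m in HrefPattern.Matches(html))
        {
            // m.Value holds the whole matched tag fragment;
            // the named group holds only the URL itself.
            links.Add(m.Groups["url"].Value);
        }
        return links;
    }

    static void Main()
    {
        string html = "<p><a href=\"http://example.com/a\">A</a> and "
                    + "<A class=\"x\" HREF=\"/b\">B</a></p>";
        // Prints http://example.com/a and then /b
        foreach (string link in FindLinks(html))
        {
            Console.WriteLine(link);
        }
    }
}
```

A named group is one way to pull just the URL out of a larger match. Single-quoted or unquoted href values are not handled here, and relative links such as /b would still need to be resolved against the page's address before downloading.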

You'll notice that this expression is slightly different in the VB and C# versions of the code. In both languages, you can't just have quotes within a quoted string. In Visual Basic, you double the double quote (""). In a regular C# string, you escape it with a backslash (\").

What does this mess mean? Well, suffice it to say that it is looking for the href attribute, and the quotes and brackets after it. Unfortunately, the resulting match is always more than we actually want. For each valid match found, I call the DetermineScore method to count up naughty and nice words. Note that the scores on linked pages are cut in half.

More weight is given to your own site!

A site with many links can really slow things down. When you click the button to start things off, any work that you perform will occur on the user interface thread. All of a sudden, the application is unresponsive. The BackgroundWorker object makes it easy to move that work to a background thread. As each link is analyzed, progress is reported by raising the ProgressChanged event. The handler for that event adds an entry for the link (its URL and scores) to the list and updates overallProgressBar.

The worker thread periodically checks the CancellationPending flag to exit early if necessary. While the initial link is being downloaded, a ProgressBar control goes into marquee mode (think Knight Rider!). It changes to a standard progress bar as discovered links are analyzed. Finally, the RunWorkerCompleted event fires when everything is done, based on the DoWork event handler completing.
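The BackgroundWorker flow described above can be sketched roughly as follows. This is a console stand-in under stated assumptions: the link list is a placeholder, Thread.Sleep takes the place of downloading and scoring, and the WorkerSketch and Run names are invented for the example. In the Windows Forms application, the progress and completion handlers would update controls instead of writing to the console.

```csharp
using System;
using System.ComponentModel;
using System.Threading;

static class WorkerSketch
{
    // Runs a simulated analysis on a worker thread and blocks until it ends.
    // Returns true if the run was cancelled (never requested in this sketch).
    public static bool Run(string[] links)
    {
        bool cancelled = false;
        ManualResetEvent done = new ManualResetEvent(false);

        BackgroundWorker worker = new BackgroundWorker();
        worker.WorkerReportsProgress = true;
        worker.WorkerSupportsCancellation = true;

        worker.DoWork += delegate(object sender, DoWorkEventArgs e)
        {
            BackgroundWorker bw = (BackgroundWorker)sender;
            string[] items = (string[])e.Argument;
            for (int i = 0; i < items.Length; i++)
            {
                // Exit early if CancelAsync was called on the worker.
                if (bw.CancellationPending)
                {
                    e.Cancel = true;
                    return;
                }
                Thread.Sleep(100); // stands in for downloading and scoring
                bw.ReportProgress((i + 1) * 100 / items.Length, items[i]);
            }
        };

        worker.ProgressChanged += delegate(object sender, ProgressChangedEventArgs e)
        {
            Console.WriteLine("{0}% done, finished {1}",
                e.ProgressPercentage, e.UserState);
        };

        worker.RunWorkerCompleted += delegate(object sender, RunWorkerCompletedEventArgs e)
        {
            cancelled = e.Cancelled;
            done.Set();
        };

        worker.RunWorkerAsync(links);
        done.WaitOne(); // a console program has to wait; a form would not
        return cancelled;
    }

    static void Main()
    {
        // Placeholder list standing in for links discovered on a page.
        string[] links = { "http://example.com/a", "http://example.com/b" };
        Console.WriteLine(Run(links) ? "Cancelled." : "All links analyzed.");
    }
}
```

In a console program there is no user interface thread, so the progress messages are raised on thread pool threads and may not appear in strict order. In a Windows Forms application, ProgressChanged and RunWorkerCompleted are raised on the thread that created the worker, which is what makes it safe to touch controls from those handlers.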

The RunWorkerCompleted handler updates the progress percentage, enables the Cancel button, and hides the progress bar.

Next Steps

The application isn't all that useful, but it's fun! It could easily be extended to perform different actions on discovered pages, or simply to search for different words. Several enhancements would be fairly easy:

Add a checkbox to prevent linked pages from being counted. This would speed things up.

Restrict the number of links to follow. If the given URL has many links to other pages, it may never finish!

Better filtering of links. Links on the same site may not be needed either. Checking for the same base URL would be a pretty easy addition.

Create multiple worker threads to share the load of analyzing linked pages. This is a good scenario for parallelizing.

Conclusion

I hope that you had fun with this application.

It was fun coming up with a naughty-or-nice formula! I struggled a bit with the best balance. It's not perfect, but it works pretty well.

If you have a better approach, by all means tweak it. Best of all, have fun! Get started by downloading Visual Studio Express and the sample code today.

Arian Kulp is an independent software developer and writer working in the Midwest. He has been coding since the fifth grade on various platforms, and also enjoys photography, nature, and spending time with his family.