Principles of simplicity

Published Jul 24, 2007

As I’ve written before, simplicity is very subjective and is not considered a first-class design principle like unit testing, design patterns, UML etc. However, it is in my opinion possible to use guidelines of simplicity in an objective manner and make simplicity a part of the coding standard in any company. As with many other design principles, it can only be reinforced using code reviews and that is exactly how we enforce it at Traceworks.

Here is a list of some of the principles of simplicity that we use. Some of them are borrowed from other principles or philosophies, but together they form a nice simplicity guideline I think.

1. Simplicity or not at all

Some developers tend to over-complicate a task and end up writing too many classes to solve a simple problem. Taking a step back and observing the problem space before digging into the details is hard, but it will keep you from over-complicating matters. The more experience a developer has, the easier it becomes to find out how many steps backwards to take in order to fully understand the problem space.

I’ve always believed that if you cannot find a simple solution then don’t do it at all. That’s because if you cannot take a step back and get a good feeling for the problem, then you don’t understand it enough to see a simple solution – and trust me, there is always a simple solution. If you don’t understand the problem you are trying to solve, then you probably cannot solve it.

2. Don’t build submarines

It’s a common fact that IT projects take longer than scheduled even if you schedule for delays. Books have been written on this fact, but it also has an impact on the simplicity of the code written.

If you design a huge project with many components, tiers, bells and whistles then you will most likely be delayed. In the eleventh hour before a deadline, you will probably not have time for proper testing and code reviews. The problem is that you might never get back to that code before it breaks 6 months later. Maybe it’s your co-worker who has to fix the problem, but ends up staring in frustration at a complex submarine, not knowing where to begin.

If possible, make sure that you design the project in modules that can be added after the release of the core product. This can be done easily in web projects, but proves very difficult for desktop applications.

3. Test when appropriate

Testing is a very important part of the development cycle and there are many different tests to perform: unit testing, use case testing, performance testing etc. Tests are good, but if you test everything to the smallest detail every time, then you probably waste a lot of time on it. Test where it makes sense and use different combinations of the various tests for different libraries. Not all code needs unit testing and not all code needs performance testing, but all code needs some tests. Be smart and don’t spend more time writing test cases than you do on writing or designing your application unless it’s your job. Read more at Testing your code.

4. Be precise when naming methods

A method must have a name that tells exactly what it does. Luckily, a method should at all times only do one thing to be simple, so this is easy. A method called Send could be a bad name if it actually creates a mail object, fills out the sender, body and subject and then passes it on to another method that actually sends it. A more precise name would be CreateMailMessage.

When the method names become precise it also becomes very simple to understand exactly what is going on where.
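As a small sketch of the difference, assuming the System.Net.Mail classes and a hypothetical MailHelper class (neither is taken from our code base), the method below only builds the message and leaves the sending to someone else, so its name says exactly that:

```csharp
using System;
using System.Net.Mail;

// Hypothetical helper class, used only to illustrate the naming rule.
public static class MailHelper
{
    // Builds the message but does not send it, so "Send" would be a misleading name.
    public static MailMessage CreateMailMessage(string sender, string recipient, string subject, string body)
    {
        MailMessage message = new MailMessage(sender, recipient);
        message.Subject = subject;
        message.Body = body;
        return message;
    }
}
```

A caller reading CreateMailMessage knows that nothing leaves the building yet, and that a separate method does the actual sending.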

5. Comment your code the simple way

Code commenting can be done in a myriad of ways, but there really is only one that keeps your code simple at the same time.

Don’t comment the individual code lines in a method or property, but keep the comments above the class, method, property or interface. The comment should answer the questions what, why and how. This can be done quite easily when you have given the method a precise name, and you can spend more time explaining the why than the what because the name tells what it does.

This wouldn’t be a rule if there weren’t any exceptions, so of course there is one. If your code needs a strange quirk to work, then you should write a note above that piece explaining why it is done that way.

Clear comments also tell others what your code does and are therefore a good method to make your code simpler to understand.
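Here is a minimal sketch of that commenting style. The PostUrlHelper class and its method are made up for the illustration; the point is only where the comments sit: one block above the method answering what, why and how, and a single note inside it for the quirk.

```csharp
using System;

// Hypothetical helper, used only to show where the comments go.
public static class PostUrlHelper
{
    /// <summary>
    /// What: removes a trailing slash from a URL.
    /// Why: "example.com/post" and "example.com/post/" point at the same page,
    /// but a plain string comparison would treat them as different.
    /// How: cuts off a single '/' at the end and leaves the rest untouched.
    /// </summary>
    public static string RemoveTrailingSlash(string url)
    {
        // Quirk: a bare "/" is the site root and must stay as it is,
        // otherwise we would return an empty string.
        if (url.Length > 1 && url.EndsWith("/", StringComparison.Ordinal))
        {
            return url.Substring(0, url.Length - 1);
        }

        return url;
    }
}
```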

6. Steal, borrow and simplify

We all use code pieces found on the Internet all the time. Search Driven Development is the new developer philosophy, but be careful about the code found on the web. It might work, do the job and get you over the finish line, but it might not be the easiest code to understand. If you don’t understand it when you paste it into your code file, then you probably don’t understand why it breaks 6 months later and then what do you do?

Make sure to clean up the code you find on the web and simplify it as much as possible. Don’t wait to do it, because by then you will probably have forgotten exactly what it does and how. Do it right away.

7. It’s not a question of fewer code lines

I get this a lot. Apparently it is a common misunderstanding that simplicity is a question of writing fewer lines of code. That is not the case. Simplicity is about removing everything that can complicate the process of writing and maintaining software.

8. Don’t be a rock star

Know your limitations and don’t be afraid to ask for help to solve a problem. If you continue in the wrong direction because your rock star mentality doesn’t allow you to ask for help, the code you write will end in a mess. If you can’t get your head around the problem, then you should ask for help. Otherwise you are clearly not the right person to solve it.

By asking others for help, you might just learn something.

9. Learn much about much

To be able to find the simple solution to any problem, you need diversity. The ability to see a problem from multiple angles is a powerful tool in problem solving but it requires that you have a lot of angles to use.

Read books, play with new tools, languages and technologies. The more you know about different approaches to a certain problem, the better the chances that you find the right solution a lot earlier.

10. Don’t trust your simplicity instinct

This might be the single most important rule of simplicity. Anyone but yourself can tell if your code is simple. A code review is the best way of finding out, and at Traceworks it is always the reviewer who decides whether or not the code is simple enough and if the comments are sufficient.

Over time we have all learned what gets you through a code review and have therefore begun to write simple code the first time. Learning by doing.

Conclusion

Simplicity is an implementation philosophy and it’s a very important one. The 10 points on the list are guidelines and there are many more, but then you’d probably fall asleep reading them all. Congrats for coming all the way down to the conclusion btw. If you have read it all, then you probably find it interesting either because you agree or disagree. Either way, simplicity is the most important factor in my work and it has prevented many headaches in the past.

Trackback spam fighting

Published Jul 22, 2007

Recently, I joined the Subkismet project which is an open source stand-alone comment spam filtering library for ASP.NET web applications founded by Phil Haack. My task is to write mechanisms for fighting trackback and pingback spam comments. More precisely, I will be writing two base classes for handling trackbacks and pingbacks that anyone can use in their own project.

Before I got actively involved in Subkismet, I wrote a short paper on the principles of trackback spam fighting. These principles were originally used for BlogEngine.NET and are now also a part of Subkismet. When the classes are done I will port the updated code back to BlogEngine.NET again.

I thought that others might be able to make use of these principles and decided to share. Here it is:

Fight trackback spam

A trackback request is a standard POST request sent to a web server. It is similar to posting back a form on a webpage in that it also sends parameters with the request. These parameters are used by the receiver to handle the request and register the trackback. The parameters are:

id – the id of the post the request tries to send a trackback to
title – the title of the trackback
excerpt – the message the sender wants to send to the receiver
blog_name – the name of the sending blog
url – the url of the sender’s webpage containing the trackback link
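To make the parameter list concrete, here is a minimal sketch that reads the five parameters from a form-encoded POST body. The TrackbackRequest class and its member names are my own invention for this example; in an ASP.NET handler you would normally just read the values from Request.Form instead of parsing the body yourself.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical container for the five parameters listed above.
public class TrackbackRequest
{
    public string Id = "";
    public string Title = "";
    public string Excerpt = "";
    public string BlogName = "";
    public string Url = "";

    // Parses a form-encoded body such as "title=Hello&url=http%3A%2F%2Fexample.com".
    // Parameters that are missing from the body are left as empty strings.
    public static TrackbackRequest Parse(string body)
    {
        Dictionary<string, string> fields = new Dictionary<string, string>();
        foreach (string pair in body.Split('&'))
        {
            int split = pair.IndexOf('=');
            if (split <= 0) continue; // skip empty or malformed pairs
            fields[Decode(pair.Substring(0, split))] = Decode(pair.Substring(split + 1));
        }

        TrackbackRequest request = new TrackbackRequest();
        request.Id = Get(fields, "id");
        request.Title = Get(fields, "title");
        request.Excerpt = Get(fields, "excerpt");
        request.BlogName = Get(fields, "blog_name");
        request.Url = Get(fields, "url");
        return request;
    }

    private static string Get(Dictionary<string, string> fields, string key)
    {
        string value;
        return fields.TryGetValue(key, out value) ? value : "";
    }

    // Form encoding writes spaces as '+' and other special characters as %XX.
    private static string Decode(string value)
    {
        return Uri.UnescapeDataString(value.Replace('+', ' '));
    }
}
```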

To fight spammers, we can analyse many different things from the information received in the request parameters above. This document tries to provide a basic introduction to the analysis and to the measures to take in case the sender is a spammer.

Confirm the sender

When a trackback request is sent to a trackback enabled website, the website has the ability to validate the sender before accepting the request. The sending website has to have a link to your website; otherwise it is not a valid trackback according to the specifications. To make sure that it does, you can follow these steps.

1: Trackback request received
2: Check the sending website for link
3: If link is confirmed, register the trackback.
4: If link is NOT confirmed, end the response and send HTTP status code 404.

The response has to end if the sender is not confirmed because there is no point in telling the spammer whether or not we actually support trackbacks. The clever solution is to send a status code 404 back to the spammer, indicating that it makes no sense to try again because the trackback handler does not exist.

Here is an example in C# 2.0 that shows how to examine the sender’s webpage:

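// Requires: using System.Net;
private bool IsSenderConfirmed(string sendingUrl, string receivingUrl)
{
    try
    {
        using (WebClient client = new WebClient())
        {
            // Download the sender's page and look for our URL in its HTML.
            string html = client.DownloadString(sendingUrl);
            return html.ToLowerInvariant().Contains(receivingUrl.ToLowerInvariant());
        }
    }
    catch (WebException)
    {
        // The page could not be fetched, so the sender cannot be confirmed.
        return false;
    }
}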

This technique is very basic but maybe the most important factor for fighting spammers. However, there exist link farms with the sole purpose of beating this approach, so there is a need to be even stricter.

Restrict the number of allowed trackbacks

When a spammer finds a website that allows him to create trackback spam, he will keep on doing so with as many trackbacks as possible, maybe spread over time so you won’t notice it right away. That’s why it is very important to only allow 1 trackback per sender per post.

After the sender has been confirmed the trackback handler must now check if another trackback from this sender has already been registered. If so, the sender must be rejected nicely because he might not be a spammer.
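Rejecting nicely means answering in the XML format of the trackback specification, where an error value of 0 means success and 1 means failure with a message. The sketch below builds both responses; the TrackbackResponse class name and the wording of the message are my own assumptions.

```csharp
using System;
using System.Security;

// Hypothetical helper that builds the XML answers described in the trackback specification.
public static class TrackbackResponse
{
    private const string Header = "<?xml version=\"1.0\" encoding=\"utf-8\"?>";

    // The trackback was accepted.
    public static string Success()
    {
        return Header + "<response><error>0</error></response>";
    }

    // The trackback was declined; the message tells a legitimate sender why.
    public static string Error(string message)
    {
        // Escape the message so characters like '<' and '&' cannot break the XML.
        return Header + "<response><error>1</error><message>"
            + SecurityElement.Escape(message)
            + "</message></response>";
    }
}
```

A legitimate sender who pings twice by accident would then get something like TrackbackResponse.Error("A trackback from this sender is already registered") instead of a 404.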

Because a trackback spammer uses multiple websites, user agents and IP addresses to bypass spam filters, the handler must use all information possible and check for them all individually. Two different spam requests might come from the same IP address, but with different referring websites. Make sure to check both.

Now the flow looks like this:

1: Trackback request received
2: Check the sending website for link
3: If link is NOT confirmed, end the response and send HTTP status code 404
4: If link is confirmed, check if the sender has been registered before
5: If sender has been registered before, nicely decline the request according to the specs
6: If sender has NOT been registered before, register the trackback according to the specs
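The one-trackback-per-sender-per-post rule could be sketched like this. It is only an in-memory illustration: the TrackbackRegistry class is hypothetical, and treating the host name of the sending URL as "the sender" is my assumption. A real blog would keep this information in its data store.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory registry of who has already sent a trackback to which post.
public class TrackbackRegistry
{
    private readonly Dictionary<string, bool> seen = new Dictionary<string, bool>();

    // Returns true and remembers the sender if neither the sending website nor the
    // IP address has sent a trackback to this post before; returns false otherwise.
    public bool TryRegister(string postId, string senderUrl, string ipAddress)
    {
        string hostKey = postId + "|host|" + GetHost(senderUrl);
        string ipKey = postId + "|ip|" + ipAddress;

        // Check both pieces of information individually.
        if (seen.ContainsKey(hostKey) || seen.ContainsKey(ipKey))
        {
            return false;
        }

        seen[hostKey] = true;
        seen[ipKey] = true;
        return true;
    }

    // Uses the host name when the URL can be parsed, otherwise the raw string.
    private static string GetHost(string url)
    {
        Uri uri;
        if (Uri.TryCreate(url, UriKind.Absolute, out uri))
        {
            return uri.Host.ToLowerInvariant();
        }

        return url.ToLowerInvariant();
    }
}
```

Keying on the IP address is strict: two honest blogs on the same shared host would block each other for a given post, so you might want to loosen that part depending on your visitors.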

Check for URLs

The request’s excerpt (the trackback message) has to be checked for suspicious content. A spammer always tries to send URLs so that your visitors might click on them. That’s the purpose of trackback spam. If the handler receives an excerpt with a URL, it raises the chances of the sender being a spammer, but it is not a certainty. If it receives 2 or more URLs, then the sender almost certainly is a spammer and should be rejected.

You can use this method to determine how many URLs the excerpt contains:

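// Requires: using System.Text.RegularExpressions;
private static int UrlCount(string excerpt)
{
    // Matches anything that starts with http:// or www. followed by a domain and an optional path.
    string pattern = "((http://|www\\.)([A-Z0-9.-]{1,})\\.[0-9A-Z?&=\\-_\\./]{2,})";
    Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
    return regex.Matches(excerpt).Count;
}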

If a URL is embedded in an HTML link tag (<a href="example.com">link text</a>), the sender certainly is a spammer. No blog engine sends HTML in the trackback message, so this is a clear indication that it was sent by a spammer.

To find out if the excerpt contains HTML, you can use this method:

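// Requires: using System.Text.RegularExpressions;
private static bool ContainHtml(string excerpt)
{
    // Matches an opening or closing HTML tag, with or without attributes.
    string pattern = @"</?\w+((\s+\w+(\s*=\s*(?:"".*?""|'.*?'|[^'"">\s]+))?)+\s*|\s*)/?>";
    Regex regex = new Regex(pattern, RegexOptions.Singleline);
    return regex.IsMatch(excerpt);
}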

The flow now looks like this:
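Sketched as code, the combined checks could look like the class below. This is a sketch under my own assumptions and not the Subkismet implementation: the TrackbackDecision names are made up, the content checks are assumed to run after the two sender checks, and a single URL is let through because it is not a certainty. The two regular expressions are the same as in the methods above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical outcomes of the trackback flow.
public enum TrackbackDecision { NotFound, Decline, Reject, Register }

public static class TrackbackFlow
{
    // linkConfirmed and registeredBefore are the results of the sender checks
    // described earlier; they are passed in so this sketch needs no network or storage.
    public static TrackbackDecision Decide(bool linkConfirmed, bool registeredBefore, string excerpt)
    {
        if (!linkConfirmed) return TrackbackDecision.NotFound;   // answer with HTTP 404
        if (registeredBefore) return TrackbackDecision.Decline;  // decline nicely per the specs
        if (ContainHtml(excerpt) || UrlCount(excerpt) >= 2) return TrackbackDecision.Reject;
        return TrackbackDecision.Register;
    }

    // Counts the URLs in the excerpt.
    private static int UrlCount(string excerpt)
    {
        string pattern = "((http://|www\\.)([A-Z0-9.-]{1,})\\.[0-9A-Z?&=\\-_\\./]{2,})";
        return new Regex(pattern, RegexOptions.IgnoreCase).Matches(excerpt).Count;
    }

    // Tells whether the excerpt contains an HTML tag.
    private static bool ContainHtml(string excerpt)
    {
        string pattern = @"</?\w+((\s+\w+(\s*=\s*(?:"".*?""|'.*?'|[^'"">\s]+))?)+\s*|\s*)/?>";
        return new Regex(pattern, RegexOptions.Singleline).IsMatch(excerpt);
    }
}
```

The handler would then map NotFound to a 404, Decline and Reject to an error response, and Register to storing the trackback.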

If you have any other ideas for fighting trackback spam, please tell me so we can make Subkismet as bulletproof as possible.