18 May 2012

Opinions and Realisations: Software Development Goal Shift

Abstract

 

When I was still too young to understand fully what developing software for a living would mean, I was anxiously engaged in the cause of furthering my knowledge in the field of Computer Science. I was first introduced to the concrete subject of Computer Science in seventh grade, and I remember it well.

At the ripe young age of nine (granted, many are learning younger these days), I stumbled across a book titled "GW-BASIC Reference Manual" that my parents had received from my grandparents when they gave us their old computer: a Packard Bell microcomputer with an Intel 80286 processor, 64 kB of RAM (static type; at some point I received a 1 MB expansion card, but the card was too large for my computer's case, so I never installed it), a 33.3 MB IDE hard drive, and two floppy drives - one 5.25" and one 3.5", both single-sided. Those were the days. I don't fully know what sparked my interest in the topic of programming, but this book definitely helped me move in that direction.

Even in those days, my computer was considered slow; Moore's Law was rampant in the marketplace. But I was more than content - I had reasonable power at my fingertips with my clicky keyboard and no mouse to speak of (I was using MS-DOS 6.22 anyway), and I was armed with tools like DEBUG.EXE, GW-BASIC, QBASIC, EDIT, and later Microsoft's C compiler.

I had several goals in learning how to program:
  • Make the computer do what I wanted it to do, in the way I wanted it to do it.
  • Make video games.
  • Make animations (those were pretty much programmed in those days).
  • Learn about software and file types (like images).
  • Make video games.
  • Program the Internet.
  • And many more...
When I started programming in junior high and high school, I was taught to think about solving problems. A lot of the focus was on mathematics, puzzles, and visualisations, like three-dimensional models and other mathematical models. I learned how to apply my programming skills to a problem space, and then how to utilise my program to solve problems. In junior high I even began to develop a system that could tell its user what colour a strip of paper dipped in beet juice would turn when exposed to certain bases and acids. I was that interested in solving problems.

In high school I had the opportunity to solve a real problem - not that it was that big of a deal to me, but to my teacher it was. My teacher, running DOS on her classroom computer, had somehow accidentally deleted her AUTOEXEC.BAT file. Two of us in the class were pretty adept at computing and solving problems, and we were both in the same Computer Science class, so, thinking maybe I could prove something, I let my classmate try to fix the problem first. He spent ten minutes or so and couldn't figure it out, but then he was used to using MS Windows. Then I was up, and since I was used to DOS, I felt right at home. First I located the program that she needed to run - that was the real problem: the program she usually ran wasn't running because its executable was no longer in her PATH variable. After locating it, I tried the EDIT command - nothing. Then I tried HELP - nothing. So, thinking like a problem solver, I tried GWBASIC, and voilà! I was able to write a program that rewrote her AUTOEXEC.BAT file in the correct location. We executed AUTOEXEC.BAT, and the program my teacher needed was again runnable.

I want to place emphasis on the fact that software is a tool for solving problems. It enables us to solve problems much more quickly and effectively than doing the same problem solving manually. Software controls automation that enables us to repeat tasks perfectly. Software does things that we don't want to do, or that require skills that not all people have. Software helps us fight wars better and helps us keep peace better. And software entertains us. With all of the problems in the world that need solving, what would we do without software?

Software Development Goal Shift

 

In recent years, having worked in the information technology industry as a software tester, software developer, software architect, and software engineer, I have noticed that the goals of software development have shifted greatly, especially in commercial software. While I believe there are some good reasons for this shift, I don't think it has been good for IT or for the software customer.

This shift has generally been from solving problems to satisfying requirements. "How is that different," you may ask. I don't think it is much different, except that requirements don't always solve the problem at hand. Let me share an example from my past.

Recently I worked for a company with an international business. Its international online presence is not as well developed as its domestic one. The international web sites provide much less functionality: no online ordering, a poor representation of the online catalog (some products are restricted internationally), and an overall bad experience for anybody using them. To solve this problem, we set a goal to create a single web site that provided all of the domestic functionality for international users. The problem space was clear in my mind.

We were a LAMP shop (with the addition of Oracle ERP), with at least 30 web sites running on the LAMP stack. The requirements for the new system instructed us to implement it using Microsoft SharePoint. Internally and externally I battled with this requirement. The requirement did not fit the problem space very well. My leaders argued otherwise.

They suggested that, "SharePoint provides multi-lingual support out of the box." I argued that multi-lingual support was already to be had internally - we just needed translation files, and the filters were easily developed, and that web site design was what needed to change in order to support this feature.

They suggested that, "SharePoint can be 'programmed' by business people more easily, like in order to create landing pages, and marketing pages." I argued that business people shouldn't be programming, as they don't understand web standards, or the need for standards in web marketing, and they'd probably just use flash movies, which isn't fully supported by all systems anymore (like IOS-based devices).

Generally speaking, these requirements seem to have come about because my supervisor liked SharePoint and perceived it to be a silver bullet of sorts, somehow especially suited to the problem of an international client-facing web site. I disagreed for several reasons, one being the cost of retooling and training. I have since left, as have many others, and the result has been that they have hired multiple consultants and paid for a lot of training in order to implement the system. Not that such a scenario is uncommon in the software industry, but retooling a system because one solution seems shinier than another is not good for business - at the very least it's expensive.

 

Conclusion


So I wonder why this has been the case. Is the business too involved in software development? Are software developers not ballsy enough to let their voices be heard? Why, after all of these years of software development, is there still a search for "silver bullet" software solutions?

Recently I learned something about the software industry in the United States that I did not know. In the West, in areas like Silicon Valley and the Silicon Slopes, we are creative minds. We create software and build software solutions, often from scratch. We build commercial and open source software - a lot of it, enough to sustain a lot of conferences surrounding these ideas. In the East, where I have never lived, apparently they do not build software as much; rather, they integrate existing software. And as many of us in the West know, integrations are not automatic or easy, but this idea of integration over innovation seems to be creeping into the West more and more.

I haven't been out of college for more than a few years (about four, I think), but I assume that teachers of Computer Science courses are still teaching problem solving. Maybe it depends on the school. Problem solving isn't as much of a topic if you are writing video games, like they teach you to do at new-age CS universities and colleges like Neumont. In fact I find that theoretical Computer Science isn't nearly as common as it used to be. I used to hate theoretical Computer Science in high school - writing instead of programming was annoying at best - but I wrote a great number of programs first without a computer, very successfully, and solved a lot of problems before sitting down at the keyboard. Now that I'm all "grown up" I'm realising that the theoretical foundations I was taught in high school, and somewhat, though not as much, in college, have given me more foundation to build on for my career.

So that's it! How do we keep our programmers from becoming click-and-drag aliens like the customers? For a select few it is still a priority to understand and study the foundations of Computer Science, but many a business person doesn't understand software or software development well enough to save his company. What can be done to solve this problem?

16 March 2011

How to Write Software Right

I have been working on, testing, developing, breaking, engineering, architecting, designing, and creating software for over twenty years. Seven and a half of those have been professional years. I have been in school, participating in Computer Science related activities, for seven of my twenty years. I have read thousands of books, essays, tutorials, forums, blog posts, and reports on various subjects related to all aspects of the IT industry. I have worked with eight companies, on five open community projects run by others, on more than ten open source projects of my own, with hundreds of coworkers both domestically and abroad, and on several contracts, some of my own initiation and some because others wanted my help. I also participate casually in four local programming language and API user groups and a couple of global API user groups, and have attended the Microsoft Tech Ed and Agile Roots conferences. My point is that I have spent a fairly large percentage of my life on software, and I have experienced a lot of different economic situations, team and solo environments, and software communities. And in more than eighty percent of my overall experience, the resulting software has been unacceptably buggy; politics play a greater role in deciding when software is complete than discrete quality does; programmers don't have the time, and in many cases don't have the experience, to engineer software properly; and management, of both software activities and people, is poorly executed, in part due to a lack of understanding of the problems being solved and the solutions to those problems, and again due to lack of experience.

The Problem, As I See It

The problem, as I see it, has a lot to do with management. As projects grow, management becomes a greater necessity. This is well-founded, but projects need the right kind of management, and the right kind of management is an aspect of Software Engineering that in my mind has yet to become fully understood. MBA students learn strategies for being successful in the marketplace. While these strategies work well in supply-demand situations where the product is a physical, tangible commodity, they do not work as well when the product is abstract and service-based, as is the situation with software. Software does not follow supply-demand rules in the same way as commodities like gasoline, vegetables, and fast food. In software, the commodity isn't what the end-user gets; rather, it is what the producer has. In software the commodity is not the software; it is the programmer, the software architect, the technical writer, the software engineer, and the software quality assurance professional. And that commodity needs more than just carnauba wax to preserve it. Programmers need training, motivation, and benefits. Good software engineers need to know that they are trusted and entrusted with the final product. Software architects need to be listened to and adhered to, not downplayed and criticized. I know a good number of these commodities who are extremely well-versed and know what they're doing, but when their skill-set is undercut by the bottom line, their performance suffers, and as a result so does the product they are working on.

A sub-problem of the management problem seems to me to be the bottom line. While, just like in any other commercial industry, the product must be profitable to survive the market, the commodity - the worker - must be able to engage in the highest possible quality of work to produce a profitable product. When a consumer purchases an engineered product other than software, such as an HD television, an electric mixer, or an automobile (excluding the software-driven components), it is expected to be of the highest quality.

A Case In Point

Nine years ago, my wife purchased a two-year-old 2000 Chevrolet Malibu sedan. Chevrolet is generally a good company that produces high quality vehicles at an astounding rate and at a lower cost to the consumer than many companies. On the other hand, a company like BMW has been producing superiorly engineered, high quality products for years as well, but its vehicles cost significantly more than a Chevrolet. As an example, a 2010 Chevrolet Malibu four door sedan has an invoice cost of $20,733, while a comparable 2010 BMW 3 Series four door sedan has an invoice cost of $30,500, costing the consumer about 47% more at the bank. The BMW 3 Series is a quality vehicle. The Chevrolet Malibu is a quality vehicle. And while I have never driven or owned a BMW 3 Series, somehow I expect that it has never had any of the engineering problems that my wife's 2000 Chevrolet Malibu has experienced: turn signals that stop working unless you jimmy the emergency signal button (why? because the connection between the turn signals, emergency signals, emergency signal button, and turn signal wand was soldered together); brake pads and rotors that need to be replaced every three to six thousand miles (professionals expect to change brake pads and rotors at around 20,000 to 50,000 miles depending on usage, but on my wife's Malibu they really wore out that fast, and we were purchasing performance grade pads); and finally several hose fittings to the cooling system that burst, costing hundreds of dollars in repairs, hundreds of dollars in coolant, and a highly pressurized system that continued to fail.

Mechanics were finally able to solve the turn signal issue after my persistent calling to GM to complain about it. After reading on the NHTSA's web site about how many people had complained, and how many had received notice of a recall, I decided to find out why I hadn't received such a notice. Their cop-out answer was that my vehicle hadn't been manufactured at the same plant as the vehicles that were recalled for the exact same turn signal problem - same year, make, and model. After several months I finally received a letter stating that they would repair the problem for free at a certified Chevrolet mechanic, but that my situation was not considered a recall.

What was the real reason for the problem? In one word: management. Either somebody didn't manage the quality of the product at its inception, or somebody didn't manage the quality of the product in production, or both. Either way, I'm certain that any vehicle electrical system engineer would tell you that a soldered connection would not last very long under the stress that a motor vehicle puts on it. Whether they were given time to engineer it properly is a good question. On the vehicle manufacturing front, quality appears to be suffering more and more - consider the recent Toyota recalls and similar recalls and issues by other manufacturers over the years. These types of issues are problems that can be solved by careful engineering and by quality assurance through testing and continuous integration. Motor vehicles should be higher quality than this, especially for as expensive as they are, but also because lives depend on that kind of quality.
Most of the time software isn't so system-critical, but sometimes it is even more so. Imagine if a commercial airliner suddenly lost altitude because the autopilot software thought it was nearing time to land, while in fact the airliner was flying in the middle of the ocean with no place to land, or over a jagged mountain range. Imagine if a space shuttle computer suddenly locked up, causing a fatal crash due to a software defect. Luckily that didn't happen. Imagine if an automated cannon suddenly discharged on its own military troops due to a software defect, killing some. While most of us in the IT industry don't work on projects of the life-threatening, system-critical nature of these incidents, some do, and we shouldn't be blaming anybody but ourselves when these sorts of costly problems occur. If management is getting in the way of quality, then we need to step up and let them know. And for those MBAs out there: you need to know that the world doesn't happen according to a text book.

Acceptable Defect Density

"The software needs to get released at some point,"
your managers are certainly saying. And I completely agree. If you don't release software, then you won't be able to get a return on it. So is there an acceptable number of defects or issues that production software can contain? I don't feel like it's fair to lay down an IT-industry standard, rather each industry using software to control any portion of their products must define their own standard. Better yet each company must strive to the highest standard possible. In my opinion zero defects is the most acceptable number of defects in production ready software. It is pretty unlikely that all defects can be averted in any software project, but I believe that a 99% defect free rate based on the size of the system (however you choose to measure that, i.e. defects per lines of code), is a realistic goal. How do we attain a 99% defect free rate? The answer, though simple in my opinion, is not necessarily easy to implement. The answer is that you need to fully engineer software. "Fully engineering" means that all parties involved in creating the software need to participate in analyzing the problem you are trying to solve and the risks associated with it. Management needs to be there so that they can help make the problem clear. Project management needs to be there so that you can understand what kind of resources are available to solve the problem. Developers, architects, engineers, and technical writers need to be there so that the project is understood by those who are actually solving the problem and doing the work. And the problem and solutions need to be discussed and developed until the end goal is clear in everybody's mind. Now I don't mean have meeting after meeting and fully document every single caveat or problem situation possible in the system, but don't be satisfied with the highest level of explaining what is wanted. Give enough detail that questions can be asked. Answer the questions in a way that satisfactorily resolves any concerns. And always leave your door open in case issues arise during the development process. After that comes the development, and then come the real challenges. It has been said that in order to fully understand how to solve a problem with software, you must begin solving it. Even with tools like programming language knowledge, computer architecture knowledge, design patterns knowledge, and all of the experience and wisdom in the world, every problem solved with software is different. You might even come to solve the same problem in a different way given the nature of the reason for solving the problem.
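To make the metric concrete, here is a minimal C# sketch (the numbers are entirely made up) of what measuring defect density per thousand lines of code might look like:

using System;

public static class DefectDensity
{
 // Defects per thousand lines of code (KLOC) - one common, if crude,
 // way to normalise a defect count by system size.
 public static double PerKloc(int defectCount, int linesOfCode)
 {
  return defectCount / (linesOfCode / 1000.0);
 }

 public static void Main()
 {
  // Hypothetical numbers: 42 known defects in a 60,000-line system.
  double density = PerKloc(42, 60000);

  Console.WriteLine("Defect density: {0:F2} defects/KLOC", density); // 0.70
 }
}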
Back to defect density in the real world: Steve McConnell, author of Code Complete, says in his IEEE Software column that [Bibliography-1]
"[A software company's] task is treacherous, treading the line between releasing poor quality software early and high quality software late. A good answer to the question, 'Is the software good enough to release now?'" can be critical to a company’s survival.
Once again the problem with the software industry is fully exposed. Why can't we just write software correctly? I recently read a compilation of reports organized for the United States Air Force pertaining to systems engineering needs by Edward R. Comer of Software Productivity Solutions, Inc. It is not surprising to me that their conclusions are the same as those of most of the industry. For example, in the Needs Survey of the "1975 NRL Navy Software Development Problems Report" - the "result of a year-long investigation into Navy software problems...based on interviews with...people associated with Navy software development" - the following twelve recommendations were made [Bibliography-2]:
  1. Unify life cycle control of software. Development responsibilities for a system should not be split, and maintenance activity should not be independent of development activity.
  2. Require the participation of experienced software engineers in all system discussions. This is especially crucial for early decisions such as the determination of the system configuration, assignment of development responsibility, and choice of support software.
  3. Require the participation of system users in the development cycle from the time requirements are established until the system is delivered. Changes which are inexpensive and easy at system design time are often extremely expensive and difficult after the software has been written.
  4. Write acceptance criteria into software development contracts. This will help avoid unnecessary misunderstandings and delays for negotiation before a system is delivered.
  5. Develop software on a system that provides good support facilities. If necessary, consider developing support software prior to or in conjunction with system development.
  6. Design software for maximum compatibility and reusability. Premature design decisions should be avoided; logically related systems should have their differences isolated and easily traceable to a few design decisions.
  7. Allocate development time properly among design, coding and checkout. Since manpower-allocation estimates are based in part on the time estimates for different phases of development, improper estimation can be quite expensive.
  8. List, in advance of design, all areas in which requirements are likely to change. This can be done at the time requirements are stated and will help the designer partition the software to isolate areas most likely to change.
  9. Use state-of-the-art design principles, such as information hiding. Principles which optimize reliability, cost reduction, and maintainability should be emphasized.
  10. Critical design reviews should be active reviews and not passive tutorials. Sufficient time must be allowed to read design documents before the review, and the documents must be readable.
  11. Do not depend on progress reports to know the state of the system. Programmer estimates are typically biased; milestones are a more accurate indication of development progress.
  12. Require executable milestones that can be satisfactorily demonstrated. Milestones demonstrating system capabilities that will rest on major design decisions should be written into development contracts.
These twelve recommendations follow suit with what many people in the industry say is needed, and with what many software development processes try to provide. Perhaps the problem with software processes, however, is exactly the same problem that generally exists with software: there is no silver bullet! I have worked with several companies and groups who have used Agile as part of their process. One of the attractions of Agile software development is that there are several different methods of implementing it. Back when Agile was young, many schools of thought pertaining to the Agile development methodology said, "if you're not doing this, then you aren't Agile". More recently Alistair Cockburn basically told us, at the Agile Roots conference held in Salt Lake City, UT in July of 2010, that "Agile is agile" - there is no wrong way to implement the Agile software development methodology. More importantly, implementing only the pieces of the different methods of Agile that are important to your organization is the best way to implement it. For example, make daily stand-up meetings part of your day-to-day process, and make visibility into your projects transparent; those are two pieces of different methods of Agile. But again, processes don't solve software problems, people do. And if a process or bureaucratic red tape is causing your project to fall behind, then get rid of it. Processes, in my opinion, are meant to help teams with little or no direction find direction. But teams outgrow processes on a daily basis. Making a team conform to a standard that isn't working for them holds back the potential of that team.

And Finally There's Quality

Software quality assurance is becoming a larger part of the software industry. More and more software development shops are opening quality assurance departments. But this trend does not reveal that software is coming out at a higher quality; rather, it underlines the issues that most companies have experienced in the past, and will continue to experience, with quality. The companies building up their QA departments are just the ones who realize that quality has become a bigger issue than they can handle without dedicated resources.
How many companies would be willing to release their software into the wild even if it had a defect-to-lines-of-code ratio of 10%? What if your QA department didn't sign off on it and told you that you would lose more money by releasing it now with the 10% defect ratio than you would by releasing it late with a 1% defect ratio? Would you still release it? In my experience companies tend, more often than not, to release on time with a higher defect ratio than what is internally acceptable.

So where is the problem again? QA is telling you not to release it. But why? Because they know that the risk of needing to refund customers or do non-billable work is higher. QA should be listened to. Assessments made by QA should not be regarded any less than making sure that your developers know how to develop, that your architects have designed a complete and well-thought-out product, or that you're implementing time-tested design patterns that have been proven to be correct. If you have a QA department, then you probably realize that you have a problem with your software. It's also probable that you don't know where the problem is. It is estimated that it is far less expensive to design software well and completely in the beginning than to rely on customers to find your issues; it is still better if an internal quality assurance pass finds the problems before you release the software. If you have a QA department and you aren't listening to their advice, then you might as well not have a QA department, because now you're probably losing at least one and a half times the money - you're still relying on customers to find the issues before you'll fix them, because you're downplaying the idea that what your QA department has to say is valuable. Of course there's always the other side of the coin, which says that a software quality assurance team exists to find the defects that design, unit testing, and code reviews didn't find. Steve McConnell provides us with some insight in his book, Code Complete [Bibliography-3]:
...software testing alone has limited effectiveness -- the average defect detection rate is only 25 percent for unit testing, 35 percent for function testing, and 45 percent for integration testing. In contrast, the average effectiveness of design and code inspections are 55 and 60 percent.
In conclusion, engineering software is difficult. There are bad ways to do it, good ways to do it, and better ways to do it. My utopian dream of one day working on the perfect software project may always remain a dream. But if we don't start working smarter on software instead of just working harder, there are going to continue to be huge, bad consequences. Software isn't going away, and it will only continue to become more complex. Software will continue to be used in more and more applications. And software will only become more correct and defect-free if professionals in the software industry become better educated and try harder to make software the right way.

Bibliography
  1. Steve McConnell, "IEEE Software" [http://www.stevemcconnell.com/ieeesoftware/bp09.htm]
  2. Edward R. Comer, "System Engineering Concept Demonstration, Systems Engineering Needs", © 1992 Software Productivity Solutions, Inc. [http://www.dtic.mil/cgi-bin/GetTRDoc?Location=U2&doc=GetTRDoc.pdf&AD=ADA265468]
  3. Steve McConnell, "Code Complete, 2nd Edition"; quoted via [http://www.codinghorror.com/blog/2006/01/code-reviews-just-do-it.html]

21 October 2010

Patterns and Practices: Scope and Type Inference Through Syntactic Sugar

Code Complete

Once again I'm referring to Code Complete. Code Complete taught me a lot of things, and coding style is one that I hold important to this day. Let me preface this by stating that Code Complete was built around C++ and later C# coding; still, I believe that a coding style that is indicative of scope and usage makes programs more maintainable. I also believe that more maintainable code can more easily be proven to work and to be correct. Many programming languages, especially dynamically typed programming languages (which are typically scripting languages), use syntax to identify scope and possibly type. Here are some examples:

From C++
#include <iostream>
#include <string>
using namespace std;

class MyObject
{
private:
 string name;
public:
 MyObject(const string & name);
 string getName();
};

// The :: token indicates the relationship of this function definition.
MyObject::MyObject(const string & name)
{
 this->name = name; // this indicates the relationship of the left name versus the right name.
}

string MyObject::getName()
{
 return this->name;
}

int main()
{
 MyObject object("Object 1");

 cout << "First object's name: " << object.getName() << endl;

 return 0;
}
From Ruby
class MyObject
 attr_reader :name # The : prefix indicates that name is a symbol.
 
 def initialize(name)
  @name = name # The @ prefix indicates the relationship of the left name versus the right name.
 end
end

object = MyObject.new("Object 1")

puts "First object's name: " + object.name
From Python
# Indentation indicates blocks in Python - there are no curly braces or end statements.
class MyObject:
    def __init__(self, name):
        self._name = name # self and _ indicates the relationship of the _name to the class.
        
    def getName(self):
        return self._name
        
object = MyObject('Object 1')

print "First object's name: ", object.getName()
From Perl
package MyObject;

sub new {
 my $class = shift; # The $ prefix indicates that the variable is a scalar.
 my $self = {
  _name => shift
 };
 
 bless $self, $class;
 
 return $self;
}

sub getName {
 my($self) = @_; # The @ prefix indicates an array.
 
 return $self->{_name}; # The { and } dereference hash values by key.
}

package main;

$object = new MyObject("Object 1");

print "First object's name: " . $object->getName();
1;
Anyway, hopefully you get the point that many languages hint in some way at how certain names are to be used. C++, a strongly typed language, requires you to signify which class a method belongs to: the main or global context is inferred, but a class context requires ::. In Ruby a single colon, :, indicates a symbol, and you can reference the corresponding instance variable in a class via @symbolName. Python requires indentation to indicate contained code, such as the body of a function or a class. And finally, in Perl (where everything is more or less a string) there are several indicators of what kind a variable is and how to use it; for example $ = scalar, % = hash, @ = array. All of these things are essentially syntactic sugar.

...and Scope Inference

In Code Complete there are several hints to infer scope in a program. Some of what I am about to show you was developed somewhat further by a team I worked on recently.

Syntax, and the conditions under which it applies:
PascalCasing
  • Namespaces
  • Class names
  • Method names*
  • Properties and public fields
IInterfaceName
  • All interfaces
TGenericType
  • All generic types
  • All template types
_camelCasing
  • Private class-scoped fields
__camelCasing
  • Private class-scoped static fields
camelCasing
  • Method arguments
  • Method-body-scoped variables
  • Method names*
* this may vary depending upon language
Using these scope hints it is very simple to see the purpose and scope of each identifier. For example, if I see a variable named _name then I instantly know that it is privately owned by a class; or, in the case of the Python example, I know that I should not change it manually (Python doesn't have a private class scope). A quick sketch of the conventions follows.
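As an illustration, here is a small C# sketch (the names are invented for this example) that exercises each of these conventions:

using System;

namespace Nathandelane.Examples.Naming // PascalCasing: namespace
{
 public interface INamed // IInterfaceName: interface
 {
  string Name { get; } // PascalCasing: property
 }

 public class Container<TItem> : INamed // PascalCasing: class; TGenericType: generic type
 {
  private static int __instanceCount; // __camelCasing: private static field
  private string _name; // _camelCasing: private instance field

  public Container(string name) // camelCasing: method argument
  {
   _name = name;
   __instanceCount++;
  }

  public string Name
  {
   get { return _name; }
  }

  public TItem EchoItem(TItem item) // PascalCasing: method name
  {
   TItem echoedItem = item; // camelCasing: method-body-scoped variable

   return echoedItem;
  }
 }
}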

11 February 2010

Patterns and Practices: Valid Reasons to Create a Routine

Code Complete

Several months ago I finished reading Steve McConnell's Code Complete 2nd Edition. I learned a lot from it and took notes. Some of what the book contained were methodologies that I already knew and used on a daily basis. Other parts of the book opened my eyes to a higher road in programming. My goal in reading the book was to become a better programmer, a better program designer or architect, and to better understand the reasons behind doing things a certain way. Especially helpful was the fact that most of the practices described in the book were practices we had implemented internally at my current job at the time. So aside from learning new things, they were new things that I was expected to know.

As I said, I had already bought into a lot of the material of "Code Complete" before I read it, but there were some things that I did just because, even though I knew there must be a good reason for doing them. In this next series of posts I am going to review parts of the book by reviewing my notes. I recommend that every programmer read this book. It really applies to everyone at every level, and to every programming language. Nobody is exempt. In the past I have talked to some programmers of certain languages who tried to convince me that "Code Complete" was only for C, C++, and C#. Those are the primary languages referenced in the book, but "Code Complete" goes deeper than programming language choice. As I review my notes I may draw information from other sources as well. Please bear with me as I strive to convey my new knowledge in a meaningful fashion.

Valid Reasons to Create a Routine

For all intents and purposes, the term "routine" here means any language construct that resembles (per the language) a function, procedure, sub-routine, or method, and in some cases accessors and mutators (getters and setters).

  1. In order to reduce complexity

The first reason Code Complete gives that a programmer may choose to introduce a new routine is to reduce complexity. One of the general examples given is a situation where you have a nasty, long, elaborate conditional expression inside an if statement. Here is an example:

     using System;

     namespace Nathandelane.Examples.Conditionals
    {
     public class ShowDifficultConditional
     {
      public ShowDifficultConditional()
      {
       string name = "Nathan Lane";
       string position = "Software Developer";
       int age = 29;
       
       if (name.StartsWith("Na") &&
        name.Contains("La") &&
        (position.Contains("Dev") ||
         position.Contains("Quality"))
        && (age < 35))
       {
        Console.WriteLine("{0} is a young {1} at age {2}", name, position, age);
       }
       else
       {
        Console.WriteLine("{0} is not such a young {1} at age {2}", name, position, age);
       }
      }
     }
    }
    
    I define a "elaborate...conditional expression" as any conditional expression that combines more than a single conditional operator. In this case I have five conditional statements of which two are their own compound conditional. (Aside: I don't normally like to stack my conditional statements, but because this blog has a relatively small amount of horizontal real estate, I chose to display them this way, which in my opinion is pretty ugly. But ignore it for now.)

    The principle of creating a routine to reduce complexity extracts a nasty long elaborate conditional expression into its own routine and replaces its original nastiness with a call to that routine. So in the case of the above, following this principle might look something like this:

     using System;

     namespace Nathandelane.Examples.Conditionals
     {
      public class ShowDifficultConditional
      {
       public ShowDifficultConditional()
       {
        string name = "Nathan Lane";
        string position = "Software Developer";
        int age = 29;
        
        // The locals are passed in explicitly so that the extracted
        // routine can see them.
        if (NastyConditionalIsTrue(name, position, age))
        {
         Console.WriteLine("{0} is a young {1} at age {2}", name, position, age);
        }
        else
        {
         Console.WriteLine("{0} is not such a young {1} at age {2}", name, position, age);
        }
       }
     
       private bool NastyConditionalIsTrue(string name, string position, int age)
       {
        bool result = false;
     
        result = name.StartsWith("Na");
        result = result && name.Contains("La");
     
        bool subResult = (position.Contains("Dev") || position.Contains("Quality"));
     
        result = result && subResult;
        result = result && (age < 35);
     
        return result;
       }
      }
     }
    
Now as you can see the nastiness isn't really gone, but the complexity at the call site has been reduced drastically, and now I can deal with problems in that conditional expression without worrying about whether the if statement is formatted correctly, because I can see that it is without thinking too much about it.

While this principle is cool, I wouldn't recommend using it too often. If you have an ugly conditional expression, you may want to re-evaluate your program in general. For example, a while back I wrote a parser for a calculator program that had a huge dispatch conditional block consisting of about thirty nasty embedded conditional blocks. I reworked it several times and it just always turned out to be a mess, and there were always more if's and else-if's to add. So I changed the whole thing over to the state pattern, did away with all of the conditional guess-work, and simplified the work greatly. In a way I did follow this principle, but to a greater degree than simply creating a new routine; I created a new class of routines.

  2. In order to introduce an intermediate, understandable abstraction

    In my HGrep program I have a lot of elaborate configuration settings. The software itself has 19 documented command-line argument options, some of which have multiple sub-options. The optional command-line arguments that are available is just one method of abstracting the interface for the program into intermediate, more understandable abstractions.

    To better accommodate these abstractions in the code, I create separate methods that provide settings to the agent for each of the possible command-line argument states. The immediate benefit of this is that if one of my arguments changes for some reason, I can accommodate that change simply by modifying the method that corresponds to the argument - along the lines of the sketch below.
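    For illustration only (this is not HGrep's actual code; the option names and the settings class are invented), the idea looks something like this in C#:

     using System;
     using System.Collections.Generic;

     public class AgentSettings
     {
      public bool FollowRedirects { get; set; }
      public bool ShowHeadersOnly { get; set; }
     }

     public class ArgumentParser
     {
      // Each command-line option maps to one small method that knows how
      // to apply that option's settings; adding or changing an option
      // touches exactly one method.
      private readonly Dictionary<string, Action<AgentSettings>> _handlers;

      public ArgumentParser()
      {
       _handlers = new Dictionary<string, Action<AgentSettings>>()
       {
        { "-f", ApplyFollowRedirects },
        { "-h", ApplyShowHeadersOnly },
       };
      }

      public AgentSettings Parse(string[] args)
      {
       AgentSettings settings = new AgentSettings();

       foreach (string arg in args)
       {
        if (_handlers.ContainsKey(arg))
        {
         _handlers[arg](settings);
        }
       }

       return settings;
      }

      private void ApplyFollowRedirects(AgentSettings settings)
      {
       settings.FollowRedirects = true;
      }

      private void ApplyShowHeadersOnly(AgentSettings settings)
      {
       settings.ShowHeadersOnly = true;
      }
     }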

  3. In order to avoid duplicate code

    This one should be a given for anybody who has been programming for more than a year, even if you're just a hobbyist. Duplicating code is, generally speaking, a great big no-no. Not that it's really that bad, except when it comes to maintenance. Think about it: what if you had duplicated six lines of code 12 times in your program, and one day you decided that two of those lines needed to change? You would really need to make the change to 24 lines, and what would happen if you missed or forgot one? Ideally you would extract those six lines of code into a single routine, and then call that routine in those 12 places in your program. Then when your two-line code change came along, you would only need to worry about changing those two lines of code.

    See how simple that is? It even promotes increased productivity and reduces the likelihood of you making a mistake while making such a change and therefore introducing new defects. A tiny sketch of the idea follows.
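    As a small, hypothetical C# illustration (the validation rules here are invented):

     using System;

     public class CustomerRegistration
     {
      // Before extraction, these same checks were repeated everywhere an
      // email address was accepted. Now the rules live in one routine,
      // so a rule change happens in exactly one place.
      private bool EmailIsValid(string email)
      {
       bool result = !String.IsNullOrEmpty(email);

       result = result && email.Contains("@");
       result = result && !email.EndsWith(".");

       return result;
      }

      public void Register(string email)
      {
       if (!EmailIsValid(email))
       {
        throw new ArgumentException("Invalid email address.", "email");
       }

       // ... registration continues here ...
      }
     }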

  4. In order to support sub-classing

    This principle is an indicator of another of the great design patterns: allowing for a hook routine. Hook routines are routines that may be called by a process, but that perhaps don't do anything unless they are implemented by a sub-class. Basically the unimplemented routine is the hook, and an inheriting class can override that routine to allow for special usage.

    Hooks allow dynamically class-enhancing child classes to have a little bit more say in what goes on behind the scenes. One example that I remember from "Head First Design Patterns" is a pizza store program. This program allowed each pizza store to implement its own toppings for its pizzas while maintaining the same process of making the perfect pizza. The pizza store base class took care of that process, but each implementing pizza store had the ability to add certain toppings, like a special sauce or cheese, by overriding certain addTopping() routines. A sketch of the idea follows.
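    Here is a minimal C# sketch of that kind of hook, loosely inspired by the pizza store example (the class and method names are my own, not the book's):

     using System;

     public abstract class PizzaStore
     {
      // The base class owns the sequence; subclasses may not change it.
      public void MakePizza()
      {
       PrepareDough();
       AddToppings(); // the hook
       Bake();
      }

      // The hook: does nothing by default, but a subclass can override
      // it to inject its own behavior into the fixed sequence.
      protected virtual void AddToppings()
      {
      }

      private void PrepareDough()
      {
       Console.WriteLine("Preparing dough.");
      }

      private void Bake()
      {
       Console.WriteLine("Baking.");
      }
     }

     public class ChicagoPizzaStore : PizzaStore
     {
      protected override void AddToppings()
      {
       Console.WriteLine("Adding extra cheese and special sauce.");
      }
     }

    Because MakePizza() owns the overall process, this same sketch also illustrates the next reason: the sequence stays hidden from the implementor.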

  5. In order to hide sequences

    Probably the most common sequence you might find in programming (especially object-oriented programming), that is also important is the initialization of an object. In object-oriented programming, objects are initialized by a class constructor, which is a special routine found in the class. The constructor is where we usually initialize instance variables, and we can call other private methods to do special tasks if one of the constructor arguments is a state variable.

    Wanting to hide a sequence from the developer using your API is an honorable reason to create a routine. Above I talked about hooks, which are routines that are found in sequences but are left to the implementor to take care of. These hooks, when implemented, may cause minor or significant changes to a sequence. But the sequence itself is hidden in another routine, which ensures the implementor doesn't mess with the overall sequence - as in the pizza store sketch above and the constructor sketch below.
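    As a small illustration of a constructor hiding an initialization sequence (invented names again):

     using System;

     public class ReportGenerator
     {
      private string _title;
      private DateTime _createdOn;

      // The constructor hides the initialization sequence; callers never
      // need to know that these steps happen, or in what order.
      public ReportGenerator(string title)
      {
       _title = title;

       InitializeDefaults();
       ValidateState();
      }

      private void InitializeDefaults()
      {
       _createdOn = DateTime.Now;
      }

      private void ValidateState()
      {
       if (String.IsNullOrEmpty(_title))
       {
        throw new InvalidOperationException("A report must have a title.");
       }
      }
     }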

  6. In order to hide pointer operations

    In low-level programming languages, or languages that provide access to the lower levels of the operating system such as C and C++, pointer operations are exposed at the rawest level. Pointer operations can often be confusing, and as such should be encapsulated so that they are more easily maintainable. Some such pointer operations might include allocating and de-allocating memory, instantiating a class or struct, or even de-referencing a pointer. These can all be simplified by wrapping their operations in a higher-level routine.
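    Even in C#, which mostly hides pointers, the same principle applies when you touch unmanaged memory. Here is a minimal sketch (a deliberately simplified wrapper, not production code) that hides the raw pointer operations behind a small class:

     using System;
     using System.Runtime.InteropServices;

     // Wraps a raw unmanaged allocation so callers never touch the
     // pointer operations directly; IDisposable hides the de-allocation.
     public sealed class UnmanagedBuffer : IDisposable
     {
      private IntPtr _pointer;

      public UnmanagedBuffer(int sizeInBytes)
      {
       _pointer = Marshal.AllocHGlobal(sizeInBytes);
      }

      public void WriteByte(int offset, byte value)
      {
       Marshal.WriteByte(_pointer, offset, value);
      }

      public byte ReadByte(int offset)
      {
       return Marshal.ReadByte(_pointer, offset);
      }

      public void Dispose()
      {
       if (_pointer != IntPtr.Zero)
       {
        Marshal.FreeHGlobal(_pointer);
        _pointer = IntPtr.Zero;
       }
      }
     }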

  7. In order to improve portability

    Most programming languages are portable across computer architectures to some extent, but almost all have some caveats. Some examples might include the inclusion of Win32 API extensions in order to better support Windows 32-bit architectures, or the Curses library to support console programming across different terminals. One more common instance where we might use a method to improve portability comes from JavaScript. JavaScript, though an ECMA standard, still roughly comes in at least two different flavors: Internet Explorer and everything else (standards-compliant). Because of this, many things still differ to some extent across these two general platforms of browsers. Let's take for example the method of attaching events to an element.

    /**
     * alertMe is a function that calls alert with the object passed to it.
     * @param {event} e
     */
    function alertMe(e) {
     alert(e);
    }
    
     var someElement = document.getElementById("someElementsId");
    
    if(someElement.attachEvent) {
     // Internet Explorer's method of attaching events:
      someElement.attachEvent('onclick', alertMe);
    } else {
     // W3C's standard:
     someElement.addEventListener('click', alertMe, false);
     // The third argument above specifies whether the event should bubble up. No control on the event attachment in IE for this.
    }
    
    Now I am certain that I would not want to type that code every time I wanted to attach an event to a particular element, so I might do something like this:
    /**
     * myAttachEvent attaches an event handler to an element based on the
     * browser compatibility.
     * @param {element} element
     * @param {string} eventName Like 'click' or 'mouseover'
     * @param {function} callback
     * @param {bool} bubbleUp (Optional) Whether or not the event should bubble
     * up in the browser (not supported in IE)
     */
    function myAttachEvent(element, eventName, callback, bubbleUp) {
     if(!bubbleUp) {
      bubbleUp = false;
     }
     
     if(element.attachEvent) {
      element.attachEvent('on' + eventName, callback);
     } else {
      element.addEventListener(eventName, callback, bubbleUp);
     }
    }
    
    So now I have a function that will consistently attach events to elements across all [modern] browsers, and it even handles bubbleUp as an optional argument. Most of the time we don't want events to bubble up, so if the argument is missing I set it to false. Once again this is a very good reason for creating a routine, and I use it regularly.

  8. In order to simplify boolean tests

    Several times in my programming experience I have had to deal with the inevitable large list of boolean tests for a single result. Sometimes when you experience this, it means you have a design flaw. However, design flaw or not, you have to deal with them. Most languages offer boolean short-circuiting, which makes large boolean tests simpler, but large tests can still become overwhelming. Short-circuiting in boolean tests means that tests are read from left to right, and parenthesised conditionals are evaluated from the outermost parentheses inward, so the evaluation order is unambiguous. Because of short-circuiting, the results of a test like this are easy to determine:

    if(true && !false)
    {
     Console.WriteLine("True");
    }
    else
    {
     Console.WriteLine("False");
    }
    
    This expression reads true first; if true were false, then !false would never be evaluated. (A side note: in C and C++ the built-in && and || operators are also guaranteed to short-circuit, evaluating left to right; the one notable exception is C++ operator overloading, where overloaded && and || lose their short-circuit behaviour.)
    
    Anyway, back to complex boolean expressions. Let's say that you had a number of criteria pertaining to an employee, used to determine the number of paid days off they receive during a single year: years with the company, number of hours worked per week, employment status (full-time, part-time, contractor, intern), and whether or not they are enrolled in the incentive program. Now let's pretend you put all of that into one huge conditional (this is obviously bad programming, but I'm trying to make a point with what little I have to go on):
    int paidDaysOff = 0;
    
     if(_yearsEmployed > 5 && _weeklyHours >= 30 && (_employmentStatus ==
      Employment.FULL_TIME || _employmentStatus == Employment.PART_TIME)
      && _incentive == true)
    {
     paidDaysOff = 10;
    }
    ...
    
    While this long conditional is not extremely complex, the simple fact that it is long makes it a good candidate for extraction, to ensure ease of maintenance and readability:
     bool employeeDeserves10DaysOff()
     {
      bool result = _yearsEmployed > 5 && _weeklyHours >= 30 &&
       (_employmentStatus == Employment.FULL_TIME ||
       _employmentStatus == Employment.PART_TIME) && _incentive
       == true;
      
      return result;
     }
     
     int getPaidDaysOff()
     {
      int paidDaysOff = 0;
      
      if(employeeDeserves10DaysOff())
      {
       paidDaysOff = 10;
      }
      
      return paidDaysOff;
     }
    
    See how much more readable that is?

  9. In order to improve performance

    The final good reason on this list to create a routine is to improve performance. I have had little to no experience with this particular scenario, as either I have never had a need to improve performance, or I haven't known of one. Either way, I hope that if you do find yourself in this situation, rather than assuming that you need to make a routine, you step back and take a look at the big picture. Are you utilizing your compiler's option to optimize for performance? Are you doing things the way the programming language publisher recommends? Are you following design patterns, or do you just have a mess of code? And do you have a good set of fully functioning and passing unit tests? These things will help your program to be more efficient, thus increasing performance.

I have many more note pages like this one, and I hope that I have the opportunity to review and share those as well. I hope this was as informative for you as it has been for me. Thanks for letting me take some of your time.

08 February 2010

Web Testing Tools

For six years now I have served my time as an IT professional in the Quality Assurance sector. Over that period I have tested a wide variety of systems, from desktop applications to video games, and from web sites to embedded wifi-networked systems. All of these systems required different means to test them effectively. In each system the same detail-oriented mindset was employed, but different techniques were applied to utilize that mindset. In some cases a different toolset was required as well.

For the past three years I have developed and tested web-based solutions and I have come to enjoy the vast amount of software technology available to Quality Assurance professionals today. As I continue my career as a software developer I lean a lot on my past as a QA professional. The following is a list of tools that I have found to be extremely useful in my career.

Web-based Testing

Web-based testing tools are my most recent experience, and so I will begin here. This first set of tools is simply browsers that I have found to increase my ability to verify performance and compatibility.

Browsers

Browsers are the primary tool of the Internet. Customers use browsers of all varieties, including those that are no longer supported by the companies who created them. This part is sad, and hopefully we all have the guts to tell our customers that. Last year a man called me up to get support for his Netscape Navigator 9, and I had to tell him that not only do we not support that browser, but neither does its creator. I advised him to move to Firefox, which is what Netscape evolved into, and which is well supported. Anyway, here's the list of browsers and browser tools I use in my testing efforts:
  • IETester from DebugBar is a requirement in my book. IETester gives you the Internet Explorer suite of browsers. It even includes the last currently unsupported Microsoft browser, IE 5.5, and with the latest installment you get IE 6, IE 7, and IE 8 as well. IETester is actively developed and works very well. It doesn't use emulation; rather, it actually collects the old Trident-based rendering engines and lets you see your web site in a different tab for each browser version.
  • Next on my list is Mozilla Firefox. This browser combines tabbed browsing, with ease-of-use, incredible stability, and a huge assortment of supported browser extensions. Some of my required extensions include Firebug, Firecookie, HttpFox, Screengrab, and Modify Headers. Having the ability to be extended with add-ons makes Firefox a versatile tool and makes it very valuable.
  • Of course new browsers get a lot of use in the market, and so Google Chrome and Apple Safari (both WebKit-based browsers) are required arsenal. While these browsers are built on the same rendering engine, their other internals are very different. Google and Apple are always trying to better the Web. A lot of people use these browsers also because of their stability and relatively high performance.
  • And finally the Opera browser, though not always heavily used, is an important asset. Out of the box it comes with Dragonfly, which is similar in functionality and purpose to Firebug. It also provides another rendering standard, and Opera Mini, an important browser on the mobile platform, uses the same rendering engine.

Automation

This next set is a collection of web testing tools that may or may not require a browser. Most of them I have used; some I haven't used very much, but I think highly of their purpose.

Automation: Browser-based

Browser-based automation tools come in at least two varieties. The first variety uses a browser like a Microsoft OLE (Object Linking and Embedding) object and the second utilizes the browser's core functionality to automate web-based testing.

Automation: Browser-based: OLE-type

  • Under free software licensing we've got several options. Python has a library named PyWinAuto for Windows that grabs a connection to the OLE server for Internet Explorer. The library then exposes the OLE API as a simpler Python API that can be used to control the browser.
  • Similarly, Ruby supports a system of grabbing a connection to Internet Explorer's OLE server and Watir (pronounced like "water") exposes this in a Ruby-based API. Watir's API is built in a familiar fashion such that people migrating from HP's QuickTest Pro web testing suite can grasp the concepts readily. Watir also has counterparts for other browsers, which are named for the browsers they support, such as FireWatir, ChromeWatir, and SafariWatir. Work is being done to integrate these all into a single codebase. Currently only FireWatir is integrated, and Watir uses a factory to determine which browser to launch. Firefox's only requirement is the JSSH (JavaScript SHell) extension which can be downloaded at the FireWatir page.
  • Finally, .NET got its feet wet in this arena too with WatiN, which is loosely based on the API of Watir and thus follows some of the same conventions. Its licensing allows for free download, but modifications and additions are supported through paid support. The source is freely available and the licensing also supports personal (business or private) modifications. The advantage of WatiN is the .NET - if your testers know C# better than Ruby, then it's a plus. (See the sketch after this list.)
While there are probably more, they aren't likely to be as highly developed.
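As a taste of what this style of automation looks like, here is a minimal WatiN sketch based on its canonical getting-started example (the page and element names are just for illustration):

using System;
using WatiN.Core;

public class WatiNExample
{
 public static void Main()
 {
  // Drives a real Internet Explorer instance through its OLE/COM server.
  using (IE browser = new IE("http://www.google.com"))
  {
   browser.TextField(Find.ByName("q")).TypeText("WatiN");
   browser.Button(Find.ByName("btnG")).Click();

   Console.WriteLine(browser.Title);
  }
 }
}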

Automation: Browser-based: Browser-Type

Browser-type automation toolkits utilize a browser's core internal functionality. They may be browser extensions or they might use JavaScript to control suites of unit-like tests. Because of this some of these tools are inherently browser- and operating-system-independent.
  • The first on my list is the iMacros browser extension for Firefox. iMacros allows a user of the Firefox browser to record a series of steps taken on a webpage and then play it back at any time. You can also add in validations for particular elements on a web page. It appears as though Chrome also has an iMacros extension now. In order to use it, you will need to download and install the beta or developer branch for Chrome. Because this is a browser extension for specific browsers it is not browser- and operating-system-independent.
  • Next on my list is Selenium. Selenium comes in several flavors. The most independent variant is Selenium Core, an HTML-and-JavaScript-based testing system which uses frames to automate your website and validate various points on it. Because of browser restrictions on cross-domain scripting, Selenium Core must be installed next to your website and be accessed under the same domain, so normally you would not use this solution to test a web site in a production environment. Selenium RC (Selenium Remote Control), on the other hand, still utilizes the browser, but creates a proxy to your site so that Selenium Core can run tests outside of the actual domain. This means that no testing code resides on your server, but you still get the same effect. Selenium is highly developed and highly supported. There are several other offerings from Selenium HQ as well.
There are other tools that fall under this category, but these two are the best. And with Selenium's vast offerings, I don't think you could go wrong.
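
To give you a taste of Selenium RC, here is a minimal, hypothetical C# sketch using the .NET client driver. It assumes a Selenium Server is already running locally on port 4444, and the site URL is a placeholder:

    using Selenium;

    class SeleniumRcExample
    {
        static void Main()
        {
            // host, port, browser launcher string, and base URL of the site under test
            ISelenium selenium = new DefaultSelenium(
                "localhost", 4444, "*firefox", "http://www.example.com/");

            selenium.Start();                    // launch the browser through the proxy
            selenium.Open("/");                  // navigate relative to the base URL
            selenium.WaitForPageToLoad("30000"); // timeout in milliseconds

            System.Console.WriteLine(selenium.GetTitle());

            selenium.Stop();                     // shut the browser down
        }
    }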

Automation: HTTP-based

HTTP-based automation of web sites adds speed and simplicity to the mix. There is no reliance on browsers, buggy or not, with or without full JavaScript support. On the other hand, HTTP-based testing solutions generally don't support JavaScript, so if your site relies heavily on it then this may not be what you're looking for. I have successfully used HTTP-based testing solutions to test web services, most static and dynamically generated web pages, and XML and RSS products. Here are a few of those products.
  • HttpUnit is the first tool on this list. It is written in Java, so generally speaking you can use it on any operating system that supports Java. HttpUnit approaches web-based testing as a series of unit tests based on the xUnit testing pattern. Generally you would build your tests on a third-party xUnit library, such as JUnit, and then use that library to run them. Unit testing utilizes the assertion pattern, which means you assert a condition to be true, such as that a given element exists; a minimal sketch of this style follows the list.
  • HtmlUnit is another offering, again written in Java. The major difference between HttpUnit and HtmlUnit is that HtmlUnit allows you to test JavaScript. It sandboxes a website's JavaScript much the way a browser does, but it remains headless, keeping browser reliance low and portability and stability high.
  • HGrep is one of my own tools, which I developed because sometimes I just wanted to inspect headers and see things faster than a browser could show them to me. Again, it is a headless tool that I have successfully used to write automation. The tool itself does not provide any automation hooks, but scripts that utilize HGrep for automation can easily be written in PowerShell.
  • Apache's JMeter is a load testing tool for the Web. It focuses on HTTP-based testing and can speak SOAP, LDAP, JMS, POP3 and IMAP mail, and plain HTTP.
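
To show the flavor of the assertion style these headless tools share, here is a minimal C# sketch built on nothing but the standard HttpWebRequest API; the URL and the specific checks are placeholders, and the Java tools above express the same idea through their own classes:

    using System;
    using System.IO;
    using System.Net;

    class HttpSmokeTest
    {
        static void Main()
        {
            HttpWebRequest request =
                (HttpWebRequest)WebRequest.Create("http://www.example.com/");

            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                string body = reader.ReadToEnd();

                // assertion-style checks: a status code and a known element
                if (response.StatusCode != HttpStatusCode.OK)
                    throw new Exception("Expected 200 OK");
                if (!body.Contains("<title>"))
                    throw new Exception("Expected a <title> element");

                Console.WriteLine("Passed.");
            }
        }
    }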
This gives you a good variety of web application testing tools. These are all highly developed and I recommend them all. If you have any other favorites, please feel free to post them here. Thanks.

03 December 2009

Code Quality

As I program more and more, the quantity of files, classes, enumerations, structs, and fields increases dramatically. With that increase comes an ever-increasing possibility of defects. When I am programming I try to follow a set of guidelines known as a programming style guide. Style guides help us be consistent in whatever we're doing. Style guides are probably more commonly used in artistic endeavors, like creating a magazine or newspaper, a textbook, or even a web site - somewhere where readability and usability matter. But wait, don't readability and usability matter in code as well? I believe they absolutely matter in code.

IDEs

1. Using an IDE for all of your coding is one good way to increase code quality.
A lot of style is handled easily by using an IDE (Integrated Development Environment). IDEs like Eclipse, NetBeans, and Visual Studio .NET take care of indentation, tabs, closing curly braces, and many other important details while programming. One feature that I really enjoy about most IDEs (even Vim and Emacs do this) is the ability to ensure that tabs are always tab characters while letting each developer set their displayed width in spaces. This keeps the code itself consistently formed but allows each developer to retain his own look for his own purposes.

Code Complete

2. Reading about best and good, sound practices is another great way to boost the quality of your code.
Code Complete[1] is a great book that I read recently. Some programmers will say that Code Complete is good for some programming languages but not for others, but I don't think that is correct. While Code Complete was written around C and C++ programming, most of the practices it describes apply to any language.
The practices described in Code Complete include when to write a method, how to formulate field names, how to ensure that you are using fields in the correct scope, how to write self-documenting code, and how to keep your code from confusing other programmers - or you, after a year or two away from it.
I advise everyone pursuing programming as a career or as a hobby (or both, as in my case) to read this book. If you can't afford the $50 ($40 if you buy it from Barnes and Noble online), then check out your city library; they are very likely to have it. If they don't, you can always stop into your favorite major bookseller, sit down, and read it a few minutes at a time every couple of days. I recommend saving up and buying it, though.

Design Patterns

3. Learn and use design patterns.
For a very long time now I have been aware of design patterns. What I was not aware of until recently is that design patterns are an incredible way for programmers to communicate ideas about projects. For example, instead of saying,
"I only want to have one global instance of this class available at any given time, so I'm going to put it into a settings class, and hopefully nobody will try to make their own instance",
you can say,
"I only want to have one instance of this class available at any given time, so I'll make it into a Singleton."
Singleton is one of those design patterns. But the second statement does two things for the programmer and his colleagues that the first one does not:
  1. The first statement conveys concern that somebody will want to make multiple instances of his class, while the second, by stating that the Singleton design pattern will be used, ensures that nobody will be able to make new instances of it.
  2. The first statement does little to convey intent, while the second says his class will be a Singleton, so programmers who know that pattern already understand how he is going to ensure that nobody can make multiple instances, and that the one instance will be globally available. Notice that the second statement never used the word "global". A minimal sketch of the pattern follows this list.
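
To make that concrete, here is a minimal C# sketch of the Singleton pattern; the Settings name simply echoes the hypothetical example above:

    // the class itself guarantees there is exactly one instance
    public sealed class Settings
    {
        // the single instance, created by the runtime on first use of the class
        private static readonly Settings instance = new Settings();

        // a private constructor means no other code can call "new Settings()"
        private Settings() { }

        // the global access point, without ever saying the word "global"
        public static Settings Instance
        {
            get { return instance; }
        }
    }

Now Settings.Instance is the only way to reach the object, and the compiler enforces it rather than a comment and a hope.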
Aside from knowing when to create classes or methods, design patterns are probably the programmer's best toolkit for communicating intent and for reasoning about why something should be written a certain way. A good book about the common design patterns, which I am currently reading as a refresher, is "Head First Design Patterns"[2] from O'Reilly. It hits several of the very common design patterns and highlights design principles, such as
"Program to and interface, not an implementation"
which help you keep the quality of your code in high standing. A small, hypothetical example of that principle follows.
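
As a quick, hypothetical illustration of that principle (all of the names here are invented for the example), the calling code below depends on an interface rather than a concrete class, so implementations can be swapped without touching it:

    public interface IMessageStore
    {
        void Save(string message);
    }

    public class FileMessageStore : IMessageStore
    {
        public void Save(string message)
        {
            System.IO.File.AppendAllText("messages.log", message);
        }
    }

    public class Logger
    {
        private readonly IMessageStore store;

        // Logger is written against IMessageStore, not FileMessageStore,
        // so a database-backed or in-memory store can be substituted freely
        public Logger(IMessageStore store)
        {
            this.store = store;
        }

        public void Log(string message)
        {
            store.Save(message + System.Environment.NewLine);
        }
    }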

Code Reviews

4. Ask for code reviews and do them for your peers.
Very few things keep a programmer from writing messy code better than peer reviews of code. We did it in high school, we did it in college, and it wasn't just a good idea or for fun; we should do it professionally as well. If we don't do code reviews professionally, then we are missing out on a time-tested, proven method of increasing code quality. It all boils down to this: if your peers can't understand what's going on, then how can you?
The next time you write a class, pass it to your neighbor in the next cubicle in an email, or link him to your code, and let him take a look to see whether what you did makes sense. Ideally you shouldn't need to document every single line; if the code is good and clean, then your cube-neighbor will likely be able to see your intent and the paths through your code.
Likewise ask to view others' code. One way to get better is to see how others program, share ideas, and take an active role in improving your code quality.

Conclusion

Nobody should be afraid to ask for help in making sure their code is of high quality. By following the above four principles in your coding, your code quality will improve drastically. I guarantee it, and if it doesn't, then you're not trying hard enough. Here are the four principles again:
  1. Using an IDE for all of your coding is one good way to increase code quality.
  2. Reading about best and good, sound practices is another great way to boost the quality of your code.
  3. Learn and use design patterns.
  4. Ask for code reviews and do them for your peers.
If you have some other ideas, please share them.

Bibliography
1. "Code Complete", by Steven McConnell, Microsoft Press, (C) 1993-2007 Steven C. McConnell.
2. "Head First Design Patterns", by Eric Freeman, Elizabeth Freeman, Kathy Sierra, and Bert Bates, O'Reilly, (C) 2004 O'Reilly Media, Inc.

13 October 2009

Two or Three Ways to Do Things

As I program throughout my career and daily life, I often come across the problem that there are simply too many ways to program a system. The other problem is that there really isn't one correct way to program any specific system; the means to an end often follows trends. To clarify, I have been trying to program a decent text-based calculator for some time now. I began in Java and moved to .NET because it was a little simpler and higher level for me. I'll likely go back to Java because it suits my needs a little better, since I need a tool that can run anywhere. Let me backtrack a little. Since my first attempt at the calculator, it has seen at least six incarnations: two as a Java application and four as a .NET application.
  1. My first attempt involved simply trying to parse out numbers and symbols, transform them into tokens, and finally solve the given equation using the Shunting-yard algorithm. That first attempt failed miserably, and I actually got feedback on it to prove it. One of my biggest problems was, has been, and likely always will be negation; though it isn't much of a problem now, it always looms as an issue to be resolved late in the development cycle (usually last, after completing the basic system).
  2. My second attempt was to do the same thing in .NET, only a little better (usually when you redo something, you try to learn from your initial mistakes). This time I thought more about something one of my contacts said I should try, namely using regular expressions. I also thought I'd use Linq to see if I could make it work. Basically I used regular expressions to see what was coming next in the expression and to remove the next item from the original expression. Like the first attempt, I first turned the whole string into an array of chars, so when I was looking for tokens like numbers, I was reconstructing the number from individual characters. Ultimately I ended up with two clunky types: a Postfixator class, which also parsed the expression, and an Evaluator class, which took the postfixated expression and evaluated it for an answer. Some of the difficulties I had with this incarnation included parentheses and functions. One very difficult thing was implementing multiple-argument functions, like power (defined in .NET as Math.Pow(double x, double y)). I had reserved the comma for that purpose, but it wasn't really working.
  3. The third attempt at a good calculator involved going back to Java. The reason was to clarify the low-level concepts for myself. Since Java doesn't have frameworks like Linq, and its enumerable framework isn't quite as developed (in my opinion, or at least it has taken a different route from .NET), I thought the challenge would help me figure out where I could improve. I continued to use regular expressions to match patterns at the beginning (left side) of the expression and remove those bits from it, but this time I thought: why do I need to break the expression down into an array of chars, and why do I need to process whitespace? So I removed the whitespace from the expression before tokenizing it and left it as a string. The result was a little cleaner.
  4. Right away I went back to .NET to fully clean up the design, following the same pattern of first removing whitespace and then, upon finding tokens based on regular expressions, removing them from the expression to be evaluated. This time I also decided to separate all of the possible tokens into their own classes containing their regular expressions. It turned out to be a very manageable system, though at the same time it felt somewhat cluttered to have 28 classes in a single project. I also started to work on the idea of having matrices and variables in the system, which required a State class to keep track of the internal heap. I perfected (at least as much as I could) the handling of negative numbers: I kept track of the last token I read, and if it was a number, then the hyphen was a subtraction symbol, but if it was nothing, an opening parenthesis, or another operator, then I was pretty certain I was dealing with negation, and I would set myself up to negate the next number. That worked very well. I still had the function problem, and didn't find a way to resolve it at that point.
  5. My fifth attempt took a wild turn. With all of this experience behind me, I determined to figure out whether leaning even harder on regular expressions would be worthwhile. I remained in .NET this time. I decided that I would use recursion and regular expressions to evaluate sub-expressions hierarchically, so that instead of needing to worry about postfixation of the tokens, I would immediately know the answers to individual complete components of the whole. This worked out very well in the end. I had a couple of giant regular expressions to test for things like decimal and hexadecimal numbers and sub-expressions defined by what was contained in parentheses, and about this time I had discovered Perl more intimately, which has its own operator for the power function, **. I immediately included it in my own evaluation and didn't have to worry much anymore about multiple-argument functions. I also used regular expressions to find functions quickly, evaluate their contents first, and then evaluate the functions themselves. I overcame a lot of previous hurdles in this attempt; however, it ultimately was just a learning session. The regular-expression-recursion-based calculator - the one I had initially wanted to build but hadn't known how to yet - turned out to be a flop. The evaluation of expressions as simple as
    24 + 12
    was taking far too long. I thought I had reduced complexity a little by getting rid of the Shunting-yard algorithm altogether and by evaluating the expression immediately rather than using lazy evaluation. I mentioned these facts about this particular incarnation of the calculator in my previous post. So I decided that I needed to try again to make a text-based calculator tool that I could use on a daily basis, one that would be relatively fast and simple in design and structure. I wanted to ensure that it would be easy to add new functionality, and that I could prove its correctness.
  6. Enter the sixth incarnation of Personal Calculator. The latest and truly greatest attempt at the personal calculator was somewhat inspired by another project, totally unrelated to calculators, that I happened to notice in my Visual Studio RSS feed reader:
    Introducing Elevate. Elevate is a Boost-like library for .NET that features many utilities and high performance tools that may be of value to developers.
    I didn't really know what Boost was, and I thought, "What can it hurt? I'll take a look." So I did, and I found something in Elevate that I really liked. Elevate is an extension project for .NET that extends classes with the utilities that make programming easier and more sensible in other languages, like Ruby's "everything is an object" mentality or Haskell's currying. Instantly I thought about how I had pulled tokens from a mathematical expression using regular expressions in the past. And I thought about the real differences between the String.Split methods used in many languages and the old StringTokenizer class in Java. That class never really did what I expected it to; all it does is provide an enumerable collection of tokens from a string using a delimiter. Isn't that what String.Split does? So I created a string extension method for .NET that tokenizes the current string based on an array of regular expressions. It follows the same pattern as the third and fourth attempts at tokenizing: looking for specific tokens at the beginning of the string and removing each whole token upon finding it. It doesn't differentiate between types of tokens, though - that is done later in the calculator. It worked like a charm. Here is the implementation:
    // requires: using System; using System.Collections.Generic;
    // and using System.Text.RegularExpressions; the method must live
    // in a static class because it is an extension method
    public static string[] Tokenize(this string s, string[] patterns)
    {
        List<string> tokens = new List<string>();
        bool noMatchesFound = false;

        while (!String.IsNullOrEmpty(s) && !noMatchesFound)
        {
            for (int index = 0; index < patterns.Length; index++)
            {
                Regex pattern = new Regex(patterns[index]);

                if (pattern.IsMatch(s))
                {
                    // record the first match and chop it off of the front
                    string match = pattern.Matches(s)[0].Value;
                    tokens.Add(match);
                    s = s.Substring(match.Length);
                    index = -1; // restart the pattern scan on the shortened string
                }

                if (index == patterns.Length - 1)
                {
                    // no pattern matched the front of the string, so stop
                    noMatchesFound = true;
                }
            }
        }

        return tokens.ToArray();
    }
    And here is an example of usage:
    string[] tokens = expression.Tokenize(new string[] {
        "^[\\d]+([.]{1}[\\d]+){0,1}",          // unsigned integer or decimal number
        "^[-]{0,1}[\\d]+([.]{1}[\\d]+){0,1}",  // optionally negated number
        "^[+]{1}",                             // addition
        "^[-]{1}",                             // subtraction
        "^(\\*\\*){1}",                        // power (Perl-style **); must precede * below
        "^[*]{1}",                             // multiplication
        "^[/]{1}"                              // division
    });
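    For example, assuming whitespace has already been stripped, tokenizing "24+12" with these patterns yields { "24", "+", "12" }: the number pattern claims "24", the scan restarts, the addition pattern claims "+", and the number pattern claims "12".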
    All in all, this worked out great for reducing complexity in the calculator, because I didn't have to worry about tokenizing anymore. Then I could focus on evaluation, which would still require the Shunting-yard algorithm, but I already knew that my implementation was relatively fast, and I still didn't really need to keep track of token types, because I could always look them up, which in a hashtable is an O(1) operation.
As you can see, I haven't gotten to implementing parentheses or functions yet in this version, but it only took me six hours to write. I've been able to abstract a good portion of the program out into a separate project that I call Nathandelane.System, and I am now able to accept command-line arguments to set the initial state of the calculator. My classes are also more sensible: Calculator, which contains state and pattern matching; ExpressionEvaluator, which puts the expression into postfix format and evaluates it (a simplified sketch of that step appears after this paragraph); PrecedenceMap, which defines precedence for operators and functions; TokenPatterns, which defines all of the patterns for individual tokens; and TokenType, a simple enumeration to keep track of the token type during evaluation. So with six different methods of conquering the same task, some worked better than others, some were more complex than others, and some took more processing power than others. I guess programming boils down to experience. I have learned a lot doing this project. I probably would have progressed faster if I had based my project fully on somebody else's, but extracting common patterns that make sense, as I have, can also provide a better outcome in the end.
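
To illustrate the evaluation step I just described, here is a minimal, hypothetical sketch of a precedence-map-driven Shunting-yard evaluator for the operators tokenized above. This is not the actual Nathandelane.System code: associativity is simplified (everything is treated as left-associative, including **), and parentheses and functions are omitted, just as in the version described above.

    using System;
    using System.Collections.Generic;

    public static class PostfixEvaluator
    {
        // higher number = higher precedence, mirroring the PrecedenceMap idea
        private static readonly Dictionary<string, int> Precedence =
            new Dictionary<string, int>
            {
                { "+", 1 }, { "-", 1 }, { "*", 2 }, { "/", 2 }, { "**", 3 }
            };

        public static double Evaluate(string[] tokens)
        {
            Queue<string> output = new Queue<string>();
            Stack<string> operators = new Stack<string>();

            // Shunting-yard: numbers go straight to the output queue; an
            // operator first pops anything of greater-or-equal precedence.
            foreach (string token in tokens)
            {
                if (Precedence.ContainsKey(token))
                {
                    while (operators.Count > 0 &&
                           Precedence[operators.Peek()] >= Precedence[token])
                    {
                        output.Enqueue(operators.Pop());
                    }

                    operators.Push(token);
                }
                else
                {
                    output.Enqueue(token);
                }
            }

            while (operators.Count > 0)
            {
                output.Enqueue(operators.Pop());
            }

            // evaluate the postfix queue with a value stack
            Stack<double> values = new Stack<double>();

            foreach (string token in output)
            {
                if (Precedence.ContainsKey(token))
                {
                    double right = values.Pop();
                    double left = values.Pop();

                    switch (token)
                    {
                        case "+": values.Push(left + right); break;
                        case "-": values.Push(left - right); break;
                        case "*": values.Push(left * right); break;
                        case "/": values.Push(left / right); break;
                        case "**": values.Push(Math.Pow(left, right)); break;
                    }
                }
                else
                {
                    values.Push(double.Parse(token));
                }
            }

            return values.Pop();
        }
    }

Fed the tokens from the earlier example, PostfixEvaluator.Evaluate(new[] { "24", "+", "12" }) returns 36.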