Voting Excel Template and Add-In (Borda Counting and Schulze Method)

It is often necessary to find the group opinion on a subject when making a technical (or other) decision, whether that means reaching consensus on a single option or prioritizing a set of competing candidates, human or technical in nature. I’ve had to sit through many of these electoral processes over the years and have seen first-hand just how lacking in rigor they often are. To get true buy-in, any ballot or ranking system must be seen to be fair, even by those who lose.

In an effort to improve scoring and voting systems, I’ve created a template spreadsheet and an Excel Add-In that allow two common election methods to be carried out in a consistent way.

Download the template and add-in from here

The two voting systems I’ve chosen are the Borda Counting system, and the Schulze Method, the latter becoming the preferred ballot system for many public elections (Wikipedia, amongst a long list of others). First of all, let me briefly describe the two voting algorithms.

Borda Counting

Borda Counting is a simple voting algorithm that is often used in the workplace to rank options, candidates or anything else that can be put to a vote. It simply asks voters to order the list of candidates (candidates may take any form: people, products, or animals). Each ballot is tallied by giving the least preferred candidate a score of zero (0), and each more preferred candidate one point more (+1) than the candidate below it, until all candidates are scored. For example, if four candidates, A, B, C and D, were ranked on a ballot as A>B>C>D, then A would receive a score of three (3), B two (2), C one (1) and D zero (0). Every ballot is tallied this way and the scores are summed to get a total for each candidate. The winner is the candidate with the highest total.

Borda Counting’s strength is its simplicity, and its ability to bring to the forefront a winner that may not be the one ranked first most often. This sounds weird, but if most voters place one candidate in the higher positions, and many place the simple first-past-the-post winner last, then the former is preferred by the electorate as a whole. The result reflects the voters’ full order of preference rather than just their first choices.

For example -

5 voters vote BACD, 1 voter ACDB, 2 voters CADB, and 1 voter DACB.

The Borda Counting results are: A>B>C>D (A scores 19, B 15, C 14 and D 6). Surprised? 5 people voted for B being first. But this wasn’t enough to push out A, which was voted into second place 8 times and first place once. B was voted in last place too often (4 times), and A was voted in first or second place in all cases. This is a failure of the Condorcet criterion (strictly speaking, the Condorcet Paradox is the related case where majority preferences form a cycle) – big words meaning “even though more people preferred a candidate in every pair-wise run-off, that candidate still loses!”
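
The tally for the nine ballots above can be reproduced in a few lines of Python. This is only a sketch of the counting rule described earlier, not the add-in's code; the function name `borda_scores` is made up for illustration, and it assumes every ballot ranks every candidate with no ties or skipped candidates.

```python
from collections import Counter

def borda_scores(ballots):
    """Tally Borda scores from (count, ranking) pairs.

    Each ranking is a string such as "BACD", most preferred first.
    The last-placed candidate scores 0, the next one up scores 1, and so on.
    """
    scores = Counter()
    for count, ranking in ballots:
        top = len(ranking) - 1
        for position, candidate in enumerate(ranking):
            scores[candidate] += count * (top - position)
    return dict(scores)

ballots = [(5, "BACD"), (1, "ACDB"), (2, "CADB"), (1, "DACB")]
scores = borda_scores(ballots)
# Highest score first; alphabetical order breaks any tie (an arbitrary choice).
ranking = sorted(scores, key=lambda c: (-scores[c], c))
print(scores)             # {'B': 15, 'A': 19, 'C': 14, 'D': 6}
print(">".join(ranking))  # A>B>C>D
```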

Before writing off Borda Counting, note that it is one of the better voting algorithms, and its math and logic are easy for all voters to understand. This isn’t always the case with other methods; let’s consider the Schulze Method.

The Schulze Method

The Schulze Method takes the approach that the winner of an election should be the candidate that not only the most people want to win, but also the fewest people want to lose. It builds a pair-win (or pair-defeat) matrix of the number of times one candidate is preferred over another across all ballots. Unlike the Borda Counting algorithm, the Schulze Method looks solely at a “win” when one candidate is preferred over another, not at how far apart the two were ranked. Once this matrix is built, the strength of every path from one candidate to another, direct or indirect, is calculated; a path is only as strong as its weakest pair-wise win. For example, if A beats B, and B beats C, then A indirectly beats candidate C. The winner is the candidate whose strongest path to every other candidate is stronger than that candidate’s strongest path back. The permutations grow as the candidate count grows; for full details on the algorithm, the inventor’s article “A New Monotonic, Clone-Independent, Reversal Symmetric, and Condorcet-Consistent Single-Winner Election Method” and the Wikipedia article offer pages of demonstration. The Schulze Method’s weakness is its complexity; explaining it to a group of people, especially in a single sentence, is hard. But it has undergone many years of mathematical scrutiny and offers one of the most tamper-proof (resistant to tactical voting) options available.

The Schulze Method satisfies the Condorcet criterion, so it is not susceptible to the problem shown above. Given our prior example from Borda Counting -

5 voters vote BACD, 1 voter ACDB, 2 voters CADB, and 1 voter DACB.

The result using the Schulze Method is B>A>C>D. I’ll leave it up to you to decide if that represents the voters’ choice better than the Borda Counting result (which made A the winner, even though more people voted B as their most preferred). Before you choose the Schulze Method universally, note that a disagreement like this is easy to construct when dealing with small numbers of voters, but rarely eventuates when the number of ballots grows substantially. I support both, and investigate the results if they don’t agree, which in practice is rarer than this example might indicate.
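
For readers who want to see the moving parts, here is a minimal Python sketch of the pair-win matrix and strongest-path steps. It is not the add-in's implementation: the name `schulze_ranking` is hypothetical, ballots are assumed to rank every candidate with no ties or skips, and candidates are ordered by how many rivals they beat on path strength, which is a simplification that only gives a clean order when there are no ties.

```python
def schulze_ranking(ballots):
    """Rank candidates with the Schulze method from (count, ranking) pairs."""
    candidates = sorted(set("".join(ranking for _, ranking in ballots)))

    # d[x][y]: number of voters who prefer x over y (the pair-win matrix).
    d = {x: {y: 0 for y in candidates} for x in candidates}
    for count, ranking in ballots:
        for i, x in enumerate(ranking):
            for y in ranking[i + 1:]:
                d[x][y] += count

    # p[x][y]: strength of the strongest path from x to y. A path is only as
    # strong as its weakest link, so this is a widest-path computation.
    p = {x: {y: d[x][y] if d[x][y] > d[y][x] else 0 for y in candidates}
         for x in candidates}
    for k in candidates:
        for i in candidates:
            for j in candidates:
                if i != j and i != k and j != k:
                    p[i][j] = max(p[i][j], min(p[i][k], p[k][j]))

    # x beats y when its strongest path to y outweighs y's path back to x.
    wins = {x: sum(p[x][y] > p[y][x] for y in candidates if y != x)
            for x in candidates}
    return sorted(candidates, key=lambda c: (-wins[c], c))

ballots = [(5, "BACD"), (1, "ACDB"), (2, "CADB"), (1, "DACB")]
print(">".join(schulze_ranking(ballots)))  # B>A>C>D
```

In this example B wins every pair-wise contest 5 to 4, so no path from another candidate back to B exists and B finishes first.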

Voting Systems Excel Add-In

I wrote this add-in to make it easier to run fair ballots, and to experiment with the different voting systems. The features of this template and add-in are –

  1. Supports Borda Counting, where a total score for each candidate is arrived at from their position in each voter’s list of most preferred to least preferred candidates.
  2. Supports the Schulze Method of voting, where the candidate that wins a pair-wise election over every other candidate is the winner. This voting system is becoming a favored choice for many public elections, due to its resilience in the face of tactical voting, which many other systems (like Borda Counting) can suffer from.
  3. Supports tied candidates, where the voter wishes to express no opinion on one candidate being better or worse than another.
  4. Supports skipping candidates, where skipped candidates are treated as less preferred than all candidates voted for, but no better or worse than each other.
  5. Provides a printable ballot template that explains the rules clearly to voters.
  6. Provides a set of Excel functions that help interpret and analyze the results.

The spreadsheet template allows ballots to be tallied in a shorthand syntax (BADC means B is preferred over A, then D, then C). The simplest case might be 5 ballots, split this way -

Count  Preference Sequence
3      BADC
2      BCAD

This would give results using both of the supported systems of –

Borda Counting
Result: B>A>C>D

Candidate  Score  Rank
A          8      2
B          15     1
C          4      3
D          3      4

Schulze Voting
Result: B>A>D>C

Candidate  Rank
A          2
B          1
C          4
D          3

Installing the Add-In

This add-in makes Excel User Defined Functions (UDFs) available to the open Excel spreadsheet. To use this add-in,

  1. Download the “Voting – GeekSpeakDecoded Ballot Template 1_0.zip” file and unzip it to a directory of your choice. There are numerous files in this zip file, and they all must be in the same folder. Download the template and add-in from here
  2. Open the spreadsheet “Voting – GeekSpeakDecoded Ballot Template.xlsx” file
  3. Open the add-in “VotingSystems.xll” file (this seems strange, but it won’t close the current spreadsheet; just choose File-Open and navigate to the VotingSystems.xll file)
  4. Add-ins are a particular security risk for Excel. If you trust the add-in, enable it for the session by clicking “Enable Add-In for this session only”

What Is Included in the Spreadsheet Template?

The spreadsheet template has four main pages. These worksheets are -

Examples – Shows each of the functions added by this add-in in operation. It also shows how to get the resulting ranking using both of the supported voting systems, and how to analyze the results in more detail.
Your Ballot – A blank worksheet ready to enter the results for a ballot of your own.
Printable Ballot Sheets – A template for creating your own paper ballots. This template contains the rules for proper voting using this spreadsheet and the supported skipping and tied value features.
Reference – More detail on the algorithms used, and detail on each function available in the VotingSystems.xll add-in.

The examples and reference worksheets form the bulk of the documentation required, and by experimenting with the Your Ballot worksheet, it is easy to get up and running.

For more advanced worksheets, the custom functions added can be used for building your own template in a format of your choosing. The custom Excel functions added when this add-in is installed are:

BordaRankings – Given a list of ballots, returns the resulting preference order using Borda Counting.
SchulzeRankings – Given a list of ballots, returns the resulting preference order using the Schulze Method.
IsBallotValid – Given a ballot vote string, returns True (for a valid vote) or False (for an invalid vote).
InvalidBallotReason – Returns the reason a ballot vote string is invalid.
BordaCandidateScore – Returns the individual candidate score for a set of ballots in a Borda Counting election.
BordaCandidateRank – Returns the individual candidate ranking for a set of ballots in a Borda Counting election.
SchulzeCandidateRank – Returns the individual candidate ranking for a set of ballots in a Schulze election.
PairWinScore – Returns the number of wins of one candidate over another in a Schulze election (or Borda Counting election, but pair wins play no part in the Borda Count result).
PathStrengthScore – Returns the path strength from one candidate to another in a Schulze election. See http://en.wikipedia.org/wiki/Schulze_method
RankOrderCentroidValueBorda – Returns the Rank Order Centroid (ROC) value for a given candidate’s final ranking in a Borda Counting election. ROCs are used to distribute weights that sum to 1 for decision support (e.g. when choosing a software package).
RankOrderCentroidValueSchulze – Returns the ROC value for a given candidate’s final ranking in a Schulze election. ROCs are used to distribute weights that sum to 1 for decision support (e.g. when choosing a software package).
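
To make the ROC idea concrete, the sketch below computes the standard Rank Order Centroid weights, where the candidate ranked i out of n receives (1/n) × (1/i + 1/(i+1) + ... + 1/n). Whether the add-in uses exactly this formula is an assumption on my part, and `roc_weight` is an illustrative name, not one of the functions listed above.

```python
from fractions import Fraction

def roc_weight(rank, n):
    """Rank Order Centroid weight for position `rank` (1 = best) out of n."""
    return sum(Fraction(1, k) for k in range(rank, n + 1)) / n

weights = [roc_weight(rank, 4) for rank in range(1, 5)]
print([round(float(w), 4) for w in weights])  # [0.5208, 0.2708, 0.1458, 0.0625]
print(sum(weights))                           # 1
```

With four candidates, the winner carries a little over half of the total weight, and the weights always sum to exactly 1.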

Summary

I’ll be doing future work on this add-in, and future articles will cover the algorithms and the pros and cons of each voting system (when to choose one over another); this spreadsheet will be the tool used to examine the voting results. Please comment on how to improve the experience of this template, and share your thoughts on how you used it in your business and technical decisions.

Software as a Service (SaaS) versus Cloud Computing

Software as a Service is often considered the same as Cloud Computing, and the over-use of the phrase Cloud Computing can lead to confusion. Cloud Computing, by my definition, is about provisioning and managing borrowed computing resources, whereas Software as a Service is about using software accessed over a network (normally the Internet) rather than buying and maintaining that software locally. Muddying the waters between hosted software and hosted hardware for the marketing cachet of “cloud” is needlessly confusing.

Software as a Service (SaaS) is best explained by looking at a couple of examples –

  1. Email (Google Apps – Gmail for your domain) – It is common for companies to install their own server equipment running Microsoft Exchange to allow their employees to send and receive email (Exchange does more, but this is an example). A SaaS alternative is to use Google’s offering of branded Gmail to give your employees access to email.
  2. Customer Relationship Management (Salesforce.com) – Managing your client base and sales pipeline is an important aspect of many businesses. Even though this data is highly sensitive and competitive, many companies have outsourced this to Salesforce.com, which offers this and many other services online.
  3. Microsoft Office (Office 365) – Microsoft Office has often been the tool of choice for creating documents, spreadsheets and presentations. Microsoft has recently launched online versions of these tools. This is an interesting SaaS play, because they are betting that most people will use both the online and offline versions, and be more productive for it. This online/offline combination hedges the bet, and removes the “only works while connected” objection to adoption (for example, you can still work whilst on a plane without wifi).

Although Cloud Computing shares some similarities, the differentiator is in the delivery of a software solution that you otherwise would have purchased. The cost of installed software often goes beyond the sticker price; many elements combine into the total cost of ownership for in-house software –

  1. Cost per package or user, depending on the model
  2. Yearly maintenance to keep most recent version installed
  3. Hardware: each desktop or server required to support this package
  4. Storage space for data (locally and for the company archives)
  5. Backing up of the data, and being able to recover if a disaster occurs
  6. Managing, powering, cooling and housing the servers required

The cost model for SaaS differs per vendor, but most offer a service at a per-user amount with certain storage constraints that can be expanded for an extra cost. Google Apps, for instance, is currently $50 per user per year; this gives each user 25 GB of storage space. On the surface, it seems like a great deal, and it probably is, unless email is your core business. You gain the ability to access email from anywhere on the Internet, but you may lose the ability to work offline or obtain a backup of your data (many providers address these drawbacks). When modeling the costs, consider the following elements –

  1. Per user cost – is it charged per individual user, or in bands of users (one to five, six to ten, etc.)?
  2. Likely storage costs. If the service you are purchasing allows uploads and attachments, consider how much storage per user there will be for the first and subsequent years.
  3. Bandwidth costs. If you have one hundred employees fighting for the Internet bandwidth into your office, the experience might be slower than the demo; ensure you include the cost of buying more Internet connectivity.
  4. Switching costs – if you needed to change vendors, or the current vendor stops service – how would you get your data? Some vendors will send you regular copies, but check.
  5. Support costs – does the vendor have telephone or email support that you might pay for if problems arise?
  6. Training costs – does the vendor have training, if so when and where?
  7. Professional services – Will there be costs in configuring and migrating data from current systems to the new service?
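
A model like this can start as a few lines of arithmetic. The sketch below is illustrative only: the function name and every figure except the $50 fee and 25 GB allowance mentioned above are invented, so substitute your own vendor quotes.

```python
def yearly_saas_cost(users, fee_per_user, gb_per_user, included_gb,
                     fee_per_extra_gb, other_costs=0.0):
    """Estimate one year's cost: subscriptions, storage overage, other costs."""
    extra_gb = max(0, gb_per_user - included_gb) * users
    return users * fee_per_user + extra_gb * fee_per_extra_gb + other_costs

# Hypothetical scenario: 100 users, storage growing by 10 GB per user per
# year, $2 per GB over the allowance, and $3,000 a year for extra bandwidth,
# support and training. These inputs are assumptions, not vendor prices.
for year in (1, 2, 3):
    cost = yearly_saas_cost(users=100, fee_per_user=50, gb_per_user=10 * year,
                            included_gb=25, fee_per_extra_gb=2,
                            other_costs=3000)
    print(f"Year {year}: ${cost:,.0f}")
```

The point of running it over several years is that storage overage costs nothing at first and then starts to grow once users pass the included allowance.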

There is also the concern around security. In the email example, an employee’s full email is accessible if their account details get compromised. Traditional in-house installed email packages have security concerns of their own; loss of a laptop at a coffee shop with all the documents and data on the hard disk comes to mind. Setting good security policies around password strength, and how often the password must be changed, is the easiest way to defend against the basic attack of guessing a username and password. Most of the major SaaS vendors allow this policy to be controlled, and some go even further by requiring a special authorization the first time you access their online system from a specific location (country or even computer in some cases). Internal misuse and access of data is reported as common, and outsourcing systems to a major vendor like Google or Microsoft can be seen as reducing this attack surface for sensitive data, as long as you keep the front-door secure (good passwords!).

When choosing a SaaS vendor, the service level needs to be considered. This is normally measured as percentage availability per year. 99.9%, for instance, would allow a vendor to be unavailable for about 8 hours, 45 minutes a year. This is a sticking point; some vendors have a service level agreement, but it only extends to a credit for the downtime. If the service you are considering is so central to your business that you would suffer extreme hardship if it wasn’t available (a core business capability), then investing in your own hardware, software and disaster recovery procedures might be prudent.
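
The conversion from an availability percentage to allowed downtime is simple arithmetic, sketched below. The function name is illustrative and a 365-day year is assumed.

```python
def allowed_downtime_hours(availability_percent, hours_per_year=365 * 24):
    """Hours per year a service may be down and still meet its availability."""
    return hours_per_year * (1 - availability_percent / 100)

for sla in (99.0, 99.9, 99.99):
    hours = allowed_downtime_hours(sla)
    print(f"{sla}% availability allows about {hours:.2f} hours down per year")
```

For 99.9% this gives roughly 8.76 hours, which is the 8 hours and 45 minutes quoted above; each extra "nine" cuts the allowance by a factor of ten.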

Employing online software services is a growing trend, and can offer significant cost savings over owning, caring for and feeding in-house platforms. It is now a mainstream way of doing business for many companies, and should be investigated for every IT function that is not core and central to the business (and even then, it is still worth a look).

Questions to ask your technical team about Software as a Service -

  • Have we considered using an online vendor for this service rather than buying this package in-house?
  • Have we modeled the long-term costs of using the online vendor for the first and subsequent years? (storage, user growth, etc.)
  • What would be the process if we needed to move to a competing vendor, or this vendor closes shop?
  • Do we have a backup strategy for the data stored on this service?
  • Have we considered PCI Compliance (credit card security), and Privacy requirements?

Factors Influencing Page Speed (How Web Pages are Downloaded, and How to Improve Speed)

How Web Pages Are Downloaded

In order to understand and join discussions around web page performance, it’s important to have a basic understanding of how a web page is downloaded and drawn in a web browser application like Internet Explorer, Firefox, Chrome, etc. It’s not a complex process, but there are many repetitive steps, and each of these steps offers an opportunity to improve download and drawing time.

When you point a web browser at a website by clicking on a link, or typing a URL into your favorite web browser, you initiate a multi-step process. A web page is assembled from at least four (4) different types of information, and each of these four types will be split across many individual files. For example, to build one popular website’s home page, eighty-one (81) different files are downloaded and assembled by the browser application. The main file types that make up common pages are: image files containing the pictures and icons on the screen; files containing the text and layout instructions of the screen (for example, HTML, aspx, jsp, or php); files containing the font, colors and position of elements on the screen (CSS files); and files containing the code to execute when the page downloads (Javascript files). There are often more. Having fewer files is an obvious win; Google uses this approach (notice the lack of images on their pages).

To describe the page download process step by step: one initial call to the server is carried out to get the page code. This is in a language called HTML and contains all of the instructions to the web browser on how to draw the page. Once received, the browser interprets this file and requests all of the other files required, some in parallel, some one after the other. The web page actually starts appearing after the HTML file is analyzed and some of the necessary files are downloaded; not every file is needed for the user to begin reading and clicking on that page. Smart companies pay close attention to the order in which files are referenced in the HTML file, so that the web browser can get the earliest start. Figure 1 shows a very simplistic page download process, from getting the main page content HTML file, to retrieving all of the other resource files required to draw the complete page. The point when the user first sees and can interact with the page is indicated by the upside-down triangle at the top right of the time line. It’s somewhere after the main page HTML is retrieved, and sometime before all of the resource files are downloaded (our job is not only to get the complete page downloaded faster, but also to make the part of the page the customer can see appear usable as early as possible). Figure 1 also contains a pie-chart of an example mix of file types and their sizes downloaded for a common website.

 

Figure 1 - Simplified web page download process

 

In the early days of the Internet, back in the 1990s, pages were very simple and mostly text based. This allowed most of the page to be contained in the initial HTML file (on average 80% text, 20% graphics in the early days). This is most definitely not the way pages are built today. Pages today have many graphic elements, and a lot of embedded code (written in a language the web browsers can interpret and run client side), reversing the percentages to 20% of the page being text, and 80% other resources. Pages are now hundreds of times bigger in file size, and download an average of forty support files to draw their final rendition.

There are multiple “taxes” paid in speed when downloading each file required to render a page. The first is the time to transfer the data, which depends on the size of the files being sent and the client’s connection to the Internet. The second is the overhead of requesting each separate file, and the logistics of setting up a connection between two computers across the Internet. Together, these two Internet taxes are the main components that drive sluggish page download performance. A third tax that is smaller, but becoming more prevalent, is the time taken to interpret and draw the page in the browser window. With the increase in complex styling, and in the Javascript code that allows interactive behavior in the web browser without coming back to the server, this is emerging as a factor to be considered (but only after minimizing the size and number of files to download).

Understanding the basic path from a page request to the page being displayed in the user’s browser is the first step to understanding why pages perform poorly. The only guaranteed ways to improve page speed that we can draw from this so far are -

  • Fewer images and support files to download means a faster page (Google take this to an extreme)
  • Smaller images mean a faster page (do we really need a full resolution image? And yes, those third party Flash banner adverts are huge)
  • Fewer calls to third party analytics or advertising platforms means a faster page

Factors Influencing Page-Speed

Diving into more detail on factors influencing web page download speed from an end customer perspective, we need to categorize some of the more important ones. The following short list is what I’ll cover in this section -

  1. Customer geography (with respect to where the servers are located) – Internet access speeds vary around the world, and some countries and regions are less well served than others. Where your servers are located, and where the customers request a page from, will impact web-page speed.
  2. Server side execution time - Some pages require a lot more work to build than others. Whether the content on the page is static (doesn’t need to be computed) or dynamic (built per request), different page speed goals need to be considered. Server side code often makes calls to databases, or even calls other companies’ web services, and these computations and transmission costs take time.
  3. Resource transfer time - The main response from your server describing the page contains links and instructions on what else is required to render the page. Images, videos, and code to be executed in the web browser itself need to be obtained. This is often the majority of the time, and not all of it is required for the page to be usable.
  4. Client side speed and rendering time - Once the resources have been downloaded (or at the same time in most cases), the web browser starts to build the final page layout on the customer’s screen. This involves interpreting the main page’s instructions, executing any code in the browser that is required, and putting text and images in the right location, using the right typeface, color and size. As pages get more interactive and complex this becomes more important to optimize. Every brand of browser (and every version of each brand) does this slightly differently, so it is important to profile all of the major ones.
  5. Third party content – Advertising banners, and analytic tracking beacons have emerged as an area causing page speed degradation beyond acceptable limits. In most cases, this is peripheral content, and should be organized in a way to load after the main customer useful page content.

Customer Geography

Understanding how pages are built should make this constraint self-evident. The further the end-customer is from the servers hosting a web page, the larger the transmission losses, and the longer the setup time and transmission time for data across the network channel. Customer geography impact is determined first by the connectivity into the Internet backbone for the country the user is in, and then the connectivity that customer has to their local Internet Service Provider (ISP) and telecommunications provider.

Some countries have good connectivity to the global Internet backbone, but data flowing around the world still has an upper limit of being sent at the speed of light. That is fast, but the delay adds up if your servers are in North America and the customer is in London. Underwater cables carry most Internet traffic between countries, and due to geography, some countries get a better connection into North America than others. Owing to the way the Internet rolled out, North America is still the hub of most connectivity to other nations. This isn’t to say you should always put your servers in North America alone. Positioning your servers closer to the majority of your customers is obviously faster than hosting in another country, but this isn’t always convenient or cost efficient. In an ideal world, putting a data-center next to each major customer population is best, but if you can only have one per region, put it in the USA for North American traffic, Japan for an Asian hub, or the UK for Europe, to be closest to the main communication cable landing locations. The rationale is that there is the most bandwidth into and out of these locations because a lot of traffic moves past them (the downside is also that a lot of traffic moves past them, but I’m hoping that commercial pressures mean that these locations keep up with demand). To understand the Internet bandwidth connections between countries, visit the Telegeography.com website and view their maps indicating underwater cables and connectivity between regions.
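
To get a feel for the speed-of-light limit, the sketch below estimates the best possible round trip between New York and London. It assumes light travels through optical fibre at roughly 200,000 km per second (about two-thirds of its speed in a vacuum) and that the cable follows the shortest great-circle route, which real cables do not, so actual round trips will be slower. The city coordinates and function names are my own choices for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0
FIBRE_SPEED_KM_PER_S = 200_000.0  # assumed: about 2/3 of light speed in vacuum

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def min_round_trip_ms(distance_km):
    """Lower bound on one request/response round trip, in milliseconds."""
    return 2 * distance_km / FIBRE_SPEED_KM_PER_S * 1000

new_york = (40.7128, -74.0060)
london = (51.5074, -0.1278)
distance = great_circle_km(*new_york, *london)
print(f"{distance:.0f} km, at least {min_round_trip_ms(distance):.0f} ms per round trip")
```

A floor of a few tens of milliseconds sounds small, but it is paid on every round trip, and a page that needs dozens of files needs many round trips.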

 

Figure 2 - Internet Connectivity Map (purchase a hardcopy from Telegeography.com)

 

The second factor, connectivity within the country and over the last mile to your customer, is much harder to predict. In China, for instance, there is poor connectivity between the North of the country and the South, and it’s factors like this that you need to research if a country has a particular density of your customers. In general, if a customer has poor Internet connectivity, they will have a poorer experience. If your customer population is in the developing world (communication wise, with poor or expensive high-speed Internet access for the general population), then you need to work extra hard to reduce the bandwidth used to send supporting resource files, and to send fewer files in general. If your website is graphically rich, with video and complex pictures, then launching into countries with poor Internet connectivity will require a website redesign, or local in-country hosting.

Some vendors offer servers in many countries to host your images and scripts physically in-country to the requesting customer. When a customer asks for a web-page by name, the first part of that address (for example, www.google.com) gets converted to a set of numbers called an IP address; this IP address is what is used to open a connection between the customer’s browser and your servers. This look-up from friendly name to number sequence is performed by the Domain Name System (DNS). The service offered by these vendors is to host your domain name (normally just for these assets), and return to the customer the address of a server physically closer to them, whilst letting your original servers in your home territory keep the master copy and serve the basic page information. This is called a Content Delivery Network (CDN). It avoids you having to cover the costs of having your own servers in-country for the dynamic parts of the page, whilst reaping the benefit for the larger number of elements requested by the customer’s browser. The major players in this market are Akamai and Limelight Networks, although many new players are entering the market (Amazon CloudFront and Microsoft Azure CDN, to name a couple of the smaller ones). Figure 3 shows the basic concept of a CDN; the main content comes from your origin server, wherever that may be, and after the first request from a customer in a particular region (depending on your CDN vendor), that content is served from in-country, closer to the customer, from that time forward.
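
The name-to-number look-up is easy to see for yourself. The short sketch below uses Python's standard `socket.getaddrinfo` call to list the addresses that a name resolves to; the helper name `resolve_addresses` is just for illustration.

```python
import socket

def resolve_addresses(hostname):
    """Return the distinct IP addresses that the resolver reports for hostname."""
    results = socket.getaddrinfo(hostname, None)
    # Each result is (family, type, proto, canonname, sockaddr);
    # the address itself is the first item of sockaddr.
    return sorted({info[4][0] for info in results})
```

Calling `resolve_addresses("www.google.com")` from an Internet-connected machine returns whichever addresses your local resolver hands back. For a name hosted on a CDN, running the same call from different countries will typically return different addresses, which is the mechanism described above.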

 

Figure 3 - A CDN puts static content like images on servers around the world closer to your customers. The main page text still comes from your origin servers, but other files get replicated around the world.

 

Server side execution time

The customer will see a blank screen until their browser gets the main HTML file, and it goes without saying that this wait should be as short as possible. Some web pages need substantial server side computing power to build the data they need to send to the customer. A common example is when you need to make real-time price or availability calls to third-party service providers. For example, if your website allows people to book air tickets, you might need to call a third party service to get itineraries for the customer’s request. This can take many seconds; some popular booking sites take five (5) to twenty (20) seconds to return even initial results. Leaving the customer staring at a white screen, unsure what is happening, is a sure way to have them bail on your website.

My rule here is that if the server side computations will jeopardize the two-second rule of giving acknowledgment to the customer, then I redirect through a status page first to give the user clear feedback that the click registered, and that data is being gathered to give them valuable information in a few more moments. Using the client-side programming language to request and build a page in parts (a pattern called AJAX) is the newer way to achieve this; any kind of fake-it-until-you-make-it strategy serves the purpose. The actual patterns for achieving this user experience vary, and I’ll cover those in a later, more detailed technical post. For now, it’s just important to make it clear that long server-side requests do occur, sometimes beyond your development team’s ability to tune further. If these requests might cause the customer to see a blank white screen for longer than two (2) seconds, address the issue with a progress page, or load the logo, borders and template for the page first, then download and draw the content progressively.

Optimizing the server side calls for these computationally lengthy or third-party-dependent pages is important. A common practice, if you are dependent on a third-party data source (or one of your own that is outside your web development team’s immediate control to speed up), is to store a local copy of that data for your web servers to use on subsequent requests for identical data. For example, if a page returns the weather for Seattle, Washington, making a third-party call to another website for the seven day forecast more than a few times an hour is redundant. Instead, consider storing the returned result in a local database, keyed by the call parameters that make it unique, and when subsequent calls to your site ask for the Seattle, WA weather, avoid the lengthy third party call and retrieve it locally from disk. The coding pattern goes something like this – 1. Do we have a local copy that is less than x minutes old? If so, return that local copy. 2. If we don’t, retrieve the data externally and save a local copy of it, timestamped and with a unique identifier (normally built by joining together the elements that make this call unique, for example ‘SEATTLE-WA-USA’). The issue to solve is how long it takes for the data in the local cache to become out of date, and this varies depending on your data. Figure 4 shows the most basic server side content caching to improve end customer response. The downside of all caching is that the data gets stale and might cause incorrect data to be shown to the customer. The expiry rules, the cache refresh time, and the set of parameters that determine whether a cached entry matches the user’s request all need to be carefully considered.
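
The two-step pattern above can be sketched in Python as follows. To stay self-contained this version keeps the copies in an in-memory dictionary rather than the local database or disk described in the text, and the class name `TimedCache` and the stand-in `fetch_forecast` function are hypothetical; a real version would call the third-party weather service there.

```python
import time

class TimedCache:
    """Return a stored copy if it is fresh enough, otherwise fetch and store."""

    def __init__(self, max_age_seconds, fetch, clock=time.time):
        self.max_age = max_age_seconds
        self.fetch = fetch      # the slow call we want to avoid repeating
        self.clock = clock      # injectable so expiry can be tested
        self.entries = {}       # key -> (timestamp, value)

    def get(self, *parts):
        # Join the parameters that make the call unique, e.g. 'SEATTLE-WA-USA'.
        key = "-".join(str(part).upper() for part in parts)
        entry = self.entries.get(key)
        if entry is not None and self.clock() - entry[0] < self.max_age:
            return entry[1]                      # step 1: fresh local copy
        value = self.fetch(*parts)               # step 2: go to the source
        self.entries[key] = (self.clock(), value)
        return value

calls = []

def fetch_forecast(city, state, country):
    """Stand-in for the slow third-party call; records each invocation."""
    calls.append((city, state, country))
    return f"seven day forecast for {city}, {state}"

cache = TimedCache(max_age_seconds=15 * 60, fetch=fetch_forecast)
cache.get("Seattle", "WA", "USA")
cache.get("Seattle", "WA", "USA")
print(len(calls))  # 1
```

The second request is served from the stored copy, so the slow call happens only once within the fifteen minute window. Choosing that window is the stale-data trade-off discussed above.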

 

Figure 4 - Basic content caching to improve server side performance
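
The two-step caching pattern described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production cache: `fetch_forecast` is a hypothetical stand-in for the slow third-party call, the in-memory dictionary stands in for the local database, and the 30 minute expiry is an arbitrary value chosen for the example.

```python
import time

CACHE_TTL_SECONDS = 30 * 60  # assumed expiry (the "x minutes"); tune for your data
_cache = {}  # key -> (timestamp, data); stands in for a local database


def make_key(city, state, country):
    """Join the parameters that make a call unique, e.g. 'SEATTLE-WA-USA'."""
    return "-".join(part.strip().upper() for part in (city, state, country))


def get_forecast(city, state, country, fetch_forecast, now=time.time):
    """Return a fresh cached forecast if one exists, otherwise fetch and store it.

    fetch_forecast is a hypothetical callable wrapping the slow external call.
    """
    key = make_key(city, state, country)
    entry = _cache.get(key)
    # Step 1: do we have a local copy that is less than x minutes old?
    if entry is not None and now() - entry[0] < CACHE_TTL_SECONDS:
        return entry[1]
    # Step 2: no fresh copy, so retrieve externally and save it with a timestamp.
    data = fetch_forecast(city, state, country)
    _cache[key] = (now(), data)
    return data
```

Normalizing the key (trimming and upper-casing each part) is one way to make sure that requests for ‘Seattle’ and ‘seattle’ share a cached copy; which parameters belong in the key depends entirely on what makes a request unique in your application.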

 

Resource transfer time

Resource transfer time offers the largest opportunity for speeding up customer experience. Banner advertising and complex graphics take time to download, so the first obvious choice is to reduce that clutter (or balance it against how much revenue it generates). Fewer images and less eye-candy is the clearest way, and Google makes an art form of minimal page clutter. In fact, when they had to put a link to their privacy policy on the home page, it is reported that the CEO made them find another word to remove (they changed ‘Copyright (c) 2010’ to just ‘(c) 2010’). I’m not saying that is the best choice for other sites, but if there is an opportunity to reduce the design aspects of the page to contain fewer images, then you should take it. Making the images the lowest resolution possible without losing clarity and avoiding background images (other than small images that are tiled) are some other easy strategies to keep pages lighter.

Size is one aspect of resource transfer time; the other is the number of requests for files. When making data transfers across the Internet, it is faster to make a single call for a 10,000 byte file than ten separate calls for 1,000 byte files. There is setup time for each call, and this adds up, especially when the servers are in a country far from the customer. There are numerous techniques your developers can employ, chiefly combining many files into one larger file. For images this process is called ‘spriting’ (the webpage needs to know how to pull the right tile out of the larger image to get the smaller images), and the same idea applies to style sheets and client-side JavaScript files. This is a best practice, but it is often not performed because it is difficult for the developers to keep track of where individual resources are located when editing and making changes. One way around this is to have the servers do this step automatically when publishing changes, or to have a server do this transparently on your behalf on-the-fly (this is an emerging area; Strangeloop Networks and Aptimize are example companies with tools for this).
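
To see why the number of requests matters, here is a rough back-of-the-envelope model in Python. It is a sketch only: the 100 millisecond setup time and the throughput of 1,000,000 bytes per second are assumed figures picked for illustration, and the model treats the requests as strictly one after another, which overstates the gap because real browsers make some calls in parallel.

```python
def transfer_time(num_requests, total_bytes, setup_seconds, bytes_per_second):
    """Estimate seconds to fetch total_bytes split across sequential requests.

    Each request pays a fixed setup cost; the payload itself takes the same
    time however it is split.
    """
    return num_requests * setup_seconds + total_bytes / bytes_per_second


# Assumed figures for illustration only.
SETUP_SECONDS = 0.1
BYTES_PER_SECOND = 1_000_000

one_call = transfer_time(1, 10_000, SETUP_SECONDS, BYTES_PER_SECOND)
ten_calls = transfer_time(10, 10_000, SETUP_SECONDS, BYTES_PER_SECOND)
print(f"one 10,000 byte call:   {one_call:.2f} s")
print(f"ten 1,000 byte calls:  {ten_calls:.2f} s")
```

Under these assumptions the payload takes the same hundredth of a second either way, so almost all of the difference comes from paying the setup cost ten times instead of once.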

We previously covered using an external Content Delivery Network (CDN) to store these static resources closer to the end customer when dealing with overseas calls. CDNs do offer other benefits for local customers as well, because the vendor hosting the CDN content likely has servers on each coast near major population centers, improving response that little bit extra. In addition, web browsers like Internet Explorer have traditionally made only a small number of parallel calls to the same domain name (as few as two in older versions). If all the images are hosted on the same server, they share the same domain (the google.com, or yourcompany.com part), but having some of this content on another domain increases the ability of the web browser to make more calls concurrently. When you get to this level of change though, you are talking about reductions of small parts of a second in call time.

Client speed and rendering time

When a page and its other support files are downloaded, the web browser needs to assemble, lay out and draw the page. For all but the most graphic-intensive websites, this time is minimal. In the coming years, this might become a bigger issue as more native HTML is used for animation and advertising in place of Flash (a particular graphic file format that supports animation and complex interaction). Flash-based graphics were once the major tool for graphic-intensive websites, but some companies (I’m thinking Apple here, and Google to some degree) have been forcing a more open-standard approach in the native language of the web browsers. This change will mean that in the future, spending extra time improving the performance of this code might be needed.

It’s unlikely your pages suffer from rendering time issues at the moment, but when they do, your first indication will be that some web browser versions have more trouble than others. You should see that shopper conversion percentage is lower for one browser version than another, and this is the only time you need to invest in this area. Web browser drawing speed and JavaScript execution speed are a constant battle amongst the major vendors: Microsoft with Internet Explorer, Mozilla with Firefox, Google with Chrome, and Apple with Safari. These vendors jostle and improve rendering and execution speed with each release. At the moment, the browsers likely to have issues will be those two versions behind current: Internet Explorer 6, Internet Explorer 7, and Firefox 2. Unless you have significant numbers of customers on these browsers, spend optimization time elsewhere (and put an ‘upgrade your browser’ banner on your site to help customers move to a more secure and better performing web experience).

Third Party Content

A recent trend my teams have seen when analyzing pages for performance improvements is the influence of third-party content. Most companies don’t host their own banner advertisements or analytics software. These parts of the page are included by calls to another vendor’s website, and problems arise when they block the main page content from being seen first.

This content should be added in a way that makes it download and show last, and in a way that doesn’t bump the other content around while a customer is reading it. Work with your technical team to consolidate analytics tracking and advertising positions with a few key vendors.

Questions to ask your technical team about web page performance

  1. Do we have our servers in a geography closest to our customers?
  2. Are we using a Content Delivery Network (CDN) for static resources like images and support files?
  3. Can we reduce the number of images, or reduce the resolution of the images to make them smaller?
  4. Can we reduce the number of JavaScript and CSS files by simplifying our design or compressing and combining these files?
  5. Can we use server caching of external content to reduce response time?

Webpage Performance

I often hear comparisons of a certain website’s speed versus its competitors, as if the entire fate of the company depended on website speed. There is evidence that poor page speed decreases visitor conversion (the proportion of visitors who buy, rather than just browse), but page speed isn’t the only factor influencing customer conversion. Investing in page speed improvement has to be balanced against other potential revenue-increasing work (adding new features, for example), and holding up a release looking for an extra few milliseconds may not be the best use of the development team’s time.

Some search engines, Google for instance, have publicly stated that they will use page download speed as one element in determining page rank (although it will likely be applied after relevance has been established, and used as a tie-breaker). So as well as better customer conversion, a potentially better Google ranking is a benefit of investing in web page performance. Google hasn’t disclosed (and is unlikely ever to disclose) how much influence this factor has in its ranking equation, but it has some, and shouldn’t be left unchecked.

How Long is Too Long?

The controversial aspect of page speed discussions is how long is too long. Some definitions might be -

  • Longer than the users’ expectation
      – So long that an impatient user leaves the site
      – So long that an average user leaves the site
  • So long that it has a measurable impact on conversion
  • So long that it becomes a factor in search engine ranking
  • Longer than the competition

These are all valid, but still mostly subjective (longer than the competition isn’t a constant and changes over time). It’s important that clear goals be set for proper effort to be allocated by the design and development team (covered shortly). The most logical discussion I’ve seen of this topic links customer behavior to the amount of time they have to decide if the content is enticing enough to stay a little longer. Here is the discussion -

“My rule of thumb is that you have 7 seconds to sell a prospect that they have reached a page of interest to them. You don’t have to make a sale in 7 seconds, you just have 7 seconds to keep them for a while longer.

If your initial page takes 6 seconds to load, you have 1 second of selling time. If your initial page takes 1 second or less to load you have 6-7 seconds of selling time. Therefore to maximize sales you should give yourself maximum selling time. That means pages that load as fast as possible.”

Source: http://www.webmasterworld.com/forum116/41.htm

I think there is merit to this approach; I go a little further and set goals for initial confirmation (I heard you, I’m thinking, stop hitting F5 to refresh the browser), and then for getting compelling data on the screen for the user to interact with before the seven second clock goes off (my longest target is four (4) seconds, giving me three (3) seconds of selling time).

Setting Clear Goals

Many target page load speed numbers are floating around the Internet: two (2) seconds, four (4) seconds, seven (7) seconds. Google has set the pace; when they launched, their performance was so dramatically better than the competition that they became the de facto search engine almost overnight, and they are now a multi-billion dollar success story. That they can return ranked results drawn from hundreds of millions of pages (conservatively) almost instantly is amazing, but it does make the rest of us look bad! Try not to infuriate your technical team by comparing them to Google; Google has a few more resources and designed its systems from the ground up with the utmost performance in mind.

It is vital that page speed goals be defined before the tech team starts developing. The decisions they make designing an approach will change dramatically based on these goals, and retrofitting optimizations at the end of the project is technically difficult and often delays projects. Without clear performance goals, technical staff may spend too much time in some areas, and not enough in others (like releasing and testing). It’s important to prioritize the pages that need to be improved, and to make sure that these pages hit their target and stay that way over time. But what is the best way to communicate these goals? Setting unrealistic goals is unwise, and correct results must come before speed; make this point very clear.

Here are some suggestions to start the conversation with your teams -

Report cards – A consistent way to determine how a page rates, and whether there is room for improvement, is to use one of the online page speed checks. Make sure your staff have a way to measure page speed, and that they understand what is acceptable. One popular free tool is Webpagetest.org, an online service that checks for common coding errors (or changes that might improve performance) and measures response time from many different geographic locations. This website scored 90/100, which is quite good (see below); anything below a ‘C’ rating or a total score less than 75 should be looked at urgently. Google also has page speed testing tools that allow testing to be carried out within the Firefox browser.

Demonstrating progress quickly (the two second rule) - One goal to ask your tech staff to focus on is to show, within two (2) seconds, that a click has been accepted and work is ongoing. Beyond this time it is likely that the customer will not realize the click has registered, and will either click again repeatedly or leave your site thinking it has frozen. Your options here are to show an interstitial page (a progress page) and then redirect to the results, or alternatively to get the next page loaded with some content within the first two (2) seconds and then let the rest of the page download quickly after that. It is the acknowledgment ‘I heard you’, followed by ‘I’m making progress’, that shapes the user’s perception of speed.

Above the fold first (within four seconds – the four second rule); progressively building the rest – Most pages are bigger than the window the customer is browsing in. Putting content in the portion the user can see (above the fold) is the highest priority. Make sure the technical staff focus on getting content into the topmost, most likely visible section of the page first. When you look at pages on your own website, mention to the developers when you see opportunities to get page content on the screen early for the user to read and comprehend whilst the remaining parts of the page download off screen.

Total download time – Total download time is often the measurement quoted and compared; however, if the two second feedback rule and the four second above-the-fold rule are followed, customer impact is likely to be low. Setting a clear goal on total download time is difficult and not likely to improve customer conversion. My suggestion is to set an upper limit, and reinforce the point that, once correct results are being retrieved, if a page takes longer than this limit the team should start brainstorming and trading off early. Avoid relying on tools that report only total download time. It is likely that invisible third-party content and banner advertising animations (large content sizes) will skew the results many seconds past the point when the customer perceives the page as usable, and so they are not likely to be impacting conversion. Focus on the two second and four second rules!
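
If your team already records timings for key pages, the two rules above can be turned into a simple automated check. The Python sketch below is one hypothetical way to do it: the `PageTiming` fields are assumed measurements that your own tooling would have to supply, and the seven second window comes from the rule of thumb quoted earlier.

```python
from dataclasses import dataclass

FEEDBACK_LIMIT = 2.0    # two second rule: acknowledge the click
ABOVE_FOLD_LIMIT = 4.0  # four second rule: visible portion is usable
SELL_WINDOW = 7.0       # rule-of-thumb attention window, in seconds


@dataclass
class PageTiming:
    """Hypothetical measurements for one page, all in seconds."""
    url: str
    first_feedback: float  # until the user can see the click registered
    above_fold: float      # until the visible part of the page is drawn


def check_goals(timing):
    """Return the names of the rules this page currently breaks."""
    failures = []
    if timing.first_feedback > FEEDBACK_LIMIT:
        failures.append("two second rule")
    if timing.above_fold > ABOVE_FOLD_LIMIT:
        failures.append("four second rule")
    return failures


def selling_time(timing):
    """Seconds left in the seven second window once content is visible."""
    return max(0.0, SELL_WINDOW - timing.above_fold)
```

Total download time is deliberately left out of the check, in keeping with the advice to treat it as an upper limit rather than a goal in its own right.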

This post has looked at the major factors influencing page speed, and the ways to determine if page speed is likely to cause major customer impact.

Questions you should ask your team about web-page performance are -

  1. For our top x pages, do we regularly measure the page speed ratings of our website? Do we use webpagetest.org, or Google’s Page Speed tool?
  2. For our top x pages, do we always give feedback within two (2) seconds?
  3. For our top x pages, do we render above the fold within four (4) seconds?

In future posts we will discuss specific ways to measure page speed, and ways to improve page speed with specific coding patterns.

Cloud Computing – Definition

Cloud Computing has a lot of hype surrounding it. Most of this hype is justified; it will play a major role in the future of IT, and every online company will embrace some Cloud Computing implementation, if they haven’t (knowingly, or unknowingly) already. The issue with Cloud Computing isn’t technological, it’s one of definition. There isn’t, even for geeks, a consistent definition when the words ‘Cloud’ and ‘Computing’ are capitalized and joined as a noun (or is it a verb: does a company ‘Cloud Compute’ or does it use ‘Cloud Computing’?).

Like all aspects of technology (and life), old ideas get revisited and reworked. Cloud Computing has in some form or another been with us long before it was given the moniker. Using a server remote from your premises dates back many decades, although when computing power on the desktop increased to productive levels, the use of remote computing resources lessened. Now, with the help of the Internet, it has returned. Some of this is driven by marketing, and some of it by clever rethinking of what computing resources are needed day-to-day in a given circumstance, and how that may grow and shrink over time. When I get to defining what Cloud Computing is, you will recognize aspects you already know and understand, and then I’ll throw some new concepts at you (I know your geeky IT friends start at the new aspects, and I suspect that is why Cloud Computing seems so mysterious).

I’ve mentioned Cloud Computing is hard to define, so I’m not going to give it an exact definition, that would be pointless and out of date before I hit the post button. Rather, I’m going to describe the characteristics of the technology, and let you draw your own definition given your IT demands.

1. Mixing on-premises and off-premises computing resources (computing, storage, network)

Cloud Computing allows (although doesn’t demand) computing resources located externally to be utilized to solve business problems. If you have ever had your website hosted by an Internet Service Provider, then you have technically participated in Cloud Computing. It was never referred to as Cloud Computing, but renting space from an ISP and renting space from the branded Cloud Computing vendors like Amazon, Microsoft, Google, etc. are at points indistinguishable. The ability to outsource the management of computing resources is the key point: someone else is housing, maintaining, and caring for and feeding the computing resources, rather than your data center premises and staff. Some of the major vendors are now allowing you to buy the exact equipment they are hosting on your behalf, to benefit from the other aspects of Cloud Computing we cover next.

2. On-Demand provisioning (and de-provisioning) of servers

Where the newer Cloud Computing vendors start to differentiate is the on-demand time-frame of provisioning new computing power, the ability to grow and shrink those resources over time, and the process for how these are billed to your credit card. Be under no illusion: when you use a Cloud Computing service, you are getting a server, network and Internet bandwidth, and storage provided to you without having to have that machine in your data center. You are borrowing those tools from somewhere else (but they exist, and you could house all the components yourself).

When hosted internally, every new project that requires more servers means a server has to be installed (although some of these servers exist only in software form; we will cover Virtualization in another entry), connected to your network, loaded with software, configured, and maintained from that time forward by your staff. The lead time for these new servers is often longer than you want, and believe me, longer than your IT staff wants. The Cloud Computing vendors allow new servers to be provisioned in a consistent fashion, ready to use in minutes. This is especially useful if that server isn’t needed longer term, or is only needed to handle peak times of the year (why pay for hosting and housing all of the servers you need for the Christmas peak for the rest of the year?).
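
The Christmas peak argument can be made concrete with a little arithmetic. The Python sketch below compares owning enough servers for the peak all year against provisioning only what each month needs; the monthly server counts are invented purely for illustration, and real pricing differences between owned and rented servers are ignored.

```python
def fixed_server_months(peak_servers, months=12):
    """Server-months paid for when you own enough servers for the peak all year."""
    return peak_servers * months


def elastic_server_months(servers_needed_each_month):
    """Server-months paid for when servers are provisioned month by month."""
    return sum(servers_needed_each_month)


# Invented example: 4 servers are enough for 11 months, 10 are needed in December.
needed = [4] * 11 + [10]
fixed = fixed_server_months(max(needed))
elastic = elastic_server_months(needed)
print(f"sized for peak: {fixed} server-months, on demand: {elastic} server-months")
```

With these made-up numbers the on-demand approach pays for less than half the server-months, though the comparison only holds if the per-month price of a rented server is not much higher than the cost of an owned one.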

3. Elastic Storage – pay only for what you use

The amount of unused disk space you are paying for is staggering. Every server you provision has space allocated for its predicted future need, and often for the minimum disk space you can buy. This is often mitigated by using shared drive space, called Network Attached Storage (NAS), or its bigger brothers. But when using the Cloud Hosting model, you only get charged for the storage space you use. The downside to this is that you also get charged each time you access or submit that data, but the costs are in the cents per gigabyte range (a DVD holds roughly 4.7 to 8.5 gigabytes of data), which I guarantee is less than you are paying for fixed storage of your own (probably even less than the electrical power for cooling and powering those storage servers; throw in backup and maintenance staff, and it’s even more compelling).

4. Elastic Bandwidth – pay only for what you use

If your Cloud Computing resources are accessible to Internet users, ensuring that you have the bandwidth capacity to grow to those requirements is important. For example, adding the ability to watch a video demonstration of your product could quickly overwhelm the capacity of your Internet connectivity. The major Cloud Computing vendors provide this connectivity to the Internet, and charge you according to the amount of traffic demanded by your eager customers. Whilst this still requires planning to avoid a shock invoice at the end of the month, the ability to expand when required, and to pay only for what was used in low traffic times, is compelling.

Those are the top four characteristics that define Cloud Computing for me. In future posts, I’ll expand on each of these aspects and give more detail on the pros and cons of this type of computing infrastructure. For now, I’ll leave you with the questions to ask your IT staff if they suggest using Cloud Computing in your company.

Questions to ask your technical team

  1. Does the vendor you plan to use provide a service level agreement? If they have an outage will we be losing money or disappointing customers?
  2. Are we exposing sensitive customer data to these providers? I want to make sure our customers’ information is safe, so tell me how you are securing that data from theft or loss.
  3. How are we backing up the data?
  4. Have you considered how we might move vendors easily if we need to?
  5. Have you costed out the storage and bandwidth costs in addition to just the computing cost? Have you considered how much it will cost if we double the traffic we need to handle?
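
For the last question, a small estimate like the following Python sketch can help frame the conversation. Everything in it is an assumption: the price names and values are invented placeholders rather than any vendor’s real rates, and the model simply assumes that doubling traffic doubles the data transferred while compute and storage stay the same.

```python
def monthly_cost(server_hours, stored_gb, transferred_gb, prices):
    """Estimate one month's bill from compute, storage and bandwidth usage."""
    return (server_hours * prices["per_server_hour"]
            + stored_gb * prices["per_gb_stored"]
            + transferred_gb * prices["per_gb_transferred"])


# Placeholder prices for illustration only; substitute your vendor's rates.
example_prices = {
    "per_server_hour": 1.0,
    "per_gb_stored": 0.1,
    "per_gb_transferred": 0.2,
}

today = monthly_cost(100, 50, 200, example_prices)
doubled_traffic = monthly_cost(100, 50, 400, example_prices)
print(f"today: {today:.2f}, with double the traffic: {doubled_traffic:.2f}")
```

In practice more traffic usually needs more servers as well, so treat an estimate like this as a starting point and ask the team which usage figures grow together.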

Welcome to Geek Speak Decoded

I’ve been in technology for over 25 years. In that time, I’ve managed to confuse all of my executive team by speaking a language they often don’t fully understand; and when they do understand, I learn about something new and confuse them with a new acronym – it’s a vicious process. Now, I’m an executive – and my technical team are confusing me with new terms and technologies – it’s hard to keep up!

So, my solution is to take what I currently know (and learn over time) – and write it here! What’s in it for you?

If you are an executive – I’ll explain the concepts and technologies you need to know, and give you questions to ask your technical team; you will immediately look smarter and more credible in meetings.

If you are an IT technologist (developer, architect, team lead) – I’ll explain to you how you need to slow down and dumb down the message into something we (I) can work with.

I hope you enjoy the journey,

Geek Speak
