Only 10–20% of the end user response time is spent downloading the HTML document. The other 80–90% is spent downloading all the components in the page.
This book will change your approach to performance optimization.
When Steve began researching performance for our Platform Engineering group at Yahoo!, I believed performance was mainly a backend issue. But he showed that frontend issues account for 80% of total time.
In reality, for most web pages, less than 10–20% of the end user response time is spent getting the HTML document from the web server to the browser.
If you want to dramatically reduce the response times of your web pages, you have to focus on the other 80–90% of the end user experience.
What is that 80–90% spent on? How can it be reduced? The chapters that follow lay the groundwork for understanding today’s web pages and provide 14 rules for making them faster.
Looking at the HTTP traffic in this way, we see that at least 80% of the end user response time is spent on the components in the page.
If we dig deeper into the details of these charts, we start to see how complex the interplay between browsers and HTTP becomes. Earlier, I mentioned how the HTTP status codes and headers affect the browser’s cache. In addition, we can make these observations:
- The cached scenario doesn’t have as much download activity.
- Varying numbers of HTTP requests occur in parallel.
Figure A-2 has a maximum of three HTTP requests happening in parallel, whereas in Figure A-1, there are as many as six or seven simultaneous HTTP requests. This behavior is due to the number of different hostnames being used, and whether they use HTTP/1.0 or HTTP/1.1. Chapter 6 explains these issues in the section “Parallel Downloads.”
- Parallel requests don’t happen during requests for scripts.
That’s because in most situations, browsers block additional HTTP requests while they download scripts. See Chapter 6 to understand why this happens and how to use this knowledge to improve page load times.
Figuring out exactly where the time goes is a challenge. But it’s easy to see where the time does not go—it does not go into downloading the HTML document, including any backend processing.
That’s why frontend performance is important.
In any optimization effort, it’s critical to profile current performance to identify where you can achieve the greatest improvements. It’s clear that the place to focus is frontend performance.
- First, there is more potential for improvement in focusing on the frontend.
If we were able to cut backend response times in half, the end user response time would decrease only 5–10% overall. If, instead, we cut frontend response times in half, we would reduce overall response times by 40–45%.
- Second, frontend improvements typically require less time and fewer resources.
Reducing backend latency involves projects such as redesigning application architecture and code, finding and optimizing critical code paths, adding or modifying hardware, distributing databases, etc.
These projects take weeks or months.
Most of the frontend performance improvements described in the following chapters involve best practices, such as changing web server configuration files (Chapters 3 and 4); placing scripts and stylesheets in certain places within the page (Chapters 5 and 6); and combining images, scripts, and stylesheets (Chapter 1).
These projects take hours or days—much less than the time required for most backend improvements.
- Third, frontend performance tuning has been proven to work.
Over 50 teams at Yahoo! have reduced their end user response times by following the best practices described here, many by 25% or more.
In some cases, we’ve had to go beyond these rules and identify improvements more specific to the site being analyzed, but generally, it’s possible to achieve a 25% or greater reduction just by following these best practices.
This brings us to the Performance Golden Rule:
- Only 10–20% of the end user response time is spent downloading the HTML document. The other 80–90% is spent downloading all the components in the page.
After that come the 14 rules for faster performance, each in its own chapter.
The rules are listed in general order of priority. A rule’s applicability to your specific web site may vary.
For example, Rule 2 is more appropriate for commercial web sites and less feasible for personal web pages. If you follow all the rules that are applicable to your web site, you’ll make your pages 25–50% faster and improve the user experience.
The last part of the book analyzes the top 10 U.S. web sites from a performance perspective.
HyperText Transfer Protocol (HTTP) is how browsers and servers communicate with each other over the Internet.
The HTTP specification was coordinated by the World Wide Web Consortium (W3C) and Internet Engineering Task Force (IETF), resulting in RFC 2616. HTTP/1.1 is the most common version today, but some browsers and servers still use HTTP/1.0.
HTTP is a client/server protocol made up of requests and responses.
- A browser sends an HTTP request for a specific URL, and a server hosting that URL sends back an HTTP response.
Like many Internet services, the protocol uses a simple, plaintext format. The types of requests are GET, POST, HEAD, PUT, DELETE, OPTIONS, and TRACE. I’m going to focus on the GET request, which is the most common.
A GET request includes a URL followed by headers.
The HTTP response contains a status code, headers, and a body.
The following example shows the possible HTTP headers when requesting the script yahoo_2.0.0-b2.js.
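Since the original header listing isn’t reproduced here, the sketch below is an approximation of what such an exchange might look like (the hostname, path, and header values are hypothetical; only the header names and overall shape are meant to be accurate):

```http
GET /lib/common/utils/2/yahoo_2.0.0-b2.js HTTP/1.1
Host: example.yimg.com
User-Agent: Mozilla/5.0 (...)
Accept-Encoding: gzip, deflate

HTTP/1.1 200 OK
Content-Type: application/x-javascript
Content-Encoding: gzip
Last-Modified: Wed, 22 Feb 2006 04:15:54 GMT
```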
The size of the response is reduced using compression if both the browser and server support it.
Browsers announce their support of compression using the Accept-Encoding header. Servers identify compressed responses using the Content-Encoding header.
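In header form, a typical compressed exchange looks like this (the specific tokens vary by browser and server):

```http
Accept-Encoding: gzip, deflate

Content-Encoding: gzip
```

The first line appears in the browser’s request; the second appears in the server’s response, telling the browser how to decode the body.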
If the browser has a copy of the component in its cache, but isn’t sure whether it’s still valid, a conditional GET request is made.
If the cached copy is still valid, the browser uses the copy from its cache, resulting in a smaller response and a faster user experience.
Typically, the validity of the cached copy is derived from the date it was last modified.
The browser knows when the component was last modified based on the Last-Modified header in the response. In effect, the browser is saying, “I have a version of this resource with the following last modified date. May I just use it?”
If the component has not been modified since the specified date, the server returns a “304 Not Modified” status code and skips sending the body of the response, resulting in a smaller and faster response.
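A conditional GET exchange might look like the following sketch (the URL and date are hypothetical):

```http
GET /i/logo.gif HTTP/1.1
If-Modified-Since: Wed, 22 Feb 2006 04:15:54 GMT

HTTP/1.1 304 Not Modified
```

Because the 304 response carries no body, only headers cross the wire.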
In HTTP/1.1 the ETag and If-None-Match headers are another way to make conditional GET requests. Both approaches are discussed in Chapter 13.
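With entity tags, the server assigns an opaque identifier to the response, and the browser echoes it back in the conditional request (the tag value below is invented for illustration):

```http
ETag: "10c24bc-4ab-457e1c1f"

If-None-Match: "10c24bc-4ab-457e1c1f"
```

If the tags match, the server returns 304 Not Modified, just as with If-Modified-Since.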
Conditional GET requests and 304 responses help pages load faster, but they still require making a roundtrip between the client and server to perform the validity check.
The Expires header eliminates the need to check with the server by making it clear whether the browser can use its cached copy of a component.
When the browser sees an Expires header in the response, it saves the expiration date with the component in its cache.
As long as the component hasn’t expired, the browser uses the cached version and avoids making any HTTP requests.
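For example, a response header like the following (the date is hypothetical) tells the browser it may reuse the cached copy without contacting the server until that date:

```http
Expires: Mon, 01 Jan 2029 00:00:00 GMT
```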
Chapter 3 talks about the Expires and Cache-Control headers in more detail.
HTTP is built on top of Transmission Control Protocol (TCP).
In early implementations of HTTP, each HTTP request required opening a new socket connection.
This is inefficient because many HTTP requests in a web page go to the same server.
Persistent Connections (also known as Keep-Alive in HTTP/1.0) was introduced to solve the inefficiency of opening and closing multiple socket connections to the same server.
It lets browsers make multiple requests over a single connection.
Browsers and servers use the Connection header to indicate Keep-Alive support. The Connection header looks the same in the server’s response.
The browser or server can close the connection by sending a Connection: close header. Technically, the Connection: keep-alive header is not required in HTTP/1.1, but most browsers and servers still include it.
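The header itself is simple; the same line appears in both the request and the response:

```http
Connection: keep-alive

Connection: close
```

The first form signals that the socket should stay open for further requests; the second signals that the sender is closing the connection.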
Pipelining, defined in HTTP/1.1, allows for sending multiple requests over a single socket without waiting for a response. Pipelining has better performance than persistent connections.
Unfortunately, pipelining is not supported in Internet Explorer (up to and including version 7), and it’s turned off by default in Firefox through version 2. Until pipelining is more widely adopted, Keep-Alive is the way browsers and servers can more efficiently use socket connections for HTTP.
This is even more important for HTTPS because establishing new secure socket connections is more time consuming.
This chapter contains just an overview of HTTP and focuses only on the aspects that affect performance.
To learn more, read the HTTP specification (http://www.w3.org/Protocols/rfc2616/rfc2616.html) and HTTP: The Definitive Guide by David Gourley and Brian Totty (O’Reilly; http://www.oreilly.com/catalog/httptdg).
The parts highlighted here are sufficient for understanding the best practices described in the following chapters.
The Performance Golden Rule reveals that only 10–20% of the end user response time involves retrieving the requested HTML document. The remaining 80–90% of the time is spent making HTTP requests for all the components (images, scripts, stylesheets, Flash, etc.) referenced in the HTML document.
Thus, a simple way to improve response time is to reduce the number of components, and, in turn, reduce the number of HTTP requests.
Suggesting the idea of removing components from the page often creates tension between performance and product design. In this chapter, I describe techniques for eliminating HTTP requests while avoiding the difficult tradeoff decisions between performance and design.
These techniques include using image maps, CSS sprites, inline images, and combined scripts and stylesheets.
If you use multiple hyperlinked images in this way, image maps may be a way to reduce the number of HTTP requests without changing the page’s look and feel.
An image map allows you to associate multiple URLs with a single image. The destination URL is chosen based on where the user clicks on the image.
There are two types of image maps.
Server-side image maps submit all clicks to the same destination URL, passing along the x,y coordinates of where the user clicked. The web application maps the x,y coordinates to the appropriate action.
Client-side image maps are more typical because they map the user’s click to an action without requiring a backend application. The mapping is achieved via HTML’s MAP tag.
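A client-side image map might be written as follows (the filenames, coordinates, and map name are hypothetical; the MAP/AREA syntax is standard HTML):

```html
<img src="navbar.gif" usemap="#navmap" alt="navigation bar">
<map name="navmap">
  <!-- each AREA maps a rectangular region of the image to a URL -->
  <area shape="rect" coords="0,0,31,31"  href="home.html"   alt="Home">
  <area shape="rect" coords="32,0,63,31" href="search.html" alt="Search">
</map>
```

One image, one HTTP request, but each region still links to its own destination.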
No Image Map: http://stevesouders.com/hpws/imagemap-no.php
Image Map: http://stevesouders.com/hpws/imagemap.php
There are drawbacks to using image maps. Defining the area coordinates of the image map, if done manually, is tedious and error-prone, and it is next to impossible for any shape other than rectangles.
Like image maps, CSS sprites allow you to combine images, but they’re much more flexible.
To use CSS sprites, multiple images are combined into a single image.
The “planchette” is any HTML element that supports background images, such as a SPAN or DIV.
The HTML element is positioned over the desired part of the background image using the CSS background-position property.
For example, each SPAN has a different class that specifies the offset into the CSS sprite using the background-position property:
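A minimal sketch of the technique (the class names, sprite filename, and pixel offsets are hypothetical):

```css
/* both rules share one combined image; only the offset differs */
.nav-home {
  width: 16px;
  height: 16px;
  background-image: url("sprite.gif");
  background-position: 0 0;        /* first icon in the sprite */
}
.nav-search {
  width: 16px;
  height: 16px;
  background-image: url("sprite.gif");
  background-position: -18px 0;    /* second icon, 18px to the right */
}
```

Negative offsets shift the background image so the desired region shows through the element’s box.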
CSS Sprites: http://stevesouders.com/hpws/sprites.php
It is about as fast as the image map example: 342 milliseconds versus 354 milliseconds, respectively, but this difference is too small to be significant.
Whereas the images in an image map must be contiguous, CSS sprites don’t have that limitation.
More importantly, it is 57% faster than the alternative of using separate images.
In fact, the combined image tends to be smaller than the sum of the separate images as a result of reducing the amount of image overhead (color tables, formatting information, etc.).
If you use a lot of images in your pages for backgrounds, buttons, navbars, links, etc., CSS sprites are an elegant solution that results in clean markup, fewer images to deal with, and faster response times.
It’s possible to include images in your web page without any additional HTTP requests by using the data: URL scheme.
The data: URL scheme was first proposed in 1995. The specification (http://tools.ietf.org/html/rfc2397) says it “allows inclusion of small data items as ‘immediate’ data.” The data is in the URL itself following this format:
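Per RFC 2397, the format is:

```
data:[<mediatype>][;base64],<data>
```

For example, an image can be embedded directly in the markup (the base64 payload below is truncated for illustration):

```html
<img src="data:image/gif;base64,R0lGODlh..." alt="">
```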
Inline Images: http://stevesouders.com/hpws/inline-images.php
Because data: URLs are embedded in the page, they won’t be cached across different pages. You might not want to inline your company logo, because it would make every page grow by the encoded size of the logo.
A clever workaround is to place the inline image in an external stylesheet; because the CSS rule lives in the stylesheet, the data is cached along with it. In the following example, the background images used for each link in the navbar are implemented using inline images in an external stylesheet.
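Sketched as a stylesheet rule (the class name is hypothetical and the base64 payload is truncated for illustration):

```css
/* the image data is embedded in the stylesheet, so it is cached with it */
.navbar-home {
  background-image: url("data:image/gif;base64,R0lGODlh...");
}
```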
Inline CSS Images: http://stevesouders.com/hpws/inline-css-images.php
Comparing this example to the previous examples, we see that it has about the same response time as image maps and CSS sprites.
Putting the inline image in an external stylesheet adds an extra HTTP request, but has the additional benefit of being cached with the stylesheet.
The rules described in later chapters also present guidelines that help reduce the number of HTTP requests, but they focus primarily on subsequent page views.
For components that are not critical to the initial rendering of the page, the post-onload download technique described in Chapter 8 helps by postponing these HTTP requests until after the page is loaded.
The correct first step is found by recalling the Performance Golden Rule:
Only 10–20% of the end user response time is spent downloading the HTML document. The other 80–90% is spent downloading all the components in the page.
If the application web servers are closer to the user, the response time of one HTTP request is improved. On the other hand, if the component web servers are closer to the user, the response times of many HTTP requests are improved.
Rather than starting with the difficult task of redesigning your application in order to disperse the application web servers, it’s better to first disperse the component web servers. This not only achieves a bigger reduction in response times, it’s also easier thanks to content delivery networks.
A content delivery network (CDN) is a collection of web servers distributed across multiple locations to deliver content to users more efficiently.
When optimizing for performance, the server selected for delivering content to a specific user is based on a measure of network proximity. For example, the CDN may choose the server with the fewest network hops or the server with the quickest response time.
In addition to improved response times, CDNs bring other benefits. Their services include backups, extended storage capacity, and caching. A CDN can also help absorb spikes in traffic, for example, during a major shopping or news event.
One drawback to relying on a CDN is that your response times can be affected by traffic from other web sites, possibly even those of your competitors. A CDN service provider typically shares its web servers across all its clients.
Another drawback is the occasional inconvenience of not having direct control of the content servers. For example, modifying HTTP response headers must be done through the service provider rather than directly by your ops team.
Finally, if your CDN service provider’s performance degrades, so does yours.
CDNs are used to deliver static content, such as images, scripts, stylesheets, and Flash.
Serving dynamic HTML pages involves specialized hosting requirements: database connections, state management, authentication, hardware and OS optimizations, etc. These complexities are beyond what a CDN provides.
Static files, on the other hand, are easy to host and have few dependencies. That is why a CDN is easily leveraged to improve the response times for a geographically dispersed user population.
If you conduct your own response time tests to gauge the benefits of using a CDN, it’s important to keep in mind that the location from which you run your test has an impact on the results.
At Yahoo!, this factor threw us off for a while. Before switching Yahoo! Shopping to Akamai, our preliminary tests were run from a lab at Yahoo! headquarters, located near a Yahoo! data center. The response time improvements gained by switching to Akamai’s CDN—as measured from that lab—were less than 5% (not very impressive).
When we exposed the change to end users, there was an overall 20% reduction in response times on the Yahoo! Shopping site, just from moving all the static components to a CDN.
Fast response time is not your only consideration when designing web pages.
If it were, then we’d all take Rule 1 to an extreme and place no images, scripts, or stylesheets in our pages.
However, we all understand that images, scripts, and stylesheets can enhance the user experience, even if it means that the page will take longer to load.
Rule 3, described in this chapter, shows how you can improve page performance by making sure these components are configured to maximize the browser’s caching capabilities.