News

Net filtration 101 - and why it can't protect Australia

Richard Chirgwin

I have stayed away from the politics of Internet filtering because it's such a fraught subject. But just as individual users need to understand filtering, so do SMEs and network owners, because the impact of filtering goes beyond the headline claim that it will slow down your broadband access speeds.

The problem in discussing filtering is that there are many approaches – too many to discuss in detail here. So please consider this a primer for the business, rather than an in-depth technology piece.

The government is trying to confront a problem that has been with us since before the invention of the World Wide Web: given an open communications medium, some people will abuse that medium. That, at least, is agreed, even if opinions of the extent and danger of that abuse may differ.

The distribution of pornography is just one such abuse. As we can see in the news each day, there are many others: the operation of “botnets” and other pieces of hostile software that aim to harvest user data for later identity-based fraud; online bullying; even selfish applications that impact other applications.

But it's the pornography – and particularly that which relates to children, or that which may (deliberately or otherwise) be viewed by children – that the filter proposals attempt to address.

To understand the basic approaches taken by different filters, we first need a quick refresher covering how your Internet browser – whether Internet Explorer, Firefox, Safari or some other browser – fetches content from a Web server.

All Internet communications are between devices that have IP addresses. Since these aren't particularly human-friendly, the Web browser saves you from typing IP addresses to fetch pages (although anyone who has set up a home ADSL modem will recognise one of the common private addresses, like 192.168.0.1). So the Internet has a second piece of infrastructure called the Domain Name System, or DNS, which relates human-readable names (such as www.searchnetworking.com.au) to the IP addresses that host them.

When you start browsing, your browser first queries the DNS to get the IP address it wants (for example, the address of Google). It's worth remembering – because this is relevant to filtering – that the DNS is not a single “thing”. Rather, it's a hugely distributed database. A DNS server within a company may keep a relatively small cache of often-used addresses, and there are many other DNS servers in a hierarchy all the way up to the so-called “root” servers. The root servers don't hold every address themselves; rather, they know which servers are authoritative for each top-level domain, so a query that nobody lower down can answer gets steered towards a server that can. Generally, things work best if answers are cached as widely as possible, so that queries rarely have to climb the hierarchy and the servers at the top don't become a speed bottleneck.

After the browser has found the IP address of the Web server you're looking for, it contacts the Web server, asks for the home page – and the rest you're probably familiar with!
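For the curious, here's what those two steps look like in a few lines of Python, using only the standard library and the example name from above (any reachable site would do):

```python
import socket
from urllib.request import urlopen

# Step 1: the DNS lookup -- turn the hostname into an IP address.
hostname = "www.searchnetworking.com.au"
ip_address = socket.gethostbyname(hostname)
print(f"{hostname} resolves to {ip_address}")

# Step 2: contact the Web server at that address and ask for the home page.
# (urlopen repeats the lookup internally, then speaks HTTP to the server.)
response = urlopen(f"http://{hostname}/")
print(f"Fetched {len(response.read())} bytes of home page")
```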

At a simplified level, then, we have three places where a filter could potentially manage user traffic:

  • The URL;
  • The IP address; and
  • The content.

Let's take a closer look at each of these, in the context of mandated national filtering.

URL Filtering

This looks simple. If a user is trying to load the Web page www.fubar.com.au/home.html, and that Web page is included on a filter's blacklist, then the user will get a message saying that the page is blocked.

If it were that simple, then filtering might not slow down the users' browsing too much. After all, once the filter has said “that Web page is allowed”, it doesn't have any more to do with the download. If the Web page is not allowed, the ISP might simply send a “Blocked” page to the user instead of the Web page as requested.
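At its heart, that exact-match check is trivial. Here's a minimal sketch in Python – the blacklist entry and block page are invented for illustration:

```python
# A hypothetical exact-match blacklist, as a filter operator might hold one.
BLACKLIST = {
    "http://www.fubar.com.au/banned_page.html",
}

BLOCK_PAGE = "<html><body><h1>This page has been blocked.</h1></body></html>"

def filter_request(url):
    """Return a block page for banned URLs, or None to pass the request through."""
    if url in BLACKLIST:
        return BLOCK_PAGE
    return None  # the ISP fetches and returns the real page as normal

print(filter_request("http://www.fubar.com.au/home.html"))         # None: allowed
print(filter_request("http://www.fubar.com.au/banned_page.html"))  # the block page
```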

The problem is that most Web pages – and certainly those of popular content sites like national newspapers and multimedia – aren't a single URL and haven't been for quite some time. If www.fubar.com.au happened to be the home page of a major news publisher, then the “page” would obtain its content from a host of URLs. For example, there might be:

  • The “Fubar News” home page;
  • Images fed into the home page by a content management system, each with a different URL (so they can be streamed into any page the publisher desires);
  • Aggregated feeds from other content servers, each arriving in their own panels on the home page;
  • Advertisements streamed from Google, Sensis, Doubleclick and others.

In a complex page like this, content may be obtained from dozens of URLs, each of which has to be okayed by the filter before it downloads – and that will slow down the download of the page as a whole, even though the user probably thinks of www.fubar.com.au as a single URL.

Users have probably already experienced this in today's unfiltered browsing: if there's a problem with an advertising server, it can dramatically slow down the display of a newspaper's Web site.
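To get a feel for how one “page” multiplies the filter's workload, here's a small Python sketch that pulls out the resource URLs a browser would fetch while rendering a toy home page – every address in it is invented:

```python
from html.parser import HTMLParser

class ResourceCollector(HTMLParser):
    """Collect the URLs a browser would fetch while rendering a page."""
    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script", "iframe") and "src" in attrs:
            self.urls.append(attrs["src"])
        if tag == "link" and "href" in attrs:
            self.urls.append(attrs["href"])

# A toy "news home page": every src/href below is a separate filter check.
HOME_PAGE = """
<html><head><link href="http://cdn.fubar.com.au/style.css"></head>
<body>
  <img src="http://images.fubar.com.au/lead_story.jpg">
  <script src="http://ads.example.com/banner.js"></script>
  <iframe src="http://feeds.example.com/sport_panel.html"></iframe>
</body></html>
"""

collector = ResourceCollector()
collector.feed(HOME_PAGE)
print(f"One 'page', {1 + len(collector.urls)} URLs for the filter to check:")
for url in collector.urls:
    print(" ", url)
```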

The other problem with URL filtering is granularity.

You could, for example, assume that because someone noticed prohibited content on www.fubar.com.au, everything from that domain should be banned. That doesn't work, however, if you're talking about a blog site: just because one user misused his or her blog doesn't mean every blog hosted there should be banned.

One response to this might be to make the filter much more specific: if you ban blogs.fubar.com.au/evil_user_blog/illegal_images.html then you take out the “evil user's” images and nobody else's. However, the “evil user” need only rename the offending page to bypass the filter.

Moreover, content management systems make the business of changing URLs so smooth as to be almost automatic.
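A toy demonstration of the bypass, again with invented URLs – the blacklist matches the exact address, so a simple rename defeats it:

```python
BLACKLIST = {"http://blogs.fubar.com.au/evil_user_blog/illegal_images.html"}

def is_blocked(url):
    return url in BLACKLIST

# The precise entry catches the offending page...
print(is_blocked("http://blogs.fubar.com.au/evil_user_blog/illegal_images.html"))  # True
# ...until the "evil user" renames it: same content, new URL, no match.
print(is_blocked("http://blogs.fubar.com.au/evil_user_blog/holiday_photos.html"))  # False
```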

Filtering IP Addresses

Instead of filtering the URL, what about the IP address?

Here, the filtering is easiest to implement at the DNS lookup stage: a user types a URL into their browser, the browser asks the DNS for the server's address, but because the server is on the banned list, the lookup fails and the user gets a “host not found” error. (Strictly, this is a DNS failure, not an HTTP error like 404 – the browser never reaches a Web server at all.)
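A sketch of the idea in Python – the banned-host list is invented, and a real filter would sit inside the ISP's resolver rather than in application code:

```python
import socket

# A hypothetical banned-host list consulted by a filtering DNS resolver.
BANNED_HOSTS = {"www.fubar.com.au"}

def filtered_lookup(hostname):
    """Resolve a hostname, unless it appears on the banned list."""
    if hostname in BANNED_HOSTS:
        # Answer as though the name doesn't exist, so the browser
        # shows a "host not found" error rather than the page.
        raise socket.gaierror("host not found (blocked by filter)")
    return socket.gethostbyname(hostname)
```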

This should, in theory, reduce the impact on users, because we already wait for DNS lookups in our usual Web browsing. If the name isn't blocked, we might expect the rest of the browsing to happen as normal.

Except ...

DNS-based filtering suffers from the same granularity problems as URL-based filtering. If the filter blocks a server's IP address, and that server is (for example) a server hosting thousands of users' Web pages, then everybody is blocked, not just the offending content.

Moreover, for the technically literate, the relationship between a Web page and its IP address need not be permanent – a page that's unobtainable today may simply be re-hosted at another IP address tomorrow.

There is, however, another underlying issue with DNS-based filtering.

Addressing is fundamental to the Internet. Perhaps foolishly, we trust the Internet's infrastructure not to interfere with addressing.

Old-timers might remember how irritating a telephone with “crossed wires” could be: they dialled one number and got another – an addressing problem. On the Internet, we have to trust addresses (even though they're only weakly trustworthy). And that trust is assumed – most people aren't even aware they're trusting the addressing system when they visit the login page of their Internet banking server.

The degree of that trust worries security experts, who are working on a more trustworthy domain name system (the DNSSEC effort is the best-known example) in which the owner of a name can prove that the answers handed out for it are genuine.

Filtering undermines that trust, because instead of being answered by a neutral piece of Internet infrastructure, lookups are answered according to the behaviour and rules of the filter. As a result, the filter becomes a single point of failure – not just for users whose lookups are blocked, but for every user behind the filter.

Filter the Content?

The third approach mentioned above is to look at the content itself: does the data inside the user's packets “look like” banned images?

The biggest problems with this approach are scale and privacy. To apply filtering to all the user content demands what's called “deep packet inspection”: like a wartime mail censor, the filter looks at the data inside every IP packet to see whether it contains nasties.
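In outline, the idea looks something like this Python sketch – the signature list is invented, and real systems use far more sophisticated matching:

```python
# Hypothetical byte signatures the filter hunts for inside packet payloads.
SIGNATURES = [b"example-banned-marker"]

def inspect_packet(payload):
    """Return True if the payload matches any banned signature."""
    return any(signature in payload for signature in SIGNATURES)

# Every packet on the wire would pass through a check like this one --
# the source of both the privacy objection and the scale problem below.
packet = b"GET /home.html HTTP/1.1\r\nHost: www.fubar.com.au\r\n\r\n"
print(inspect_packet(packet))  # False: this packet is passed through
```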

Apart from the obvious civil liberties question – should a government, except in wartime, examine all of its citizens' communications? – there is the problem of scale. To build a machine able to examine every packet in every communication by every citizen would be very expensive. Each of Australia's Internet users downloads, on average, around 20 gigabytes per month, according to the ABS – and each of those packets would have to be examined on its way through. That really would create a huge World Wide Wait for everybody.
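A back-of-envelope calculation shows the scale. Only the 20-gigabyte figure comes from the ABS; the user count and average packet size below are round-number assumptions:

```python
# Back-of-envelope: how many packets would a national DPI box inspect?
users = 8_000_000            # assumed number of Australian Internet users
gb_per_user_per_month = 20   # ABS average cited above
avg_packet_bytes = 1_000     # assumed average packet size

bytes_per_month = users * gb_per_user_per_month * 10**9
packets_per_month = bytes_per_month / avg_packet_bytes
seconds_per_month = 30 * 24 * 3600

print(f"{packets_per_month / seconds_per_month:,.0f} packets per second")
# Roughly 60 million packets per second, around the clock.
```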

And, of course, deep packet inspection relies on the algorithms developed by filter companies, which may or may not make the right decisions about what to block and what to pass.

So in short, these approaches are suitable for corporate environments, where users are “borrowing” someone else's network and are expected to obey company policy, and where blocking errors can be managed by the company's systems administrators. On a national scale, however, such filtering simply would not work.