Content Filter

Planning
In order to build a successful content filter, some questions need answering. How many users will be filtered? What will be filtered? How will it be filtered?

DansGuardian is a free, open source content filter that works with a proxy such as Squid and makes content filtering easy. The DansGuardian wiki has a good overview of filter architecture.

There are also many alternatives to DansGuardian that enable content filtering for the Squid proxy; Squidblacklist.org is one such commercial service.

Single Machines And Small Networks
For single machines and small networks I find that Tinyproxy and DansGuardian work best.

Tinyproxy is a very lightweight and easy to configure proxy server. DansGuardian does all the filtering work but is unable to fetch pages by itself; this is where Tinyproxy comes in.

A user opens their web browser and requests a web page. This request typically travels over port 80, so we'll redirect port 80 to a port that DansGuardian is listening on. DansGuardian then forwards the request to Tinyproxy, which actually goes out and fetches the web page.

Installing and Configuring Tinyproxy
By default a few USE flags are set for Tinyproxy. Most are not needed for this guide, but they don't hurt either. However, be sure that the transparent-proxy USE flag is not set; it is not compatible with this guide.

First, install Tinyproxy:
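On a Gentoo system with Portage this would typically look like the following. The package atom net-proxy/tinyproxy is an assumption; check that it matches your tree.

```shell
# Run as root; --ask shows what will be merged before anything is installed
emerge --ask net-proxy/tinyproxy
```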

Next, back up the default configuration in case things go astray:
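A minimal sketch of a backup helper. The function name backup_conf and the .orig suffix are arbitrary choices for illustration, not something the packages require.

```shell
# Copy a config file to <name>.orig, preserving mode and timestamps.
backup_conf() {
    cp -p -- "$1" "$1.orig"
}

# On the target machine, as root:
#   backup_conf /etc/tinyproxy.conf
```

The same helper can be reused for the DansGuardian and Squid configuration files later in this guide.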

Next, open /etc/tinyproxy.conf in your favorite text editor; you will need root privileges. Make sure it looks something like this:
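The excerpt below is a sketch, not the stock file. The port (3128) is chosen to match the proxy port used later in this guide, and the tinyproxy user and group names are assumptions about how the package was installed.

```conf
# /etc/tinyproxy.conf (excerpt; values are assumptions for this sketch)
User tinyproxy
Group tinyproxy

# Listen on loopback only; DansGuardian forwards requests here
Port 3128
Listen 127.0.0.1

# Only accept connections from the local machine
Allow 127.0.0.1

Timeout 600
MaxClients 100
LogLevel Info
```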

Start Tinyproxy and add it to the default runlevel so that it also starts at boot:
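Assuming OpenRC is the init system and the service is named tinyproxy, starting it now and registering it for boot might look like this:

```shell
# Start the service immediately
rc-service tinyproxy start
# Add it to the default runlevel so it starts at boot
rc-update add tinyproxy default
```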

Installing and Configuring DansGuardian
One thing you need to understand about DansGuardian is that it supports multiple filter groups; if you are interested, you can read more about them in the large network section below. For a single machine or small network, we'll set up a single filter group that everyone will be a member of.

First, install DansGuardian:
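As with Tinyproxy, this is a sketch that assumes the package is available in your tree as net-proxy/dansguardian:

```shell
# Run as root
emerge --ask net-proxy/dansguardian
```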

You may want or need to enable the pcre USE flag, which enables Perl Compatible Regular Expressions.

Next, back up the default configuration (/etc/dansguardian/dansguardian.conf), just to be safe.

Now start configuring the global settings. For now, change only the parameters shown here and leave everything else alone; it's a big file:
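A sketch of the global settings that matter for this setup. The addresses and ports are assumptions chosen to match the rest of this guide (DansGuardian on 127.0.0.1:8080, the proxy on 127.0.0.1:3128).

```conf
# /etc/dansguardian/dansguardian.conf (excerpt)

# Address and port DansGuardian itself listens on
filterip = 127.0.0.1
filterport = 8080

# Upstream proxy that actually fetches the pages (Tinyproxy in this setup)
proxyip = 127.0.0.1
proxyport = 3128

# A single filter group that everyone belongs to
filtergroups = 1
```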

Now that the global configuration has been taken care of, let's configure the per-group settings. If there is a conflict between per-group and global settings, per-group settings will always win; keep that in mind while configuring and troubleshooting.

Again, make a backup of the defaults (/etc/dansguardian/dansguardianf1.conf), just in case.

Now configure the first filter group. Again, this is a big file with a lot going on; if a parameter isn't mentioned in this outline, it is usually OK to leave it at its default:
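A sketch of the per-group parameters most likely to need attention. The limit of 50 is only an example value; the list paths assume the stock layout under /etc/dansguardian/lists.

```conf
# /etc/dansguardian/dansguardianf1.conf (excerpt)

# 0 = banned, 1 = filtered, 2 = unfiltered
groupmode = 1

# Lower is stricter; pick a value that suits your users
naughtynesslimit = 50

# Lists consulted by this group
bannedsitelist = '/etc/dansguardian/lists/bannedsitelist'
exceptionsitelist = '/etc/dansguardian/lists/exceptionsitelist'
contentregexlist = '/etc/dansguardian/lists/contentregexlist'
```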

Under /etc/dansguardian/lists is a set of files that control the mechanics of filtering. There is a lot going on here and the defaults are usually OK, but a few files are worth noting individually:
 * bannedsitelist - contains a section for explicitly listing sites to be blocked.
 * exceptionsitelist - similar to bannedsitelist, except that it allows sites rather than blocking them.
 * contentregexlist - a rather hairy file, especially if you have never worked with regular expressions before. Enabling too many of its options often causes over-filtering and performance problems.

Unfortunately, the only official documentation on how to use these lists is the comments in the list files themselves.

Start DansGuardian and add it to the default runlevel so that it also starts at boot:
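Assuming OpenRC and a service named dansguardian, starting it now and registering it for boot might look like this:

```shell
# Start the filter immediately
rc-service dansguardian start
# Add it to the default runlevel so it starts at boot
rc-update add dansguardian default
```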

Large Networks
Larger networks pose a few challenges to administrators, the most prevalent being the sheer size of the network. Average enterprise class networks have thousands of users and thousands of machines, with many subgroups requiring different levels of service, and on top of that comes the new wave of Bring Your Own Device (BYOD) networks. The big question is how to handle all this traffic without making compromises in what is filtered.

DansGuardian answers this question with filter groups. With filter groups you'll need some external method of authentication so that you can apply different levels of authorization with filter groups, which is unfortunately outside of the scope of this guide. In this guide we'll look at some of the ideas and concepts behind authentication with DansGuardian but the implementation details will be left up to you or another guide.

As you read this guide pay special attention to the default group information located under the Filter Group section. It is a powerful tool that catches everyone who hasn't authenticated.

Installing and Configuring Squid
Make sure that the tproxy USE flag is not set.

You may wish to consider setting these USE flags:
 * logrotate - if you are already using logrotate to clean up old logs, you can include Squid's logs in the process via USE flag
 * kerberos, ldap, nis, radius, samba, and sasl - these flags make integrating Squid with existing authentication schemes a little easier, not all are required

To install Squid, execute this command:
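A sketch assuming the package atom is net-proxy/squid:

```shell
# Run as root
emerge --ask net-proxy/squid
```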

The Squid config is at least as big and scary as the DansGuardian config. Each network has its own set of needs, and generalizing all of those into one perfect configuration is impossible. Instead, this configuration is meant to give you the most basic setup to get you running while explaining some core Squid configuration topics. Fine-tuning the Squid configuration to suit a specific requirement is up to the individual administrator. For a much closer look at all the configuration parameters that Squid accepts, have a look at the official documentation; for example configs, which are probably more helpful than raw configuration parameters, check out the Squid wiki.

As per usual, let's back up the default configuration (/etc/squid/squid.conf), lest we find ourselves in need of it.

Keep in mind, this is a very minimal config to get you started:
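The following is a sketch rather than a recommended production config. It assumes Squid 3.2 or later, where the localhost ACL is built in, and a hypothetical host name; authentication and caching options are left out entirely.

```conf
# /etc/squid/squid.conf (minimal sketch)

# Listen on loopback only; DansGuardian is the only intended client
http_port 127.0.0.1:3128

# Allow the local machine and refuse everyone else
http_access allow localhost
http_access deny all

# Hypothetical name shown on Squid's error pages
visible_hostname filter.example.lan
```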

You can check your configuration file for syntax errors by running:
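Squid can parse its own configuration without starting the service. Run this as root, or as a user that can read the config:

```shell
# Parse /etc/squid/squid.conf and report warnings and errors
squid -k parse
```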

Anything it marks as a warning should be looked into but is not critical to getting Squid to run. Anything marked as an error needs to be fixed before Squid will even start.

Add Squid to the default runlevel and then start it:
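Assuming OpenRC and a service named squid, registering it for boot and starting it might look like this:

```shell
# Add Squid to the default runlevel so it starts at boot
rc-update add squid default
# Start it immediately
rc-service squid start
```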

Tweaking DansGuardian For Heavier Use
The basic global configuration for DansGuardian is pretty much the same as in the previous small networks example. So make sure you read over that section first and then come back here.

Overview of Large Network Filtering
This is essentially a recap of what was presented in the single machine/small network section, with extra content. The real meat and potatoes of enterprise class, per-user filtering is authentication methods and filter groups. Basically, you subdivide your users into groups that need different levels of filtering; users authenticate (either in the background or directly) and are placed into the appropriate filter group. For example, a very basic and minimal setup for a K-12 school might have a filter group for staff, another for students, and yet another for technical staff who might need access to technical forums and documentation that regular users shouldn't have. Finally, there is a default group. The default group is a powerful tool: a catch-all that handles unauthenticated users.

Authentication and Authorization
The DansGuardian wiki makes a distinction between authentication and authorization. The long and the short of it is that authentication is verifying who someone is, and authorization is checking that a given person is allowed to perform some action.

If you'd rather not read the wiki, there is one important gotcha you need to be aware of: DansGuardian itself only cares about a username. The password could be incorrect and DansGuardian wouldn't care. That means that to effectively authenticate users, you need to offload the task to another process, such as Squid.

There are three general categories of authentication processes supported by both DansGuardian and Squid:
 * BASIC - sends credentials in plain text, very insecure (not recommended)
 * Digest - hashes credentials before sending them over the network
 * NTLM - New Technology LAN Manager; a proprietary protocol that also hashes credentials, really only useful if you have a Microsoft network

To use Squid as your authentication process, check out some of their examples and pick one that matches the architecture of your existing network.

Now that you have decided how to authenticate your users, you'll need to uncomment one of these lines in your global config:
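In a stock dansguardian.conf the relevant lines look roughly like the sketch below. The plugin paths assume the default layout under /etc/dansguardian/authplugins, and Digest is enabled here purely as an example.

```conf
# /etc/dansguardian/dansguardian.conf (excerpt)
# Uncomment the plugin that matches the scheme your proxy uses
#authplugin = '/etc/dansguardian/authplugins/proxy-basic.conf'
authplugin = '/etc/dansguardian/authplugins/proxy-digest.conf'
#authplugin = '/etc/dansguardian/authplugins/proxy-ntlm.conf'
```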

Filter Groups
Before you can have multiple filter groups, you'll need to let DansGuardian know how many you'll have by changing this parameter to however many you want. In this example there will be 3:
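A sketch of the relevant global settings. The filtergroupslist path assumes the stock layout.

```conf
# /etc/dansguardian/dansguardian.conf (excerpt)

# Number of filter groups, including the default group (group 1)
filtergroups = 3

# File that maps authenticated usernames to filter groups
filtergroupslist = '/etc/dansguardian/lists/filtergroupslist'
```

Entries in the group list file take the form username=filterN, so a hypothetical user alice assigned to group 2 would be listed as alice=filter2. Users who are not listed stay in the default group.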

Each filter group will have its own configuration file named /etc/dansguardian/dansguardianfN.conf, where N is the number assigned to the group. You'll need at least a first group, dansguardianf1.conf, which is also the default group out of the box. You should probably make a copy of it to act as a template for further groups:

Also copy the lists directory:
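A minimal sketch that makes both copies in one step. The function name and the ".template" file and directory names are hypothetical; any names that do not clash with dansguardianfN.conf will do.

```shell
# Copy group 1's config and the lists directory so that later filter groups
# can start from pristine defaults. Run once, before editing group 1.
make_group_template() {
    dg_dir=$1
    cp -p "$dg_dir/dansguardianf1.conf" "$dg_dir/dansguardianf.conf.template"
    cp -Rp "$dg_dir/lists" "$dg_dir/lists.template"
}

# As root on the target machine:
#   make_group_template /etc/dansguardian
```

New groups can then be created by copying the template to dansguardianf2.conf, dansguardianf3.conf and so on, and pointing each group at its own copy of the lists.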

Now that you have a filter group template, open the first group's configuration with your favorite text editor. Keep in mind that what follows only touches on some of the basic configuration options and is not an all-inclusive list:
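A sketch of the options that distinguish one group from another. The group name and limit are example values; because group 1 catches everyone who has not authenticated, it usually makes sense to keep it strict.

```conf
# /etc/dansguardian/dansguardianf1.conf (excerpt)

# 0 = banned, 1 = filtered, 2 = unfiltered
groupmode = 1

# Label for this group (example value)
groupname = 'Default'

# Strictest limit of all groups, since unauthenticated users land here
naughtynesslimit = 50
```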

Transparent Filtering With IPTables
So you've decided you'd like to set up your machine to transparently filter web content. The most obvious solution for a single computer is to use IPTables to redirect traffic. If you don't already have experience with IPTables, have a look at the Gentoo Wiki on the subject before proceeding.

Let's recap: our proxy is listening on address 127.0.0.1, port 3128, and DansGuardian is listening on address 127.0.0.1, port 8080. We only need to point our web traffic at DansGuardian, which is configured to route traffic through Squid and back. It helps to know that standard HTTP travels over port 80.

Next make sure your kernel is configured with the proper netfilter options:
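The symbols below are a sketch of what the redirect and owner-match rules in this section rely on. Exact names and menu locations vary between kernel versions (the NAT table support in particular has been renamed over time), and any of these may be built as modules (=m) instead.

```conf
# Kernel .config excerpt (symbol names vary between kernel versions)
CONFIG_NETFILTER=y
CONFIG_NF_CONNTRACK=y
CONFIG_NF_NAT=y
CONFIG_IP_NF_IPTABLES=y
CONFIG_IP_NF_NAT=y
# "owner" match, used to exempt the proxy's own traffic
CONFIG_NETFILTER_XT_MATCH_OWNER=y
# REDIRECT target, used to send port 80 to DansGuardian
CONFIG_NETFILTER_XT_TARGET_REDIRECT=y
```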

We'll also need to find the UID of the user your proxy runs as, which should be separate from both the DansGuardian user and any human users.
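A small sketch of the lookup, equivalent to grepping /etc/passwd and reading the third field. The function name uid_of and the account name squid are assumptions; substitute whatever account your proxy actually runs as.

```shell
# Print the numeric UID of an account from a passwd-format file.
# The second argument is optional and defaults to /etc/passwd.
uid_of() {
    awk -F: -v user="$1" '$1 == user { print $3 }' "${2:-/etc/passwd}"
}

# On the target machine:
#   uid_of squid
```

On most systems `id -u squid` gives the same answer.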

Finally, append something like this to your IPTables rule list (put the number you got from the previous grep of /etc/passwd in place of $SQUID):
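A minimal sketch of the two rules, wrapped in a hypothetical helper that only prints them so they can be reviewed first. The UID passed as the argument takes the place of $SQUID, and port 8080 is assumed to be where DansGuardian listens.

```shell
# Print the NAT rules for review; pipe the output to sh (as root) to apply.
print_redirect_rules() {
    proxy_uid=$1
    # Let the proxy's own outbound requests through untouched, otherwise
    # they would be redirected back into DansGuardian in a loop.
    echo "iptables -t nat -A OUTPUT -p tcp --dport 80 -m owner --uid-owner $proxy_uid -j ACCEPT"
    # Send every other locally generated HTTP request to DansGuardian.
    echo "iptables -t nat -A OUTPUT -p tcp --dport 80 -j REDIRECT --to-ports 8080"
}

# Example, with 31 standing in for the proxy's UID:
#   print_redirect_rules 31 | sh
```

Order matters: the ACCEPT rule for the proxy user has to come before the REDIRECT rule.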

Once you are sure everything is working as it should, save the iptables state with this command:
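On Gentoo with OpenRC, the iptables init script can usually save the current rules itself. This sketch assumes that script is installed; adding it to the default runlevel restores the saved rules at boot.

```shell
# Save the currently loaded rules
rc-service iptables save
# Restore them automatically at boot
rc-update add iptables default
```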