1997 White Paper

Copyright September 1997

By J.D. Koftinoff Software, Ltd. and Turner and Sons Productions, Inc.

The IF Kernel Overview

The IF Kernel System is written in C++ and is implemented as a Dynamic Link Library for Microsoft Windows 95, Windows NT 4.0 Workstation, and Windows NT 4.0 Server.

The IF Kernel manages the interception of all TCP/IP network system calls by all applications. It dynamically calls other modules to notify them before and after each TCP/IP network system call is made. Each of these modules has the opportunity to modify the parameters and return values for each system call.
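
The before-and-after notification scheme can be pictured with a small sketch. Everything named here is hypothetical and not taken from the IF Kernel itself: the `Module` interface, the `ConnectCall` structure, the `DispatchConnect` function, and the `DenyNntp` module are illustrations only, and a plain callable stands in for the real system call.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical description of one intercepted call (not the real IF Kernel type).
struct ConnectCall {
    std::string host;
    int port;
};

// Hypothetical module interface: one hook before the call, one after it.
struct Module {
    virtual ~Module() = default;
    virtual void BeforeConnect(ConnectCall&) {}
    virtual void AfterConnect(const ConnectCall&, int&) {}
};

// Notify every module, make the real call, then notify every module again.
int DispatchConnect(const std::vector<Module*>& modules, ConnectCall call,
                    const std::function<int(const ConnectCall&)>& real_connect) {
    for (Module* m : modules) m->BeforeConnect(call);         // may rewrite parameters
    int result = real_connect(call);
    for (Module* m : modules) m->AfterConnect(call, result);  // may rewrite the result
    return result;
}

// Example module: reports failure for NNTP connections by overriding the result.
struct DenyNntp : Module {
    void AfterConnect(const ConnectCall& call, int& result) override {
        if (call.port == 119) result = -1;
    }
};
```

A caller would build a list of module pointers once and route each intercepted call through `DispatchConnect`, so that no module needs to know about the others.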

The IF Kernel manages a separate execution thread which it uses to call a Poll function in each module. The Poll function in each module can perform additional TCP/IP work, such as background e-mail logging, that will not be intercepted by the IF Kernel.

The IF-Researcher Module

The IF-Researcher Module is used to research and log users' browsing habits.

The IF-Researcher Module keeps track of all TCP/IP socket handles opened by an application, all TCP/IP host name lookup requests, all TCP/IP connections made, and all data transmitted and received on each socket.

IF-Researcher does not block or censor any data or connections. It logs all network accesses to a local disk file, an e-mail address, or a special network server via a direct socket connection.

The log file is sent as pure ASCII and its format is completely configurable to allow easy importing into any database application.

IF-Researcher understands HTTP (web browsing), NNTP (newsgroup browsing), SMTP (mail sending), POP3 (mail receiving) and FTP (file transfer) protocols. When a connection is made to a server on a specific port, IF-Researcher begins parsing the appropriate protocol on all the sent and received data on the socket.

IF-Researcher can log all HTTP GET requests, HTTP PUT requests, NNTP newsgroup overview accesses, NNTP message accesses, NNTP posting accesses, SMTP e-mail headers of transmitted e-mail, POP3 e-mail headers of received e-mail, and FTP file transfers.

The IF-Filter Lite Module

The IF-Filter Lite Module is a simple URL, hostname, newsgroup, and phrase blocking engine. It does not provide logging or the IF-Only features. The user can list specific locations on the net that should not be allowed.

The IF-Only Module

The IF-Only Module is the reverse of the IF-Filter Lite Module, and is useful for public Internet Kiosks where only specific hostnames and URLs should be made available to the end user.

The system blocks all hostnames, HTTP URLs and Newsgroups that are not already specified by the administrator.

The IF-Filter Module

All available IF Modules are a subset of the IF-Filter module. The IF-Filter module has all the capabilities of IF-Researcher, IF-Filter Lite, and IF-Only, as well as full generic phrase blocking, all in one module.

In addition, there are two logger modules available with IF-Filter. Each logger is identical and can be set to log different types of accesses and violations to different destinations, with different detail levels and logging format.

Each logger can be set individually to log to a local or remote file, an external e-mail SMTP server, or to a specialized internet server via a simple socket connection.

IF-Filter Phrase Overview

The Internet Filter can scan for multiple phrases at the same time. All data packets incoming and outgoing are searched. When The Internet Filter searches a data packet for a phrase, it ignores case, all white space and most punctuation.

You may specify each phrase explicitly, or you may specify a group of similar phrases.

Guidelines for phrases

When you are adding phrases to search for, you will want to make sure the phrases are not too short. If you specified the single word ‘[SEX]’ as a phrase to censor, then it would censor the following as well:

  friendS EX-husband 
  the esSEX Corporation 
  etc.

Therefore you must specify a long phrase that is specific enough. To help you specify many similar long phrases, you may specify a phrase with optional words.
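
The false positives listed above follow directly from the way text is normalized before matching. The sketch below is an approximation under stated assumptions: the function names `Normalize` and `ContainsPhrase` are hypothetical, and it drops every character that is not a letter or digit, whereas the real filter ignores only "most" punctuation.

```cpp
#include <cctype>
#include <string>

// Lowercase the text and drop everything that is not a letter or digit.
// Assumption: the real filter keeps some punctuation; this sketch drops all of it.
std::string Normalize(const std::string& text) {
    std::string out;
    for (unsigned char c : text) {
        if (std::isalnum(c)) out.push_back(static_cast<char>(std::tolower(c)));
    }
    return out;
}

// True if the normalized phrase occurs anywhere in the normalized data.
bool ContainsPhrase(const std::string& data, const std::string& phrase) {
    const std::string needle = Normalize(phrase);
    return !needle.empty() && Normalize(data).find(needle) != std::string::npos;
}
```

With this normalization, `ContainsPhrase("friendS EX-husband", "sex")` is true because the space and hyphen disappear before the search, which is why short phrases are risky.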

Phrase syntax

You may specify a single phrase to match. However, the characters ‘{’ ‘}’ ‘[’ ‘,’ and ‘]’ are special within phrases.

A phrase is always enclosed by either the square brackets [ ] or the curly brackets { }.

If a phrase is enclosed by the square brackets, then the phrase will be censored.

If a phrase is enclosed by the curly brackets, then the phrase will be censored and the socket connection that the phrase was seen on will be shut down.

You may specify a section of a phrase within the [ ] block, such as ‘[It is a][nice][day]’. The sections do not have to be single words.

You may specify alternative words or sections within the [ ] block by separating them with a comma. For example: ‘[It is a][nice,bad][day,night]’.

By putting a comma as the first character after the ‘[’ in a section, you are telling The Internet Filter that the section may be skipped.

Here are some examples of the phrase syntax:

[My][cat][has fleas]

This specification would match ‘my cat has fleas’. It would also match ‘MY CAT h a S FLEAS’, because case and white space are ignored.

[My][cat,dog][has fleas]

This specification would match ‘my cat has fleas’ and will also match ‘my dog has fleas’.

[It is a][nice,bad][,dark,bright][day,night]

This specification would match many phrases, including ‘It is a nice day’, ‘It is a nice bright day’ as well as ‘It is a bad dark day’.

IF-Filter Registry Settings

All available IF Modules are a subset of the IF-Filter module. The following registry settings are common to all IF Modules; however, the user interface of each application sets them differently.

Identification

Enables

TCP/IP Port Ignores/Denies

Logger A

Logger B

Email setup

Violation level control bits

Violation matching expansion hook

Pseudo User Name Hook

Individual phrase list file specification

Any phrase that is matched on any (non-ignored) data socket transmission or reception will be dispatched via the violation control bits to censor, log, or block any further data via that socket.

Any host name lookup request that matches any phrase specified in any BDM## file will cause the filter to block the host name lookup request.

Any host name lookup request that matches any phrase specified in any GDD## file will cause the filter to be completely disabled for connections made to that host.

Any NNTP newsgroup overview request that matches any phrase specified in any NEW## file will cause the filter to block the NNTP newsgroup overview request.

Any HTTP URL request that matches any phrase specified in any URL## file will cause the filter to block the HTTP request.

Any HTTP URL request that matches any phrase specified in any GURL## file will cause the filter to be completely disabled for that HTTP request.

Any NNTP newsgroup overview request that matches any phrase specified in any GNW## file will cause the filter to be completely disabled for that NNTP request.

Violation Levels

There are 8 violation levels. Each violation level can be set to perform a different action depending on the violation level control bits.

All phrases from the phrase, bad domain, good domain, bad newsgroup, bad URL, and good URL lists are merged into the single phrase scanning engine, each list with its own violation level offset.

If a phrase line specifies no violation level, it defaults to level 1.

Control Bits for violation levels

If a violation phrase is matched that has violation level control bit 4 enabled, the DLL specified by EX01 will have its “PhraseMatched” function called. It will be called for each phrase that is matched that has control bit 4 enabled.

The “PhraseMatched” function will look like this:

DWORD __stdcall PhraseMatched( 
        LPSTR buf, 
        LPCSTR normalized_match_string, 
        DWORD length_of_match,
        DWORD match_level,
        DWORD phrase_type
       );

Where:

PhraseMatched may take whatever action it wants when it is called, or it can simply return a number to specify an action. The return value has the same format as the ‘Violation Control Bits’, so if the routine returned 0x0f, for instance, the phrase would be censored, logged to both logger A and logger B, and the socket would be disconnected.
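
A minimal hook might look like the sketch below. It rests on several assumptions: the parameter meanings are guessed from their names, the policy (act only on levels 5 and above) is invented for illustration, a return value of 0 is assumed to mean "no action", and `HOOK_CALL` is a hypothetical macro that lets the same source compile outside Windows. Only the combined value 0x0f comes from the text; the individual bit positions are not assumed.

```cpp
#ifdef _WIN32
#include <windows.h>
#define HOOK_CALL __stdcall
#else
// Stand-ins so the sketch also compiles outside Windows.
#include <cstdint>
typedef std::uint32_t DWORD;
typedef char* LPSTR;
typedef const char* LPCSTR;
#define HOOK_CALL
#endif

// Combined action value quoted in the text: censor, log to A and B, disconnect.
const DWORD kCensorLogBothAndDisconnect = 0x0f;

// Hypothetical policy: take every action for levels 5 and above, none below.
// Assumption: returning 0 requests no action at all.
extern "C" DWORD HOOK_CALL PhraseMatched(
        LPSTR buf,
        LPCSTR normalized_match_string,
        DWORD length_of_match,
        DWORD match_level,
        DWORD phrase_type) {
    (void)buf;
    (void)normalized_match_string;
    (void)length_of_match;
    (void)phrase_type;
    return match_level >= 5 ? kCensorLogBothAndDisconnect : 0;
}
```

Keeping the policy inside the hook means the action for a phrase can depend on run-time state rather than only on the static control bits.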

User Name/Login Hook

EX02 contains the path name of a DLL file that will be run when the filter DLL is first loaded. The DLL function ‘GetFilterUserInfo’ will be called. Its job is to tell the filter what user is logged in currently, and which registry keys to use for the user specific settings. It can also tell the filter to not allow any TCP/IP activity for this user.

The “GetFilterUserInfo” function will look like this:

DWORD __stdcall GetFilterUserInfo( 
  LPSTR name_buf, 
  DWORD name_buf_len,
  PHKEY registry_location_base,
  LPSTR registry_location_string,
  DWORD registry_location_string_len
);

GetFilterUserInfo() should:

GetFilterUserInfo() must return 1 if the user is allowed to use TCP/IP at all. If it returns 0, the filter will block all TCP/IP accesses for this program/user combination. GetFilterUserInfo() can call the standard WIN32 function ‘GetCommandLine()’ to find out what program is trying to run.

If the User Name Hook DLL doesn’t exist, a default function will be run instead. It gets the current Windows user name (via GetUser()), and sets the registry location for the user specific settings to be in a key with the same name as the user, in the same registry location as the system registry settings. So if the user name is ‘Jeff’, it will look for a key named ‘Jeff’ in the same place all the other filter registry settings are.

Within that key can be any other registry key to override the normal system settings. So every user on a system can have completely different filter settings.
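
The default per-user lookup described above can be sketched without touching the real registry. In this illustration, in-memory maps stand in for registry keys, and the names `Settings`, `UserKeyPath`, and `Lookup` are hypothetical; the base key path used in the usage note is also made up.

```cpp
#include <map>
#include <string>

using Settings = std::map<std::string, std::string>;  // value name -> value data

// Path of the per-user key: a subkey named after the user under the system key.
std::string UserKeyPath(const std::string& system_key, const std::string& user) {
    return system_key + "\\" + user;
}

// A value in the user's key overrides the system-wide value of the same name.
std::string Lookup(const Settings& system, const Settings& user,
                   const std::string& name) {
    Settings::const_iterator it = user.find(name);
    if (it != user.end()) return it->second;
    it = system.find(name);
    return it != system.end() ? it->second : std::string();
}
```

For a user named ‘Jeff’ and an assumed base key of `Software\Example\Filter`, `UserKeyPath` gives `Software\Example\Filter\Jeff`, and any setting stored there wins over the system-wide one.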

Examples of Various Phrase Forms

PLAIN 7 bit Violation Level 1:

    My dog has fleas

BRACKETED 7 bit Violation Level 1:

    [My dog has fleas]

OPTIONAL SECTIONS 7 bit Violation Level 1:

    [My][dog,cat][has fleas]

CURLY BRACED 7 bit Violation Level 1:

(for easy disconnect flag without having to set violation level control bits)

    {My}{dog,cat}{has fleas}

BRACKETED 8 bit Violation Level 1:

(whitespace and punctuation are not ignored or skipped, 8 bit unicode characters can be used)

    [[My ]][[dog,,cat]][[ has fleas]]

CURLY BRACED 8 bit Violation Level 1:

(for 8 bit unicode phrases with easy disconnect flag without having to set violation level control bits)

    {{My }}{{dog,,cat}}{{ has fleas}}

BRACKETED OR BRACED WITH VIOLATION LEVEL SPECIFIED

(the first character on the line is a number 1 - 8, or 1 to 89) (the section in Quotes can be any previous form)

    5 "[[My ]][[dog,,cat]][[ has fleas]]"

COMMENTED OUT OR DISABLED PHRASE

    //5 "[[blah blah blah"

Notes for obscure phrase filter engine features:

Although there are a number of different phrase files for things such as phrases, newsgroups, good domains, bad domains, etc., all phrases get placed into the same filter engine. The filter engine distinguishes between them by adding an offset to the violation level depending on the type of phrase it is:

7 Bit Phrases                  00 - 09 
8 Bit Phrases                  10 - 19
Bad Domains                    20 - 29
Good Domains                   30 - 39
Bad URLS                       40 - 49
Good URLS                      50 - 59
Newsgroups                     60 - 69
7 Bit Phrases with disconnect  70 - 79
8 Bit Phrases with disconnect  80 - 89
Good Newsgroups                90 - 99

What this actually means is that a phrase in a ‘bad domains’ file list that looks like this:

    6 "www.microsoft.com"

is identical to an entry:

    26 "www.microsoft.com"

when specified in a plain phrase file.

So in fact you could put all phrases, domains, and so on into one ‘phrase’ file.
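
The offset arithmetic can be written down in a few lines. The offsets below are copied from the table above; the enum and function names are hypothetical and chosen only for this sketch.

```cpp
// Offsets taken from the table above; the names are illustrative.
enum PhraseType {
    kPhrase7Bit = 0,
    kPhrase8Bit = 10,
    kBadDomain = 20,
    kGoodDomain = 30,
    kBadUrl = 40,
    kGoodUrl = 50,
    kNewsgroup = 60,
    kPhrase7BitDisconnect = 70,
    kPhrase8BitDisconnect = 80,
    kGoodNewsgroup = 90
};

// Level as seen by the engine: the list's offset plus the level on the line.
int EngineLevel(PhraseType type, int line_level) {
    return static_cast<int>(type) + line_level;
}

// Recover the list type and the per-list level from an engine level in 0..99.
PhraseType TypeOf(int engine_level) {
    return static_cast<PhraseType>((engine_level / 10) * 10);
}

int LevelOf(int engine_level) {
    return engine_level % 10;
}
```

Here `EngineLevel(kBadDomain, 6)` evaluates to 26, matching the ‘www.microsoft.com’ example: level 6 in a bad domains file is the same entry as level 26 in a plain phrase file.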

Notes on specifying URLs for the phrase engine:

URLs in both BAD and GOOD URL listings should contain the optional port number. For example:

for http://www.microsoft.com/ie4 you would use

"[http://][,www.][microsoft.com][,:80][/ie4]"

Not:

"[http://www.microsoft.com/ie4]"

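The reason for the optional ‘www.’ and ‘:80’ sections can be checked with a small matcher. This sketch compares text literally, section by section, and does not apply the case, white space, or punctuation handling of the real engine; the names `Section`, `MatchFrom`, and `ExamplePattern` are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

using Section = std::vector<std::string>;  // alternatives; "" means "may be skipped"

// True if text, starting at pos, begins with one alternative from each
// remaining section in order. Tries alternatives recursively with backtracking.
bool MatchFrom(const std::vector<Section>& sections, std::size_t index,
               const std::string& text, std::size_t pos) {
    if (index == sections.size()) return true;
    for (const std::string& alt : sections[index]) {
        if (text.compare(pos, alt.size(), alt) == 0 &&
            MatchFrom(sections, index + 1, text, pos + alt.size())) {
            return true;
        }
    }
    return false;
}

// Sections of "[http://][,www.][microsoft.com][,:80][/ie4]" written out by hand.
std::vector<Section> ExamplePattern() {
    return {
        Section{"http://"}, Section{"", "www."}, Section{"microsoft.com"},
        Section{"", ":80"}, Section{"/ie4"},
    };
}
```

With the sectioned pattern, requests written with or without ‘www.’ and with or without ‘:80’ all match, while a single literal section for the whole URL misses the form that includes the port.
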
Logger Line Format String Notes:

The fields LA13 and LB13 specify the layout of each log line. The layout string is somewhat like a sprintf() format string with the following fields:

Plus you have all the standard fields available from the standard C function “strftime()”, such as:

ACCESS CLASS ID/STRINGS:

BLOCK STATUS STRINGS:

ACCESS INFO STRINGS:

For TCP/IP connecting:

For All data connections:

For HTTP access class:

For NNTP access class:

For SMTP:

For POP3:

DETAIL LEVELS