1997 White Paper

Copyright September 1997

By J.D. Koftinoff Software, Ltd. and Turner and Sons Productions, Inc.

The IF Kernel Overview

The IF Kernel System is written in C++ and is implemented as a Dynamic Link Library for Microsoft Windows 95, Windows NT 4.0 Workstation, and Windows NT 4.0 Server.

The IF Kernel manages the interception of all TCP/IP network system calls by all applications. It dynamically calls other modules to notify them before and after each TCP/IP network system call is made. Each of these modules has the opportunity to modify the parameters and return values for each system call.

The IF Kernel manages a separate execution thread which it uses to call a Poll function in each module. The Poll function in each module can perform additional TCP/IP work, such as background e-mail logging, which is not itself intercepted by the IF Kernel.
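
The notification scheme described above can be pictured roughly as follows. This is only a sketch under stated assumptions: the names SocketCall, FilterModule, CountingModule, dispatch and fake_send are invented for illustration and are not the actual IF Kernel interfaces, and the separate Poll thread is only hinted at by the poll() member.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified description of one intercepted TCP/IP call.
struct SocketCall {
    std::string name;   // which call this is, e.g. "send" (illustrative)
    std::string data;   // the call's parameters, reduced to a byte string
    int result;         // return value of the real call
};

// Hypothetical module interface; not the actual IF Kernel API.
class FilterModule {
public:
    virtual ~FilterModule() {}
    virtual void before_call(SocketCall& call) = 0;  // may change parameters
    virtual void after_call(SocketCall& call) = 0;   // may change the result
    virtual void poll() {}  // would be driven by the kernel's separate thread
};

// Example module: counts calls and successfully transferred bytes.
class CountingModule : public FilterModule {
public:
    int calls_seen;
    int bytes_seen;
    CountingModule() : calls_seen(0), bytes_seen(0) {}
    void before_call(SocketCall&) override { ++calls_seen; }
    void after_call(SocketCall& call) override {
        if (call.result > 0) bytes_seen += call.result;
    }
};

// Notify every module before and after the real call is made.
template <typename RealCall>
int dispatch(const std::vector<FilterModule*>& modules, SocketCall& call,
             RealCall real_call) {
    for (FilterModule* module : modules) module->before_call(call);
    call.result = real_call(call);
    for (FilterModule* module : modules) module->after_call(call);
    return call.result;
}

// Stand-in for the real network call: pretends every byte was sent.
int fake_send(SocketCall& call) { return static_cast<int>(call.data.size()); }

// Usage example.
void example_dispatch() {
    CountingModule counter;
    std::vector<FilterModule*> modules;
    modules.push_back(&counter);
    SocketCall call = {"send", "hello", 0};
    assert(dispatch(modules, call, fake_send) == 5);
    assert(counter.calls_seen == 1 && counter.bytes_seen == 5);
}
```

Passing the call by reference to both hooks is what would let a module rewrite parameters beforehand and the return value afterwards.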

The IF-Researcher Module

The IF-Researcher Module is used to research and log users' browsing habits.

The IF-Researcher Module keeps track of all TCP/IP socket handles opened by an application, all TCP/IP host name lookup requests, all TCP/IP connections made, and all data transmitted and received on each socket.

IF-Researcher does not block or censor any data or connections. It logs all network accesses to a local disk file, an e-mail address, or a special network server via a direct socket connection.

The log file is sent as pure ASCII and its format is completely configurable to allow easy importing into any database application.

IF-Researcher understands HTTP (web browsing), NNTP (newsgroup browsing), SMTP (mail sending), POP3 (mail receiving) and FTP (file transfer) protocols. When a connection is made to a server on a specific port, IF-Researcher begins parsing the appropriate protocol on all the sent and received data on the socket.

IF-Researcher can log all HTTP GET requests, HTTP PUT requests, NNTP newsgroup overview accesses, NNTP message accesses, NNTP posting accesses, SMTP e-mail headers of transmitted e-mail, POP3 e-mail headers of received e-mail, and FTP file transfers.

The IF-Filter Lite Module

The IF-Filter Lite Module is a simple URL, Hostname, Newsgroup, and phrase blocking engine. It does not allow logging or the IF-Only features. The user can specify specific locations on the net that should not be allowed.

The IF-Only Module

The IF-Only Module is the reverse of the IF-Filter Lite Module, and is useful for public Internet Kiosks where only specific hostnames and URLs should be made available to the end user.

The system blocks all hostnames, HTTP URLs and Newsgroups that are not already specified by the administrator.

The IF-Filter Module

All available IF Modules are a subset of the IF-Filter module. The IF-Filter module has all the capabilities of IF-Researcher, IF-Filter Lite, and IF-Only, as well as full generic phrase blocking, all in one module.

In addition, there are two logger modules available with IF-Filter. The two loggers are identical in capability, and each can be set to log different types of accesses and violations to different destinations, with its own detail level and logging format.

Each logger can be set individually to log to a local or remote file, an external e-mail SMTP server, or to a specialized internet server via a simple socket connection.

IF-Filter Phrase Overview

The Internet Filter can scan for multiple phrases at the same time. All data packets incoming and outgoing are searched. When The Internet Filter searches a data packet for a phrase, it ignores case, all white space and most punctuation.

You may specify each phrase explicitly, or you may specify a group of similar phrases.

Guidelines for phrases

When you are adding phrases to search for, you will want to make sure the phrases are not too short. If you specified the single word ‘[SEX]’ as a phrase to censor, then it would censor the following as well:

  friendS EX-husband 
  the esSEX Corporation 
  etc.

Therefore you must specify a long phrase that is specific enough. To help you specify many similar long phrases, you may specify a phrase with optional words.
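
The effect of a too-short phrase can be reproduced with a small sketch. It assumes that "ignores case, all white space and most punctuation" can be approximated by keeping only letters and digits and lower-casing them; the paper does not list which punctuation is kept, and the function names normalize and phrase_matches are invented here.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Assumed normalization: keep letters and digits only, lower-cased.
std::string normalize(const std::string& text) {
    std::string out;
    for (unsigned char ch : text) {
        if (std::isalnum(ch)) out.push_back(static_cast<char>(std::tolower(ch)));
    }
    return out;
}

// True if the normalized phrase occurs anywhere in the normalized text.
bool phrase_matches(const std::string& text, const std::string& phrase) {
    const std::string needle = normalize(phrase);
    return !needle.empty() && normalize(text).find(needle) != std::string::npos;
}

// Usage example: the short phrase matches inside unrelated text.
void example_short_phrase() {
    assert(normalize("friendS EX-husband") == "friendsexhusband");
    assert(phrase_matches("the esSEX Corporation", "SEX"));
}
```

Because the space and hyphen disappear during normalization, the three letters line up across word boundaries, which is why a longer and more specific phrase is needed.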

Phrase syntax

You may specify a single phrase to match. However, the characters ‘{’ ‘}’ ‘[’ ‘,’ and ‘]’ are special within phrases.

A phrase is always enclosed by either the square brackets [ ] or the curly brackets { }.

If a phrase is enclosed by the square brackets, then the phrase will be censored.

If a phrase is enclosed by the curly brackets, then the phrase will be censored and the socket connection that the phrase was seen on will be shut down.

You may specify a section of a phrase within the [ ] block, such as ‘[It is a][nice][day]’. The sections do not have to be single words.

You may specify optional words or sections within the [ ] block by separating them with a comma. For example: ‘[It is a][nice,bad][day,night]’.

By putting a comma as the first character after the ‘[’ in a section, you are telling The Internet Filter that the section may be skipped.

Here are some examples of the phrase syntax:

[My][cat][has fleas]

This specification would match ‘my cat has fleas’. Because case, white space and punctuation are ignored, it will also match ‘MY CAT h a S FLEAS’.

[My][cat,dog][has fleas]

This specification would match ‘my cat has fleas’ and will also match ‘my dog has fleas’.

[It is a][nice,bad][,dark,bright][day,night]

This specification would match many phrases, including ‘It is a nice day’, ‘It is a nice bright day’ as well as ‘It is a bad dark day’.
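
One way to picture the syntax is to expand a specification into every phrase it stands for. The sketch below does that for the square-bracket form only. It is an illustration, not the filter's actual engine: the function names are invented, curly and doubled brackets are not handled, and normalization is assumed to keep only lower-cased letters and digits.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Assumed normalization: keep letters and digits only, lower-cased.
std::string normalize(const std::string& text) {
    std::string out;
    for (unsigned char ch : text) {
        if (std::isalnum(ch)) out.push_back(static_cast<char>(std::tolower(ch)));
    }
    return out;
}

// Split "[a][b,c][,d]" into sections of alternatives. A leading comma yields
// an empty alternative, meaning the section may be skipped.
std::vector<std::vector<std::string> > parse_sections(const std::string& spec) {
    std::vector<std::vector<std::string> > sections;
    std::size_t pos = 0;
    while ((pos = spec.find('[', pos)) != std::string::npos) {
        const std::size_t end = spec.find(']', pos);
        if (end == std::string::npos) break;  // unterminated section: stop
        std::vector<std::string> alternatives(1);
        for (std::size_t i = pos + 1; i < end; ++i) {
            if (spec[i] == ',') alternatives.push_back(std::string());
            else alternatives.back().push_back(spec[i]);
        }
        sections.push_back(alternatives);
        pos = end + 1;
    }
    return sections;
}

// Expand every combination of alternatives into one normalized phrase each.
std::vector<std::string> expand_phrase(const std::string& spec) {
    const std::vector<std::vector<std::string> > sections = parse_sections(spec);
    if (sections.empty()) return std::vector<std::string>();
    std::vector<std::string> results(1);  // start from one empty phrase
    for (const std::vector<std::string>& alternatives : sections) {
        std::vector<std::string> next;
        for (const std::string& prefix : results)
            for (const std::string& alt : alternatives)
                next.push_back(prefix + normalize(alt));
        results.swap(next);
    }
    return results;
}

// Usage example: 1 x 2 x 3 x 2 = 12 phrases from the last specification above.
void example_expand() {
    const std::vector<std::string> all =
        expand_phrase("[It is a][nice,bad][,dark,bright][day,night]");
    assert(all.size() == 12);
    assert(all[0] == "itisaniceday");
    assert(all[4] == "itisanicebrightday");
}
```

A real engine would more likely match the sections directly than enumerate every combination, since the number of expansions multiplies with each section.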

IF-Filter Registry Settings

All available IF Modules are a subset of the IF-Filter module. The following registry settings are common to all IF Modules; however, the user interface of each application sets them differently.

Identification

  • ID STRING Computer Station name
  • ID1 DWORD Decryption key for phrase files

Enables

  • E00 DWORD Enable everything (0 or 1)
  • E01 DWORD Enable Phrase Scan (0 or 1)
  • E02 DWORD Enable Domain Scan (0 or 1)
  • E03 DWORD Enable Newsgroup Scan (0 or 1)
  • E04 DWORD Enable Good Host Names Only (0 or 1)
  • E05 DWORD Enable URL scan (0 or 1)
  • E06 DWORD Enable Logger A (0 or 1)
  • E07 DWORD Enable Logger B (0 or 1)
  • E08 DWORD Enable Pseudo User Name Hook (0 or 1)
  • E09 DWORD Enable Good Newsgroups Only
  • E10 DWORD Enable Email scanning

TCP/IP Port Ignores/Denies

  • P01 DWORD Ignore EMAIL connections pop3 smtp etc (0 or 1)
  • P02 STRING ASCII List of destination tcp/ip ports to ignore scanning
  • P03 STRING ASCII List of destination tcp/ip ports to deny

Logger A

  • LA01 DWORD Logger A type (0 = file, 1=email, 2=socket)
  • LA02 DWORD Logger A detail level (16=full detail)
  • LA03 DWORD Log all host names to logger A (0 or 1)
  • LA04 DWORD Log all URLs to logger A (0 or 1)
  • LA05 STRING Logger A file name (if type is 0), or socket destination (if type 2)
  • LA06 STRING Logger A Destination Email Address
  • LA07 STRING Logger A Email subject line
  • LA08 DWORD Logger A log all POP3 accesses
  • LA09 DWORD Logger A log all SMTP accesses
  • LA10 DWORD Logger A log all NNTP accesses
  • LA11 DWORD Logger A encryption form
  • LA12 DWORD Logger A encoding form
  • LA13 STRING Logger A log line format string
  • LA14 DWORD Logger A log all tcpip connections

Logger B

  • LB01 DWORD Logger B type (0 = file, 1=email, 2=socket)
  • LB02 DWORD Logger B detail level (16=full detail)
  • LB03 DWORD Log all host names to logger B (0 or 1)
  • LB04 DWORD Log all URLs to logger B (0 or 1)
  • LB05 STRING Logger B file name (if type is 0), or socket destination (if type 2)
  • LB06 STRING Logger B Destination Email Address
  • LB07 STRING Logger B Email subject line
  • LB08 DWORD Logger B log all POP3 accesses
  • LB09 DWORD Logger B log all SMTP accesses
  • LB10 DWORD Logger B log all NNTP accesses
  • LB11 DWORD Logger B encryption form
  • LB12 DWORD Logger B encoding form
  • LB13 STRING Logger B log line format string
  • LB14 DWORD Logger B log all tcpip connections

Email setup

  • M01 STRING Source Email Address
  • M02 STRING SMTP host name
  • M03 DWORD TCP/IP port to connect to on SMTP server
  • M04 DWORD Time to delay sending email log in seconds

Violation level control bits

  • B01 DWORD Control Bits for violation level 1
  • B02 DWORD Control Bits for violation level 2
  • B03 DWORD Control Bits for violation level 3
  • B04 DWORD Control Bits for violation level 4
  • B05 DWORD Control Bits for violation level 5
  • B06 DWORD Control Bits for violation level 6
  • B07 DWORD Control Bits for violation level 7
  • B08 DWORD Control Bits for violation level 8

Violation matching expansion hook

  • EX01 STRING Path name of DLL to call when violation with bit 4 set occurs

Pseudo User Name Hook

  • EX02 STRING Path name of DLL to call to verify user is logged in

Individual phrase list file specification

  • PHR01 - PHR07 STRING Full path name of phrase files

Any phrase that is matched on any (non-ignored) data socket transmission or reception will be dispatched via the violation control bits to censor, log, or block any further data via that socket.

  • BDM01 - BDM07 STRING Full path name of bad domain files

Any host name lookup request that matches any phrase specified in any BDM## file will cause the filter to block the host name lookup request.

  • GDD01 - GDD07 STRING Full path name of good domain files

Any host name lookup request that matches any phrase specified in any GDD## file will cause the filter to be completely disabled for connections made to that host.

  • NEW01 - NEW07 STRING Full path name of bad newsgroup file

Any NNTP newsgroup overview request that matches any phrase specified in any NEW## file will cause the filter to block the NNTP newsgroup overview request.

  • URL01 - URL07 STRING Full path name of bad URL list file

Any HTTP URL request that matches any phrase specified in any URL## file will cause the filter to block the HTTP request.

  • GURL01 - GURL07 STRING Full path name of good URL list file

Any HTTP URL request that matches any phrase specified in any GURL## file will cause the filter to be completely disabled for that HTTP request.

  • GNW01 - GNW07 STRING Full path name of good newsgroup file

Any NNTP newsgroup overview request that matches any phrase specified in any GNW## file will cause the filter to be completely disabled for that newsgroup request.

Violation Levels

There are 8 violation levels. Each violation level can be set to perform a different action depending on the violation level control bits.

All phrases from the phrase, bad domain, good domain, bad newsgroup, bad URL, and good URL lists are merged into the phrase scanning engine, each list with a different violation level offset.

If a phrase line specifies no violation level, it defaults to level 1.

Control Bits for violation levels

  • Bit 0 - (value 0x01) Censor phrase

  • Bit 1 - (value 0x02) Log to logger A

  • Bit 2 - (value 0x04) Log to logger B

  • Bit 3 - (value 0x08) Disconnect socket

  • Bit 4 - (value 0x10) Execute DLL specified by EX01
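
As an illustration of how a control word such as the value stored in B01 through B08 breaks down, the following sketch decodes it into a list of actions. The bit values come from the list above; the identifier names and the describe_actions helper are invented for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Bit values from the list above; the identifier names are invented here.
enum ControlBit : std::uint32_t {
    kCensorPhrase     = 0x01,
    kLogToLoggerA     = 0x02,
    kLogToLoggerB     = 0x04,
    kDisconnectSocket = 0x08,
    kCallExpansionDll = 0x10
};

// List the actions requested by one violation level's control bits.
std::vector<std::string> describe_actions(std::uint32_t control_bits) {
    std::vector<std::string> actions;
    if (control_bits & kCensorPhrase)     actions.push_back("censor phrase");
    if (control_bits & kLogToLoggerA)     actions.push_back("log to logger A");
    if (control_bits & kLogToLoggerB)     actions.push_back("log to logger B");
    if (control_bits & kDisconnectSocket) actions.push_back("disconnect socket");
    if (control_bits & kCallExpansionDll) actions.push_back("execute EX01 DLL");
    return actions;
}

// Usage example: combine bits with | to build a control word.
void example_control_bits() {
    assert(describe_actions(kLogToLoggerA | kDisconnectSocket).size() == 2);
    assert(describe_actions(0x0f).size() == 4);
}
```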

If a violation phrase is matched that has violation level control bit 4 enabled, the DLL specified by EX01 will have its “PhraseMatched” function called. It will be called for each phrase that is matched that has control bit 4 enabled.

The “PhraseMatched” function will look like this:

DWORD __stdcall PhraseMatched( 
        LPSTR buf, 
        LPCSTR normalized_match_string, 
        DWORD length_of_match,
        DWORD match_level,
        DWORD phrase_type
       );

Where:

  • buf is a pointer into the raw buffer at the location that matched
  • normalized_match_string is a string showing the phrase that was matched without whitespace, etc.
  • length_of_match is the number of characters ‘buf’ points to, including all whitespace and punctuation.
  • match_level is the violation level attached to the phrase.
  • phrase_type is the type of phrase that was matched:

    0 = 7 bit text phrase
    1 = 8 bit text phrase (unicode)
    2 = Bad domain
    3 = Good domain
    4 = Bad URL
    5 = Good URL
    6 = Newsgroup
    7 = 7 bit text phrase with disconnect
    8 = 8 bit text phrase with disconnect (unicode)
    9 = Good newsgroup

PhraseMatched may do whatever it wants when it is called with a match, but it can also simply return a number to specify an action. The return value has the same format as the ‘Violation Control Bits’, so if the routine returned 0x0f, for instance, the phrase would be censored, logged to both logger A and logger B, and the socket would be disconnected.
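
A hook DLL might implement the callback along the following lines. This is a sketch, not a reference implementation. The escalation policy is invented for the example, the extern "C" linkage is an assumption (the paper does not say how the filter locates the exported symbol), and the typedefs in the #else branch are stand-ins so the sketch also compiles away from Windows.

```cpp
#include <cassert>
#include <cstdint>

#ifdef _WIN32
#include <windows.h>
#define IF_HOOK_CALL __stdcall
#else
// Stand-ins for the Win32 types, for non-Windows builds of this sketch only.
typedef std::uint32_t DWORD;
typedef char*         LPSTR;
typedef const char*   LPCSTR;
#define IF_HOOK_CALL
#endif

// Invented policy: always log to logger A, censor text phrases, and
// escalate to logger B plus a disconnect from violation level 5 upward.
extern "C" DWORD IF_HOOK_CALL PhraseMatched(LPSTR buf,
                                            LPCSTR normalized_match_string,
                                            DWORD length_of_match,
                                            DWORD match_level,
                                            DWORD phrase_type) {
    (void)buf;                      // unused in this sketch
    (void)normalized_match_string;  // unused in this sketch
    (void)length_of_match;          // unused in this sketch

    const DWORD kCensor = 0x01, kLogA = 0x02, kLogB = 0x04, kDisconnect = 0x08;

    DWORD action = kLogA;
    if (phrase_type == 0 || phrase_type == 1) action |= kCensor;
    if (phrase_type == 7 || phrase_type == 8) action |= kCensor | kDisconnect;
    if (match_level >= 5) action |= kLogB | kDisconnect;
    return action;  // same format as the violation control bits
}

// Usage example: what the filter would receive back for two matches.
void example_phrase_matched() {
    assert(PhraseMatched(nullptr, "mydoghasfleas", 16, 1, 0) == 0x03);
    assert(PhraseMatched(nullptr, "wwwexamplecom", 15, 6, 2) == 0x0e);
}
```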

User Name/Login Hook

EX02 contains the path name of a DLL file that will be run when the filter DLL is first loaded. The DLL function ‘GetFilterUserInfo’ will be called. Its job is to tell the filter what user is logged in currently, and which registry keys to use for the user specific settings. It can also tell the filter to not allow any TCP/IP activity for this user.

The “GetFilterUserInfo” function will look like this:

DWORD __stdcall GetFilterUserInfo( 
  LPSTR name_buf, 
  DWORD name_buf_len,
  PHKEY registry_location_base,
  LPSTR registry_location_string,
  DWORD registry_location_string_len
);

GetFilterUserInfo() should:

  • Fill in ‘name_buf’ with the ASCII name of the user that is logged in. name_buf_len is the size of the name buffer.

  • Fill in ‘registry_location_base’ with the HKEY value of the root registry key to access user specific settings. For instance it could set *registry_location_base to HKEY_LOCAL_MACHINE or HKEY_CURRENT_USER.

  • Fill in ‘registry_location_string’ with the ASCII name of the location within registry_location_base. registry_location_string_len specifies the length of the registry_location_string buffer.

GetFilterUserInfo() must return 1 if the user is allowed to use TCP/IP at all. If it returns 0, the filter will block all TCP/IP accesses for this program/user combination. GetFilterUserInfo() can call the standard WIN32 function ‘GetCommandLine()’ to find out what program is trying to run.

If the User Name Hook DLL doesn’t exist, a default function will be run instead. It gets the current Windows user name (via GetUser()), and sets the registry location for the user specific settings to be in a key with the same name as the user, in the same registry location as the system registry settings. So if the user name is ‘Jeff’, it will look for a key named ‘Jeff’ in the same place all the other filter registry settings are.

Within that key can be any other registry key to override the normal system settings. So every user on a system can have completely different filter settings.
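
A user name hook following the contract above might look roughly like this. It is a sketch under several assumptions: the user name and registry path are hard-coded placeholders, the extern "C" linkage is not confirmed by the paper, returning 0 when a buffer is too small is a design choice made here, and the typedefs and constant in the #else branch are stand-ins for non-Windows builds.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

#ifdef _WIN32
#include <windows.h>
#define IF_HOOK_CALL __stdcall
#else
// Stand-ins for the Win32 types, for non-Windows builds of this sketch only.
typedef std::uint32_t DWORD;
typedef char*         LPSTR;
typedef void*         HKEY;
typedef HKEY*         PHKEY;
#define IF_HOOK_CALL
#define HKEY_LOCAL_MACHINE \
    (reinterpret_cast<HKEY>(static_cast<std::uintptr_t>(0x80000002u)))
#endif

// Copy src into a caller-supplied buffer, including the terminating NUL.
// Returns false if the buffer is missing or too small.
static bool copy_string(LPSTR dst, DWORD dst_len, const char* src) {
    const std::size_t needed = std::strlen(src) + 1;
    if (dst == nullptr || dst_len < needed) return false;
    std::memcpy(dst, src, needed);
    return true;
}

extern "C" DWORD IF_HOOK_CALL GetFilterUserInfo(LPSTR name_buf,
                                                DWORD name_buf_len,
                                                PHKEY registry_location_base,
                                                LPSTR registry_location_string,
                                                DWORD registry_location_string_len) {
    // Placeholder values: a real hook would look up the logged-in user.
    const char* user = "Jeff";
    const char* location = "Software\\ExampleFilter\\Jeff";  // invented path

    if (registry_location_base == nullptr) return 0;
    if (!copy_string(name_buf, name_buf_len, user)) return 0;
    if (!copy_string(registry_location_string, registry_location_string_len,
                     location)) return 0;
    *registry_location_base = HKEY_LOCAL_MACHINE;
    return 1;  // 1 = TCP/IP allowed, 0 = block all TCP/IP for this user
}

// Usage example: how the filter side might call the hook.
void example_user_hook() {
    char name[32];
    char location[128];
    HKEY base = nullptr;
    assert(GetFilterUserInfo(name, sizeof name, &base, location, sizeof location) == 1);
    assert(std::strcmp(name, "Jeff") == 0);
    assert(base == HKEY_LOCAL_MACHINE);
}
```

Returning 0 on any failure means the hook fails closed: a misconfigured hook blocks TCP/IP rather than silently running with no user specific settings.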

Examples of Various Phrase Forms

PLAIN 7 bit Violation Level 1:

    My dog has fleas

BRACKETED 7 bit Violation Level 1:

    [My dog has fleas]

OPTIONAL SECTIONS 7 bit Violation Level 1:

    [My][dog,cat][has fleas]

CURLY BRACED 7 bit Violation Level 1:

(for easy disconnect flag without having to set violation level control bits)

    {My}{dog,cat}{has fleas}

BRACKETED 8 bit Violation Level 1:

(whitespace and punctuation are not ignored or skipped, 8 bit unicode characters can be used)

    [[My ]][[dog,,cat]][[ has fleas]]

CURLY BRACED 8 bit Violation Level 1:

(for 8 bit unicode phrases with easy disconnect flag without having to set violation level control bits)

    {{My }}{{dog,,cat}}{{ has fleas}}

BRACKETED OR BRACED WITH VIOLATION LEVEL SPECIFIED

(the line begins with a number, 1 - 8 or 1 to 89; the section in quotes can be any previous form)

    5 "[[My ]][[dog,,cat]][[ has fleas]]"

COMMENTED OUT OR DISABLED PHRASE

    //5 "[[blah blah blah"

Notes for obscure phrase filter engine features:

Although there are a number of different phrase files for things such as phrases, newsgroups, domains, bad domains, etc., all phrases get placed into the same filter engine. The filter engine distinguishes between them by adding an offset to the violation level depending on the type of phrase it is:

7 Bit Phrases                  00 - 09 
8 Bit Phrases                  10 - 19
Bad Domains                    20 - 29
Good Domains                   30 - 39
Bad URLS                       40 - 49
Good URLS                      50 - 59
Newsgroups                     60 - 69
7 Bit Phrases with disconnect  70 - 79
8 Bit Phrases with disconnect  80 - 89
Good Newsgroups                90 - 99

What this actually means is that a phrase in a ‘bad domains’ file list that looks like this:

    6 "www.microsoft.com"

is identical to an entry:

    26 "www.microsoft.com"

when specified in a plain phrase file.

So in fact you could put all phrases, domains, and so on into one ‘phrase’ file.
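
The offset scheme amounts to splitting a two-digit number into a phrase type and a violation level. A minimal sketch, assuming the tens digit selects the row of the table above and the ones digit is the level (the struct and function names are invented):

```cpp
#include <cassert>
#include <string>

// Result of splitting a combined level such as 26.
struct DecodedLevel {
    int phrase_type;      // tens digit: row in the offset table
    int violation_level;  // ones digit: the violation level
};

DecodedLevel decode_level(int combined) {
    assert(combined >= 0 && combined <= 99);
    DecodedLevel decoded = {combined / 10, combined % 10};
    return decoded;
}

// Row names copied from the offset table.
std::string phrase_type_name(int phrase_type) {
    static const char* const names[] = {
        "7 Bit Phrases", "8 Bit Phrases", "Bad Domains", "Good Domains",
        "Bad URLS", "Good URLS", "Newsgroups",
        "7 Bit Phrases with disconnect", "8 Bit Phrases with disconnect",
        "Good Newsgroups"
    };
    if (phrase_type < 0 || phrase_type > 9) return "unknown";
    return names[phrase_type];
}

// Usage example: the entry 26 "www.microsoft.com" from the text.
void example_decode() {
    const DecodedLevel decoded = decode_level(26);
    assert(decoded.phrase_type == 2 && decoded.violation_level == 6);
    assert(phrase_type_name(decoded.phrase_type) == "Bad Domains");
}
```

The tens digit lines up with the phrase_type values passed to PhraseMatched, which list the same ten types in the same order.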

Notes on specifying URLs for the phrase engine:

URLs in both BAD and GOOD URL listings should contain the optional port number. For example:

for http://www.microsoft.com/ie4 you would use

"[http://][,www.][microsoft.com][,:80][/ie4]"

Not:

"[http://www.microsoft.com/ie4]"

Logger Line Format String Notes:

The fields LA13 and LB13 specify the layout of each log line. The layout string is somewhat like a sprintf() format string with the following fields:

  • “!1” = access class number
  • “!2” = access class string (see below)
  • “!3” = access detail level
  • “!4” = block status
  • “!5” = access info
  • “!6” = full URL/File/Newsgroup/Phrase
  • “!7” = additional 1 field
  • “!8” = additional 2 field
  • “!9” = computer source ID (registry field ‘ID’)
  • “!!” = a literal “!”

Plus you have all the standard fields available from the standard C function “strftime()”, such as:

  • %m = 2 digit month
  • %d = 2 digit day
  • %y = 2 digit year
  • %I = 2 digit 12 hour clock hour
  • %M = 2 digit minute
  • %S = 2 digit second
  • %p = AM or PM

Many more fields are available; see the strftime() documentation for the full list.
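
To show how such a layout string could be expanded, here is a minimal sketch. It is not the logger's actual code: the LogFields struct and format_log_line function are invented, and running strftime() before the “!” substitution is an assumption, chosen so that a “%” inside a logged URL is never treated as a time field.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <string>

// Values for the "!1" .. "!9" fields; value[0] holds "!1". Name is invented.
struct LogFields {
    std::string value[9];
};

std::string format_log_line(const std::string& layout, const LogFields& fields,
                            const std::tm& when) {
    // Step 1: let strftime() expand the % fields. '!' means nothing to it.
    std::string timed = layout;
    if (!layout.empty()) {
        char buffer[512];
        const std::size_t n =
            std::strftime(buffer, sizeof buffer, layout.c_str(), &when);
        if (n > 0) timed.assign(buffer, n);  // if it does not fit, keep the raw layout
    }
    // Step 2: expand "!1" .. "!9" and "!!".
    std::string out;
    for (std::size_t i = 0; i < timed.size(); ++i) {
        const char ch = timed[i];
        if (ch == '!' && i + 1 < timed.size()) {
            const char next = timed[i + 1];
            if (next >= '1' && next <= '9') {
                out += fields.value[next - '1'];
                ++i;
                continue;
            }
            if (next == '!') {
                out += '!';
                ++i;
                continue;
            }
        }
        out += ch;
    }
    return out;
}

// Usage example with a fixed time of 15 September 1997, 14:05:09.
void example_log_line() {
    std::tm when = std::tm();
    when.tm_year = 97; when.tm_mon = 8; when.tm_mday = 15;
    when.tm_hour = 14; when.tm_min = 5; when.tm_sec = 9;
    LogFields fields;
    fields.value[1] = "HTTP";     // !2 = access class string
    fields.value[3] = "BLOCKED";  // !4 = block status
    fields.value[5] = "http://example.com/";  // !6 = full URL (placeholder)
    assert(format_log_line("%m/%d/%y %I:%M:%S !2 !4 !6", fields, when) ==
           "09/15/97 02:05:09 HTTP BLOCKED http://example.com/");
}
```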

ACCESS CLASS ID/STRINGS:

  • 0 = “HTTP”
  • 1 = “NNTP”
  • 2 = “SMTP”
  • 3 = “POP3”
  • 4 = “HOSTNAME” - Host name look up
  • 5 = “TRANSMITTED” - Transmitted data phrase
  • 6 = “RECEIVED” - Received data phrase
  • 7 = “DATA” - Transmitted or received data phrase
  • 8 = “CONNECTION” - Connection made to ip address

BLOCK STATUS STRINGS:

  • “CENSORED” - phrase matched and blanked
  • “SEEN” - phrase matched and not blanked
  • “GOOD” - access to known good host name, HTTP URL, or Newsgroup
  • “BLOCKED” - blocked access to known bad host name, HTTP URL or Newsgroup
  • “ACCESSED” - allowed access to host name, HTTP URL, or Newsgroup

ACCESS INFO STRINGS:

For TCP/IP connecting:

  • “FAILED” - connection attempted but failed
  • “ACCESSED” - connection made
  • “BLOCKED” - tcp/ip address blocked

For All data connections:

  • “PHRASE” - data phrase match

For HTTP access class:

  • “GET” - http get request
  • “PUT” - http put request

For NNTP access class:

  • “NEWSGROUP” - nntp newsgroup overview request
  • “POST” - nntp posted message
  • “READ” - nntp read message
  • “ATTACHMENT” - nntp attachment found

For SMTP:

  • “FROM” - message from address
  • “TO” - mail to list
  • “CC” - mail cc list
  • “SUBJECT” - mail subject line
  • “ATTACHMENT” - mail attachment found

For POP3:

  • “FROM” - message from address
  • “TO” - mail to list
  • “CC” - mail cc list
  • “SUBJECT” - mail subject line
  • “ATTACHMENT” - mail attachment found

DETAIL LEVELS

  • Detail level 0 : Something BLOCKED/CENSORED
  • Detail level 1 : Something ACCESSED/newsgroup accessed
  • Detail level 2 : Connection FAILED/newsgroup article accessed
  • Detail level 3 : Newsgroup article or e-mail attachment found