Gleaning Structure from NMAP Scan Reports Using I,Sushi

From The Math Club

A little about NMAP

NMAP is a network scanning tool. It can generate textual scan reports containing information about each host. What it cannot currently report are the statistical and structural characteristics associated with the ports, applications, and OS fingerprints found on each host.

Structural Characteristics and Classification

Using the I,Sushi algorithm, a recursive tokenization approach based on dictionary compression, structural characteristics across a number of hosts can be inferred. An idea down the road using this method is to build classification systems loosely based on both KNN classification and compression. The use of compression in classification systems is briefly documented in the paper Language Trees and Zipping. The basic premise is that, given a series of classes, each associated with a characteristic compression dictionary, the nearest-neighbor or maximum-likelihood match for a document is the class whose dictionary provides the greatest compression of the document being classified.
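The compression premise can be illustrated with a small sketch. This is not I,Sushi or NMAP Magic; it swaps in Python's zlib as a stand-in compressor, and the class names and sample report lines are made up for illustration. The idea is to measure how many extra compressed bytes a document costs when appended to each class's corpus, and pick the class where that cost is smallest.

```python
import zlib


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def extra_bytes(corpus: str, document: str) -> int:
    """Extra compressed bytes needed once the document is appended to the corpus."""
    return compressed_size(corpus + document) - compressed_size(corpus)


def classify(document: str, classes: dict[str, str]) -> str:
    """Return the class whose corpus compresses the document best."""
    return min(classes, key=lambda name: extra_bytes(classes[name], document))


# Hypothetical class corpora (illustrative data, not taken from the scan).
CLASSES = {
    "web": (
        "80/tcp open http Apache httpd\n"
        "443/tcp open https Apache httpd\n"
    ) * 4,
    "filtered-dsl": (
        "135/tcp filtered msrpc\n"
        "139/tcp filtered netbios-ssn\n"
        "445/tcp filtered microsoft-ds\n"
        "1025/tcp filtered NFS-or-IIS\n"
    ) * 4,
}

label = classify("80/tcp open http Apache httpd\n", CLASSES)
print(label)
```

With corpora this small the byte differences are tiny, so a real classifier would want much larger per-class corpora than the ones assumed here.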

Structural Analysis of NMAP Scan Results

A number of hosts were scanned with NMAP. The exact command line was:

nmap -vv -O -sV -oA scanlog 68.127.155.0/24

After the scan completed, the resulting logfile was processed with NMAP Magic, an implementation of the I,Sushi algorithm designed specifically for NMAP. NMAP Magic can be found on the Small Code page.
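For readers who want to get at the same data without NMAP Magic, here is a minimal sketch of pulling per-host port states out of NMAP's XML output (the -oA option writes an XML file alongside the other formats, here scanlog.xml). It is not how NMAP Magic works internally; the function name, the inline sample document, and the address in it are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written stand-in for scanlog.xml (illustrative, not real scan data).
SAMPLE_XML = """<nmaprun>
  <host>
    <address addr="192.0.2.10" addrtype="ipv4"/>
    <ports>
      <port protocol="tcp" portid="135"><state state="filtered"/></port>
      <port protocol="tcp" portid="80"><state state="open"/></port>
    </ports>
  </host>
</nmaprun>"""


def ports_by_host(xml_text):
    """Map each host address to a sorted list of (port, state) pairs."""
    root = ET.fromstring(xml_text)
    result = {}
    for host in root.iter("host"):
        address = host.find("address")
        if address is None:
            continue  # skip hosts without an address element
        entries = []
        for port in host.iter("port"):
            state = port.find("state")
            state_name = state.get("state") if state is not None else "unknown"
            entries.append((int(port.get("portid")), state_name))
        result[address.get("addr")] = sorted(entries)
    return result


print(ports_by_host(SAMPLE_XML))
```

To run it against a real scan, read the contents of scanlog.xml and pass them to ports_by_host in place of SAMPLE_XML.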

NMAP Magic generates Newick trees. The Newick tree format is a simple format in which trees are represented by tokens recursively embedded in parentheses. These tree files can be viewed using TreeView, an application typically used by computational biologists to study phylogenetic trees, which visualize evolutionary speciation. TreeView and Newick trees were simply the easiest and quickest format to code for.
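To make the format concrete, here is a minimal sketch that writes a nested Python tuple out as a Newick string. The function name and the sample tree are made up for illustration, and the sketch only handles simple labels such as port numbers (no branch lengths, no quoting of labels containing spaces or punctuation).

```python
def to_newick(node) -> str:
    """Serialize a nested tuple/list of labels into Newick notation (without the final ';')."""
    if isinstance(node, (list, tuple)):
        # An internal node: its children, comma-separated, inside parentheses.
        return "(" + ",".join(to_newick(child) for child in node) + ")"
    # A leaf: just its label.
    return str(node)


# Hypothetical tree: one subtree of four ports, a lone port, and a second subtree.
tree = ((135, 139, 445, 1025), 80, (22, 443))
newick = to_newick(tree) + ";"  # a Newick tree ends with a semicolon
print(newick)  # ((135,139,445,1025),80,(22,443));
```

Saving such a string to a text file gives something a Newick viewer should be able to open.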

Structural Example

Newick.gif


What's going on here is that siblings are related to each other because their co-occurrence is a common characteristic across all of the scan reports. For example, the entire scan range consisted of SBC DSL hosts, so the port tuple (135,139,445,1025) was seen as filtered on every host scanned. The I,Sushi algorithm was able to infer this structural characteristic, and it therefore becomes a frequently occurring subtree in all of the Newick trees.
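The co-occurrence idea can be shown with a much simpler stand-in than I,Sushi: intersect the sets of filtered ports across hosts and see which tuple survives. The sketch below does only that; the function name, host addresses, and port lists are made up for illustration, and the actual algorithm infers such groupings through recursive tokenization rather than a plain set intersection.

```python
def common_filtered_ports(hosts):
    """Return the sorted tuple of ports reported as filtered on every host."""
    filtered_sets = [
        {port for port, state in ports if state == "filtered"}
        for ports in hosts.values()
    ]
    if not filtered_sets:
        return ()
    return tuple(sorted(set.intersection(*filtered_sets)))


# Hypothetical per-host results as (port, state) pairs (illustrative data only).
HOSTS = {
    "192.0.2.10": [(80, "open"), (135, "filtered"), (139, "filtered"),
                   (445, "filtered"), (1025, "filtered")],
    "192.0.2.11": [(22, "open"), (135, "filtered"), (139, "filtered"),
                   (445, "filtered"), (1025, "filtered")],
    "192.0.2.12": [(135, "filtered"), (139, "filtered"), (445, "filtered"),
                   (1025, "filtered"), (8080, "filtered")],
}

shared = common_filtered_ports(HOSTS)
print(shared)  # (135, 139, 445, 1025)
```

Port 8080 is filtered on only one of the sample hosts, so it drops out of the shared tuple, while the four ports common to every host remain together.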

Scan Data

Here are many more of the Newick trees from the same scan, along with the actual NMAP scan report.