Gain experience using a Graph on real life data. Learn about network communication and how to analyze network traces using a Library. Gain experience using the graph algorithms for connected components and minimum spanning tree.
Story: You are the NSA and are responsible with knowing what everyone is talking about. You are faced with budget restrictions and need to minimize the number of computers to infiltrate. You have access to metadata about all communication so you can build a graph of what machines are talking to each other. With this you can create a list of target computers that you want to gain access to. You have access to the spanning port on various routers and have packet capture (PCAP) dump files of traffic.
1. Read in a packet trace provided using the pcap4j library and build a graph based on the communication using GraphStream. A node represents each IP address. Each edge is created as the result of a bidirectional communication between two nodes (One way only is ignored).The edge weight is based on the number of packets between those nodes. Use the attribute name “weight” on the edge to store the weight value. If a node does not have an edge then do not add it to the graph. You can use a HashMap to keep track of what communications exist for the first pass over the data. Then you can read the packets again and lookup in the HashMap if the edges exist both ways before you add or increment the edge in the graph.
Read in multiple network traces (PCAP files) and find one that gives you multiple connected components. Although you can limit the number of packets read to a low value for testing, the version you turn in should read in all packets.
2. Find the connected components of the graph using BFS or DFS. Do not use the GraphStream algorithm for this, write it yourself.
3. For each connected component compute the Maximum Spanning Tree using Prim or Kruskal’s algorithm and negating the weights. This will give you a tree with a new degree for each node. Use the Java PriorityQueue for this (You can just remove and add elements to update them) and implement Prim or Kruskal’s algorithm yourself.
4. Print out the top 3 nodes in each connected component ordered by the degree of each node. Also order the connected components in descending order.
CC:0, 45 Nodes Total =Without Maximum Spanning Tree: 172.16.117.132,58 22.214.171.124,2 126.96.36.199,2 =With Maximum Spanning Tree: 172.16.117.132,5 188.8.131.52,2 184.108.40.206,1 CC:1, 23 Nodes Total =Without Maximum Spanning Tree: 172.16.112.50,11 220.127.116.11,3 18.104.22.168,2 =With Maximum Spanning Tree: 22.214.171.124,3 172.16.112.50,1 126.96.36.199,1
Download the inside or outside tcpdump PCAP files from the 1999 DARPA intrusion detection evaluation dataset website. The file extension is
.tcpdump but they are PCAP files.
Download the snort log PCAP files from SPARSA Information Security Talent Search dataset website. The file extensions are not
.pcap but they are PCAP files.
Make your own PCAP file using your network card:
$sudo tcpdump -w mycap.pcap
Libraries and Code Examples
To read PCAP files we will use the pcap4j Java library. This library also allows for live captures from a network card. We will only be using it to decode IPv4 packets from PCAP files. (Note: If you are using Windows then you must install WinPcap for this library to work)
To build a graph and visualize it we will use the GraphStream library. This library makes it easy to model data because nodes and edges are identified by strings so maintaining references to nodes are not required. It uses OpenGL to visualize graphs in an animated way.
You can find the jar files for these libraries as well as sample code and scripts to run your program from the command line here: https://github.com/ieee8023/cs310-summer2015
The scripts “compile.sh”, “getclasspath.sh”, and “run.sh” are used to run your code on the command line. Edit run.sh to point to Class with the main method you want to run.
Sample Input/Output Benchmark
Here is a close up of the largest connected component with it’s maximum spanning tree colored in blue.
The output for the sample file (outside.tcpdump-small) provided on github should be:
$sh run.sh outside.tcpdump-small CC:0, 23 Nodes Total =Without Maximum Spanning Tree: 188.8.131.52,13 172.16.113.105,6 184.108.40.206,4 =With Maximum Spanning Tree (Weight:2068): 220.127.116.11,13 172.16.114.207,4 172.16.113.105,3 CC:1, 5 Nodes Total =Without Maximum Spanning Tree: 192.168.1.10,4 172.16.112.20,1 192.168.1.20,1 =With Maximum Spanning Tree (Weight:158): 192.168.1.10,4 172.16.112.20,1 192.168.1.20,1
Here is the output for the file (1999 DARPA intrusion detection evaluation outside.tcpdump from Monday). Here is a close up of the largest connected component with it’s maximum spanning tree colored in blue.
$sh run.sh outside.tcpdump CC:0, 840 Nodes Total =Without Maximum Spanning Tree: 172.16.112.207,174 172.16.117.132,160 172.16.113.105,155 =With Maximum Spanning Tree (Weight:969696): 172.16.112.207,97 172.16.117.132,91 172.16.115.87,80 CC:1, 2 Nodes Total =Without Maximum Spanning Tree: 192.168.1.2,1 192.168.1.1,1 =With Maximum Spanning Tree (Weight:264): 192.168.1.2,1 192.168.1.1,1
Here is the output for the file SPARSA Information Security Talent Search snort.log.1425771020. Some packets were unreadable and skipped.
$sh run.sh snort.log.1425771020 CC:0, 43 Nodes Total =Without Maximum Spanning Tree: 10.0.1.101,15 10.1.2.45,6 10.0.1.112,6 =With Maximum Spanning Tree (Weight:45601): 10.0.1.101,14 10.1.2.45,6 10.0.1.112,6 CC:1, 15 Nodes Total =Without Maximum Spanning Tree: 10.128.0.17,5 10.128.0.14,5 18.104.22.168,4 =With Maximum Spanning Tree (Weight:1932): 10.128.0.14,5 10.128.0.17,4 22.214.171.124,4 CC:2, 8 Nodes Total =Without Maximum Spanning Tree: 10.2.1.80,6 10.2.10.50,5 126.96.36.199,2 =With Maximum Spanning Tree (Weight:590): 10.2.1.80,5 10.2.10.50,2 188.8.131.52,2 CC:3, 7 Nodes Total =Without Maximum Spanning Tree: 10.1.2.40,6 184.108.40.206,1 220.127.116.11,1 =With Maximum Spanning Tree (Weight:16): 10.1.2.40,6 18.104.22.168,1 22.214.171.124,1 CC:4, 3 Nodes Total =Without Maximum Spanning Tree: 10.1.2.48,2 126.96.36.199,1 188.8.131.52,1 =With Maximum Spanning Tree (Weight:10): 10.1.2.48,2 184.108.40.206,1 220.127.116.11,1 CC:5, 3 Nodes Total =Without Maximum Spanning Tree: 10.1.2.12,2 18.104.22.168,1 10.2.5.30,1 =With Maximum Spanning Tree (Weight:10): 10.1.2.12,2 22.214.171.124,1 10.2.5.30,1 CC:6, 3 Nodes Total =Without Maximum Spanning Tree: 10.0.1.218,2 126.96.36.199,1 188.8.131.52,1 =With Maximum Spanning Tree (Weight:12): 10.0.1.218,2 184.108.40.206,1 220.127.116.11,1 CC:7, 3 Nodes Total =Without Maximum Spanning Tree: 10.1.2.57,2 18.104.22.168,1 22.214.171.124,1 =With Maximum Spanning Tree (Weight:4): 10.1.2.57,2 126.96.36.199,1 188.8.131.52,1 CC:8, 3 Nodes Total =Without Maximum Spanning Tree: 10.128.0.8,2 184.108.40.206,1 220.127.116.11,1 =With Maximum Spanning Tree (Weight:10): 10.128.0.8,2 18.104.22.168,1 22.214.171.124,1 CC:9, 3 Nodes Total =Without Maximum Spanning Tree: 10.128.0.1,2 10.2.12.80,1 10.2.8.80,1 =With Maximum Spanning Tree (Weight:68): 10.128.0.1,2 10.2.12.80,1 10.2.8.80,1 CC:10, 2 Nodes Total =Without Maximum Spanning Tree: 126.96.36.199,1 10.0.1.4,1 =With Maximum Spanning Tree (Weight:10): 188.8.131.52,1 10.0.1.4,1 CC:11, 2 Nodes Total =Without Maximum Spanning Tree: 10.0.1.6,1 10.2.3.10,1 =With Maximum Spanning Tree (Weight:8): 10.0.1.6,1 10.2.3.10,1 CC:12, 2 Nodes Total =Without Maximum Spanning Tree: 10.128.0.21,1 10.2.12.20,1 =With Maximum Spanning Tree (Weight:56): 10.128.0.21,1 10.2.12.20,1 CC:13, 2 Nodes Total =Without Maximum Spanning Tree: 10.2.5.50,1 10.2.2.30,1 =With Maximum Spanning Tree (Weight:34): 10.2.5.50,1 10.2.2.30,1
Put your code in a folder called ‘project1’ on the UNIX system. Your code should take a PCAP file name as the first command line argument to the run.sh script.
$sh run.sh sometestfile.pcap
Also write a memo.txt discussing the challenges and what you learned on this assignment. Also, answer the following questions:
1. What do the high degree nodes represent when looking at the connected components?
2. What do the high degree nodes represent when looking at the maximum spanning trees of the connected components?
Grading (total 25 points):
Due: 6/15 @5pm.
6 points: Build undirected graph from PCAP data.
7 points: Compute connected components with own implementation of BFS or DFS.
7 points: Compute maximum spanning tree with own implementation of Prim or Kruskal’s algorithm.
5 points: memo.txt and how easy the assignment is to grade
5 points: Apply clustering to the graph to better find central nodes. (Added: Submit code as well as visualize the clusters with different colors and submit a screen shot)