A Real World Introduction to Information Entropy


I’ve been using IPython notebook so much that it might finally be time to stand up a Pelican-based site on this server so I can use Jake Vanderplas’ IPython integration method. This post might be my last nbviewer.org iframe crime against proper web design principles.

The intent of this post is to explore information entropy as applied to a toy problem in network security. I’ll outline a common problem and the basic concepts of entropy, then show a practical implementation using the Kullback-Leibler divergence and the Python data stack.
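For reference, these are the standard textbook definitions of the two quantities this post leans on, for discrete distributions P and Q over the same set of symbols. The symbols p(x) and q(x) and the choice of base 2 (bits) are my notation here, not something fixed by the problem.

```latex
% Shannon entropy of P, in bits
H(P) = -\sum_{x} p(x) \log_2 p(x)

% Kullback-Leibler divergence of P from Q, in bits
D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x} p(x) \log_2 \frac{p(x)}{q(x)}
```

The divergence is zero when P and Q are identical, never negative, and not symmetric: swapping P and Q generally gives a different number.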

In network security, a recent botnet malware paradigm uses peer-to-peer (P2P) communication and domain generation algorithms (DGAs). This approach avoids a single point of failure and evades many countermeasures, because the command and control framework is embedded in the botnet itself rather than relying on external servers as in the older paradigm.

A potential way to minimize the impact of these threats is to employ a profiler that detects attributes consistent with DGA and P2P activity.
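To make the profiler idea a little more concrete, here is a minimal sketch of one attribute such a profiler might look at: how far the character mix of a domain label sits from a baseline of ordinary names, measured with the Kullback-Leibler divergence. Everything in it is an assumption for illustration: the function names, the 37-character alphabet, the add-one smoothing, and especially the tiny hand-picked baseline, which stands in for a real corpus of benign domains. It uses only the standard library rather than the full Python data stack.

```python
import math
import string
from collections import Counter

# Assumed alphabet for domain labels: lowercase letters, digits, hyphen.
ALPHABET = string.ascii_lowercase + string.digits + "-"


def char_distribution(text, smoothing=1.0):
    """Smoothed character frequencies of `text` over ALPHABET.

    Additive smoothing keeps every probability above zero, so the
    KL divergence below never divides by zero.
    """
    counts = Counter(c for c in text.lower() if c in ALPHABET)
    total = sum(counts.values()) + smoothing * len(ALPHABET)
    return {c: (counts[c] + smoothing) / total for c in ALPHABET}


def shannon_entropy(dist):
    """Shannon entropy of a distribution, in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def kl_divergence(p, q):
    """D_KL(p || q) in bits; assumes q[c] > 0 wherever p[c] > 0."""
    return sum(p[c] * math.log2(p[c] / q[c]) for c in p if p[c] > 0)


# Hypothetical baseline: a real profiler would build this from a large
# list of known-benign domains, not six hand-picked names.
baseline = char_distribution("google facebook wikipedia amazon twitter youtube")

for label in ["wikipedia", "xq7zk2vb9w"]:
    dist = char_distribution(label)
    print(label, round(shannon_entropy(dist), 3), round(kl_divergence(dist, baseline), 3))
```

A larger divergence means the label's character mix looks less like the baseline. On its own that is a weak signal, so a real profiler would likely combine it with other attributes and a much better baseline.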

Header image from http://archive.wired.com/magazine/2010/11/pl_decode_pachinko/all/. Google “pachinko entropy” for some interesting links.