The dataset can be found here.
The NFS clients are located on the same 1 Gbps LAN as the server, with NFS client-side caching enabled. Caching effects across experiments were eliminated by unmounting and remounting the file system between runs. We capture the NFS packet trace at the network interface of the NFS server machine using Wireshark, and filter out the data portion of the NFS operations. All experiments in this work use only the opcode information in the NFS trace.
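As a rough illustration of the filtering step above, the sketch below keeps only NFS rows from a pre-exported packet summary and emits the ordered opcode sequence. The CSV column names (`proto`, `opcode`) and the `tshark` field name in the comment are assumptions for illustration, not the exact commands used in this work.

```python
# Hypothetical sketch: extract the per-workload opcode sequence from a
# packet summary exported from a capture, e.g. with something like
#   tshark -r raw.pcap -Y nfs -T fields -e nfs.procedure_v3
# (the field name is an assumption; check your Wireshark version).
import csv
import io


def opcode_sequence(rows):
    """Keep only NFS rows and return the ordered list of opcodes."""
    return [r["opcode"] for r in rows if r["proto"] == "NFS"]


# Inline toy sample standing in for a real exported summary.
sample = io.StringIO(
    "proto,opcode\n"
    "NFS,GETATTR\n"
    "TCP,\n"
    "NFS,READ\n"
    "NFS,LOOKUP\n"
)
rows = list(csv.DictReader(sample))
print(opcode_sequence(rows))  # → ['GETATTR', 'READ', 'LOOKUP']
```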
The data is organised into two parts: (1) the complete dataset and (2) the training data.
- In the complete dataset, the traces generated by each workload are stored in a folder named after that workload; for example, the complete dataset for 'grep' is kept in a folder named 'grepActual'. Each such folder contains two subfolders, viz., 'NFSCapture' and 'RawCapture'. 'RawCapture' contains the traces collected using Wireshark while running the workload. 'NFSCapture' contains only the NFS packets filtered out of these 'RawCapture' traces.
- The training data contains the traces used for training the profile HMMs. Again, there is one folder per workload containing the respective training traces.
Source Code

The basic scripts that were used are available here. They give a fair idea of how to apply profile HMMs to the task of discovering application workloads from network file system traces. These are not ready-to-use scripts; a readMe explaining their usage is included.
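The overall classification idea can be sketched as follows: train one sequence model per workload on its opcode traces, then label an unseen trace with the best-scoring model. The sketch below uses a first-order Markov chain with Laplace smoothing as a deliberately simplified stand-in for the profile HMMs in the actual scripts; all names, the toy opcode alphabet, and the training traces are illustrative assumptions.

```python
# Simplified stand-in for per-workload profile HMMs: a first-order
# Markov chain over NFS opcodes, trained per workload, with traces
# classified by maximum log-likelihood.
import math
from collections import defaultdict


class MarkovModel:
    def __init__(self, alphabet):
        self.alphabet = list(alphabet)
        # counts[a][b] = number of observed transitions a -> b
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, sequences):
        for seq in sequences:
            for a, b in zip(seq, seq[1:]):
                self.counts[a][b] += 1

    def log_likelihood(self, seq):
        # Laplace smoothing so unseen transitions get nonzero probability.
        v = len(self.alphabet)
        ll = 0.0
        for a, b in zip(seq, seq[1:]):
            row = self.counts[a]
            total = sum(row.values())
            ll += math.log((row[b] + 1) / (total + v))
        return ll


def classify(trace, models):
    """Return the workload whose model scores the trace highest."""
    return max(models, key=lambda w: models[w].log_likelihood(trace))


# Toy opcode alphabet and training traces (illustrative only).
alphabet = {"LOOKUP", "READ", "WRITE", "GETATTR"}
models = {"grep": MarkovModel(alphabet), "make": MarkovModel(alphabet)}
models["grep"].train([["LOOKUP", "READ", "READ", "READ"]] * 3)
models["make"].train([["GETATTR", "WRITE", "WRITE", "LOOKUP"]] * 3)

print(classify(["LOOKUP", "READ", "READ"], models))  # → grep
```

A profile HMM additionally models position-specific match, insert, and delete states, which lets it handle insertions and deletions in the opcode stream that a plain Markov chain cannot.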