Week 6

First, we practiced some Kibana queries on the weekly data. The data is stitched day-wise, and only flows larger than 100K were imported. However, when I ran queries for flows larger than 100M, there were no results. We found that only part of the data had been imported, and that the type of “input_type” is string again rather than a numeric type, which makes aggregation and “greater than” queries impossible. The graduate students are working on fixing this, so it is still unclear whether we will have usable data to complete our tasks.
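The string-vs-numeric mapping matters because Elasticsearch range queries only behave correctly on numeric fields. Below is a minimal sketch of the range query we were attempting, written as the JSON body it produces; the field name `in_bytes` is an assumption about the index mapping, not the confirmed name.

```python
import json

# Hypothetical field name -- the actual index mapping may differ.
# A range query like this only works once the bytes field is mapped
# as a numeric type (e.g. long), not as a string.
query = {
    "query": {
        "range": {
            "in_bytes": {"gte": 100_000_000}  # flows larger than 100M
        }
    },
    "sort": [{"in_bytes": {"order": "desc"}}],
}

print(json.dumps(query, indent=2))
```

If the field is mapped as a string, this query either errors out or compares lexicographically, which is why the 100M query returned nothing useful.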

We also learned how to use Elasticsearch through the command line, which is much more verbose than the regular Kibana interface. If we want to search and aggregate flows, the queries become very long lines.
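To illustrate why the command-line queries get so long, here is a sketch of an aggregation body of the kind we sent to the `_search` endpoint via curl. The index pattern and field names (`flows-*`, `src_ip`, `in_bytes`) are assumptions for illustration.

```python
import json

# Group flows by source IP and sum their bytes -- a typical aggregation.
# Field and index names here are assumed, not the real mapping.
body = {
    "size": 0,
    "aggs": {
        "top_sources": {
            "terms": {"field": "src_ip", "size": 10},
            "aggs": {"total_bytes": {"sum": {"field": "in_bytes"}}},
        }
    },
}

# Flattened into a single curl command line, even this small query
# is already a long one-liner:
curl_line = (
    "curl -s -XGET 'localhost:9200/flows-*/_search' "
    "-H 'Content-Type: application/json' -d '" + json.dumps(body) + "'"
)
print(len(curl_line))
```

Anything beyond a single aggregation quickly grows past what is comfortable to type by hand, which is why Kibana's query bar is usually easier.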

The data was ready Wednesday night: it was stitched on a daily basis, cleaned, and the variable types were corrected before it was imported into the ELK stack, so I spent Thursday morning completing all the tasks.

First, I used a Kibana query to search for large flows and displayed them by input bytes in descending order. Then I used the 5-tuple (source and destination IP addresses, source and destination ports, and protocol) to find the corresponding unstitched flows. The flow profile should visualize how the speed (bits per second, packets per second) of a single flow changes over time.
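The 5-tuple lookup step can be sketched as a simple filter: given a stitched flow's key, collect all unstitched records that share it and order them by start time. This is a hypothetical helper for illustration, not the actual query we ran.

```python
from typing import NamedTuple

class Flow(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str
    ts: float    # start time, epoch seconds
    nbytes: int

def match_five_tuple(flows, key):
    """Return the unstitched flows sharing the given 5-tuple, by start time."""
    return sorted(
        (f for f in flows
         if (f.src_ip, f.dst_ip, f.src_port, f.dst_port, f.proto) == key),
        key=lambda f: f.ts,
    )

# Toy data: two flows share the 5-tuple, one does not.
flows = [
    Flow("10.0.0.1", "10.0.0.2", 5001, 443, "TCP", 2.0, 500),
    Flow("10.0.0.1", "10.0.0.2", 5001, 443, "TCP", 1.0, 300),
    Flow("10.0.0.9", "10.0.0.2", 5001, 443, "TCP", 1.5, 900),
]
matched = match_five_tuple(flows, ("10.0.0.1", "10.0.0.2", 5001, 443, "TCP"))
```

The same filter expressed in Kibana is just five AND-ed field clauses, one per element of the tuple.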

I did this in two ways. First, I manually created charts by choosing a smaller flow (10 minutes long) composed of 21 unstitched flows; I used nfdump with an extended output format to get the speed information, and created the charts ordered by ascending timestamp.
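The per-flow speeds behind those charts are just the byte and packet counts divided by duration. A minimal sketch of that arithmetic (nfdump's extended format reports these values directly; this only shows the computation):

```python
def speeds(byte_count, packet_count, duration_s):
    """Bits per second and packets per second for one unstitched flow."""
    if duration_s <= 0:
        return 0.0, 0.0
    return byte_count * 8 / duration_s, packet_count / duration_s

bps, pps = speeds(byte_count=1000, packet_count=10, duration_s=2.0)
```

Plotting these pairs against each flow's start timestamp gives the flow profile described above.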

An issue we found during our presentation is that the unstitched flows have overlapping times, but if we separate them by TCP flag (one group is A, the other AP), we get continuous flows. In TCP, data is normally sent once the send buffer fills: A (ACK) marks the regular case, while AP (ACK plus PSH) means the sender pushed the data out before the buffer was full. In the future we need to figure out whether to count the flows with the AP flag or omit them.
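The flag-based separation can be sketched as two filters plus an overlap check. This is an illustrative helper with assumed record fields (`flg`, `start`, `end`), not the tooling we actually used.

```python
def split_by_flags(flows):
    """Separate unstitched flow records into A-only and AP groups."""
    a_only = [f for f in flows if f["flg"] == "A"]
    ap = [f for f in flows if f["flg"] == "AP"]
    return a_only, ap

def non_overlapping(flows):
    """True if consecutive flows (sorted by start time) never overlap."""
    ordered = sorted(flows, key=lambda f: f["start"])
    return all(a["end"] <= b["start"] for a, b in zip(ordered, ordered[1:]))

# Toy data reproducing the issue: the AP flow overlaps the A flows,
# but the A flows alone form a continuous, non-overlapping sequence.
mixed = [
    {"flg": "A",  "start": 0,  "end": 60},
    {"flg": "AP", "start": 30, "end": 90},
    {"flg": "A",  "start": 60, "end": 120},
]
a_only, ap = split_by_flags(mixed)
```

Running the overlap check on the mixed set fails, while the A-only group passes, which matches what we observed in the data.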

The other way was to export the flows to a CSV file, import it into the ELK stack, and use Kibana to visualize them. A problem I didn’t solve is that the nfdump CSV format does not include speed information, so the y-axis of the charts should be bytes/duration, but I didn’t figure out how to do that in Kibana. Instead I just used bytes; however, the flow I used is the biggest one and the durations of its unstitched flows are almost identical (59–60 s), so the charts can still show how the speed varies.
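One workaround would be to compute the speed column before importing the CSV, so Kibana can plot it directly. A minimal sketch, assuming nfdump's `td` (duration in seconds) and `ibyt` (input bytes) column names; the real export may name them differently.

```python
import csv
import io

def add_bps_column(csv_text):
    """Append a computed bits-per-second column (bytes * 8 / duration)
    to nfdump CSV rows, so the speed can be plotted directly.
    Column names 'td' and 'ibyt' are assumptions about the export."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for r in rows:
        dur = float(r["td"])
        r["bps"] = int(r["ibyt"]) * 8 / dur if dur > 0 else 0.0
    return rows

sample = "td,ibyt\n60,750000000\n"
rows = add_bps_column(sample)
```

Another option, avoiding preprocessing entirely, is a Kibana scripted field that divides the bytes field by the duration field at query time.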