
Research checkpoint #06: checking the performance of a Raspberry Pi handling ML models

In the last post, I proposed a division of the network structure that will help guide us through what we want to study in this project. In this first post after that one, I decided to run simple tests using the baseline models proposed in post 02 to verify the performance of a Raspberry Pi, with the intention of simulating an IoT device. This is heavier work than we plan to assign to the IoT devices, since they will only perform inference on the packets and won’t participate in the training phase; even so, these tests give us a hint of what to expect from the performance of this type of computer.

Testing the baseline models on a Raspberry Pi

In post 02 I proposed the use of two baseline models: logistic regression and Hoeffding Tree, inspired by João Gabriel Josephik’s work. At that time, I hadn’t taken a close look at the CIC-IoT-2023 dataset, which we now plan to use, so I decided to test these models on a simpler dataset: the Pima Indians Diabetes Database. To verify the performance of a device close to an IoT one, I trained and tested these models on a Raspberry Pi with the following specifications:

  • Raspberry Pi 3 Model B
    • Quad Core 1.2GHz Broadcom BCM2837 64bit CPU
    • 1GB RAM
    • Debian GNU/Linux 12 (bookworm) OS

For comparison, I also ran the tests on my own notebook, which has these specifications:

  • Acer Aspire 3 A315-41-R4RB
    • AMD Ryzen 5 2500U 2.0GHz 64bit CPU
    • 12GB DDR4 2667 MHz RAM
    • Fedora Linux 41 (Silverblue) OS

To measure the time I used the time command with its default arguments and took note of the real (wall-clock) elapsed time. To measure peak memory usage I used the resource module from Python, which can report the maximum resident set size used by the process, that is, “[…] the maximum number of kilobytes of physical memory that processes used simultaneously”, according to the manual. All the tests were run 10 times, and the values reported below are the average of all measurements taken.
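For reference, here is a minimal sketch of how this measurement can be attached to the end of each script; the function name is my own, and the actual scripts may differ slightly:

```python
import resource

def report_peak_memory():
    # ru_maxrss is the maximum resident set size of this process;
    # on Linux it is reported in kilobytes.
    peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print(f"Peak resident set size: {peak_kb}KB (~{peak_kb // 1024}MB)")

# Called once at the end of the script, after training and testing the model.
report_peak_memory()
```

The elapsed time comes from running each script under time, e.g. time python3 logistic_regression.py (the script name here is just illustrative), and noting the “real” value.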

For the first test, I converted this Jupyter Notebook from post 02 to a Python script to simplify the measurements and to avoid the need for a browser (which could affect the performance of the tests). The results of running a logistic regression model with the Pima Indians Diabetes dataset are in the table below:

Device    | Average time | Peak of memory use
----------|--------------|-------------------
Raspberry | 6.0306s      | 142597KB (~142MB)
Notebook  | 3.8677s      | 140877KB (~140MB)

Table 01: average time and peak of memory use running a logistic regression model with the Pima Indians Diabetes dataset
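For context, here is a minimal sketch of roughly what the converted logistic regression script does, assuming scikit-learn and pandas; the file name diabetes.csv and the Outcome column follow the usual layout of the Pima Indians Diabetes CSV and may not match the actual script exactly:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the Pima Indians Diabetes dataset: 768 rows, 8 features + "Outcome" label.
df = pd.read_csv("diabetes.csv")
X = df.drop(columns=["Outcome"])
y = df["Outcome"]

# Classic train/test split followed by a single training pass.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```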

Similarly, I also converted this Jupyter Notebook to a Python script to run a Hoeffding Tree model with the Pima Indians Diabetes dataset, obtaining the results below:

Device    | Average time | Peak of memory use
----------|--------------|-------------------
Raspberry | 7.8549s      | 144902KB (~144MB)
Notebook  | 4.3849s      | 143285KB (~143MB)

Table 02: average time and peak of memory use running a Hoeffding Tree model with the Pima Indians Diabetes dataset
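And a minimal sketch of the Hoeffding Tree counterpart, assuming the river library; it follows the test-then-train style of stream learning, with the same hypothetical file and column names as above:

```python
import pandas as pd
from river import metrics, tree

df = pd.read_csv("diabetes.csv")

model = tree.HoeffdingTreeClassifier()
metric = metrics.Accuracy()

# Feed the rows one by one: predict first, then learn from the true label.
for _, row in df.iterrows():
    x = row.drop("Outcome").to_dict()
    y = int(row["Outcome"])
    y_pred = model.predict_one(x)
    if y_pred is not None:
        metric.update(y, y_pred)
    model.learn_one(x, y)

print("Prequential accuracy:", metric)
```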

For a dataset that has only 768 entries and 8 features (plus the label), I was surprised by the memory usage in both tests above. Notice that the .csv file used is only ~24KB (0.024MB), substantially smaller than the observed numbers. The average execution times, however, were good, but it’s still important to consider that this is a really simple dataset, and we aim to perform inference on a network that is continually feeding data to our device. As a next step (listed at the end of this post), I still plan to test this Raspberry sniffing the network, but for now the results above already hint that this device is very limited.

Testing with the CIC-IoT-2023 dataset

The CIC-IoT-2023 dataset repository also provides a Jupyter Notebook with an example of how the data can be used to train a logistic regression model. As we are using this model as a baseline, I decided to convert this notebook to a Python script (which can be found here) and, with some adjustments to make the code easier to read and modify (and also fixing some errors that appeared in the process), I ran it with the first four of the MERGED_CSV files made available by the dataset authors, which are the result of all the processing described in post 03. Together these files add up to 561MB, an average of about 140MB each.

Device    | Average time | Peak of memory use
----------|--------------|--------------------
Raspberry | -            | -
Notebook  | 1m3.5792s    | 1418037KB (~1.41GB)

Table 03: average time and peak of memory use running a logistic regression model with the CIC-IoT-2023 dataset
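For reference, a rough sketch of the binary (2-class) experiment; the file naming pattern, the label column and the BenignTraffic value are assumptions about the MERGED_CSV layout, so the actual converted script certainly differs in the details:

```python
import glob

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the first four merged CSV files (hypothetical naming pattern).
files = sorted(glob.glob("MERGED_CSV/*.csv"))[:4]
df = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)

# Collapse the 34 attack labels into benign (0) / malign (1).
y = (df["label"] != "BenignTraffic").astype(int)
X = df.drop(columns=["label"])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scale the features before fitting the logistic regression.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```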

While conducting these tests, a problem was observed: the Raspberry wasn’t capable of handling all of this data. Even using just one file, the memory required to run the model was well above the memory available on the Raspberry (which has only 1GB for the whole system). Trying to run it simply made the device freeze, making it impossible to measure the performance in this test case.

I also modified the script above to use the Hoeffding Tree model, to get some hints about this model’s performance on the dataset we aim to use in our project. The Python script for this test can be found here.

Device    | Average time | Peak of memory use
----------|--------------|--------------------
Raspberry | -            | -
Notebook  | 32m30.2128s  | 1865733KB (~1.86GB)

Table 04: average time and peak of memory use running a Hoeffding Tree model with the CIC-IoT-2023 dataset

The same problem happened with the Raspberry. But leaving this aside for a moment, notice how the average time exploded in this test case! For the same 561MB of data, this model took almost 30x the average time required by the logistic regression one, suggesting that its performance is significantly lower. It’s hard to make such an assumption without more tests, considering that the structures of the models are very different (the Hoeffding Tree is designed for stream learning, not following the idea of separate train and test phases), but this is something to keep in mind. Our goal is to work with neural networks at some point; how will they perform?
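To illustrate the stream learning point, here is a rough sketch of how the Hoeffding Tree can consume the merged CSVs incrementally, one sample at a time, instead of holding a full train/test split in memory (same hypothetical file pattern and label assumptions as in the sketch above):

```python
import glob

import pandas as pd
from river import metrics, tree

model = tree.HoeffdingTreeClassifier()
metric = metrics.Accuracy()

for path in sorted(glob.glob("MERGED_CSV/*.csv"))[:4]:
    # Read in chunks so a whole file never needs to fit in memory at once.
    for chunk in pd.read_csv(path, chunksize=10_000):
        labels = (chunk["label"] != "BenignTraffic").astype(int)
        features = chunk.drop(columns=["label"])
        for x, y in zip(features.to_dict(orient="records"), labels):
            y_pred = model.predict_one(x)  # test-then-train: predict first...
            if y_pred is not None:
                metric.update(y, y_pred)
            model.learn_one(x, y)          # ...then update the tree

print("Prequential accuracy:", metric)
```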

A final observation: the example Jupyter Notebook provided by the authors of the dataset has implementations to perform inference on 34 classes (all the different attacks), on 8 classes (only the different types of attack) and on 2 classes (benign and malign traffic). As we are only concerned with distinguishing benign from malign traffic, I used only the last implementation for the tests above. However, I noticed that the performance of this implementation was much better than the one achieved with more classes: with 34 classes the logistic regression implementation resulted in 75% accuracy, while with 2 classes it resulted in 98% accuracy.

Taking a closer look at the dataset, I think there is one major topic of concern related to these metrics: the proportion of benign and malign data. In the four files I used for the tests, discarding a few problematic entries, there are 66392 lines classified as benign observations and 2768346 classified as malign, that is, only about 2.34% of the data is classified as benign. This is an issue also faced in the thesis published by Kétly Gonçalves, which was cited in the last post. It’s natural to observe more malign data in a dataset designed to provide data related to network attacks, since DDoS attacks are based on packet volume, for example, but on a normal day the majority of the traffic that an IDS will handle is benign, so this data imbalance could become a real problem in the future.
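Just to make the imbalance concrete, the benign share can be computed directly from the counts above:

```python
# Counts taken from the four merged files used in the tests.
benign, malign = 66392, 2768346
print(f"Benign share: {benign / (benign + malign):.2%}")  # ~2.34%
```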

Next steps

Based on the division of the data processing proposed in the last post, the next steps for this project are:

  • Packet Parsing
    • Conduct experiments to verify the performance of Python libraries sniffing the network and processing the packets on a Raspberry Pi (different from what we did in post 4, in which we tested the libraries with the CIC-IoT-2023 dataset and on a regular computer); a minimal sketch of what this could look like appears after this list.
  • Inference
    • Run the baseline models proposed in post 02 on a Raspberry Pi to verify its performance.
  • Training (centralized IDS)
    • TBD.
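As a preview of the packet parsing step, this is a minimal sketch of what sniffing with a Python library on the Raspberry could look like, assuming Scapy; the interface name and the packet handling are placeholders, not the actual test code:

```python
# Minimal Scapy sniffing sketch (capturing packets requires root privileges).
from scapy.all import IP, sniff

def handle(pkt):
    # Placeholder processing: print source/destination of IP packets.
    if IP in pkt:
        print(pkt[IP].src, "->", pkt[IP].dst, len(pkt), "bytes")

# Capture 100 packets on a hypothetical interface and process each one.
sniff(iface="wlan0", count=100, prn=handle, store=False)
```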

This post was made as a record of the progress of the research project “DDoS Detection in the Internet of Things using Machine Learning Methods with Retraining”, supervised by professor Daniel Batista, BCC - IME - USP. Project supported by the São Paulo Research Foundation (FAPESP), process nº 2024/10240-3. The opinions, hypotheses and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of FAPESP.

This post is licensed under CC BY 4.0 by the author.