Tallinn University of Technology

While more and more servers consume growing amounts of energy to store an ever-increasing volume of data, some people are working to collect and store only the information that is genuinely useful. Eduard Petlenkov is one of them, and he shares his thoughts and experience on the real value of data and numbers. In March, the “Sustainability Months” focus on digital cleanliness and e-waste, and together we are looking for ways to make more sustainable choices in our electronic lives.

What principles do you follow when managing data volume and digital space, both in your personal work and in the design of systems or solutions?

Eduard Petlenkov, photo: TalTech

My work is very closely related to data. Often a strange situation arises where there are many “numbers,” but very little real, meaningful data.

It is important to store the right data in the right volume. A simple example: if you store data about a room’s temperature, certain conclusions can be drawn from it. However, these data are of little use if the related information is missing: how many people were in the room, how the heating, cooling, and ventilation systems were operating, what the weather was like, and so on.

On the other hand, I have seen such data being stored at one-second intervals, which is not reasonable considering the actual dynamics of the process. The frequency of data storage must correspond to the speed of the specific process.
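
To illustrate the point about storage frequency, here is a minimal sketch of averaging one-second readings into coarser intervals before storing them. It is only an illustration: the `downsample` helper, the synthetic temperature values, and the five-minute interval are assumptions made for this example, not figures from the interview, and the right interval depends on how fast the specific process changes.

```python
from statistics import mean


def downsample(readings, interval_s):
    """Average (timestamp_s, value) readings into buckets of interval_s seconds."""
    buckets = {}
    for ts, value in readings:
        # Align each timestamp to the start of its bucket.
        bucket_start = (ts // interval_s) * interval_s
        buckets.setdefault(bucket_start, []).append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Hypothetical one-second room temperature readings over ten minutes,
# rising slowly from 21.0 degrees Celsius (synthetic data for illustration).
per_second = [(t, 21.0 + 0.001 * t) for t in range(600)]

# Assumed storage interval of 300 seconds (five minutes).
per_five_minutes = downsample(per_second, 300)

print(len(per_second), "readings reduced to", len(per_five_minutes))
```

In this synthetic case, 600 stored values shrink to 2 while the slow trend is still visible. A faster process would call for a shorter interval.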

What have you gained by preferring smaller data volumes and thoughtful storage?

The problem is not only that processing an unjustifiably large volume of data takes a lot of time and resources. Excess data also interferes with the substance of the analysis and reduces the quality of the results.

Systems based on machine learning and artificial intelligence may start learning from noise and making decisions based on irrelevant features. As a result, the reliability and trustworthiness of the systems suffer. It is the same with people: the easiest way to hide important information is to mix it with a huge amount of irrelevant news.

Data accumulation is often justified with the idea that “it might be useful someday.” How do you decide what is worth storing and what is not?

There is some truth in that line of thinking: some data may indeed become useful in the future. For example, the transition to a 15-minute energy market immediately created a need to model processes more precisely and react faster. This, in turn, means that higher-frequency data are needed.

At the same time, such changes do not happen overnight. Research and development must look ahead. When collecting data, conscious choices must be made, considering what technology will require and enable within the next 2–3 years.

What one principle or practical recommendation would you give to those who deal with growing data volumes simply by buying more cloud space?

First, technological capabilities should be created to measure and collect the necessary data at the right time and in the right volume—for example, Internet of Things solutions, fast communication channels, and real-time data processing.

Increasing data volume should not be a goal in itself; the goal should be the availability of high-quality data.