Logging and Log Management

Log Data

Log data is the intrinsic meaning of a log message. In other words, log data is the information extracted from a log message that tells you why that log message was generated. For example, a web server typically logs whenever someone accesses a resource (image, file, etc.) on a web page. If a user accesses an authenticated page, the log message may include the username. This is an example of log data: you can use the username to determine who accessed a particular resource.
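
As a minimal sketch of that idea, the snippet below pulls the username and resource out of one access-log line. The line itself is made up, and the Common Log Format layout is an assumption; a real server may be configured to log a different format.

```python
import re

# Hypothetical access-log line in Common Log Format; every value is made up.
LINE = '203.0.113.7 - alice [10/Oct/2024:13:55:36 +0000] "GET /reports/q3.pdf HTTP/1.1" 200 2326'

# Fields: host, identity, user, [timestamp], "method path protocol", status, size
CLF = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def extract_log_data(line):
    """Return the fields of one access-log line as a dict, or None if it does not match."""
    match = CLF.match(line)
    if match is None:
        return None
    data = match.groupdict()
    # In this format "-" means the request carried no authenticated user.
    if data["user"] == "-":
        data["user"] = None
    return data

record = extract_log_data(LINE)
print(record["user"], "accessed", record["path"])
```

The message is the whole line; the log data is the part you act on, here the user and the path.
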
Log messages can generally be categorized into the following common types:

Debug

Debug messages are typically generated by software systems to assist developers in troubleshooting and identifying issues while the application code is running. At this log level, the printed information is more detailed and usually related to variables and system addresses. Examples include variable values used during execution, pointer addresses, and information related to logical processing flows in the source code.

Information

Information-level messages are intended to inform users and administrators that a normal, harmless event has occurred. At this level, the logged information often relates to system configuration or interactions between the Wi-Fi Mesh system and other network components. For example, when a STA (a station, that is, a Wi-Fi client device) connects to the system, this log level may be used to print information such as its MAC address, connection type, and the connection standards it supports.

Warning

Warning messages describe situations where something the system expects is missing or degraded, but the system can still operate. At this level, warnings usually capture events that occur during operation, such as physical port disconnects and reconnects, packet loss and retransmission, or, in a Wi-Fi Mesh system, IEEE 1905 devices leaving or rejoining the network.

Evaluating when to use this log level requires developers to understand the system’s entire operational flow. Although warnings do not immediately cause system failure, they often indicate early signs of larger issues that may arise. For example, a program missing some optional command-line arguments but still running normally may log the issue as a warning for users or operators.

Error

Error logs are used to report failures occurring at various layers within a computer system. For example, an operating system may generate an error log if it cannot synchronize the buffer with the disk. Unfortunately, many error messages only provide a starting point as to why the issue occurred. Further investigation is usually required to find the root cause.

Other examples include null pointers, out-of-range variable values, corrupted data, or behaviors that deviate from system design. All such events should be logged at the error level.

Alert

Alert messages indicate that something critical has occurred. In general, they relate to the survival or operational continuity of the system. For example, if initial configuration data for important daemons cannot be retrieved, the system will not operate correctly and must log an alert immediately. Developers typically do not expect many messages at this level.
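
The levels above can be sketched with Python's standard `logging` module. Two assumptions: Python has no ALERT level, so this sketch maps "alert" to `CRITICAL`, and the daemon name `meshd`, the function, and its parameters are hypothetical.

```python
import logging

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(name)s: %(message)s")
log = logging.getLogger("meshd")  # hypothetical daemon name

def handle_station(mac, rssi_dbm, config):
    """Process one station event and log at the level that matches its severity."""
    # Debug: variable values that help a developer follow the code path.
    log.debug("handle_station mac=%s rssi_dbm=%d", mac, rssi_dbm)

    # Alert (CRITICAL here, by assumption): the daemon cannot work at all.
    if config is None:
        log.critical("initial configuration unavailable; daemon cannot operate")
        return False

    # Information: a normal, harmless event.
    log.info("STA connected: mac=%s", mac)

    # Warning: something optional is missing, operation continues.
    if "band" not in config:
        log.warning("optional 'band' setting missing; using default")

    # Error: a value outside what the design allows.
    if rssi_dbm > 0:
        log.error("RSSI out of range: %d dBm", rssi_dbm)
        return False
    return True

handle_station("aa:bb:cc:dd:ee:ff", -52, {})
```

Raising the threshold in `basicConfig` to `logging.WARNING` would silence the debug and information lines without touching the call sites.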

How Log Data Is Transmitted and Collected

Conceptually, transmitting and collecting log data is straightforward. A computer or device implements a logging subsystem that can generate log messages whenever necessary. The exact conditions for generating logs depend on the device—some allow configuration, while others have predefined logging rules. In addition, there must be a central place to send and collect logs, typically called a *loghost*. A loghost is a Unix or Windows system where log messages from multiple devices are centrally collected.
Benefits of using a centralized logging system include:
  • A central location to store log messages from multiple sources
  • A location to store log backups
  • A place where log data can be analyzed
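
A rough sketch of the device-to-loghost path, using the standard library's `SysLogHandler` over UDP. The "loghost" here is only a loopback socket standing in for a real syslog daemon, and the host name `ap-01` and process name `meshd` are hypothetical.

```python
import logging
import logging.handlers
import socket

# Stand-in loghost: a UDP socket on the loopback interface, port chosen by the OS.
# A real loghost would be a syslog daemon, conventionally listening on UDP port 514.
loghost = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
loghost.bind(("127.0.0.1", 0))
loghost.settimeout(2.0)

# Device side: forward every record at INFO or above to the loghost.
handler = logging.handlers.SysLogHandler(address=loghost.getsockname())
handler.setFormatter(logging.Formatter("ap-01 meshd: %(levelname)s %(message)s"))
device_log = logging.getLogger("device")
device_log.setLevel(logging.INFO)
device_log.addHandler(handler)

device_log.warning("port eth1 link down")

# Loghost side: receive one datagram. The leading <N> is the syslog priority value;
# SysLogHandler appends a NUL byte by default, which is stripped here.
datagram, sender = loghost.recvfrom(4096)
received = datagram.decode("utf-8").rstrip("\x00")
print(received)

device_log.removeHandler(handler)
handler.close()
loghost.close()
```

UDP keeps the device side simple but gives no delivery guarantee, so a production setup may prefer TCP or a local forwarding agent.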

What Is a Log Message?

A log message is generated by a device or system to indicate that something has happened.
The typical structure of a log message includes:
  • Timestamp
  • Source
  • Data

Timestamp

This is the first piece of information printed when a log message is generated. It represents the exact moment when the system detected the event. It usually includes the year, month, day and time. Some issues may arise if the system clock is not yet synchronized—for example, when the device boots without Internet access—possibly leading to incorrect interpretations of when the issue occurred.
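
One possible way to make an unsynchronized clock visible in the log itself is to flag timestamps that cannot be real. This is only a sketch: the cutoff date and both function names are assumptions, and a device would normally pick a cutoff no later than its firmware build date.

```python
from datetime import datetime, timezone

# Assumption: any timestamp earlier than this can only come from an unset clock.
EARLIEST_PLAUSIBLE = datetime(2020, 1, 1, tzinfo=timezone.utc)

def clock_looks_synchronized(now):
    """Return True if `now` is not earlier than the plausibility cutoff."""
    return now >= EARLIEST_PLAUSIBLE

def format_timestamp(now=None):
    """Format a UTC timestamp and mark it when the clock is clearly not set."""
    if now is None:
        now = datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    if clock_looks_synchronized(now):
        return stamp
    return stamp + " (clock unsynchronized)"

# A device that boots without network time often starts counting from 1970.
print(format_timestamp(datetime(1970, 1, 1, 0, 0, 42, tzinfo=timezone.utc)))
print(format_timestamp())
```

The marker lets a reader tell "this happened 42 seconds after boot" apart from a genuine wall-clock time.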

Source

The source typically includes the name of the running process that printed the message, along with the log level. Some systems also include the function name and line number where the log was generated. These details are often useful during development but are usually removed in production deployments to reduce log volume and prevent leakage of internal source-code information.
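
The difference between a development and a production source field can be sketched with two `logging.Formatter` layouts. The exact layouts, the process name `meshd`, and the function name `on_associate` are illustrative assumptions, not a standard.

```python
import logging

# timestamp | source (process name, PID, level) | data
PROD_FORMAT = "%(asctime)s %(name)s[%(process)d] %(levelname)s: %(message)s"
# The development layout adds the function name and line number to the source part.
DEV_FORMAT = "%(asctime)s %(name)s[%(process)d] %(levelname)s %(funcName)s:%(lineno)d: %(message)s"

DATEFMT = "%Y-%m-%d %H:%M:%S"
prod = logging.Formatter(PROD_FORMAT, datefmt=DATEFMT)
dev = logging.Formatter(DEV_FORMAT, datefmt=DATEFMT)

# Build one record by hand so both layouts format exactly the same event.
record = logging.LogRecord(
    name="meshd", level=logging.INFO, pathname="station.py", lineno=42,
    msg="STA %s associated", args=("aa:bb:cc:dd:ee:ff",), exc_info=None,
    func="on_associate",
)

prod_line = prod.format(record)
dev_line = dev.format(record)
print(prod_line)
print(dev_line)
```

Switching layouts is a configuration change, so call sites stay the same between development and production builds.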

Data

Choosing what to print is extremely important: a message should be neither too terse to be useful nor so verbose that it buries the point. Developers should remove useless logs and ensure that what remains carries meaningful content. Each log level calls for different information depending on its purpose. For example, logging everything at the error level is counterproductive if you want to detect actual system failures. Proper logging principles, discussed later, must be followed to improve the quality of logs.

Log Retention Policy

A log retention policy establishes the foundation for determining storage requirements. The policy your organization adopts will guide decisions regarding storage type, size, cost, access speed, and deletion requirements.
The following aspects should be considered when creating a log retention policy:

Review Current Compliance Requirements

Many industries today enforce strict compliance requirements. For example, the Payment Card Industry Data Security Standard (PCI DSS) requires audit logs to be retained for at least one year, with the most recent three months immediately available (Requirement 10.7 in PCI DSS v3.x, renumbered 10.5.1 in v4.0), while North American Electric Reliability Corporation (NERC) regulations specify different retention durations for various log types. Some regulations require specific logs to be kept without specifying a retention time. These guidelines help establish minimum policy requirements.

Assess Organizational Risk

Internal and external risks determine log retention durations in different parts of your network. If logs are used to investigate insider threats, retention periods must be longer because such issues often go unnoticed for years. When eventually discovered, investigating them may require extensive historical data.

Consider Different Log Sources and Their Volumes

Firewalls, servers, databases, web proxies—each produces different log types, sizes, and volumes. For instance, logs from a core firewall may be extremely large and therefore only stored for 30 days unless compliance requirements mandate a longer duration. You may also receive logs from custom applications or unsupported operating systems for which analysis tools may be limited.

Review Available Storage Options

Log storage options include disk, DVD, WORM, tape, RDBMS, dedicated log archives, and cloud storage. Decisions depend on cost, capacity, and access speed. Importantly, stored logs must remain accessible within a reasonable time. Tape is inexpensive but notoriously slow for searching and may require manual handling. Media lifespan must also be considered—for example, seven years may be too long for cheap writable CDs or DVDs. Older storage formats may also become obsolete.
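
Once volumes and retention periods are known, the storage requirement is simple arithmetic: daily volume times retention days, summed over sources. The sketch below uses entirely hypothetical figures; substitute measured values.

```python
# All figures are hypothetical; replace them with measured values.
SOURCES = {
    # name: (average GB per day, retention in days)
    "core-firewall": (50.0, 30),
    "web-proxy": (8.0, 90),
    "database-audit": (2.0, 365),
}

def storage_needed_gb(sources, compression_ratio=1.0):
    """Steady-state storage: daily volume x retention days, summed over all sources."""
    raw = sum(gb_per_day * days for gb_per_day, days in sources.values())
    return raw / compression_ratio

total = storage_needed_gb(SOURCES)
compressed = storage_needed_gb(SOURCES, compression_ratio=10.0)
print(f"{total:.0f} GB uncompressed, {compressed:.0f} GB at an assumed 10:1 compression")
```

With these made-up numbers the firewall dominates (1,500 of 2,950 GB) despite its short retention, which is why high-volume sources are usually the first place to trim.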

Principles of Log Management

Collection Rule

Do not collect log data that you do not intend to use. Only collect logs for clearly defined purposes (troubleshooting, security analysis, etc.). Avoid the mindset of “collecting just in case.” The same applies to generating logs: do not create logs that have no use.
  • Collect logs only for clear reasons
  • Avoid “just in case” logging
  • Do not generate logs that you will never use

Retention Rule

Keep logs only as long as they remain useful, or longer if required by law.
  • Potential future usefulness
  • Legal or regulatory requirements
  • Do not store every log for years if unnecessary
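
A retention rule like this could be enforced with a small pruning job. The sketch assumes logs are plain files whose modification time reflects their age; the function names and the `keep` hook for legal holds are hypothetical.

```python
import os
import time

SECONDS_PER_DAY = 86400

def expired_logs(directory, max_age_days, now=None):
    """Return sorted paths of regular files last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    expired = []
    for entry in os.scandir(directory):
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            expired.append(entry.path)
    return sorted(expired)

def prune(directory, max_age_days, keep=lambda path: False):
    """Delete expired files unless keep(path) reports that they must be retained."""
    removed = []
    for path in expired_logs(directory, max_age_days):
        if not keep(path):
            os.remove(path)
            removed.append(path)
    return removed
```

A call such as `prune("/var/log/myapp", 30)` (path hypothetical) would remove files older than 30 days, while `keep` gives legal or regulatory requirements the last word.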

Monitoring Rule

Log as much as needed, but alert only when action is required. Log volume can reach petabytes, while the number of alerts a person can act on is small, perhaps a dozen a day. Avoid overwhelming monitoring teams with alerts that need no action.

The philosophy: **log everything, store important error logs, and monitor actionable problems.**
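
The split between logging and alerting can be sketched with two handlers on one logger. Treating `ERROR` and above as "actionable" is an assumption of this sketch, and `ListHandler` is a hypothetical in-memory stand-in for a log store or a paging system.

```python
import logging

class ListHandler(logging.Handler):
    """Keeps formatted records in memory; stands in for a log store or a pager."""

    def __init__(self, level):
        super().__init__(level=level)
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))

store = ListHandler(logging.DEBUG)   # everything is logged
alerts = ListHandler(logging.ERROR)  # assumption: ERROR and above need human action

log = logging.getLogger("monitoring-demo")
log.setLevel(logging.DEBUG)
log.propagate = False  # keep the demo's records out of the root logger
log.addHandler(store)
log.addHandler(alerts)

log.debug("retransmit count=%d", 3)
log.info("STA connected")
log.warning("packet loss above threshold")
log.error("cannot sync buffer to disk")

print(len(store.lines), "logged,", len(alerts.lines), "alerted")  # 4 logged, 1 alerted
```

The call sites never decide whether to alert; that policy lives in the handler thresholds and can be tuned without code changes.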

Security Rule

Do not invest more in protecting log data than you do in protecting critical business data. Logs are valuable, but they should not be more protected than trade secrets or sensitive business information.
  • Hash logs so that tampering can be detected
  • Apply reasonable access control
  • Avoid encrypting logs unless absolutely necessary
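
One common way to hash logs is a hash chain, where each digest also covers the previous one, so changing or dropping a line invalidates every digest after it. The sketch below uses SHA-256 from the standard library; the function names are hypothetical, and it assumes the digests are kept somewhere an attacker who can edit the log cannot reach.

```python
import hashlib

def chain_digests(lines):
    """Return one SHA-256 hex digest per line; each digest also covers the previous digest."""
    digests = []
    previous = ""
    for line in lines:
        previous = hashlib.sha256((previous + line).encode("utf-8")).hexdigest()
        digests.append(previous)
    return digests

def first_tampered_index(lines, digests):
    """Return the index of the first line that no longer matches, or None if all match."""
    for i, (expected, actual) in enumerate(zip(digests, chain_digests(lines))):
        if expected != actual:
            return i
    if len(lines) != len(digests):
        # Lines were appended or removed after the digests were recorded.
        return min(len(lines), len(digests))
    return None

original = ["user alice logged in", "config changed", "user alice logged out"]
recorded = chain_digests(original)

tampered = ["user alice logged in", "nothing happened", "user alice logged out"]
print(first_tampered_index(original, recorded))  # None
print(first_tampered_index(tampered, recorded))  # 1
```

Hashing detects changes rather than preventing them, which is why it pairs with the access-control bullet above.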

Continuous Change Rule

Log sources, log types, and log content constantly evolve due to changes in development and deployment environments.
  • Periodically review log policies and procedures
  • Update log collection and log-reading systems
  • Document log generation processes and maintain them
This post is licensed under CC BY 4.0 by the author.