Network Incident Management
Incident management is the process of minimizing the overall impact of an incident by restoring full functionality as quickly as possible. From a network standpoint, an incident can be an unforeseen network disruption, an inconsistency in the quality of service (like fluctuating bandwidth), or an event that may impact service to the user or customer in the future.
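As an illustration of the quality-of-service case mentioned above, the following is a minimal sketch of one way fluctuating bandwidth could be flagged. It assumes throughput is sampled periodically in Mbps and treats a high coefficient of variation as a sign of instability. The function name `bandwidth_is_fluctuating` and the `max_cv` threshold of 0.25 are hypothetical choices for this example, not values taken from any standard or tool.

```python
from statistics import mean, pstdev


def bandwidth_is_fluctuating(samples_mbps, max_cv=0.25):
    """Return True when throughput samples vary more than max_cv allows.

    max_cv is a hypothetical threshold on the coefficient of variation
    (standard deviation divided by mean); tune it for the link in question.
    """
    if len(samples_mbps) < 2:
        return False  # not enough data to judge variability
    avg = mean(samples_mbps)
    if avg == 0:
        return True  # no throughput at all: treat as a disruption
    return pstdev(samples_mbps) / avg > max_cv


steady = [95, 100, 98, 102, 99]
unstable = [100, 20, 90, 15, 95]
print(bandwidth_is_fluctuating(steady))
print(bandwidth_is_fluctuating(unstable))
```

A relative measure is used here so that the same threshold can apply to links of different capacity; an absolute floor on throughput would be a reasonable alternative.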
Types of Incidents
Hardware
Network devices can fail outright, slow down, or suffer intermittent outages. Critical hardware such as servers, CPUs, routers, monitors, and printers is prone to these failures.
Software
Incidents caused by software bugs, misconfigurations, or compatibility issues can lead to service disruptions or abnormal behavior in network applications or services.
Security
Security incidents are active or potential threats to the network that can lead to a data breach and compromise the entire infrastructure.
Network
At the network level, incidents can involve protocols, critical network devices, or other infrastructure components that are integral to normal network functioning. Examples include incidents affecting DHCP, VPNs, IP addressing, and DNS.
Infrastructure Failure
This includes failures or malfunctions in network devices such as routers, switches, firewalls, or servers, which can result in connectivity issues or service degradation.
Human Error
Mistakes or misconfigurations made by network administrators or users can result in incidents such as accidental service outages, data loss, or security vulnerabilities.
Network Incident Classification
L1 (Level 1) incident
L1 incidents occur in high volumes but can be resolved quickly. IT operations personnel choose to automate the majority of L1 tasks so they can focus on resolving more critical incidents.
L2 (Level 2) incident
L2 incidents are more complex issues that can disrupt the network and impede its smooth functioning. They therefore require the involvement of skilled staff with specific knowledge in the area.
L3 (Level 3) incident
L3 incidents are issues that affect the network on a larger scale. Major incidents like these are rare, but when they occur they can cause severe damage to the infrastructure. Resolving them takes expertise and coordination, so they need the attention of personnel with significant specialization in the area.
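The three levels can be sketched as a simple triage routine. This is a minimal illustration, not a prescribed method: the text above does not define exact criteria, so the `Incident` fields, the `wide_scale_threshold` value, and the handler names are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    L1 = 1  # high volume, quickly resolvable, often automated
    L2 = 2  # more complex, needs skilled staff
    L3 = 3  # large scale, needs specialists and coordination


@dataclass
class Incident:
    summary: str
    affected_devices: int  # hypothetical measure of scale
    has_known_fix: bool    # hypothetical flag: a routine fix exists


def classify(incident, wide_scale_threshold=50):
    """Assign a level using assumed criteria (scale first, then routine fix)."""
    if incident.affected_devices >= wide_scale_threshold:
        return Level.L3
    if incident.has_known_fix:
        return Level.L1
    return Level.L2


# Hypothetical routing table mirroring the handling described for each level.
HANDLERS = {
    Level.L1: "automated runbook",
    Level.L2: "skilled staff",
    Level.L3: "specialist team",
}

incident = Incident("DHCP lease pool exhausted on one VLAN", 12, False)
level = classify(incident)
print(level.name, "->", HANDLERS[level])
```

Scale is checked first so that a widespread incident is escalated to L3 even when a routine fix exists; a real deployment would likely weigh more signals, such as business impact and duration.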