Edition 1.0.0
Updated October 9, 2023
Incident response is a well-established practice in the technology space, and much has been written about it. This introduction gives you a high-level overview of how incident response processes work and the typical actions and considerations associated with each stage.
The first thing to note is that, for the most part, incident response is not linear. It is a triggered process that loops between a number of stages until all evidence and impact of the incident have been resolved.
Figure: The stages of incident response.
The process itself is typically made up of four stages of action:
Identification
Verification
Containment
Remediation
The Identification stage begins when an incident is identified via one of the organization's identified information sources. This information is passed to a first-line responder, who triggers the incident response plan.
Task | Owner | Output |
---|---|---|
Initiate logging and timeline. Start the record for the incident. Note the nature and content of information received/identified in the Security Channel. | Initial Responder | Documented audit trail in the security channel |
Verify the information source. Where the information leading to the incident acknowledgment was received from outside the organization, it is important to review the source for credibility, agenda, and risk. | Initial Responder | Verification activities and findings noted in the security channel |
High-level triage. Before an incident can be confirmed, a basic assessment should be made. This aims to eliminate known false positives and confirm reported or suspected issues. Triage will vary by incident type. | Initial Responder, Support | Triage notes in security channel |
Initiate incident response. Create a channel for the incident within Slack. Notify the Security channel of this new channel and ask that the conversation be moved to the incident-specific space. | Incident Responder, Incident Lead | Creation of a new incident-specific document or communications log. |
(Optional) Activate on-call. If the incident has occurred outside of normal working hours, the on-call system should be used to contact and activate on-call staff. | Incident Responder, Incident Lead | On-call staff available to respond |
Allocate roles. Assign incident lead, deputy, and communications lead roles. Notify other named parties with incident responsibilities (see Roles and Responsibilities). | Incident Lead | List of allocated roles and contact details in the incident security channel. |
Classify the incident. Using the classification guidance in this document, classify the issue and peer review this decision with another member of the incident response team. | Incident Lead | Incident classification made and documented in the incident security channel. |
(High severity or above) Executive briefing. Where an incident is of high severity or highly public in nature, a brief should be given to the executive team. They may have questions or concerns that should be addressed. The communications lead or executive liaison should act as the ongoing mediator with this group. | Incident Lead, Comms Lead | A concise executive summary of the incident and its status delivered to the executive team and stored in the incident specific security channel. |
Incident response briefing. The initial responder briefs the incident team and answers any initial questions. This marks the end of the active responsibility for the initial responder (unless they have been assigned the lead or deputy role). | All Incident Team | Meeting held with the incident team. Team briefed and, if appropriate, the initial responder relieved of duty. Minutes of meeting documented in incident specific security channel |
(Optional) Update public status page. If the incident directly affects customers or public-facing systems, an appropriate update should be made on the status page or other update channels. External messages should be QA’d by the incident lead and a member of senior leadership. | Comms Lead, Incident Lead | Update to status page mechanisms where appropriate. |
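The conditional rows in the table above (source verification, on-call, executive briefing, status page) can be expressed as a checklist that is rebuilt as facts emerge during triage. The Python sketch below is a hypothetical illustration only: the `IncidentContext` fields and the four-level severity scale are assumptions, not part of this plan, so substitute the scale from your own classification guidance.

```python
from dataclasses import dataclass

# Assumed severity scale; replace with the scale from your classification guidance.
SEVERITY_ORDER = ("low", "medium", "high", "critical")


@dataclass(frozen=True)
class IncidentContext:
    """Facts known after initial triage (hypothetical field names)."""
    severity: str
    external_source: bool        # report came from outside the organization
    outside_working_hours: bool
    highly_public: bool
    customer_facing: bool


def identification_tasks(ctx: IncidentContext) -> list[tuple[str, str]]:
    """Return (task, owner) pairs for the Identification stage, in table order."""
    high_or_above = SEVERITY_ORDER.index(ctx.severity) >= SEVERITY_ORDER.index("high")
    tasks = [("Initiate logging and timeline", "Initial Responder")]
    if ctx.external_source:
        tasks.append(("Verify information source", "Initial Responder"))
    tasks += [
        ("High-level triage", "Initial Responder, Support"),
        ("Initiate incident response", "Incident Responder, Incident Lead"),
    ]
    if ctx.outside_working_hours:
        tasks.append(("Activate on-call", "Incident Responder, Incident Lead"))
    tasks += [
        ("Allocate roles", "Incident Lead"),
        ("Classify the incident (peer reviewed)", "Incident Lead"),
    ]
    if high_or_above or ctx.highly_public:
        tasks.append(("Executive briefing", "Incident Lead, Comms Lead"))
    tasks.append(("Incident response briefing", "All Incident Team"))
    if ctx.customer_facing:
        tasks.append(("Update public status page", "Comms Lead, Incident Lead"))
    return tasks


# Example: a high-severity, customer-facing incident reported internally at night.
ctx = IncidentContext(severity="high", external_source=False,
                      outside_working_hours=True, highly_public=False,
                      customer_facing=True)
for task, owner in identification_tasks(ctx):
    print(f"{task:<40} {owner}")
```

Because severity is only known after classification, the list is meant to be recomputed as the context changes rather than generated once.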
Before an incident is treated as a confirmed issue, the accuracy and extent of the reported issues must be verified. This stage of incident response focuses on confirming the issue and clarifying its scope: the extent to which it affects your company, its systems, and users.
Verification includes identifying the issue across multiple data sources and reproducing any suspicious behavior in a controlled manner (by organizational staff or on organizational equipment). Even if verification shows the incident to be a false alarm or inaccurate, it should still be documented.
Task | Owner | Output |
---|---|---|
Identify affected customers and systems. It is crucial that the extent of the incident is understood and recorded. Where appropriate, this should include a breakdown of affected customers or systems/hosts at risk. | Incident Lead, Deputy | List of affected systems or customers in incident specific security channel |
Access and monitor all logs for the affected accounts or systems. (Optional) Where relevant or appropriate, increase logging levels to ensure sufficient granularity. | Deputy | Updates and findings in incident specific security channel |
Establish a timeline of events. Record all findings and investigative paths in the Incident Security Channel. | Scribe | Updates and findings in incident specific security channel |
Reproduce the issue in a non-production environment. Issues caused by specific bugs or actions must be reproduced, tested, and documented. | Deputy | Updates and findings in incident specific security channel |
Identify other potential issue areas. Where an issue is caused by a specific bug or action, extend testing to all associated use cases or similar interaction points where possible. | Deputy | Updates and findings in incident specific security channel |
Investigate the root cause or sequence of events leading to the incident. Where time allows, ensure that the issue being investigated is the root cause and not a side effect of another, more serious issue. This will require cross-log investigation and timeline analysis. | Deputy | Updates and findings in incident specific security channel |
Confirm the issue across account types, geographic locations, etc. (the scope of the incident). It is crucial that the full scope or extent of the issue is understood. For public-facing platform or system issues, this includes checking across privilege levels and geographic regions. Test assumptions and systems from both inside and outside organizational networks to avoid testing environment bias. | Deputy | Updates and findings in incident specific security channel |
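Establishing a timeline and doing cross-log analysis usually means merging events from several sources whose timestamps may use different time zones. Below is a minimal Python sketch, assuming events have already been parsed into timezone-aware datetimes; the `Event` type, source names, and example events are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from itertools import chain


@dataclass(frozen=True)
class Event:
    timestamp: datetime  # must be timezone-aware
    source: str          # hypothetical source names, e.g. "auth-log", "waf"
    message: str


def build_timeline(*sources: list[Event]) -> list[Event]:
    """Merge events from several log sources into one chronological timeline."""
    events = list(chain.from_iterable(sources))
    for e in events:
        if e.timestamp.tzinfo is None:
            raise ValueError(f"naive timestamp from {e.source}: {e.message!r}")
    # Aware datetimes compare by absolute time, so mixed offsets sort correctly.
    # sorted() is stable: simultaneous events keep the order of the sources given.
    return sorted(events, key=lambda e: e.timestamp)


utc_plus_1 = timezone(timedelta(hours=1))
auth = [Event(datetime(2023, 10, 9, 14, 2, tzinfo=timezone.utc), "auth-log",
              "5 failed logins for user jdoe")]
# 14:30 at UTC+1 is 13:30 UTC, so this WAF event sorts first.
waf = [Event(datetime(2023, 10, 9, 14, 30, tzinfo=utc_plus_1), "waf",
             "blocked request from 198.51.100.7")]
app = [Event(datetime(2023, 10, 9, 14, 5, tzinfo=timezone.utc), "app",
             "password reset requested for jdoe")]

for e in build_timeline(auth, waf, app):
    print(e.timestamp.astimezone(timezone.utc).isoformat(), e.source, e.message)
```

Rejecting naive timestamps is a deliberate choice: guessing a source's time zone is an easy way to produce a misleading timeline.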
Once identified and confirmed, the issue should be contained such that its impact on your systems and customers can be limited. Where possible, affected systems should be isolated from healthy systems. This may include preventative account suspension, removal from networks, or password reset activities if an account has been compromised.
All containment activities should be documented as part of the incident log, and the implications of that containment communicated to affected stakeholders.
Danger: Containment steps are very specific to the individual incident and scenario type. The following are generic steps that should be used as a guideline, not as a comprehensive and complete approach.
Task | Owner | Output |
---|---|---|
Initiate customer contact. Where customers are affected, directly contact each customer. Contact should aim to reassure and acknowledge rather than provide technical detail. Required actions must be well tested inside the organization before external communications are sent. | Comms Lead | Customer contact drafts and actual messages |
Isolate compromised host(s). Where a host is assumed compromised, remove it from the network wherever possible or lock down ingress and egress to a single controlled IP. Avoid powering down or restarting the host until an image or snapshot can be made. | Incident Lead | List of compromised hosts plus results of checks confirming the isolation is successful |
Suspend compromised account(s). Where an account has (or is suspected to have) been compromised, it should be suspended. Suspension should aim to preserve all access or event logs for the account. Where the account is central to core operations, this should be reflected in the incident severity and classification. A decision must be made as to whether the account can be suspended safely without disrupting availability. | Incident Lead | Suspended account list and access to the relevant access and event logs for said accounts |
Seize relevant hardware or equipment. Where hardware such as a laptop is believed to be the cause of or affected by an incident, it should be taken by the incident team for investigation and eventual remediation. Temporary clean devices may be issued as an interim solution; however, these should provide only the minimum needed to get the job done and be replaced once the incident is resolved. | Incident Lead | Seized hardware list including asset tag and assigned owner |
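For the host isolation step, one option on a Linux host that uses iptables is to allow traffic only to and from the single controlled IP. The Python sketch below only builds the commands for review and logging; it does not run them. It assumes IPv4 and the iptables filter table; IPv6 (ip6tables), nftables-only hosts, and firewall managers such as firewalld or ufw (which may rewrite rules on reload) are not handled. `203.0.113.10` is a documentation address standing in for your controlled IP.

```python
import ipaddress
import shlex


def isolation_commands(controlled_ip: str) -> list[list[str]]:
    """Build (but do not run) iptables commands isolating a host to one IPv4 peer."""
    ip = str(ipaddress.IPv4Address(controlled_ip))  # raises ValueError if malformed
    commands = []
    for chain, iface_flag, addr_flag in (("INPUT", "-i", "-s"), ("OUTPUT", "-o", "-d")):
        commands += [
            # Rules are inserted at the top of the chain, so existing rules stay in
            # place (useful as evidence) but are never reached once DROP is added.
            ["iptables", "-I", chain, "1", addr_flag, ip, "-j", "ACCEPT"],
            ["iptables", "-I", chain, "1", iface_flag, "lo", "-j", "ACCEPT"],
            # Position 3 is directly below the two ACCEPT rules inserted above.
            ["iptables", "-I", chain, "3", "-j", "DROP"],
        ]
    # An isolated host should not route traffic for anything else.
    commands.append(["iptables", "-I", "FORWARD", "1", "-j", "DROP"])
    return commands


# 203.0.113.10 is a documentation address standing in for the controlled IP.
for cmd in isolation_commands("203.0.113.10"):
    print(shlex.join(cmd))
```

The ACCEPT rules go in before each DROP rule, so an administrative session from the controlled IP is not cut off part-way through. After applying the commands, `iptables -S` lists the active rules, and connection attempts from a host other than the controlled IP can confirm that isolation is working. The rules are not persistent, which fits the advice to avoid restarting the host.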
Once contained, the issue must be remediated. This stage may vary in length and complexity based on the incident. If dealing with a security issue or an issue involving complex or legacy systems, consultation with domain experts is strongly recommended.
Changes made during the remediation phase should be undertaken in a controlled and documented manner, ensuring that each change is tested before the next is applied. Chaotic or uncontrolled changes increase the likelihood of introducing additional issues into the system or hiding potentially simple solutions.
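The discipline described above (apply one change, test it, and only then move on) can be sketched as a small loop. This is a hypothetical Python illustration: the `Change` type, the callables, and the settings dictionary standing in for a real system are all assumptions, and rollback of a failed change is deliberately left out.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Change:
    description: str
    apply: Callable[[], None]
    test: Callable[[], bool]  # True if the system behaves as expected afterwards


def apply_changes(changes: list[Change], log: list[str]) -> bool:
    """Apply changes one at a time, testing each before the next is applied.

    Stops at the first failed test so later changes cannot hide the problem.
    Rollback is intentionally not handled in this sketch.
    """
    for i, change in enumerate(changes, start=1):
        log.append(f"[{i}/{len(changes)}] applying: {change.description}")
        change.apply()
        if not change.test():
            log.append(f"[{i}/{len(changes)}] test failed, stopping: {change.description}")
            return False
        log.append(f"[{i}/{len(changes)}] test passed")
    return True


# Hypothetical settings dict standing in for a real system.
settings = {"min_tls_version": "1.0", "debug_mode": True}
changes = [
    Change("raise minimum TLS version",
           lambda: settings.update(min_tls_version="1.2"),
           lambda: settings["min_tls_version"] == "1.2"),
    Change("disable debug mode",
           lambda: settings.update(debug_mode=False),
           lambda: settings["debug_mode"] is False),
]
log: list[str] = []
print(apply_changes(changes, log))
print("\n".join(log))
```

The log lines double as the documented record of each change, which can be copied into the incident channel.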
Remediation can only be deemed successful once the verification step has been repeated and end-to-end tests have been conducted. For vulnerabilities outside of your company’s control, this might include following security news feeds, running available check tools, and increasing monitoring for the duration of the issue.
Verification, containment, and remediation will continue as a repeating loop until all the issues have been addressed and system behavior has returned to normal.
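This loop can be modeled as a small state machine. In the Python sketch below (the `Stage` enum and `next_stage` function are illustrative, not part of any tool), remediation always returns to verification, and verification either sends a still-present issue back to containment or closes the incident. On the first pass, closing means a false alarm, which should still be documented.

```python
from enum import Enum, auto


class Stage(Enum):
    IDENTIFICATION = auto()
    VERIFICATION = auto()
    CONTAINMENT = auto()
    REMEDIATION = auto()
    CLOSED = auto()


def next_stage(stage: Stage, issue_present: bool = False) -> Stage:
    """Return the stage after `stage`; `issue_present` is only used when leaving VERIFICATION."""
    if stage is Stage.IDENTIFICATION:
        return Stage.VERIFICATION
    if stage is Stage.VERIFICATION:
        # A confirmed or remaining issue goes (back) to containment. Otherwise the
        # incident closes: a false alarm on the first pass, resolved on later passes.
        return Stage.CONTAINMENT if issue_present else Stage.CLOSED
    if stage is Stage.CONTAINMENT:
        return Stage.REMEDIATION
    if stage is Stage.REMEDIATION:
        # Remediation only counts as successful once verification is repeated.
        return Stage.VERIFICATION
    raise ValueError(f"{stage.name} has no next stage")


# Hypothetical outcomes of three verification passes.
findings = iter([True, True, False])
stage, path = Stage.IDENTIFICATION, []
while stage is not Stage.CLOSED:
    path.append(stage.name)
    stage = next_stage(stage, next(findings) if stage is Stage.VERIFICATION else False)
print(" -> ".join(path + [stage.name]))
```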
Danger: Remediation steps are very specific to the individual incident and scenario type. The following are generic steps that should be used as a guideline, not as a comprehensive and complete approach. As always, if you are unsure how to proceed or don’t have the skills in your team, reach out to professionals for help. Companies specializing in incident response and forensics will have the skills and experience you need to respond.
Task | Owner | Output |
---|---|---|
Patching and systems updates. Where applicable, apply vendor patches or assess the availability of application or framework updates. | Incident Lead | List of systems updated and patches applied, recorded in the incident specific security channel. |
Address privacy issues. If the privacy of any personal data has been compromised, the privacy officer must assess the impact and determine the appropriate action to take in remediation. | Privacy Officer | Assessment on whether further action is required. |
Address software flaws. Where an incident relates to a vulnerability or issue with an in-house application, ensure that code is fixed and tested before deployment. Ensure that all instances of the flaw or issue are addressed and not just the initial instance. Engage external assistance where appropriate. | Deputy | Changes to code base linked to specific commits and tests. |
Address configuration issues. Where an incident relates to a misconfiguration, ensure that this is addressed in the build systems or scripts and the host is rebuilt with the new configuration. Avoid fixing in place on deployed servers where possible to avoid configuration creep. | Incident Lead | Rebuilt hosts and updated host build files. |
Initiate backup recovery. Where data has been lost or compromised, ensure that a backup is available and prepared for restore. | Incident Lead | Estimated recovery time and recovered data. |
Re-image or rebuild equipment or machines. Where equipment has been compromised or affected by an incident, re-image or rebuild it from a trusted base image. Do not attempt to fix individual issues such as malware or viruses in place. | Incident Lead | Rebuilt hardware |
Address gaps in logging and audit. If the incident highlighted gaps in logs or audit trails, address these and ensure logs are centralized, securely stored, and monitored. | Deputy | Logging and audit coverage for the identified gaps |
(Optional) Engage an external specialist to assess and retest remediation. For serious or complex incidents, ensure an objective specialist has reviewed and retested the remediation. | Incident Lead | Assessment results and report |
Communicate with affected customers. Once remediation is complete, the affected customers should be briefed. Where action is required on their part (such as resetting a password), this must be clear and concise. Communication content and the distribution list should be QA’d by the incident lead and a senior leader before sending. | Comms Lead | Draft communications, sign-off, and actual communications |
(Optional) Executive brief. For high severity issues, an executive brief should be compiled upon remediation. This should address any concerns and explain the risks and effects of the incident in concise terms. | Incident Owner, IT Manager | Executive briefing document |
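For configuration issues, a drift check can show whether a deployed host has crept away from its build files, which suggests it should be rebuilt rather than fixed in place. Below is a minimal Python sketch, assuming the settings have already been collected and parsed into dictionaries (collection and parsing are not shown); the sshd option names in the example are illustrative only.

```python
from typing import Optional


def config_drift(expected: dict[str, str],
                 actual: dict[str, str]) -> dict[str, tuple[Optional[str], Optional[str]]]:
    """Return {setting: (expected, actual)} for every setting that differs.

    None marks a setting missing on one side.
    """
    drift = {}
    for key in sorted(expected.keys() | actual.keys()):
        want, have = expected.get(key), actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


# Values from the host build files versus values read from a deployed host.
build = {"PermitRootLogin": "no", "PasswordAuthentication": "no"}
deployed = {"PermitRootLogin": "yes", "PasswordAuthentication": "no",
            "X11Forwarding": "yes"}
for key, (want, have) in config_drift(build, deployed).items():
    print(f"{key}: build={want} deployed={have}")
```

A setting present only on the deployed host (here `X11Forwarding`) is reported too, since undocumented additions are a common form of configuration creep.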
Unlike the actions discussed above, this last set of suggested tasks is ongoing: these are things to do frequently at every stage of the incident response process. The aim is to ensure you always have a good record of what you have done or discovered and that you keep learning more about the situation as it evolves.
This documentation and discovery not only helps with post-incident reviews but also makes it much easier to share the load during an incident and let people swap in and out.
Task | Owner | Output |
---|---|---|
Record all actions, findings, and communications in the log. | All | Documented audit trail |
Access and monitor all logs and audit trails for the affected accounts or systems. (Optional) Where relevant or appropriate, increase logging levels to ensure sufficient granularity. | All | None |
Identify, document, and challenge all assumptions (ongoing). | All | Documented audit trail |
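If you want the log in a structured form (for example, to export from the incident channel or to hand over between shifts), the entries could be modeled as below. This is a hypothetical Python sketch: the `IncidentLog` class and the entry kinds are assumptions chosen to mirror the table, with assumptions recorded as their own kind so they can be listed and challenged.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Kind(Enum):
    ACTION = "action"
    FINDING = "finding"
    COMMUNICATION = "communication"
    ASSUMPTION = "assumption"


@dataclass(frozen=True)
class Entry:
    at: datetime  # UTC, so entries from people in different time zones line up
    author: str
    kind: Kind
    text: str


class IncidentLog:
    """Append-only record of actions, findings, communications, and assumptions."""

    def __init__(self) -> None:
        self._entries: list[Entry] = []

    def record(self, author: str, kind: Kind, text: str) -> Entry:
        entry = Entry(datetime.now(timezone.utc), author, kind, text)
        self._entries.append(entry)
        return entry

    def entries(self, kind: Optional[Kind] = None) -> tuple[Entry, ...]:
        # A tuple of frozen entries: callers cannot edit history through the result.
        return tuple(e for e in self._entries if kind is None or e.kind is kind)


log = IncidentLog()
log.record("alice", Kind.ACTION, "Suspended account jdoe; access logs preserved")
log.record("bob", Kind.ASSUMPTION, "Attacker access is limited to the staging VPN")
log.record("alice", Kind.FINDING, "Same source IP seen in WAF and auth logs")

# Open assumptions to review and challenge at the next handover.
for e in log.entries(Kind.ASSUMPTION):
    print(e.at.isoformat(timespec="seconds"), e.author, e.text)
```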
Whatever the incident you face, this process provides a stable and predictable set of activities and actions that you and your team can use to respond. When we put our knowledge of this incident response process into a repeatable document, we form what is known as an incident response plan, your grab-and-go guide to surviving in stressful times.
There are many ways to document these plans, so stick with what works for your internal culture and documentation style. Rather than define a document template, we will look at the sections you need to include and why they are important.
As with many of the subjects we have discussed in this book, the fact that something is an incident doesn’t mean the world is ending. Security isn’t always critical, and that’s OK.