Business Continuity & Disaster Recovery (BC/DR) Plan
ECS Technology Solutions
Version 1.1 – Updated: 17 April 2025
1. Purpose
This BC/DR Plan ensures ECS Technology Solutions (“ECS”) can continue to deliver critical managed services and meet contractual and regulatory obligations during and after a disruptive incident, while safeguarding customer data and minimizing downtime.
2. Scope
- People: All employees, contractors, and key third-party partners.
- Facilities: Headquarters in Elkhorn, NE and any remote offices.
- Technology: On-premises infrastructure, cloud resources in AWS & Azure, SaaS platforms (M365, ticketing, SIEM), and customer-hosted systems managed by ECS.
- Services: 24×7 Service Desk, Remote Monitoring & Management (RMM), Security Operations Center (SOC), Backup & Recovery, Professional Services, and Hosting.
3. Objectives
Objective | Target |
---|---|
Protect life & safety | Immediate |
Stabilize critical services | ≤ 4 hours |
Restore all customer-facing services | ≤ 24 hours |
Return to normal operations | ≤ 7 days |
Communicate status updates | First notice ≤ 30 min; subsequent every 2 h |
4. Definitions
Term | Meaning |
---|---|
RTO | Recovery Time Objective – max acceptable downtime. |
RPO | Recovery Point Objective – max tolerable data loss. |
Critical System | A system whose outage halts customer operations or compromises security.
Incident Commander (IC) | Individual who leads BC/DR response (default: Information Security Manager). |
5. Business Impact Analysis (BIA)
Tier | Example Systems | RTO | RPO |
---|---|---|---|
Tier 1 – Critical | Ticketing (HaloPSA), SIEM, backup vaults, RMM, MFA IdP | 4 h | 1 h |
Tier 2 – Important | Documentation (Hudu), ERP/finance, internal Git repos | 24 h | 8 h |
Tier 3 – Routine | Marketing website, historical archives | 72 h | 24 h |
6. Roles & Responsibilities
Role | Responsibility |
---|---|
President | Activate BC/DR plan, allocate resources, approve external communications. |
Incident Commander (IC) | Lead response, coordinate teams, track progress. |
Chief Technology Officer (CTO) | Execute infrastructure failover, validate restorations. |
Information Security Manager | Manage security incidents, coordinate forensics, maintain BC/DR documentation, schedule tests, and notify regulators if required. |
Service Delivery Managers | Liaise with clients, prioritize tickets, provide status reports.
All Staff | Follow instructions, execute playbooks, escalate issues. |
7. Risk Assessment (Top Threats)
- Cyber-Attack / Ransomware – Data encryption & service disruption.
- Cloud Region Outage – Loss of primary AWS or Azure region.
- Critical SaaS Failure – Ticketing or SIEM vendor outage.
- Physical Disaster – Fire, flood, or tornado impacts HQ.
- Pandemic / Workforce Unavailability – Staff unable to work onsite.
8. Preventive & Mitigation Controls
- Encryption: AES-256 (or strongest available) for data at rest; TLS 1.3 for transit.
- Backups: Daily backups to an isolated backup account/vault; cross-region replication; immutable retention (see the verification sketch after this list).
- Real-Time Patch & Vulnerability Management: Immediate critical patching; continuous scanning.
- Least-Privilege IAM & MFA: Mandatory on Tier 1 systems.
- Redundant Cloud Architecture: Multi-AZ, cross-region failover for critical workloads.
- Endpoint Protection & EDR: Detect and contain threats rapidly.
- UPS & Generator: Protect on-prem equipment from power loss.
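The backup control above calls for an isolated, encrypted, and immutable vault. A minimal verification sketch, assuming AWS Backup is the vault service; the vault name and region are illustrative placeholders, not values defined by this plan:

```python
# Minimal check that an AWS Backup vault enforces immutability (vault lock)
# and KMS encryption. Vault name and region are illustrative placeholders.
import boto3

def check_vault_immutability(vault_name: str, region: str = "us-east-1") -> bool:
    backup = boto3.client("backup", region_name=region)
    vault = backup.describe_backup_vault(BackupVaultName=vault_name)

    locked = vault.get("Locked", False)              # True when Vault Lock is active
    encrypted = bool(vault.get("EncryptionKeyArn"))  # KMS key backing the vault

    if not locked:
        print(f"WARNING: {vault_name} has no vault lock (backups are mutable)")
    if not encrypted:
        print(f"WARNING: {vault_name} has no KMS encryption key configured")
    return locked and encrypted

if __name__ == "__main__":
    check_vault_immutability("ecs-backup-vault")
```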
9. Recovery Strategies
9.1 Data Backup & Restore
- Snapshots stored in isolated AWS/Azure backup vaults (encrypted).
- Restore precedence: Tier 1 → Tier 2 → Tier 3 (see the ordering sketch below).
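A minimal sketch of the restore precedence, assuming a simple tier map drives the ordering; the system names and the `start_restore` helper are hypothetical placeholders rather than a mandated tool:

```python
# Illustrative restore orchestration honoring Tier 1 -> Tier 2 -> Tier 3 precedence.
# The tier map and start_restore() are hypothetical placeholders.
SYSTEM_TIERS: dict[str, int] = {
    "ticketing": 1, "siem": 1, "rmm": 1, "mfa-idp": 1,
    "documentation": 2, "erp": 2, "git": 2,
    "marketing-site": 3, "archives": 3,
}

def start_restore(system: str) -> None:
    # Placeholder: invoke the backup platform's restore API for this system.
    print(f"restoring {system} ...")

def restore_in_precedence(systems: list[str]) -> None:
    # Lower tier number = higher priority; Tier 1 restores before Tier 2 and Tier 3.
    for system in sorted(systems, key=lambda s: SYSTEM_TIERS.get(s, 3)):
        start_restore(system)

restore_in_precedence(list(SYSTEM_TIERS))
```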
9.2 Cloud Infrastructure Failover
Platform | Primary | Secondary | Method |
---|---|---|---|
AWS | us-east-1 | us-west-2 | Infrastructure-as-Code redeploy via CloudFormation + data restore |
Azure | Central US | East US 2 | Azure Site Recovery + ARM templates |
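For the AWS row above, a minimal sketch of an Infrastructure-as-Code redeploy into the secondary region using boto3; the stack name and template URL are illustrative assumptions, not values defined by this plan:

```python
# Illustrative redeploy of an Infrastructure-as-Code stack into the secondary
# AWS region (us-west-2). Stack name and template location are placeholders.
import boto3

def redeploy_stack_in_secondary(
    stack_name: str = "ecs-core-services",
    template_url: str = "https://s3.amazonaws.com/ecs-iac/core.yaml",
    region: str = "us-west-2",
) -> str:
    cfn = boto3.client("cloudformation", region_name=region)
    response = cfn.create_stack(
        StackName=stack_name,
        TemplateURL=template_url,
        Capabilities=["CAPABILITY_NAMED_IAM"],  # required only if the template creates IAM resources
    )
    # Block until the stack finishes creating before data restore begins.
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)
    return response["StackId"]
```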
9.3 SaaS Service Outage
- Switch to out-of-band communications (Teams → SMS) if the IdP is down (see the notification sketch below).
- Restore exported data into a secondary provider or internal database.
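The plan does not prescribe an SMS mechanism; a minimal sketch assuming Amazon SNS direct SMS is used for out-of-band notification (recipient numbers and message text are placeholders):

```python
# Illustrative out-of-band SMS notification, assuming Amazon SNS direct SMS.
# Recipient numbers and message text are placeholders.
import boto3

def send_out_of_band_sms(recipients: list[str], message: str) -> None:
    sns = boto3.client("sns", region_name="us-east-1")
    for number in recipients:
        sns.publish(PhoneNumber=number, Message=message)

send_out_of_band_sms(
    ["+14025550100", "+14025550101"],  # placeholder on-call numbers
    "ECS BC/DR: primary IdP unavailable; switch to documented out-of-band procedures.",
)
```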
9.4 Office Inaccessibility
- Staff operate fully remote using secured endpoints and MFA.
- VPN not required (Zero-Trust SaaS); cellular hotspots issued as fallback connectivity.
10. Disaster Recovery Procedures
- Detection & Triage: SOC escalates event to IC.
- Declare Incident: IC records start time and severity and notifies the President.
- Activate Plan: Mobilize recovery teams; reference the relevant playbooks.
- Communication: Send initial client advisory within 30 min via status page & email (see the tracking sketch after this list).
- Recovery Actions: Execute appropriate strategy (e.g., failover, restore).
- Validation: QA confirms service functionality; monitoring re-enabled.
- Return to Normal: Switch back to primary sites; debrief scheduled.
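A minimal sketch of how the declaration time and the 30-minute advisory target above might be tracked; the structure is illustrative, not a mandated tooling choice:

```python
# Illustrative incident record capturing the timestamps the procedure requires
# (declaration time, severity, and the <= 30-minute initial client advisory).
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

@dataclass
class IncidentRecord:
    severity: str                      # e.g. "Tier 1 outage" (placeholder label)
    declared_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    first_advisory_at: datetime | None = None

    def record_advisory(self) -> None:
        self.first_advisory_at = datetime.now(timezone.utc)

    def advisory_within_sla(self) -> bool:
        # Initial client advisory must go out within 30 minutes of declaration.
        if self.first_advisory_at is None:
            return False
        return self.first_advisory_at - self.declared_at <= timedelta(minutes=30)

incident = IncidentRecord(severity="Tier 1 outage")
incident.record_advisory()
print(incident.advisory_within_sla())
```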
11. Communication Plan
Audience | Method | Frequency |
---|---|---|
Internal Staff | Teams / phone bridge | 30-min stand-ups |
Clients | Email from support@ecs.rocks & Status Page | Initial + 2-hourly |
Vendors / ISPs | Phone & ticket portals | As needed |
Regulators / Authorities | Letter or secure email | Within legal timeframes |
12. Testing & Training
- Annual Tabletop Exercises – Simulate disruption scenarios and remediate identified gaps.
- Monthly KnowBe4 Training – Reinforce BC/DR awareness & phishing resilience.
13. Plan Maintenance
- Reviewed annually by the Information Security Manager and CTO, or sooner after major changes.
- Lessons learned post-incident feed into next revision.
- Latest approved plan stored and distributed via SharePoint.