Responsibilities:
- Continuously monitor network and service performance using specialized tools.
- Detect, troubleshoot, and respond to incidents in real time to minimize downtime.
- Identify potential issues before they escalate into critical problems.
- Diagnose and resolve network, hardware, and software issues promptly.
- Coordinate with other technical teams to escalate complex issues and ensure resolution.
- Document troubleshooting steps and solutions for future reference.
- Collaborate with different IT teams, vendors, and third-party service providers to resolve issues.
- Communicate incident status and resolutions to relevant stakeholders.
- Maintain detailed documentation of network performance, issues, incidents, and resolutions.
- Generate reports on system performance, incidents, and downtime for management review.
- Ensure incident logs are up to date and adhere to company policies.
- Work in shifts (24/7 rotation) to ensure constant monitoring and quick response to incidents.
- Be available for on-call duties and emergency interventions when necessary.
Requirements:
- Proficiency in configuring monitoring tools and building custom dashboards.
- Hands-on experience with Grafana, Prometheus, and the ELK Stack.
- Strong proficiency in the Zabbix monitoring platform.
- Expertise in analyzing and troubleshooting incidents, with systematic follow-up through to resolution.
- In-depth knowledge of network protocols: TCP/IP, DNS, HTTP, BGP, OSPF, SNMP, NetFlow, Syslog.
- Excellent documentation and reporting skills.
- Expertise in Windows and Linux operating systems (LPIC-1, LPIC-2, and MCSE).
- Familiarity with Docker and containerized environments.
- Expertise in configuring and managing Cisco and MikroTik devices (CCNA/CCNP/MTCNA/MTCRE).
- Understanding of how to set up alerts, reports, and dashboards for real-time system and network status updates.