Site Reliability Engineering Manager-Midrange
Job title: Site Reliability Engineering Manager-Midrange in USA at PNC Financial Services
Company: PNC Financial Services
Job description: Position Overview
At PNC, our people are our greatest differentiator and competitive advantage in the markets we serve. We are all united in delivering the best experience for our customers. We work together each day to foster an inclusive workplace culture where all of our employees feel respected, valued and have an opportunity to contribute to the company’s success. As a Site Reliability Manager within PNC’s Site Reliability Center (SRC), you will be based in Farmers Branch, TX; Pittsburgh, PA; Cleveland, OH; Birmingham, AL; or Phoenix, AZ. The position is primarily based in a PNC location. Responsibilities require weekly time in the office or in the field on a regular basis. Some responsibilities may be performed remotely, at the manager’s discretion. Occasional travel may be needed.
Schedule is M-F 8:00am – 5:00pm. This position leads teams across 3 shifts for 24/7 support.
Candidates are expected to be available for critical production issues as required. This may include off-shift hours and weekends.
PNC will not provide sponsorship for employment visas or participate in STEM OPT for this position.
We’re looking for a Site Reliability Engineering Technical Manager to lead our Midrange Operations and Engineering Support teams in a fast-paced, 24/7 enterprise IT environment. This role is ideal for a hands-on leader who thrives at the intersection of incident response, proactive remediation, SRE adoption, RedHat ecosystem support, and cross-functional collaboration.
You will be responsible for driving operational excellence, improving documentation, and ensuring resilience across midrange platforms, while mentoring a distributed team and influencing change across engineering, SRE, and production support.
Skills Desired:
- Proven experience leading midrange or infrastructure operations in a high-availability environment
- Deep knowledge of Linux/RedHat systems, patching, and vulnerability remediation
- Familiarity with observability and APM tools (e.g., Dynatrace, vROps, Big Panda, Logscale)
- Strong incident management and SRE-aligned thinking (e.g., proactive issue identification, toil reduction)
- Excellent communication and documentation skills
- A collaborative approach to cross-functional engagement and knowledge transfer
- Review and manage alerts and events in Big Panda
- Track and prioritize R1/R2/P1 incidents, escalating to appropriate SRC Engineering teams
- Oversee Midrange requests via SRC Chat, ensuring timely and accurate responses
- Drive adoption of SRE practices, identifying systemic issues and remediating proactively
- Monitor and manage open issues via RedHat case management
- Lead regular knowledge transfer sessions across teams
o Discuss recent escalated incidents, excessive resolution time cases, and recurring issues
o Coordinate patching updates (IDS Patching): what’s being patched, known issues, and current vulnerabilities
o Track open vulnerabilities by Mnemonic owner with a focus on remediation
o Provide visibility into Engineering Change activities and weekend change impacts
Platform Operations
- Manage and escalate issues including:
o Disk and path failures
o Access, login, and account security issues
o Remote access and server unavailability
o Patching defects and requests
- Collaborate closely with Midrange Engineering leads and other platform SMEs
- Drive the creation and upkeep of Linux system documentation, targeting at least one publish-ready doc every 3 weeks
- Maintain and enhance tooling documentation, including:
- Leads a team of Site Reliability Engineers in implementing, maintaining, and improving robust monitoring response sites and infrastructure applications.
- Recommends and facilitates the implementation of infrastructure enhancements as required to maintain the performance of sites in response to business growth and strategy.
- Streamlines the deployment process by introducing automated configuration management tools, resulting in a reduction in deployment time and increased efficiency.
- Oversees robust technical solutions for complex business and application challenges, while helping to define and communicate technical standards and best practices. Manages and oversees proactive reviews and audits of production sites, issue triage and follow up.
- Leads in the collaboration with cross-functional teams to design and implement scalable and highly available infrastructure.
- Maximizes staff contribution through professional growth and development, to increase teamwork and more effectively meet business needs.
- Customer Focused - Knowledgeable of the values and practices that align customer needs and satisfaction as primary considerations in all business decisions and able to leverage that information in creating customized customer solutions.
- Managing Risk - Assessing and effectively managing all of the risks associated with their business objectives and activities to ensure they adhere to and support PNC's Enterprise Risk Management Framework.
- Include Intentionally - Cultivates diverse teams and inclusive workplaces to expand thinking.
- Live the Values - Role models our values with transparency and courage.
- Enable Change - Takes action to drive change and innovation that will transform our business.
- Achieve Results - Takes personal ownership to deliver results. Empowers and trusts others in decision making.
- Develop the Best - Raises the bar with every talent decision and guides the achievement of all employees and customers.
Location: USA