YOUR LIFE'S MISSION: POSSIBLE
You have goals, dreams, hobbies and things you’re passionate about.
What’s Important to You Is Important to Us
We’re looking for people who not only want to do meaningful, challenging work, keep their skills sharp and move ahead, but who also take time for the things that matter to them—friends, family and passions. And we're looking for team members who are passionate about our mission—making a difference in military members' and their families' lives. Together, we can make it happen.
The Site Reliability Engineer is a member of the Cloud Team who supports software development, operations, and maintenance across complex infrastructure, using automated solutions to improve performance, visibility, stability, availability, and reliability. This role provides Tier 3 support, either directly or by engaging other stakeholders, for applications and platforms residing in the Cloud. The ideal candidate has hands-on experience with, and understanding of, the software development lifecycle from inception to implementation. The successful candidate will understand how software is maintained and will be responsible for ensuring its reliability and speed.
- Monitor alerts, metrics, and logs to detect incidents and events, and correlate them to find the root cause of outages.
- Conduct Post-Incident Reviews with developers, infrastructure engineers, product owners, system owners, and information security staff to identify causes and automated solutions that improve the agility and performance of the system.
- Work with other SREs to drive standards and consistency around best practices
- Create and maintain runbooks and knowledge base articles that other engineers can follow to resolve incidents quickly. Identify opportunities and implement the automation needed to address and prevent operational issues.
- Understand and modify existing code and automation scripts used to build applications and infrastructure. Identify and enable new alerts and monitors for critical services that affect system reliability.
- Drive increased efficiency across teams by eliminating duplication and leveraging common DevOps processes, tools, and technology
- Collaborate with the team to define architecture and identify potential risks to successful implementation
- Work closely with business partners and software development teams in a matrix organization structure
- Automate tasks to reduce manual work, reduce outages, and enhance customer and employee experience
- Communicate and resolve complex production issues and implement preventative measures
- Implement and tune monitoring, metric collection, and alerting
- Solid hands-on experience setting up and correlating SIEM and monitoring tools, including but not limited to Azure Sentinel, Azure Log Analytics, Azure Monitor, Application Insights, Splunk, Moogsoft, and CA APM/Wily Introscope.
- Senior-level software development experience building applications with tools such as Java, Spring Boot, Spring Framework, .NET Core, Angular, React, and Vue.js
- Hands-on experience with a variety of database technologies, including relational databases such as Azure SQL, SQL Server, MySQL, and PostgreSQL, or NoSQL databases such as Azure Cosmos DB and MongoDB.
- Hands-on experience integrating systems with REST APIs, relational databases (RDBMS), LDAP, Active Directory, Azure Active Directory, RabbitMQ, Redis Cache, and Azure Functions (serverless)
- Hands-on experience deploying applications to production through automated CI/CD pipelines or automated scripts, using tools such as Maven, Gradle, Docker, Git, JUnit, MSTest, Tomcat, SonarQube, Fortify, Selenium, Cucumber, and Contrast Security
- Understanding and experience delivering Twelve-Factor cloud-native applications
- Understanding and experience with Microservices architecture
- Knowledge of and experience with ticketing systems for service catalogs and change management, such as ServiceNow, HP ITSM, and BMC Remedy.
- Excellent communication and coordination skills for working with both technical and non-technical stakeholders.
- Knowledge of and experience with DevOps and Agile methodologies
- Experience with Microsoft Azure technologies
- Experience with Tanzu Application/Container Services (TAS/TKS) (formerly Pivotal Cloud Foundry) or equivalent container-based platforms and products such as OpenShift, Azure Kubernetes Service, Google Container Services, etc.
- Experience using ServiceNow ITOM and ITSM to create catalogs or to automate processes by integrating with other systems.
- We highly encourage SREs, DevOps engineers, application developers, system developers, and system engineers who understand how software is built and managed to apply.
Hours: Monday - Friday, 8:00am - 4:30pm
Location: 820 Follin Lane, Vienna, VA 22180
Remote Work Policy: Remote work is available for all positions contingent on business need and manager discretion
Equal Employment Opportunity
Navy Federal values, celebrates, and enacts diversity in the workplace. Navy Federal takes affirmative action to employ and advance in employment qualified individuals with disabilities, disabled veterans, Armed Forces service medal veterans, recently separated veterans, and other protected veterans. EOE/AA/M/F/Veteran/Disability
Navy Federal reserves the right to fill this role at a higher/lower grade level based on business need. An assessment may be required to compete for this position.
Bank Secrecy Act
Remains cognizant of and adheres to Navy Federal policies and procedures, and regulations pertaining to the Bank Secrecy Act.
This position is eligible for the TalentQuest employee referral program. Please indicate the employee who referred you when applying.