Senior Site Reliability Engineer - Infra (VP) (Hybrid)
The Role:
As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our software products.
You will be responsible for designing and implementing highly available and fault-tolerant systems while working closely with the development team to deliver high-quality products.
In addition, you will work on complex and challenging problems, develop innovative solutions, and contribute to a dynamic and collaborative team environment.
If you have a passion for solving complex technical issues and ensuring the highest levels of system performance, we want to hear from you!
Functional Key Responsibilities:
Collaborate with development and product teams to ensure that applications and systems are designed and implemented with reliability, scalability, and performance in mind.
Automate and streamline operational processes, from deployment to monitoring and alerting, to improve efficiency and reduce manual error.
Design, implement, and maintain complex infrastructure systems for high-availability production environments.
Monitor systems and applications for performance, availability, and security, and respond to issues quickly and efficiently.
Continuously improve the reliability, scalability, and performance of systems and applications through root cause analysis, code and architecture review, and proactive monitoring.
Participate in and respond to critical incidents promptly and efficiently, performing troubleshooting and incident management as needed.
Develop and maintain disaster recovery and business continuity plans so that operations can continue through service outages or disasters.
Provide technical guidance and mentorship to other engineers on reliability and scalability best practices, tools, and methodologies.
Appropriately assess risk when business decisions are made, demonstrating particular consideration for the firm's reputation and safeguarding Citigroup, its clients and assets, by driving compliance with applicable laws, rules and regulations, adhering to Policy, applying sound ethical judgment regarding personal behavior, conduct and business practices, and escalating, managing and reporting control issues with transparency.
Technical Skills/Qualifications:
Extensive experience as a Site Reliability Engineer or Infra Architect.
Strong hands-on experience with Linux-based systems.
Proven automation skills with tools such as Ansible and Chef.
Good scripting knowledge in Bash, Python, Perl, or similar languages.
Good support knowledge of middleware such as Apache and WebSphere, and of messaging tools such as TIBCO, JMS, and RabbitMQ.
Experience in rolling out redundant, mission-critical applications in a highly available production environment.
Familiarity with databases including Sybase, MongoDB, and Couchbase.
Familiarity with monitoring tools including ITRS, Grafana, and Prometheus.
Strong analytical and problem-solving skills.
Clear and concise written and verbal communication skills, demonstrated consistently.
Nice to Have:
Working knowledge of equity order flow, including client connectivity, the FIX protocol, and risk validations, will be considered highly advantageous.
Knowledge of cloud and DevOps tech stacks, including Jenkins and Git.
Network troubleshooting skills.
Previous experience at a bank or financial institution.