Site Reliability Engineer 3

This opening is for a Site Reliability Engineer 3 with development and system administration experience on large systems who can apply that experience to formulate and implement automation solutions. These solutions support our monitoring and system administration teams in tasks that are risky to the system, prone to mistakes, labor intensive, time consuming, and/or repetitive. The tasks can include those for which an SOP currently exists or can be developed, but is not likely to be followed consistently. The goal is to create sustainable tools that act as a force multiplier and perform at least as well as the manual methods. Experience with the pros and cons of tools like Salt and Puppet will be useful for some tasks. Other tasks, where the team might build a GUI for the shift to perform tasks on the clusters (or automate those tasks entirely), will require development skills.

Basic Qualifications:

A Bachelor’s degree in Computer Science or a related technical field is highly desired and will be considered equivalent to two (2) years of experience. A Master’s degree in a technical field will be considered equivalent to four (4) years of experience. NOTE: A degree in Mathematics, Information Systems, Engineering, or a similar degree will be considered.
Cloud Systems Administrator or Developer Certification 
Fourteen (14) years of experience in software development/engineering, including requirements analysis, software development, installation, integration, evaluation, enhancement, maintenance, testing, and problem diagnosis/resolution.
Ten (10) years of experience in system engineering/architecture.
Ten (10) years of experience working with products that support highly distributed, massively parallel computation needs, such as HBase, Hadoop, Accumulo, Bigtable, Cassandra, Scality, etc.
At least ten (10) years of experience writing scripts for software automation in scripting languages such as Perl, Python, or Ruby.
At least four (4) years of experience managing and monitoring large cloud systems.
Experience in performing and providing technical direction for the development, engineering, interfacing, integration, and testing of complete hardware/software systems to include monitoring technical health of a system, improving organizational processes, implementation of postmortem (failure) analysis and incident management.
Active TS/SCI security clearance with a current polygraph is required.

Requisition Number: VTREQ0001457
