10+ years of Systems or AI/ML production-service experience, commensurate with running cutting-edge hybrid cloud services in China and the rest of the world
Self-motivated and proactive, with demonstrated creative and critical thinking capabilities
Ability to analyze problems in depth and to clearly distinguish purposes from measures
Solid understanding of system architecture and large-scale service or computational platform operations
Demonstrated understanding of system management, covering configuration, security, performance, troubleshooting, and usage accounting
Proficiency in scripting and programming languages, including Bash, Python, and Go, with the ability to select the right language for a given problem
Knowledge of large-scale data storage and processing using SQL and Cassandra, HDFS and S3, YARN and Spark
Knowledge of ML and experience developing real-world ML jobs
Experience designing and implementing systems to support ML applications
Experience deploying large-scale services and jobs using an orchestration framework (Kubernetes) and cloud services
Experience with observability of system behavior (e.g. Prometheus, Grafana)
Strong sense of thoroughness: attending to details, delivering running code, and contributing to the organization's collective understanding
Sense of speed and prioritization, driving what matters with constrained resources while delivering high-quality results
Good communication with internal and external teams, in English and in Chinese