Roles & Responsibilities:
- Studying, transforming, and converting data science prototypes
- Deploying models to production
- Training and retraining models as needed
- Analyzing the ML algorithms that could be used to solve a given problem and ranking them by their respective scores
- Analyzing the errors of the model and designing strategies to overcome them
- Identifying differences in data distribution that could affect model performance in real-world situations
- Performing statistical analysis and using results to improve models
- Supervising the data acquisition process if more data is needed
- Defining data augmentation pipelines
- Defining the pre-processing or feature engineering to be done on a given dataset
- Extending and enriching existing ML frameworks and libraries
- Understanding when the findings can be applied to business decisions
- Documenting machine learning processes
Experience & Skills Fitment:
- 6+ years of IT experience, including at least 3 years of relevant experience, primarily in converting data science prototypes and deploying models to production
- Proficiency with Python and machine learning libraries such as pandas
- Strong working experience with PySpark
- Experience with the machine learning life cycle, including model training and retraining
- Strong expertise in using Kubeflow/Airflow and Docker containerization
- Knowledge of Big Data frameworks such as Hadoop and Spark
- Experience in working with ML frameworks like TensorFlow
- Strong written and verbal communication skills
- Excellent interpersonal and collaboration skills
- Expertise in visualizing and manipulating big datasets
- Familiarity with Linux
- Ability to select hardware to run an ML model with the required latency
- Robust data modelling and data architecture skills
- Advanced degree in Computer Science/Math/Statistics or a related discipline
- Advanced Math and Statistics skills (linear algebra, calculus, Bayesian statistics, mean, median, variance, etc.)
Nice to have:
- Understanding of the XGBoost API and use of Dask clusters
- Familiarity with writing Scala, Java, and R code
- Exploring and visualizing data to gain an understanding of it, then identifying differences in data distribution that could affect performance when deploying the model in the real world
- Verifying data quality, and/or ensuring it via data cleaning
- Supervising the data acquisition process if more data is needed
- Finding available datasets online that could be used for training
Benefits:
- Kloud9 provides a robust compensation package and a forward-looking opportunity for growth in emerging fields.
Equal Opportunity Employer:
- Kloud9 is an equal opportunity employer and will not discriminate against any employee or applicant on the basis of age, color, disability, gender, national origin, race, religion, sexual orientation, veteran status, or any classification protected by federal, state, or local law.