We are looking for a talented Data Scientist with expertise in machine learning, data engineering, and model optimization. The ideal candidate should have strong proficiency in Python, PySpark, and SQL, with experience in time series forecasting, feature engineering, and evaluating data model performance. The role requires the ability to work on large-scale data integration projects across different industries.
A key aspect of this position involves building machine learning models that not only meet but surpass user expectations, delivering tangible value to the business. The candidate should be skilled in data model optimization, feature engineering, and using performance evaluation metrics to ensure high-performance solutions.
This role also requires experience with cloud platforms, ETL tools, data transformation processes, and handling both structured and unstructured data. While not mandatory, knowledge of object-oriented programming languages (C#, Java, JavaScript) would be an advantage. Strong communication skills are essential for effective collaboration with cross-functional teams and the presentation of findings.
Key Responsibilities:
- Develop and optimize machine learning models, focusing on time series forecasting and predictive analytics.
- Conduct feature engineering and optimize data models to improve accuracy and efficiency.
- Continuously assess model performance using metrics like MAPE, RMSE, and R², and refine strategies as needed.
- Design and implement data pipelines using PySpark, SQL, and cloud-based solutions for efficient data integration.
- Work on large-scale data integration initiatives using tools like Boomi, SnapLogic, SSIS, or Palantir for ETL processes.
- Utilize Palantir Foundry, Google Cloud, AutoAI, and Google Colab for data modeling, processing, and automation.
- Design and maintain data warehouse solutions to support advanced analytics and business intelligence.
- Perform complex data transformations using SQL queries and data objects for AI/ML-driven projects.
- Collaborate closely with business stakeholders to ensure models meet business objectives and user expectations.
- Deploy, monitor, and continuously improve machine learning models in production environments.
- Effectively communicate technical insights and findings to both technical and non-technical stakeholders.
Required Skills & Qualifications:
- Proficiency in Python, PySpark, and SQL for data analysis, feature engineering, and model development.
- Expertise in time series forecasting models, including ARIMA, Prophet, LSTMs, and other ML-based approaches.
- Strong background in data model optimization, feature engineering, and performance evaluation.
- In-depth understanding of ML model evaluation metrics and best practices for improving model accuracy.
- Hands-on experience with data engineering, including data pipelines, ETL, and data transformation processes.
- Experience with tools like Boomi, SnapLogic, SSIS, or Palantir for data integration.
- Proficiency in cloud platforms, particularly Google Cloud (BigQuery, Vertex AI, Cloud Functions, etc.).
- Familiarity with Palantir Foundry for data processing, analysis, and visualization.
- Ability to optimize and query large-scale datasets using data lakes and relational databases.
- Experience with AutoAI for automated model selection and hyperparameter tuning.
- Experience with Google Colab for collaborative machine learning development.
- Excellent problem-solving and communication skills, with the ability to explain complex concepts to business stakeholders.
Preferred Qualifications:
- Experience with MLOps for continuous deployment, monitoring, and retraining of machine learning models.
- Knowledge of business intelligence and reporting tools for data visualization.
- Background in supply chain, logistics, or operational forecasting.
- Experience in both batch and real-time data processing architectures.
- Ability to tune SQL queries and data transformations for better performance.
- Familiarity with object-oriented programming languages such as C#, Java, or JavaScript.