I'm Purvansh Jain

Versatile Engineer with 4 years of experience in building scalable data platforms and production-ready NLP systems across cloud environments such as AWS, Azure, and GCP. Skilled in LLM fine-tuning, real-time data processing, and designing robust ETL pipelines using Databricks, Snowflake, and Apache Airflow. Proven success in deploying models like BERT, GPT, and LaMDA for sentiment analysis, entity recognition, and multilingual NLP. Ivy League graduate with 4 patents in AI and cloud security, passionate about developing intelligent platforms that drive real-world impact.

Experience

Jan 2024 - Present

Machine Learning Engineer

MetLife

  • Fine-tuned Generative AI models including Phi, Qwen, LaMDA, BERT, and GPT to enhance predictive analytics, decision-making, and user interaction for insurance and financial service applications.

  • Deployed and monitored ML models on AWS SageMaker and Azure ML, enabling scalable, low-latency inference pipelines in production environments serving the claims and policy departments.

  • Designed and maintained distributed workflows using Databricks, AWS Lambda, and SSIS for batch loading and real-time data processing, improving reliability by 25% and migration efficiency by 40%.

  • Built graph-based knowledge representations using Neo4j and AWS Neptune to support entity linking, multilingual context understanding, and semantic querying for insurance datasets.

  • Architected and optimized star and snowflake schemas to enable complex reporting and analytics, while restructuring ETL flows to boost operational efficiency by 25%.

  • Integrated structured and semi-structured data from NetSuite and Salesforce to drive real-time NLP model inference and downstream reporting dashboards.

  • Collaborated with cross-functional teams to support LLM tuning, experimentation pipelines, and prompt optimization with retrieval-augmented generation (RAG) for voice-driven NLP use cases.

Jun 2021 - Jul 2022

Data Engineer

Oracle Cerner

  • Architected and deployed scalable, fault-tolerant data pipelines using Hadoop, Apache Spark, and Airflow, reducing ETL processing time by 40% and improving throughput for critical healthcare data workloads.

  • Containerized data services using Docker and automated CI/CD workflows with AWS Lambda, reducing deployment time by 30%, improving success rate by 15%, and enhancing reliability of ER production systems.

  • Developed and optimized SSIS packages for extraction, transformation, and batch loading of multi-source data, improving data architecture management and boosting migration efficiency by 30%.

  • Engineered end-to-end workflows to transform data from Snowflake, covering preprocessing, feature extraction, data modeling, and deployment to support analytics and real-time ML inference.

  • Built a multi-terabyte, end-to-end data warehouse on Amazon Redshift capable of handling millions of records daily, improving data processing capacity by 50% and query performance by 35%.

  • Developed RESTful APIs using Spring Boot to streamline patient-provider data exchange, and improved Emergency Room responsiveness by 200% through Agile practices and real-time data simulators, reducing technician response times and improving access to patient data.

Jul 2020 - May 2021

Data Engineer

Hexaware Technologies

  • Leveraged AWS services including S3, EC2, RDS, Lambda, and Redshift to design and deploy scalable data pipelines, enabling real-time analytics and reducing data processing costs by 30%.

  • Developed and optimized data models in Power BI to create interactive dashboards, significantly improving data visualization and enabling stakeholders to make informed decisions faster.

  • Developed and maintained large-scale data ingestion processes using Python and Airflow, and built automated data validation scripts that cut manual work by half.

  • Merged structured and unstructured data into Snowflake, improving data availability for BI tools and helping stakeholders set up data models that improved reporting precision by 15%.

  • Implemented GDPR compliance across data infrastructure, ensuring adherence to regulatory standards.

  • Developed ETL jobs using PySpark with data lineage tracking across transformation stages, reducing data processing errors by 25%.

Nov 2019 - Feb 2020

Cloud Intern

INFRA STACK LABS Private Ltd

  • Utilized AWS CDK to implement Infrastructure as Code (IaC), enhancing cloud resource management and consistency.
  • Gained expertise in deploying infrastructure with AWS CloudFormation and Terraform to meet the organization's needs.

Apr 2019 - May 2019

Technical Intern

CODEBRINK

  • Developed Python scripts for weather data processing using pandas and authored technical articles on the Python language.

Technical Skills

Programming & Scripting Languages

Python, Java, C, CSS, HTML, PL/SQL, Shell Scripting

Frameworks & Libraries

React.js, Node.js, OpenCV, Apache Spark, PyTorch, Seaborn, scikit-learn, Spring MVC & Boot, TensorFlow, Transformers

Database Connectivity & Querying

AWS RDS, DBeaver, DataGrip, SQLite, JDBC, MySQL, MongoDB, Neo4j, PostgreSQL

AI & ML Technologies

CNN, FNN, KNN, GMM, MLE, ICA, PCA, SVMs, Decision Trees, Random Forests, K-means, Natural Language Processing, Reinforcement Learning, Regression (Simple, Ridge, LASSO, Logistic)

Data Manipulation

Analysis, Mining, Preprocessing, Mapping, Cleaning, Visualization, Modeling, Warehousing, Storytelling, Wrangling, Acquisition

Cloud Technologies

AWS (Polly, CLI, S3, EC2, Lambda), Kubernetes, Google Cloud Platform, IBM Cloud, Microsoft Azure

Cybersecurity Tools

Intrusion Detection & Prevention, Malware Analysis, Penetration Testing, Risk Analysis, STS, Vulnerability Assessment

Infrastructure Tools

Apache HTTPD, NFS, NGINX, Samba, VMware, VirtualBox

Visualization Tools

Google Charts, Microsoft Power BI, MS Excel, Tableau

IDE / OS Experience

Eclipse, IntelliJ IDEA, PyCharm, Visual Studio, Windows, macOS, Linux (Parrot)

Other Tools & Technologies

CI/CD, Crucible, Docker, Git, Colab, Grafana, Jenkins, JIRA, Postman, Power BI, SonarQube, Tableau, VMware


    Education

Aug 2022 - Dec 2023

UNIVERSITY OF PENNSYLVANIA

Master of Science in Data Science

GPA: 3.5/4.0

  • President of RANGOLI: South Asian Association at Penn; Director of Cultural Programming for the Graduate Association of UPenn (GAPSA)
  • Student lead at the Consulate General of India, New York, and a member of CIO in Greater Philadelphia.
  • Presented insights on diversity, equity, and inclusion (DEI) rights for international students at the Princeton Model UN Conference 2023.
  • Organized PennApps XXIV, the nation's first student-run collegiate hackathon.
  • Certifications: ICSI-CNSS, CEH, CISSP, AWS Fundamentals & Azure Certified Cloud Practitioner

Aug 2017 - May 2021

JAIN UNIVERSITY

B.Tech in Computer Science with specialization in Cloud Technology and Information Security

GPA: 3.9/4.0

  • Organizer, JU Hackathon: national-level hackathon conducted by the IEEE Student Chapter (2019)
  • University Gold Medalist, felicitated by the Prime Minister of India

My Patents

Jain, Purvansh

Issued January 07, 2025.

FRAMEWORK FOR RESPONSIBLE AI CONTENT GENERATION USING LLM

Jain, Purvansh

Issued November 03, 2021.

Early detection of Denial of Service and Deception attacks in industrial cyber-physical systems

Jain, Purvansh

Issued October 27, 2021.

Network-based deployment of IoT architecture through software defined networks.

Jain, Purvansh

Issued September 10, 2021.

Network traffic flow predictive, control and analytical model for cloud based networks and systems.

Publications