Insurance Premium Prediction Application
Overview
This is a machine learning application designed for predicting insurance premiums. The project leverages a variety of tools and frameworks to streamline data management, experiment tracking, and model deployment.
Tools Utilized
- DVC (Data Version Control): Manages and versions the data pipeline and datasets.
- Git: Version control system for tracking code changes.
- MLflow: Tracks model training and evaluation experiments.
- GitHub Actions: Used for continuous integration and deployment.
- Dagshub: Hosts the MLflow experiment tracking server and the DVC-managed data pipeline.
Machine Learning Pipeline
Data Ingestion
The application ingests insurance premium data from data/insurance.csv and saves it into artifacts/DataIngestionArtifacts.
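The ingestion step amounts to reading the raw CSV and copying it into the artifact folder. A minimal sketch of that idea (function and argument names here are illustrative; the real component lives in src/components/data_ingestion.py):

```python
from pathlib import Path

import pandas as pd


def ingest_data(source_csv: str, artifact_dir: str) -> Path:
    """Read the raw CSV and save a copy into the ingestion artifact folder."""
    out_dir = Path(artifact_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    df = pd.read_csv(source_csv)
    out_path = out_dir / "insurance.csv"
    df.to_csv(out_path, index=False)
    return out_path
```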
Data Transformation
Data undergoes transformation to prepare it for model training. Transformed data and preprocessing artifacts are saved into artifacts/DataTransformationArtifacts. Preprocessors are also stored in models/.
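A typical transformation for tabular insurance data scales numeric columns, one-hot encodes categoricals, and persists the fitted preprocessor so the same transformation can be reused at inference. A hedged sketch (column names and file paths are assumptions, not the project's exact API):

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_preprocessor(numeric_cols, categorical_cols):
    """Scale numeric features and one-hot encode categorical ones."""
    return ColumnTransformer(
        [
            ("num", StandardScaler(), numeric_cols),
            ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        ]
    )


# Tiny illustrative frame; the real data comes from the ingestion artifact.
df = pd.DataFrame(
    {"age": [19, 33, 45], "bmi": [27.9, 22.7, 30.1], "smoker": ["yes", "no", "no"]}
)
preprocessor = build_preprocessor(["age", "bmi"], ["smoker"])
X = preprocessor.fit_transform(df)

# The project stores the fitted preprocessor under models/.
artifact_path = os.path.join(tempfile.mkdtemp(), "preprocessor.joblib")
joblib.dump(preprocessor, artifact_path)
```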
Model Training
Multiple machine learning models are trained:
Linear Regression, Ridge Regression, Lasso Regression, Polynomial Regression, Random Forest, Gradient Boosting, XGBoost, LightGBM, CatBoost.
The top 4 performing models based on training metrics are selected. Both models and associated metrics are saved into artifacts/ModelTrainerArtifacts. MLflow is used to track model parameters and metrics throughout this process.
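The "train many, keep the best few" idea can be sketched with a subset of the listed models (scikit-learn only here for brevity; XGBoost, LightGBM, and CatBoost would be added to the same dictionary). Names and the synthetic data are illustrative, not the project's API; in the real pipeline, MLflow logs each model's parameters and scores:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# Synthetic stand-in for the transformed training data.
X, y = make_regression(n_samples=200, n_features=6, noise=10.0, random_state=42)

candidates = {
    "linear": LinearRegression(),
    "ridge": Ridge(),
    "lasso": Lasso(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=42),
    "gradient_boosting": GradientBoostingRegressor(random_state=42),
}

scores = {}
for name, model in candidates.items():
    model.fit(X, y)
    scores[name] = model.score(X, y)  # training R^2; MLflow would log this

# Keep the four best-scoring models for the evaluation stage.
top_4 = sorted(scores, key=scores.get, reverse=True)[:4]
```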
Model Evaluation
The best-performing model on test data is selected and saved into artifacts/ModelEvaluationArtifacts and models/. Model evaluation metrics are tracked using MLflow.
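Evaluation reduces to scoring the shortlisted models on held-out data and keeping the single best one. A hypothetical sketch of that selection step (model names and data are placeholders):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the transformed dataset.
X, y = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Shortlist from the training stage (two shown for brevity).
shortlist = {"linear": LinearRegression(), "ridge": Ridge()}

test_scores = {}
for name, model in shortlist.items():
    model.fit(X_train, y_train)
    test_scores[name] = r2_score(y_test, model.predict(X_test))

best_name = max(test_scores, key=test_scores.get)
best_model = shortlist[best_name]  # this is what gets saved to models/
```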
Streamlit App Development
A Streamlit application is developed to allow users to input data and receive predictions from the trained model.
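Behind the form, the app needs a helper that turns one user submission into a single prediction by running it through the saved preprocessor and model. A hypothetical version of such a helper (the real logic lives in prediction.py / src/pipeline/prediction_pipeline.py; the stand-in artifacts below replace the ones loaded from models/):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler


def predict_premium(model, preprocessor, user_input: dict) -> float:
    """Turn one form submission into a single premium prediction."""
    row = pd.DataFrame([user_input])
    features = preprocessor.transform(row)
    return float(model.predict(features)[0])


# Tiny stand-in artifacts; in the app these are loaded from models/.
train = pd.DataFrame({"age": [20, 40, 60], "bmi": [22.0, 28.0, 33.0]})
target = [1500.0, 4000.0, 9000.0]
scaler = StandardScaler().fit(train)
model = LinearRegression().fit(scaler.transform(train), target)

premium = predict_premium(model, scaler, {"age": 30, "bmi": 25.0})
```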

Model Deployment
The model is deployed on AWS EC2 using Docker and GitHub Actions.
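The repository ships a Dockerfile and .dockerignore for this. An illustrative sketch of what such a Dockerfile might contain, assuming the requirements_app.txt and app.py files from the directory layout (the actual file in the repository may differ):

```dockerfile
FROM python:3.10-slim
WORKDIR /app

# Install only the app-serving dependencies.
COPY requirements_app.txt .
RUN pip install --no-cache-dir -r requirements_app.txt

COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.address=0.0.0.0"]
```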
Model tracking with MLflow

Data pipeline tracking with DVC
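DVC reads the pipeline definition from the dvc.yaml in the repository root. A hedged sketch of what the first stages might look like; the stage names, commands, and outputs here are assumptions, so see the repository's dvc.yaml for the authoritative definition:

```yaml
stages:
  data_ingestion:
    cmd: python src/components/data_ingestion.py
    deps:
      - data/insurance.csv
    outs:
      - artifacts/DataIngestionArtifacts
  data_transformation:
    cmd: python src/components/data_transformation.py
    deps:
      - artifacts/DataIngestionArtifacts
    outs:
      - artifacts/DataTransformationArtifacts
```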

Directory Structure
.github/
└── workflows/
    └── main.yaml
docs/
├── docs/
│   ├── index.md
│   └── getting-started.md
├── mkdocs.yml
└── README.md
src/
├── __init__.py
├── components/
│   ├── __init__.py
│   ├── data_ingestion.py
│   ├── data_transformation.py
│   ├── model_trainer.py
│   └── model_evaluation.py
├── constants/
│   └── __init__.py
├── entity/
│   ├── __init__.py
│   ├── config_entity.py
│   └── artifact_entity.py
├── pipeline/
│   ├── __init__.py
│   ├── training_pipeline.py
│   └── prediction_pipeline.py
├── utils/
│   ├── __init__.py
│   └── utils.py
├── logger/
│   └── __init__.py
└── exception/
    └── __init__.py
data/
└── insurance.csv
experiment/
└── experiments.ipynb
requirements.txt
requirements_app.txt
setup.py
app.py
main.py
README.md
implement.md
.gitignore
template.py
prediction.py
init_setup.ps1
dvc.yaml
Dockerfile
demo.py
config.json
.dockerignore
.dvcignore
Models
- Linear Regression
- Ridge Regression
- Lasso Regression
- Polynomial Regression
- Random Forest
- Gradient Boosting
- XGBoost
- LightGBM
- CatBoost
Installation
Requirements:
- Python 3.10
- mkdocs
- dvc
- numpy
- pandas
- colorama
- mlflow==2.2.2
- dagshub
- scikit-learn
- xgboost
- lightgbm
- catboost
- streamlit
Setup
To reproduce the model and run the application:
1. Clone the repository:

   git clone <repository_url>
   cd <repository_name>

2. Set up the virtual environment and install the requirements:

   ./init_setup.ps1

3. Execute the whole pipeline:

   python main.py

Now run the Streamlit app as described in the inference demo below.
Inference demo
1. Run the Streamlit app:

   streamlit run app.py

2. Enter the input values and get a prediction.
Contributors
- Ravi Kumar