Project 2: 30 Days of Data Sets Training

Introduction

Welcome to my second machine learning project! Here I trained a machine learning model on a dataset and implemented a simple slider with Jupyter widgets for data exploration.

Dataset

I used the Delaney solubility dataset with descriptors, loaded directly from the CSV URL in the code below. The dataset contains molecular descriptors along with the solubility values in the logS column.

import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/dataprofessor/data/master/delaney_solubility_with_descriptors.csv")

![11](https://github.com/Gill-tech/machine-learning/assets/101551262/820676f9-8629-46d5-91d9-4de799c0c07d)
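A quick sanity check after loading can confirm the shape, column names, and missing values. The snippet below is a minimal sketch: `summarize` is a hypothetical helper, and the small frame is a stand-in with made-up values and placeholder descriptor names (only `logS` comes from the dataset), so that it runs without network access.

```python
import pandas as pd


def summarize(frame: pd.DataFrame) -> dict:
    """Collect a few basic facts about a DataFrame (hypothetical helper)."""
    return {
        "shape": frame.shape,
        "columns": list(frame.columns),
        "missing": int(frame.isna().sum().sum()),
    }


# Stand-in data: values and descriptor names are made up for illustration
sample = pd.DataFrame({
    "descriptor_a": [1.0, 2.0, 3.0],
    "descriptor_b": [0.5, 0.1, 0.9],
    "logS": [-2.1, -1.4, -3.0],
})

summary = summarize(sample)
print(summary)
```

Calling `summarize(df)` on the real frame would report the same three facts for the full dataset.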

Data Preparation

I started by preparing the data for training: the target variable (logS) is separated from the remaining columns, which serve as features.

# Extracting the target variable
y = df["logS"]

# Extracting features
x = df.drop('logS', axis=1)
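It can help to verify that the target did not leak into the features and that both objects have the same number of rows. This is a sketch on a stand-in frame with made-up values and placeholder descriptor names; `df.drop(columns="logS")` is an equivalent spelling of `df.drop('logS', axis=1)`.

```python
import pandas as pd

# Stand-in data: values and descriptor names are made up for illustration
df = pd.DataFrame({
    "descriptor_a": [1.0, 2.0, 3.0, 4.0],
    "descriptor_b": [0.5, 0.1, 0.9, 0.3],
    "logS": [-2.1, -1.4, -3.0, -0.7],
})

# Same separation as above, using the columns= keyword
y = df["logS"]
x = df.drop(columns="logS")

# Checks: the target is not among the features, and row counts agree
target_leaked = "logS" in x.columns
rows_match = len(x) == len(y)
print(x.shape, y.shape, target_leaked, rows_match)
```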

Data Splitting

Next, I split the data into training and testing sets using the train_test_split function from scikit-learn:

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=100)
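With `test_size=0.2`, one fifth of the rows go to the test set, and fixing `random_state` makes the split reproducible. The sketch below illustrates both points on random stand-in data (20 rows, placeholder descriptor names), not on the actual dataset.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in data: 20 random rows with placeholder descriptor names
rng = np.random.default_rng(0)
x = pd.DataFrame(rng.normal(size=(20, 2)), columns=["descriptor_a", "descriptor_b"])
y = pd.Series(rng.normal(size=20), name="logS")

# Same call as above: 80% training, 20% testing
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=100)

# Repeating the call with the same random_state selects the same rows
_, x_test_again, _, _ = train_test_split(x, y, test_size=0.2, random_state=100)
same_split = x_test.index.equals(x_test_again.index)

print(len(x_train), len(x_test), same_split)
```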

Model Building

For this project, I chose to use a simple linear regression model. Here's how I built and trained the model:

from sklearn.linear_model import LinearRegression

# Initializing the linear regression model
lr = LinearRegression()

# Training the model
lr.fit(x_train, y_train)
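After fitting, the learned weights are available as `lr.coef_` and `lr.intercept_`. As a minimal sketch, the example below fits the same kind of model on synthetic data generated from a known linear rule (an assumption made purely for illustration), so the recovered coefficients can be compared with the true ones.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data from a known rule: y = 2*a - 3*b + 1 (illustrative only)
rng = np.random.default_rng(0)
x_train = pd.DataFrame(rng.normal(size=(50, 2)), columns=["descriptor_a", "descriptor_b"])
y_train = 2.0 * x_train["descriptor_a"] - 3.0 * x_train["descriptor_b"] + 1.0

lr = LinearRegression()
lr.fit(x_train, y_train)

# Pair each coefficient with its feature name for readability
coefficients = pd.Series(lr.coef_, index=x_train.columns)
print(coefficients)
print(lr.intercept_)
```

On the real dataset, the same `pd.Series(lr.coef_, index=x_train.columns)` line would show how much each descriptor contributes to the predicted logS.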

Applying the Model

I applied the trained model to make predictions on both the training and testing sets:

# Making predictions on the training and testing sets
y_lr_train_pred = lr.predict(x_train)
y_lr_test_pred = lr.predict(x_test)
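One possible way to evaluate these predictions is to compare them with the true values using mean squared error and R² from `sklearn.metrics`. This is a sketch of that step, not part of the original notebook as described here; it runs on synthetic noisy data rather than the Delaney dataset, so the numbers it prints say nothing about the real model.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: a linear signal plus small noise (illustrative only)
rng = np.random.default_rng(0)
x = pd.DataFrame(rng.normal(size=(200, 2)), columns=["descriptor_a", "descriptor_b"])
y = 2.0 * x["descriptor_a"] - 3.0 * x["descriptor_b"] + rng.normal(scale=0.1, size=200)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=100)

lr = LinearRegression()
lr.fit(x_train, y_train)
y_lr_train_pred = lr.predict(x_train)
y_lr_test_pred = lr.predict(x_test)

# Lower MSE and R2 closer to 1 indicate a better fit
results = pd.DataFrame({
    "MSE": [mean_squared_error(y_train, y_lr_train_pred),
            mean_squared_error(y_test, y_lr_test_pred)],
    "R2": [r2_score(y_train, y_lr_train_pred),
           r2_score(y_test, y_lr_test_pred)],
}, index=["train", "test"])
print(results)
```

Comparing the train and test rows gives a first indication of whether the model generalizes or overfits.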

Conclusion

This project is part of my 30 Days of Data Sets Training, in which I explore different datasets and build machine learning models. The next steps are further analysis, model evaluation, and potentially improving the model's performance.

Feel free to explore the Jupyter notebook for more details and insights. If you have any suggestions or questions, please let me know!
