A Beginner’s Walkthrough of a Simple Machine Learning Process Using Logistic Regression

Before we dive into an actual machine learning use case, it’s worth going over what the whole process usually looks like. You’ve probably seen the steps written down before, but let’s restate them: first, we load data. Then we preprocess it, train a model, evaluate it (sometimes), and finally make predictions. That’s the flow.

In this post, I’ll show you a super simple example using logistic regression. It’s one of the simplest models for classification problems — and yes, it’s still widely used.

We’ll do this in Python, and don’t worry if you’re not an expert. The idea is just to get a feel for how things work. It’s less about memorizing code and more about understanding the structure.

What Is a Classifier Anyway?

Let’s clear this up first. In machine learning, a classifier is just a model that assigns a label to some input data. So if I give it some features, like measurements of a flower, it’ll tell me which species that flower probably belongs to.

This is what we call a supervised learning problem — we’re training the model on data where we already know the answer (the label), and then later we ask it to predict labels for new data we haven’t seen before.

The Plan for This Example

Here’s what we’re going to do:

  1. Import the libraries we need
  2. Load the dataset
  3. Split the data into input (X) and output (Y)
  4. Build and train a logistic regression model
  5. Make predictions
  6. Print the results

We’ll be using the Iris dataset — kind of like the “Hello World” of machine learning. It’s small, simple, and just right for getting comfortable with ML basics.
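By the way, if you don’t have an iris.csv file handy, scikit-learn actually bundles the same dataset, so you can follow along without downloading anything. Here’s a minimal sketch (the column names differ slightly from a typical CSV, and the species label is stored as a numeric “target” column):

```python
# Optional: load the bundled Iris dataset instead of a CSV file.
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame  # a DataFrame: 4 feature columns plus a numeric "target"
print(df.shape)  # (150, 5)
```

The rest of the post assumes the CSV version, but everything works the same either way.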

Step 1: Importing the Libraries

Before we do anything, we need to bring in some tools. I added a comment in my code like this:

# Import necessary libraries
import pandas as pd
from sklearn.linear_model import LogisticRegression
  • pandas helps us load and manipulate the data. You’ll see it used all the time in data science.
  • LogisticRegression comes from sklearn (aka scikit-learn), which is a popular machine learning library in Python.

Now, if you run this and get an ImportError, it probably means those libraries aren’t installed in your Jupyter Notebook environment. You’ll need to open a terminal from the Notebook interface and install them manually, using either conda or pip:

conda install -c anaconda scikit-learn

pip install scikit-learn pandas

Once that’s done, you can re-run the import lines, and they should work fine.

Step 2: Loading the Dataset

Now let’s bring in the data.

iris_data = pd.read_csv("iris.csv")
iris_data.head()

This reads a file named iris.csv and loads it into a pandas DataFrame called iris_data. The head() method shows us the first five rows by default, so we can get a quick look.

The dataset has different columns that describe features of iris flowers — like petal length, sepal width, and so on. It also has the species name, which is what we’ll be trying to predict.
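One thing worth knowing: column names vary between iris.csv variants (some use “Id”, some “ID”, some spell out “SepalLengthCm”), so it pays to check what your file actually contains before writing code that drops or selects columns. Here’s a quick illustration using a tiny hypothetical stand-in for the CSV:

```python
import pandas as pd

# A tiny stand-in for iris.csv (hypothetical column names in the usual style)
sample = pd.DataFrame({
    "ID": [1, 2],
    "SepalLength": [5.1, 4.9],
    "SepalWidth": [3.5, 3.0],
    "PetalLength": [1.4, 1.4],
    "PetalWidth": [0.2, 0.2],
    "Species": ["Iris-setosa", "Iris-setosa"],
})

# These two calls are the quickest way to inspect any DataFrame you load.
print(sample.columns.tolist())
print(sample.shape)
```

With the real file, you’d run the same two calls on iris_data.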

Step 3: Splitting Features and Labels

To train a model, we need to separate what we’re using as input (features) from what we’re trying to predict (labels).

Here’s the code:

X = iris_data.drop(columns=["ID", "Species"])
Y = iris_data["Species"]
  • X is everything except the “ID” and “Species” columns. We don’t need “ID” — it’s just a row number.
  • Y is the “Species” column — that’s our target.

This gives us a DataFrame X with the features and a Series Y with the labels.
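In a more complete workflow, you’d usually go one step further and hold back some rows for evaluation — the “evaluate it (sometimes)” step from the intro. We’re skipping it here to keep things minimal, but for reference, here’s a sketch using scikit-learn’s train_test_split and the bundled Iris data:

```python
# A common refinement (not used in this post's minimal flow): hold back
# 20% of the rows so we can later test the model on data it hasn't seen.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris(as_frame=True)
X = iris.data    # feature DataFrame
Y = iris.target  # label Series

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42
)
print(len(X_train), len(X_test))  # 120 30
```

We’ll come back to evaluation properly in the next post.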

Step 4: Creating and Training the Model

Now we can create a model and train it.

model = LogisticRegression(max_iter=200)  # extra iterations avoid a convergence warning on this data
model.fit(X, Y)

That’s it. The model is now trained on the data. Behind the scenes, it’s figuring out the relationships between the features and the label. So when we give it new data, it can guess the species based on what it’s learned.
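If you’re curious what “figuring out the relationships” actually produces, the trained model stores a weight for each feature, per class. A quick peek (sketched with the bundled Iris data so it runs standalone):

```python
# Inspect what logistic regression actually learned.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
model = LogisticRegression(max_iter=200)
model.fit(iris.data, iris.target)

print(model.classes_)     # the three species labels the model knows about
print(model.coef_.shape)  # one row of feature weights per class: (3, 4)
```

Those weights are what the model uses to score new inputs.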

Step 5: Making a Prediction

Let’s pretend we have some new data — a flower with certain measurements — and we want to know its species.

# Wrap the new measurements in a DataFrame with the same column names
# (and order) as the training features, so they match what the model saw.
new_data = pd.DataFrame([[5.1, 3.5, 1.4, 0.2]], columns=X.columns)
prediction = model.predict(new_data)
print("Predicted species:", prediction[0])

This prints out the predicted species name — something like “Iris-setosa”.
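A label alone doesn’t tell you how confident the model is. Logistic regression also exposes predict_proba(), which returns a probability for each class. A small self-contained sketch:

```python
# predict_proba gives one probability per class, summing to 1.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
model = LogisticRegression(max_iter=200)
model.fit(iris.data, iris.target)

probs = model.predict_proba([[5.1, 3.5, 1.4, 0.2]])
print(probs.round(3))  # these measurements are a near-certain setosa
```

This is handy when you want to treat low-confidence predictions differently.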

Just like that, we’ve built a very basic machine learning pipeline.

Wrapping Things Up

So what did we actually do here? Let’s summarize:

  • Loaded the Iris dataset
  • Split it into features and labels
  • Created a logistic regression model
  • Trained it using fit()
  • Made a prediction on new input data

The whole process is super repeatable. Once you understand it, you can apply the same pattern to more complex problems with bigger datasets, more advanced models, or different types of predictions.
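To make that repeatability concrete, here’s the whole flow from this post condensed into one runnable sketch (it uses scikit-learn’s bundled Iris data instead of iris.csv so it works anywhere; the species names differ slightly from the CSV, e.g. “setosa” rather than “Iris-setosa”):

```python
# The full pipeline from this post in one place: load, split X/Y, fit, predict.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris(as_frame=True)
X = iris.data                          # features
Y = iris.target_names[iris.target]    # string labels, like the CSV's Species

model = LogisticRegression(max_iter=200)
model.fit(X, Y)

print(model.predict(X.iloc[[0]])[0])  # prints: setosa
```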

Machine learning can feel overwhelming when you look at all the terminology and tools out there, but if you break it down — just like we did here — it starts to make more sense.

Take it slow. Try editing the dataset, maybe change the input values and see what predictions you get. The best way to learn is to play around and ask questions when things don’t work.

This isn’t about writing perfect code — it’s more about developing a feel for the workflow and logic.

Next time, we’ll look at how to evaluate a model and maybe even try a different classifier.