CSV plotting in Python according to category/name

by Mora O.   Last Updated October 19, 2018 22:26 PM

I have managed to plot the data according to the column numbers someone inputs for the x- and y-axes. My data is formatted as a CSV where the species name is in the fifth column (index 4):

5.1,3.5,1.4,0.2,Iris-setosa

7.0,3.2,4.7,1.4,Iris-versicolor

5.8,2.7,5.1,1.9,Iris-virginica

At the moment, my program runs correctly and plots points where I need them to. The problem is that the points are all the same color. I need to somehow tell the program to look at the species name and use it as the category for the corresponding values. There are three species, so the data should be in three colors with a legend.

import csv
import matplotlib.pyplot as plt

# A function that takes data from a CSV and plots it according to which columns are passed in

def plot_data(fileName, colX, colY):
    dataList = []
    sepalLengthCM = []
    sepalWidthCM = []
    petalLengthCM = []
    petalWidthCM = []
    species = []

    # reading the file
    with open(fileName, "r", newline="") as file:
        data = csv.reader(file)

        # making a list of all the rows of data, skipping blank lines
        for row in data:
            if row:
                dataList.append(row)

    # separating each column into its own list so I can plot them against each other.
    # For example, I'm plotting column 2 as the x axis and column 1 as the y.
    # The measurements are converted to float so the axes are numeric rather than text.
    for row in dataList:
        sepalLengthCM.append(float(row[0]))
        sepalWidthCM.append(float(row[1]))
        petalLengthCM.append(float(row[2]))
        petalWidthCM.append(float(row[3]))
        species.append(row[4])

    # placing each column into a list of 'options' that the user can choose from.
    optionsList = [sepalLengthCM, sepalWidthCM, petalLengthCM, petalWidthCM]
    # using the indexes of the options list to plot the scatter plot. It works, but without distinction among species
    plt.scatter(optionsList[colX], optionsList[colY])
    plt.show()


plot_data("iris.csv", 2, 1)

How do I go about telling Python to look at that fifth column? I have separated the species names into their own list, but I don't think it is of any use to me here. I know how to plot columns, but I can't figure out how to categorize the rows.
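One possible way to approach this, sketched under some assumptions: group the rows by the species label first, then call plt.scatter once per group with a label, so that each group picks up the next color from matplotlib's default color cycle and plt.legend() can name it. The helper names group_by_species and plot_by_species and the col_species parameter are hypothetical, made up for this illustration. The sketch assumes the file has no header row and that the species label sits at index 4.

```python
import csv
from collections import defaultdict


def group_by_species(rows, col_x, col_y, col_species=4):
    """Group (x, y) values by the label found at index col_species.

    Returns a dict mapping each label to a pair of lists (xs, ys).
    """
    groups = defaultdict(lambda: ([], []))
    for row in rows:
        if not row:  # csv.reader yields [] for blank lines; skip them
            continue
        xs, ys = groups[row[col_species]]
        xs.append(float(row[col_x]))  # convert text to numbers for plotting
        ys.append(float(row[col_y]))
    return dict(groups)


def plot_by_species(file_name, col_x, col_y):
    # Imported here so the grouping helper can be used without matplotlib.
    import matplotlib.pyplot as plt

    with open(file_name, newline="") as f:
        groups = group_by_species(csv.reader(f), col_x, col_y)

    # One scatter call per species: each call takes the next color in the
    # default color cycle, and its label becomes a legend entry.
    for name, (xs, ys) in groups.items():
        plt.scatter(xs, ys, label=name)
    plt.legend()
    plt.show()
```

Calling plot_by_species("iris.csv", 2, 1) would then mirror the original plot_data("iris.csv", 2, 1) call, with one color and one legend entry per species. Grouping before plotting is only one option; passing a list of per-point colors to a single scatter call would also work, but the legend then has to be built by hand.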


