Stopbyte

Finding the percentage of instances in a column of a CSV file with Python

So I have a CSV file here: https://uploadfiles.io/o5xm3

I am trying to use Python to filter the column 'pclass' and print the percentage of each class ('1', '2' or '3') among all the passengers, and then print the percentage of each class that did not survive (0 = did not survive, 1 = survived).

Here is my code so far (I am not going to use pandas for this exercise):

import csv
total, pclass = 0, 0

with open('titanic-new.csv', newline='') as csvfile:
    # keep only the rows whose second column (index 1) is '1'
    for row in filter(lambda p: p[1] == '1', csv.reader(csvfile, delimiter=',')):
        total += 1
        if int(row[0]):  # count rows whose first column (index 0) is non-zero
            pclass += 1

print('total: {}, pclass: {} ({:.2f}%)'.format(total, pclass,
                                         pclass / total * 100))

input('press ENTER to exit')

My problem is that I don't know how to tell the program that the column 'pclass' has 3 different values, '1', '2' and '3'. I am also unsure how to get the program to print the results for these 3 individually. Thank you for any help, as I am just a beginner :slight_smile:
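
One way to avoid hard-coding the three classes is to count per class with `collections.Counter` from the standard library (no pandas). This is a minimal sketch, assuming the file has a header row with columns named `pclass` and `survived`; the name `survived` is a guess, since the actual header isn't shown in the thread. `class_stats` and `report` are hypothetical helper names.

```python
import csv
from collections import Counter


def class_stats(rows, class_col='pclass', survived_col='survived'):
    """Return {class: (% of all passengers, % of that class who did not survive)}.

    `rows` is any iterable of dicts, e.g. a csv.DictReader. The column
    names are assumptions; change them to match the real header.
    """
    totals = Counter()  # passengers per class
    deaths = Counter()  # passengers per class with survived == 0
    for row in rows:
        cls = row[class_col].strip()
        totals[cls] += 1
        if int(row[survived_col]) == 0:
            deaths[cls] += 1
    n = sum(totals.values())
    return {cls: (totals[cls] / n * 100, deaths[cls] / totals[cls] * 100)
            for cls in sorted(totals)}


def report(path='titanic-new.csv'):
    """Print the per-class percentages for the CSV file at `path`."""
    with open(path, newline='') as csvfile:
        stats = class_stats(csv.DictReader(csvfile))
    for cls, (share, died) in stats.items():
        print('class {}: {:.2f}% of passengers, {:.2f}% did not survive'
              .format(cls, share, died))
```

Calling `report()` at the end of the script would print one line per class. Because the counters are keyed by whatever values appear in the column, the three classes never need to be listed in advance.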

2 Likes

Hmm… I'm not sure why you've used the filter function. I would simply do it this way:

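import csv

pclass = 0                # index of the pclass column
valid_values = [1, 2, 3]
i = 0                     # number of matching rows seen so far

with open('titanic-new.csv', newline='') as csvfile:
    csv_reader = csv.reader(csvfile, delimiter=',')
    next(csv_reader, None)  # skip the header row (int('pclass') would fail)
    for row in csv_reader:
        if int(row[pclass]) in valid_values:
            i += 1
            print(row[pclass] + ' at line #' + str(i))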

The code snippet above should print something like this:

1 at line #1
2 at line #2
1 at line #3
3 at line #4
2 at line #5
3 at line #6

I used the pclass variable (index 0) to refer to your pclass column. If you want to look up the column by name instead, you will need to parse the first (header) row.
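
A minimal sketch of that header parsing: read the first row, then look up the position of the column name in it. This assumes the header literally contains `pclass`; `values_by_name` is a hypothetical helper name.

```python
import csv


def values_by_name(lines, name='pclass'):
    """Yield the values of the column called `name`, found via the header row."""
    reader = csv.reader(lines)
    header = [h.strip() for h in next(reader)]  # first row holds the column names
    col = header.index(name)  # raises ValueError if there is no such column
    for row in reader:
        yield row[col]
```

Used as `for value in values_by_name(csvfile):` inside the `with open(...)` block. The standard library's `csv.DictReader` does the same lookup for you, giving each row as a dict keyed by the header names.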

3 Likes

Why does it close immediately when I run it? I added

input('press ENTER to exit')

and it still closes immediately without doing anything.

1 Like

That's interesting. Did you try adding a try...except block around it? Something like:

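import csv
import sys
filename = 'titanic-new.csv'
pclass = 0
valid_values = [1, 2, 3]
i = 0

with open(filename, newline='') as csvfile:
    csv_reader = csv.reader(csvfile, delimiter=',')
    try:
        for row in csv_reader:
            if int(row[pclass]) in valid_values:
                i += 1
                print(row[pclass] + ' at line #' + str(i))
    except (csv.Error, ValueError) as e:  # ValueError: a non-numeric cell, e.g. a header
        sys.exit('file %s, line %d: %s' % (filename, csv_reader.line_num, e))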

Maybe that will help you find the root of the problem.
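
When a script is started by double-clicking it, an uncaught exception (or a call to `sys.exit()`) ends Python before the final `input()` line runs, so the window closes before the error can be read. Running it from a terminal, e.g. `python yourscript.py`, also shows the error. A minimal sketch of keeping the window open either way; `run_and_pause` is a hypothetical helper name, and `func` stands for whatever code you want to run:

```python
import traceback


def run_and_pause(func, pause=input):
    """Call func(); report any exception, then wait for ENTER.

    `pause` defaults to the built-in input(); it is a parameter only so the
    waiting step can be swapped out (e.g. in tests). Returns True if func()
    raised no exception.
    """
    ok = True
    try:
        func()
    except Exception:
        traceback.print_exc()  # print the traceback instead of exiting
        ok = False
    finally:
        pause('press ENTER to exit')  # also runs if func() calls sys.exit()
    return ok
```

For example, put the CSV code in a function such as `def main(): ...` and end the script with `run_and_pause(main)` instead of calling `input()` directly.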