Download link for the data used in this article: click here to download
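It is well known that the price of a wine is related to its quality. This exercise analyzes and processes white wine quality data, using the variables listed in the table below.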
Variable | Meaning |
---|---|
fixed acidity | Fixed acidity |
volatile acidity | Volatile acidity |
citric acid | Citric acid |
residual sugar | Residual sugar |
chlorides | Chlorides |
free sulfur dioxide | Free sulfur dioxide |
total sulfur dioxide | Total sulfur dioxide |
density | Density |
pH | pH value |
sulphates | Sulfates |
alcohol | Alcohol content |
quality | Quality grade |
Sample of the data:
1. Reading the data
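import csv

# Open the semicolon-delimited CSV file and read every row into a list
f = open("C:/Users/55/Desktop/qweee.csv", "r")
reader = csv.reader(f, delimiter=";")
data = []
for row in reader:
    data.append(row)
# Print the header row and the first four data rows
for i in range(5):
    print(data[i])
f.close()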
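The reading step can also be sketched with a `with` block, which closes the file even if an error occurs, and with the values converted to numbers right away. This is only a minimal sketch: the `SAMPLE` text and the `read_rows` helper are hypothetical stand-ins (a few made-up rows in the same semicolon-delimited layout), not the actual file used above.

```python
import csv
import io

# Hypothetical excerpt in the same semicolon-delimited layout as the real file
SAMPLE = (
    '"fixed acidity";"volatile acidity";"quality"\n'
    '7;0.27;6\n'
    '6.3;0.3;6\n'
    '8.1;0.28;5\n'
)


def read_rows(handle):
    """Return the header and the data rows converted to floats."""
    reader = csv.reader(handle, delimiter=";")
    header = next(reader)  # the first line holds the column names
    rows = [[float(cell) for cell in row] for row in reader if row]
    return header, rows


# io.StringIO stands in for open(path, "r", newline="") on the real file
with io.StringIO(SAMPLE) as f:
    header, rows = read_rows(f)

print(header)
print(rows[0])
```

With the real file, the `io.StringIO(SAMPLE)` call would be replaced by `open(...)` on the CSV path.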
2. Processing the data
(1) Find how many quality grades the wines are divided into
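# Collect the quality grade (last column) of every data row, skipping the header
qlist = []
for row in data[1:]:
    qlist.append(int(row[-1]))
# A set keeps only the distinct grades
qcount = set(qlist)
print("The wines have %d quality grades: %r" % (len(qcount), qcount))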
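As an alternative sketch, `collections.Counter` from the standard library yields both the distinct grades and how often each occurs in one pass. The `qualities` list below is made-up illustrative data, not values taken from the dataset.

```python
from collections import Counter

# Hypothetical quality values standing in for the last column of the data
qualities = [6, 6, 5, 7, 6, 5, 8]

counts = Counter(qualities)   # maps grade -> number of occurrences
grades = sorted(counts)       # distinct grades in ascending order

print("%d grades: %r" % (len(grades), grades))
for grade in grades:
    print(grade, ":", counts[grade])
```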
(2) Split the dataset into 7 subsets by white wine quality grade, and count the number of wines in each grade
# Group the data rows by quality grade: {grade: [rows]}
content_dict = {}
for row in data[1:]:
    quality = int(row[-1])
    if quality not in content_dict:
        content_dict[quality] = [row]
    else:
        content_dict[quality].append(row)
# Print the number of rows in each grade
for key in content_dict:
    print(key, ":", len(content_dict[key]))
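The same grouping can be sketched with `collections.defaultdict`, which removes the need for the explicit `if`/`else` branch. The helper name `group_by_quality` and the three sample rows are hypothetical and used only for illustration.

```python
from collections import defaultdict


def group_by_quality(rows):
    """Group rows by the integer quality grade stored in the last column."""
    groups = defaultdict(list)  # missing keys start out as an empty list
    for row in rows:
        groups[int(row[-1])].append(row)
    return dict(groups)


# Hypothetical rows shaped like the output of csv.reader (all strings)
rows = [["7", "0.27", "6"], ["6.3", "0.3", "6"], ["8.1", "0.28", "5"]]
groups = group_by_quality(rows)

for grade in sorted(groups):
    print(grade, ":", len(groups[grade]))
```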
(3) Compute the mean of fixed acidity in each subset
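# Mean of fixed acidity (column 0) for each quality grade
mean_list = []
for key, value in content_dict.items():
    total = 0  # running total of fixed acidity within this grade
    for row in value:
        total += float(row[0])
    mean_list.append((key, total / len(value)))
for item in mean_list:
    print(item[0], ",", item[1])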
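The per-grade mean can be generalized to any column. The sketch below uses `statistics.mean` from the standard library. The function name `column_mean_by_grade` and the sample `groups` dictionary are assumptions made for illustration, shaped like `content_dict` above.

```python
from statistics import mean


def column_mean_by_grade(groups, column):
    """Return {grade: mean of the given column} for every grade in groups."""
    return {
        grade: mean(float(row[column]) for row in rows)
        for grade, rows in sorted(groups.items())
    }


# Hypothetical subsets keyed by quality grade (values are strings, as read by csv)
groups = {
    6: [["7", "0.27", "6"], ["6.3", "0.3", "6"]],
    5: [["8.1", "0.28", "5"]],
}

means = column_mean_by_grade(groups, 0)  # column 0 is fixed acidity
for grade, value in means.items():
    print(grade, ",", value)
```

Passing a different column index, for example `1` for volatile acidity in this sample, reuses the same function without rewriting the loop.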
Display the results as a chart (here, a pie chart of the number of wines in each grade)
import matplotlib.pyplot as plt

# Collect the grades and the number of rows in each grade
key_1 = []
a = []
for key in content_dict:
    key_1.append(key)
    a.append(len(content_dict[key]))
# Draw a pie chart with one slice per grade
plt.pie(x=a, labels=key_1)
plt.show()
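A pie chart is often easier to read when the slices are sorted and labelled with their share. The sketch below separates the data preparation from the drawing. The names `pie_inputs` and `draw_pie` and the sample dictionary are hypothetical, and `matplotlib` is imported only inside `draw_pie`, so the preparation step runs even where the library is not installed.

```python
def pie_inputs(content_dict):
    """Return slice sizes and 'grade (share%)' labels, sorted by grade."""
    grades = sorted(content_dict)
    counts = [len(content_dict[g]) for g in grades]
    total = sum(counts)
    labels = ["%d (%.1f%%)" % (g, 100.0 * c / total)
              for g, c in zip(grades, counts)]
    return counts, labels


def draw_pie(content_dict):
    """Draw the pie chart; assumes matplotlib is installed."""
    import matplotlib.pyplot as plt  # imported lazily, only when drawing
    counts, labels = pie_inputs(content_dict)
    plt.pie(x=counts, labels=labels)
    plt.title("Number of wines per quality grade")
    plt.show()


# Hypothetical subsets: three rows of grade 6 and one row of grade 5
sample = {6: [["7"], ["6.3"], ["7.2"]], 5: [["8.1"]]}
counts, labels = pie_inputs(sample)
print(counts, labels)
# draw_pie(sample)  # uncomment where matplotlib and a display are available
```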