1. Error details
Symptom: Chinese text in the graph displayed by graph.view() is rendered as garbled characters.
In [40]:
from sklearn import tree
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
wine = load_wine()
Xtrain, Xtest, Ytrain, Ytest = train_test_split(wine.data,wine.target,test_size=0.3)
clf = tree.DecisionTreeClassifier(criterion="entropy")
clf = clf.fit(Xtrain, Ytrain)
score = clf.score(Xtest, Ytest)
feature_name = ['酒精','苹果酸','灰','灰的碱性','镁','总酚','类黄酮','非黄烷类酚类','花青素','颜色强度','色调','od280/od315稀释葡萄酒','脯氨酸']
In [41]:
import graphviz
dot_data = tree.export_graphviz(clf
,out_file = None
,feature_names = feature_name
,class_names=["琴酒","雪莉","贝尔摩德"]
,filled=True
,rounded=True
)
graph = graphviz.Source(dot_data)
graph.view()
Out[41]:
'Source.gv.pdf'
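2. How the fix works
Handle the DOT source as UTF-8, and replace the default font (helvetica), which has no Chinese glyphs, with FangSong (仿宋).
3. Solutions
(1) Method 1: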
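The font swap can be tried on its own, without training a model. The sketch below is illustrative only: `replace_dot_font` is a hypothetical helper, and `sample_dot` is a hand-written fragment that only resembles what export_graphviz emits (real output is longer and may differ between scikit-learn versions).

```python
# Hypothetical helper (not part of scikit-learn or graphviz): swap the font
# named in a DOT source string for one that contains Chinese glyphs.
def replace_dot_font(dot_source: str, old: str = "helvetica", new: str = "FangSong") -> str:
    return dot_source.replace(old, new)


# Hand-written stand-in for export_graphviz output (an assumption for illustration).
sample_dot = (
    'digraph Tree {\n'
    'node [shape=box, style="filled, rounded", fontname="helvetica"] ;\n'
    'edge [fontname="helvetica"] ;\n'
    '0 [label="酒精 <= 12.8"] ;\n'
    '}'
)

patched_dot = replace_dot_font(sample_dot)

# The Chinese labels should survive a UTF-8 encode/decode round trip unchanged.
round_tripped = patched_dot.encode("utf-8").decode("utf-8")
```

Only the font name changes; the Chinese labels are left as they are. FangSong must be installed on the machine that renders the graph, otherwise any other installed font with Chinese glyphs can be passed as `new`.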
In [42]:
import graphviz
dot_data = tree.export_graphviz(clf
,out_file = 'tree.dot'
,feature_names = feature_name
,class_names=["琴酒","雪莉","贝尔摩德"]
,filled=True
,rounded=True
)
with open("tree.dot",encoding='utf-8') as f:
    dot_graph = f.read()
graph=graphviz.Source(dot_graph.replace("helvetica","FangSong"))
graph.view()
Out[42]:
'Source.gv.pdf'
(2) Method 2:
In [43]:
import graphviz
dot_data = tree.export_graphviz(clf
,out_file = None
,feature_names = feature_name
,class_names=["琴酒","雪莉","贝尔摩德"]
,filled=True
,rounded=True
)
# Pass the DOT source as str; Source writes it out using the given encoding
graph = graphviz.Source(dot_data.replace("helvetica","FangSong"), encoding='utf-8')
graph.view()
Out[43]:
'Source.gv.pdf'
Both methods work on the same principle; choose between them depending on whether you need the .dot file written to disk.
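To check that the two routes end up with the same DOT text, they can be compared on a small hand-written string. This is a sketch under assumptions: `dot_text` stands in for real export_graphviz output, and a temporary directory is used instead of the working directory.

```python
import os
import tempfile

# Hand-written stand-in for export_graphviz output (an assumption for illustration).
dot_text = 'digraph Tree {\nnode [fontname="helvetica"] ;\n0 [label="类黄酮 <= 1.4"] ;\n}'

# Route 1: write the DOT text to disk, read it back as UTF-8, then swap the font.
with tempfile.TemporaryDirectory() as tmp_dir:
    dot_path = os.path.join(tmp_dir, "tree.dot")
    with open(dot_path, "w", encoding="utf-8") as f:
        f.write(dot_text)
    with open(dot_path, encoding="utf-8") as f:
        via_file = f.read().replace("helvetica", "FangSong")

# Route 2: swap the font directly on the in-memory string.
in_memory = dot_text.replace("helvetica", "FangSong")

same_result = via_file == in_memory
```

Passing encoding="utf-8" explicitly to open() matters mostly on systems whose default text encoding is not UTF-8 (for example, Chinese-locale Windows), where reading the file without it can fail or garble the labels.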