TensorFlow practice: iris species prediction and California house price prediction


# Documentation
`iris.py` is the iris species prediction model.
`cali_house.py` is the California house price prediction model.

## Iris prediction model
### 1. Data processing

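The data is a CSV file. The header row gives the number of samples (120), the number of features (4), and the three class names. Each following row holds four feature values and an integer class label that indexes the class names.

    120,4,setosa,versicolor,virginica
    6.4,2.8,5.6,2.2,2
    5.0,2.3,3.3,1.0,1
    4.9,2.5,4.5,1.7,2
    ...
    4.4,2.9,1.4,0.2,0
    4.8,3.0,1.4,0.1,0
    5.5,2.4,3.7,1.0,1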
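
A minimal sketch of reading this layout in plain Python, assuming the header row means sample count, feature count, and class names. The function name `parse_iris_csv` is hypothetical and not taken from `iris.py`; the labels are turned into one-hot vectors because the model code compares `tf.argmax(ys, 1)` with the prediction.

```python
import csv
import io

def parse_iris_csv(text):
    """Parse the header row, then split each data row into features and a one-hot label."""
    rows = list(csv.reader(io.StringIO(text.strip())))
    header, data_rows = rows[0], rows[1:]
    n_features = int(header[1])
    class_names = header[2:]
    features, labels = [], []
    for row in data_rows:
        features.append([float(v) for v in row[:n_features]])
        one_hot = [0.0] * len(class_names)
        one_hot[int(row[n_features])] = 1.0
        labels.append(one_hot)
    return class_names, features, labels

# Three rows copied from the sample above; the sample count in the header is not checked.
sample = """120,4,setosa,versicolor,virginica
6.4,2.8,5.6,2.2,2
5.0,2.3,3.3,1.0,1
4.4,2.9,1.4,0.2,0
"""
class_names, features, labels = parse_iris_csv(sample)
print(class_names)
print(features[0], labels[0])
```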

### 2. Build the model

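    # Softmax regression: 4 input features, 3 output classes
    W = tf.Variable(tf.zeros([4, 3]))
    b = tf.Variable(tf.zeros([3]) + 0.01)
    output = tf.nn.softmax(tf.matmul(xs, W) + b)

    # Cross-entropy summed over all samples; 1e-10 guards against log(0)
    loss = -tf.reduce_sum(ys * tf.log(output + 1e-10))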
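
To make the forward pass and the loss concrete, here is a plain-Python sketch of the same computation that runs without TensorFlow. The helper names (`softmax`, `predict`, `cross_entropy`) are illustrative, and the three input rows are taken from the data sample in the previous section.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(x, W, b):
    """Compute softmax(x @ W + b) for one sample x."""
    logits = [sum(x[i] * W[i][k] for i in range(len(x))) + b[k] for k in range(len(b))]
    return softmax(logits)

def cross_entropy(xs, ys, W, b):
    """Summed cross-entropy, with the same 1e-10 guard against log(0)."""
    loss = 0.0
    for x, y in zip(xs, ys):
        probs = predict(x, W, b)
        loss -= sum(y_k * math.log(p_k + 1e-10) for y_k, p_k in zip(y, probs))
    return loss

# Same initial values as the model: W is all zeros, every bias is 0.01.
W = [[0.0] * 3 for _ in range(4)]
b = [0.01] * 3
xs = [[6.4, 2.8, 5.6, 2.2], [5.0, 2.3, 3.3, 1.0], [4.4, 2.9, 1.4, 0.2]]
ys = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]

probs = predict(xs[0], W, b)
initial_loss = cross_entropy(xs, ys, W, b)
print(probs, initial_loss)
```

With these initial values every class receives probability 1/3, so the summed loss starts at N · ln 3 for N samples, which is about 131.8 for 120 samples.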

### 3. Train the model

    for i in range(1000):
        sess.run(train_step, feed_dict={xs: x_train, ys: y_train})
        if i % 100 == 0:
            # Report the training loss every 100 steps
            print('Loss（train set）:%.2f' % (sess.run(loss, feed_dict={xs: x_train, ys: y_train})))
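
The definition of `train_step` is not shown here, so the optimizer is unknown. As one possibility, the sketch below runs plain gradient descent on the summed cross-entropy in pure Python. The function `gradient_step`, the learning rate, the step count, and the three-row data set are all assumptions made for illustration.

```python
import math

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, W, b):
    return softmax([sum(x[i] * W[i][k] for i in range(len(x))) + b[k] for k in range(len(b))])

def loss_fn(xs, ys, W, b):
    """Summed cross-entropy over all samples."""
    return -sum(y_k * math.log(p_k + 1e-10)
                for x, y in zip(xs, ys)
                for y_k, p_k in zip(y, forward(x, W, b)))

def gradient_step(xs, ys, W, b, lr):
    """One in-place update; d(loss)/d(logit) equals (probability - label)."""
    grad_W = [[0.0] * len(b) for _ in W]
    grad_b = [0.0] * len(b)
    for x, y in zip(xs, ys):
        probs = forward(x, W, b)
        for k in range(len(b)):
            delta = probs[k] - y[k]
            grad_b[k] += delta
            for i in range(len(x)):
                grad_W[i][k] += x[i] * delta
    for k in range(len(b)):
        b[k] -= lr * grad_b[k]
        for i in range(len(W)):
            W[i][k] -= lr * grad_W[i][k]

xs = [[6.4, 2.8, 5.6, 2.2], [5.0, 2.3, 3.3, 1.0], [4.4, 2.9, 1.4, 0.2]]
ys = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
W = [[0.0] * 3 for _ in range(4)]
b = [0.01] * 3

losses = []
for i in range(200):
    gradient_step(xs, ys, W, b, lr=0.01)
    if i % 50 == 0:
        losses.append(loss_fn(xs, ys, W, b))
        print('step %d loss %.4f' % (i, losses[-1]))
```

The loop has the same shape as the one above: update on the full training set, then report the loss at a fixed interval.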

### 4. Predict the iris species

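    # A prediction is correct when the most probable class matches the label
    access = tf.equal(tf.argmax(output, 1), tf.argmax(ys, 1))
    # Accuracy is the fraction of correct predictions
    accuracy = tf.reduce_mean(tf.cast(access, "float"))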
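
The same accuracy computation can be sketched in plain Python. The probabilities and labels below are made up for the example and are not model output.

```python
def argmax(values):
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(outputs, labels):
    """Fraction of rows whose predicted class equals the class of the one-hot label."""
    correct = [argmax(o) == argmax(y) for o, y in zip(outputs, labels)]
    return sum(correct) / len(correct)

# Four made-up samples; the third one is misclassified.
outputs = [[0.1, 0.2, 0.7], [0.2, 0.6, 0.2], [0.5, 0.4, 0.1], [0.8, 0.1, 0.1]]
labels = [[0, 0, 1], [0, 1, 0], [0, 1, 0], [1, 0, 0]]
print(accuracy(outputs, labels))  # 0.75
```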

### 5. Results

    --------------------Start training the model----------------
    Loss（train set）:125.14
    Loss（train set）:67.55
    Loss（train set）:30.55
    Loss（train set）:23.07
    Loss（train set）:20.45
    Loss（train set）:18.60
    Loss（train set）:17.22
    Loss（train set）:16.14
    Loss（train set）:15.28
    Loss（train set）:14.57
    --------------------Training finished--------------------

    ********************Performance evaluation********************

## California house price prediction model
### 1. Data preprocessing

    import pandas as pd

    print(features.describe())

Output (excerpt):

           total_bedrooms   population   households  median_income
    count           20433        20640        20640          20640
    mean       537.870553  1425.476744   499.539680       3.870671
    std        421.385070  1132.462122   382.329753       1.899822
    min          1.000000     3.000000     1.000000       0.499900

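`total_bedrooms` has 20433 non-null values out of 20640 rows, so the 207 rows with a missing value are dropped, and then duplicate samples are removed:

    nan = features.dropna(subset=['total_bedrooms'], axis=0)  # drop rows with missing values
    repeat = nan.drop_duplicates()  # drop duplicate samples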

### 2. Feature engineering

             longitude      latitude
    count        20640         20640
    mean   -119.569704     35.631861
    std       2.003532      2.135952
    min    -124.350000     32.540000
    max    -114.310000     41.950000

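Longitude and latitude are bucketed into one-degree bins that are closed on the left and open on the right:

    pd.cut(longitude, range(-125, -112), right=False)
    pd.cut(latitude, range(31, 43), right=False)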
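
To show which bin a value lands in when `right=False`, here is a small plain-Python sketch that mimics left-closed binning with the standard `bisect` module. The helper `bin_index` is hypothetical, and the test values are the minimum and maximum coordinates from the summary table above.

```python
from bisect import bisect_right

def bin_index(value, edges):
    """Index of the left-closed bin [edges[i], edges[i+1]) containing value, or None."""
    if value < edges[0] or value >= edges[-1]:
        return None
    return bisect_right(edges, value) - 1

lon_edges = list(range(-125, -112))  # -125, -124, ..., -113
lat_edges = list(range(31, 43))      # 31, 32, ..., 42

print(bin_index(-124.35, lon_edges))  # 0, the bin [-125, -124)
print(bin_index(-114.31, lon_edges))  # 10, the bin [-115, -114)
print(bin_index(32.0, lat_edges))     # 1, because 32.0 belongs to [32, 33)
print(bin_index(41.95, lat_edges))    # 10, the bin [41, 42)
```

A value equal to the last edge falls outside every bin, which is why the edge ranges extend past the observed maxima.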

    import numpy as np

    def rooms_per_person(data):  # synthetic feature: rooms per person = total rooms / total population
        rooms_per_person = data.apply(lambda x: x['total_rooms'] / x['population'], axis=1)  # compute the feature value
        rooms_per_person[np.abs(rooms_per_person) > 5] = 5  # clip outliers to 5
        rooms_per_person = rooms_per_person.rename('rooms_per_person')  # feature name
        return rooms_per_person

### 3. Feature normalization

            households  median_income
    count        20640          20640
    mean    499.539680       3.870671
    std     382.329753       1.899822
    min       1.000000       0.499900
    25%     280.000000       2.563400
    50%     409.000000       3.534800
    75%     605.000000       4.743250
    max    6082.000000      15.000100

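The features differ widely in scale, so each value is standardized with the mean and standard deviation of its feature:

    normalized = (value - mean) / std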
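
A short sketch of this standardization in plain Python, using the standard `statistics` module. It assumes the sample standard deviation, which is what `describe()` reports; whether the original script uses the same estimator is not shown. The household counts are made-up values.

```python
import statistics

def standardize(values):
    """Return (value - mean) / std for each value, using the sample standard deviation."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    return [(v - mean) / std for v in values]

# Made-up household counts, not rows from the dataset.
households = [280.0, 409.0, 605.0, 1200.0]
normalized = standardize(households)
print(normalized)
```

After standardization the feature has mean 0 and standard deviation 1. With the statistics in the table above, the largest `households` value, 6082, would map to about 14.6.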

### 4. Build the model

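    # Linear regression: 142 input features, 1 output value
    W = tf.Variable(tf.zeros([142, 1]))
    b = tf.Variable(tf.zeros([1]) + 0.01)
    output = tf.matmul(xs, W) + b

    # Mean squared error between labels and predictions
    loss = tf.reduce_mean(tf.square(ys - output))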
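
For illustration, the same linear model and loss written in plain Python on a toy problem. It uses 2 features instead of 142, and all numbers are made up.

```python
def predict(x, W, b):
    """Linear model output x @ W + b for one sample."""
    return sum(x_i * w_i for x_i, w_i in zip(x, W)) + b

def mse(xs, ys, W, b):
    """Mean of the squared differences between labels and predictions."""
    return sum((y - predict(x, W, b)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Same initialization pattern as the model: zero weights, bias 0.01.
W = [0.0, 0.0]
b = 0.01
xs = [[1.0, 2.0], [0.5, -1.0], [-1.5, 0.3]]
ys = [2.0, 1.0, 3.0]

first_prediction = predict(xs[0], W, b)
initial_loss = mse(xs, ys, W, b)
print(first_prediction, initial_loss)
```

While the weights are all zero, every sample gets the prediction 0.01, so the initial loss only reflects how far the labels are from that constant.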

### 5. Train the model

    from sklearn.model_selection import train_test_split

    # Hold out 25% of the samples as the test set; the last column is the label
    x_train, x_test, y_train, y_test = train_test_split(data[:, :-1], data[:, -1], test_size=0.25, random_state=2018)
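
What the split does can be sketched with the standard library alone. This is not scikit-learn's implementation and does not reproduce its permutation for a given seed; the function `split_features_labels` and the eight-row data set are assumptions for illustration.

```python
import math
import random

def split_features_labels(rows, test_size=0.25, seed=2018):
    """Shuffle rows reproducibly, then separate features (all but the last column) from labels."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = math.ceil(len(shuffled) * test_size)
    test, train = shuffled[:n_test], shuffled[n_test:]
    x_train = [r[:-1] for r in train]
    y_train = [r[-1] for r in train]
    x_test = [r[:-1] for r in test]
    y_test = [r[-1] for r in test]
    return x_train, x_test, y_train, y_test

# Eight made-up rows: two features followed by the label.
data = [[float(i), float(i) * 2, float(i) * 10] for i in range(8)]
x_train, x_test, y_train, y_test = split_features_labels(data)
print(len(x_train), len(x_test))  # 6 2
```

Fixing the seed keeps the split identical across runs, which is the role `random_state=2018` plays in the call above.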

    for i in range(20000):
        sess.run(train_step, feed_dict={xs: x_train, ys: y_train})
        if i % 1000 == 0:
            # Report the training loss every 1000 steps
            print('Loss（train set）:%.2f' % (sess.run(loss, feed_dict={xs: x_train, ys: y_train})))
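
The optimizer behind `train_step` is not shown for this model either. Assuming plain gradient descent on the mean squared error, the loop can be sketched in pure Python as follows. The function `gradient_step`, the learning rate, and the toy data (generated from y = 3·x1 − 2·x2 + 1) are hypothetical.

```python
def mse(xs, ys, W, b):
    """Mean squared error of the linear model x @ W + b."""
    return sum((sum(x_i * w_i for x_i, w_i in zip(x, W)) + b - y) ** 2
               for x, y in zip(xs, ys)) / len(xs)

def gradient_step(xs, ys, W, b, lr):
    """One gradient-descent update of the MSE; returns the new weights and bias."""
    n = len(xs)
    grad_W = [0.0] * len(W)
    grad_b = 0.0
    for x, y in zip(xs, ys):
        error = sum(x_i * w_i for x_i, w_i in zip(x, W)) + b - y
        grad_b += 2.0 * error / n
        for i, x_i in enumerate(x):
            grad_W[i] += 2.0 * error * x_i / n
    new_W = [w - lr * g for w, g in zip(W, grad_W)]
    return new_W, b - lr * grad_b

# Made-up training data that follows y = 3*x1 - 2*x2 + 1 exactly.
xs = [[1.0, 2.0], [0.5, -1.0], [-1.5, 0.3], [2.0, 1.0], [-0.5, -0.5]]
ys = [3 * x1 - 2 * x2 + 1 for x1, x2 in xs]
W, b = [0.0, 0.0], 0.01

losses = []
for i in range(2000):
    W, b = gradient_step(xs, ys, W, b, lr=0.05)
    if i % 500 == 0:
        losses.append(mse(xs, ys, W, b))
        print('Loss (toy train set): %.6f' % losses[-1])

final_loss = mse(xs, ys, W, b)
print(W, b, final_loss)
```

On this noise-free toy data the weights approach 3 and −2 and the bias approaches 1. Real housing data will not reach zero loss, so the logged value is expected to level off instead.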

### 6. Predict house prices

![result](house_value_prediction.PNG)