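Background: a script that computes stats has produced no data for several days, and when I run it by hand the server bogs down immediately. The goal: optimize it!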
[work(2)@dm02 19:02:21 ~]$ nohup python twitter_click_stat.py 2012-11-02 1 >/tmp/click.log 2>&1 &
[3] 25862
[work(3)@dm02 19:05:07 ~/GALAXY_RELEASE/release]$ kill 25862
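It ate 7 GB of memory in 3 minutes.
Script flow:
1. Load every row of the MySQL table into a map (about 20 million rows)
2. Query the Hive table; if the twitter_id is already in the map, assign or += its counters, otherwise create a new object and assign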
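The flow above can be sketched roughly as follows. This only illustrates the approach as described, not the actual twitter_click_stat.py; the class `ClickStat`, the function `merge_clicks`, and the tuple-shaped inputs are all names invented for the example.

```python
class ClickStat:
    """Hypothetical per-tweet record; the real script's class is not shown."""

    def __init__(self, oneday_click=0, all_click=0):
        self.oneday_click = oneday_click
        self.all_click = all_click


def merge_clicks(mysql_rows, hive_rows):
    """Mimic the original flow: load everything, then merge the Hive counts.

    mysql_rows: iterable of (twitter_id, oneday_click, all_click)
    hive_rows:  iterable of (twitter_id, clicks_today)
    """
    # Step 1: every MySQL row becomes an in-memory object. With roughly
    # 20 million rows, this dict is the likely source of the memory growth.
    stats = {}
    for twitter_id, oneday_click, all_click in mysql_rows:
        stats[twitter_id] = ClickStat(oneday_click, all_click)

    # Step 2: walk the Hive result; update known ids, create missing ones.
    for twitter_id, clicks_today in hive_rows:
        stat = stats.get(twitter_id)
        if stat is None:
            stats[twitter_id] = ClickStat(clicks_today, clicks_today)
        else:
            stat.oneday_click = clicks_today
            stat.all_click += clicks_today
    return stats
```

Most of the loaded rows are never touched in step 2 on a given day, which is what makes this layout wasteful.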
MySQL table structure:
create table t_walrus_click_stat(
twitter_id int(11) not null,
goods_id int(11) not null default 0,
oneday_click SMALLINT(10) unsigned not null default 0,
threeday_click MEDIUMINT(10) unsigned not null default 0,
sevenday_click MEDIUMINT(10) unsigned not null default 0,
all_click int(11) unsigned not null default 0,
oneday_incr smallint(10) comment "increment" not null default 0,
last_clicktime timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
primary key(twitter_id)
)ENGINE=Innodb DEFAULT CHARACTER SET utf8
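Optimization ideas:
1. Don't load the whole table: work out which tweets were clicked that day, then fetch only those tweets' one-day clicks (to compute the increment) and total clicks
2. Can the one-day increment and the total clicks be computed with an update at insert time? == Promising!! But with which technique???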
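A minimal sketch of the second idea, assuming the day's clicks have already been aggregated per tweet from Hive. The helper `build_upsert` and its input shape are hypothetical; it only builds the statement text and the parameter list (with `%s` placeholders, as accepted by common MySQL drivers for Python) and never talks to a database. Only three columns are written to keep the example short.

```python
def build_upsert(day_clicks):
    """Build one batched INSERT ... ON DUPLICATE KEY UPDATE statement.

    day_clicks: dict mapping twitter_id -> clicks counted for the day
    (hypothetical input; only tweets that were clicked appear in it).
    Returns (sql, params) suitable for a DB-API cursor.execute() call.
    """
    if not day_clicks:
        raise ValueError("no clicked tweets to write")

    row_placeholder = "(%s, %s, %s)"
    sql = (
        "INSERT INTO t_walrus_click_stat (twitter_id, oneday_click, all_click) "
        "VALUES " + ", ".join([row_placeholder] * len(day_clicks)) + " "
        "ON DUPLICATE KEY UPDATE "
        "all_click = all_click + VALUES(oneday_click), "
        "oneday_click = VALUES(oneday_click)"
    )

    params = []
    for twitter_id in sorted(day_clicks):
        clicks = day_clicks[twitter_id]
        # Assumption: a brand-new row starts with all_click equal to the
        # day's clicks, since there is no history to add to.
        params.extend([twitter_id, clicks, clicks])
    return sql, params
```

With this shape the script only ever holds the tweets clicked that day, and the database does the accumulation for rows that already exist.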
MySQL ===== TEST
mysql> show create table test;
| test | CREATE TABLE `test` (
`id` int(11) NOT NULL DEFAULT '0',
`name` varchar(50) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
1 row in set (0.08 sec)
mysql> alter table test add unique (`id`);
Query OK, 0 rows affected (0.74 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> show create table test;
| test | CREATE TABLE `test` (
`id` int(11) NOT NULL DEFAULT '0',
`name` varchar(50) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
1 row in set (0.00 sec)
mysql> select * from test;
+----+-----------+
| id | name |
+----+-----------+
| 1 | wang |
| 2 | xia |
| 3 | wang-load |
| 4 | xia-load |
+----+-----------+
4 rows in set (0.00 sec)
mysql> alter table test add column `click` int(11) NOT NULL DEFAULT '0';
Query OK, 4 rows affected (0.44 sec)
Records: 4 Duplicates: 0 Warnings: 0
mysql> update test set click=id;
Query OK, 4 rows affected (0.04 sec)
Rows matched: 4 Changed: 4 Warnings: 0
mysql> select * from test;
+----+-----------+-------+
| id | name | click |
+----+-----------+-------+
| 1 | wang | 1 |
| 2 | xia | 2 |
| 3 | wang-load | 3 |
| 4 | xia-load | 4 |
+----+-----------+-------+
4 rows in set (0.00 sec)
mysql> insert into test(id,name,click) values (1,'wang',101) ON DUPLICATE KEY UPDATE click=click+values(click);
Query OK, 2 rows affected (0.00 sec)
mysql> select * from test;
+----+-----------+-------+
| id | name | click |
+----+-----------+-------+
| 1 | wang | 102 |
| 2 | xia | 2 |
| 3 | wang-load | 3 |
| 4 | xia-load | 4 |
+----+-----------+-------+
mysql> alter table test add column `onn_day` int(11) NOT NULL DEFAULT '100';
Query OK, 4 rows affected (0.29 sec)
Records: 4 Duplicates: 0 Warnings: 0
mysql> select * from test;
+----+-----------+-------+---------+
| id | name | click | onn_day |
+----+-----------+-------+---------+
| 1 | wang | 102 | 100 |
| 2 | xia | 2 | 100 |
| 3 | wang-load | 3 | 100 |
| 4 | xia-load | 4 | 100 |
+----+-----------+-------+---------+
mysql> insert into test(id,name,click,onn_day) values (1,'wang',100,150) ON DUPLICATE KEY UPDATE click=click+values(click); == only the columns named in UPDATE are changed
Query OK, 2 rows affected (0.07 sec)
mysql> select * from test;
+----+-----------+-------+---------+
| id | name | click | onn_day |
+----+-----------+-------+---------+
| 1 | wang | 202 | 100 |
| 2 | xia | 2 | 100 |
| 3 | wang-load | 3 | 100 |
| 4 | xia-load | 4 | 100 |
+----+-----------+-------+---------+
4 rows in set (0.00 sec)
mysql> insert into test(id,name,click,onn_day) values (1,'wang',50,0) ON DUPLICATE KEY UPDATE click=click+values(click),onn_day=onn_day-values(click); == looks like this works
Query OK, 2 rows affected (0.04 sec)
mysql> select * from test;
+----+-----------+-------+---------+
| id | name | click | onn_day |
+----+-----------+-------+---------+
| 1 | wang | 252 | 50 |
| 2 | xia | 2 | 100 |
| 3 | wang-load | 3 | 100 |
| 4 | xia-load | 4 | 100 |
+----+-----------+-------+---------+
4 rows in set (0.00 sec)
mysql>
Adapted to my own table:
insert into t_walrus_click_stat(twitter_id,goods_id,oneday_click,threeday_click,sevenday_click,all_click,oneday_incr) values (),(),(),() ON DUPLICATE KEY UPDATE oneday_click=values(oneday_click),threeday_click=values(threeday_click),sevenday_click=values(sevenday_click),all_click=all_click+values(oneday_click),oneday_incr=values(oneday_click)-oneday_click;
mysql> select * from t_walrus_click_stat;
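+------------+----------+--------------+----------------+----------------+-----------+-------------+---------------------+
| twitter_id | goods_id | oneday_click | threeday_click | sevenday_click | all_click | oneday_incr | last_clicktime |
+------------+----------+--------------+----------------+----------------+-----------+-------------+---------------------+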
| 493925040 | 0 | 9286 | 21348 | 43676 | 67796 | 1003 | 2012-11-07 14:35:53 |
| 499177366 | 0 | 10134 | 23417 | 46144 | 75628 | 1382 | 2012-11-07 14:35:53 |
| 518431643 | 0 | 2099 | 6530 | 16786 | 66398 | 356 | 2012-11-07 14:35:53 |
| 521260243 | 0 | 9282 | 25611 | 48083 | 80877 | -2683 | 2012-11-07 14:35:53 |
| 522773775 | 0 | 4652 | 20168 | 42072 | 67077 | -3707 | 2012-11-07 14:35:53 |
| 527912329 | 0 | 4532 | 14810 | 39302 | 76832 | -286 | 2012-11-07 14:35:53 |
| 536369413 | 0 | 3602 | 13124 | 29901 | 79792 | -3322 | 2012-11-07 14:35:53 |
| 538737288 | 0 | 4800 | 13537 | 38469 | 83509 | 1122 | 2012-11-07 14:35:53 |
| 546735988 | 0 | 11573 | 35490 | 78441 | 115983 | 709 | 2012-11-07 14:35:53 |
| 547904288 | 0 | 8029 | 18536 | 45436 | 78306 | 4145 | 2012-11-07 14:35:53 |
+------------+----------+--------------+----------------+----------------+-----------+-------------+---------------------+
10 rows in set (0.00 sec)
mysql> insert into t_walrus_click_stat(twitter_id,goods_id,oneday_click,threeday_click,sevenday_click,all_click,oneday_incr) values (493925040,0,1000,25000,53000,0,0),(499177366,0,1000,25000,53000,0,0),(547904289,0,100,100,100,0,0) ON DUPLICATE KEY UPDATE oneday_click=values(oneday_click),threeday_click=values(threeday_click),sevenday_click=values(sevenday_click),all_click=all_click+values(oneday_click),oneday_incr=oneday_click-values(oneday_click);
Query OK, 5 rows affected (0.08 sec)
Records: 3 Duplicates: 2 Warnings: 0
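The "5 rows affected" figure follows from how MySQL counts rows for INSERT ... ON DUPLICATE KEY UPDATE by default: a newly inserted row counts as 1, an existing row that gets updated counts as 2, and an existing row whose values do not change counts as 0. The helper below (a made-up name, for illustration only) redoes that arithmetic for this statement.

```python
def affected_rows(inserted, updated, unchanged=0):
    """Affected-rows count reported for INSERT ... ON DUPLICATE KEY UPDATE.

    Each new row counts 1, each updated duplicate counts 2, and a duplicate
    whose values did not change counts 0 (default client settings).
    """
    return inserted * 1 + updated * 2 + unchanged * 0


# The statement above: 3 records, 2 of them duplicates that were updated.
print(affected_rows(inserted=1, updated=2))  # 5
```

The same rule explains the "2 rows affected" responses in the earlier single-row tests, where the one row was always a duplicate.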
mysql> select * from t_walrus_click_stat;
+------------+----------+--------------+----------------+----------------+-----------+-------------+---------------------+
| twitter_id | goods_id | oneday_click | threeday_click | sevenday_click | all_click | oneday_incr | last_clicktime |
+------------+----------+--------------+----------------+----------------+-----------+-------------+---------------------+
| 493925040 | 0 | 1000 | 25000 | 53000 | 68796 | 0 | 2012-11-07 14:35:53 |
| 499177366 | 0 | 1000 | 25000 | 53000 | 76628 | 0 | 2012-11-07 14:35:53 |
| 518431643 | 0 | 2099 | 6530 | 16786 | 66398 | 356 | 2012-11-07 14:35:53 |
| 521260243 | 0 | 9282 | 25611 | 48083 | 80877 | -2683 | 2012-11-07 14:35:53 |
| 522773775 | 0 | 4652 | 20168 | 42072 | 67077 | -3707 | 2012-11-07 14:35:53 |
| 527912329 | 0 | 4532 | 14810 | 39302 | 76832 | -286 | 2012-11-07 14:35:53 |
| 536369413 | 0 | 3602 | 13124 | 29901 | 79792 | -3322 | 2012-11-07 14:35:53 |
| 538737288 | 0 | 4800 | 13537 | 38469 | 83509 | 1122 | 2012-11-07 14:35:53 |
| 546735988 | 0 | 11573 | 35490 | 78441 | 115983 | 709 | 2012-11-07 14:35:53 |
| 547904288 | 0 | 8029 | 18536 | 45436 | 78306 | 4145 | 2012-11-07 14:35:53 |
| 547904289 | 0 | 100 | 100 | 100 | 0 | 0 | 2012-11-07 14:39:10 |
+------------+----------+--------------+----------------+----------------+-----------+-------------+---------------------+
11 rows in set (0.00 sec)
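Two problems:
1. All other columns are fine; only oneday_incr is 0. My guess: in UPDATE oneday_click=values(oneday_click),threeday_click=values(threeday_click),sevenday_click=values(sevenday_click),all_click=all_click+values(oneday_click),oneday_incr=oneday_click-values(oneday_click); the assignment oneday_click=values(oneday_click) runs first, so the subtraction that follows is necessarily 0.
2. For rows that were not in the table before, every column has to be filled in by the SQL itself. For those rows all_click and oneday_incr should be sevenday_click and oneday_click respectively.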
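The first problem can be reproduced outside the database. The sketch below assumes, as the result above suggests, that MySQL applies the assignments in the UPDATE list from left to right, so a later expression sees the value written by an earlier one. `apply_assignments` is a hypothetical helper, not a MySQL API; the numbers are those of tweet 493925040 (old oneday_click 9286, new value 1000).

```python
def apply_assignments(row, new, assignments):
    """Apply (column, function) pairs in order, like a left-to-right SET list.

    row: dict holding the existing row; it is updated in place.
    new: dict holding the VALUES(...) of the row that failed to insert.
    Each function receives the *current* row, so it sees earlier assignments.
    """
    for column, func in assignments:
        row[column] = func(row, new)
    return row


new = {"oneday_click": 1000}

# Order used in the failing statement: oneday_click is overwritten first.
buggy = apply_assignments(
    {"oneday_click": 9286, "oneday_incr": 1003},
    new,
    [
        ("oneday_click", lambda row, new: new["oneday_click"]),
        ("oneday_incr", lambda row, new: row["oneday_click"] - new["oneday_click"]),
    ],
)
print(buggy["oneday_incr"])  # 0

# Computing the increment before overwriting oneday_click keeps the old value.
fixed = apply_assignments(
    {"oneday_click": 9286, "oneday_incr": 1003},
    new,
    [
        ("oneday_incr", lambda row, new: new["oneday_click"] - row["oneday_click"]),
        ("oneday_click", lambda row, new: new["oneday_click"]),
    ],
)
print(fixed["oneday_incr"])  # -8286
```

So the remedy is simply to move the oneday_incr assignment ahead of the oneday_click assignment in the UPDATE list.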
insert into t_walrus_click_stat(twitter_id,goods_id,oneday_click,threeday_click,sevenday_click,all_click,oneday_incr) values (493925040,0,10000,25000,53000,0,0),(499177366,0,10000,25000,53000,0,0),(547904289,0,100,100,100,0,0) ON DUPLICATE KEY UPDATE oneday_incr=cast(cast(values(oneday_click) AS SIGNED)-cast(oneday_click AS SIGNED) AS SIGNED),oneday_click=values(oneday_click),threeday_click=values(threeday_click ),sevenday_click =values(sevenday_click ),all_click=all_click+values(oneday_click);
Query OK, 5 rows affected (0.00 sec)
Records: 3 Duplicates: 2 Warnings: 0
mysql>
mysql> select * from t_walrus_click_stat;
+------------+----------+--------------+----------------+----------------+-----------+-------------+---------------------+
| twitter_id | goods_id | oneday_click | threeday_click | sevenday_click | all_click | oneday_incr | last_clicktime |
+------------+----------+--------------+----------------+----------------+-----------+-------------+---------------------+
| 493925040 | 0 | 10000 | 25000 | 53000 | 77796 | 714 | 2012-11-07 14:41:46 |
| 499177366 | 0 | 10000 | 25000 | 53000 | 85628 | -134 | 2012-11-07 14:41:46 |
| 518431643 | 0 | 2099 | 6530 | 16786 | 66398 | 356 | 2012-11-07 14:41:46 |
| 521260243 | 0 | 9282 | 25611 | 48083 | 80877 | -2683 | 2012-11-07 14:41:46 |
| 522773775 | 0 | 4652 | 20168 | 42072 | 67077 | -3707 | 2012-11-07 14:41:46 |
| 527912329 | 0 | 4532 | 14810 | 39302 | 76832 | -286 | 2012-11-07 14:41:46 |
| 536369413 | 0 | 3602 | 13124 | 29901 | 79792 | -3322 | 2012-11-07 14:41:46 |
| 538737288 | 0 | 4800 | 13537 | 38469 | 83509 | 1122 | 2012-11-07 14:41:46 |
| 546735988 | 0 | 11573 | 35490 | 78441 | 115983 | 709 | 2012-11-07 14:41:46 |
| 547904288 | 0 | 8029 | 18536 | 45436 | 78306 | 4145 | 2012-11-07 14:41:46 |
| 547904289 | 0 | 100 | 100 | 100 | 0 | 0 | 2012-11-07 16:50:49 |
+------------+----------+--------------+----------------+----------------+-----------+-------------+---------------------+
11 rows in set (0.03 sec)
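The two updated rows can be checked by hand against their original values (oneday_click 9286 and 10134, all_click 67796 and 75628). The sketch below redoes that arithmetic and also shows why the casts to SIGNED are presumably there: oneday_click is an unsigned column, and in MySQL a subtraction involving an unsigned operand yields an unsigned result by default, so a negative difference either raises an out-of-range error or wraps around, depending on server version and SQL mode. The wrap-around case is modeled here with 64-bit modular arithmetic; the function names are invented for the example.

```python
UINT64_MOD = 2 ** 64


def signed_incr(new_clicks, old_clicks):
    """Increment computed on signed integers, as with CAST(... AS SIGNED)."""
    return new_clicks - old_clicks


def wrapped_unsigned_incr(new_clicks, old_clicks):
    """The same subtraction if it wrapped around as an unsigned 64-bit value."""
    return (new_clicks - old_clicks) % UINT64_MOD


# (old oneday_click, old all_click) for the two duplicate rows.
before = {493925040: (9286, 67796), 499177366: (10134, 75628)}
new_oneday = 10000

for twitter_id, (old_oneday, old_all) in before.items():
    incr = signed_incr(new_oneday, old_oneday)
    all_click = old_all + new_oneday
    print(twitter_id, incr, all_click)
# 493925040 714 77796
# 499177366 -134 85628

print(wrapped_unsigned_incr(10000, 10134))  # 18446744073709551482
```

The signed results match the oneday_incr and all_click columns in the table above.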
OK. Done!!