MyRocks引擎:入坑须知

Stella981
• 阅读 759

导读

译者魏新平

知数堂第5期MySQL实战班学员,第10期MySQL优化班学员,现任职助教。

原文出处:****https://www.percona.com/blog/2018/02/01/myrocks-engine-things-know-start/

原文作者:****Vadim Tkachenko

   我们a 最近发布了Percona Server with MyRocks的GA版本。本文会让你了解到为什么Facebook要在生产环境使用MyRocks了。如果你使用Percona repositories ,你能够简单的安装MyRocks插件并且用ps-admin --enable-rocksdb来启动它。

Percona recently released Percona Server with MyRocks as GA. You can see how Facebook explains wins they see in production with MyRocks. Now if you use Percona repositories, you can simply install MyRocks plugin and enable it with ps-admin --enable-rocksdb.

将它和典型的InnoDB进行比较时,存在一些主要和次要区别,我想在此强调一下。第一个主要的区别是MyRocks (based on RocksDB) 使用Log Structured Merge Tree数据结构,不是InnoDB的B+ tree数据结构。

There are some major and minor differences when comparing it to typical InnoDB deployments, and I want to highlight them here. The first important difference is that MyRocks (based on RocksDB) uses Log Structured Merge Tree data structure, not a B+ tree like InnoDB.

你能够在我发布在DZone的文章 当中了解到更多关于LSM引擎的信息 。总的来说,LSM引擎更适合写密集型的应用场景,读取速度可能会比较慢,全表扫描对于引擎来说负担会太重。当使用MyRocks作为应用底层时,需要特别注意这一点。MyRocks 不是加强版的InnoDB,也不能在所有应用场景下替换InnoDB。他有自己的优势/局限,就像InnoDB一样,你需要根据你数据的存取模式来选择使用哪一个引擎。

You learn more about the LSM engine in my article for DZone.The summary is that an LSM data structure is good for write-intensive workloads, with the expense that reads might slow down (both point reads and especially range reads) and full table scans might be too heavy for the engine. This is important to keep in mind when designing applications for MyRocks. MyRocks is not an enhanced InnoDB, nor a one-size-fits-all replacement for InnoDB. It has its own pros/cons just like InnoDB. You need to decide which engine to use based on your applications data access patterns.

还有什么其他需要注意的区别的吗?

What other differences should you be aware of?

让我们看一下目录结构。当前,所有的表和数据库都是存储在mysqldir的.rocksdb隐藏目录当中。名字和地址可以改变,但是所有的数据库当中的所有表还是存储在一系列的.sst文件当中,没有per-table / per-database的区分。

Let’s look at the directory layout. Right now, all tables and all databases are stored in a hidden .rocksdb directory inside mysqldir. The name and location can be changed, but still all tables from all databases are stored in just a series of .sst files. There is no per-table / per-database separation.

默认情况下,MyRocks 使用LZ4来压缩所有的表。能够通过改变rocksdb_default_cf_options当中的变量来改变压缩的设置。默认值为compression=kLZ4Compression;bottommost_compression=kLZ4Compression。我们选择 LZ4,是因为它在很小的cpu负载下提供了可接受的压缩比。其他的压缩方式包括Zlib 和 ZSTD,或者直接不压缩。你能够在Peter和我的文章当中学习到更多关于压缩比VS速度 的信息。为了比较装载了来自我自制路由器软件的流量统计数据的MyRocks表的物理大小,我使用了为pmacct收集器软件创建的下表。

By default in Percona Server for MySQL, MyRocks will use LZ4 compression for all tables. You can change compression settings by changing the rocksdb_default_cf_options server variable. By default it set to compression=kLZ4Compression;bottommost_compression=kLZ4Compression. We chose LZ4 compression as it provides acceptable compression level with very little CPU overhead. Other possible compression methods are Zlib and ZSTD, or no compression at all. You can learn more about compression ratio vs. speed in Peter’s and my post.To compare the data size of a MyRocks table loaded with traffic statistic data from my homebrew router, I’ve used the following table created for pmacct collector:

CREATE TABLE `acct_v9` (
  `tag` int(4) unsigned NOT NULL,
  `class_id` char(16) NOT NULL,
  `class` varchar(255) DEFAULT NULL,
  `mac_src` char(17) NOT NULL,
  `mac_dst` char(17) NOT NULL,
  `vlan` int(2) unsigned NOT NULL,
  `as_src` int(4) unsigned NOT NULL,
  `as_dst` int(4) unsigned NOT NULL,
  `ip_src` char(15) NOT NULL,
  `ip_dst` char(15) NOT NULL,
  `port_src` int(2) unsigned NOT NULL,
  `port_dst` int(2) unsigned NOT NULL,
  `tcp_flags` int(4) unsigned NOT NULL,
  `ip_proto` char(6) NOT NULL,
  `tos` int(4) unsigned NOT NULL,
  `packets` int(10) unsigned NOT NULL,
  `bytes` bigint(20) unsigned NOT NULL,
  `flows` int(10) unsigned NOT NULL,
  `stamp_inserted` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `id` int(11) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`id`)
) ENGINE=ROCKSDB AUTO_INCREMENT=20127562

正如你所看见的,表中有大概2000万条数据。MyRocks (用默认的 LZ4 压缩)使用了828MB。 InnoDB (默认,未压缩) 使用了3760MB。

as you can see, there are about 20mln records in this table. MyRocks (with default LZ4 compression) uses 828MB. InnoDB (uncompressed) uses 3760MB.

你能够在 .rocksdb 目录的LOG文件当中找到RocksDB 实例的详细信息。查看这些日志,可以进行更详细的诊断。你也能够使用SHOW ENGINE ROCKSDB STATUS命令,但是这会比SHOW ENGINE INNODB STATUS返回的内容更复杂,需要消耗大量的精力和时间去理解。

You can find very verbose information about your RocksDB instance in the LOG file located in .rocksdb directory. Check this file for more diagnostics. You can also try the SHOW ENGINE ROCKSDB STATUS command, but it is even more cryptic than SHOW ENGINE INNODB STATUS. It takes time to parse and to understand it.

注意,现在MyRocks只支持 READ-COMMITTED 隔离级别。并没有 REPEATABLE-READ 隔离级别,也没有像InnoDB里一样的gap锁。理论上, RocksDB 只支持 SNAPSHOT 隔离级别。然而,MySQL 当中并没有SNAPSHOT 隔离级别的概念,所以我们没有实现特殊的语法去支持。如果你对这个感兴趣,请联系我们。

Keep in mind that at this time MyRocks supports only READ-COMMITTED isolation levels. There is no REPEATABLE-READ isolation level and no gap locking like in InnoDB. In theory, RocksDB should support SNAPSHOT isolation level. However, there is no notion of SNAPSHOT isolation in MySQL so we have not implemented the special syntax to support this level. Please let us know if you would be interested in this.

当你试图加载大量的数据到MyRocks 当中时,你可能会遇到问题(不幸的是这个可能是你使用MyRocks 时的首次工作,当你使用LOAD DATA, INSERT INTO myrocks_table SELECT * FROM innodb_table 或者 ALTER TABLE innodb_table ENGINE=ROCKSDB)。假如你的表太大,并且你没有足够的内存,RocksDB 就会崩溃。在生产环境中,你应该为你加载数据的session设置rocksdb_bulk_load=1。了解更多请查看文章:https://github.com/facebook/mysql-5.6/wiki/data-loading。

For bulk loads, you may face problems trying to load large amounts of data into MyRocks (and unfortunately this might be the very first operation when you start playing with MyRocks as you try to LOAD DATA, INSERT INTO myrocks_table SELECT * FROM innodb_table or ALTER TABLE innodb_table ENGINE=ROCKSDB). If your table is big enough and you do not have enough memory, RocksDB crashes. As a workaround, you should set rocksdb_bulk_load=1 for the session where you load data. See more on this page: https://github.com/facebook/mysql-5.6/wiki/data-loading.

在MyRocks中的Block cache有点类似于innodb_buffer_pool_size,但是对于MyRocks它主要有利于读取数据。您可能需要调整rocksdb_block_cache_size设置。另外,它默认使用buffered reads,在这种情况下,操作系统的cache缓存着压缩的数据,而RockDB block cache 会缓存未压缩的数据。你可以保持这种两层的缓存机制,或者你可以修改rocksdb_use_direct_reads=ON关闭缓存,强制block cache直接读取。LSM树的本质要求当一层变满时,有一个合并过程将压缩数据推到下一层。这个过程可能相当密集并会影响用户查询速度。可以将其调整为不那么密集。

Block cache in MyRocks is somewhat similar to innodb_buffer_pool_size, however for MyRocks it’s mainly beneficial for reads. You may want to tune the rocksdb_block_cache_size setting. Also keep in mind it uses buffered reads by default, and in this case the OS cache contains cached compressed data and RockDB block cache will contain uncompressed data. You may keep this setup to have two levels of cache, or you can disable buffering by forcing block cache to use direct reads with rocksdb_use_direct_reads=ON. The nature of LSM trees requires that when a level becomes full, there is a merge process that pushes compacted data to the next level. This process can be quite intensive and affect user queries. It is possible to tune it to be less intensive.

现在还没有类似于Percona XtraBackup一样的热备软件来执行MyRocks表的热备份(我们正在研究这个)。你可以使用mysqldump 来进行逻辑备份,或者使用文件系统层面的snapshots ,比如LVM 或 ZFS 。

Right now there is no hot backup software like Percona XtraBackup to perform a hot backup of MyRocks tables (we are looking into this). At this time you can use mysqldump for logical backups, or use filesystem-level snapshots like LVM or ZFS.

在我们的官方文档当中,你可以了解到更多关于MyRocks 的优势和局限性.

You can find more MyRocks specifics and limitations in our docs at https://www.percona.com/doc/percona-server/LATEST/myrocks/limitations.html.

我们期待大家的反馈 。

We are looking for feedback on your MyRocks experience!

##更新(2018-02-12) 在获得Facebook MyRocks 团队的反馈后,我对原来的文章进行了更新。

UPDATES (12-Feb-2018) Updates to the original post with the feedback provided by Facebook MyRocks team

  1. 隔离级别 MyRocks 支持READ COMMITTED 和 REPEATABLE READ隔离级别,不支持 SERIALIZABLE。想了解更详细的信息可以阅读https://github.com/facebook/mysql-5.6/wiki/Transaction-Isolation。MyRocks 实现REPETABLE READ的方法和InnoDB不一样 — MyRocks 使用类似PostgreSQL 的snapshot isolation。 在Percona server 中,不允许在MyRocks 表上使用 REPEATABLE READ 隔离级别,因为REPEATABLE READ 隔离级别在innodb和myrocks db上的处理方式不一样。
  1. Isolation Levels MyRocks supports READ COMMITTED and REPEATABLE READ. MyRocks does not support SERIALIZABLE. Please read https://github.com/facebook/mysql-5.6/wiki/Transaction-Isolation for details. The way to implement REPETABLE READ was different from MyRocks and InnoDB — MyRocks used PostgreSQL style snapshot isolation. In Percona Server we do not allow REPEATABLE READ for MyRocks tables, as the behavior will be different from InnoDB.
  1. 在线二进制备份工具 网上有一个开源的在线二进制备份工具—myrocks_hotabackup:https://github.com/facebook/mysql-5.6/blob/fb-mysql-5.6.35/scripts/myrocks\_hotbackup
  1. Online Binary Backup Tool There is an open source online binary backup tool for MyRocks — myrocks_hotabackuphttps://github.com/facebook/mysql-5.6/blob/fb-mysql-5.6.35/scripts/myrocks\_hotbackup

扫码加入知数堂技术交流QQ群

(群号:****579036588)

群内可@各位助教了解更多课程信息

MyRocks引擎:入坑须知


MyRocks引擎:入坑须知 MyRocks引擎:入坑须知 MyRocks引擎:入坑须知 MyRocks引擎:入坑须知

知数堂

叶金荣与吴炳锡联合打造

领跑IT精英培训

行业资深专家强强联合,倾心定制

MySQL实战/MySQL优化/大数据实战 / Python/ SQL优化

数门精品课程

紧随技术发展趋势,定期优化培训教案

融入大量生产案例,贴合企业一线需求

社群陪伴学习,一次报名,可学1年

DBA、开发工程师必修课

上千位学员已华丽转身,薪资翻番,职位提升

改变已悄然发生,你还在等什么?

MyRocks引擎:入坑须知

扫码下载知数堂精品课程试听视频

或点击“阅读原文”直达下载地址

(MySQL 实战/优化、大数据实战、Python开发,及SQL优化等课程)

密码:hg3h

MyRocks引擎:入坑须知

MyRocks引擎:入坑须知

MyRocks引擎:入坑须知

本文分享自微信公众号 - 老叶茶馆(iMySQL_WX)。
如有侵权,请联系 support@oschina.cn 删除。
本文参与“OSC源创计划”,欢迎正在阅读的你也加入,一起分享。

点赞
收藏
评论区
推荐文章
blmius blmius
3年前
MySQL:[Err] 1292 - Incorrect datetime value: ‘0000-00-00 00:00:00‘ for column ‘CREATE_TIME‘ at row 1
文章目录问题用navicat导入数据时,报错:原因这是因为当前的MySQL不支持datetime为0的情况。解决修改sql\mode:sql\mode:SQLMode定义了MySQL应支持的SQL语法、数据校验等,这样可以更容易地在不同的环境中使用MySQL。全局s
Wesley13 Wesley13
3年前
SQL自动化审核初体验
导读作者:蓝剑锋TCL高级DBA,MySQL&OracleOCP知数堂MySQL实战/优化班第12期学员微信:lanjian106103个人公众号:DBARUN一、缘起2015.12.5,广州,ACOUGAsiaTour广州站第一次参加ACOUG论坛,会上
待兔 待兔
6个月前
手写Java HashMap源码
HashMap的使用教程HashMap的使用教程HashMap的使用教程HashMap的使用教程HashMap的使用教程22
Wesley13 Wesley13
3年前
MySQL DBA的工作日常
导读作者:田帅萌(邮箱:tplinux@163.com,欢迎交流)知数堂MySQLDBA班第9期优秀学员,Python运维开发班第5期学员,现任职知数堂助教又到了一年换坑最频繁的季节,很多童鞋在跑路(当然不是删库跑路!(https://oscimg.oschina.net/oscnet/64201376c910468
Wesley13 Wesley13
3年前
mysql设置时区
mysql设置时区mysql\_query("SETtime\_zone'8:00'")ordie('时区设置失败,请联系管理员!');中国在东8区所以加8方法二:selectcount(user\_id)asdevice,CONVERT\_TZ(FROM\_UNIXTIME(reg\_time),'08:00','0
Stella981 Stella981
3年前
ProxySQL新版本对MGR的原生支持
导读译者:张锐志(微信号:516160507),知数堂第10期MySQL学员原文出处:http://lefred.be/content/mysqlgroupreplicationnativesupportinproxysql/原文作者:lefred
Wesley13 Wesley13
3年前
MySQL自动化平台那些事
导读作者:田帅萌(邮箱:tplinux@163.com,欢迎交流)知数堂MySQLDBA班第9期优秀学员,Python运维开发班第5期学员,现任职知数堂助教承接上文《构建MySQL自动化平台思路》(https://www.oschina.net/action/GoToLink?urlhttps%3A%2F%2Fmp.w
Wesley13 Wesley13
3年前
MHA选择主库源码解析
导读作者:魏新平知数堂第5期MySQL实战班学员,第10期MySQL优化班学员,现任职助教。MHA在选择新的主库之前,会先把活着的slave分为几个数组,分别为latest(最靠前的slave数组),pref(优先被选择为master的数组),bad(不会被选择成为master的slave),slave
Wesley13 Wesley13
3年前
MySQL的四种GROUP BY用法
导读译者:魏新平知数堂第5期MySQL实战班学员,第10期MySQL优化班学员,现任职助教。原文出处:https://www.percona.com/blog/2018/02/05/fourwaystoexecutemysqlgroupby/原文作者:
Python进阶者 Python进阶者
1年前
Excel中这日期老是出来00:00:00,怎么用Pandas把这个去除
大家好,我是皮皮。一、前言前几天在Python白银交流群【上海新年人】问了一个Pandas数据筛选的问题。问题如下:这日期老是出来00:00:00,怎么把这个去除。二、实现过程后来【论草莓如何成为冻干莓】给了一个思路和代码如下:pd.toexcel之前把这