https://groups.google.com/forum/#!topic/seastar-dev/RuK-OajeqHk
The g++/glibc combination provides a C++ runtime in which C++ exceptions
do not scale. This is due to global locks in the stack unwinding code.
To summarize all the locks we have now and their purpose:
1. There is a lock in _Unwind_Find_FDE (libgcc) that protects
the list of FDEs registered with __register_frame* functions.
The catch is that dynamically linked binaries do not do that,
so all that it protects is a check that a certain list is empty.
This lock is no longer relevant in gcc7 since there is a path there
that checks that the list is empty outside of the lock.
2. The lock in dl_iterate_phdr (glibc) that protects the loaded object
list against runtime object loading/unloading.
To get rid of the first lock one has to use gcc7.
To get rid of the second one we can use the fact that we do not
load/unload objects dynamically (at least for now). To do that we
can mirror all ELF header information in seastar and provide our
own dl_iterate_phdr symbol which uses this mirror without locking.
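The "no load/unload after startup" assumption above could be checked at
runtime. The following is only a sketch and not part of this patch: the
names load_counters, read_load_counters and object_list_changed are
hypothetical, and it assumes it is calling glibc's own dl_iterate_phdr
(an override that passes the short dl_phdr_info hides the counters, in
which case the snapshot is marked invalid).

```cpp
#include <link.h>
#include <cstddef>

// Hypothetical snapshot of glibc's load/unload counters (not part of the patch).
struct load_counters {
    unsigned long long adds = 0;  // dlpi_adds: bumped when an object may have been added
    unsigned long long subs = 0;  // dlpi_subs: bumped when an object may have been removed
    bool valid = false;           // false if the callback was handed the short struct
};

static int read_counters(struct dl_phdr_info* info, size_t size, void* data) {
    auto* out = static_cast<load_counters*>(data);
    // Only touch dlpi_adds/dlpi_subs if the caller says the struct is big enough.
    if (size >= offsetof(dl_phdr_info, dlpi_subs) + sizeof(dl_phdr_info::dlpi_subs)) {
        out->adds = info->dlpi_adds;
        out->subs = info->dlpi_subs;
        out->valid = true;
    }
    return 1;  // a non-zero return stops the iteration after the first object
}

// Read the counters as reported for the first loaded object.
load_counters read_load_counters() {
    load_counters c;
    dl_iterate_phdr(read_counters, &c);
    return c;
}

// True if the loaded object list may have changed since `before` was taken,
// or if the counters could not be read at all.
bool object_list_changed(const load_counters& before) {
    load_counters now = read_load_counters();
    return !before.valid || !now.valid ||
           now.adds != before.adds || now.subs != before.subs;
}
```

A caller would take a snapshot with read_load_counters() when the mirror
is filled and could later assert that object_list_changed(snapshot) is
false, for example in debug builds.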
Unfortunately there is another gotcha in this approach: the dl_iterate_phdr
supplied by glibc never calls more than one callback simultaneously, as an
unintended consequence of the lock there, and libgcc relies on that
to maintain a small cache of translations. Access to the cache is
not protected by any lock since up until now only one callback could
run at a time. Luckily, libgcc cannot use the cache if an older version
of dl_phdr_info is provided to the callback, because the older version
has no indication that the loaded object list may have changed,
so libgcc does not know when the cache should be invalidated and disables it
entirely. By calling the callback with the old version of dl_phdr_info from
our dl_iterate_phdr we can effectively make the libgcc callback thread safe.
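To illustrate the size trick, here is a small sketch of how a callback
can tell the old dl_phdr_info layout from the new one by looking only at
the size argument. It is not libgcc's code: probe, probe_result and
truncating_adapter are hypothetical names, and the size test is only
assumed to be similar in spirit to the one libgcc performs before it
enables its cache.

```cpp
#include <link.h>
#include <cstddef>

// Hypothetical result type: how many objects were seen, and for how many
// of them the callback was told that dlpi_adds/dlpi_subs are present.
struct probe_result {
    int objects = 0;
    int with_counters = 0;
};

// Stand-in for a callback that would only enable a cache when the
// load/unload counters are available.
static int probe(struct dl_phdr_info* info, size_t size, void* data) {
    (void)info;
    auto* r = static_cast<probe_result*>(data);
    ++r->objects;
    if (size >= offsetof(dl_phdr_info, dlpi_subs) + sizeof(dl_phdr_info::dlpi_subs)) {
        ++r->with_counters;
    }
    return 0;  // zero means "keep iterating"
}

struct replay_ctx {
    int (*cb)(struct dl_phdr_info*, size_t, void*);
    void* data;
};

// Forwards each object to the wrapped callback, but reports a size that
// ends just before dlpi_adds, as the override in this patch does.
static int truncating_adapter(struct dl_phdr_info* info, size_t, void* data) {
    auto* ctx = static_cast<replay_ctx*>(data);
    dl_phdr_info copy = *info;
    return ctx->cb(&copy, offsetof(dl_phdr_info, dlpi_adds), ctx->data);
}

probe_result probe_full() {
    probe_result r;
    dl_iterate_phdr(probe, &r);
    return r;
}

probe_result probe_truncated() {
    probe_result r;
    replay_ctx ctx{probe, &r};
    dl_iterate_phdr(truncating_adapter, &ctx);
    return r;
}
```

With glibc's own dl_iterate_phdr, probe_full() should report the counters
for every object, while probe_truncated() visits the same objects but
never sees the counters, which is the condition that makes libgcc give
up on its cache.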
diff --git a/configure.py b/configure.py
index facdd8f..33b310b 100755
--- a/configure.py
+++ b/configure.py
@@ -297,6 +297,8 @@ arg_parser.add_argument('--static-boost', dest = 'staticboost', action = 'store_
add_tristate(arg_parser, name = 'hwloc', dest = 'hwloc', help = 'hwloc support')
arg_parser.add_argument('--enable-gcc6-concepts', dest='gcc6_concepts', action='store_true', default=False,
help='enable experimental support for C++ Concepts as implemented in GCC 6')
+arg_parser.add_argument('--disable-exception-scalability-workaround', dest='exception_workaround', action='store_true', default=False,
+ help='disable the override of the dl_iterate_phdr symbol that works around C++ exception scalability issues')
args = arg_parser.parse_args()
libnet = [
@@ -337,6 +339,7 @@ core = [
'net/inet_address.cc',
'rpc/rpc.cc',
'rpc/lz4_compressor.cc',
+ 'core/exception_hacks.cc',
]
protobuf = [
@@ -392,6 +395,9 @@ if args.gcc6_concepts:
defines.append('HAVE_GCC6_CONCEPTS')
args.user_cflags += ' -fconcepts'
+if args.exception_workaround:
+ defines.append('NO_EXCEPTION_HACK')
+
if args.staticcxx:
libs = libs.replace('-lstdc++', '')
libs += ' -static-libgcc -static-libstdc++'
diff --git a/core/exception_hacks.cc b/core/exception_hacks.cc
new file mode 100644
index 0000000..6f04630
--- /dev/null
+++ b/core/exception_hacks.cc
@@ -0,0 +1,108 @@
+/*
+ * This file is open source software, licensed to you under the terms
+ * of the Apache License, Version 2.0 (the "License"). See the NOTICE file
+ * distributed with this work for additional information regarding copyright
+ * ownership. You may not use this file except in compliance with the License.
+ *
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+/*
+ * Copyright (C) 2017 ScyllaDB
+ */
+
+// The purpose of the hacks here is to work around the C++ exception scalability
+// problem with gcc and glibc. For the best result gcc-7 is required.
+//
+// To summarize all the locks we have now and their purpose:
+// 1. There is a lock in _Unwind_Find_FDE (libgcc) that protects
+// the list of FDEs registered with __register_frame* functions.
+// The catch is that dynamically linked binaries do not do that,
+// so all that it protects is a check that a certain list is empty.
+// This lock is no longer relevant in gcc-7 since there is a path there
+// that checks that the list is empty outside of the lock, and that will
+// always be true for us.
+// 2. The lock in dl_iterate_phdr (glibc) that protects the loaded object
+// list against runtime object loading/unloading.
+//
+// To get rid of the first lock, gcc-7 is required.
+//
+// To get rid of the second one we can use the fact that we do not
+// load/unload objects dynamically (at least for now). To do that we
+// can mirror all ELF header information in seastar and provide our
+// own dl_iterate_phdr symbol which uses this mirror without locking.
+//
+// Unfortunately there is another gotcha in this approach: the dl_iterate_phdr
+// supplied by glibc never calls more than one callback simultaneously, as an
+// unintended consequence of the lock there, and libgcc relies on that
+// to maintain a small cache of translations. Access to the cache is
+// not protected by any lock since up until now only one callback could
+// run at a time. Luckily, libgcc cannot use the cache if an older version
+// of dl_phdr_info is provided to the callback, because the older version
+// has no indication that the loaded object list may have changed,
+// so libgcc does not know when the cache should be invalidated and disables it
+// entirely. By calling the callback with the old version of dl_phdr_info from
+// our dl_iterate_phdr we can effectively make the libgcc callback thread safe.
+
+#ifndef NO_EXCEPTION_HACK
+#include <link.h>
+#include <dlfcn.h>
+#include <assert.h>
+#include <vector>
+#include <cstddef>
+#include "exception_hacks.hh"
+
+namespace seastar {
+using dl_iterate_fn = int (*) (int (*callback) (struct dl_phdr_info *info, size_t size, void *data), void *data);
+
+static dl_iterate_fn dl_iterate_phdr_org() {
+ static dl_iterate_fn org = [] {
+ auto org = (dl_iterate_fn)dlsym (RTLD_NEXT, "dl_iterate_phdr");
+ assert(org);
+ return org;
+ }();
+ return org;
+}
+
+static std::vector<dl_phdr_info> phdrs_cache;
+
+void init_phdr_cache() {
+ // Fill the ELF header cache so it can be read without locking.
+ // This assumes no dynamic object loading/unloading after this point.
+ dl_iterate_phdr_org()([] (struct dl_phdr_info *info, size_t size, void *data) {
+ phdrs_cache.push_back(*info);
+ return 0;
+ }, nullptr);
+}
+}
+
+extern "C"
+[[gnu::visibility("default")]]
+[[gnu::externally_visible]]
+int dl_iterate_phdr(int (*callback) (struct dl_phdr_info *info, size_t size, void *data), void *data) {
+ if (!seastar::phdrs_cache.size()) {
+ // Cache is not yet populated, pass through to original function
+ return seastar::dl_iterate_phdr_org()(callback, data);
+ }
+ int r = 0;
+ for (auto h : seastar::phdrs_cache) {
+ // Pass a dl_phdr_info size that does not include dlpi_adds and dlpi_subs.
+ // This forces libgcc to disable its cache, which is not thread safe and
+ // relies on dl_iterate_phdr serializing calls to the callback. Since we do
+ // not serialize here, the cache has to be disabled.
+ r = callback(&h, offsetof(dl_phdr_info, dlpi_adds), data);
+ if (r) {
+ break;
+ }
+ }
+ return r;
+}
+#endif
diff --git a/core/exception_hacks.hh b/core/exception_hacks.hh
new file mode 100644
index 0000000..f5ff51f
--- /dev/null
+++ b/core/exception_hacks.hh
@@ -0,0 +1,24 @@
+/*
+ * This file is open source software, licensed to you under the terms
+ * of the Apache License, Version 2.0 (the "License"). See the NOTICE file
+ * distributed with this work for additional information regarding copyright
+ * ownership. You may not use this file except in compliance with the License.
+ *
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+/*
+ * Copyright (C) 2017 ScyllaDB
+ */
+
+namespace seastar {
+void init_phdr_cache();
+}
diff --git a/core/reactor.cc b/core/reactor.cc
index 2beef59..b4b4e8d 100644
--- a/core/reactor.cc
+++ b/core/reactor.cc
@@ -93,6 +93,7 @@
#include "util/defer.hh"
#include "core/metrics.hh"
#include "execution_stage.hh"
+#include "exception_hacks.hh"
namespace seastar {
@@ -3369,6 +3370,9 @@ smp::get_options_description()
("max-io-requests", bpo::value
#endif
("mbind", bpo::value
+#ifndef NO_EXCEPTION_HACK
+ ("enable-glibc-exception-scaling-workaround", bpo::value
+#endif
;
return opts;
}
@@ -3514,6 +3518,12 @@ static void sigabrt_action() noexcept {
void smp::configure(boost::program_options::variables_map configuration)
{
+#ifndef NO_EXCEPTION_HACK
+ if (configuration["enable-glibc-exception-scaling-workaround"].as<bool>()) {
+ init_phdr_cache();
+ }
+#endif
+
// Mask most, to prevent threads (esp. dpdk helper threads)
// from servicing a signal. Individual reactors will unmask signals
// as they become prepared to handle them.
--
Gleb.