-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Verify socket passing in hot restart test #805
Conversation
@htuch for first pass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs/operations/cli.rst
Outdated
@@ -10,6 +10,10 @@ following are the command line options that Envoy supports. | |||
|
|||
*(required)* The path to the :ref:`JSON configuration file <config>`. | |||
|
|||
.. option:: -a <path string>, --admin-address-path <path string> | |||
|
|||
*(optional)* The output file path where the admin address will be written. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
admin address and port
include/envoy/server/instance.h
Outdated
/** | ||
* @return int the total number of listeners | ||
*/ | ||
virtual int numListeners() PURE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we strictly need this? A caller could just iterate incrementally on getListenSocketByIndex
until a nullptr is returned for the purposes we need in this PR.
@@ -17,6 +18,8 @@ class Factory { | |||
* Constructs a Json Object from a String. | |||
*/ | |||
static ObjectPtr LoadFromString(const std::string& json); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an aside, looks like LoadFromString
and peers violate the coding style in their naming. A distinct cleanup PR might be nice to make this file consistent.
source/server/http/admin.cc
Outdated
for (int i = 0; i < server_.numListeners(); ++i) { | ||
listeners.push_back(server_.getListenSocketByIndex(i)->localAddress()->asString()); | ||
} | ||
response.add(fmt::format("{}", Json::Factory::listAsJsonString(listeners))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need the fmt::format
here.
source/server/server.cc
Outdated
|
||
if (options.adminAddressPath().length() > 0) { | ||
std::ofstream admin_address_file(options.adminAddressPath()); | ||
ASSERT(admin_address_file.is_open()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be better error handling here - it's easy for the user to supply a bad path, somewhere not writable, etc. We should log().criticial
the failure and return false (add return code). The exception catching in the caller is too focused on the JSON case, there are other reasons we can fail here, so that should also be refactored to reflect that.
tools/socket_passing.py
Outdated
def ReplaceListenerAddresses(original_json, admin_address, no_port_change): | ||
# Get original listener addresses | ||
with open(original_json, 'r') as original_json_file: | ||
parsed_json = json.load(original_json_file, object_pairs_hook=OrderedDict) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I correct in thinking we're using OrderedDict here just so we get deterministic output? If so, maybe just add a comment.
tools/socket_passing.py
Outdated
admin_response = admin_conn.getresponse() | ||
|
||
if not admin_response.status == 200: | ||
admin_conn.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would use a try: ... finally: admin_conn.close()
block here.
tools/socket_passing.py
Outdated
if no_port_change: | ||
return CheckNoChange(original_listeners, updated_listeners) | ||
else: | ||
for updated, original in zip(updated_listeners, original_listeners): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: maybe s/updated_listeners/discovered_listeners/
tools/socket_passing.py
Outdated
return CheckNoChange(original_listeners, updated_listeners) | ||
else: | ||
for updated, original in zip(updated_listeners, original_listeners): | ||
original['address'] = "tcp://" + updated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: the style guide says we need to have consistent use of "" and ''. Basically, prefer '' over "" unless there is legacy in a file you want to be consistent with.
tools/socket_passing.py
Outdated
while not os.path.exists(admin_address_path): | ||
time.sleep(1) | ||
counter += 1 | ||
if counter > 20: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should probably document that this also waits for the admin address file to appear on the filesystem. Also, best not to use magic numbers like 20, make this a constant at the top of the file, e.g. ADMIN_FILE_WAIT_TIMEOUT_SECS
.
BTW, as mentioned in person, the really robust way to do this is with something like inotify, but that's not worth doing for a simple test like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking solid, almost ready for external review.
source/server/http/admin.cc
Outdated
Http::Code AdminImpl::handlerListenerAddresses(const std::string&, Buffer::Instance& response) { | ||
std::list<std::string> listeners; | ||
int listener_index = 0; | ||
while (server_.getListenSocketByIndex(listener_index) != nullptr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can write this as while (auto listen_socket = server_.getListenSocketByIndex(listener_index) ) { ...
source/server/http/admin.h
Outdated
@@ -118,6 +120,7 @@ class AdminImpl : public Admin, | |||
Http::Code handlerServerInfo(const std::string& url, Buffer::Instance& response); | |||
Http::Code handlerStats(const std::string& url, Buffer::Instance& response); | |||
Http::Code handlerQuitQuitQuit(const std::string& url, Buffer::Instance& response); | |||
Http::Code handlerListenerAddresses(const std::string& url, Buffer::Instance& response); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might want to make this more generic, e.g. handlerListenerInfo
, to allow later extension for more than addresses.
test/common/json/json_loader_test.cc
Outdated
|
||
std::list<std::string> list4 = {"127.0.0.1:46465", "127.0.0.1:52211", "127.0.0.1:58941"}; | ||
EXPECT_STREQ("[\"127.0.0.1:46465\",\"127.0.0.1:52211\",\"127.0.0.1:58941\"]", | ||
Json::Factory::listAsJsonString(list4).c_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it would be more robust to ::listAsJsonString
and then parse it back with RapidJSON and make sure you get back an equivalent C++ std::list
. That way any changes to the RapidJSON output formatting (e.g. whitespace use) don't cause the test to fail.
tools/BUILD
Outdated
@@ -3,4 +3,5 @@ package(default_visibility = ["//visibility:public"]) | |||
exports_files([ | |||
"gen_git_sha.sh", | |||
"git_sha_rewriter.py", | |||
"socket_passing.py", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should now be declared with envoy_py_test_binary
(hot off the press, since #823, which is still inflight but will probably be merged before this PR).
test/integration/hotrestart_test.sh
Outdated
|
||
echo "Updating original config json listener addresses" | ||
UPDATED_HOT_RESTART_JSON="${TEST_TMPDIR}"/hot_restart_updated.json | ||
tools/socket_passing.py "-o" "${HOT_RESTART_JSON}" "-a" "${ADMIN_ADDRESS_PATH_0}" \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be ${TEST_RUNDIR}"/tools/socket_passing.py
since #819 (sorry about the churn!).
TestEnvironment::temporaryPath("some/unlikely/bad/path.prof"), | ||
Network::Utility::resolveUrl("tcp://127.0.0.1:0"), server_); | ||
AdminImpl admin_bad_profile_path( | ||
"/dev/null", TestEnvironment::temporaryPath("some/unlikely/bad/path.prof"), "", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think this points to making AdminImplOptions
something worth considering.
AdminImpl admin_bad_address_out_path( | ||
"/dev/null", cpu_profile_path_, bad_path, | ||
Network::Test::getSomeLoopbackAddress(Network::Address::IpVersion::v4), server_); | ||
EXPECT_FALSE(std::ifstream(bad_path)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to EXPECT on the log output? I'm not sure, but if it's easy to do it might be worth doing that as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So far, I haven't found a way to check log outputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logger could be mocked and the log statement EXPECTed, but, the way the logger object is brought into existence in many places including in AdminImpl makes this difficult. It's super convenient to be able to inherit in any class from Logger::Loggable and just get the magic log() method but this kills the flexibility of being able to pass in a mocked logger. Some test code added to the logger could provide a special mode where you can check what's been logged, but probably this is nontrivial and best done as a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also just make the error a PANIC() and then do a death test. I don't have a strong preference here.
tools/socket_passing.py
Outdated
# exisiting json config file with updated listener addresses. This script is | ||
# currently called in the hot restart integration test to update listener | ||
# addresses bound to port 0 in the intial json config file. With the | ||
# -n or -no_port_change option, this script does not update an exisiting json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-n
is now gone.
tools/socket_passing.py
Outdated
import sys | ||
import time | ||
|
||
ADMIN_FILE_TIMEOUT_SECS = 20 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment explaining what this means.
tools/socket_passing.py
Outdated
return False | ||
discovered_listeners = json.loads(admin_response.read()) | ||
except: | ||
sys.stderr.write('Cannot connect to admin.\n') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally it's scary to do a wildcard exception catch in Python and then not propagate further. In this case, it means exceptions will be hard to debug as we lose info. You could do except e: \n sys.stderr.write('Cannot connect to admin: %s' % e)
or something like that. Or re-raise with
except e:
sys.stdout.write(...)
raise e
or create a custom exception class. Probably the first is fine for this test tool.
source/server/http/admin.h
Outdated
@@ -1,6 +1,7 @@ | |||
#pragma once | |||
|
|||
#include <chrono> | |||
#include <fstream> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: shouldn't this be in admin.cc only?
test/common/json/json_loader_test.cc
Outdated
std::vector<Json::ObjectPtr> output3 = | ||
Json::Factory::LoadFromString(Json::Factory::listAsJsonString(list3))->asObjectArray(); | ||
EXPECT_EQ(output3.size(), 4); | ||
EXPECT_STREQ(output3[0]->asString().c_str(), "one"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it possible to EXPECT_EQ
with std::string
? Might be cleaner than dropping to C strings.
std::vector<Json::ObjectPtr> listener_info = | ||
Json::Factory::LoadFromString(response->body())->asObjectArray(); | ||
for (std::size_t index = 0; index < listener_info.size(); index++) { | ||
EXPECT_STREQ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same point on EXPECT_EQ
here and elsewhere.
tools/BUILD
Outdated
@@ -1,6 +1,15 @@ | |||
package(default_visibility = ["//visibility:public"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you switch this to envoy_package since #828, thanks!
tools/socket_passing.py
Outdated
import time | ||
|
||
# Seconds to wait for the admin address output file to appear. The script exits | ||
# if the file is not found. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: "exits with failure"
tools/socket_passing.py
Outdated
sys.stderr.write('Cannot connect to admin: %s\n' % e) | ||
return False | ||
else: | ||
if not len(discovered_listeners) == len(original_listeners): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A != B
is clearer than not A == B
. The second is in danger of being read (not A) == B
, which isn't what it means, but the casual reader might easily make that mistake.
tools/socket_passing.py
Outdated
# if the file is not found. | ||
ADMIN_FILE_TIMEOUT_SECS = 20 | ||
|
||
def ReplaceListenerAddresses(original_json, admin_address, updated_json): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this name is now legacy, it should probably be PrintListenerAddresses
or something you think is better.
tools/socket_passing.py
Outdated
with open(admin_address_path, 'r') as admin_address_file: | ||
admin_address = admin_address_file.read() | ||
|
||
result = ReplaceListenerAddresses(args.original_json, admin_address, args.updated_json) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: s/result/success
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is ready for wider review. @mattklein123
tools/BUILD
Outdated
@@ -1,5 +1,6 @@ | |||
load( | |||
"//bazel:envoy_build_system.bzl", | |||
"envoy_py_test_binary", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: prefer alphabetical ordering here.
@hennna can we get this passing tests? If you want to wait until cmake removal that's fine. It should be soon. |
Nevermind looks like you fixed it. Will look in a bit. |
The fix is only for the alphabetical ordering comment by @htuch. I'll wait for cmake removal rather than add changes that will need to be reverted. |
OK let's just wait until this is passing tests then we can take a quick final pass through. |
@hennna please merge master. This should pass tests now. |
@mattklein123 updated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, a few small comments.
source/server/http/admin.cc
Outdated
{"/stats", "print server stats", MAKE_HANDLER(handlerStats)}, | ||
{"/listeners", "print listener addresses", MAKE_HANDLER(handlerListenerInfo)}} { | ||
|
||
if (address_out_path.length() > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: !address_out_path.empty()
docs/operations/cli.rst
Outdated
@@ -10,6 +10,10 @@ following are the command line options that Envoy supports. | |||
|
|||
*(required)* The path to the :ref:`JSON configuration file <config>`. | |||
|
|||
.. option:: -a <path string>, --admin-address-path <path string> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we drop the -a short option. This is a very uncommon option and would rather it be fully explicit. (Obviously need to fix tclap stuff below).
test/integration/server.h
Outdated
@@ -28,12 +28,14 @@ namespace Server { | |||
*/ | |||
class TestOptionsImpl : public Options { | |||
public: | |||
TestOptionsImpl(const std::string& config_path) : config_path_(config_path) {} | |||
TestOptionsImpl(const std::string& config_path) | |||
: config_path_(config_path), admin_address_path_("") {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: admin_address_path_("") not needed, as its the default.
AdminImpl admin_bad_address_out_path( | ||
"/dev/null", cpu_profile_path_, bad_path, | ||
Network::Test::getSomeLoopbackAddress(Network::Address::IpVersion::v4), server_); | ||
EXPECT_FALSE(std::ifstream(bad_path)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also just make the error a PANIC() and then do a death test. I don't have a strong preference here.
I'll defer the log.critical vs. PANIC change to after the wider discussion on failure modes in the upcoming doc PR. There doesn't seem to be strong feelings for either option in this specific PR. |
We don't currently run the Kotlin linter on anything other than `hello_world`, but we should. This PR fixes all existing violations and enables the linter. Resolves envoyproxy/envoy-mobile#599. Signed-off-by: Michael Rebello <me@michaelrebello.com> Signed-off-by: JP Simard <jp@jpsim.com>
We don't currently run the Kotlin linter on anything other than `hello_world`, but we should. This PR fixes all existing violations and enables the linter. Resolves envoyproxy/envoy-mobile#599. Signed-off-by: Michael Rebello <me@michaelrebello.com> Signed-off-by: JP Simard <jp@jpsim.com>
In order to get updated admin and listener address information, the following changes were made:
Add an optional command line flag <-a> <--admin-address-path>. When set, the admin address will be written to the specified file.
Add a /listeners URL admin handler that returns all server listener addresses.
Write a python helper script that either updates the listener addresses in the original config file or checks that the server listener addresses have not changed after envoy startup.
Fixes #654.