Saturday 9 February 2013

Let's keep moving (will the puns ever end?)

So, how is my first Adventure in Moveland going?

The solution for my classes went according to planned, which was nice. SessionHandle, being the class that holds LIBSSH2_SESSION*, found itself suddenly surrounded by many new friends:

friend class SessionConnection;
friend class RemoteSessionHandle;
friend class SessionAuthenticator;
friend class RemoteChannelHandle;


SessionHandle::GetSession(), which provides access to said LIBSSH2_SESSION*, went from public to private. And both SessionHandle and RemoteSessionHandle (which also gained SessionConnection as friend) now have private Reseat() functions, to allow SessionConnection to reseat the pointers on move:

SessionConnection::SessionConnection(SessionConnection&& other) :
    reportStatus(move(other.reportStatus)), host(move(other.host)), 
    port(move(other.port)), user(move(other.user)), pwd(move(other.pwd)), 
    ios(move(other.ios)), sock(move(other.sock)), 
    sessionHandle(move(other.sessionHandle)), 
    remoteSessionHandle(move(other.remoteSessionHandle)),
    sessionAuthenticator(move(other.sessionAuthenticator))
{
    sessionHandle.Reseat(reportStatus);
    remoteSessionHandle.Reseat(reportStatus, sock, sessionHandle);
}


So, everything went smoothly, right? Well, not quite. When I tested my move ctor, I was still getting a SegFault. It was now happening during the socket's dtor, more exactly here (boost's scoped_lock.hpp):

// Constructor acquires the lock.
scoped_lock(Mutex& m)
  : mutex_(m)
{
  mutex_.lock();
  ...
}


So, it looked like I was still going to learn something more about Boost.Asio. I began tracing the socket's dtor. Here's boost::asio::basic_stream_socket's definition:

template <typename Protocol,
    typename StreamSocketService = stream_socket_service<Protocol> >
class basic_stream_socket
  : public basic_socket<Protocol, StreamSocketService>


Notice the stream_socket_service<>? Let's take a look at it (relevant info only):

template <typename Protocol>
class stream_socket_service
#if defined(GENERATING_DOCUMENTATION)
  : public boost::asio::io_service::service
#else
  : public boost::asio::detail::service_base<stream_socket_service<Protocol> >
#endif
{ 

... 
  // The type of the platform-specific implementation.
#if defined(BOOST_ASIO_HAS_IOCP)
  typedef detail::win_iocp_socket_service<Protocol> service_impl_type;
#else
  typedef detail::reactive_socket_service<Protocol> service_impl_type;
#endif
...
  // The platform-specific implementation.
  service_impl_type service_impl_;
};


Since we're on Windows and have BOOST_ASIO_HAS_IOCP #defined, let's take a look at win_iocp_socket_service:

template <typename Protocol> 
class win_iocp_socket_service : public win_iocp_socket_service_base


There is actually little of interest here, so we move on to win_iocp_socket_service_base:

class win_iocp_socket_service_base
{
...
// Mutex to protect access to the linked list of implementations.
boost::asio::detail::mutex mutex_;

...
};


Holy Shmoly, Duck Dodgers! It has a mutex! And it's neither a reference nor a pointer!

So, what does it hold? In our case (Windows), it's an instance of boost::asio::detail::win_mutex, and it encapsulates an OS critical section object. And it performs RAII, too; win_mutex's dtor looks like this:

~win_mutex()
{
  ::DeleteCriticalSection(&crit_section_);
}


We can read here what DeleteCriticalSection() does, but this is the important bit:

After a critical section object has been deleted, do not reference the object in any function that operates on critical sections (such as EnterCriticalSection, TryEnterCriticalSection, and LeaveCriticalSection) other than InitializeCriticalSection and InitializeCriticalSectionAndSpinCount. If you attempt to do so, memory corruption and other unexpected errors can occur.

Memory corruption and other unexpected errors? Check, eager young space cadet.

So, basically, even though Boost.Asio's socket is moveable (according to std::is_move_constructible<> and std::is_move_assignable<>), it may hold objects that aren't. In this case, we were left with two instances of win_iocp_socket_service_base, each with its instance of win_mutex that pointed to the same OS critical section object.

When the first socket ("moved to") is destroyed, ~win_mutex() is called, and the OS is told to get rid of that pesky critical section object pointed by crit_section_. Which it dutifully does.

Then, the second socket ("moved from") is destroyed. During which, this is called:

void win_iocp_socket_service_base::destroy( 
  win_iocp_socket_service_base::base_implementation_type& impl)
{
  ...
  // Remove implementation from linked list of all implementations.
  boost::asio::detail::mutex::scoped_lock lock(mutex_);
  ...


Obviously, you can guess what mutex_ is holding, right? A lovely pointer to an ex-pesky ex-critical (and probably ex-)section. So when scoped_lock's ctor does this: mutex_.lock(), this gets called:

void lock()
{
  ::EnterCriticalSection(&crit_section_);
}


Windows looks at it and says "Why, how quaint! Didn't you just tell me, a few nano-seconds ago, to get rid of this? You're probably having memory problems. But I've got just the thing to solve that. Here, have a SegFault! Oh, and a nice day, too, of course". Or, as the duck said "And brother, when it disintegrates, it disintegrates".

So, now what? Well, I've decided to apply to socket the same treatment I gave io_service, namely, it has gone from tcp::socket sock to unique_ptr<tcp::socket> sock.

Next time, the final part of this moving adventure - adding move semantics to the channel classes. And deciding whether I want to allow move when a command is executing.

That won't be the end of it, naturally. I'll have to make this thread-safe, and then I'll have to revisit all this. But, one step at a time.

2 comments:

  1. It's been a while since you postet this. We just ran into the very same problem, however it doesn't always occur. We are happily moving our sockets around and on windows we observed segfaults immediately after std::move(socket). Did you file a bug for this back then? The issue and the win_mutex within win_iocp_socket_base still exists at least in 1.62 and according to the documentation (https://www.boost.org/doc/libs/1_71_0/doc/html/boost_asio/overview/cpp2011/move_objects.html) they still seem to support moving of sockets without any notice regarding OS limitations. Anyway, great and amusing article :)

    Cheers
    Daniel

    ReplyDelete
    Replies
    1. I just saw that win_iocp_socket_service_base (the one with the problematic win_mutex) is only a pointer member in the template-specialization of basic_io_object for move-capable compilers. See basic_io_object.hpp:234 (1.62). This way, moving a socket moves the pointer to the member of this service instance which should be ok. We probably have a different problem.

      Delete