asyncio: fix SSL connections by using native TLS transport#884
Conversation
c4b4b48 to
99e7245
Compare
|
@roydahan , please rebase |
Python 3.8+ rejects ssl.SSLSocket in asyncio's sock_sendall/sock_recv with TypeError. This caused the driver to fail connecting to ScyllaDB clusters requiring TLS, manifesting as 'protocol version 21 not supported' errors (0x15 = TLS Alert byte misread as protocol version). Fix by using asyncio's native TLS transport (loop.create_connection with ssl= parameter) instead of wrapping sockets with ssl.SSLContext.wrap_socket(). This preserves shard-aware port binding done during _initiate_connection(). Add _AsyncioProtocol to bridge asyncio's transport/protocol API back to Connection.process_io_buffer() for SSL data reads. Non-SSL connections continue using the existing sock_recv path. Fixes scylladb#330
|
Rebased. |
|
Can anyone review this PR? |
dkropachev
left a comment
There was a problem hiding this comment.
This looks like the right direction for asyncio TLS: using asyncio's native transport/protocol layer avoids the ssl.SSLSocket incompatibility with sock_recv() / sock_sendall().
I think it would be worth tightening the design before merge. Right now TLS and plain CQL use different I/O models, and TLS writes do not appear to account for asyncio transport backpressure. My main suggestion would be to consider using one transport/protocol path for both TLS and non-TLS asyncio connections in this PR.
The reason I think it belongs here is that the hard part is not only "make TLS work"; it is introducing asyncio transports into this reactor correctly. Once the reactor has transport lifecycle, protocol callbacks, write flow control, close handling, and tests for those pieces, keeping plain CQL on the old socket-coroutine path leaves two separate implementations of the same connection state machine. That makes future behavior harder to reason about and increases the chance that fixes land in one path but not the other.
CI already gives broad connectivity coverage through the asyncio integration matrix, including an SSL scenario via TestSslThroughNlb, so I would focus new tests on the new transport mechanics and compatibility edge cases rather than adding another generic SSL integration test.
| self._read_watcher = asyncio.run_coroutine_threadsafe( | ||
| self.handle_read(), loop=self._loop | ||
| ) | ||
| if self._using_ssl: |
There was a problem hiding this comment.
Would it make sense to use the transport/protocol path for both TLS and non-TLS connections here? With the current split, asyncio has two different I/O models in the same reactor: TLS uses asyncio.Protocol/transport, while plain CQL stays on sock_recv()/sock_sendall().
I think unifying them in this PR may be cleaner because the work needed for TLS is mostly the same work needed for a correct transport-based reactor: setup, read callbacks, write flow control, close handling, and tests. If those pieces are implemented only for TLS now, we end up with two connection state machines to maintain and a higher chance of future fixes applying to one path but not the other.
A shared create_connection(sock=..., ssl=self.ssl_context or None) path could make lifecycle, reads, writes, and flow control consistent across both modes.
| await self._loop.sock_sendall(self._socket, next_msg) | ||
| if self._transport: | ||
| # SSL: use asyncio transport (handles TLS transparently) | ||
| self._transport.write(next_msg) |
There was a problem hiding this comment.
Could we account for transport backpressure here? transport.write() queues into asyncio's transport buffer and returns immediately, unlike await sock_sendall(). Under sustained writes, this loop can drain _write_queue faster than the socket can actually send.
One possible shape is to have the protocol expose a write-ready event:
def pause_writing(self):
self.write_ready.clear()
def resume_writing(self):
self.write_ready.set()and then wait before writing:
await self._protocol.write_ready.wait()
self._transport.write(next_msg)| 'does not implement .finish()') | ||
|
|
||
|
|
||
| class _AsyncioProtocol(asyncio.Protocol): |
There was a problem hiding this comment.
Since this protocol owns the transport callbacks, I think it would be useful for it to also own flow-control state. Implementing pause_writing() / resume_writing() here would let the writer respect asyncio's high/low watermarks. It may also be worth unblocking any paused writer from connection_lost() so shutdown cannot leave the write coroutine waiting indefinitely.
| self._transport = None | ||
| self._protocol = _AsyncioProtocol(self) if self.ssl_context else None | ||
| self._using_ssl = bool(self.ssl_context) | ||
| self._ssl_ready = asyncio.Event() if self.ssl_context else None |
There was a problem hiding this comment.
Small compatibility point: the repo still supports Python 3.9, and the existing code passes loop=self._loop to Queue and Lock for that runtime. Should this new Event follow the same pattern so it is bound to the reactor loop rather than the caller thread's loop?
| ) | ||
| self._send_options_message() | ||
|
|
||
| def _connect_socket(self): |
There was a problem hiding this comment.
This override looks like it accidentally skips the base class sockopts handling. It would be good to preserve Cluster(sockopts=...) for asyncio connections:
if self.sockopts:
for args in self.sockopts:
self._socket.setsockopt(*args)| server_hostname = None | ||
| if self.ssl_options: | ||
| server_hostname = self.ssl_options.get("server_hostname", None) | ||
| if not server_hostname: |
There was a problem hiding this comment.
This may change behavior for an explicit ssl_options={"server_hostname": ""}. For asyncio create_connection(sock=..., ssl=...), server_hostname="" is the documented way to bypass hostname/SNI handling when no host argument is provided. Could we distinguish a missing value from an explicitly empty one, for example with is None or a sentinel?
Summary
SimpleStrategywithNetworkTopologyStrategyacross the codebase (ScyllaDB dropped SimpleStrategy support)loop.create_connection(ssl=)) instead ofssl.SSLContext.wrap_socket(), which Python 3.8+ rejects in asyncio'ssock_sendall/sock_recv0x15was misread as protocol version 21)Details
SimpleStrategy → NetworkTopologyStrategy
All CQL
CREATE KEYSPACEstatements, Python strategy objects, and test assertions updated. TheSimpleStrategyclass definition incassandra/metadata.pyis preserved for backward compatibility.AsyncioConnection SSL Fix
Problem: Python 3.8+ explicitly rejects
ssl.SSLSocketinasyncio.loop.sock_sendall()andsock_recv()withTypeError("ssl.SSLSocket is not supported"). The baseConnection._connect_socket()wraps the socket withssl.SSLContext.wrap_socket(), producing anSSLSocketthat asyncio refuses.Solution: Override
_connect_socket()inAsyncioConnectionto skip SSL wrapping, keeping a plain TCP socket. After the socket connects (preserving shard-aware port binding), upgrade to TLS asynchronously vialoop.create_connection(sock=, ssl=). A new_AsyncioProtocolclass bridges asyncio's transport/protocol API back toConnection.process_io_buffer()for reading. Writes usetransport.write()for SSL connections. Non-SSL connections are unchanged.Edge cases handled: SSL handshake failure properly unblocks the write coroutine and defuncts the connection.
Fixes #330