It has been over a year since we released the first version of SharpSocks, our proxy-aware reverse HTTP tunnelling SOCKS proxy. This post aims to provide a State of the Nation update for users. It details some of our experiences using it, how the experience and performance has been massively improved, and some of our observations along that journey for those interested in some of the juicy technical details under the hood.
TLDR – SharpSocks performance has been massively increased and is now fully integrated into PoshC2 so you can type sharpsocks
to get this going straight away.
What’s the point of a SOCKS proxy?
When performing a simulated attack (red team), or indeed many other types of test, SOCKS proxies become an integral tool during the “Act on Objectives” phase of the kill chain. For example, if the target is behind the network perimeter, is a system accessible via RDP, is an intranet site or is a Citrix hosted application, the SOCKS proxy becomes an essential tool. It can allow access to those “internal only” resources, through an established implants C2 (Command and Control). The diagram below shows a typical scenario in a red team:
Currently, there are plenty of SOCKS proxy implementations that can be used in this scenario (which are very good) but these are usually built in to the various C2 frameworks. The initial goal of SharpSocks was to be “framework agnostic”, such that it could be deployed to a machine, independent of whether an implant was operating or not.
When a SOCKS Proxy is deployed via other C2 frameworks, the beacon frequency must be increased to provide quick feedback (usually 1-5 seconds), as a consequence of the volume of data being sent and received increasing considerably. In addition, there are connections to potentially sensitive services, which can generate events or Indicators of Compromise (IOCs) thus increasing the risk of detection.
By separating the SOCKS proxy from your main C2 Implant comms:
- You can deploy the SOCKS Proxy to an unrelated machine and communicate through separate infrastructure.
- Even if the SOCKS Proxy activity gets detected, it will not result in losing your initial foothold. If the SOCKS and C2 Implant are combined then there is a higher chance of detection.
Downsides to the “Separation” :
- Additional setup, C2 infrastructure, separate commands and tools.
- The actual code of gets larger.
- The configuration becomes a lot more complicated.
It also does not help the fact that we like developing in C#, while everyone else wants to pwn stuff with Linux. In order to get around this, the use of SharpSocks with PoshC2 was supported by Wine. The SharpSocks client would work on a Linux OS but this was definitely less than optimal. Sooooo…
SharpSocks Server is now cross platform .Net Core
The main announcement is that at least on the SharpSocks side it has been ported to .Net Core (no more Wine thankfully; more on this below) and that its traffic is routed through PoshC2. In terms of PoshC2, which is detailed here: Introducing PoshC2 v5.0, major improvements have been made in its usability, in order to make SharpSocks a primary component rather than a tack on to the side.
Considering the risks we mentioned above, in deploying the SOCKS Proxy and implant on the same host and going back through the same infrastructure, is it still possible to deploy SharpSocks separately?
The answer is yes; absolutely the primary design goal of SharpSocks being deployed as a standalone SOCKS tool is still available.
State of the Nation
Some of the other project goals were to deliver a tool with the following features:
- Performance; ensure that the user experience of any protocol being tunnelled such as RDP is good.
- Proxy aware when sending traffic outside of the network.
- Allow for the easy customisation of the URLs used, the encryption mechanism and the data transport locations.
Over the last year, SharpSocks has seen regular usage on engagements where we have deployed PoshC2. One theme of threat intelligence led simulated attacks (CREST STAR, CBEST, TIBER etc.) is the use of a range of tools (i.e. C2 Frameworks) in order to simulate different levels of threat actor sophistication. PoshC2 is used in conjunction with commercial offerings and our own private Remote Access Tool (RAT) to simulate different types of adversaries.
Generally, feedback on SharpSocks use has been positive by our team, however the consistent themes of frustration have always been ease of use and performance. By far one of the most embarrassing situations as a developer you can get yourself into is when you completely forget how to setup your own tool. SOCKS proxying by nature is slow, however compared to other similar tools, SharpSocks still wasn’t the fastest.
This is where this updated release comes in. It is designed to make SharpSocks easier to use, fix some long standing bugs and generally make the performance of SOCKS proxying better and faster.
.Net Core
As mentioned above, the SharpSocks Server component is now built as a .Net Core project. Why .Net Core? Well, when SharpSocks was originally built, it was a purely “C# Windows” only solution. This was primarily due to my background as a professional developer before my career in InfoSec. Creating a Windows only tool really does, in terms of InfoSec, limit your potential user base. Eventually I came to the conclusion that yes – it had to become cross platform. Some of the options included Go or Python, but in the end porting to these would have required a complete rewrite of the underlying code base, something I was really not sure was worth the effort considering that .Net Core is available.
By porting to .Net Core, it is now possible to run SharpSocks on either Windows, Linux or MacOS with limited changes. The process of porting involved creating a new .Net Core project and moving the source into it. The only change that had to be made was that the encryption code tried to limit exposure of the key in memory by using DPAPI; in .Net Core this does not appear to be supported as I suspect there is no direct analogue in Linux or MacOS.
The main advantage of running as .NET Core rather than Wine is performance; tunnelling your traffic through SharpSocks feels so much faster. This is entirely logical, as it runs on a small optimised platform runtime.
Other Changes
Other fixes and changes include an overhaul of the error handling, particularly when the server has stopped responding. A particularly nasty issue was found that occurred when the redirector returned a 404 and the server had been stopped.
There is also a new option on Server called socketTimeout
(pass either -st
or -socketTimeout
and an integer value for second e.g. 240
).
- This controls how long a SOCKS connection to the server will be held open without any data having been sent over it. This is not something that you should normally have to tweak, however if you are having problems with the application being tunnelled timing out then this is a value that may help.
How to use (SharpSocks in Posh)
- Get a shell (PowerShell or C# implant) on the target host using your favourite execution method.
- Within the implant handler, select the implant.
- Type
sharpsocks
within the implant handler. Immediately, a line will be printed out with detail that you will need to start the server. Copy this and paste it into a terminal (NOT into the PoshC2 implant handler). - Once the server has started, type ‘Y’ and enter into the implant handler window. The output should now say that the implant has started. You can confirm this by looking at the output from the server you started in the previous step. If it has started, you should see the port that the SOCKS Proxy is now listening on (most likely 43334).
- If you haven’t already done so, now is the time to configure proxychains, your browser or the SOCKS aware application to point to localhost:43334.
- With this you can start SOCKS Proxying away.
SharpSocks Deep Dive
As stated, it had been a while since the initial release of SharpSocks and it was time to do a little work on it. The following is a bit of a deep dive on how it works, some of the changes that have been made recently, and issues that we had to work through. If there is anything you would like to know more about, hit me up on twitter @rbmaslen or on the PoshC2 Slack channel (poshc2.slack.com – email us @ labs at nettitude dot com for an invite).
Long Polling
Having taken Silent Break Security’s excellent Dark Side Ops 2 course (https://silentbreaksecurity.com/training/dark-side-ops-2-adversary-simulation/ ) at DerbyCon last year, before smashing the 2018 CTF with SpicyWeasel, I was struck by the usefulness of HTTP Long Poll. I had come across the concept many years ago in another life as a developer but hadn’t really grasped the relevance to the Red Team toolkit until I took the course.
The premise, essentially, is that you are inverting the beacon. In traditional implants:
- The main thread is on a timer (with an inbuilt jitter value).
- When this interval elapses a call back is made to the C2 Server.
One of the main problems with running on beacon time is that once you have queued your task, there is plenty of time to be distracted before the results arrive in. Long Polling means that when a request is made to an HTTP Server, the connection will be held open until a result set is ready (or a much longer time out elapses server side).
How this works in the implant context is that when the request comes into the C2 Server from the implant, the connection (HTTP Request) is held open until tasks have been queued or if tasks are queued and ready to go, it returns immediately. This puts the wait upon the Server side and makes the implant/tasking feel very snappy much more like a command line.
It also means shorter beacon times when being actively used and longer ones when not in use, so the operator needs to spend less time adjusting the sleep time. It also probably makes it a little harder for tools looking for “regular” HTTP requests in order to identify beacons.
So how does that relate to SharpSocks? Well, I’m glad you asked; first it’s probably best to understand how SharpSocks works as a concept.
So how does it all work then?
Very good question, if you do know answers on a postcard please 🙂
Breaking it down, there are at a bare minimum two components, the first being the “Server” and the second being “Implant” (both as separate Visual Studio projects). Within the Visual Studio solutions there are also two test applications that you can use if you are debugging the libraries. The way that SharpSocks is normally deployed, particularly via the Implant, is either a PowerShell script that loads the library or a C# implant that loads the Assembly (.dll).
The Server at it’s core is an HTTPListener. When the server starts up, it starts listening for connections to the URI that has been passed.
An important point is that the Implant and the Server need to share a few secrets:
- One is an Encryption Key.
- The other is a Command Channel Session Id.
The Implant communicates to the Server via XML Messages in regular Web Requests (encrypted of course. The request needs to identify the connection or the Command Channel that messages are being sent about by a randomly generated Session Id.
The Command Channel has a special one that is shared between the “Server” and the “Implant’ at start-up. Messages about connections opening and closing are sent over the Command Channel (encrypted of course with the key); actual traffic and a status are sent with session id’s that identify the connection.
I need to admit that XML is not a great format. It was used for simplicity when SharpSocks was first built, and yes it will be replaced at some point in time, but it works for the moment. A point could also be made about the lack of multiplexing; at the moment each request will only have information about one Session Id (Command Channel can have multiple open/close messages). This is on the road map, as I would really like to be able to reduce the number of web requests.
The first time a Command Channel request is sent by the Implant another port on the localhost interface is opened up. This is the SOCKS port where you can point your browser, other SOCKS proxy aware app or proxychains (the port itself either defaults to 43334 or what you pass on the command line). Now the app that wants to proxy needs to first send a request to this port identifying the host and the port that it wants to open a connection to (an example of the structure here is for SOCKS4) before it can send any data. The details about the connection to open is then queued as a Command Channel task which will be picked up by the implants Command Channel web request. Once the implant receives this message back, the connection to the specified host is attempted and a message based on the status of this connection is returned to the Server.
Back to Long Polling
So thinking about Long Polling you can probably immediately see where this might be of benefit to the Command Channel, particularly if you have an application that is infrequently opening connections. Having a traditional beacon means that there will be a delay in opening the connection at the implant side if the proxying application didn’t make the request to open right before the beacon arrived.
For this reason the implant now uses Long Polling for the Command Channel. If there are no connections to Open or Close (or none arrive during the wait) the Server will hold the connection open for 40 seconds delay before returning back to the implant telling it there is no work to do (the Implant will then wait a very small amount of time before initiating the connection again). This Long Poll interval can be tweaked on the Server at runtime by typing:
LPoll=
e.g. LPoll=5
or LPoll=60
Heading back to the implant side, when it receives the Open message it will then attempt to create a TCP Connection to the specified host:port
. Now in the previous version, all of the work that it was to perform was started asynchronously within a Task (one was opened per connection). The code running in this task would asynchronously poll that connection to the target application and then send/receive data to and from the Server via web requests. The Server itself would also have a polling loop checking any data on the SOCKS connection.
This kind of worked in theory, but using some real world applications like RDP (and then trying to debug why connections were really slow) showed up that this was a less than optimal design. There were too many Tasks being created (far too many resources being used), it was next to impossible to get an idea of any connections state when debugging and it had really been coded using a too simplistic notion that traffic would follow a pattern of:
client->server
server->client
client->server
etc.
Changing in this release, instead of both the Server and the Implant having a Task per open connection, there is now only a single loop that polls each of connections regularly (very regularly) to check if the connection is still open and read any data that is available. Moving to a singular loop has shown performance gains rather than losses (so the lesson here is beware of unnecessary parallelism particularly in terms of complexity).
Any data that has been read from a polled connection is written into a queue and then an Event is triggered (specifically a ManualResetEvent notifying another thread that there is data to send back to the Server. This threads sole job is to wait until there is data to send via Web Request back to the Server (it is important to note this is fire and forget, NOT long polled). It’s best to think of this as a standard ProducerConsumer loop. In order to implement it here I am using ConcurrentQueue’s. These are a really cool class helping to avoid locks in concurrent code.
So how does the data come back from Server? The Implant also makes a long polling request per open connection to poll the Servers queue for any data that needs to be sent down the SOCKS connection.
It’s really important to understand how this works as it leads to one of the more frustrating limits I’ve found in .Net (I’ve been coding C# since 2007 and this is the first time I have seen it).
Summing up in terms of Web Requests back to the server we have at least:
- 1 Long Polled Command Channel back to the Server.
- 1 Long Polled Data request per open connection.
- 1 Normal request when the Target application has sent data to the implant.
I’ve got all the code ready, should work really nice but NOPE it’s gone really slow again. What?!
Limiting limits
If you have done any coding using WebClient, C#’s wrapper for making Web Requests, then you might have also come across ServicePoint & ServicePointManager. These classes control the underlying connection, the connection pool and the TLS connection to the target site.
In order to do a System Test of everything I configured the test apps to point to each other (on Windows 10 using FireFox with its proxy configured to use the SOCKS proxy port) – I tried to load the BBC Website and the results were not great. First of all it’s insane how much gets loaded by most modern websites and how many connections to different sites there are.
I’ve probably ignored this in Burp by hiding what is not in scope… Both Test applications have the option of passing –verbose
for logging, yielding a load of information about the connections being opened, data being sent back & forth, and timings. Now i could see a delay in various places but having so many connections made it really difficult, so thinking only having one connection might make life easier, I resorted to Curl.
Sure enough, yes, in Curl I still kept getting an artificial delay, so there was no other option but to go back into Visual Studio and debug it. Did this yield any info? Not really. So, I could see the connection being made, everything was cool until data needed to be sent back and there was always a delay that appeared consistent with the Long Poll, but not always.
Is there a happy ending?
Well, yes. After an inordinate amount of testing, thinking about trying to use Kestrel instead of HTTPListener and much reading of documentation, I came across this utterly insane limit within the ServicePointManager
. Now, each domain that you make a connection to will have a ServicePoint
instance, meaning that there is a maximum of two (yes, 2!) concurrent connections to any site. This value can be modified, but it must be done before any connection is made to the host otherwise a ServicePoint
object will have been created and the limit will be two.
The image above is from:
So, we know that at least two connections are being Long Polled and any time data is read from the connection to the target application a third Web Request will be made. Either the Command Channel or Implants other connection will have to return before this can be sent. What can I say? Massively frustrating and I had never seen this before (or maybe I had in the past, forgotten it and am just getting old, who knows).
Conclusion
So, to wrap what have we learnt here… to be honest it just reinforces lessons that I have learnt continually through my career:
- RTFM – there is incredible value in the MSDN.
- Avoid unnecessary complexity and too much parallelism.
With this release I really feel that SharpSocks is getting a lot closer to the initial goals of being a very easy to use SOCKS proxy with solid performance. The upcoming features that are still in development are:
- Reduce some of the web traffic mainly by multiplexing some messages.
- Stop using XML and use a better message format.
- Tighter integration with PoshC2.
Oh – I almost forgot: SOCKS5 (TCP only for the moment) and thus Burp are now supported. SOCKS away my friends!