Introducing pgMitm
Most of the time, I blog about system administration and other technical computing topics; until now, I had only published a few articles about code. Yet over the last two or three years I have actually coded a lot (well, a lot compared to my amount of free time). I have published everything on my personal forge (which is powered by cgit).
In this article I would like to introduce one of these pet projects, called "pgMitm", for PostgreSQL Man-In-The-Middle. I started it in 2018, set it aside for a year, and picked it up again at the beginning of this year. I think it is now mature enough (though still with plenty of bugs and rough edges) to be used in the wild.
pgMitm origins
At work, we had to debug some proprietary binaries that connected to PostgreSQL servers. The connections had problems, and as the diagnostics progressed, I tried to find out which requests were being sent to the database servers. The pain with proprietary programs is that you can't read any of the code to understand what's wrong. They are literally black boxes: you only control what goes in and observe what comes out.
I tried to use Wireshark to capture the queries, but the traffic was encrypted and the client offered no option to disable TLS. We asked our database administrator to disable TLS on the server side, but he couldn't do it on the production environments. We were a bit stuck, because the vendor claimed the problem surely came from the database servers (what a cheap excuse!). With limited means to debug, and without a range of servers available with unencrypted traffic, I tried different approaches to improve the situation.
I found postgres-mitm, which implemented a sort of man-in-the-middle proxy for PostgreSQL. It was written in Python3, and I tried to adapt it to our specific situation. But after half a day of investigation, I needed something faster, and C was the natural answer. I hesitated a bit before starting what could easily become a time-consuming challenge but, as we were facing a blocking situation (the clients were barely usable because of this nasty bug), I started to code. The main features were written in less than three days, and by then I was able to plug pgAdmin3 into pgMitm and use it through the proxy.
What is pgMitm?
pgMitm is a Man-In-The-Middle proxy for PostgreSQL. It listens on a port and redirects all traffic towards a real PostgreSQL server. Since all the traffic goes through it, it can analyse the network packets and export parts of them to log files (captured queries, for example).
pgMitm forces a cleartext connection between the client and itself, so you can sniff the traffic with Wireshark and capture every packet for diagnostics, even when you can't configure the client or the server to use unencrypted communications.
How can I use it?
Well, you will have to compile it by hand. You just need gcc, the GNU libc, and the OpenSSL development files (libssl-dev on Debian).
gcc -Wall -O3 -std=gnu11 -o pgMitm pgMitm.c -lssl -lcrypto -lm
Then it is very simple to use:
pgMitm -p 5433 pgserver.example.com
Then you can point your client at localhost (or your machine's IP address) on the designated port (5433 in our example). All the traffic will be forwarded to pgserver.example.com (on the default port, 5432).
Here is the help documentation:
Usage: pgMitm [OPTION…] BACKEND
pgMitm, a PostgreSQL proxy.

  -c, --clear                Erase logfile if exists on startup.
  -l, --logfile=LOGFILE      Path to the log file.
  -p, --port=PORT            Local port to bind.
  -v, --verbose              Verbose/Debug mode on standard output.
  -?, --help                 Give this help list
      --usage                Give a short usage message
  -V, --version              Print program version

Mandatory or optional arguments to long options are also mandatory or
optional for any corresponding short options.

BACKEND is the name or IP address of a real PostgreSQL server. You can
also provide a port for this server by appending it after a colon,
like: pgserver.example.com:5433
If you are using the default PostgreSQL port (5432), it can be omitted.
How are things handled internally?
pgMitm tries to disable TLS on the client side. libpq (the PostgreSQL client library) has an option to prefer TLS connections and fall back to plain connections if TLS is unavailable (you can also restrict connections to TLS only). Most of the time, clients are flexible: they prefer TLS but are not restricted to it. Whenever a client connects to pgMitm, it first probes the server to find out whether it can handle TLS. pgMitm always answers "No" to this probe, so the client falls back to a cleartext connection.
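Concretely, a libpq client begins by sending an 8-byte SSLRequest probe (a 4-byte length of 8, then the magic code 80877103), and the server answers with a single 'S' or 'N' byte. Here is a minimal sketch of how a proxy can detect that probe and refuse TLS; the function name is mine, not pgMitm's:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Hypothetical helper: returns 1 if buf holds a complete PostgreSQL
 * SSLRequest (8-byte message: length 8, then code 80877103), else 0. */
static int is_ssl_request(const unsigned char *buf, size_t len)
{
    uint32_t msglen, code;

    if (len < 8)
        return 0;
    memcpy(&msglen, buf, 4);
    memcpy(&code, buf + 4, 4);
    return ntohl(msglen) == 8 && ntohl(code) == 80877103;
}

/* In the proxy's accept loop, one would then answer a single byte:
 *     if (is_ssl_request(buf, n))
 *         write(client_fd, "N", 1);   // "no TLS here, use cleartext"
 */
```

Since most clients only *prefer* TLS, the 'N' answer is enough to make them restart the handshake in the clear.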
Once the connection is established between the client and pgMitm, the client starts to authenticate with its credentials. pgMitm intercepts those packets and tries to connect to the real server (the backend) with the credentials sent by the client.
Then, a "tunnel" is opened like this:
/--------\       |-------------|       {----------}
|\-------/|----->|             |------>{          }
| Backend |      |   pgMitm    |       {  Client  }
|         |<-----|             |<------{          }
\________/       |-------------|       {----------}
It is not a real tunnel (code-wise): internal routines are triggered whenever traffic comes in, either from the client or from the backend. Most of the time, pgMitm sends the packets straight to the other side, but it sometimes modifies the traffic (TLS negotiation, authentication, cancel requests, etc.).
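The core of such a non-tunnel can be pictured as a loop around select(): wait until either side has data, then copy it to the other side (possibly inspecting or rewriting it first). A simplified sketch, not the actual pgMitm code:

```c
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* One iteration of a relay loop: wait until either peer has data, then
 * copy it to the other side.  Returns 0 on timeout, -1 when a side has
 * closed, otherwise the number of bytes relayed. */
static ssize_t relay_step(int client_fd, int backend_fd)
{
    fd_set rfds;
    char buf[4096];
    ssize_t n, total = 0;
    struct timeval tv = { 1, 0 };   /* poll at most 1 second */
    int maxfd = client_fd > backend_fd ? client_fd : backend_fd;

    FD_ZERO(&rfds);
    FD_SET(client_fd, &rfds);
    FD_SET(backend_fd, &rfds);
    if (select(maxfd + 1, &rfds, NULL, NULL, &tv) <= 0)
        return 0;
    if (FD_ISSET(client_fd, &rfds)) {
        if ((n = read(client_fd, buf, sizeof buf)) <= 0)
            return -1;
        /* a proxy could inspect/rewrite buf here before forwarding */
        if (write(backend_fd, buf, n) != n)
            return -1;
        total += n;
    }
    if (FD_ISSET(backend_fd, &rfds)) {
        if ((n = read(backend_fd, buf, sizeof buf)) <= 0)
            return -1;
        if (write(client_fd, buf, n) != n)
            return -1;
        total += n;
    }
    return total;
}
```

A caller would run relay_step in a loop until it returns -1, which is where the "internal routines" mentioned above get a chance to look at each packet.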
FAQ
As this piece of code will inevitably rot from lack of interest or use, I think I need to explain some details. Let's have a short FAQ!
Why code it in C rather than in Python3?
It was a real challenge and I doubted I would be able to pull it off. But after months of code review, C learning and adjustments, pgMitm is ready to be published.
First of all, I wanted something faster than Python3. Python3 is good for prototyping but slow to execute (well, most of the time). Furthermore, postgres-mitm wasn't able to make our buggy clients work at all (they either crashed or hung on a query that never finished).
Second, I also wanted to take the time to improve my C skills. C is one of the first programming languages I learnt as a child, just after BASIC and Pascal. From the beginning I loved it: it seemed to be a really smart language, easy to learn, without many grammar tricks. The pointer system seemed so clever (even if I remember it took me probably a week of reading and testing to understand the concept of linked lists).
By coding pgMitm in C I achieved one of my childhood dreams: writing a relatively complex program that I could actually use at work.
Why only one file for pgMitm?
I wanted a program with a very limited set of features and dependencies. No Makefile, no header files, no tests, only one C file, and no libraries other than the GNU libc and OpenSSL (for the TLS traffic).
What is supported?
- PostgreSQL protocol v3 (well, v3 was launched 20 years ago).
- PostgreSQL backends from 9.6 to 10 (tested).
- Some parts of the advanced PostgreSQL protocol v3.
- Backends using TLS connections.
- Backends using clear connections.
- IPv4 and IPv6 clients/backends.
What is the code architecture?
The code is mostly separated into three parts:
- The Man-in-The-Middle part, which deals with traffic, connections, authentication, and low-level socket/network/TLS matters (the most challenging one).
- The reporting part, which captures queries and traffic from the client and exports them to text log files. It was a way for me to build something with a lot of C string handling. It was very complicated compared to what you can do in Python3.
- The "server" code: pgMitm is a listening process (a daemon) which forks on each new connection.
There is also the configuration/arguments part.
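The daemon part boils down to the classic fork-per-connection pattern. A minimal sketch (serve_one and handle are my names, not pgMitm's); a real daemon would also reap its children, for example with signal(SIGCHLD, SIG_IGN):

```c
#include <sys/socket.h>
#include <unistd.h>

/* Accept one client and fork: the child owns that connection, the
 * parent immediately goes back to accepting.  handle() stands for the
 * per-connection proxy logic. */
static void serve_one(int listen_fd, void (*handle)(int client_fd))
{
    int client = accept(listen_fd, NULL, NULL);

    if (client < 0)
        return;
    if (fork() == 0) {           /* child: own this connection */
        close(listen_fd);
        handle(client);
        close(client);
        _exit(0);
    }
    close(client);               /* parent keeps only the listener */
}
```

Wrapping serve_one in a for(;;) loop gives the whole daemon: one process per client, each one free to block on its own traffic.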
Why write code to export things to text files?
After a few days, I was able to capture the TLS traffic with Wireshark, and I considered stopping there. But after taking the time to investigate the packet capture files, it became clear that I needed to report queries or SQL statements on screen or in log files, because the faulty clients transferred about 20MB per connection, with a lot of different queries. And analyzing 20MB of data in Wireshark is very time-consuming.
So, I started to write to log files whenever a specific query appeared in the flow. It helped me really understand how the PostgreSQL protocol worked, what the different commands were, and how the information was sequenced. In the end, I wrote dedicated functions to capture the traffic between backend and client and print it in an intelligible, human-readable form. It only works on the client side, because I suspected my problem came from the client, not from the database server.
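For reference, the simple-query message the client sends is a 'Q' byte, a 4-byte big-endian length (which counts itself and the SQL text, but not the type byte), then the NUL-terminated SQL string. A sketch of how such a message can be picked out of the client stream (extract_query is a name I made up for this example):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* If buf starts with a complete simple-query message
 * ('Q', int32 length, NUL-terminated SQL text), return a pointer to
 * the SQL text inside buf; otherwise return NULL. */
static const char *extract_query(const unsigned char *buf, size_t len)
{
    uint32_t msglen;

    if (len < 6 || buf[0] != 'Q')
        return NULL;
    memcpy(&msglen, buf + 1, 4);
    msglen = ntohl(msglen);             /* excludes the type byte */
    if (msglen < 5 || (size_t)msglen + 1 > len)
        return NULL;
    if (buf[msglen] != '\0')            /* SQL text must end with NUL */
        return NULL;
    return (const char *)buf + 5;       /* SQL starts after the header */
}
```

A logger then only has to fprintf the returned string to the log file before forwarding the untouched packet to the backend.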
Can pgMitm be used in production environments?
Of course not! It is just a diagnostics toy. I was able to test it during intense work sessions with a GIS client, routing about a gigabyte of data without a crash or any noticeable memory leak.
But you cannot expect to place it in front of production servers with lots of clients and traffic. Furthermore, it has not been audited for memory corruption. From time to time, I noticed crashes; most of the time, they came from unchecked conditions or from the fact that pgMitm does not support 100% of the PostgreSQL protocol.
On the security side, I used Valgrind to check for memory leaks and I fixed some holes in the code (but probably not all of them, as it was the first time I used Valgrind).
But you can try it, as long as you understand those risks.
Did you find a solution to your original problem?
Of course I did! It took a lot of time, and I had to code adjustments in order to:
- handle multiple connections toward the same server from the same client;
- handle the different authentication mechanisms;
- take care of query canceling;
- and, above all, take care of our buggy proprietary binaries;
- add features to export queries to text files.
After weeks of work, I finally demonstrated that the problem came from the client and not from the server. It was a race condition between sending a cancel request and the client asking for the results that had just been canceled.
By the way, what is query canceling?
It is a mechanism to instruct the PostgreSQL server to stop computing a (running) query. On the network side, the mechanism is a bit tricky to understand. While the server is computing the query, it cannot listen on the client connection that submitted it (it cannot watch for traffic and compute an answer at the same time). To instruct the server to stop working, you have to open a new connection and send a "magic" word inside a cancel request packet.
The "magic" word is sent by the server at the start of each connection, one per socket opened by a client. It is composed of a key and a sort of salt. If you know those two values, you can cancel any request from any client… but it is very hard to capture both, unless you are in a man-in-the-middle position.
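In protocol terms, those two values are the backend process ID and a secret key, delivered in the BackendKeyData message at connection start; the cancel packet itself is a fixed 16 bytes. A sketch of building one (the helper name is mine, not pgMitm's):

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Fill a 16-byte PostgreSQL CancelRequest packet: length 16, magic
 * code 80877102, then the backend PID and secret key captured from
 * the server's BackendKeyData message. */
static void build_cancel_request(unsigned char out[16],
                                 uint32_t backend_pid, uint32_t secret_key)
{
    uint32_t v;

    v = htonl(16);          memcpy(out,      &v, 4);   /* packet length */
    v = htonl(80877102);    memcpy(out + 4,  &v, 4);   /* cancel code   */
    v = htonl(backend_pid); memcpy(out + 8,  &v, 4);
    v = htonl(secret_key);  memcpy(out + 12, &v, 4);
}
```

A man-in-the-middle sees the BackendKeyData of every session it proxies, which is exactly why it can forge or forward these packets at will.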
I had to implement this in pgMitm because, before that, I was unable to reproduce the bug with our faulty clients: since pgMitm did not yet route the cancel request packets, the clients weren't buggy anymore!
After coding a "real" program, do you still enjoy C?
Well, of course! But some of the magic and the fame of C has faded a bit for me. The language itself is cool, but the libc (standard library) is a bit tricky to use. First of all, it is really huge: you can do nearly anything with it, but you have to learn a lot before being really productive.
Then, the standard library keeps old functions around for compatibility reasons, and that is sometimes misleading. For example, I started with gethostbyname to get the IP address of the backend. Then, after a while, I discovered getaddrinfo, which was much more convenient to use.
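For illustration, here is the usual getaddrinfo pattern for reaching a backend: it transparently handles both IPv4 and IPv6, unlike the legacy gethostbyname. This is a generic sketch (connect_backend is my name), not pgMitm's actual code:

```c
#include <string.h>
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Resolve host:port with getaddrinfo() and return a connected TCP
 * socket, or -1 on failure.  Tries every returned address in turn. */
static int connect_backend(const char *host, const char *port)
{
    struct addrinfo hints, *res, *p;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;        /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;
    for (p = res; p != NULL; p = p->ai_next) {
        fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, p->ai_addr, p->ai_addrlen) == 0)
            break;                      /* connected */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```

Compared with gethostbyname, there is no static buffer to worry about, and the same loop covers both address families, which is what makes pgMitm's IPv4/IPv6 support straightforward.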
I also had rough moments with strings and pointer arithmetic. Gosh, it is so strict! You have to calculate everything rigorously, every increment; otherwise, your program, or some part of it, crashes brutally. With time, I learned to check everything, but even now I don't think I am so comfortable with strings in C. Compared to Python3, it is much more complicated, and the libc does not really help (should I use strcpy, strcat, strncpy, strwhatever?).
But if you stick to some rules, stay patient and stubborn, and are ready to fight for your code, you'll be as happy as I was the first time pgMitm ran for 15 minutes without a single crash!
Conclusions
I have learnt so much about the C language and the standard library, and even about OpenSSL. I explored the following subjects:
- Process forking.
- Argument handling.
- DNS queries from C.
- Sockets.
- OpenSSL TLS configuration and socket system.
I am impressed that I managed to deal with sockets directly with the GNU libc. Discovering this part of the system was a challenge. Then came the TLS part and getting to grips with OpenSSL. It is not that complicated, but it takes far more than a single call to open a connection to a TLS server: you have to set up quite a few things before you can send and receive traffic. Luckily, the process is very similar to the classic libc socket workflow.
I have learnt a lot about the PostgreSQL protocol internals. pgMitm doesn't use libpq; it handles all the traffic by itself. So I had to read parts of the PostgreSQL source code and other documentation on the protocol. I was quite lucky that PostgreSQL is written in C: as a consequence, the protocol maps relatively well onto C strings, which makes handling it simpler than more rigid binary formats would be.
For a non-professional C coder, it was really difficult, and I am really proud of what I have achieved. I am also aware that pgMitm is far from perfect. After all, I am still a C beginner, and I believe I always will be, whatever effort I put into this very interesting language and environment…