ECONNRESET Errors - Potentially New Info & Questions

I have a Postgres database accessed by my app via Apollo.

I’ve seen the ECONNRESET error in my server logs for years in development, and now that I’m approaching launch I really needed to track it down and make it go away.

It’s not hard to find posts here and on other forums from people trying to find out what this error is. The answers are often vague, along the lines of “there’s something wrong with your network”.

My Postgres db is hosted on AWS RDS, and their support response was so helpful and informative that I will post it here in full.

Dear Customer,

Warm Greetings from AWS Premium Support. Hope you are safe and doing well!!

This is Vidit here from the AWS premium support team and I will be assisting you with your case today.

I understand that the connection to your RDS PostgreSQL instance was dropped unexpectedly and you want to know why the issue occurred. Please correct my understanding if required.

So, to assist you with your query, I looked into the logs you provided and also the database metrics and network connectivity to your instance, and would like to mention that everything looked fine and the issue was not due to a limit on max connections.

To further analyze the root cause, I reached out to the internal team and found that the issue was: “could not receive data from client: Connection reset by peer”.

There is no issue observed from RDS infrastructure end in the given timestamp which can be pointed out as the root cause of the issue. Also talking about the error:

i.e., “could not receive data from client: Connection reset by peer” → This is usually not a database issue, rather because your client application dropped the connection to the database and postgres logged it.

This might also happen when a client connection is not terminated gracefully.

I request you to kindly look for connections which are ‘idle in transaction’ and, if you see that there are several such connections, I suggest you tweak your timeout parameters on the client side to make sure the connections are gracefully closed if they’re idle for a long time.

You can check for ‘idle in transaction’ connections via the below query:

select * from pg_stat_activity where (state = 'idle in transaction') and xact_start is not null;

Below is a link which clearly states that in order to troubleshoot the issue, the settings on the application side should be modified and, if possible, the closing of the connection checked:

[+] postgresql - Postgres could not receive data from client: Connection timed out - Database Administrators Stack Exchange

The same is discussed in the below third-party link for community PostgreSQL:

Why am I seeing “could not receive data from client: Connection reset by peer” in my logs? - Heroku Help

You can also make use of the ‘idle_in_transaction_session_timeout’ parameter in RDS PostgreSQL, which is set to 1 day by default. This parameter helps to terminate any session with an open transaction that has been idle for longer than the specified duration in milliseconds.

More suggestions on the message - In order to resolve the “connection reset by peer” issue, you can look into tuning your application-side connection timeout settings and/or set the below parameters to keep TCP connections alive at the RDS end.

tcp_keepalives_idle (number of seconds of inactivity after which TCP should send a keepalive message to the client.)

tcp_keepalives_interval (number of seconds after which a TCP keepalive message that is not acknowledged by the client should be retransmitted.)

tcp_keepalives_count (number of TCP keepalives that can be lost before the server’s connection to the client is considered dead.)

Once you’ve set up the parameters, you can run the below query using the master user to verify the changes are applied.

select name, setting from pg_settings where name like '%tcp%';

I would also recommend enabling the log_connections and log_disconnections parameters, to track successful and failed connections.

I hope that this information is helpful, please feel free to reach back to me if you require further clarification or request on this case and I will be more than happy to assist you further.

Have a Great Day Ahead !

Researching this, I found the following:

As stated in the official documentation, Sequelize sets up a connection pool on initialization that can be configured through the constructor’s options parameter.

By default, this pool is limited to 5 connections. This means that your application will use no more than five connections at a time, no matter how many requests it gets. As explained before, if six users concurrently hit any of your APIs that need to connect to the database, one of them will be queued and have to wait.
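
To see what that queueing looks like in practice, here is a minimal sketch of my own (not from the article), using a deliberately tiny pool against a throwaway database; the connection string is just a placeholder:

const { Sequelize } = require("sequelize");

// Placeholder connection string; pool.max of 1 just to make the queueing obvious.
const sequelize = new Sequelize("postgres://user:pass@localhost:5432/testdb", {
    dialect: "postgres",
    logging: false,
    pool: { max: 1, min: 0, acquire: 30000, idle: 10000 }
});

async function demo() {
    // Both queries start "concurrently", but with only one connection in the
    // pool the second pg_sleep has to wait for the first, so this takes
    // roughly 4 seconds instead of 2.
    console.time("two queries, pool.max = 1");
    await Promise.all([
        sequelize.query("select pg_sleep(2)"),
        sequelize.query("select pg_sleep(2)")
    ]);
    console.timeEnd("two queries, pool.max = 1");
    await sequelize.close();
}

demo();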

Observations

  • The connection pool settings are – at least in Sequelize – provided by the app and so are out of the control of auto-scaling.
  • So having auto-scaling on your database server and/or app host can’t fix this.
  • It appears that the max number of connections has to be tuned over time by the web app developer, by watching the connections the server actually needs and continuously updating the pool’s max connections setting to keep up (see the sketch after this list).
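
Here is a rough sketch of what I mean by watching the connections (my own idea, assuming a sequelize instance already exists); it just counts rows in pg_stat_activity for the current database:

// Hypothetical helper: count how many connections are currently open against
// this database, so pool.max can be tuned against real usage over time.
async function countConnections(sequelize) {
    const [rows] = await sequelize.query(
        "select count(*)::int as connections from pg_stat_activity where datname = current_database()"
    );
    console.log("connections in use:", rows[0].connections);
    return rows[0].connections;
}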

Questions

  • Is there a Sequelize competitor that automatically tunes and updates the max number of connections?
  • Is there a big memory hit if I set the Sequelize max number of connections to something super-high, like 5000?
  • Do other kinds of databases, e.g. Mongo, have ways of avoiding the need for a max number of connections?

Hmmmm… I updated from the default of:

pool: {
        max: 5,
        min: 0,
        acquire: 30000,
        idle: 10000
    }

…to:

pool: {
        max: 50,
        min: 0,
        acquire: 30000,
        idle: 5000
    }

…and I still got an ECONNRESET error, on my production server.

Maybe I have two situations to deal with – making sure I have:

  • Enough max connections
  • A long enough idle timeout, which is how long a connection can sit idle in the pool before Sequelize releases it.

But I don’t think I have any functions that keep a connection open and idle for more than 5 seconds.

Well, as a test, I just updated to:

  "pool": {
    "max": 50,
    "min": 0,
    "acquire": 30000,
    "idle": 30000
  },

This is a major hassle. :slight_smile: If anyone has any thoughts/speculations/suggestions, please let me know.

I’m still studying and learning about this.

My understanding of pool.max was not complete. The developer quoted above said:

This means that your application will use no more than five connections at a time, no matter how many requests it gets. As explained before, if six users concurrently hit any of your APIs that need to connect to the database, one of them will be queued and have to wait.

…but per the Postgres wiki, the optimal pool.max is also related to the number of processor cores you have:

How to Find the Optimal Database Connection Pool Size

A formula which has held up pretty well across a lot of benchmarks for years is that for optimal throughput the number of active connections should be somewhere near ((core_count * 2) + effective_spindle_count)
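
Just to make the arithmetic concrete (a hypothetical instance size, not necessarily mine): a 2-vCPU instance with a single EBS volume (effective_spindle_count of 1) works out to (2 * 2) + 1 = 5 active connections, nowhere near the 50 I put in pool.max.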

This explains why my first plan of just putting a big number into pool.max didn’t work.

I heard back again from AWS tech support:

So, to help you with your query, I would like to mention that increasing the max_connections for your specific use case would not be much helpful, as the issue that I previously troubleshot was not due to the connection limit being reached. It was due to your client application dropping the connection, which can happen because the connection was idle for a long time and got timed out.

So I need to focus more on pool.idle. I haven’t seen an ECONNRESET error since I increased that from 5000 to 30000, so that’s encouraging, but not dispositive since it can be a very intermittent error.

Per AWS tech support, I’ve also updated some settings on the AWS RDS side:

tcp_keepalives_idle (number of seconds of inactivity after which TCP should send a keepalive message to the client.)

tcp_keepalives_interval (number of seconds after which a TCP keepalive message that is not acknowledged by the client should be retransmitted.)

I’ve set tcp_keepalives_idle to 10 and tcp_keepalives_interval to 5.
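
For completeness, TCP keepalives can also be requested from the client side. Below is a minimal sketch of how that might look in my Sequelize setup (host and credentials are placeholders); dialectOptions is passed through to the pg driver, which accepts keepAlive and keepAliveInitialDelayMillis:

const { Sequelize } = require("sequelize");

// Placeholder credentials/host; the pool numbers match what I'm currently testing.
const sequelize = new Sequelize("mydb", "myuser", "mypassword", {
    host: "mydb.xxxxxxxx.us-east-1.rds.amazonaws.com",
    dialect: "postgres",
    pool: { max: 50, min: 0, acquire: 30000, idle: 30000 },
    dialectOptions: {
        // Passed straight through to node-postgres:
        keepAlive: true,                   // enable TCP keepalive probes on the socket
        keepAliveInitialDelayMillis: 10000 // wait 10s of inactivity before the first probe
    }
});

I have no idea yet whether this helps on top of the RDS-side settings, but it seemed worth trying.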

I’ll update this thread when I know more.