Java service reports error? Linux to the rescue! 🐧

Andreas Loizou
4 min read · Sep 10, 2022

👋 Hello reader,

I’m Andreas, a Software Developer and Software Development Trainer. Today, I was troubleshooting and eventually fixed a nice little bug. I was then writing about it to my team. While writing, I immediately thought, “Why not make it public?” So, here we are. 💪

A bit of context first. My team at QBeat Technologies delivers high-quality bespoke software to clients. Apart from delivering the software, we also invest a significant amount of time in monitoring those applications to ensure they are healthy and running as they should.

We never reinvent the wheel, so if an out-of-the-box tool covers our requirements, we use it. For monitoring, ELK stack, I am looking at you 👀.

However, you might sometimes need more advanced, project-specific features. Then you have to DIY 🔨. For example, speaking of monitoring and alerting, before sending an alert, you might have to aggregate data from several sources to get some invaluable context (e.g. several applications' log files, the DB, or Spring Boot Actuator).

Mandatory Breaking Bad meme, with the guy who did it all by himself.

A recent monitoring tool we had to implement to provide us with advanced alerting is a Spring application that spins up Quartz jobs to monitor whatever it has to check. The application was deployed and running correctly! Yay ✨

Aaaand, it was working as expected until it didn’t 🦄.

We received the following error:

java.io.FileNotFoundException: foo.txt (Too many open files)

Too many open files? Really?

I can’t believe Google already had a meme for that

We looked at our Java/Spring code, and sure enough, as expected, all of our files are always opened using try-with-resources. Our jobs also terminate gracefully. Hmmm, so, what resource do we open but not close correctly?

We added an additional log entry to print the number of the files our process has open, along with how many files it’s allowed to have:
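A minimal sketch of what such a log entry could look like, assuming a HotSpot/OpenJDK JVM on a Unix-like system, where the platform bean implements com.sun.management.UnixOperatingSystemMXBean. The class and method names are made up for illustration, and a real service would send the string to its logger rather than to standard output:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

// Hypothetical helper; the name is an assumption, not taken from our code base.
class FileDescriptorUsage {

    // Returns a log-friendly summary of open vs. allowed file descriptors.
    static String describe() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // Pattern matching for instanceof binds the narrowed bean to unixOs.
        if (os instanceof UnixOperatingSystemMXBean unixOs) {
            return "Open file descriptors: %d / %d".formatted(
                    unixOs.getOpenFileDescriptorCount(),
                    unixOs.getMaxFileDescriptorCount());
        }
        // Windows and some other JVMs do not expose these counters.
        return "File descriptor counts are not available on this platform";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Calling describe() at the end of every job makes a steadily climbing first number easy to spot in the logs.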

📒 Note: In case you want to run this code, it targets the latest Java LTS at the time of writing (Java 17). It uses pattern matching for instanceof, previewed in Java 14 and finalized in Java 16, and String.formatted(...), introduced in Java 15.

As expected, we verified that the count increased with every job executed.

The mystery is still there, though! What is the resource our application leaks?

To find that out, we must remember that on Linux everything is a file. That covers not only regular files but also directories, devices, and network sockets. You can have a look here for more information.

Troubleshooting:

  1. We first found the process ID of the monitoring service using the command ps -aux | grep our_application_name. ps gives us the running processes, and grep searches through them.
  2. Then, we used the command lsof (more here) to see the open files of our monitoring service, e.g. lsof -p 20215, where 20215 is the process ID.

In the results, we noticed the same entry appearing multiple times, e.g. fsociety/easter_egg_application/logs/. That’s a directory! We reran the command later and saw even more of these entries. Sure enough, this directory is the leaking resource.
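If lsof is not available on the box, a rough equivalent can be sketched from inside the JVM. This assumes Linux, where /proc/self/fd holds one symbolic link per open descriptor of the current process. The class name is hypothetical, and this is not what we ran at the time, just another way to see the same picture:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical helper that groups the process's open descriptors by target.
class OpenFileTargets {

    // Counts how many descriptors of the current process point at each target.
    static Map<String, Integer> countByTarget() {
        Map<String, Integer> counts = new TreeMap<>();
        Path fdDir = Path.of("/proc/self/fd"); // Linux-only location
        if (!Files.isDirectory(fdDir)) {
            return counts; // not Linux, nothing to report
        }
        // Files.list holds a directory handle itself, so close it promptly.
        try (Stream<Path> fds = Files.list(fdDir)) {
            fds.forEach(fd -> {
                try {
                    String target = Files.readSymbolicLink(fd).toString();
                    counts.merge(target, 1, Integer::sum);
                } catch (IOException e) {
                    // The descriptor was closed while we were listing; skip it.
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Print only targets that are open more than once.
        countByTarget().forEach((target, n) -> {
            if (n > 1) {
                System.out.println(n + "  " + target);
            }
        });
    }
}
```

A directory that shows up with a growing count between two runs is the same signal we got from repeated lsof calls.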

Thus, we searched for every reference to the application logs directory: first the property file where it is defined, then the places where it is used.

In our Java code, it is used here: Files.list(applicationLogsDirectory), where we just list the files in the directory.

According to the corresponding Javadocs:

The returned stream encapsulates a DirectoryStream. If timely disposal of file system resources is required, the try-with-resources construct should be used to ensure that the stream's close method is invoked after the stream operations are completed.

So, we just missed that :)

This resource has to be closed as well. As soon as we fixed that, there was no more leakage :) High Five, brothers and sisters! Problem solved.
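The fix can be sketched as follows. Only the Files.list(applicationLogsDirectory) call comes from our code; the surrounding class, the method name, and the sorting are assumptions made to keep the example self-contained:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical wrapper around the directory listing.
class LogDirectoryLister {

    // Leaky shape: the stream (and its directory handle) is never closed.
    //     return Files.list(applicationLogsDirectory).collect(Collectors.toList());

    // Fixed shape: try-with-resources closes the underlying DirectoryStream.
    static List<Path> listLogs(Path applicationLogsDirectory) {
        try (Stream<Path> entries = Files.list(applicationLogsDirectory)) {
            // Collect inside the try block, while the stream is still open.
            return entries.sorted().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        listLogs(dir).forEach(System.out::println);
    }
}
```

The same rule applies to Files.walk, Files.find, and Files.lines, whose Javadocs carry a similar note about closing the returned stream.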

party time

PS: You’ve reached this far! Well, that’s amazing! Let’s connect! And if you are looking for a nice company to join, we are looking for great developers as well! Just ping me on LinkedIn.

Andreas Loizou

Software Engineering Trainer | Director of Engineering @ https://www.qbeat.io | @UniofOxford alum. No limits. Dares the Unknown