A Virus in the Lab
This story goes back to the first half of the 2000s. At the time I worked at a company that built hardware for telephony.
It was a lot of fun. I had access to some nice NEC machines and wrote some really interesting code.
That was when I wrote a VoIP recorder that captured SIP packets directly off the network in fewer than 100 lines of C, but that’s a story for another day.
One day a customer called complaining that their system was very slow. It was a big customer, a pharmaceutical lab here in São Paulo.
This customer’s system worked roughly like this: it received a lot of documents by FAX, such as requests, exam results, and patient records.
The system itself already did a rough classification and inserted some of the data into the database, which back then was MySQL (not my choice). The documents themselves were stored on drives shared over the network and could be accessed from the agents’ machines.
That way the lab’s system could locate documents and serve doctors and patients.
The people who knew the system best talked it over and concluded the slowness had to be the database, so the solution was to buy a new machine and optimize the database.
Within a few days the new machine was on my desk: a beautiful Dell just 1U tall. I don’t remember the specs, but it was a very expensive machine for the time.
I spent a couple of days happily configuring and tuning the machine for maximum performance. Nothing was left to chance: the database caches were sized exactly for the hardware, the file system was optimized, no processes ran beyond the absolutely necessary ones, and so on.
The machine left my desk and I was sure I had done the best job possible. From my tests, there was no way we’d run into database performance issues again any time soon.
A few days later the customer complained again. The system was still slow, so we talked and got them to let us send someone over to measure each part of the system and figure out what was going on.
Of course that someone was me.
As soon as I arrived, I started watching the network, and traffic was quite high even outside peak hours. Judging by the protocol, the load was coming from Windows file and directory sharing.
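I no longer have whatever I used that day, but the idea is easy to sketch in Python: add up bytes per destination port and see which service dominates. Everything here is an assumption made for illustration. The input is a list of hypothetical (destination port, byte count) pairs exported from any capture tool, and the numbers in the usage example are made up. The port numbers themselves are the standard ones: 137 to 139 for NetBIOS, 445 for SMB over TCP, and 3306 for MySQL.

```python
from collections import Counter

# Standard ports for Windows file sharing: NetBIOS (137-139) and SMB over TCP (445).
FILE_SHARING_PORTS = {137, 138, 139, 445}
MYSQL_PORT = 3306


def bytes_by_service(flows):
    """Sum bytes per coarse service label.

    `flows` is an iterable of (dst_port, byte_count) pairs. This input
    format is hypothetical, not the output of any specific tool.
    """
    totals = Counter()
    for port, size in flows:
        if port in FILE_SHARING_PORTS:
            label = "windows-file-sharing"
        elif port == MYSQL_PORT:
            label = "mysql"
        else:
            label = f"port-{port}"
        totals[label] += size
    return totals


if __name__ == "__main__":
    # Made-up sample, only to show the shape of the result.
    sample = [(445, 1500), (3306, 400), (139, 900), (445, 1500)]
    for service, total in bytes_by_service(sample).most_common():
        print(service, total)
```

If file sharing dwarfs the database traffic in a tally like this, the database is probably not where the time is going.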
So I did the most basic thing: I took a look at the files themselves. The FAX system generated thousands of files a day, organized into a directory structure by date and time, which made it easy to know what had arrived at any given moment. The file names followed a standard pattern too.
And in that ocean of files, one caught my eye: its name broke the pattern, and it was a Word doc rather than an image file generated by the FAX system.
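I found that file by eye, but with thousands of files a day the same check is easy to script. Here is a minimal sketch in Python. The naming convention and the image extensions are hypothetical, since the real pattern is long gone from my memory. Adjust `EXPECTED_NAME` and `EXPECTED_EXTENSIONS` to whatever your system actually produces.

```python
import re
from pathlib import Path

# Hypothetical conventions, invented for this example.
EXPECTED_EXTENSIONS = {".tif", ".tiff"}
EXPECTED_NAME = re.compile(r"^fax_\d{8}_\d{6}_\d+$")  # e.g. fax_20040312_101530_0001


def find_odd_files(root):
    """Return files under `root` whose name or extension breaks the pattern."""
    odd = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        wrong_type = path.suffix.lower() not in EXPECTED_EXTENSIONS
        wrong_name = EXPECTED_NAME.match(path.stem) is None
        if wrong_type or wrong_name:
            odd.append(path)
    return sorted(odd)


if __name__ == "__main__":
    import sys

    # Usage: python find_odd_files.py /path/to/share
    for path in find_odd_files(sys.argv[1]):
        print(path)
```

Anything this prints deserves a look. On a share that is supposed to contain only machine-generated files, the list should be empty.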
All I had to do was open the file to see it was one of those VBA macro viruses that infected Word documents, very common at the time.
What this virus did was repeatedly scan every shared directory trying to find the Windows recycle bin to empty it. The source code even had a comment saying not to worry, that the virus was harmless (as if that were possible).
With thousands of files and directories being scanned by the virus, the network slowed to a crawl.
So I gathered the evidence and wrote up a nice report, and my boss was pleased that on top of solving the customer’s problem he could also bill the hours for sending me over. After the system was cleaned up everything went back to normal, now with a super-optimized database.
In case you’re curious to see what this kind of virus looked like, just do a quick search on GitHub and you’ll find plenty of repositories with the source code of many viruses.
This experience made it clear that before reaching for complex solutions or unnecessary spending, it’s essential to carefully measure and analyze the system.
Often, what looks like a failure in a specific component, like the database, can actually be a symptom of another problem, in this case a virus overloading the network.
This situation reinforces the importance of keeping systems simple, because that makes it far easier to analyze what’s actually going on.