Fix Linux Server Issues With These 5 Troubleshooting Steps

If your Linux server isn't performing to its full potential, it's likely there is an underlying issue that needs resolving.

Follow these five simple yet practical steps to troubleshoot a Linux server and reduce downtime to an absolute minimum.

1. Check the Hardware

Let's get down to the absolute basics: check the hardware. This means heading over to the physical rack and checking for loose cables or a power outage.

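Alternatively, check the network link from the command line by typing the following command (replace eth0 with the name of your network interface):

        $ sudo ethtool eth0

If the output includes the line "Link detected: yes", you know your port is talking to the network.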
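
For a scripted check, the same test can be reduced to a single word. The following is a minimal sketch rather than a feature of ethtool itself: link_state is a hypothetical helper name, and it assumes the output contains the usual "Link detected:" line.

```shell
# link_state: read `ethtool <iface>` output on stdin and print the value
# of its "Link detected:" line (normally "yes" or "no").
# Hypothetical helper; prints nothing if the line is absent.
link_state() {
    awk -F': *' '/Link detected/ { print $2 }'
}
```

You would run it as sudo ethtool eth0 | link_state. If eth0 does not exist on your system, ip -br link lists the available interface names.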

To check a server's BIOS/UEFI hardware report, use the following command:

        $ sudo dmidecode --type memory

If the reported memory modules match what is installed, this isn't the problem either. If you suspect memory errors, load the EDAC (Error Detection and Correction) kernel module with the following command:

        $ sudo modprobe edac_core

The command prints nothing when the module loads successfully. Next, read the error counters by typing the following:

        $ sudo grep "[0-9]" /sys/devices/system/edac/mc/mc*/csrow*/ch*_ce_count

This presents you with a list of the memory controller's rows and channels along with the correctable error count for each. When this output is combined with the dmidecode data on the memory channel, part number, and slot, you can track down the faulty memory stick.
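
On a server with many memory channels, most of these counters will be zero. As a rough sketch, the loop below prints only the counters that have recorded errors. report_ce_counts is a hypothetical helper name, and the default path assumes the EDAC sysfs layout under /sys/devices/system/edac/mc.

```shell
# report_ce_counts [DIR]: print every EDAC correctable-error counter under
# DIR whose value is greater than zero. Hypothetical helper; DIR defaults
# to the usual EDAC sysfs location.
report_ce_counts() {
    base=${1:-/sys/devices/system/edac/mc}
    for f in "$base"/mc*/csrow*/ch*_ce_count; do
        [ -r "$f" ] || continue      # glob matched nothing, or file unreadable
        count=$(cat "$f")
        if [ "$count" -gt 0 ]; then
            echo "$f: $count"
        fi
    done
}
```

Calling report_ce_counts with no argument checks the live system. An empty result means no correctable errors have been counted since boot.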

2. Decipher the Exact Problem

Your server has gone down, and there are no two ways about it. Before jumping in with your tools, it is essential to define what the exact problem is. For example, if your users face issues with a server application, you need to make sure the problem is not at the client's side.

Secondly, as a part of the problem hunt, you should try to narrow down the source of the problem. This would mean either the server per se or the server application. For instance, a server program can go haywire while the server functions like a well-oiled machine.

To check if an application is running smoothly, type the following:

        $ sudo ps -ef | grep apache2
        $ sudo netstat -plunt | grep apache2

If the server is not responding, you can turn on the Apache server using:

        $ sudo service apache2 start

In short, figure out the exact problem before jumping the gun. This would help narrow down the list of issues and help you figure out a solution accordingly.
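
netstat belongs to the older net-tools package and is missing from some newer installations, where ss reports the same socket information. The sketch below is one way to turn that output into a yes-or-no answer. It assumes the ss -ltn column layout, with the local address and port in the fourth column, and port_listening is a hypothetical helper name.

```shell
# port_listening PORT: read `ss -ltn` output on stdin and succeed (exit 0)
# if some socket is listening on PORT. Hypothetical helper.
port_listening() {
    awk -v port="$1" '
        NR > 1 {                        # skip the header row
            n = split($4, parts, ":")   # $4 is Local Address:Port
            if (parts[n] == port) found = 1
        }
        END { exit !found }
    '
}
```

For example, ss -ltn | port_listening 80 && echo "port 80 is open" confirms that something is accepting web traffic, which helps separate a dead service from a client-side problem.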

3. Use the top Command

top is one of Linux's most useful troubleshooting commands, as it shows the load average, swap usage, and a list of the processes using the system's resources.

But the first time you use it, it can appear confusing. Here's a quick breakdown of top.

Line 1:

The time
How long the computer has been running
Number of users
Load average (the system load averaged over the last minute, last 5 minutes, and last 15 minutes)

Line 2:

Total number of tasks
Number of running tasks
Number of sleeping tasks
Number of stopped tasks
Number of zombie tasks

Line 3:

CPU usage as a percentage by user processes
CPU usage as a percentage by system (kernel) processes
CPU usage as a percentage by low-priority (niced) processes
Percentage of time the CPU is idle
Percentage of time the CPU waits on I/O
CPU usage as a percentage by hardware interrupts
CPU usage as a percentage by software interrupts
Percentage of CPU time taken by steal time

Line 4:

Total system memory
Free memory
Memory used
Memory used for buffers and cache

Line 5:

Total swap available
Total swap free
Total swap used
Available memory

This is followed by a line for each running process. It includes:

Process ID
User
Priority
Nice level
Virtual memory used by process
Resident memory used by process
Shareable memory
Process status
CPU used by process as a percentage
Memory used by process as a percentage
Total CPU time the process has used
Command

To find out which process is consuming the most memory, sort the process list by pressing M (Shift+M).

To sort by the processes using the most CPU power instead, press P (Shift+P).
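
When you need the same ranking in a script or a support ticket, ps can supply a snapshot instead of an interactive screen. This is a minimal sketch: top_by_column is a hypothetical helper, and it assumes a table with exactly one header row and whitespace-separated columns.

```shell
# top_by_column COL [N]: read a table with one header row on stdin, then
# print the header plus the N rows (default 5) with the largest numeric
# value in column COL. Hypothetical helper.
top_by_column() {
    col=$1
    n=${2:-5}
    IFS= read -r header                     # consume the header row
    printf '%s\n' "$header"
    sort -k "$col","$col"nr | head -n "$n"  # rank the remaining rows
}
```

For example, ps -eo pid,%mem,%cpu,comm | top_by_column 2 5 would list the five processes using the most memory, and changing the column number to 3 ranks them by CPU usage instead.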

To filter on a specific field, press o, which displays the following prompt:

        add filter #1 (ignoring case) as: [!]FLD?VAL

At this prompt, you can filter on a particular process, for example:

        COMMAND=apache

This shows only the processes whose command name contains apache.
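
The load average on top's first line is easier to judge relative to the number of CPU cores. As a common rule of thumb (an assumption here, not something top reports), a value that stays above 1.00 per core suggests processes are waiting for CPU time or disk I/O. The sketch below does the division, and load_per_core is a hypothetical helper name.

```shell
# load_per_core LOAD CORES: print LOAD divided by CORES to two decimals.
# Hypothetical helper; both arguments are plain numbers.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}
```

On a live system, you could call it as load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)", which uses the one-minute load average and the number of available cores.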

4. Track the Disk Space

Even with plenty of storage available, a server can run out of space, leading to a multitude of problems. In such scenarios, use the df command (short for disk free) to pull up a complete summary of available and used disk space.

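You can use it in the following three ways. The -h option prints sizes in human-readable units, -i reports inode usage instead of block usage, and -T adds the filesystem type:

        $ sudo df -h
        $ sudo df -i
        $ sudo df -hT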

Another useful metric is %util, a column in the output of iostat -x (part of the sysstat package) that shows how busy each device is. Values above 60% utilization indicate poor storage performance. Anything close to 100% means the drive is close to saturation.
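
To catch a filling disk before it causes trouble, the df summary can be filtered against a threshold. The following is a minimal sketch: full_filesystems is a hypothetical helper, the limit is whatever you choose, and the parsing assumes the POSIX layout printed by df -P and mount points without spaces.

```shell
# full_filesystems LIMIT: read `df -P` output on stdin and print the mount
# point and usage of every filesystem at or above LIMIT percent.
# Hypothetical helper; assumes mount points contain no spaces.
full_filesystems() {
    awk -v limit="$1" '
        NR > 1 {                      # skip the header row
            use = $5
            sub(/%/, "", use)         # "91%" -> "91"
            if (use + 0 >= limit + 0) print $6, $5
        }
    '
}
```

For example, df -P | full_filesystems 80 lists only the filesystems that are at least 80% full.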

5. Check the Logs for Problems

The logs give you a ton of helpful information. You will find them in /var/log, often in a subdirectory specific to the service. For newcomers, Linux's server logs might be the scariest place on the planet.

That does not have to be the case, mainly because the logs are divided by function. Some capture what happens on a system or program, while others record system and application error messages. Logs are usually enormous files, given the amount of information they store.

Log files can be cryptic, so it pays to learn your way around them.

If you are unsure where to start, use dmesg, which displays the kernel's messages. Piping its output to tail shows the last 10 messages by default.

        $ dmesg | tail

Running the tail command with the -f option on the syslog file keeps watching the file and prints each new event as it is logged.

        $ sudo tail -f /var/log/syslog

This command keeps running and prints new log entries as they arrive, so you can spot problems as they happen. Press Ctrl+C to stop it.
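
Because logs are so large, it often helps to narrow them down to the lines that look like trouble. The following is a rough sketch, not a standard tool: recent_errors is a hypothetical helper, and the keywords it searches for are an assumption that you should adapt to the messages your services actually write.

```shell
# recent_errors [N]: read log lines on stdin and print the last N (default
# 10) that mention "error", "fail" or "critical", ignoring case.
# Hypothetical helper; the keyword list is only a starting point.
recent_errors() {
    grep -i -E 'error|fail|critical' | tail -n "${1:-10}"
}
```

You could run it as sudo dmesg | recent_errors for kernel messages, or as sudo cat /var/log/syslog | recent_errors 20 for the system log.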

Troubleshooting Your Linux Server Effectively

Troubleshooting your Linux server might seem a daunting feat initially, but a few basic checks are enough to get the ball rolling. If these five steps haven't helped you identify and track the problem, it might be worthwhile to get other people involved.

However, most times, one of the above troubleshooting steps should help resolve the issue at hand.