This blog is for sharing our knowledge and expertise on Linux System Administration and VMware Administration.

Thursday, November 16, 2017

How to add a temporary & permanent route on a Linux box?


Follow the steps below to add a route either temporarily or permanently.

Temporary Route

We just need to run the ip route add command on the command line, like below:

[root@testserver ~]# ip route add 10.217.156.0/25 via 10.194.32.1 dev eth0

[root@testserver ~]# route -n | grep -i 10.217.156.0

10.217.156.0    10.194.32.1     255.255.255.128 UG    0      0        0 eth0

The route entry above will disappear after the Linux box is rebooted.

Permanent Route

We need to create a file named route-eth0 (since the route points out the eth0 device) under /etc/sysconfig/network-scripts/ and add the entry below.

[root@testserver network-scripts]# cat route-eth0
10.217.156.0/25 via 10.194.32.1 dev eth0

Save & exit.

The route will now persist across every reboot of the Linux box.


Note: A permanent route needs a network interface reload to take effect immediately.
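For example, a minimal sketch on a RHEL-style system where the route file targets eth0 (bouncing the interface briefly interrupts traffic on it):

[root@testserver ~]# ifdown eth0 && ifup eth0
[root@testserver ~]# route -n | grep -i 10.217.156.0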

Hope it helps.




Tuesday, November 14, 2017

Server hang at GRUB during boot - SOLVED

If a RHEL server hangs on boot with nothing more than the word GRUB in the upper left-hand corner of the screen, GRUB is unable to read its configuration file. If you actually get a GRUB menu but the server does not boot, you have a different and potentially more complex issue.

The most common cause of GRUB being unable to read its configuration is a discrepancy between how the BIOS enumerated the hard drives and which disk GRUB expects to be its boot disk.


To correct this issue, boot the server in rescue mode.


Once booted into rescue mode with your root disk filesystems mounted, check the /boot/grub/device.map file to ensure it has correctly identified the boot disk: hd0 should point to the disk that contains /boot. On an HP ProLiant system you should see the following line:


(hd0) /dev/cciss/c0d0
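You can inspect the current contents from the rescue shell:

# cat /boot/grub/device.map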


If it does not, correct the file and then update GRUB by issuing the following command:


/sbin/grub --batch --device-map=/boot/grub/device.map --config-file=/boot/grub/grub.conf --no-floppy


And then from the GRUB prompt enter the following commands:


grub> root (hd0,0)
grub> setup (hd0)
grub> quit


You can now eject the ISO and reboot the server normally.

What is NUMA (Non-Uniform Memory Access)?

NUMA stands for Non-Uniform Memory Access and describes a system with more than one system bus. CPU resources and memory resources are grouped together into a "NUMA node".

The memory in a NUMA node is thus much more easily accessed by an associated CPU. A CPU that needs to access memory in a different node (“Remote Access”) will experience much higher latency, and thus reduced application performance. 


NUMA is, in short, an alternative approach to server architecture that links several small, high-performing nodes together inside a single server case.  


As long as the memory and CPU being used fall within the bounds of a single NUMA node, local communication allows a CPU much faster access to memory than in an ordinary system layout. This is especially important today, when CPUs operate significantly faster than the memory they use. NUMA helps keep CPUs from stalling while waiting for data by allowing the fastest possible access to memory.



numactl is a tool that you can use to view the NUMA topology on Linux.


numactl --hardware

[root@nsk ~]# numactl --hardware
available: 1 nodes (0)
node 0 cpus: 0
node 0 size: 2047 MB
node 0 free: 1454 MB
node distances:
node   0
  0:  10

numactl --show


[root@nsk ~]# numactl --show
policy: default
preferred node: current
physcpubind: 0
cpubind: 0
nodebind: 0
membind: 0
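Beyond viewing the topology, numactl can also pin a process so that its CPU and memory stay on one node; a minimal sketch (./myapp is just a placeholder for your workload):

# Run the workload with CPUs and memory confined to NUMA node 0
[root@nsk ~]# numactl --cpunodebind=0 --membind=0 ./myapp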
 

Sunday, November 12, 2017

BUG: soft lockup - CPU#0 stuck for 10s!


• Soft lockups are situations in which the kernel's scheduler subsystem has not been given a chance to perform its job for more than 10 seconds; they can be caused by defects in the kernel, by hardware issues, or by extremely high workloads.

Run the following command and check whether you still encounter these "soft lockup" errors on the system:

# sysctl -w kernel.softlockup_thresh=30

To make this parameter persistent across reboots, add the following line to the /etc/sysctl.conf file:

 kernel.softlockup_thresh=30
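A brief sketch of doing both steps from the shell (appending the line and re-reading /etc/sysctl.conf without a reboot):

# echo "kernel.softlockup_thresh=30" >> /etc/sysctl.conf
# sysctl -p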


Note: The softlockup_thresh kernel parameter was introduced in Red Hat Enterprise Linux 5.2 (kernel-2.6.18-92.el5); it is therefore not possible to modify it on older versions.

SOLVED: Buffer I/O error on boot

Situation:

• After upgrading from Red Hat Enterprise Linux (RHEL) 5.1 to RHEL 5.5 (kernel 2.6.18-53.el5 to 2.6.18-194.8.1.el5), a system started to show I/O errors on boot.

• The boot process took more time than before, but there were otherwise no significant problems occurring.


SCSI device sdc: 419430400 512-byte hdwr sectors (214748 MB)

sdc: Write Protect is off
sdc: Mode Sense: 77 00 10 08
SCSI device sdc: drive cache: write back w/ FUA
SCSI device sdc: 419430400 512-byte hdwr sectors (214748 MB)
sdc: Write Protect is off
sdc: Mode Sense: 77 00 10 08
SCSI device sdc: drive cache: write back w/ FUA
sdc:end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
Dev sdc: unable to read RDB block 0
end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
unable to read partition table
sd 1:0:0:1: Attached scsi disk sdc
 Vendor: SUN       Model: LCSM100_S         Rev: 0735
 Type:   Direct-Access                      ANSI SCSI revision: 05

Solution:


Follow the solution below to remediate the issue.


• Switching the controller to active/active mode would allow the devices to be probed through both controller ports and prevent the errors.

• An option to speed up the boot process is to rebuild the initrd without the HBA driver kernel modules and then probe the devices after boot, i.e.:

# mkinitrd -v -f --omit-scsi-modules /boot/initrd-2.6.18-194.8.1.el5.img 2.6.18-194.8.1.el5
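As a precaution (our suggestion, not part of the original fix), back up the existing initrd before overwriting it:

# cp -p /boot/initrd-2.6.18-194.8.1.el5.img /boot/initrd-2.6.18-194.8.1.el5.img.bak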

Performance collection tools to gather data for fault analysis in VMware

This article explains how to use performance collection tools to gather data for analysis of faults such as:
    Unresponsive ESX hosts
    Unresponsive virtual machines
    ESX host purple diagnostic screens

Why gather performance data for a fault?

If the diagnostic logs do not help you determine the cause of a fault, you may need to use performance collection tools to gather further data for analysis. Set up performance collection tools in advance to gather data about faults that may occur.

Performance gathering tools

VMware recommends the following tools for gathering performance data: 

top
The top utility provides a list of CPU-intensive tasks in the ESX host Service Console.
Use top in batch mode for fault troubleshooting by directing the output to a file so that it can be reviewed after a recurrence.


Note: The top command is not available for ESXi.
To run the top utility, run the command:


# top -bc -d <delay in seconds> [-n <iterations>] > output-perf-stats-file.txt

 
Use the information in the output file to identify any trends before the fault. 
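For example, an illustrative run that captures a snapshot every 10 seconds for one hour (the delay, count, and file name are just placeholders):

# top -bc -d 10 -n 360 > /tmp/output-perf-stats-file.txt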


esxtop
The esxtop tool provides performance statistics for the entire ESX/ESXi host: network, storage, CPU, and memory load from the VMkernel perspective, broken down per VMkernel world.
To collect data over long periods of time, run esxtop in batch mode and direct the output to a file so that it can be reviewed after the fault.


To run the esxtop tool, run the command:


# esxtop -b -d <delay in seconds> [-n <iterations>] > output-perf-statistics-file.csv
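An illustrative invocation sampling every 10 seconds for one hour (values and file name are placeholders):

# esxtop -b -d 10 -n 360 > /tmp/output-perf-statistics-file.csv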

 
Like esxtop, the resxtop tool provides performance statistics for a specified ESX host in the network. It provides the same performance information as esxtop and may be used either after deploying the VMware vSphere Management Assistant (vMA) virtual appliance or installing the VMware Command-Line Interface (vCLI). 


To run the resxtop tool and collect batch performance data, log into the vMA or open the vCLI, and execute the command:


# resxtop [server] [vihost] [portnumber] [username] -b -d <delay in seconds> [-n <iterations>] > output-perf-statistics-file.csv
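For instance, a hedged sketch connecting directly to a host (the hostname and user are placeholders; resxtop will prompt for the password):

# resxtop --server esxhost.example.com --username root -b -d 10 -n 360 > /tmp/output-perf-statistics-file.csv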


vm-support -s

 
Use the vm-support command with the -s parameter to collect performance statistics, system configuration information, and logging. Submit the file generated by this command to VMware Support for further assistance, if required. 
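The invocation itself is simple; run it on the host and submit the resulting bundle (the output file name varies by version):

# vm-support -s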


Performance Monitor (PERFMON.EXE)

 
Microsoft's Performance Monitor is a utility that comes with every Microsoft Windows NT-based operating system. It can be used to monitor local and remote Microsoft Windows machines, and it can log performance data and display data from logs or in real time.


This utility is useful when reviewing data collected from the esxtop tool and for troubleshooting virtual machine unresponsiveness. When using Performance Monitor for virtual machine unresponsiveness, collect the data remotely from another Microsoft Windows machine so that the utility does not affect the data being gathered.
For more information about Performance Monitor on your specific version of Windows, refer to Microsoft support sites.

Friday, November 10, 2017

Time command in Linux Server - Brief explanation

NAME
       time - time a simple command or give resource usage
      
Format
       time [options] command [arguments...]

The time command runs the specified program command with the given arguments. When command finishes, time writes a message to standard error giving timing statistics about the program run. These statistics consist of:


(i) the elapsed real time between invocation and termination,
(ii) the user CPU time (the sum of the tms_utime and tms_cutime values in a struct tms as returned by times(2)), and
(iii) the system CPU time (the sum of the tms_stime and tms_cstime values in a struct tms as returned by times(2)).

real %e
user %U
sys %S

%e - Elapsed real time (in seconds).
%U - Total number of CPU-seconds that the process spent in user mode.
%S - Total number of CPU-seconds that the process spent in kernel mode.
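These specifiers can also be passed to the GNU time binary's -f option to customize the output; a brief sketch (note /usr/bin/time, since the bash built-in time does not support -f, and the timings shown are illustrative):

[root@nsk-linux ~]# /usr/bin/time -f "real %e\nuser %U\nsys %S" sleep 1
real 1.00
user 0.00
sys 0.00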

Ex:

[root@nsk-linux ~]# time route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.0.2.0        0.0.0.0         255.255.255.0   U     1      0        0 eth4
0.0.0.0         10.0.2.2        0.0.0.0         UG    0      0        0 eth4

real    0m0.001s
user    0m0.000s
sys     0m0.001s

[root@nsk-linux ~]# time uptime

 08:34:58 up 57 min,  2 users,  load average: 0.04, 0.12, 0.08

real    0m0.003s
user    0m0.002s
sys     0m0.001s

For more options, please read man time.

Thursday, November 9, 2017

How to find files modified between certain days & delete them?

Follow the steps below to find files modified between 20 and 30 days ago & delete them.

Command to find and list the files:

# find / -mtime +20 -mtime -30 -type f -name "test.*" -exec ls -al {} \;

Command to delete the listed files:

# find / -mtime +20 -mtime -30 -type f -name "test.*" -exec rm {} \;
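With GNU find, the -delete action is an alternative to -exec rm (a sketch; run the ls version above first to confirm exactly what will match):

# find / -mtime +20 -mtime -30 -type f -name "test.*" -delete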

Change the file name pattern as per your needs.

Hope it helps.