This Blog is to share our knowledge and expertise on Linux System Administration and VMware Administration

Monday, November 20, 2017

lsof command with examples


lsof is an excellent utility for identifying the process or user holding a system resource such as a file or network socket.

The /etc/services system file maps numeric port numbers to the descriptive port names that lsof shows in its output; users can extend it with their own entries.
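As a quick illustration of that mapping, here is a minimal sketch. The sample lines below are fabricated in the /etc/services format so the snippet is self-contained; on a real system you would read /etc/services itself.

```shell
# Sample lines in the /etc/services format (fabricated for this sketch;
# on a real system read /etc/services directly)
services_sample="ssh     22/tcp
telnet  23/tcp
smtp    25/tcp"

# Print the descriptive name mapped to TCP port 22
name=$(printf '%s\n' "$services_sample" | awk '$2 == "22/tcp" {print $1}')
echo "$name"   # prints: ssh
```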

In the absence of any options, lsof lists all open files belonging to all active processes.

NAME

       lsof - list open files

To report all processes that are accessing the TCP sockets found on the system
# lsof -i TCP

To find out what process is holding TCP port 22
# lsof -i tcp:22

To show all resources that are being held by process id 2057
# lsof -p 2057

To find out which resources are held by all crond processes (select by command name):
# lsof -c crond

To confirm which resources are being held by the user linuser:
# lsof -u linuser | more

To find out which processes are holding open the file /usr/sbin/anacron:
# lsof /usr/sbin/anacron

Listing open files whose Internet address matches IPv4 or IPv6:
# lsof -i4 -i6

Checking the count of open files
# lsof | wc -l

Listing open NFS files:
# lsof -N
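Building on the port example above, lsof's -t flag prints only PIDs, which is convenient for scripting. A hedged sketch that checks whether a TCP port is in use (it assumes lsof is installed; the guard skips the check if it is not):

```shell
port=22
if command -v lsof >/dev/null 2>&1; then
    # -t: terse output (PIDs only), convenient for scripts
    pids=$(lsof -t -i tcp:"$port" 2>/dev/null)
    if [ -n "$pids" ]; then
        echo "port $port is in use by PID(s): $pids"
    else
        echo "port $port appears free"
    fi
else
    echo "lsof not installed; skipping check"
fi
```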

Hope it helps.

Saturday, November 18, 2017

VMware Tools version by using a PowerCLI command

Use the command below to get the VMware Tools version.

Get-VM | % { Get-View $_.Id } | Select Name, @{ Name="ToolsVersion"; Expression={ $_.Config.Tools.ToolsVersion } }




Friday, November 17, 2017

Error "system was unable to find a physical volume" SOLVED - Step by Step


If you get the error "system was unable to find a physical volume", you need to restore the corrupted volume group's metadata.


Situation:

If the volume group metadata area of a physical volume is accidentally overwritten or otherwise destroyed, you will get an error message indicating that the metadata area is incorrect, or that the system was unable to find a physical volume with a particular UUID. You may be able to recover the data on the physical volume by writing a new metadata area on the physical volume, specifying the same UUID as the lost metadata.

Solution:

The following example shows the sort of output you may see if the metadata area is missing or corrupted.

[root@test]# lvs -a -o +devices

  Couldn't find device with uuid 'zhtUGH-1N2O-tHdu-b14h-gH34-sB7z-NHhkdf'.
  Couldn't find all physical volumes for volume group VG.
  Couldn't find device with uuid 'zhtUGH-1N2O-tHdu-b14h-gH34-sB7z-NHhkdf'.
  Couldn't find all physical volumes for volume group VG.

  ...

You may be able to find the UUID for the physical volume that was overwritten by looking in the /etc/lvm/archive directory. Look in the file VolumeGroupName_xxxx.vg for the last known valid archived LVM metadata for that volume group.
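A hedged sketch of that search: the archive file below is fabricated so the command is demonstrable anywhere; on a real system you would grep under /etc/lvm/archive/ directly, and the UUID shown is the example one from this post.

```shell
# Fabricated fragment in the style of an LVM archive file, so the
# search is runnable without a real /etc/lvm/archive directory
tmpdir=$(mktemp -d)
cat > "$tmpdir/VG_00001-1234567890.vg" <<'EOF'
pv0 {
    id = "zhtUGH-1N2O-tHdu-b14h-gH34-sB7z-NHhkdf"
    device = "/dev/sdh1"
}
EOF

# Which archived metadata file mentions the lost PV's UUID?
match=$(grep -l 'zhtUGH-1N2O-tHdu-b14h-gH34-sB7z-NHhkdf' "$tmpdir"/*.vg)
echo "$match"
rm -rf "$tmpdir"
```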

Alternatively, you may find that deactivating the volume group with the partial (-P/--partial) argument will enable you to find the UUID of the missing corrupted physical volume.

[root@test]# vgchange -an --partial

  Partial mode. Incomplete volume groups will be activated read-only.
  Couldn't find device with uuid 'zhtUGH-1N2O-tHdu-b14h-gH34-sB7z-NHhkdf'.
  Couldn't find device with uuid 'zhtUGH-1N2O-tHdu-b14h-gH34-sB7z-NHhkdf'.

  ...

Use the --uuid and --restorefile arguments of the pvcreate command to restore the physical volume. The following example labels the /dev/sdh1 device as a physical volume with the UUID indicated above, zhtUGH-1N2O-tHdu-b14h-gH34-sB7z-NHhkdf. This command restores the physical volume label with the metadata information contained in centos_00000-1802035441.vg, the most recent good archived metadata for the volume group.

The restorefile argument instructs the pvcreate command to make the new physical volume compatible with the old one in the volume group, ensuring that the new metadata will not be placed where the old physical volume contained data (which could happen, for example, if the original pvcreate command had used the command-line arguments that control metadata placement, or if the physical volume was originally created using a different version of the software with different defaults).

The pvcreate command overwrites only the LVM metadata areas and does not affect the existing data areas.

[root@test]# pvcreate --uuid "zhtUGH-1N2O-tHdu-b14h-gH34-sB7z-NHhkdf" --restorefile /etc/lvm/archive/centos_00000-1802035441.vg /dev/sdh1
  Physical volume "/dev/sdh1" successfully created

You can then use the vgcfgrestore command to restore the volume group's metadata.

[root@test]# vgcfgrestore VG
  Restored volume group VG 

You can now display the logical volumes.

[root@test]# lvs -a -o +devices

  LV     VG   Attr   LSize   Origin Snap%  Move Log Copy%  Devices

  stripe VG   -wi--- 300.00G                               /dev/sdh1 (0),/dev/sda1(0)
  stripe VG   -wi--- 300.00G                               /dev/sdh1 (34728),/dev/sdb1(0) 

The following commands activate the volumes and display the active volumes.

[root@test]# lvchange -ay /dev/VG/stripe
[root@test]# lvs -a -o +devices

  LV     VG   Attr   LSize   Origin Snap%  Move Log Copy%  Devices
  stripe VG   -wi-a- 300.00G                               /dev/sdh1 (0),/dev/sda1(0)
  stripe VG   -wi-a- 300.00G                               /dev/sdh1 (34728),/dev/sdb1(0)

If the on-disk LVM metadata takes up at least as much space as what overwrote it, this command can recover the physical volume. If what overwrote the metadata extended past the metadata area, the data on the volume may have been affected. You might be able to use the fsck command to recover that data.
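The whole recovery, condensed into a sketch that only prints the commands rather than running them (the values are the examples from this post; substitute your own UUID, archive file, device, and VG name, and double-check against the archive before running anything for real):

```shell
# Example values from this post -- substitute your own before use
UUID="zhtUGH-1N2O-tHdu-b14h-gH34-sB7z-NHhkdf"
ARCHIVE="/etc/lvm/archive/centos_00000-1802035441.vg"
DEV="/dev/sdh1"
VG="VG"

# Print, rather than run, the destructive recovery commands
recovery_cmds="pvcreate --uuid \"$UUID\" --restorefile $ARCHIVE $DEV
vgcfgrestore $VG
vgchange -ay $VG"
printf '%s\n' "$recovery_cmds"
```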


Thursday, November 16, 2017

How to add a temporary or permanent route on a Linux box?


Follow the steps below to add a route either temporarily or permanently.

Temporary Route

We just need to run the ip route add command on the command line, like below:

[root@testserver ~]#ip route add 10.217.156.0/25 via 10.194.32.1 dev eth0

[root@testserver ~]# route -n | grep -i 10.217.156.0

10.217.156.0    10.194.32.1     255.255.255.128 UG    0      0        0 eth0

The above route entry will disappear after the Linux box is rebooted.

Permanent Route

We need to create a file named route-eth0 (since the route uses the eth0 device) under /etc/sysconfig/network-scripts/ and add the entry below.

[root@testserver network-scripts]# cat route-eth0
10.217.156.0/25 via 10.194.32.1 dev eth0

Save & exit.

The route will persist across every reboot of the Linux box.


Note: A permanent route needs a network interface reload (e.g. service network restart) to take immediate effect.

Hope it helps.




Tuesday, November 14, 2017

Server hang at GRUB during boot - SOLVED

If a RHEL server hangs on boot with nothing more than the word GRUB in the upper left-hand corner of the screen, it means that GRUB is unable to read its configuration file. If you actually get a GRUB menu but the server does not boot, then you have a different and potentially more complex issue.

The most common reason for GRUB being unable to read its configuration is a discrepancy between how the BIOS enumerated the hard drives and what GRUB expects its boot disk to be.


To correct this issue, boot the server in rescue mode.


Once you have booted into rescue mode and your root disk filesystems have been mounted, check the /boot/grub/device.map file to ensure it correctly identifies the boot disk: hd0 should point to the disk that contains /boot. On an HP ProLiant system you should see the following line:


(hd0) /dev/cciss/c0d0


If it does not, correct the file and then update GRUB by issuing the following command:


/sbin/grub --batch --device-map=/boot/grub/device.map --config-file=/boot/grub/grub.conf --no-floppy


And then from the GRUB prompt enter the following commands:


grub> root (hd0,0)
grub> setup (hd0)
grub> quit


You can now eject the ISO and reboot the server normally.

What is NUMA (Non-Uniform Memory Access)?

NUMA stands for Non-Uniform Memory Access and describes a system with more than one system bus. CPU resources and memory resources are grouped together into a "NUMA node".

The memory in a NUMA node is thus much more easily accessed by an associated CPU. A CPU that needs to access memory in a different node (“Remote Access”) will experience much higher latency, and thus reduced application performance. 


NUMA is, in short, an alternative approach to server architecture that links several small, high-performing nodes together inside a single server case.  


So long as the memory and CPU being used falls within the bounds of the NUMA node, local communication within a NUMA node allows a CPU much faster access to memory than in an ordinary system layout. This is especially important in the multi-GHz era of today, when CPUs operate significantly faster than the memory they are using. NUMA helps keep CPUs from entering a stalled state, waiting for data, by allowing the fastest possible access to memory. 



numactl is a Linux tool that you can use to view the NUMA topology.


numactl --hardware

[root@nsk ~]# numactl --hardware
available: 1 nodes (0)
node 0 cpus: 0
node 0 size: 2047 MB
node 0 free: 1454 MB
node distances:
node   0
  0:  10

numactl --show


[root@nsk ~]# numactl --show
policy: default
preferred node: current
physcpubind: 0
cpubind: 0
nodebind: 0
membind: 0
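If numactl is not installed, the same topology can be glimpsed through sysfs. A hedged sketch (node directories may be absent on non-NUMA or containerized systems, hence the fallback message):

```shell
# Count the NUMA node directories the kernel exposes under sysfs
nodes=$(ls -d /sys/devices/system/node/node* 2>/dev/null | wc -l)
if [ "$nodes" -gt 0 ]; then
    echo "NUMA nodes visible: $nodes"
else
    echo "no NUMA node directories found (non-NUMA system or restricted sysfs)"
fi
```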
 

Sunday, November 12, 2017

BUG: soft lockup - CPU#0 stuck for 10s!


•Soft lockups are situations in which the kernel's scheduler subsystem has not been given a chance to perform its job for more than 10 seconds; they can be caused by defects in the kernel, by hardware issues, or by extremely high workloads.

Run the following command and check whether you still encounter these "soft lockup" errors on the system:

# sysctl -w kernel.softlockup_thresh=30

To make this parameter persistent across reboots, add the following line to the /etc/sysctl.conf file:

 kernel.softlockup_thresh=30


Note: The softlockup_thresh kernel parameter was introduced in Red Hat Enterprise Linux 5.2 (kernel-2.6.18-92.el5), so it is not possible to modify it on older versions.
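The runtime change plus the persistence step can be sketched as follows; the sketch writes to a temporary stand-in file so it is safe to run anywhere, whereas on an actual RHEL 5.2+ box you would append to /etc/sysctl.conf itself and then run sysctl -p.

```shell
# Stand-in for /etc/sysctl.conf so the sketch is safe to run anywhere
conf=$(mktemp)

# Persist the threshold (on a real system: append to /etc/sysctl.conf,
# then apply with `sysctl -p`)
echo "kernel.softlockup_thresh=30" >> "$conf"

# Confirm the entry landed in the file
line=$(grep '^kernel\.softlockup_thresh' "$conf")
echo "$line"
rm -f "$conf"
```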

SOLVED: Buffer I/O error on boot

Situation:

•After upgrading from Red Hat Enterprise Linux (RHEL) 5.1 to RHEL 5.5 (kernel 2.6.18-53.el5 to 2.6.18-194.8.1.el5), a system started to show I/O errors on boot.

•The boot process took more time than before, but otherwise no significant problems occurred.


SCSI device sdc: 419430400 512-byte hdwr sectors (214748 MB)

sdc: Write Protect is off
sdc: Mode Sense: 77 00 10 08
SCSI device sdc: drive cache: write back w/ FUA
SCSI device sdc: 419430400 512-byte hdwr sectors (214748 MB)
sdc: Write Protect is off
sdc: Mode Sense: 77 00 10 08
SCSI device sdc: drive cache: write back w/ FUA
sdc:end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
Dev sdc: unable to read RDB block 0
end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
unable to read partition table
sd 1:0:0:1: Attached scsi disk sdc
 Vendor: SUN       Model: LCSM100_S         Rev: 0735
 Type:   Direct-Access                      ANSI SCSI revision: 05

Solution:


Follow the solution below to remediate the issue.


•Switching the controller to active/active mode would allow the devices to be probed through both controller ports and prevent the errors.

•An option to speed up the boot process is to rebuild the initrd without the HBA driver kernel modules and then probe the devices post-boot, i.e.:

# mkinitrd -v -f --omit-scsi-modules /boot/initrd-2.6.18-194.8.1.el5.img 2.6.18-194.8.1.el5