Tuesday, January 10, 2012

Memory Hotplug for Linux Guests

Recently I was asked to increase the RAM in a couple of the development VMs, but the request came with a twist: we could not afford a reboot. It would waste a lot of the dev team's time to stop all the engines, start them up again after the reboot, and wait for the VM to catch up and download all the relevant data from the database.

The VMware forums were lacking in detail about hot-add compatibility with guest operating systems, so I realised I'd better search Google for a solution and try it myself to see how it works.

The hot-add hardware feature requires VM hardware version 7 or later. Once this was verified, I made sure Edit Settings > Options > General Options was set to the correct OS type. This is important, as the interface only displays the Memory/CPU Hotplug options for supported OSes. In my case I was running CentOS 6.2 x86_64, so I selected Red Hat Enterprise Linux (64-bit).

Next, the Memory/CPU Hotplug feature under Edit Settings > Options should be enabled.

I found that the CentOS kernel I was using (2.6.32-220.el6.x86_64) recognises hot-added memory automatically.

The VM was running with 4 GB of RAM, so I added another 4 GB, giving it 8 GB in total.

When memory is hotplugged, the kernel recognizes the new memory, builds new memory management tables, and creates the sysfs files used to operate on it. If the firmware supports notifying the OS that new memory has been connected, this phase is triggered automatically; ACPI can deliver this event. If not, the system administrator uses the “probe” operation instead.
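
I did not need the manual route on this VM, but for reference here is a minimal sketch of what the “probe” operation looks like. It assumes the kernel exposes a writable "probe" file in the memory sysfs directory (which depends on how the kernel was built) and that you already know the physical start address of the new memory. The function name probe_memory and the address in the usage note are my own placeholders, not values from this system.

```shell
#!/bin/bash
# Hypothetical helper: tell the kernel about new memory by writing its
# physical start address to the sysfs "probe" file.
#   $1 = start address of the new memory (for example in 0x... form)
#   $2 = sysfs memory directory (defaults to the path used in this post)
probe_memory() {
    local addr="$1"
    local dir="${2:-/sys/devices/system/memory}"

    # The probe file only exists on kernels built with the probe interface.
    if [ ! -w "$dir/probe" ]; then
        echo "probe interface not available in $dir" >&2
        return 1
    fi

    echo "$addr" > "$dir/probe"
}
```

As root, a call would look like "probe_memory 0x140000000", where the address is only an example and has to match the start of a real, section-aligned memory range.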

Within the "/sys/devices/system/memory" directory there are a number of folders, all named ‘memoryX’, where X identifies a unique ‘section’ of memory. How big each section is, and hence how many folders you have, depends on your environment.

[root@vm24_dev ~]# ls -lrth /sys/devices/system/memory
total 0
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory9
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory8
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory7
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory6
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory5
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory4
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory3
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory2
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory11
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory10
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory1
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory0
--w-------. 1 root root 4.0K Jan 10 14:14 probe
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory71
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory70
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory69
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory68
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory67
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory66
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory65
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory64
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory63
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory62
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory61
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory60
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory59
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory58
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory57
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory56
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory55
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory54
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory53
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory52
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory51
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory50
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory49
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory48
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory47
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory46
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory45
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory44
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory43
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory42
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory41
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory40
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory39
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory38
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory37
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory36
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory35
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory34
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory33
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory32
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory23
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory22
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory21
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory20
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory19
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory18
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory17
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory16
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory15
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory14
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory13
drwxr-xr-x. 2 root root 0 Jan 10 14:14 memory12
-rw-r--r--. 1 root root 4.0K Jan 10 14:14 soft_offline_page
-rw-r--r--. 1 root root 4.0K Jan 10 14:14 hard_offline_page
-r--r--r--. 1 root root 4.0K Jan 10 14:14 block_size_bytes

You can check the file "/sys/devices/system/memory/block_size_bytes" to view the size of a section in bytes. The value is printed in hexadecimal. Basically, the whole memory has been divided up into equal-sized chunks as per the SPARSEMEM memory model.

[root@vm24_dev ~]# cat /sys/devices/system/memory/block_size_bytes
8000000
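
Since that value is hexadecimal, a little shell arithmetic turns it into something readable. The sketch below uses two helper names I made up (block_size_mib and block_start_addr). The second one assumes that section N starts at N times the section size, which matches the numbering on this VM but is an assumption rather than a guarantee for every kernel.

```shell
#!/bin/bash
# Hypothetical helpers; the names are illustrative only.

# Convert the hexadecimal value read from block_size_bytes to MiB.
block_size_mib() {
    local hex="$1"
    echo $(( 0x$hex / 1024 / 1024 ))
}

# Print the physical start address (in hex) of memory<index>,
# assuming section N begins at N * section size.
block_start_addr() {
    local index="$1" hex="$2"
    printf '0x%x\n' $(( index * 0x$hex ))
}

block_size_mib 8000000        # prints 128
block_start_addr 40 8000000   # prints 0x140000000
```

With 128 MiB sections, the extra 4 GB works out to 32 new memoryX folders, which is consistent with memory40 through memory71 in the listing above.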

In each section’s folder there is a file called ‘state’, and each file contains one of two words: online or offline.
Locate the memoryX folder(s) that account for the hot-added memory by working out the section sizes above, or (like me) just check the contents of the state files:

[root@vm24_dev ~]# cat /sys/devices/system/memory/memory39/state
online
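
Checking the state files one at a time gets tedious, so here is a rough sketch of a loop that prints only the sections that are not online. The function name list_offline_blocks is my own, and the optional directory argument exists only so the function can be pointed at a test directory.

```shell
#!/bin/bash
# Hypothetical helper: print the name of every memoryX folder whose
# state file does not read "online".
#   $1 = sysfs memory directory (defaults to the path used in this post)
list_offline_blocks() {
    local dir="${1:-/sys/devices/system/memory}"
    local block state

    for block in "$dir"/memory[0-9]*; do
        # Skip anything without a readable state file.
        [ -r "$block/state" ] || continue
        state=$(cat "$block/state")
        if [ "$state" != "online" ]; then
            echo "${block##*/}"
        fi
    done
}
```

Running "list_offline_blocks" with no argument checks the live system. The names come out in glob order, so memory10 is listed before memory2.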

Once you locate the offline sections, you can bring them online as follows:

[root@vm24_dev ~]# echo online > /sys/devices/system/memory/memory40/state

Validate the memory change using:

[root@vm24_dev ~]# free
             total       used       free     shared    buffers     cached
Mem:       8060484     262040    7798444          0       8080      60648
-/+ buffers/cache:     193312    7867172
Swap:     11300856          0   11300856
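
If you want the total as a single number, for example to compare before and after the hot-add, the MemTotal line in /proc/meminfo can be read directly. This is only a small sketch; mem_total_mib is a name I made up, and the file argument is there so the function can be tried against a saved copy.

```shell
#!/bin/bash
# Hypothetical helper: print MemTotal in whole MiB.
#   $1 = meminfo-style file (defaults to /proc/meminfo)
mem_total_mib() {
    local file="${1:-/proc/meminfo}"
    # MemTotal is reported in kB; divide by 1024 and truncate to an integer.
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 }' "$file"
}
```

For the 8060484 kB total shown above this gives 7871, a little under 8 GB because the kernel keeps some memory for itself.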

I noticed that William Lam (lamw on the VMware communities) created a nice script to automate the discovery and onlining process. It’s very neat and can be downloaded from: http://communities.vmware.com/docs/DOC-10492

You can also create it as follows:

[root@vm24_dev ~]# vi online_hotplug_memory.sh

Paste the following content into the file and save it.
-------------------------------------------------------------------------
#!/bin/bash
# William Lam
# http://engineering.ucsb.edu/~duonglt/vmware/
# hot-add memory to LINUX system using vSphere ESX(i) 4.0
# 08/09/2009

if [ "$UID" -ne "0" ]
then
    echo -e "You must be root to run this script.\nYou can 'sudo' to get root access"
    exit 1
fi


for MEMORY in $(ls /sys/devices/system/memory/ | grep memory)
do
    SPARSEMEM_DIR="/sys/devices/system/memory/${MEMORY}"
    echo "Found sparsemem: \"${SPARSEMEM_DIR}\" ..."
    SPARSEMEM_STATE_FILE="${SPARSEMEM_DIR}/state"
    STATE=$(cat "${SPARSEMEM_STATE_FILE}" | grep -i online)
    if [ "${STATE}" == "online" ]; then
        echo -e "\t${MEMORY} already online"
    else
        echo -e "\t${MEMORY} is new memory, onlining memory ..."
        echo online > "${SPARSEMEM_STATE_FILE}"
    fi
done
-------------------------------------------------------------------------
[root@vm24_dev ~]# chmod +x online_hotplug_memory.sh
[root@vm24_dev ~]# ./online_hotplug_memory.sh

The output should be as follows:

[root@vm24_dev ~]# ./online_hotplug_memory.sh
Found sparsemem: "/sys/devices/system/memory/memory0" ...
        memory0 already online
Found sparsemem: "/sys/devices/system/memory/memory1" ...
        memory1 already online
Found sparsemem: "/sys/devices/system/memory/memory2" ...
        memory2 already online
Found sparsemem: "/sys/devices/system/memory/memory3" ...
        memory3 already online
Found sparsemem: "/sys/devices/system/memory/memory40" ...
        memory40 is new memory, onlining memory ...
Found sparsemem: "/sys/devices/system/memory/memory41" ...
        memory41 is new memory, onlining memory ...
Found sparsemem: "/sys/devices/system/memory/memory42" ...
        memory42 is new memory, onlining memory ...

That’s it! Quite simple really.

Cheers !
Harish.
