Virtually Curious

Tuesday, May 8, 2018

docker: Error response from daemon: Server error from portlayer

docker: Error response from daemon: Server error from portlayer:

I got this error while running a new container in VIC.

I tried to move the VCH to a different ESXi Host, but no luck.

It is not advisable to reboot the VCH from vCenter, so I had to skip that.

I searched the internet and found a similar error with a different cause - https://github.com/vmware/vic/issues/5782

This did not get me anywhere so I started looking from a higher level and this is what I noticed.

It turned out I had upgraded VIC from 1.2.1 to 1.3 last weekend and completely forgot to upgrade this last VCH.

As you can see in the image above, the VCH I was trying to run a container against was still running 1.2.1. Duh!

So I upgraded the VCH by running the following command -

./vic-machine-linux upgrade --id xxxxxx

And here is the result -

And ta-da, I was able to run the new nginx container I was trying to deploy earlier.

Thursday, March 29, 2018

How to find the disk size for an Avamar BMR restore

We had a scenario where a BMR (Bare Metal Restore) system state restore was required.

A skeleton VM was created with drives matching the original server and booted from the Avamar BMR ISO. The restore was kicked off and after 90% the restore failed. Why?

Because the disk sizes did not match. The VM had several disks; some were part of striped volumes and some were part of spanned volumes. When the skeleton VM was created for the restore, the admins just matched the volumes of the old VM. Avamar has no visibility into how the disks are laid out within the OS.

Example of how the disks were laid out in VMware and inside the OS -

So the question is how do you find the disk size if the VM is down and you need a BMR restore?

Avamar has a very nice avtar command line utility. I could not find an official document from EMC, but there are bits and pieces of information on the web.

avtar.exe --help will give you a wide range of options.

Steps to find the drive size for a BMR restore -

On your Windows workstation, install the Avamar client software and do not register it with the Avamar console.
The default installation path will be - C:\Program Files\avs\
Run the following command

C:\Program Files\avs\bin>avtar.exe -x --server=IP_of_Avamar_Server --id=Username --ap=Password --path=/clients/servername_FQDN --labelnum=label_number_of_the_backup_to_be_restored --internal --target=.\tmp\ .system_info

--target=.\tmp will create a tmp directory under avs\bin and the output of the command will be under this directory.

There will be numerous XML files under \tmp.
The useful files are CriticalVolumesMapping.xml and partitiontables.xml

CriticalVolumesMapping.xml will give you the details of how the disks are laid out within the OS.

Example -
<VolumeMappings Version="2.0">

We can clearly see that -

Disks 3,4,5,6 make up Logical volume E:

Disks 8,9 make up Logical volume I:

partitiontables.xml will provide you with the disk sizes

Example -

<PhysicalDisk NumPartitions="4" DiskSize_bytes="274872407040" PartioningScheme="MBR" MBRSignature="1720029347" DiskType="Fixed" DiskNumber="8" DiskSize_Gbytes="255" SectorSize_bytes="512">

<PartitionList>

<Partition Size_bytes="274876826112" Start_bytes="32256" partStyle="MBR" SerialNumber_dec="0" SerialNumber_hex="0" PartitionNumber="0" Size_Gbytes="255" Type="Alternate Linux swap" Bootable="false"/>

<Partition Size_bytes="0" Start_bytes="0" partStyle="MBR" SerialNumber_dec="0" SerialNumber_hex="0" PartitionNumber="1" Size_Gbytes="0" Type="Empty" Bootable="false"/>

<Partition Size_bytes="0" Start_bytes="0" partStyle="MBR" SerialNumber_dec="0" SerialNumber_hex="0" PartitionNumber="2" Size_Gbytes="0" Type="Empty" Bootable="false"/>

<Partition Size_bytes="0" Start_bytes="0" partStyle="MBR" SerialNumber_dec="0" SerialNumber_hex="0" PartitionNumber="3" Size_Gbytes="0" Type="Empty" Bootable="false"/>

</PartitionList>

</PhysicalDisk>

<PhysicalDisk NumPartitions="4" DiskSize_bytes="274872407040" PartioningScheme="MBR" MBRSignature="1720029346" DiskType="Fixed" DiskNumber="9" DiskSize_Gbytes="255" SectorSize_bytes="512">

<PartitionList>

<Partition Size_bytes="0" Start_bytes="0" partStyle="MBR" SerialNumber_dec="0" SerialNumber_hex="0" PartitionNumber="1" Size_Gbytes="0" Type="Empty" Bootable="false"/>

<Partition Size_bytes="0" Start_bytes="0" partStyle="MBR" SerialNumber_dec="0" SerialNumber_hex="0" PartitionNumber="2" Size_Gbytes="0" Type="Empty" Bootable="false"/>

<Partition Size_bytes="0" Start_bytes="0" partStyle="MBR" SerialNumber_dec="0" SerialNumber_hex="0" PartitionNumber="3" Size_Gbytes="0" Type="Empty" Bootable="false"/>

In the above scenario, Disks 8 and 9 together make up Volume I:, and partitiontables.xml reports 255 GB for each of them, so the total size of I: will be 510 GB. While creating the skeleton VM you will create a 510 GB drive.
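The addition above can also be scripted. The following is a minimal Python sketch, not an Avamar tool: it assumes the file contains PhysicalDisk elements with the DiskNumber and DiskSize_Gbytes attributes shown in the example, the PartitionTables root element is invented so the sample parses on its own, and the disk-to-volume grouping is typed in by hand from CriticalVolumesMapping.xml.

```python
import xml.etree.ElementTree as ET

# Trimmed sample in the shape of the partitiontables.xml example.
# The <PartitionTables> root element is an assumption added so the
# snippet parses on its own; the real file may use a different root.
SAMPLE = """
<PartitionTables>
  <PhysicalDisk DiskNumber="8" DiskSize_bytes="274872407040" DiskSize_Gbytes="255"/>
  <PhysicalDisk DiskNumber="9" DiskSize_bytes="274872407040" DiskSize_Gbytes="255"/>
</PartitionTables>
"""


def disk_sizes_gb(xml_text):
    """Map DiskNumber -> DiskSize_Gbytes for every PhysicalDisk element."""
    root = ET.fromstring(xml_text)
    return {
        int(disk.get("DiskNumber")): int(disk.get("DiskSize_Gbytes"))
        for disk in root.iter("PhysicalDisk")
    }


def volume_size_gb(xml_text, disk_numbers):
    """Sum the sizes of the disks that back one logical volume."""
    sizes = disk_sizes_gb(xml_text)
    return sum(sizes[number] for number in disk_numbers)


# Grouping read by hand from CriticalVolumesMapping.xml (Disks 8,9 -> I:).
VOLUME_DISKS = {"I:": [8, 9]}

for volume, disks in VOLUME_DISKS.items():
    print(volume, volume_size_gb(SAMPLE, disks), "GB")  # prints "I: 510 GB"
```

To run it against a real backup, replace SAMPLE with the contents of the partitiontables.xml file extracted under \tmp.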

Once you create the exact size and number of drives required, Avamar will perform a restore without any hiccups.

Problem solved!

Thursday, March 15, 2018

VMware Stack Upgrade

I’m working on a plan to upgrade our existing VMware Stack and wanted to write a detailed post about all components involved and the order of upgrade.

Current State –

· Primary and Recovery site with XtremIO Arrays on both ends with physical RecoverPoint appliances for array based replication.

· vCenter Servers on both sites (with embedded Platform Services Controller) running on Windows Server 2012 R2 with an external SQL Database.

· ESXi 6.0.0, 3825889

· Site Recovery Manager (SRM) 6.1.1.13825

· Storage Replication Adapter (SRA) 2.2.0.3

· XtremIO (XIOS) 4.0.15-24, XMS 4.2.1

· RecoverPoint (RPA) 4.4.1

· Avamar 7.3

Future State –

· vCenter Server Appliance (VCSA) 6.5 U1f

· ESXi 6.5, 7388607

· SRM 6.5.1, 6014840

· SRA 2.2.0.3

· XIOS 6.0.1, XMS 6.0.1

· RecoverPoint (RPA) 5.1

· Avamar 7.5

There are numerous inter-dependencies to get to the future state.

· Avamar 7.3 is not compatible with vCenter 6.5

· RPA 4.4.1 is not compatible with ESXi 6.5

· Upgrading SRM from 6.0.x to 6.5 is not supported. You have to upgrade to 6.1.x before you upgrade to 6.5 (in my case that is not required since we are already running 6.1.1.x).

· Any vCenter upgrade will break SRM until both sides are on the same level. I have Array replication active in case of a disaster while upgrading SRM.

Prerequisites –

· Backup of vCenter Database on primary and recovery site.

· Backup of SRM vPostgres Database on primary and recovery site (explained in detail below).

· Primary and Recovery Site Platform Services Controller and vCenter server instances must be running.

Order of Upgrade –

· Upgrade Avamar to 7.5

· Upgrade RPA to 5.1, since RPA 4.4.1 is not compatible with ESXi 6.5.

· Upgrade vCenter Server to VCSA 6.5 GA at primary site.

· Upgrade SRM to 6.5 at primary site. Note: SRM cannot be upgraded directly from 6.1.1 to 6.5.1.

· Upgrade SRA at primary site – Not required since running on latest

· Upgrade vCenter Server to VCSA 6.5 GA at recovery site.

· Upgrade SRM to 6.5 at recovery site.
· Upgrade SRA at recovery site – Not required since running on latest.

· Upgrade vCenter from 6.5 GA to 6.5U1g at primary site.

· Upgrade SRM from 6.5 to 6.5.1 at primary site.

· Upgrade vCenter Server to VCSA 6.5U1g at recovery site.

· Upgrade SRM from 6.5 to 6.5.1 at recovery site.

· Verify connection between SRM. Verify Protection groups and recovery plans are valid.

· Upgrade ESXi to 6.5, 7388607 at recovery site.

· Upgrade ESXi to 6.5, 7388607 at primary site.

· Upgrade virtual hardware and then VMware Tools on Virtual Machines – Can be scheduled during the next available outage window.

· Upgrade XIOS and XMS to 6.0.1
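A plan like this is easy to get out of order when it is edited later, so the constraints can be checked mechanically. The Python sketch below is only an illustration: the step labels are made up for this example, and the "must come before" pairs are taken from the inter-dependencies listed earlier in this post.

```python
# Short labels for the steps in the plan; the labels are hypothetical.
ORDER = [
    "avamar-7.5",
    "rpa-5.1",
    "vcenter-6.5-primary",
    "srm-6.5-primary",
    "vcenter-6.5-recovery",
    "srm-6.5-recovery",
    "vcenter-6.5u1-primary",
    "srm-6.5.1-primary",
    "vcenter-6.5u1-recovery",
    "srm-6.5.1-recovery",
    "esxi-6.5-recovery",
    "esxi-6.5-primary",
    "xios-6.0.1",
]

# (earlier, later) pairs derived from the inter-dependencies section.
MUST_PRECEDE = [
    ("avamar-7.5", "vcenter-6.5-primary"),       # Avamar 7.3 is not compatible with vCenter 6.5
    ("rpa-5.1", "esxi-6.5-recovery"),            # RPA 4.4.1 is not compatible with ESXi 6.5
    ("rpa-5.1", "esxi-6.5-primary"),
    ("srm-6.5-primary", "srm-6.5.1-primary"),    # SRM goes to 6.5 first, then 6.5.1
    ("srm-6.5-recovery", "srm-6.5.1-recovery"),
]


def violations(order, must_precede):
    """Return every (earlier, later) pair that the given order breaks."""
    position = {step: index for index, step in enumerate(order)}
    return [(a, b) for a, b in must_precede if position[a] > position[b]]


print(violations(ORDER, MUST_PRECEDE))  # an empty list means no constraint is broken
```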

Backup & Restore (if required) the SRM Embedded vPostgres Database -

1) Log into the system on which you installed Site Recovery Manager Server.

2) Stop the Site Recovery Manager service.

3) Navigate to the folder that contains the vPostgres commands.

4) If you installed Site Recovery Manager Server in the default location, you find the vPostgres commands in C:\Program Files\VMware\VMware vCenter Site Recovery Manager Embedded Database\bin.

5) Create a backup of the embedded vPostgres database by using the pg_dump command.

pg_dump -Fc --host 127.0.0.1 --port port_number --username=db_username srm_db > srm_backup_name

To create a backup you need the admin password. We did not have the admin password documented. Here is a link on how to reset it - http://virtuallycurious.blogspot.com/2018/06/the-case-of-forgotten-site-recovery.html

You set the port number, username, and password for the embedded vPostgres database when you installed Site Recovery Manager. The default port number is 5678. The database name is srm_db and cannot be changed.

6) Start the Site Recovery Manager service.

7) Restore (if things go south) by using the pg_restore command

pg_restore -Fc --host 127.0.0.1 --port port_number --username=db_username --dbname=srm_db srm_backup_name
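If the backup has to be taken on both sites, the two command lines can be generated rather than retyped. This is a small Python sketch under assumptions: it only assembles the argument lists using the flags shown in this post and does not run anything; the function names are invented for the example.

```python
def pg_dump_args(port, username, host="127.0.0.1", dbname="srm_db"):
    """Argument list for the pg_dump backup command shown in this post."""
    return [
        "pg_dump", "-Fc",
        "--host", host,
        "--port", str(port),
        f"--username={username}",
        dbname,
    ]


def pg_restore_args(port, username, backup_file, host="127.0.0.1", dbname="srm_db"):
    """Argument list for the matching pg_restore command."""
    return [
        "pg_restore", "-Fc",
        "--host", host,
        "--port", str(port),
        f"--username={username}",
        f"--dbname={dbname}",
        backup_file,
    ]


# 5678 is the default port mentioned in this post; db_username is a placeholder.
print(" ".join(pg_dump_args(5678, "db_username")))
print(" ".join(pg_restore_args(5678, "db_username", "srm_backup_name")))
```

To actually take the backup, the pg_dump list could be passed to subprocess.run with stdout opened on the backup file, run from the vPostgres bin folder while the Site Recovery Manager service is stopped.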

References –

· VMware Product Interoperability Matrices - http://partnerweb.vmware.com/comp_guide2/sim/interop_matrix.php

· Update sequence for vSphere 6.5 and its compatible VMware products (2147289) - https://kb.vmware.com/s/article/2147289

· Backup and Restore the embedded vPostgres Database - https://docs.vmware.com/en/Site-Recovery-Manager/6.5/srm-install-config-6-5.pdf

· EMC Recoverpoint SRA compatibility Matrix - https://www.vmware.com/resources/compatibility/detail.php?deviceCategory=sra&productid=39129

· Compatibility Matrix for SRM 6.5 - https://www.vmware.com/support/srm/srm-compat-matrix-6-5.html

· Order of upgrading vSphere and SRM components - https://docs.vmware.com/en/Site-Recovery-Manager/6.5/com.vmware.srm.install_config.doc/GUID-E7B47738-C63D-4A05-9A13-7C5FF20801A7.html

· Avamar Compatibility Matrix - https://support.emc.com/docu32263_Avamar-Compatibility-and-Interoperability-Matrix.pdf?language=en_US

Sunday, November 26, 2017

Sharing NFS-backed Volumes Between Containers

vSphere Integrated Containers supports two types of volumes, each of which has different characteristics.

VMFS virtual disks (VMDKs), mounted as formatted disks directly on container VMs. These volumes are supported on multiple vSphere datastore types, including NFS, iSCSI and VMware vSAN. They are thin, lazy zeroed disks.
NFS shared volumes. These volumes are distinct from a block-level VMDK on an NFS datastore. They are Linux guest-level mounts of an NFS file-system share.

VMDKs are locked while a container VM is running and other containers cannot share them.

NFS volumes on the other hand are useful for scenarios where two containers need read-write access to the same volume.

To use container volumes, a volume store must exist first. You create one at the time of VCH creation by using the vic-machine create --volume-store option.

You can add a volume store to an existing VCH by using the vic-machine configure --volume-store option. If you are adding volume stores to a VCH that already has one or more volume stores, you must specify each existing volume store in a separate instance of --volume-store.

Note: If you do not specify a volume store, no volume store is created by default and container developers cannot create or run containers that use volumes.

In my example, I have assigned a whole vSphere datastore as a volume store and would like to add a new NFS volume store to the VCH. The syntax is as follows -

$ vic-machine-operating_system configure \
--target vcenter_server_username:password@vcenter_server_address \
--thumbprint certificate_thumbprint --id vch_id \
--volume-store datastore_name/datastore_path:default \
--volume-store nfs://datastore_name/path_to_share_point:nfs_volume_store_label

Here nfs://datastore_name/path_to_share_point:nfs_volume_store_label adds an NFS datastore in vSphere as the volume store, which means that a volume created on it will be a VMDK file that cannot be shared between containers.

To share a volume between containers, add an NFS mount point instead. You need to specify the URL, UID, GID and access protocol.

Note: You cannot specify the root folder of an NFS server as a volume store.

The syntax is as follows -

--volume-store nfs://datastore_address/path_to_share_point?uid=1234&gid=5678&proto=tcp:nfs_volume_store_label

If you do not specify the UID and GID, the default for both is 1000. Read more about UID and GID here.
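The value passed to --volume-store is easy to mistype, so here is a minimal Python sketch that assembles it from its parts. It mirrors the syntax above and nothing more; the function name, the host nfs.example.com and the exports/ prefix are hypothetical.

```python
def nfs_volume_store(host, share_path, label, uid=1000, gid=1000, proto="tcp"):
    """Build the value for --volume-store in the NFS share-point form."""
    share_path = share_path.strip("/")
    if not share_path:
        # The root folder of an NFS server cannot be used as a volume store.
        raise ValueError("share_path must not be the root folder")
    return f"nfs://{host}/{share_path}?uid={uid}&gid={gid}&proto={proto}:{label}"


# Hypothetical host and path; uid/gid values are the ones from the syntax above.
print(nfs_volume_store("nfs.example.com", "exports/test2",
                       "nfs_volume_store_label", uid=1234, gid=5678))
# prints nfs://nfs.example.com/exports/test2?uid=1234&gid=5678&proto=tcp:nfs_volume_store_label
```

When the resulting string is typed into a shell, it should be wrapped in quotes, because an unquoted & would otherwise be interpreted by the shell.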

Before adding NFS Volume Store

After adding NFS Volume Store

Two things to note in my example -

1) I am running a VM based RHEL backed NFS and the UID and GID did not work for me.

2) The workaround was to manually change permissions on the test2 folder on the NFS Share point by using chmod 777 test2

Now that the NFS volume store is added, let's create a volume -

Next, deploy two containers with My_NFS_Volume mounted on both.

To check whether the volume was mounted, run docker inspect containername and look at the details under Mounts. You will see the volume name as well as the read/write mode.

Now let's create a .txt file from one container and check from the other whether it can be seen.

This confirms that we can share NFS-backed volums between containers.
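When several containers are involved, the Mounts section can be pulled out of the docker inspect JSON with a few lines of Python. This is a sketch only: the sample below is a hypothetical, heavily trimmed stand-in for real docker inspect output, and the /mydata destination is made up.

```python
import json

# Hypothetical, trimmed stand-in for the output of `docker inspect containername`.
SAMPLE = """
[{"Name": "/container1",
  "Mounts": [{"Name": "My_NFS_Volume", "Destination": "/mydata",
              "Mode": "rw", "RW": true}]}]
"""


def mounts(inspect_json):
    """Return (container, volume name, destination, read-write?) tuples."""
    result = []
    for container in json.loads(inspect_json):
        for mount in container.get("Mounts", []):
            result.append((
                container.get("Name"),
                mount.get("Name"),
                mount.get("Destination"),
                mount.get("RW"),
            ))
    return result


print(mounts(SAMPLE))
```

Feeding it the saved output of docker inspect for both containers should list My_NFS_Volume once per container if the volume is mounted on both.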

Friday, November 17, 2017

Enable SSH and pings to PhotonOS

In the previous post we saw how to configure static IP for PhotonOS.

Let's take a look at how to enable SSH and set it to start at boot.

Two simple commands -

# Start Service - systemctl start sshd

# Configure SSH service to automatically start at boot - systemctl enable sshd

PhotonOS uses the iptables firewall, which by default blocks everything except SSH.

Let's allow pings using the following commands -

iptables -A OUTPUT -p icmp -j ACCEPT

iptables -A INPUT -p icmp -j ACCEPT

Note: This change is not persistent.

So how do we make this persistent? Let's see -

/etc/systemd/scripts/iptables is the script that gets executed on iptables service start. So we can add our rules at the end of this script and ICMP rules will be persistent.
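If the same change has to be made on several Photon OS VMs, appending the rules by hand gets tedious. The Python sketch below shows one way to add the two ICMP rules to the end of the script text without duplicating them on a second run. It is an illustration only: the "original" script content is a hypothetical stand-in, and the sketch works on a string rather than touching /etc/systemd/scripts/iptables.

```python
ICMP_RULES = [
    "iptables -A OUTPUT -p icmp -j ACCEPT",
    "iptables -A INPUT -p icmp -j ACCEPT",
]


def with_rules(script_text, rules=ICMP_RULES):
    """Return script_text with any missing rules appended at the end."""
    lines = script_text.splitlines()
    existing = {line.strip() for line in lines}
    missing = [rule for rule in rules if rule not in existing]
    if not missing:
        return script_text  # nothing to add, leave the script untouched
    return "\n".join(lines + missing) + "\n"


# Hypothetical stand-in for the current contents of the script.
original = "#!/bin/sh\niptables -P INPUT DROP\n"
updated = with_rules(original)
print(updated, end="")
```

On a real system you would read the script, pass its contents through with_rules, and write the result back as root, keeping a copy of the original file first.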

Reboot and check it out yourself!

Configure Static IP on PhotonOS

To obtain the name of your Ethernet link run the following command: networkctl

If this is the first time you are using Photon OS, you will only see the first 2 links. The others were created because I ran some Docker swarms and created custom network bridges.

The network configuration file is located at -

/etc/systemd/network/

You might see the file 10-dhcp-eth0.network. I renamed this file to 10-static-eth0.network.

You can do this by running the following command -

root@photon [ ~ ]# mv /etc/systemd/network/10-dhcp-eth0.network /etc/systemd/network/10-static-eth0.network

Use vi editor to edit the file and add your static IP, Gateway, DNS, Domain and NTP.

This is what the file looks like.

root@photon [ ~ ]# cat /etc/systemd/network/10-static-eth0.network

[Match]

Name=eth0 <<<<<<< “Make sure to change this to your adapter. Run networkctl to check the adapter name”

[Network]

Address=10.xx.xx.xx/24

Gateway=10.xx.xx.1

DNS=10.xx.xx.xx 10.xx.xx.xx

Domains=na.xx.com

NTP=time.nist.gov

Apply the changes by running -

systemctl restart systemd-networkd
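For more than a handful of VMs, the file can be generated instead of edited in vi. Below is a minimal Python sketch that renders the same [Match] and [Network] sections shown above. The function name is invented, and the addresses are placeholders from the 192.0.2.0/24 documentation range rather than values from my environment.

```python
def render_network(name, address, gateway, dns, domains, ntp):
    """Render a static systemd-networkd .network file as text."""
    return "\n".join([
        "[Match]",
        f"Name={name}",
        "",
        "[Network]",
        f"Address={address}",
        f"Gateway={gateway}",
        f"DNS={' '.join(dns)}",
        f"Domains={domains}",
        f"NTP={ntp}",
        "",
    ])


# Placeholder values; replace them with your own adapter name and addresses.
text = render_network(
    name="eth0",
    address="192.0.2.10/24",
    gateway="192.0.2.1",
    dns=["192.0.2.53", "192.0.2.54"],
    domains="example.com",
    ntp="time.nist.gov",
)
print(text, end="")
```

The output would be written to /etc/systemd/network/10-static-eth0.network, followed by the systemctl restart systemd-networkd command above.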

Try to ping out from the OS.

Note: You will not be able to ping this VM, because by default the iptables firewall blocks everything except SSH. In my next blog I will explain how to allow ping on iptables.

Friday, November 10, 2017

Pull and Push Images from Private Repository (Project Repository) inside a dch-photon container

vSphere Integrated Containers v1.2 includes the ability to provision native Docker container hosts (DCH). The DCH is distributed by VMware through Docker Hub.

Refer to my previous blog about Docker Swarm

Each node in this swarm is a native docker container host. A service that is deployed on the swarm is nothing but containers deployed on top of individual swarm nodes, which means you can run docker ps on individual swarm nodes and get a list of the containers running on each of them.

Swarm_test is our VCH.

Manager1 is the swarm manager and worker1, worker2, worker3 are the worker nodes.

As you can see the swarm is running 2 services - portainer and web

There are 4 replicas of the web service. docker service ps web shows you the 4 instances with their IDs.

These are nothing but 4 individual containers running on each of the nodes.

Note the service IDs on the swarm which runs as a container on manager1 and worker1.

You can either deploy images from Docker Hub or use docker-compose.yml to define an application made up of multiple containers.

What if a developer wants to pull and push images from a private repository inside a Project created in your VIC Management Portal?

So I tried to connect to my private repository and got a certificate error.

I am trying to log in from worker2 to vic.xx.xxx.com, which is my VIC Manager.

Let's copy the right certificate so we can log in to our private registry. To get the certificate, log in to your VIC management portal at https://vicmanagerip:8282 using an Admin account. I logged in with administrator@vsphere.local

Download the certificate. The certificate needs to be copied on the worker2 node at - /etc/docker/certs.d/yourFQDN_VIC_manager_name/

You need to create the certs.d and yourFQDN_VIC_manager_name directories. Here is how -
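Since the certificate has to land on every swarm node, the directory creation and copy can be scripted. The Python sketch below is one possible way to do it and makes a few assumptions: the function name is invented, vic.example.com stands in for your VIC manager FQDN, and ca.crt is used as the file name because that is the usual Docker convention (this post only names the directory). The demo writes into a temporary directory so it runs without root.

```python
import shutil
import tempfile
from pathlib import Path


def install_registry_cert(cert_file, registry_host, docker_etc="/etc/docker"):
    """Copy cert_file to <docker_etc>/certs.d/<registry_host>/ca.crt."""
    target_dir = Path(docker_etc) / "certs.d" / registry_host
    target_dir.mkdir(parents=True, exist_ok=True)  # creates certs.d and the host folder
    target = target_dir / "ca.crt"
    shutil.copyfile(cert_file, target)
    return target


# Demo against a temporary directory instead of the real /etc/docker.
with tempfile.TemporaryDirectory() as tmp:
    downloaded = Path(tmp) / "downloaded.crt"
    downloaded.write_text("placeholder certificate contents\n")
    installed = install_registry_cert(
        downloaded, "vic.example.com", docker_etc=Path(tmp) / "docker"
    )
    print(installed.relative_to(tmp))
```

On a real node you would call it with the downloaded certificate and leave docker_etc at its default, running as root.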

Let's try to connect once again -

Success! Similarly you can copy the cert to all other nodes thus letting you push and pull images from the private registry and deploy them straight from or to a swarm node.