The CMS online Cloud (redundant and grizzly reborn)

In August, we put our new implementation of the CMS online Cloud into production. I will present the results of the efforts by Olivier Chaze, Wojciech and myself at CHEP2013.

The new CMS online Cloud now features a complete, easily scalable and redundant infrastructure. We implemented:

  • a RabbitMQ cluster with replicated queues, as described by the RabbitMQ people here;
  • a MySQL cluster, after minimally modifying the definitions of the tables needed by the different OpenStack services;
  • Grizzly OpenStack services in active/active mode, using the corosync/pacemaker clusterware and Round Robin aliases as an easy load balancer where needed (keystone, dashboard, APIs…). I want to thank Florian Haas, who gave me guidelines for developing the clusterware agent scripts for the different services.

This made our cloud of 1300 hypervisors (~13000 cores) resilient to failures! The controlling infrastructure briefly described above scales nicely. I’ll post the link to the presentation after it is given.
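
To illustrate the Round Robin alias idea mentioned in the list above, here is a toy Python sketch. It is not the production configuration: the class, the function and the addresses are hypothetical, and the alias is modelled in memory instead of through a real DNS server. It only shows why rotating the order of the addresses spreads clients over the active/active controllers, and why a client that tries the next address still gets served when one controller is down.

```python
from collections import deque
from typing import Callable, Deque, List


class RoundRobinAlias:
    """Toy model of a DNS alias that rotates its address list on every lookup."""

    def __init__(self, addresses: List[str]) -> None:
        if not addresses:
            raise ValueError("an alias needs at least one address")
        self._addresses: Deque[str] = deque(addresses)

    def resolve(self) -> List[str]:
        # Return the current ordering, then rotate so that the next lookup
        # starts with a different address.
        answer = list(self._addresses)
        self._addresses.rotate(-1)
        return answer


def pick_endpoint(alias: RoundRobinAlias, is_alive: Callable[[str], bool]) -> str:
    """Return the first address of the rotated answer that passes the health check."""
    for address in alias.resolve():
        if is_alive(address):
            return address
    raise RuntimeError("no controller behind the alias is reachable")


if __name__ == "__main__":
    # Hypothetical controller addresses (documentation range, not real hosts).
    alias = RoundRobinAlias(["192.0.2.11", "192.0.2.12", "192.0.2.13"])
    for _ in range(4):
        print(pick_endpoint(alias, lambda address: address != "192.0.2.12"))
```

In this model the health check is supplied by the caller; in a real deployment that role would be played by the client retrying, or by the clusterware restarting the failed service.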

The CMS online Cloud

Another milestone has been achieved. We were able to set up the two bonded 10Gbit links through the “data networks” to serve the cloud with minimal changes to the existing infrastructure.

We configured the networks at the level of the routers, switches and hypervisors, and we used Open vSwitch to redirect the flow going to specific computers (changing the MAC address of the gateway for this flow), so that this traffic goes through our high-bandwidth network. This means that, without any modification on the VM side, the VMs use our high-bandwidth network to talk to the “data providing” computers.
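
As a rough illustration of the kind of rule described above, the following Python sketch builds a flow specification in the syntax accepted by `ovs-ofctl add-flow`: it matches IP traffic towards a destination subnet, rewrites the destination MAC address (`mod_dl_dst`) and sends the packet out of a given port. This is an assumption about how such a redirect could be expressed, not the rules actually deployed; the function name, subnet, MAC address, port number and priority are all hypothetical.

```python
import ipaddress
import re

_MAC_RE = re.compile(r"^([0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}$")


def build_redirect_flow(data_subnet: str, next_hop_mac: str,
                        out_port: int, priority: int = 100) -> str:
    """Build an ovs-ofctl flow spec that redirects traffic for data_subnet.

    The flow matches IPv4 packets whose destination lies in data_subnet,
    replaces the destination MAC (normally the default gateway's) with
    next_hop_mac, and outputs the packet on out_port.
    """
    network = ipaddress.ip_network(data_subnet)  # ValueError on malformed input
    if network.version != 4:
        raise ValueError("this sketch only handles IPv4 subnets")
    if not _MAC_RE.match(next_hop_mac):
        raise ValueError(f"not a MAC address: {next_hop_mac!r}")
    if out_port < 0:
        raise ValueError("out_port must be a non-negative OpenFlow port number")
    return (
        f"priority={priority},ip,nw_dst={network},"
        f"actions=mod_dl_dst:{next_hop_mac.lower()},output:{out_port}"
    )


if __name__ == "__main__":
    # Hypothetical values: subnet of the data providers, MAC of the router
    # interface on the high-bandwidth network, and the port facing it.
    print(build_redirect_flow("192.0.2.0/24", "AA:BB:CC:00:11:22", 2))
```

The resulting string would be passed to something like `ovs-ofctl add-flow <bridge> "<spec>"` on each hypervisor, where the bridge name depends on the local setup. Because the rewrite happens on the hypervisor's virtual switch, nothing changes inside the VMs.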

You can find some information in the talk given by Wojciech: “CMS Experience and plans”.

OpenStack summit in Portland & HEPiX Summer Workshop 2013

In less than 2 weeks we are going to present the current status of our OpenStack installation at two important conferences related to clouds and grids.

On the 15th of April Toni is going to Portland, where the next OpenStack summit will take place. He will give detailed information about the architecture of CMSoooooCloud, installation and configuration aspects, and the large-scale performance tests (about 1200 hypervisors under the control of one OpenStack cloud manager).

At the same time I am going to talk about the virtualization and reuse of resources in Bologna (Italy) at the High Energy Physics Unix Information Exchange (HEPiX). I will focus on the benefits of applying the cloud approach in a scientific environment. I will also cover aspects of network virtualization using Open vSwitch.

We hope to meet some of you there.

The CMS online Cloud

Find here the talk “The CMS openstack, openvswitch-ed, opportunistic, overlay, online-cluster Cloud (CMSoooooCloud)” about our CMS online Cloud, given at the International Symposium on Grids and Clouds (ISGC) 2013.

You will also find information on the CMS online Cloud, and on other efforts at CERN to use the clusters when they are not fully used, in the talk “Offline usage of online farms”, presented at the ALICE, ATLAS, CMS & LHCb joint workshop on DAQ@LHC.

 Have fun.

No ads in this blog. Google denied because of insufficient content

Hi,

Google denied my application to have ads through AdSense because of insufficient content… There is not much content yet, so I cannot dispute much, right?

After reading about the cookies that they put in your browser, I may not want you to be exposed to their ads either… I still have to decide, but I may not reapply to get ads in here.

 

The CMS online Cloud

Together with Wojciech Ozga we built an overlay cloud with OpenStack: the CMSoCloud or CMSoooCloud or CMSoooooCloud…

It has, when full, 1300 hypervisors (~13000 physical cores).

Consider voting for the presentation under this link. You will need to register with the community first. Follow this link to register.

Thanks in advance for your support!

To test our OpenStack infrastructure we decided that our CPU cycles should not be wasted, and Wojciech picked a project that helps study Alzheimer’s, Huntington’s, Parkinson’s, and many cancers. Help them too! Follow this link and help the Folding@home project. While testing the stability of our infrastructure, we managed to complete more than 57000 Work Units in only a few weeks, using more than 1000 hypervisors! Follow our progress at this link. Lately we have not been able to dedicate many CPU cycles… We are using the infrastructure ourselves… But we’ll try to continue helping…

This illustrates that one can donate CPU cycles to projects, and that a cloud infrastructure makes it easier. With an overlay cloud like ours, your computers can be useful even when you are not using them!