The CMS online Cloud (redundant and grizzly reborn) at CHEP2013

I presented the details of our new cloud, “The CMS openstack, opportunistic, overlay, online-cluster Cloud (CMSooooCloud)”, at the 20th International Conference on Computing in High Energy and Nuclear Physics (CHEP2013). You can find details there on our easily scalable and redundant infrastructure implementation.

The CMS online Cloud (redundant and grizzly reborn)

In August, we put our new implementation of the CMS online Cloud into production. I will present the results of the work by Olivier Chaze, Wojciech and myself at CHEP2013.

The new CMS online Cloud now features a complete, easily scalable and redundant infrastructure. We implemented:

  • a RabbitMQ cluster with replicated queues, as described by the RabbitMQ developers here;
  • a MySQL cluster, after minimally modifying the table definitions needed by the different OpenStack services;
  • Grizzly OpenStack services in active/active mode, using the corosync/pacemaker clusterware and round-robin aliases as a simple load balancer where needed (keystone, dashboard, APIs…). I want to thank Florian Haas here, who gave me guidelines for developing the clusterware agent scripts for the different services.
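
To illustrate the idea of a round-robin alias acting as a simple load balancer, here is a minimal Python sketch. It is a toy model, not our actual setup: the class name, the addresses and the mark_down/mark_up calls are all hypothetical, and the health marking stands in for whatever mechanism removes a failed node from the rotation.

```python
from itertools import cycle


class RoundRobinAlias:
    """Toy model of a round-robin alias: one name, several backend addresses."""

    def __init__(self, addresses):
        if not addresses:
            raise ValueError("an alias needs at least one address")
        self._addresses = list(addresses)
        self._down = set()
        self._rotation = cycle(self._addresses)

    def mark_down(self, address):
        # Hypothetical hook: something external decides a node has failed.
        self._down.add(address)

    def mark_up(self, address):
        self._down.discard(address)

    def resolve(self):
        # Try each address at most once per call, skipping failed ones.
        for _ in range(len(self._addresses)):
            candidate = next(self._rotation)
            if candidate not in self._down:
                return candidate
        raise RuntimeError("no healthy backend behind this alias")


# Hypothetical addresses for three nodes running the same API service.
keystone = RoundRobinAlias(["10.0.0.11", "10.0.0.12", "10.0.0.13"])
first_three = [keystone.resolve() for _ in range(3)]

keystone.mark_down("10.0.0.12")
after_failure = [keystone.resolve() for _ in range(4)]
```

In this sketch, requests are spread evenly over the three nodes while all are healthy, and once one node is marked down the remaining two keep answering in turn. Adding capacity amounts to adding one more address behind the alias.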

This made our cloud of 1300 hypervisors (~13000 cores) resilient to failures! The controlling infrastructure briefly described above scales nicely. I’ll post the link to the presentation after it is given.