The CMS online Cloud (redundant and Grizzly reborn) at CHEP2013

I presented the details of our new cloud, “The CMS openstack, opportunistic, overlay, online-cluster Cloud (CMSooooCloud)”, at the 20th International Conference on Computing in High Energy and Nuclear Physics (CHEP2013). The presentation covers the details of our easily scalable and redundant infrastructure implementation.

The CMS online Cloud (redundant and Grizzly reborn)

In August, we put our new implementation of the CMS online Cloud into production. I will present the results of the efforts by Olivier Chaze, Wojciech and myself at CHEP2013.

The new CMS online Cloud now features a complete, easily scalable and redundant infrastructure. We implemented:

  • A RabbitMQ cluster with replicated queues, as described by the RabbitMQ people here;
  • A MySQL cluster, after minimally modifying the definitions of the tables needed by the different OpenStack services;
  • Grizzly OpenStack services in active/active mode, using the corosync/pacemaker clusterware and Round Robin aliases as an easy load balancer where needed (keystone, dashboard, APIs…). I want to thank Florian Haas, who gave me guidelines for developing the clusterware agent scripts for the different services.
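To illustrate the Round Robin alias idea from the last item: one DNS alias points at several active service instances, and a client can walk through the addresses behind it until one answers. The following is only a minimal sketch under my own assumptions, not our actual configuration. The function names `resolve_alias` and `pick_backend`, the `is_alive` health check, and the example addresses are all hypothetical.

```python
import socket


def resolve_alias(alias, port):
    """Return the distinct IPv4 addresses behind a DNS alias, in resolver order."""
    infos = socket.getaddrinfo(alias, port, socket.AF_INET, socket.SOCK_STREAM)
    addresses = []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    for *_, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def pick_backend(addresses, is_alive, start=0):
    """Try addresses in round-robin order from `start`; return the first live one.

    `is_alive` is a caller-supplied health check (hypothetical), for example
    a function that attempts a TCP connection to the service port.
    """
    count = len(addresses)
    for offset in range(count):
        candidate = addresses[(start + offset) % count]
        if is_alive(candidate):
            return candidate
    raise RuntimeError("no backend reachable behind the alias")


if __name__ == "__main__":
    # Hypothetical addresses standing in for three active/active API nodes.
    nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
    # Pretend the first node is down; the client falls through to the next one.
    print(pick_backend(nodes, lambda ip: ip != "10.0.0.1"))
```

Because every instance is active, losing one node only removes one address from the rotation; the alias itself keeps answering.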

This made our cloud of 1300 hypervisors (~13000 cores) resilient to failures! The controlling infrastructure described briefly above scales nicely. I’ll post the link to the presentation after it is given.