Distributed computing: an ITU BR experience

By Pham N. Hai, Head, Broadcasting Services Division, and Andrea Manara, Systems Analyst, Broadcasting Services Division

This article discusses the ITU distributed computing experience developed together with EBU and CERN for the Regional Radiocommunication Conference 2006 (RRC-06).

Cooperation with EBU and CERN

A key ingredient for the success of the Conference was the unprecedented level of cooperation between ITU, the European Broadcasting Union (EBU) and the European Organization for Nuclear Research (CERN).

The complex planning activities conducted at this conference and during the intersessional period were based on software developed by EBU, which comprises hundreds of thousands of lines of code. In preparing the Plan for digital terrestrial broadcasting, ITU experts performed meticulous calculations within a limited timeframe using two independent infrastructures: the ITU distributed system with 100 PCs, and the CERN Grid infrastructure, which is based on a few hundred dedicated CPUs from several European institutions.

Distributed computing: an ITU BR experience

To meet its statutory obligations for the daily processing of space and terrestrial radio service notices, and to solve complex problems such as the planning of limited radio spectrum and orbit resources for treaty-related conferences, the ITU Radiocommunication Bureau (BR) needs to perform CPU-intensive calculations within a limited timeframe.

To reach this goal, the BR has built up extensive experience in developing and using distributed computing and related technologies, including grids. These technologies aim to reduce the overall processing time by splitting CPU-intensive calculations into smaller tasks and running those tasks in parallel on several PCs.

The early days

The current planning procedures for HF broadcasting services under Article 12 of the Radio Regulations, which came into force in 1999, required the BR to perform field strength and compatibility calculations for all HF broadcasting transmissions used throughout the world. These calculations (involving a large number of HF transmissions and test points) needed to be performed monthly to take into account different propagation conditions in the ionosphere. The total capacity required for the monthly processing was over 150 hours on a standard 1999 desktop PC.

To face this challenge, the BR decided to use staff desktop PCs available overnight. Those machines were grouped in a pool of up to 50 PCs, with a priority mechanism favouring those with higher processing power.

A “server” program, developed in Visual Basic and taking advantage of the Windows scheduling service, was designed to check the availability of the PCs, to subdivide calculations into smaller tasks and to submit those tasks to available PCs. This “server” program also retrieves the calculation results, verifies job completeness and prepares a run report.
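The two steps described above (subdividing the calculation and favouring the faster PCs in the pool) can be illustrated with a minimal Python sketch. It is not the BR's Visual Basic program: the `PC` record, its `speed` score, the chunk size and the function names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PC:
    name: str
    speed: float      # hypothetical relative processing power
    available: bool   # e.g. idle overnight


def split_into_tasks(n_items, chunk_size):
    """Split item indices 0..n_items-1 into consecutive (start, stop) chunks."""
    return [(start, min(start + chunk_size, n_items))
            for start in range(0, n_items, chunk_size)]


def pick_pcs(pool, max_pcs=50):
    """Return the available PCs, fastest first, capped at max_pcs."""
    ready = [pc for pc in pool if pc.available]
    return sorted(ready, key=lambda pc: pc.speed, reverse=True)[:max_pcs]


# Illustrative data only (not taken from the BR pool).
pool = [
    PC("pc-a", 1.0, True),
    PC("pc-b", 2.5, True),
    PC("pc-c", 3.0, False),   # fastest, but not available tonight
    PC("pc-d", 1.8, True),
]
tasks = split_into_tasks(10, 4)
chosen = pick_pcs(pool)
```

With this sample data, the ten items are cut into three chunks and the unavailable machine is skipped even though it is the fastest one in the pool.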

The HFBC distributed system is still in use today; however, it does not require as many computers as in 1999, thanks to faster PC processors and continued optimization of the calculation routines.

Planning for digital broadcasting services (RRC-06)

In 2002 the BR developed a distributed client-server computing system to speed up the processing of terrestrial notices in the frequency bands shared by terrestrial and space services. This system used limited resources, consisting of a few staff desktop PCs available overnight, and has been used successfully since then. It was greatly enhanced during the BR preparation for the Regional Radiocommunication Conference (RRC-06), which established in 2006 a new frequency plan for the introduction of digital broadcasting in the VHF (174-230 MHz) and UHF (470-862 MHz) bands.

The most time-consuming processing at the RRC-06 was the compatibility analysis, which evaluates the interference between broadcasting transmissions, and between broadcasting and other radiocommunication transmissions, to identify those that can share the same channel. This analysis needed to be performed in several iterations, each to be completed within 12 hours.

At the RRC-06, the total capacity required for this processing was estimated at several hundred CPU-days on a high-end 2006 PC. To meet those constraints the BR decided to buy a PC farm, which in its final configuration was composed of 84 high-end dedicated 3.6 GHz hyper-threading PCs.

The BR system (see Fig. 1) is a distributed client-server system that minimizes the idle time of each client PC: the server PC submits a new task to a client PC as soon as that client completes the processing of the previous one. The system is implemented with Perl scripts installed as Windows services and a custom communication protocol based on UDP/IP. The UDP packets carry information on the executable to be run and on the relevant input parameters. On the reliable internal network of the ITU farm, packet loss was not a problem. The server implemented two Windows services, a Listener and a Dispatcher, responsible for task submission, task management and workload balancing. The system automatically managed the task status and resubmitted the tasks that were not completed. The clients implemented two Windows services: the TaskManager, responsible for running tasks according to Dispatcher requests, and the TaskController, responsible for monitoring and control operations. A web application (implemented with ASP.NET and C#) running on a dedicated machine (WebInterface) provided monitoring and control interfaces to operate the system.
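The dispatching policy described above, where a client receives a new task as soon as it finishes the previous one and uncompleted tasks are resubmitted, can be illustrated with a small simulation. This is a sketch in Python, not the BR's Perl implementation: the function name, the `fail_once` parameter and the sample durations are hypothetical, and the network protocol is not modelled at all.

```python
import heapq
from collections import deque


def simulate_dispatch(durations, n_clients, fail_once=frozenset()):
    """Simulate a server that hands a task to a client as soon as it is free.

    durations: processing time of each task (task id = list index).
    n_clients: number of client PCs, assumed to be at least 1.
    fail_once: ids of tasks whose first attempt does not complete; the
               failure is noticed when the attempt ends and the task is
               put back on the queue (hypothetical failure model).
    Returns (time at which the last task ends, attempts per task).
    """
    pending = deque(range(len(durations)))
    attempts = {task: 0 for task in range(len(durations))}
    idle = list(range(n_clients))
    running = []  # heap of (finish_time, client, task)
    now = 0.0
    while pending or running:
        # Give every idle client a task while work is pending.
        while pending and idle:
            task = pending.popleft()
            attempts[task] += 1
            heapq.heappush(running, (now + durations[task], idle.pop(), task))
        # Advance the clock to the next client that finishes.
        now, client, task = heapq.heappop(running)
        idle.append(client)
        if task in fail_once and attempts[task] == 1:
            pending.append(task)  # not completed: resubmit
    return now, attempts


# One long task and four short ones on two clients (illustrative values).
makespan, attempts = simulate_dispatch([4, 1, 1, 1, 1], n_clients=2)
```

In this example a fixed split of the list into two halves would keep one client busy for 6 time units while the other finishes after 2; handing out tasks on demand lets the second client absorb all the short tasks, so both finish after 4 units.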

Figure 1: Architecture of the BR distributed system

Accurate measurements showed that hyper-threading reduces computing time by about 30% when two tasks are run in parallel on one PC, compared with running the same two tasks sequentially. The BR system at the RRC-06 was able to run 168 tasks in parallel. The distribution of task processing times for the second RRC-06 iteration is shown in Fig. 2.
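As a worked illustration of these figures, the sketch below assumes that the 30% gain means a pair of tasks run in parallel takes about 70% of the time needed to run them one after the other. The 100-second task time is a hypothetical value; only the 30% gain and the 84-PC farm size come from the text.

```python
def ht_pair_time(t_single, gain=0.30):
    """Elapsed time for two equal tasks run in parallel on one
    hyper-threading PC, assuming the pair takes (1 - gain) of the
    sequential time 2 * t_single."""
    return (1.0 - gain) * 2.0 * t_single


pcs = 84                               # farm size given in the text
parallel_tasks = pcs * 2               # two tasks per hyper-threading PC

t_single = 100.0                       # hypothetical task time in seconds
sequential_time = 2.0 * t_single       # both tasks one after the other
pair_time = ht_pair_time(t_single)     # both tasks in parallel
throughput_ratio = sequential_time / pair_time
```

Under this reading, two tasks per PC give the 168 parallel tasks quoted above, and each PC completes roughly 1.4 times as many tasks per hour as it would without hyper-threading.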

Figure 2: Distribution of task execution time (RRC-06 iteration 2)

The ITU system at the RRC-06 ran more than 180 thousand tasks for an overall integrated elapsed time of 4500 CPU-hours, i.e. more than half a CPU-year.
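The conversion behind these totals is simple arithmetic, spelled out below. The task count is treated as exactly 180 000, which is an assumption: the text says "more than", so the mean task time computed here is an upper bound.

```python
cpu_hours = 4500           # total integrated elapsed time from the text
tasks = 180_000            # "more than 180 thousand": assumed lower bound

cpu_days = cpu_hours / 24                  # 187.5 CPU-days
cpu_years = cpu_days / 365                 # a little over half a CPU-year
mean_task_seconds = cpu_hours * 3600 / tasks   # at most 90 s per task
```

This confirms the "more than half a CPU-year" figure and shows that a typical task was short, on the order of a minute and a half at most on average.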

At the RRC-06, to extend the computing capacity and improve dependability, the BR engaged in a collaboration with the European Organization for Nuclear Research (CERN)[1] to deploy a distributed system based on the EGEE Grid (Enabling Grids for E-sciencE)[2] and supported by the CERN IT department. The EGEE Grid has been designed and operated mostly for batch processing. The EGEE middleware services integrate computing farms and batch queues into a single, globally distributed system. Access to the distributed resources is typically controlled by fair-share mechanisms, which ensure that groups of users consume resources according to predefined policies. This architecture is suitable for high-throughput computing but is not efficient for the low-latency, dependable computing required by the RRC-06 processing application.

To meet the RRC-06 requirements we used a high-level tool developed within CERN IT (the Diane system[3]) to control the job workload on the EGEE Grid infrastructure.

This allowed us to use the Grid successfully for the RRC-06, and also demonstrated that the Grid can serve as a low-latency, dependable resource. The BR personnel needed only limited support and training to adopt the Grid technology for the RRC-06. This demonstrates the maturity of Grid technology for use in new scientific communities and technical activities.

Future plans

In our effort to stay at the forefront of technology, and in order to be prepared for future events that may require even more computing capacity than recent conferences, we are also interested in investigating additional emerging computing paradigms, such as cloud computing, where dynamically scalable resources are provided as a service over the Internet. For this purpose, we are planning a pilot system to which Member States could submit time-consuming, large-scale frequency compatibility calculations via the web. This system could then perform those calculations using a local distributed infrastructure, grid resources or cloud resources. This would also present an opportunity for future collaboration between ITU and CERN.

See ITU News articles on distributed computing:

Distributed computing

See RRC-06 Closing Press Release:

Digital broadcasting set to transform communication landscape by 2015 (Accord is major step in implementing World Summit on the Information Society objectives)

See arXiv.org (Cornell University Library):

Dependable Distributed Computing for the ITU Regional Radiocommunication Conference (RRC-06)

______________
Footnotes:

[1] CERN: http://public.web.cern.ch/public

[2] Enabling Grids for E-sciencE (EGEE): http://www.eu-egee.org

[3] Distributed Analysis Environment: http://cern.ch/diane