This article discusses the ITU distributed computing
experience developed together with EBU and CERN for the Regional
Radiocommunication Conference 2006 (RRC-06).
Cooperation with EBU and CERN
A key ingredient for the success of the Conference was the unprecedented level
of cooperation between ITU, the European Broadcasting Union (EBU) and the
European Organization for Nuclear Research (CERN).
The complex planning activities conducted at this conference and during the
intersessional period were based on software developed by EBU, which comprises
hundreds of thousands of lines of code. In preparing the Plan for digital
terrestrial broadcasting, ITU experts performed meticulous calculations within
a limited timeframe using two independent infrastructures: the ITU distributed
system with 100 PCs, and the CERN Grid infrastructure, based on a few hundred
dedicated CPUs from several European institutions.
Distributed computing: an ITU BR experience
To meet its statutory obligations for daily processing of space and terrestrial
radio service notices, and to solve complex problems such as the planning of
limited radio spectrum and orbit resources for treaty-related conferences, the
ITU Radiocommunication Bureau (BR) needs to perform CPU-intensive calculations
in a limited timeframe.
To reach this goal the BR has built up extensive experience in developing and
using distributed computing and related technologies, including grids. These
technologies aim to reduce the overall processing time by splitting
CPU-intensive calculations into smaller tasks and running those tasks in
parallel on several PCs.
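The split-and-run-in-parallel idea can be illustrated with a minimal Python sketch. It is not the BR software: the function names, the chunk size and the placeholder calculation (a sum of squares) are assumptions made for illustration, and a thread pool stands in for the separate PCs.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tasks(items, task_size):
    """Cut one large workload into consecutive chunks of at most task_size items."""
    return [items[i:i + task_size] for i in range(0, len(items), task_size)]


def run_task(chunk):
    """Placeholder for one CPU-intensive calculation (hypothetical: sum of squares)."""
    return sum(x * x for x in chunk)


def run_distributed(items, task_size, workers=4):
    """Run all tasks concurrently and combine the partial results.

    Threads stand in for the separate PCs of a real distributed system.
    """
    tasks = split_into_tasks(items, task_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(run_task, tasks))
    return sum(partial_results)


if __name__ == "__main__":
    workload = list(range(1000))
    print(run_distributed(workload, task_size=100))
```

The result does not depend on the chunk size or the number of workers, which is the property that makes this kind of calculation easy to distribute.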
The early days
The current planning procedures for HF broadcasting services under Article 12
of the Radio Regulations, which came into force in 1999, required the BR to
perform field strength and compatibility calculations for all HF broadcasting
transmissions used throughout the world. These calculations (involving a large
number of HF transmissions and test points) needed to be performed monthly to
take into account different propagation conditions in the ionosphere. The total
capacity required for the monthly processing was over 150 hours on a standard
1999 desktop PC.
To meet this challenge the BR decided to use staff desktop PCs available
overnight. Those machines were grouped in a pool of up to 50 PCs, with a
priority mechanism favoring those with higher processing power.
A “server” program, developed in Visual Basic and taking advantage of the
Windows scheduling service, was designed to check the availability of each PC,
to subdivide calculations into smaller tasks and to submit those tasks to
available PCs. This “server” program also retrieves the calculation results,
verifies job completeness and prepares a run report.
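The server-side steps described above (availability check, preference for faster PCs, task assignment, completeness check and run report) can be sketched as follows. The original program was written in Visual Basic and its internals are not described here, so everything in this Python sketch is a hypothetical illustration: the `PoolPC` record, the relative `speed` score and the round-robin assignment are assumptions, not the BR implementation.

```python
from dataclasses import dataclass


@dataclass
class PoolPC:
    name: str
    speed: float      # hypothetical relative processing power
    available: bool   # True when the staff PC is idle, e.g. overnight


def pick_pcs(pool, needed):
    """Return up to `needed` available PCs, fastest first."""
    idle = sorted((pc for pc in pool if pc.available),
                  key=lambda pc: pc.speed, reverse=True)
    return idle[:needed]


def assign_tasks(tasks, pcs):
    """Hand the tasks out round-robin over the selected PCs."""
    if not pcs:
        raise ValueError("no PC available")
    plan = {pc.name: [] for pc in pcs}
    for i, task in enumerate(tasks):
        plan[pcs[i % len(pcs)].name].append(task)
    return plan


def run_report(tasks, results):
    """Completeness check: list the tasks for which no result came back."""
    missing = [t for t in tasks if t not in results]
    return {"submitted": len(tasks),
            "completed": len(tasks) - len(missing),
            "missing": missing}


if __name__ == "__main__":
    pool = [PoolPC("pc-a", 1.0, True), PoolPC("pc-b", 2.5, True),
            PoolPC("pc-c", 3.0, False), PoolPC("pc-d", 1.8, True)]
    tasks = ["t1", "t2", "t3"]
    chosen = pick_pcs(pool, needed=2)
    print(assign_tasks(tasks, chosen))
    print(run_report(tasks, results={"t1": 0.1, "t2": 0.2}))
```

In this sketch the fastest PC in the pool is skipped because it is not available, which is the kind of situation a pool of staff machines has to handle every night.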
The HFBC distributed system is still in use today; however, it does not require
as many computers as in 1999, thanks to faster PC processors and continued
optimization of the calculation routines.
Planning for digital broadcasting services (RRC-06)
In 2002 the BR developed a distributed client-server computing system in order
to speed up the processing of terrestrial notices in the frequency bands shared
by terrestrial and space services. This system used limited resources,
consisting of a few staff desktop PCs available overnight, and has been used
successfully since then. It was greatly enhanced during the BR preparation for
the Regional Radiocommunication Conference (RRC-06), which established in 2006
a new frequency plan for the introduction of digital broadcasting in the VHF
(174-230 MHz) and UHF (470-862 MHz) bands.
The most time-consuming processing at the RRC-06 was the compatibility analysis,
which evaluates the interference between broadcasting transmissions, and
between broadcasting and other radiocommunication transmissions, to identify
those that can share the same channel. This analysis needed to be performed in
several iterations, each of them to be completed within 12 hours.
At the RRC-06, the total capacity required for the processing was estimated at
several hundred CPU-days on a high-end 2006 PC. To meet those constraints the BR
decided to buy a PC farm, which in its final configuration was composed of 84
high-end dedicated 3.6 GHz hyper-threading PCs.
The BR system (see Fig. 1) is a distributed client-server system that minimizes
the idle time of each client PC by having the server PC submit a new task to an
individual client PC as soon as the latter completes the processing of the
previous one. The system is implemented with Perl scripts installed as Windows
services and a custom communication protocol based on UDP/IP. The UDP packets
carried information on the executable to be run and on the relevant input
parameters. On the reliable internal network of the ITU farm, packet loss was
not a problem. The server implemented two Windows services, a Listener and a
Dispatcher, responsible for task submission, task management and workload
balancing. The system automatically managed the task status and resubmitted the
tasks that were not completed. The clients implemented two Windows services: the
TaskManager, responsible for running tasks according to Dispatcher requests, and
the TaskController, responsible for monitoring and control operations. A web
application (implemented with ASP.NET and C#) running on a dedicated machine
(WebInterface) provided monitoring and control interfaces to operate the system.
Figure 1: Architecture of the BR distributed system
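The dispatching policy described above (hand a new task to a client as soon as it finishes the previous one, and resubmit tasks that did not complete) can be illustrated with a small simulation. This is a hedged sketch in Python rather than the Perl services used by the BR system: the `dispatch` function, the task durations and the failure model (a lost first attempt is only noticed at its expected completion time) are assumptions made for illustration.

```python
import heapq
from collections import deque


def dispatch(durations, n_clients, fail_once=()):
    """Simulate a dispatcher that keeps every client busy.

    durations: dict mapping task id -> processing time
    fail_once: task ids whose first attempt is lost and must be resubmitted
    Returns (makespan, attempts), where attempts maps task id -> number of runs.
    """
    if n_clients < 1:
        raise ValueError("at least one client is required")
    pending = deque(durations)            # tasks waiting to be submitted
    attempts = {task: 0 for task in durations}
    to_fail = set(fail_once)
    running = []                          # heap of (finish_time, client, task)
    idle = list(range(n_clients))
    now = 0.0

    while pending or running:
        # Give every idle client a task immediately.
        while idle and pending:
            task = pending.popleft()
            attempts[task] += 1
            heapq.heappush(running, (now + durations[task], idle.pop(), task))
        # Advance the clock to the next completion and free that client.
        now, client, task = heapq.heappop(running)
        idle.append(client)
        if task in to_fail:
            # The attempt did not complete: put the task back in the queue.
            to_fail.discard(task)
            pending.append(task)
    return now, attempts


if __name__ == "__main__":
    durations = {"a": 3, "b": 1, "c": 1, "d": 1}
    print(dispatch(durations, n_clients=2))
    print(dispatch(durations, n_clients=2, fail_once={"b"}))
```

With two clients, the three short tasks run back to back on one client while the long task occupies the other, so neither client waits for a batch boundary.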
Accurate measurements showed that hyper-threading saves about 30% in computing
time when two tasks are run in parallel on one PC, compared with running the
same tasks sequentially. The BR system at the RRC-06 was able to run 168 tasks
in parallel (two on each of the 84 PCs). The distribution of task processing
times for the second RRC-06 iteration is shown in Fig. 2.
Figure 2: Distribution of task execution time (RRC-06 iteration 2)
The ITU system at the RRC-06 ran more than 180 thousand tasks for an overall
integrated elapsed time of 4500 CPU-hours, i.e. more than half a CPU-year.
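The figures quoted for the RRC-06 farm can be cross-checked with a few lines of arithmetic. The inputs below are taken from the text; the derived mean task time and the reading of the 30% hyper-threading gain (two tasks in parallel taking about 70% of the time needed to run them one after the other) are our interpretation, not measured values.

```python
PCS = 84                   # dedicated PCs in the final farm configuration
TASKS_PER_PC = 2           # two tasks in parallel per hyper-threading PC
TOTAL_TASKS = 180_000      # "more than 180 thousand tasks" (lower bound)
TOTAL_CPU_HOURS = 4500     # overall integrated elapsed time
HT_GAIN = 0.30             # about 30% saved by running two tasks in parallel

parallel_slots = PCS * TASKS_PER_PC
cpu_years = TOTAL_CPU_HOURS / (24 * 365)
# Upper bound on the mean, since the task count is a lower bound.
mean_task_seconds = TOTAL_CPU_HOURS * 3600 / TOTAL_TASKS
# Assumed reading: a pair of tasks takes (1 - HT_GAIN) of the sequential
# time, so the throughput of one PC rises by this factor.
ht_throughput_factor = 1 / (1 - HT_GAIN)

if __name__ == "__main__":
    print(f"parallel task slots : {parallel_slots}")
    print(f"CPU-years           : {cpu_years:.2f}")
    print(f"mean task time (s)  : {mean_task_seconds:.0f}")
    print(f"throughput factor   : {ht_throughput_factor:.2f}")
```

The numbers are consistent with one another: 84 PCs give 168 slots, 4500 CPU-hours is about 0.51 CPU-years, and the average task lasted at most about 90 seconds.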
At RRC-06, to extend the computing capacity and improve dependability, the BR
collaborated with the European Organization for Nuclear Research (CERN) to
deploy a distributed system based on the EGEE Grid (Enabling Grids for
E-SciencE) and supported by the CERN IT department. The EGEE Grid was designed
and is operated mostly for batch processing. The EGEE middleware services
integrate computing farms and batch queues into a single, globally distributed
system. Access to the distributed resources is typically controlled by
fair-share mechanisms, which ensure that groups of users consume resources
according to predefined policies. This architecture is suitable for
high-throughput computing but is not efficient for the low-latency, dependable
computing required by the RRC-06 processing application.
To meet the RRC-06 requirements we used a high-level tool developed within CERN
IT (the Diane system) to control the job workload on the EGEE Grid
infrastructure.
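This article does not describe how Diane works internally. Assuming it follows the master-worker pattern commonly associated with it, in which worker agents started as Grid jobs pull tasks from a master instead of each task waiting in a batch queue, the idea can be sketched in Python as below. The sketch is a generic illustration with hypothetical names; it does not use or reproduce the Diane API, and threads stand in for worker agents on Grid nodes.

```python
import queue
import threading


def run_master_worker(tasks, n_workers, work):
    """Run `work` on every task using workers that pull from a shared queue.

    The master side fills the queue once; each worker keeps asking for the
    next task until none is left, so a slow worker never blocks a fast one.
    """
    todo = queue.Queue()
    for task in tasks:
        todo.put(task)
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task = todo.get_nowait()
            except queue.Empty:
                return                      # no work left: the agent exits
            value = work(task)
            with lock:
                results[task] = value

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(run_master_worker(range(5), n_workers=3, work=lambda x: x * x))
```

Under this assumption the batch-queue delay is paid once per worker agent rather than once per task, which is what makes a batch-oriented Grid usable for short tasks with tight deadlines.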
This allowed us to use the Grid successfully for the RRC-06 and also
demonstrated that the Grid can serve as a low-latency, dependable resource. The
BR personnel needed only limited support and training to adopt the Grid
technology for RRC-06, which demonstrates the maturity of Grid technology for
use in new scientific communities and technical activities.
Future plans
In our effort to stay at the forefront of technology, and in order to be
prepared for future events which may require even more computing capacity than
recent conferences, we are also interested in investigating additional emerging
computing paradigms, such as cloud computing, where dynamically scalable
resources are provided as a service over the Internet. For this purpose, we are
planning a pilot system through which time-consuming, large-scale frequency
compatibility calculations could be submitted by Member States via the web.
This system could then perform those calculations using a local distributed
infrastructure, grid resources or cloud resources. This would also present an
opportunity for future collaboration between ITU and CERN on this issue.