Scientific Computing Laboratory
 
EGEE-III Service Activities
 
 
The Service Activities in EGEE-III build upon the experience gained and the infrastructure deployed in the predecessor projects EGEE and EGEE-II. This infrastructure is a leading global Grid, in terms both of the scale of resources provided and the number of user communities supported.

The service activities within EGEE-III are aimed at ensuring that the Grid infrastructure delivers a service that focuses on enabling and supporting science in diverse research communities while taking appropriate steps towards a sustainable infrastructure in Europe. This will be achieved through provision of a production infrastructure (SA1), coordination of networking support with GÉANT2 and the NRENs (SA2), and provision of a middleware distribution to the production infrastructure as well as to related efforts worldwide (SA3). The EGEE middleware distribution (gLite) combines components from different providers, most importantly the EGEE middleware engineering activity JRA1, but also the Virtual Data Toolkit (VDT) distribution of OSG and application projects such as LCG. The components are chosen to satisfy the requirements of the EGEE user communities and operations. Interoperability is one of the drivers of the gLite distribution, as allowing access to a diverse set of research infrastructures is a major goal of EGEE-III. These infrastructures comprise in particular related efforts such as SEE-GRID-SCI, BalticGrid, NorduGrid, and DEISA in Europe, OSG and TeraGrid in the US, and NAREGI in Japan.

Based on the existing EGEE-II procedures, EGEE-III will provide the following infrastructures in order to carry out its mission:
  • A Production Service infrastructure, with incremental growth anticipated within the existing structure, and expanded through collaborating infrastructure projects listed in section 3. Interoperability with other Grid infrastructures will evolve at all levels from campus to international.
  • A Pre-Production Service (PPS), which will demonstrate new services, or new versions of existing services, before they move to production. This will provide an environment for applications to test new services and to integrate their software with Grid services. The PPS has also shown itself to be invaluable for deployment testing in the Regional Operations Centres (ROCs) before full distribution of new or updated services.
  • The EGEE Network Operations Centre (ENOC), which handles the operational network coordination between EGEE and the network providers (GÉANT2/NRENs).
This is complemented by the training infrastructure and the certification test-beds as well as the needed support structures and policy groups.

The EGEE Service Activities consist of three activities:
  • SA1: Grid Operations
  • SA2: Networking support
  • SA3: Integration, Testing and Certification
Each of these is summarised below. The figure illustrates interactions between the activities.

EGEE-III Service Activities and Joint Research Activities


SA1: Grid Operations

The principal objective of the activity is to operate the EGEE production infrastructure, providing a high quality service to the application groups. The operational procedures and tools will be enhanced, and structural changes needed for the transition to a sustainable mode will be implemented. This is made possible by a number of support structures including the Regional Operations Centres (ROCs), the Global Grid User Support (GGUS) for user support, the EGEE Network Operations Centre (ENOC), Certification and Testing (with SA3), and groups for Grid security coordination.

The Operations activities in EGEE-III will be firmly based on the work done in EGEE and EGEE-II, with some adjustments to improve the overall responsiveness and to address problem areas. However, no major changes in the infrastructure or basic mechanisms are anticipated. In the EGEE-III project lifetime it will be important to set the groundwork for an eventual migration to the EGI/NGI model, which is today understood to be based on coordination at the European level of National Grid Infrastructures. This transition clearly cannot happen in one go, and the migration will need to be carefully planned and understood. It is important, therefore, that EGEE-III plans and tests possible transitional organisational structures. Here we list the organisational components that we either understand as necessary in EGEE-III or that seem to be necessary in a transition:
  • Operations Coordination. The Operations Coordination Centre (OCC) at CERN will remain in basically the same form as in EGEE-II. The existing roles and functions will continue to be necessary.
  • Operations Centres. The concept of the Regional Operations Centres (ROCs) has been shown to work well during the first two phases of EGEE. In particular, the “Operator on Duty” rotation is an essential part of the core Grid operation. This structure will be retained intact in EGEE-III. During EGEE-III we will plan a transition to an operational model based on National Grid Infrastructures. This will also require reducing the effort spent on daily operations activities, hence in EGEE-III there will be a strong focus on automating the tools and processes involved. The operations activity will put strong requirements on the service management aspects of the middleware to ensure that services are as reliable and as straightforward to manage as possible.
Experience has shown that a Regional Operations Centre should manage at least 10 sites. With fewer than this it is difficult to justify setting up the organisation and the incremental staffing required. During EGEE-III the focus will be on how the ROCs can manage (or coordinate) more sites with a given level of effort, with a strong emphasis on the tools available to do this. The existing ROCs will be retained with one in each of the Federations, with the exception that the new Nordic and Benelux federations will continue to collaborate in providing a single ROC covering the Nordic and Benelux regions. Thus, although there are now 12 Federations proposed for EGEE-III, there will be 11 ROCs as in EGEE-II.
  • Security. A set of existing security and policy groups will be maintained and evolved. These are: the Joint Security Policy Group; the Operational Security Coordination Team; the Grid Security Vulnerability Group; and the EuGridPMA/IGTF work. In addition, it is anticipated that basic site auditing to verify security best practices, use of appropriate and adequate intrusion monitoring tools, etc. will be a task of the security groups. Overall security coordination is through the Security Coordination Group (SCG).
  • Support activities. The scope of these is clear from EGEE/EGEE-II, and the activities will be retained and strengthened. Many of the problems seen in support arise from a lack of experienced or trained support staff. This is a vital area to strengthen. The activities include:
    • Operations support – based on the GGUS infrastructure. This will be focused by regular meetings of a user-driven advisory group and workshops to ensure that the needs of users are understood and responded to.
    • User Support (helpdesk/call centre) – each ROC will provide user support effort. In addition, it is important that teams in the VOs or major applications provide the front line for their communities. This will be complemented by effort in NA4.
    • VO Support: teams within the applications communities providing advice and help, acting as front-line user support.
    • Application integration teams. These teams will be located together supporting application communities or groups of communities. These SA1 teams will collaborate and share experiences with the application support teams in NA4.
The services and test-beds are supported through a full set of procedures and support organisations that have evolved and matured during EGEE and EGEE-II. These include:
  • Operational support mechanisms, managed through the Operations Coordination Centre (OCC) - ROC hierarchy;
  • User support mechanisms, also managed through the OCC-ROCs;
  • Coordination with network support through the EGEE Network Operations Centre (ENOC, in SA2);
  • Grid Security at both the operational and policy levels;
  • Oversight and coordination of allocation of resources through a Resource Allocation Group.

SA2: Networking support

For EGEE-III, the objective of the SA2 activity is to interface between the EGEE infrastructure and the NRENs and GÉANT2. More specifically, the goals are two-fold:
  • Ensure the daily operational interface between the infrastructures including notably the information exchange between the network operational entities and the Grid operations and the network user support in the EGEE operational model;
  • Ensure that the applications' network requirements are fulfilled and that new network functionalities (such as network Quality of Service or IPv6) are advertised to the EGEE users and provided in the EGEE infrastructure.
The network infrastructure is a major building block of the Grid infrastructure, which is often presented as an “overlay network” of sites and services that relies on the underlying network for its proper running. This activity acts as an interface with the network providers that connect all the computing and storage resource providers. During the first two phases of the EGEE project, the role of this interface has been achieved in four ways, and they will continue in EGEE-III:
  • The Technical Network Liaison Committee (TNLC) is a committee including the NRENs involved in the EGEE project plus GÉANT2. This is where the dialogue between the two communities occurs (for instance about new requirements and new services), where technical issues are discussed, and where the stakeholders propose new actions to improve the collaboration (for instance about the standardisation of trouble ticket exchanges);
  • The effort made by EGEE to prepare the use of advanced network services through Service Level Agreements with the network providers enables applications to use such services. The expected deployment of automatic mechanisms in the network (the GÉANT2 Advance Multi-domain Provisioning System or AMPS) will be a great step forward towards an increased usage of the services and a wider adoption by application users. We expect that application developers will take advantage of advanced network services to improve their workflow.
  • The EGEE Network Operations Centre (ENOC) is the dedicated entity that plays the role of the daily operational interface between its counterparts in each NREN and the operational support of EGEE. The concept of a transversal entity able to coordinate the actions of various operational groups in multiple different administrative domains has proven its usefulness and reliability. It has now been adopted by GÉANT2 with the End-to-End Coordination Unit (E2ECU) to support the various project-dedicated end-to-end links provided by the European NRENs and GÉANT2.
  • There is also a requirement to support IPv6 within EGEE. Expertise is needed to build and run a functional Grid IPv6 testbed and to provide IPv6 testing and certification methodologies for developers and for testing and certification teams, in order to validate the middleware's compliance in an IPv6 environment.

SA3: Integration, Testing and Certification

SA3 will manage the process of building deployable and documented gLite middleware distributions. Its main objectives are to:
  • Produce well-tested and documented gLite releases together with associated configuration tools;
  • Improve the multi-platform support of gLite;
  • Increase interoperability of different Grid infrastructures by working towards best practices and established standards and provide input to standardisation bodies.
The goal of the SA3 activity is to manage and coordinate the process of building deployable and documented middleware distributions, called gLite, starting with the integration of middleware packages and components from a variety of sources. The activity will refine the criteria for accepting components, which were defined and documented in EGEE-II. It will run an integration and build infrastructure using the results of the ETICS project as much as possible, and will cooperate with potential projects providing adequate services or tool sets.

To ensure that the middleware is reliable, robust, scalable, and as usable as possible, a testing and certification activity will be run. SA3 will focus the effort on foundation middleware, essential core components on which complex higher level services are constructed (see also the related description in JRA1).

Following the successful component-based release model introduced in EGEE-II, the goal of each update will be the provision of a deployable gLite distribution, focusing on making the components in the distribution work effectively for users when deployed. This versioned middleware distribution will be available for other interested parties, especially for related projects such as SEE-GRID-SCI, BalticGrid, EELA and several others. These related projects often adapt the gLite middleware releases to meet their specific local needs. To ease this, it is important that the releases are as modular as possible. In addition, support for multiple platforms and operating systems is essential. Apart from the currently supported Scientific Linux (a Red Hat Enterprise Linux variant), other versions of Linux and other operating systems need to be supported on both 32-bit and 64-bit platforms. The selection of platforms to be supported and the prioritisation has to be driven by users and infrastructures via the Technical Management Board (TMB). Given the number of different platforms and the overall resource level of SA3 and JRA1, the project will focus on providing adequate subsets of components for a given platform.

The SA3 activity decouples the production of deployable middleware distributions from the middleware developments as far as necessary to ensure an effective certification and allow the integration of best matching components, independent of their origin. This is crucial at this point in the EGEE programme, as the focus must be on making the infrastructure that now exists as reliable and robust as possible. Further middleware and services development will be driven by need and utility as determined by the users and operations group via the TMB, which has already been driving the functional development during EGEE-II. SA3 will have developers who work within the team in order to provide sufficient capacity and competence to identify complex bugs and develop extensive tests for scalability. It is expected that these developers will undertake small development efforts to “glue” together middleware components, provide missing minor tools or temporary solutions, carry out small modifications and link with external developers. Larger developments will be negotiated with the JRA1 activity and with other middleware providers (such as Open Science Grid (VDT), etc.) under the supervision of the TMB.

While in EGEE-II JRA1 and SA3 have been loosely coupled, a significant part of the testing and release preparation work in EGEE-III will be carried out by SA3 partners working closely with JRA1 partners, together forming Clusters of Competence. This is in line with the concept of component-based releases that has been developed and applied successfully during EGEE-II. The goal is to minimise losses during times of rapid change, while ensuring that the middleware fulfils high standards of deployability and usability.