The ARC Compute Element (CE) is a Grid front-end on top of a conventional computing resource (e.g. a Linux cluster or a standalone workstation).
The Argus Authorization Service renders consistent authorization decisions for distributed services (e.g., user interfaces, portals, computing elements, storage elements). The service is based on the XACML standard and uses authorization policies to determine whether a user is allowed or denied to perform a certain action on a particular service.
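As an illustrative sketch, Argus policies can be written in its Simplified Policy Language, which is rendered into XACML internally; the resource and action identifiers and the subject below are placeholders:

```
resource "http://example.org/wn" {
    action "http://glite.org/xacml/action/execute" {
        # Members of the VO may execute jobs on this resource
        rule permit { vo = "myvo" }
        # A specific certificate subject is explicitly banned
        rule deny { subject = "CN=Banned User,O=Example" }
    }
}
```

Rules are evaluated in order, so a matching deny placed before a broader permit takes precedence.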
The following documentation mainly covers existing batch system middleware integration components, together with some general links to the most popular batch systems used within WLCG. In particular, CREAM CE integration with Torque, LSF, GE and SLURM is well documented, as is the EMI implementation of the TORQUE batch system.
The CREAM (Computing Resource Execution And Management) Service is a simple, lightweight service for job management operation at the Computing Element (CE) level.
CREAM accepts job submission requests (described with the same JDL language used for jobs submitted to the Workload Management System) and other job management requests (e.g. job cancellation, job monitoring, etc.).
CREAM can be used by the Workload Management System (WMS), via the ICE service, or by a generic client, e.g. an end user wishing to submit jobs directly to a CREAM CE. For the latter use case a command line interface (CLI) is available.
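As an illustrative sketch, a minimal JDL description of a job might look like the following (the sandbox file names are arbitrary):

```
[
  Executable    = "/bin/hostname";
  StdOutput     = "std.out";
  StdError      = "std.err";
  OutputSandbox = { "std.out", "std.err" };
]
```

Saved as hostname.jdl, this could be submitted with the CREAM CLI, e.g. `glite-ce-job-submit -a -r ce.example.org:8443/cream-pbs-grid hostname.jdl`, where -a requests automatic proxy delegation and -r names the CE endpoint and queue (the endpoint here is a placeholder).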
CREAM exposes a web service interface.
CERNVM File System (CVMFS) is a network file system based on HTTP and optimized to deliver experiment software in a fast, scalable, and reliable way. Files and file metadata are aggressively cached and downloaded on demand. Thereby the CernVM-FS decouples the life cycle management of the application software releases from the operating system.
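On a node with the CVMFS client installed, repositories are mounted on demand under /cvmfs. An illustrative check (the repository name is just an example):

```shell
# Verify that the client can mount and reach the repository
cvmfs_config probe atlas.cern.ch

# Repository contents appear as a read-only directory tree
ls /cvmfs/atlas.cern.ch
```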
dCache is a system for storing and retrieving huge amounts of data, distributed among a large number of heterogeneous server nodes, under a single virtual filesystem tree with a variety of standard access methods.
The Disk Pool Manager (DPM) is a lightweight storage solution for grid sites. It offers a simple way to create a disk-based grid storage element and supports relevant protocols (SRM, gridFTP, RFIO) for file management and access.
It focuses on manageability (ease of installation, configuration, low maintenance effort), while providing all the functionality required of a grid storage solution (support for multiple disk server nodes, different space types, multiple file replicas in disk pools).
EOS is a disk-based service providing a low latency storage infrastructure for physics users. EOS provides a highly-scalable hierarchical namespace implementation. Data access is provided by the XROOT protocol.
The main target area for the service is physics data analysis, which is characterized by many concurrent users, a significant fraction of random data access and a large file open rate.
For user authentication EOS supports Kerberos (for local access) and X.509 certificates (for grid access). To ease experiment workflow integration, SRM as well as gridFTP access are provided. EOS further supports the XROOT third-party copy mechanism from/to other XROOT-enabled storage services at CERN.
FTS3 is the service responsible for globally distributing the majority of the LHC data across the WLCG infrastructure. It is a low level data movement service, responsible for reliable bulk transfer of files from one site to another while allowing participating sites to control the network resource usage.
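For illustration, a single transfer can be submitted and tracked with the FTS3 command-line client; the server endpoint and SRM URLs below are placeholders:

```shell
# Submit a file transfer job to an FTS3 server; prints a job ID
fts-transfer-submit -s https://fts3.example.org:8446 \
    srm://source.example.org/dpm/example.org/home/myvo/file1 \
    srm://dest.example.org/dpm/example.org/home/myvo/file1

# Poll the state of the job using the returned ID
fts-transfer-status -s https://fts3.example.org:8446 <job-id>
```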
GFAL (Grid File Access Library) is a C library providing an abstraction layer over the complexity of grid storage systems.
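For illustration, the gfal2-util command-line tools built on top of the library expose the same abstraction: the protocol plugin is selected from the URL scheme. The hosts and paths below are placeholders:

```shell
# List a remote directory through whichever protocol plugin matches the URL
gfal-ls srm://storage.example.org/dpm/example.org/home/myvo/

# Copy a file between storage systems, or to/from local disk
gfal-copy srm://storage.example.org/dpm/example.org/home/myvo/file1 file:///tmp/file1
```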
An OSG Compute Element (CE) is the entry point for the OSG to your local resources: a layer of software that you install on a machine that can submit jobs into your local batch system. At the heart of the CE is the job gateway software, which is responsible for handling incoming jobs, authorizing them, and delegating them to your batch system for execution. Historically, the OSG only had one option for a job gateway solution, Globus Toolkit’s GRAM-based gatekeeper, but now offers the HTCondor CE as an alternative.
Today in OSG, most jobs that arrive at a CE (called grid jobs) are not end-user jobs, but rather pilot jobs submitted from factories. Successful pilot jobs create and make available an environment for actual end-user jobs to match and ultimately run within the pilot job container. Eventually pilot jobs remove themselves, typically after a period of inactivity.
HTCondor CE is a special configuration of the HTCondor software designed to be a job gateway solution for the OSG. It is configured to use the JobRouter daemon to delegate jobs by transforming and submitting them to the site’s batch system.
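As a sketch of this mechanism, a minimal job route in the HTCondor-CE configuration might look like the following (the route name and target batch system are illustrative):

```
JOB_ROUTER_ENTRIES @=jre
[
  name = "Local_PBS";
  GridResource = "batch pbs";
  TargetUniverse = 9;
]
@jre
```

Each entry is a ClassAd describing how an incoming grid job is transformed before being handed to the local batch system.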
StoRM (STOrage Resource Manager) is a light, scalable, flexible, high-performance, file system independent storage manager service (SRM) for generic disk-based storage systems, compliant with version 2.2 of the standard SRM interface.
StoRM provides data management capabilities in a Grid environment to share, access and transfer data among heterogeneous and geographically distributed data centres. In particular, StoRM works on any POSIX filesystem (ext3, ext4, xfs, essentially anything that can be mounted on a Linux machine), but it also brings to the Grid the advantages of high-performance storage systems based on cluster file systems (such as GPFS from IBM or Lustre from Sun Microsystems), supporting direct access (native POSIX I/O calls) to shared files and directories, as well as other standard Grid access protocols. StoRM is adopted in the context of the WLCG computational Grid framework.
The User Interface (UI) is the access point to the Grid Infrastructure. This can be any machine where users have a personal account and where their user certificate is installed. From the UI, a user can be authenticated and authorised to use Grid resources and can access the functionality offered by the Information, Workload and Data Management Systems.
The Worker Node (WN) is the computing node inside the Grid where the user's jobs are finally executed at a site. On the WN, the necessary middleware components are installed. Additional software components may be necessary according to the requirements of the site supported VOs.
The Virtual Organization Membership Service (VOMS) is a Grid attribute authority which serves as a central repository for VO user authorization information, providing support for sorting users into group hierarchies and keeping track of their roles and other attributes. This information is used to issue trusted attribute certificates and assertions, which are used in the Grid environment for authorization purposes.
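For illustration, a user obtains a proxy certificate carrying VOMS group and role attributes with the standard VOMS client tools (the VO name and FQAN below are placeholders):

```shell
# Create a proxy certificate with an embedded VOMS attribute certificate
voms-proxy-init --voms myvo:/myvo/analysis/Role=production

# Show the proxy and the attributes (FQANs) it carries
voms-proxy-info --all
```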
The grid information system provides detailed information about grid services which is needed for a variety of tasks. The grid information system has a hierarchical structure of three levels. The fundamental building block used in this hierarchy is the Berkeley Database Information Index (BDII). The resource level or core BDII is usually co-located with the grid service and provides information about that service. Each grid site runs a site level BDII. This aggregates the information from all the resource level BDIIs running at that site. The top level BDII aggregates all the information from all the site level BDIIs and hence contains information about all grid services. There are multiple instances of the top level BDII in order to provide a fault tolerant, load balanced service. The information system clients query a top level BDII to find the information that they require.
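Clients query a BDII over LDAP. An illustrative query against a top-level BDII (the hostname is a placeholder; port 2170 and base `o=grid` are the conventional BDII settings):

```shell
# List all service endpoints published in the GLUE schema
ldapsearch -x -LLL -H ldap://bdii.example.org:2170 -b o=grid \
    '(objectClass=GlueService)' GlueServiceEndpoint GlueServiceType
```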
The XROOTD project aims at giving high-performance, scalable, fault-tolerant access to data repositories of many kinds; the typical usage is to give access to file-based repositories. It is based on a scalable architecture, a communication protocol, and a set of plugins and tools built on them. The freedom to configure it and to make it scale (in size and performance) allows the deployment of data access clusters of virtually any size, which can include sophisticated features such as authentication/authorization, integration with other systems, WAN data distribution, etc.
The XRootD software framework is a fully generic suite for fast, low-latency and scalable data access, which can natively serve any kind of data organized as a hierarchical filesystem-like namespace based on the concept of a directory. As a general rule, particular emphasis has been put on the quality of the core software parts.
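For illustration, typical client access uses the xrdcp and xrdfs tools (the host and paths below are placeholders):

```shell
# Copy a remote file to local disk over the XROOT protocol
xrdcp root://xrootd.example.org//store/user/data.root /tmp/data.root

# Browse the remote namespace
xrdfs xrootd.example.org ls /store/user
```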
Operations Coordination Meeting
T1 and T2 sites are invited to raise any issues they are concerned about at the monthly Operations Coordination meeting, which usually takes place on the first Thursday of the month from 15h30 to 17h CE(S)T. There is a section on the agenda for this. You can also write to wlcg-ops-coord-chairpeople in advance to make sure a specific slot is scheduled in the agenda.
WLCG Middleware Baseline
The WLCG Middleware Baseline lists the minimum recommended versions of middleware services that should be installed by WLCG sites to be part of the production infrastructure. It does not necessarily reflect the latest versions of packages available in the UMD, OSG or EPEL repositories; rather, it lists the most recent versions that fix significant bugs or introduce important features. Versions newer than those indicated are assumed to be at least as good, unless otherwise indicated. In other words: if you have a version older than the baseline, you should upgrade at least to the baseline. For more details, please check the list of versions in the following link:
Middleware Known Issues
A list of middleware known issues is maintained by the WLCG Middleware Officer. The list contains known middleware issues affecting the operations of the WLCG infrastructure. For more details please check the following link:
To report a new known issue, please contact the WLCG Middleware Officer.