Wiki:Projects
Active Projects
Swift: Fast, Reliable, Loosely Coupled Parallel Computation
Scientific computation often involves thousands or even millions of tasks operating on large quantities of data. Such data is often diversely structured and stored in heterogeneous physical formats, and scientists must specify and run these computations over extended periods on collections of compute, storage, and network resources that change frequently. Swift is a parallel programming tool designed to address these challenges by enabling concise specification and fast, reliable execution of large-scale scientific computations.
Swift on the Cloud
In late 2008, Amazon launched the first commercial cloud service (AWS), and other companies, such as IBM and Microsoft, announced that they would launch their own clouds soon. Compared to the Grid, the Cloud has the advantage of flexible, quick deployment: with Amazon Web Services, anyone who has a credit card can get access to the cloud instantly, and the cluster can be shrunk or extended as the user wishes. We are therefore working on integrating the cloud into the current Swift system to give users more flexibility. We are now analyzing the performance of Swift in two use cases, data stored locally and data stored on S3, and we are also comparing different sharing strategies in Swift to tune runtime performance to the Cloud’s abstractions and infrastructure. Our goal is to make it possible for people to run Swift programs “instantly” by taking advantage of the characteristics of the Cloud.
Falkon: a Fast and Light-weight tasK executiON framework
Falkon aims to enable the rapid and efficient execution of many independent jobs on large compute clusters. Falkon combines three techniques to achieve this goal: (1) multi-level scheduling to separate resource provisioning from the dispatch of user tasks to those resources; (2) a streamlined task dispatcher able to achieve order-of-magnitude higher task dispatch rates than conventional schedulers; and (3) data caching with a data-aware scheduler that leverages co-located computational and storage resources to minimize the use of shared storage infrastructure. Falkon has been a Globus incubator project since November 2007; the incubator site has information about source code downloads, installation and usage instructions, and mailing lists. The Falkon Globus Incubator Site can be found here; the general Falkon project page can be found here.
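The data-aware scheduling idea in technique (3) can be illustrated with a small Python sketch. This is not Falkon's implementation: the function names `pick_worker` and `dispatch`, the `caches` and `load` dictionaries, and the policy itself (prefer the worker that already caches the most of a task's input files, then the least loaded worker) are all assumptions made for illustration.

```python
from typing import Dict, Set


def pick_worker(task_files: Set[str],
                caches: Dict[str, Set[str]],
                load: Dict[str, int]) -> str:
    """Return the worker that already caches the most of the task's inputs.

    Ties are broken by the lowest current load, then by worker name,
    so the result is deterministic. (Hypothetical policy.)
    """
    def score(worker: str):
        cached = len(task_files & caches[worker])
        return (-cached, load.get(worker, 0), worker)

    return min(caches, key=score)


def dispatch(tasks: Dict[str, Set[str]],
             caches: Dict[str, Set[str]]) -> Dict[str, str]:
    """Assign each task to a worker, updating load and caches as we go."""
    load = {worker: 0 for worker in caches}
    assignment = {}
    for name in sorted(tasks):
        worker = pick_worker(tasks[name], caches, load)
        assignment[name] = worker
        load[worker] += 1
        # After running the task, the worker holds its input files locally.
        caches[worker] |= tasks[name]
    return assignment
```

With two workers that each cache one file, tasks needing those files are routed to the matching worker, and a task whose input is cached nowhere falls back to the load and name tie-breakers. Sending work to where the data already sits is what reduces traffic to shared storage.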
AstroPortal: A Science Gateway for Large-scale Astronomy Data Analysis
The AstroPortal is an astronomy gateway to grid resources on the TeraGrid using the Falkon framework. The astronomy dataset used is the Sloan Digital Sky Survey (SDSS), DR4/DR5, which comprises over 300 million objects dispersed over 1.3 million files, adding up to 3 terabytes of compressed data or over 9 terabytes of uncompressed imaging data. The analysis currently supported by the AstroPortal prototype is stacking, the summation of multiple observations of the same part of the sky; stacking helps both to identify variable sources and to detect faint objects. The AstroPortal gives the astronomy community a new tool to advance its research and opens doors to opportunities never before possible on such a large scale.
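As a rough illustration of stacking, the sketch below sums several small image cutouts pixel by pixel in plain Python. It assumes the cutouts are already aligned on the same sky position, calibrated, and of equal shape, all of which a real pipeline would have to handle; the `stack` function and the list-of-lists image representation are hypothetical and are not taken from the AstroPortal code.

```python
from typing import List, Sequence

Image = List[List[float]]  # hypothetical representation: rows of pixel values


def stack(cutouts: Sequence[Image]) -> Image:
    """Sum same-sized 2-D cutouts pixel by pixel."""
    if not cutouts:
        raise ValueError("need at least one cutout")
    rows, cols = len(cutouts[0]), len(cutouts[0][0])
    for img in cutouts:
        if len(img) != rows or any(len(row) != cols for row in img):
            raise ValueError("all cutouts must have the same shape")
    return [[sum(img[r][c] for img in cutouts) for c in range(cols)]
            for r in range(rows)]
```

Because the signal from a real source adds up across observations while random noise partly cancels, a source too faint to see in any single cutout can stand out in the stacked image.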
SPRUCE: Urgent Computing
SPRUCE is a system to support urgent or event-driven computing on both traditional supercomputers and distributed Grids. Simply put, urgent computations are defined by a strict deadline after which point the results may have very little use. For instance, the results of a simulation that models severe weather events may be used to help guide the evacuation of residents in the targeted area. Clearly, if these results are not produced until after the event occurs, they will have very little value. Scientists are provided with transferable Right-of-Way tokens with varying urgency levels. During an emergency, a token may be activated to provide urgent computations with priority access to computational and network resources. For computational resources, priority access may include providing "next-to-run" status or immediately preempting other jobs.
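The "next-to-run" form of priority access can be sketched as a priority queue in Python. This is a toy model under stated assumptions, not SPRUCE's mechanism: the `UrgentQueue` class, the integer urgency levels (0 for routine jobs, larger for more urgent ones), and the first-come-first-served ordering within a level are all hypothetical, and preemption of running jobs is not modeled.

```python
import heapq
import itertools
from typing import List, Optional, Tuple


class UrgentQueue:
    """Toy job queue: higher urgency runs first, FIFO within a level."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._counter = itertools.count()  # preserves submission order

    def submit(self, job: str, urgency: int = 0) -> None:
        # heapq is a min-heap, so negate urgency to pop the most urgent first.
        heapq.heappush(self._heap, (-urgency, next(self._counter), job))

    def next_to_run(self) -> Optional[str]:
        """Pop the job that should run next, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

In this model, a job submitted with an activated token would carry a nonzero urgency and jump ahead of routine jobs that were queued earlier.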
Past Projects
DiPerF: an automated DIstributed PERformance testing Framework
DiPerF aims to simplify and automate service performance evaluation. DiPerF coordinates a pool of machines that test a target service, collects and aggregates performance metrics, and generates performance statistics. The aggregate data collected provide information on service throughput, on service ‘fairness’ when serving multiple clients concurrently, and on the impact of network latency on service performance. Furthermore, using this data, it is possible to build predictive models that estimate a service’s performance given the service load.
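To make the aggregation step concrete, the following Python sketch computes throughput and a fairness score from per-client request records. The record format and function names are hypothetical, and Jain's fairness index is used here only as one common way to quantify fairness; the text above does not say which fairness measure DiPerF reports.

```python
from collections import Counter
from typing import Dict, Iterable, List


def throughput(completed_requests: int, duration_seconds: float) -> float:
    """Requests completed per second over the test window."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return completed_requests / duration_seconds


def per_client_counts(client_ids: Iterable[str]) -> Dict[str, int]:
    """Count completed requests per client (one id per completed request)."""
    return dict(Counter(client_ids))


def jain_fairness(counts: List[float]) -> float:
    """Jain's index: 1.0 when all clients are served equally, 1/n at worst."""
    total = sum(counts)
    squares = sum(c * c for c in counts)
    if not counts or squares == 0:
        raise ValueError("need at least one nonzero count")
    return (total * total) / (len(counts) * squares)
```

For example, two clients that each complete 5 requests give an index of 1.0, while one client completing all 10 requests gives 0.5.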
ServMark: an Architecture for Testing Grid Services
ServMark is the result of integrating the GrenchMark project and DiPerF. ServMark addresses two orthogonal research questions: (1) How to test a large-scale, distributed, and (grid-)service-based environment? and (2) How to generate realistic testing traces for a wide range of testing scenarios?
DI-GRUBER: A Distributed Grid Resource Broker
DI-GRUBER, an extension to the GRUBER brokering framework, was developed as a distributed grid usage SLA (uSLA)-based resource broker that allows multiple decision points to coexist and cooperate in real time. DI-GRUBER ultimately addresses how uSLAs can be stored, retrieved, and disseminated efficiently in a large distributed environment.
GangSim Simulator
GangSim is a tool developed for Grid scheduling studies, capable of supporting studies for controlled resource sharing based on uSLAs. The new name, GangSim, reflects both the origins of the implementation (Ganglia Monitoring Toolkit) and the fact that it can be used to simulate "gangs" of consumers and resources.
GriPhyN: Grid Physics Network
The Grid Physics Network collaboration is a team of experimental physicists and information technology (IT) researchers who plan to implement the first Petabyte-scale computational environments for data intensive science in the 21st century. Driving the project are unprecedented requirements for geographically dispersed extraction of complex scientific information from very large collections of measured data. To meet these requirements, which arise initially from the four physics experiments involved in this project but will also be fundamental to science and commerce in the 21st century, GriPhyN will deploy computational environments called Petascale Virtual Data Grids (PVDGs) that meet the data-intensive computational needs of a diverse community of thousands of scientists spread across the globe.
beta-Grid
The beta-Grid project aims to:
- define the behavior of a standard "Grid-enabled cluster" in terms of the protocols it must speak, the scheduling disciplines it must implement, its performance characteristics, etc.
- develop a standard software suite, comprising protocol implementations, schedulers, configuration scripts, etc., that implements the standard behaviors.
GrADS: Grid Application Development Software Project
The goal of the Grid Application Development Software Project is to simplify distributed heterogeneous computing in the same way that the World Wide Web simplified information sharing over the Internet. The GrADS project will explore the scientific and technical problems that must be solved to make grid application development and performance tuning for real applications an everyday practice.
