Institute for Mathematics and its Applications, University of Minnesota, 114 Lind Hall, 207 Church Street SE, Minneapolis, MN 55455
2010-2011 Program
See http://www.ima.umn.edu/2010-2011/ for a full description of the 2010-2011 program on Simulating Our Complex World: Modeling, Computation and Analysis.
| 10:00am-11:00am | Registration and coffee | | Keller Hall 3-176 | SW1.3-5.11 |
| 11:00am-11:15am | Welcome to the IMA | Fadil Santosa (University of Minnesota) | Keller Hall 3-180 | SW1.3-5.11 |
| 11:15am-12:15pm | Points on Shimura varieties mod p | Mark Kisin (Harvard University) | Keller Hall 3-180 | SW1.3-5.11 |
| 12:15pm-2:00pm | Lunch | | | SW1.3-5.11 |
| 2:00pm-3:00pm | Elliptic curves: problems and applications | Carl Pomerance (Dartmouth College) | Keller Hall 3-180 | SW1.3-5.11 |
| 3:00pm-3:10pm | Group photo | | | SW1.3-5.11 |
| 3:10pm-3:40pm | Coffee break | | Keller Hall 3-176 | SW1.3-5.11 |
| 3:40pm-4:40pm | Selmer ranks of elliptic curves in families of quadratic twists | Karl Rubin (University of California, Irvine) | Keller Hall 3-180 | SW1.3-5.11 |
| 5:00pm-7:00pm | Reception at the Campus Club Bar Lounge | | Campus Club Bar Lounge | SW1.3-5.11 |
| 8:30am-9:00am | Coffee | | Keller Hall 3-176 | SW1.3-5.11 |
| 9:00am-10:00am | An equivariant main conjecture in Iwasawa theory and applications | Cristian D. Popescu (University of California, San Diego) | Keller Hall 3-180 | SW1.3-5.11 |
| 10:00am-10:30am | Coffee break | | Keller Hall 3-176 | SW1.3-5.11 |
| 10:30am-11:30am | Permanence following Temkin | Michel Raynaud (Université de Paris XI (Paris-Sud)) | Keller Hall 3-180 | SW1.3-5.11 |
| 11:30am-2:00pm | Lunch | | | SW1.3-5.11 |
| 2:00pm-3:00pm | On the geometry of character varieties | Fernando Rodriguez Villegas (University of Texas at Austin) | Keller Hall 3-180 | SW1.3-5.11 |
| 3:00pm-3:30pm | Coffee break | | Keller Hall 3-176 | SW1.3-5.11 |
| 3:30pm-4:30pm | The average rank of elliptic curves | Manjul Bhargava (Princeton University) | Keller Hall 3-180 | SW1.3-5.11 |
| 6:00pm-8:00pm | Banquet at McNamara Alumni Center, Heritage Gallery | | McNamara Alumni Center, Heritage Gallery, 200 Oak Street S.E., Minneapolis, MN 55455, 612-624-9831 | SW1.3-5.11 |
| 8:30am-9:00am | Coffee | | Keller Hall 3-176 | SW1.3-5.11 |
| 9:00am-10:00am | P-adic periods and derived de Rham cohomology | Alexander A. Beilinson (University of Chicago) | Keller Hall 3-180 | SW1.3-5.11 |
| 10:00am-10:30am | Coffee break | | Keller Hall 3-176 | SW1.3-5.11 |
| 10:30am-11:30am | Random maximal isotropic subspaces and Selmer groups | Bjorn Poonen (Massachusetts Institute of Technology) | Keller Hall 3-180 | SW1.3-5.11 |
| 11:30am-1:00pm | Lunch | | | SW1.3-5.11 |
| 1:00pm-2:00pm | Vector bundles and p-adic Galois representations | Jean-Marc Fontaine (Université de Paris XI (Paris-Sud)) | Keller Hall 3-180 | SW1.3-5.11 |
| 2:00pm-2:05pm | Closing remarks | | Keller Hall 3-180 | SW1.3-5.11 |
| 10:45am-11:15am | Coffee break | | Lind Hall 400 | |
| 10:45am-11:15am | Coffee break | | Lind Hall 400 | |
| All Day | Eric Darve (Stanford University) will chair the morning session and the opening of the afternoon session; David E. Keyes (KAUST/Columbia University) will preside over the end of Cris Cecka's afternoon session. | | | T1.9.11 |
| 9:00am-9:30am | Registration and coffee | | Lind Hall 400 | T1.9.11 |
| 9:30am-10:30am | Lecture 1: Implications of the exascale roadmap for algorithms | David E. Keyes (King Abdullah University of Science & Technology/Columbia University) | Lind Hall 305 | T1.9.11 |
| 10:30am-11:00am | Break | | Lind Hall 400 | T1.9.11 |
| 11:00am-12:00pm | Lecture 2: Implications of the exascale roadmap for algorithms | David E. Keyes (King Abdullah University of Science & Technology/Columbia University) | Lind Hall 305 | T1.9.11 |
| 12:00pm-1:30pm | Lunch | | | T1.9.11 |
| 1:30pm-2:30pm | Lecture 1: Scientific Computing using Graphics Processors | Cris Cecka (Stanford University) | Lind Hall 305 | T1.9.11 |
| 2:30pm-3:30pm | Lecture 2: Scientific Computing with Graphics Processors | Cris Cecka (Stanford University) | Lind Hall 305 | T1.9.11 |
| 3:30pm-4:00pm | Break | | Lind Hall 400 | T1.9.11 |
| 4:00pm-4:30pm | Lecture 3: Introduction to heterogeneous computing with GPUs | Cris Cecka (Stanford University) | Lind Hall 305 | T1.9.11 |
| All Day | Morning Chair: David E. Keyes (King Abdullah University of Science & Technology/Columbia University); Afternoon Chair: Lorena A. Barba (Boston University) | | | W1.10-14.11 |
| 8:15am-9:15am | Registration and coffee | | Keller Hall 3-176 | W1.10-14.11 |
| 9:15am-9:30am | Welcome to the IMA | Fadil Santosa (University of Minnesota) | Keller Hall 3-180 | W1.10-14.11 |
| 9:30am-10:30am | The Exascale: Why and How | David E. Keyes (King Abdullah University of Science & Technology/Columbia University) | Keller Hall 3-180 | W1.10-14.11 |
| 10:30am-11:00am | Break | | Keller Hall 3-176 | W1.10-14.11 |
| 11:00am-12:00pm | Architecture-aware Algorithms and Software for Scalable Performance and Resilience on Heterogeneous Architectures | Jack J. Dongarra (University of Tennessee) | Keller Hall 3-180 | W1.10-14.11 |
| 12:00pm-2:00pm | Lunch | | | W1.10-14.11 |
| 2:00pm-3:00pm | Everyday Parallelism | Robert Strzodka (Max-Planck-Institut für Informatik) | Keller Hall 3-180 | W1.10-14.11 |
| 3:00pm-4:00pm | The Challenges of Writing Portable, Correct and High Performance Libraries for GPUs or How to Avoid the Heroics of GPU Programming | Miriam Leeser (Northeastern University) | Keller Hall 3-180 | W1.10-14.11 |
| 4:00pm-4:30pm | Break | | Keller Hall 3-176 | W1.10-14.11 |
| 4:30pm-5:30pm | GPU programming from higher level representations | Matthew Gregg Knepley (University of Chicago) | Keller Hall 3-180 | W1.10-14.11 |
| All Day | Chair: Matthew Gregg Knepley (University of Chicago) | | | W1.10-14.11 |
| 8:00am-8:30am | Coffee | | Keller Hall 3-176 | W1.10-14.11 |
| 8:30am-9:30am | Thinking parallel: sparse iterative solvers with CUDA | Jonathan M. Cohen (NVIDIA Corporation) | Keller Hall 3-180 | W1.10-14.11 |
| 9:30am-10:30am | A Code Generation and Autotuning Framework For Parallel Iterative Stencil Computations on Modern Microarchitectures | Olaf Schenk (Universität Basel) | Keller Hall 3-180 | W1.10-14.11 |
| 10:30am-11:00am | Break | | Keller Hall 3-176 | W1.10-14.11 |
| 11:00am-12:00pm | Large Scale Frictional Contact Dynamics on the GPU | Dan Negrut (University of Wisconsin-Madison) | Keller Hall 3-180 | W1.10-14.11 |
| 12:00pm-2:00pm | Lunch | | | W1.10-14.11 |
| 2:00pm-4:00pm | Group photo and Discussion | | Keller Hall 3-180 | W1.10-14.11 |
| 4:00pm-4:30pm | Break | | Keller Hall 3-176 | W1.10-14.11 |
| 4:30pm-6:00pm | Reception and Poster Session (poster submissions welcome from all participants) | | Lind Hall 400 | W1.10-14.11 |
| Algorithms for Lattice Field Theory at Extreme Scales | Richard C. Brower (Boston University) | |||
| Medical Imaging on the GPU Using OpenCL: 3D Surface Extraction and 3D Ultrasound Reconstruction | Anne C. Elster (Norwegian University of Science and Technology (NTNU)) | |||
| Development of a new massively parallel tool for nonlinear free surface wave simulation | Allan Peter Engsig-Karup (Technical University of Denmark) | |||
| Development of Desktop Computing Applications and Engineering Tools on GPUs | Allan Peter Engsig-Karup (Technical University of Denmark) | |||
| A Domain Decomposition Method that Converges in Two Iterations for any Subdomain Decomposition and PDE | Martin J. Gander (Universite de Geneve) | |||
| Efficient Uncertainty Quantification using GPUs | Gaurav Gaurav (University of Minnesota) | |||
| Brain Perfusion: Multi-scale Simulations and Visualization | Leopold Grinberg (Brown University) | |||
| Mixed-Precision GPU-Multigrid Solvers with Strong Smoothers | Dominik Göddeke (Universität Dortmund) Robert Strzodka (Max-Planck-Institut für Informatik) | |||
| The Build to Order Compiler for Matrix Algebra Optimization | Elizabeth R. Jessup (University of Colorado) | |||
| Global symbolic manipulations and code generation for Finite Elements on SIM[DT] hardware | Hugo Leclerc (École Normale Supérieure de Cachan) | |||
| GPU Acceleration in a Modern Problem Solving Environment: SCIRun's Linear System Solvers | Miriam Leeser (Northeastern University) | |||
| Hyperspectral Image Analysis for Abundance Estimation using GPUs | Nayda G. Santiago (University of Puerto Rico) | |||
| A GPU-accelerated Boundary Element Method and Vortex Particle Method | Mark J. Stock (Applied Scientific Research) | |||
| Locally-Self-Consistent Multiple-Scattering code (LSMS) for GPUs | Keita Teranishi (CRAY Inc) | |||
| Digital rocks physics: fluid flow in rocks | Jonas Tölke (Ingrain) | |||
| Preparing Algebraic Multigrid for Exascale | Ulrike Meier Yang (Lawrence Livermore National Laboratory) | |||
| Fast Multipole Methods on large cluster of GPUs | Rio Yokota (Boston University) |
| All Day | Morning Chair: Jonathan M. Cohen (NVIDIA Corporation); Afternoon Chair: Susanne C. Brenner (Louisiana State University) | | | W1.10-14.11 |
| 8:00am-8:30am | Coffee | | Keller Hall 3-176 | W1.10-14.11 |
| 8:30am-9:30am | Application of Assembly of Finite Element Methods on Graphics Processors for Real-Time Elastodynamics | Cris Cecka (Stanford University) | Keller Hall 3-180 | W1.10-14.11 |
| 9:30am-10:30am | Lattice Boltzmann Multi-Phase Simulations in Porous Media using GPUs | Jonas Tölke (Ingrain) | Keller Hall 3-180 | W1.10-14.11 |
| 10:30am-11:00am | Break | | Keller Hall 3-176 | W1.10-14.11 |
| 11:00am-12:00pm | The basis and perspectives of an exascale algorithm: our ExaFMM project | Lorena A. Barba (Boston University) | Keller Hall 3-180 | W1.10-14.11 |
| 12:00pm-2:00pm | Lunch | | | W1.10-14.11 |
| 2:00pm-3:00pm | Discussion [note room assignments below] | | | W1.10-14.11 |
| Reproducible results and open source code | Lorena A. Barba (Boston University) | Keller Hall 3-180 | | |
| Mixed precision computing | Richard C. Brower (Boston University) | Lind Hall 401 | | |
| Exascale programming models | Michael A. Heroux (Sandia National Laboratories) | Lind Hall 409 | | |
| 3:00pm-4:00pm | High-order DG Wave Propagation on GPUs: Infrastructure, Implementation, Method Improvements | Andreas Klöckner (New York University) | Keller Hall 3-180 | W1.10-14.11 |
| 6:00pm-8:30pm | Workshop social reception at Stub and Herbs | | Stub and Herbs, 227 Oak St, Minneapolis, MN 55414, (612) 379-0555 | W1.10-14.11 |
| All Day | Morning Chair: Ulrike Meier Yang (Lawrence Livermore National Laboratory); Afternoon Chair: Mike Giles (University of Oxford) | | | W1.10-14.11 |
| 8:00am-8:30am | Coffee | | Keller Hall 3-176 | W1.10-14.11 |
| 8:30am-9:30am | Algorithms and Tools for Bioinformatics on GPUs | Bertil Schmidt (Nanyang Technological University) | Keller Hall 3-180 | W1.10-14.11 |
| 9:30am-10:30am | Algorithmic Fluid Art – Influences, Process, and Works | Mark J. Stock (Applied Scientific Research) | Keller Hall 3-180 | W1.10-14.11 |
| 10:30am-11:00am | Break | | Keller Hall 3-176 | W1.10-14.11 |
| 11:00am-12:00pm | OP2: an open-source library for unstructured grid applications | Mike Giles (University of Oxford) | Keller Hall 3-180 | W1.10-14.11 |
| 12:00pm-2:00pm | Lunch | | | W1.10-14.11 |
| 2:00pm-3:00pm | Ultraparallel solvers for multi-scale brain blood flow simulations on exascale computers | Leopold Grinberg (Brown University) | Keller Hall 3-180 | W1.10-14.11 |
| 3:00pm-4:00pm | Clouds, MapReduce and HPC | Geoffrey Charles Fox (Indiana University) | Keller Hall 3-180 | W1.10-14.11 |
| All Day | Chair: Mark J. Stock (Applied Scientific Research) | | | W1.10-14.11 |
| 8:00am-8:30am | Coffee | | Keller Hall 3-176 | W1.10-14.11 |
| 8:30am-9:30am | I See GPU Shapes in the Clouds | David Mayhew (Advanced Micro Devices) | Keller Hall 3-180 | W1.10-14.11 |
| 9:30am-10:30am | Real-Time Medical and Geological Processing on GPU-based Systems: Experiences and Challenges | Anne C. Elster (Norwegian University of Science and Technology (NTNU)) | Keller Hall 3-180 | W1.10-14.11 |
| 10:30am-11:00am | Break | | Keller Hall 3-176 | W1.10-14.11 |
| 11:00am-12:00pm | Emerging Programming and Machine Models: Opportunities for Numerical Algorithms R&D | Michael A. Heroux (Sandia National Laboratories) | Keller Hall 3-180 | W1.10-14.11 |
| 12:00pm-12:05pm | Closing remarks | | Keller Hall 3-180 | W1.10-14.11 |
| All Day | Martin Luther King, Jr. Day. The IMA is closed. |
| 10:45am-11:15am | Coffee break | | Lind Hall 400 | |
| 10:45am-11:15am | Coffee break | | Lind Hall 400 | |
| 10:45am-11:15am | Coffee break | | Lind Hall 400 | |
| 10:45am-11:15am | Coffee break | | Lind Hall 400 | |
| 10:45am-11:15am | Coffee break | | Lind Hall 400 | |
| 10:45am-11:15am | Coffee break | | Lind Hall 400 | |
| 11:15am-12:15pm | On dispersive effect of the Coriolis force for the stationary Navier-Stokes equations | Pawel Konieczny (University of Minnesota) | Lind Hall 305 | PS |
| 10:45am-11:15am | Coffee break | | Lind Hall 400 | |
| 2:30pm-3:30pm | Math 8994: Discontinuous Galerkin methods: An introduction - The original method: linear scalar transport | Bernardo Cockburn (University of Minnesota) | Lind Hall 305 | |
| 10:45am-11:15am | Coffee break | | Lind Hall 400 | |
| 10:45am-11:15am | Coffee break | | Lind Hall 400 | |
| 10:45am-11:15am | Coffee break | | Lind Hall 400 | |
| 1:30pm-2:30pm | Tutorial Lectures: Modeling Hurricane Storm Surges - Lecture 1: Introduction to the shallow water equations | Clint Dawson (University of Texas at Austin) | Lind Hall 305 | |
Event Legend:

| PS | IMA Postdoc Seminar |
| SW1.3-5.11 | First Abel Conference: A Mathematical Celebration of John Tate |
| T1.9.11 | Scientific Computing Using Graphics Processors |
| W1.10-14.11 | High Performance Computing and Emerging Architectures |
| Lorena A. Barba (Boston University) | The basis and perspectives of an exascale algorithm: our ExaFMM project |
| Abstract: Linearly scaling algorithms will be crucial for the problem sizes that will be tackled in capability exascale systems. It is interesting to note that many of the most successful algorithms are hierarchical in nature, such as multigrid methods and fast multipole methods (FMM). We have been leading development efforts for open-source FMM software for some time, and recently produced GPU implementations of the various computational kernels involved in the FMM algorithm. Most recently, we have produced a multi-GPU code and performed scalability studies showing high parallel efficiency in strong scaling. These results have pointed to several features of the FMM that make it a particularly favorable algorithm for the emerging heterogeneous, many-core architectural landscape. We propose that the FMM algorithm offers exceptional opportunities to enable exascale applications. Among its exascale-suitable features are: (i) it has intrinsic geometric locality, and access patterns are made local via particle indexing techniques; (ii) we can achieve temporal locality via an efficient queuing of GPU tasks before execution, and at a fine level by means of memory coalescing based on the natural index-sorting techniques; (iii) global data communication and synchronization, often a significant impediment to scalability, is a soft barrier for the FMM, where the most time-consuming kernels are, respectively, purely local (particle-to-particle interactions) and "hierarchically synchronized" (multipole-to-local interactions, which happen simultaneously at every level of the tree). In addition, we suggest a strategy for achieving the best algorithmic performance, based on two key ideas: (i) hybridize the FMM with a treecode by choosing on-the-fly between particle-particle, particle-box, and box-box interactions, according to a work estimate; (ii) apply a dynamic error-control technique, effected on the treecode by means of a variable "box-opening angle" and on the FMM by means of a variable order of the multipole expansion. We have carried out a preliminary implementation of these ideas, achieving a 14x speed-up with respect to our current published version of the FMM. Considering that this effort was only exploratory, we are confident that these algorithms offer the potential for unprecedented performance. | |
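The on-the-fly choice among particle-particle, particle-box, and box-box interactions can be illustrated with a toy admissibility test. The sketch below is an illustration, not ExaFMM code; the `Box` class and the use of an opening angle `theta` as a stand-in for a real work estimate are assumptions.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Box:
    center: np.ndarray   # box center in 3D
    radius: float        # bounding radius of the particles it holds

def choose_interaction(src: Box, tgt: Box, theta: float = 0.5) -> str:
    """Pick the cheapest admissible interaction for a pair of boxes.

    A geometric opening criterion stands in for the work estimate the
    abstract mentions: well-separated boxes take the cheap box-box
    (multipole-to-local) path, marginal pairs use treecode-style
    particle-box evaluation, and near neighbors fall back to exact
    particle-particle summation."""
    r = np.linalg.norm(tgt.center - src.center)
    if r * theta > src.radius + tgt.radius:
        return "box-box"
    if r * theta > src.radius:
        return "particle-box"
    return "particle-particle"

# Two well-separated boxes take the multipole-to-local path.
a = Box(np.array([0.0, 0.0, 0.0]), 1.0)
b = Box(np.array([10.0, 0.0, 0.0]), 1.0)
print(choose_interaction(a, b))  # -> box-box
```

Shrinking `theta` makes the test stricter, trading speed for accuracy; this is the kind of knob a dynamic error-control scheme varies at run time.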
| Lorena A. Barba (Boston University) | Reproducible results and open source code (Keller Hall 3-180) |
| Abstract: No Abstract | |
| Alexander A. Beilinson (University of Chicago) | P-adic periods and derived de Rham cohomology |
| Abstract: I will show that Fontaine's ring of p-adic periods can be realized as the ring of universal p-adic constants in the sense of derived algebraic geometry, and discuss a possible new construction of the p-adic periods map. | |
| Manjul Bhargava (Princeton University) | The average rank of elliptic curves |
| Abstract: No Abstract | |
| Richard C. Brower (Boston University) | Algorithms for Lattice Field Theory at Extreme Scales |
| Abstract: Increases in computational power allow lattice field theories to resolve smaller scales, but to realize the full benefit for scientific discovery, new multi-scale algorithms must be developed to maximize efficiency. Examples of new trends in algorithms include adaptive multigrid solvers for the quark propagator and an improved symplectic Force Gradient integrator for the Hamiltonian evolution used to include the quark contribution to vacuum fluctuations in the quantum path integral. Future challenges to algorithms and software infrastructure targeting many-core GPU accelerators and heterogeneous extreme scale computing are discussed. | |
| Richard C. Brower (Boston University) | Mixed precision computing (Lind Hall 401) |
| Abstract: No Abstract | |
| Cris Cecka (Stanford University) | Application of Assembly of Finite Element Methods on Graphics Processors for Real-Time Elastodynamics |
| Abstract: We discuss multiple strategies to perform general computations on unstructured grids using a GPU, with specific application to the assembly of systems of equations in finite element methods (FEMs). For each method, we discuss the GPU hardware's limiting resources, optimizations, key data structures, and the dependence of the performance on problem size, element size, and GPU hardware generation. These methods are applied to a nonlinear hyperelastic material model to develop a large-scale real-time interactive elastodynamic visualization. By performing the assembly, solution, update, and visualization stages solely on the GPU, the simulation benefits from speed-ups in each stage and avoids costly GPU-CPU transfers of data. | |
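The assembly race that motivates these strategies is easy to see in a serial sketch. The following minimal illustration is not the speaker's code; NumPy's in-place scatter plays the role that atomic adds, element coloring, or gather-based assembly play on the GPU.

```python
import numpy as np

def assemble(n_dofs, elements, element_matrices):
    """Scatter per-element stiffness contributions into a global matrix.

    The += on entries shared between elements is the operation that must
    be made safe on a GPU, e.g. via atomic adds, coloring elements so no
    two concurrent elements share a node, or a gather formulation."""
    K = np.zeros((n_dofs, n_dofs))
    for dofs, Ke in zip(elements, element_matrices):
        K[np.ix_(dofs, dofs)] += Ke   # race-prone if elements run concurrently
    return K

# Two 1D linear elements sharing node 1: entry K[1, 1] receives
# contributions from both elements.
Ke = np.array([[1.0, -1.0],
               [-1.0, 1.0]])
elements = [np.array([0, 1]), np.array([1, 2])]
print(assemble(3, elements, [Ke, Ke]))
```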
| Cris Cecka (Stanford University) | Lecture 1: Scientific Computing using Graphics Processors |
| Abstract: In this short course, we introduce the GPU as a coprocessor for scientific computing. The course will review modern hardware, CUDA programming, algorithm design, and optimization considerations for this unique compute environment. Introductory example codes and slides will be available to aid attendees in using GPUs to accelerate their applications. | |
| Cris Cecka (Stanford University) | Lecture 3: Introduction to heterogeneous computing with GPUs |
| Abstract: see abstract for Lecture 1 | |
| Cris Cecka (Stanford University) | Lecture 2: Scientific Computing with Graphics Processors |
| Abstract: see abstract for Lecture 1 | |
| Jonathan M. Cohen (NVIDIA Corporation) | Thinking parallel: sparse iterative solvers with CUDA |
| Abstract: Iterative sparse linear solvers are a critical component of a scientific computing platform. Developing effective preconditioning strategies is the main challenge in developing iterative sparse solvers on massively parallel systems. As computing systems become increasingly power-constrained, memory hierarchies for massively parallel systems will become deeper and more hierarchical. Parallel algorithms with all-to-all communication patterns that assume uniform memory access times will be inefficient on these systems. In this talk, I will outline the challenges of developing good parallel preconditioners, and demonstrate that domain decomposition methods have communication patterns that match emerging parallel platforms. I will present recent work to develop restricted additive Schwarz (RAS) preconditioners as part of the open source 'cusp' library of sparse parallel algorithms. On 2d Poisson problems, a RAS preconditioner is consistently faster than diagonal preconditioning in time-to-solution. Detailed analysis demonstrates that the communication pattern of RAS matches the on-chip bandwidths of a Fermi GPU. Line smoothing, which requires solving a large number of small tridiagonal linear systems in local memory, is another preconditioning approach with similar communication patterns. I will conclude with a roadmap for developing a range of preconditioners, smoothers, and linear solvers on massively parallel hardware based on the domain decomposition and line smoothing approaches. | |
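As a rough sketch of the restricted additive Schwarz idea (an illustration, not the cusp library's implementation): each subdomain solves on an overlapping block of the matrix, but only the non-overlapping "owned" entries are written back, which keeps the communication pattern local to subdomain boundaries.

```python
import numpy as np

def ras_apply(A, r, subdomains):
    """One application of a restricted additive Schwarz preconditioner.

    Each subdomain solves on its overlapping extension, but only the
    owned (non-overlapping) entries are written back; this restriction
    avoids summing duplicate contributions in the overlap."""
    z = np.zeros_like(r)
    for owned, ext in subdomains:
        z_ext = np.linalg.solve(A[np.ix_(ext, ext)], r[ext])
        pos = [ext.index(i) for i in owned]   # owned rows inside the block
        z[owned] = z_ext[pos]
    return z

# Tiny 1D Poisson matrix split into two overlapping subdomains.
n = 6
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
subdomains = [([0, 1, 2], [0, 1, 2, 3]),   # (owned indices, overlapping block)
              ([3, 4, 5], [2, 3, 4, 5])]
print(ras_apply(A, np.ones(n), subdomains))
```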
| Clint Dawson (University of Texas at Austin) | Tutorial Lectures: Modeling Hurricane Storm Surges - Lecture 1: Introduction to the shallow water equations |
| Abstract: An overview of the two-dimensional, depth-averaged shallow water equations. I will give the underlying assumptions and the derivation from the Navier-Stokes equations, and discuss the relevant forcing terms, including tides, wind and atmospheric pressure, gravity, and bottom friction. | |
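For reference, one common conservative form of the depth-averaged system (the notation here is a reconstruction; the lectures may present a different but equivalent formulation) is:

```latex
\begin{aligned}
&\partial_t H + \partial_x (Hu) + \partial_y (Hv) = 0,\\
&\partial_t (Hu) + \partial_x\bigl(Hu^2 + \tfrac{1}{2} g H^2\bigr) + \partial_y (Huv)
  = -gH\,\partial_x z_b + F_x,\\
&\partial_t (Hv) + \partial_x (Huv) + \partial_y\bigl(Hv^2 + \tfrac{1}{2} g H^2\bigr)
  = -gH\,\partial_y z_b + F_y,
\end{aligned}
```

where $H$ is the total water column height, $(u,v)$ the depth-averaged velocity, $z_b$ the bottom elevation, and $F_x$, $F_y$ collect the forcing terms named in the abstract (Coriolis, tides, wind and atmospheric pressure, and bottom friction).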
| Jack J. Dongarra (University of Tennessee) | Architecture-aware Algorithms and Software for Scalable Performance and Resilience on Heterogeneous Architectures |
| Abstract: In this talk we examine how high performance computing has changed over the last ten years and look toward future trends. These changes have had, and will continue to have, a major impact on our software. Some of the software and algorithm challenges have already been encountered, such as management of communication and memory hierarchies through a combination of compile-time and run-time techniques, but the increased scale of computation, depth of memory hierarchies, range of latencies, and increased run-time environment variability will make these problems much harder. We will look at five areas of research that will have an important impact on the development of software and algorithms. | |
| Anne C. Elster (Norwegian University of Science and Technology (NTNU)) | Real-Time Medical and Geological Processing on GPU-based Systems: Experiences and Challenges |
| Abstract: GPUs are now massive floating-point stream processors that offer a source of energy-efficient compute power on our laptops and desktops. Recent development of tools such as CUDA and OpenCL has made it much easier to utilize the computational power these systems offer. However, in order to optimally harness the power of these GPU-based systems, there are still many challenges to overcome. In this talk, several issues related to our experiences with medical and geological processing applications that can benefit from real-time processing of data on GPUs will be discussed. These include real-time medical imaging, e.g. for ultrasound-guided discovery and surgery, real-time seismic CT image enhancement, and using GPUs for real-time compression of seismic data in order to lower I/O latency. This talk will highlight work our research group has been involved in from 2006 through today. | |
| Anne C. Elster (Norwegian University of Science and Technology (NTNU)) | Medical Imaging on the GPU Using OpenCL: 3D Surface Extraction and 3D Ultrasound Reconstruction |
| Abstract: Collaborators: Frank Linseth, Holger Ludvigsen, Erik Smistad, and Thor Kristian Valgerhaug. GPUs offer a lot of compute power, enabling real-time processing of images. This poster depicts some of our group's recent work on image processing for medical applications on GPUs, including 3D surface extraction using marching cubes and 3D ultrasound reconstruction. We have previously developed Cg and CUDA codes for wavelet transforms and CUDA codes for surface extraction for seismic images. | |
| Allan Peter Engsig-Karup (Technical University of Denmark) | Development of a new massively parallel tool for nonlinear free surface wave simulation |
| Abstract: The research objective of this work is to develop a new dedicated and massively parallel tool for efficient simulation of unsteady nonlinear free surface waves. The tool will be used for applications in coastal and offshore engineering, e.g. in connection with prediction of wave kinematics and forces at or near human-made structures. The tool is based on a unified potential flow formulation which can account for fully nonlinear and dispersive wave motion over uneven depths under the assumptions of nonbreaking waves, irrotational and inviscid flow. This work is a continuation of earlier work and will continue to contribute to advancing state-of-the-art for efficient wave simulation. The tool is expected to be orders of magnitude faster than current tools due to efficient algorithms and utilization of available hardware resources. | |
| Allan Peter Engsig-Karup (Technical University of Denmark) | Development of Desktop Computing Applications and Engineering Tools on GPUs |
| Abstract: GPULab - A competence center and laboratory for research and collaboration within academia and partners in industry has been established in 2008 at section for Scientific Computing, DTU informatics, Technical University of Denmark. In GPULab we focus on the utilization of Graphics Processing Units (GPUs) for high-performance computing applications and software tools in science and engineering, inverse problems, visualization, imaging, dynamic optimization. The goals are to contribute to the development of new state-of-the-art mathematical models and algorithms for maximum throughout performance, improved performance profiling tools and assimilation of results to academic and industrial partners in our network. Our approaches calls for multi-disciplinary skills and understanding of hardware, software development, profiling tools and tuning techniques, analytical methods for analysis and development of new approaches, together with expert knowledge in specific application areas within science and engineering. We anticipate that our research in a near future will bring new algorithms and insight in engineering and science applications targeting practical engineering problems. | |
| Jean-Marc Fontaine (Université de Paris XI (Paris-Sud)) | Vector bundles and p-adic Galois representations |
| Abstract: Let $F$ be a perfect field of characteristic $p>0$ equipped with a non-trivial absolute value, $E$ a non-archimedean locally compact field whose residue field is contained in $F$, and $\pi$ a uniformizing parameter of $E$. We associate functorially to these data a separated integral noetherian regular scheme $X=X_{F,E,\pi}$ of dimension $1$ defined over $E$. There is an equivalence of categories between semi-stable vector bundles of slope $0$ over $X$ and continuous $E$-linear representations of the absolute Galois group $H_F$ of $F$. When $F$ is algebraically closed, the closed points of $X$ can be described in terms of the Lubin-Tate formal group of $E$ corresponding to $\pi$. If $C$ is the $p$-adic completion of $\overline{Q}_p$, one can associate to $C$ an algebraically closed field $F=F(C)$ as above, and $\mathrm{Gal}(\overline{Q}_p/Q_p)$ acts on the curve $X=X_{F(C),Q_p,p}$. The two main results of $p$-adic Hodge theory can be recovered from the classification of vector bundles over $X$. (Joint work with Laurent Fargues.) Read more at http://www.math.u-psud.fr/~fargues/Prepublications.html. | |
| Geoffrey Charles Fox (Indiana University) | Clouds, MapReduce and HPC |
| Abstract: 1) We analyze the different tradeoffs and goals of Grid, Cloud and parallel (cluster/supercomputer) computing. 2) They tradeoff performance, fault tolerance, ease of use (elasticity), cost, interoperability. 3) Different application classes (characteristics) fit different architectures and we describe a hybrid model with Grids for data, traditional supercomputers for large scale simulations and clouds for broad based "capacity computing" including many data intensive problems. 4) We discuss the impressive features of cloud computing platforms and compare MapReduce and MPI. 5) We take most of our examples from the life science area. 6) We conclude with a description of FutureGrid -- a TeraGrid system for prototyping new middleware and applications. | |
| Martin J. Gander (Universite de Geneve) | A Domain Decomposition Method that Converges in Two Iterations for any Subdomain Decomposition and PDE |
| Abstract: Joint work with Felix Kwok. All domain decomposition methods are based on a decomposition of the physical domain into many subdomains and an iteration, which uses subdomain solutions only (and maybe a coarse grid), in order to compute an approximate solution of the problem on the entire domain. We show in this poster that it is possible to formulate such an iteration, based only on subdomain solutions, which converges in two steps to the solution of the underlying problem, independently of the number of subdomains and the PDE solved. This method is mainly of theoretical interest, since it contains sophisticated non-local operators (and a natural coarse grid component), which need to be approximated in order to obtain a practical method. | |
| Gaurav Gaurav (University of Minnesota) | Efficient Uncertainty Quantification using GPUs |
| Abstract: Joint work with Steven F. Wojtkiewicz (Department of Civil Engineering, University of Minnesota, Minneapolis, MN 55414, USA; bykvich@umn.edu). Graphics processing units (GPUs) have emerged as a much more economical and highly competitive alternative to CPU-based parallel computing. Recent studies have shown that GPUs consistently outperform their best corresponding CPU-based parallel computing equivalents by up to two orders of magnitude in certain applications. Moreover, the portability of GPUs enables even a desktop computer to provide a teraflop (10^12 floating-point operations per second) of computing power. This study presents the gains in computational efficiency obtained using GPU-based implementations of five types of algorithms frequently used in uncertainty quantification problems arising in the analysis of dynamical systems with uncertain parameters and/or inputs. | |
| Mike Giles (University of Oxford) | OP2: an open-source library for unstructured grid applications |
| Abstract: Based on an MPI library written over 10 years ago, OP2 is a new open-source library which is aimed at application developers using unstructured grids. Using a single API, it targets a variety of backend architectures, including both manycore GPUs and multicore CPUs with vector units. The talk will cover the API design, key aspects of the parallel implementation on the different platforms, and preliminary performance results on a small but representative CFD test code. | |
| Leopold Grinberg (Brown University) | Ultraparallel solvers for multi-scale brain blood flow simulations on exascale computers |
| Abstract: Solvers for coupled multi-scale (multi-physics) problems may be constructed by coupling an array of existing and well-tested parallel numerical solvers, each designed to tackle the problem at a different spatial and temporal scale. Each solver can be optimized/designed for a different computer architecture. Future supercomputers may be composed of heterogeneous processing units, i.e., CPU/GPU. To make efficient use of computational resources, the coupled solvers must support topology-aware mapping of tasks to the processing units where the best parallel efficiency can be achieved. Arterial blood circulation is a multi-scale process where time and space scales range from nanoseconds (nanometers) to seconds (meters), respectively. The macro-vascular scales describing the flow dynamics in larger vessels are coupled to the meso-vascular scales unfolding the dynamics of individual blood cells. The meso-vascular events are coupled to the micro-vascular ones accounting for blood perfusion, clot formation, adhesion of the blood cells to the arterial walls, etc. Besides the multi-scale nature of the problem, its size often presents a substantial computational challenge even for simulations considering a single scale. In this talk we will try to envision the design of a multi-scale solver for blood flow simulations, tailored to heterogeneous computer architecture. | |
| Leopold Grinberg (Brown University) | Brain Perfusion: Multi-scale Simulations and Visualization |
| Abstract: Joint work with J. Insley, M. Papka, and G. E. Karniadakis. Interactions of blood flow in the human brain occur between different scales, determined by flow features in the large arteries (above 0.5mm diameter), the arterioles, and the capillaries (of 5E-3 mm). To simulate such multi-scale flow we develop mathematical models, numerical methods, scalable solvers and visualization tools. Our poster will present NektarG, a research code developed at Brown University for continuum and atomistic simulations. NektarG is based on a high-order spectral/hp element discretization featuring multi-patch domain decomposition for continuum flow simulations, and modified DPD-LAMMPS for mesoscopic simulations. The continuum and atomistic solvers are coupled via a Multi-level Communicating Interface to exchange data required by interface conditions. The visualization software is based on ParaView and NektarG utilities accessed through the ParaView GUI. The new visualization software makes it possible to simultaneously present data computed in coupled (multi-scale) simulations. The software automatically synchronizes the display of the time evolution of solutions at multiple scales. | |
| Dominik Göddeke (Universität Dortmund), Robert Strzodka (Max-Planck-Institut für Informatik) | Mixed-Precision GPU-Multigrid Solvers with Strong Smoothers |
| Abstract: We present efficient fine-grained parallelization techniques for robust multigrid solvers and Krylov subspace schemes, in particular for numerically strong smoothing and preconditioning operators. We apply them to sparse ill-conditioned linear systems of equations that arise from grid-based discretization techniques like finite differences, volumes and elements; the systems are notoriously hard to solve due to severe anisotropies in the underlying mesh and differential operator. These strong smoothers are characterized by sequential data dependencies, and do not parallelize in a straightforward manner. For linewise preconditioners, exact parallel algorithms exist, and we present a novel, efficient implementation of a cyclic reduction tridiagonal solver. For other preconditioners, traditional wavefront techniques can be applied, but their irregular and limited parallelism makes them a bad match for GPUs. Therefore, we discuss multicoloring techniques to recover parallelism in these preconditioners, by decoupling some of the dependencies at the expense of at first reduced numerical performance. However, by carefully balancing the coupling strength (more colors) with the parallelization benefits, the multicolored variants retain almost all of the sequential numerical performance. Further improvements are achieved by merging the tridiagonal and Gauß-Seidel approach into a smoothing operator that combines their advantages, and by employing an alternating direction implicit scheme to gain independence of the numbering of the unknowns. Due to their advantageous numerical properties, multigrid solvers equipped with strong smoothers are between four and eight times more efficient than with simple Gauß-Seidel preconditioners, and we achieve speedup factors between six and 18 with the GPU implementations over carefully tuned CPU variants. | |
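The multicoloring idea is easiest to see in its simplest instance, red-black Gauss-Seidel for the 5-point Laplacian. This toy Python version (an illustration, not the authors' GPU code) shows why each half-sweep parallelizes:

```python
import numpy as np

def red_black_gauss_seidel(u, f, h, sweeps=1):
    """Red-black Gauss-Seidel smoothing for the 5-point Poisson stencil.

    Points of one color depend only on points of the other color, so
    every half-sweep is fully parallel. This is the parallelism that
    multicoloring recovers for stronger, otherwise sequential smoothers,
    at the price of a change in numerical behavior."""
    for _ in range(sweeps):
        for color in (0, 1):                       # 0 = red, 1 = black
            for i in range(1, u.shape[0] - 1):
                for j in range(1, u.shape[1] - 1):
                    if (i + j) % 2 == color:
                        u[i, j] = 0.25 * (u[i-1, j] + u[i+1, j] +
                                          u[i, j-1] + u[i, j+1] -
                                          h * h * f[i, j])
    return u

# Smooth a zero initial guess against a constant right-hand side.
u = red_black_gauss_seidel(np.zeros((9, 9)), np.ones((9, 9)), h=0.125, sweeps=50)
```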
| Michael A. Heroux (Sandia National Laboratories) | Emerging Programming and Machine Models: Opportunities for Numerical Algorithms R&D |
| Abstract: After 15-20 years of architectural stability, we are in the midst of a dramatic change in high performance computing systems design. In this talk we discuss the commonalities across the viable systems of today, and look at opportunities for numerical algorithms research and development. In particular, we explore possible programming and machine abstractions and how we can develop effective algorithms based on these abstractions, addressing, among other things, robustness issues for preconditioned iterative methods and resilience of algorithms in the presence of soft errors. | |
| Michael A. Heroux (Sandia National Laboratories) | Exascale programming models (Lind Hall 409) |
| Abstract: No Abstract | |
| Elizabeth R. Jessup (University of Colorado) | The Build to Order Compiler for Matrix Algebra Optimization |
| Abstract: The performance of many high performance computing applications is limited by data movement from memory to the processor. Often their cost is more accurately expressed in terms of memory traffic rather than floating-point operations and, to improve performance, data movement must be reduced. One technique to reduce memory traffic is the fusion of loops that access the same data. We have built the Build to Order (BTO) compiler to automate the fusion of loops in matrix algebra kernels. Loop fusion often produces speedups proportional to the reduction in memory traffic, but it can also lead to negative effects in cache and register use. We present the results of experiments with BTO that help us to understand the workings of loop fusion. | |
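A hand-made example of the loop fusion that a compiler like BTO automates (hypothetical code, not BTO output): the two kernels in `unfused` each stream the vector `x` from memory, while the fused loop reads each `x[j]` once and uses it for both results.

```python
import numpy as np

def unfused(A, x, y):
    """Two separate kernels: the vector x is streamed from memory twice."""
    r = A @ x            # matrix-vector product reads all of x
    s = x + y            # vector add reads x again
    return r, s

def fused(A, x, y):
    """One loop: each x[j] is loaded once and used for both results.
    (A is assumed square so r and s have the same length.)"""
    n = A.shape[0]
    r = np.zeros(n)
    s = np.empty(n)
    for j in range(n):
        xj = x[j]                 # single load of x[j]
        r += A[:, j] * xj         # column-wise accumulation of A @ x
        s[j] = xj + y[j]
    return r, s

A = np.arange(9.0).reshape(3, 3)
x, y = np.ones(3), np.arange(3.0)
assert all(np.allclose(a, b) for a, b in zip(unfused(A, x, y), fused(A, x, y)))
```

As the abstract notes, whether such a fusion pays off depends on cache and register pressure, which is exactly what makes an automated, search-based approach attractive.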
| David E. Keyes (Columbia University) | The Exascale: Why and How |
| Abstract: Sustained floating-point computation rates on real applications, as tracked by the ACM Gordon Bell Prize, increased by three orders of magnitude from 1988 (1 Gigaflop/s) to 1998 (1 Teraflop/s), and by another three orders of magnitude to 2008 (1 Petaflop/s). Computer engineering provided only a couple of orders of magnitude of improvement for individual cores over that period; the remaining factor came from concurrency, which is approaching one million-fold. Algorithmic improvements meanwhile contributed to making each flop more valuable scientifically. As the semiconductor industry now slips relative to its own roadmap for silicon-based logic and memory, concurrency, especially on-chip many-core concurrency and GPGPU SIMD-type concurrency, will play an increasing role in the next few orders of magnitude, to arrive at the ambitious target of 1 Exaflop/s, extrapolated for 2018. An important question is whether today's best algorithms are efficiently hosted on such hardware and how much co-design of algorithms and architecture will be required. From the applications perspective, we illustrate eight reasons why today's computational scientists have an insatiable appetite for such performance: resolution, fidelity, dimension, artificial boundaries, parameter inversion, optimal control, uncertainty quantification, and the statistics of ensembles. The paths to the exascale summit are debated, but all are narrow and treacherous, constrained by fundamental laws of physics, cost, power consumption, programmability, and reliability. Drawing on recent reports, workshops, vendor projections, and experiences with scientific codes on contemporary platforms, we propose roles for today's researchers in one of the great global scientific quests of the next decade. | |
| David E. Keyes (Columbia University) | Lecture 1: Implications of the exascale roadmap for algorithms |
| Abstract: The central challenge in progressing from petascale to exascale supercomputing is the same as that in progressing from gigascale to terascale personal computing: strong scaling within shared memory on a single node of up to 1K simultaneously active computational threads. Many issues in algorithmic design and implementation are identical in these two simultaneous quests; however, the exascale quest has additional challenges due to practical limits on total power consumption (which come at the expense of resilience and node performance uniformity), to system-scale reliability (due to more points of failure), and to the need to merge the on-node programming environment with a million others (a weak scaling that is not in itself difficult, but will lead to challenges of coordination). This lecture series presents the issues, as digested from recent US Department of Energy roadmapping exercises, and focuses attention on some new issues that require mathematical attention. It is intended to provide those new to exascale computing with a working background for the week ahead, and motivation for the GPU scientific programming unit of the tutorial. | |
| David E. Keyes (Columbia University) | Lecture 2: Implications of the exascale roadmap for algorithms |
| Abstract: see abstract for Lecture 1 | |
| Mark Kisin (Harvard University) | Points on Shimura varieties mod p |
| Abstract: I will explain some results towards the Langlands-Rapoport conjecture, which predicts the structure of the mod p points of a Shimura variety. A consequence of the conjecture is that the isogeny class of every mod p point contains a point which admits a lifting to a special (i.e., CM) point of the Shimura variety. One of the roots of the subject is the work of John Tate on CM liftings and endomorphisms of abelian varieties mod p. | |
| Andreas Klöckner (New York University) | High-order DG Wave Propagation on GPUs: Infrastructure, Implementation, Method Improvements |
| Abstract: Having recently shown that high-order unstructured discontinuous Galerkin (DG) methods are a discretization method for systems of hyperbolic conservation laws that is well-matched to execution on GPUs, in this talk I will explore both core and supporting components of high-order DG solvers for their suitability for and performance on modern, massively parallel architectures. Components examined range from software components facilitating implementation to strategies for automated tuning and, time permitting, numerical tweaks to the method itself. In concluding, I will present a selection of further design considerations and performance data. | |
| Matthew Gregg Knepley (University of Chicago) | GPU programming from higher level representations |
| Abstract: We discuss the construction and execution of GPU kernels from higher level specifications. Examples will be shown using low-order finite elements and fast multipole method. | |
| Pawel Konieczny (University of Minnesota) | On dispersive effect of the Coriolis force for the stationary Navier-Stokes equations |
| Abstract: The dispersive effect of the Coriolis force for the stationary and nonstationary Navier-Stokes equations is investigated. Existence of a unique stationary solution is shown for an arbitrarily large external force, provided the Coriolis force is large enough. In addition to the stationary case, counterparts of several classical results for the non-stationary Navier-Stokes problem have been proven. The analysis is carried out in a new framework of the Fourier-Besov spaces. | |
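For context, the stationary system studied here can be written as follows (a plausible reconstruction of the setup, not necessarily the notation of the paper):

```latex
-\Delta u + \Omega\, e_3 \times u + (u \cdot \nabla) u + \nabla p = f,
\qquad \nabla \cdot u = 0,
```

where $\Omega$ is the speed of rotation. The Coriolis term $\Omega\, e_3 \times u$ is the source of the dispersive smoothing, and the result asserts a unique stationary solution for arbitrarily large $f$ once $\Omega$ is sufficiently large.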
| Hugo Leclerc (École Normale Supérieure de Cachan) | Global symbolic manipulations and code generation for Finite Elements on SIM[DT] hardware |
| Abstract: Tools have been developed to generate code to solve partial differential equations from high level descriptions (manipulation of files, global operators, ...). The successive symbolic transformations lead to a macroscopic description of the code to be executed, which can thus be translated into x86 (SSEx), C++ or CUDA code. The point emphasized here is that the different processes can be adapted to the target hardware, taking into account the ratio gflops/gbps (making, e.g., the choice between re-computation and caching), the SIM[DT] abilities, etc. The poster will present the gains (compared to classical CPU/GPU implementations) for two implementations of a 3D unstructured FEM solver, using respectively a conjugate gradient method and a domain decomposition method with repetitive patterns. | |
| Miriam Leeser (Northeastern University) | The Challenges of Writing Portable, Correct and High Performance Libraries for GPUs or How to Avoid the Heroics of GPU Programming |
| Abstract: We live in the age of heroic programming for scientific applications on Graphics Processing Units (GPUs). Typically a scientist chooses an application to accelerate and a target platform, and through great effort maps their application to that platform. If they are a true hero, they achieve two or three orders of magnitude speedup for that application and target hardware pair. The effort required includes a deep understanding of the application, its implementation and the target architecture. When a new, higher performance architecture becomes available additional heroic acts are required. There is another group of scientists who prefer to spend their time focused on the application level rather than lower levels. These scientists would like to use GPUs for their applications, but would prefer to have parameterized library components available that deliver high performance without requiring heroic efforts on their part. The library components should be easy to use and should support a wide range of user input parameters. They should exhibit good performance on a range of different GPU platforms, including future architectures. Our research focuses on creating such libraries. We have been investigating parameterized library components for use with Matlab/Simulink and with the SCIRun Biomedical Problem Solving Environment from the University of Utah. In this talk I will discuss our library development efforts and challenges to achieving high performance across a range of both application and architectural parameters. I will also focus on issues that arise in achieving correct behavior of GPU kernels. One issue is correct behavior with respect to thread synchronization. Another is knowing whether or not your scientific application that uses floating point is correct when the results differ depending on the target architecture and order of computation. | |
| Miriam Leeser (Northeastern University) | GPU Acceleration in a Modern Problem Solving Environment: SCIRun's Linear System Solvers |
| Abstract: This research demonstrates the incorporation of GPU's parallel processing architecture into the SCIRun biomedical problem solving environment with minimal changes to the environment or user experience. SCIRun, developed at the University of Utah, allows scientists to interactively construct many different types of biomedical simulations. We use this environment to demonstrate the effectiveness of the GPU by accelerating time consuming algorithms present in these simulations. Specifically, we target the linear solver module, which contains multiple solvers that benefit from GPU hardware. We have created a class to accelerate the conjugate gradient, Jacobi and minimal residual linear solvers; the results demonstrate that the GPU can provide acceleration in this environment. A principal focus was to remain transparent by retaining the user friendly experience to the scientist using SCIRun's graphical user interface. NVIDIA's CUDA C language is used to enable performance on NVIDIA GPUs. Challenges include manipulating the sparse data processed by these algorithms and communicating with the SCIRun interface amidst computation. Our solution makes it possible to implement GPU versions of the existing SCIRun algorithms easily and can be applied to other parallel algorithms in the application. The GPU executes the matrix and vector arithmetic to achieve acceleration performance of up to 16x on the algorithms in comparison to SCIRun's existing multithreaded CPU implementation. The source code will contain single and double precision versions to utilize a wide variety of GPU hardware and will be incorporated and publicly available in future versions of SCIRun. | |
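For orientation, the conjugate gradient method named above is built from exactly the kernels that map well to the GPU: sparse matrix-vector products, axpy vector updates, and dot products. A textbook sketch (not SCIRun code) follows.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-8, max_iter=1000):
    """Textbook CG for a symmetric positive definite system A x = b.

    The dominant kernels (the matrix-vector product A @ p, the axpy
    vector updates, and the dot products) are precisely the operations
    an accelerated solver offloads to the GPU."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# 1D Poisson test problem.
n = 32
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
x = conjugate_gradient(A, np.ones(n))
print(np.linalg.norm(A @ x - np.ones(n)))   # residual norm, ~1e-8
```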
| David Mayhew (Advanced Micro Devices) | I See GPU Shapes in the Clouds |
| Abstract: Fusion (the integration of CPU and GPU into a single processing entity) is here. Cloud based software services are here. Large processing clusters are running massively parallel Hadoop programs now. Can large-scale, commercial, enterprise, server solutions be dynamically repurposed to run HPC problem sets? The future of HPC may well be a massive set of virtual machines running in "curve of the earth" sized data centers. The cost of HPC processing sponges (HPC problem sets that consume otherwise wasted processing cycles in scale-out server clusters) will probably make all but the most extreme purpose-built HPC systems obsolete. | |
| Dan Negrut (University of Wisconsin-Madison) | Large Scale Frictional Contact Dynamics on the GPU |
| Abstract: This talk summarizes an effort at the Modeling, Simulation and Visualization Center at the University of Wisconsin-Madison to model and simulate large scale discrete dynamics problems. This effort is motivated by a desire to address unsolved challenges posed by granular dynamics problems, the mobility of tracked and wheeled vehicles on granular terrain, and digging into granular material, to name a few. In the context of simulating the dynamics of large systems of interacting rigid bodies, we briefly outline a method for solving large cone complementarity problems by means of a fixed-point iteration algorithm. The method is an extension of the Gauss-Jacobi algorithms with over-relaxation for symmetric convex complementarity problems. Convergent under fairly standard assumptions, the method is implemented in a scalable parallel computational framework by using a single instruction multiple data (SIMD) execution paradigm supported by the Compute Unified Device Architecture (CUDA) library for programming on the graphical processing unit (GPU). The simulation framework developed supports the analysis of problems with more than one million rigid bodies that interact through contact and friction forces, and whose dynamics are constrained by either unilateral or bilateral kinematic constraints. Simulation thus becomes a viable tool for investigating in the near future the dynamics of complex systems such as the Mars Rover operating on granular terrain, powder composites, and granular material flow. The talk concludes with a short summary of other applications that stand to benefit from the computational power available on today's GPUs. | |
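The flavor of such fixed-point schemes can be conveyed by projected Jacobi for a plain linear complementarity problem. This is a simplified stand-in (a sketch, not the speaker's method), whereas the actual solver projects onto friction cones and handles the full cone complementarity problem.

```python
import numpy as np

def projected_jacobi(N, r, iters=200, omega=0.3):
    """Projected Jacobi with under-relaxation for the LCP:
    find z >= 0 with N z + r >= 0 and z . (N z + r) = 0.

    Every component update is independent, which matches the SIMD
    execution model described in the talk; a cone-complementarity
    solver replaces the max(0, .) projection by projection onto
    friction cones."""
    z = np.zeros_like(r)
    d = np.diag(N)                 # Jacobi scaling
    for _ in range(iters):
        z = np.maximum(0.0, z - omega * (N @ z + r) / d)
    return z

N = np.array([[2.0, 1.0], [1.0, 2.0]])
r = np.array([-1.0, 1.0])
z = projected_jacobi(N, r)
print(z, N @ z + r)   # z >= 0, complementary to N z + r (z ~ [0.5, 0])
```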
| Carl Pomerance (Dartmouth College) | Elliptic curves: problems and applications |
| Abstract: In the past three decades there have been some exciting applications of elliptic curves over finite fields to integer factoring, primality testing, and cryptography. These applications in turn have raised some interesting problems often of an unconventional flavor. For example, how often is the order of an elliptic curve group prime, or how often does it have all small prime factors? In this talk we will visit problems such as these, as well as other analytic-type problems relating to ranks of elliptic curves over function fields and to elliptic divisibility sequences. | |
| Bjorn Poonen (Massachusetts Institute of Technology) | Random maximal isotropic subspaces and Selmer groups |
| Abstract: We show that the p-Selmer group of an elliptic curve is naturally the intersection of two maximal isotropic subspaces in an infinite-dimensional locally compact quadratic space over F_p. By modeling this intersection as the intersection of a random maximal isotropic subspace with a fixed compact open maximal isotropic subspace, we can explain the known phenomena regarding distribution of Selmer ranks, such as the theorems of Heath-Brown, Swinnerton-Dyer, and Kane for 2-Selmer groups in certain families of quadratic twists, and the average size of 2- and 3-Selmer groups as computed by Bhargava and Shankar. The only distribution on Mordell-Weil ranks compatible with both our random model and Delaunay's heuristics for Sha[p] is the distribution in which 50% of elliptic curves have rank 0, and 50% have rank 1. We generalize many of our results to abelian varieties over global fields. This is joint work with Eric Rains. | |
| Cristian D. Popescu (University of California, San Diego) | An equivariant main conjecture in Iwasawa theory and applications |
| Abstract: I will discuss the statement and proof of an Equivariant Main Conjecture (EMC) in the Iwasawa theory of arbitrary global fields. This will be followed by applications of the EMC (via Iwasawa co-descent) towards proving various well known conjectures on special values of global L-functions. In the process, an important role will be played by an explicit construction of ell-adic Tate sequences. This is based on joint work with Cornelius Greither (Munich). | |
| Michel Raynaud (Université de Paris XI (Paris-Sud)) | Permanence following Temkin |
| Abstract: If we specialize algebraic equations having good properties, we usually face degeneracies. Starting with a bad specialization, we can try to improve it, performing modifications under control. If we succeed in getting a new specialization with the initial good properties preserved, we get a permanence statement. We shall present examples of permanence, with particular interest in semi-stable models. | |
| Fernando Rodriguez Villegas (University of Texas at Austin) | On the geometry of character varieties |
| Abstract: We know, thanks to the Weil conjectures, that counting points of varieties over finite fields yields purely topological information about them. In this talk I will first describe how we may count the number of points over finite fields on the character varieties parameterizing certain representations of the fundamental group of a Riemann surface into GL_n. The calculation involves an array of techniques from combinatorics to the representation theory of finite groups of Lie type. I will then discuss the geometric implications of this computation and the conjectures it has led to. This is joint work with T. Hausel and E. Letellier. | |
| Karl Rubin (University of California, Irvine) | Selmer ranks of elliptic curves in families of quadratic twists |
| Abstract: In joint work with Barry Mazur, we investigate the 2-Selmer rank in families of quadratic twists of elliptic curves over arbitrary number fields. We give sufficient conditions for an elliptic curve to have twists of arbitrary 2-Selmer rank, and we give lower bounds for the number of twists (with bounded conductor) with a given 2-Selmer rank. As a consequence, under appropriate hypotheses there are many twists with Mordell-Weil rank zero, and (assuming the Shafarevich-Tate conjecture) many others with Mordell-Weil rank one. Another application of our methods, using ideas of Poonen and Shlapentokh, is that if the Shafarevich-Tate conjecture holds then Hilbert's 10th problem has a negative answer over the ring of integers of any number field. | |
| Nayda G. Santiago (University of Puerto Rico) | Hyperspectral Image Analysis for Abundance Estimation using GPUs |
| Abstract: Hyperspectral images can be used for abundance estimation and anomaly detection; however, the algorithms involved tend to be I/O intensive. Parallelizing these algorithms can enable their use in real-time applications. A method of overcoming these limitations involves selecting parallelizable algorithms and implementing them using GPUs. GPUs are designed as throughput engines, built to process large amounts of dense data in a parallel fashion. RX detectors and abundance estimators will be parallelized and tested for correctness and performance. | |
| Fadil Santosa (University of Minnesota) | Welcome to the IMA |
| Abstract: No Abstract | |
| Olaf Schenk (Universität Basel) | A Code Generation and Autotuning Framework For Parallel Iterative Stencil Computations on Modern Microarchitectures |
| Abstract: Stencil calculations comprise an important class of kernels in many scientific computing applications ranging from simple PDE solvers to constituent kernels in multigrid methods as well as image processing applications. In such types of solvers, stencil kernels are often the dominant part of the computation, and an efficient parallel implementation of the kernel is therefore crucial in order to reduce the time to solution. However, in the current complex hardware microarchitectures, meticulous architecture-specific tuning is required to elicit the machine's full compute power. We present a code generation and auto-tuning framework PATUS for stencil computations targeted at multi- and manycore processors, such as multicore CPUs and graphics processing units, which makes it possible to generate compute kernels from a specification of the stencil operation and a parallelization and optimization strategy, and leverages the autotuning methodology to optimize strategy-dependent parameters for the given hardware architecture. | |
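To make the tuning target concrete, here is a toy stand-in (not PATUS itself): a blocked 5-point stencil whose block sizes are swept by a brute-force autotuner, illustrating the kind of strategy-dependent parameters such a framework optimizes per architecture.

```python
import itertools
import time
import numpy as np

def stencil_blocked(u, out, bi, bj):
    """5-point Laplacian applied blockwise; (bi, bj) are the tunable
    cache-block sizes a stencil autotuner would sweep."""
    n, m = u.shape
    for i0 in range(1, n - 1, bi):
        for j0 in range(1, m - 1, bj):
            i1, j1 = min(i0 + bi, n - 1), min(j0 + bj, m - 1)
            out[i0:i1, j0:j1] = (u[i0-1:i1-1, j0:j1] + u[i0+1:i1+1, j0:j1] +
                                 u[i0:i1, j0-1:j1-1] + u[i0:i1, j0+1:j1+1] -
                                 4.0 * u[i0:i1, j0:j1])

def autotune(n=512, candidates=(16, 32, 64, 128)):
    """Brute-force search over block sizes: the essence of autotuning is
    timing strategy variants on the actual hardware and keeping the best."""
    u, out = np.random.rand(n, n), np.zeros((n, n))
    best = None
    for bi, bj in itertools.product(candidates, repeat=2):
        t0 = time.perf_counter()
        stencil_blocked(u, out, bi, bj)
        dt = time.perf_counter() - t0
        if best is None or dt < best[0]:
            best = (dt, bi, bj)
    return best

print(autotune())   # e.g. (seconds, best_bi, best_bj)
```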
| Bertil Schmidt (Nanyang Technological University) | Algorithms and Tools for Bioinformatics on GPUs |
| Abstract: The enormous growth of biological sequence data has caused bioinformatics to move rapidly toward a data-intensive computational science. As a result, the computational power needed by bioinformatics applications is growing rapidly as well. The recent emergence of parallel accelerator technologies such as GPUs has made it possible to significantly reduce the execution times of many bioinformatics applications. In this talk I will present the design and implementation of scalable GPU algorithms based on the CUDA programming model in order to accelerate important bioinformatics applications. In particular, I will focus on algorithms and tools for next-generation sequencing (NGS), using error correction as an example. Detection and correction of sequencing errors is an important but time-consuming pre-processing step for de-novo genome assembly or read mapping. I will discuss the parallel algorithm design used for the CUDA-EC and DecGPU tools, and I will also give an overview of other CUDA-enabled tools developed by my research group. | |
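As background for the error-correction example, the sketch below shows the k-mer-spectrum idea that spectrum-based correctors build on: k-mers occurring fewer times than a trust threshold are presumed to contain sequencing errors. This serial toy (reads, k, and cutoff are all made up) illustrates only the underlying principle, not the CUDA-EC or DecGPU implementations, which parallelize such counting and filtering on the GPU.

```python
from collections import Counter

K, CUTOFF = 4, 2          # k-mer length and "trusted" multiplicity threshold
reads = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAC", "ACGTACCTAC"]  # one error

# Build the k-mer spectrum of the whole read set.
spectrum = Counter()
for read in reads:
    for i in range(len(read) - K + 1):
        spectrum[read[i:i + K]] += 1

# k-mers below the threshold localize the likely sequencing error.
for read in reads:
    weak = [read[i:i + K] for i in range(len(read) - K + 1)
            if spectrum[read[i:i + K]] < CUTOFF]
    print(read, "suspect k-mers:", weak)
```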
| Mark J. Stock (Applied Scientific Research) | A GPU-accelerated Boundary Element Method and Vortex Particle Method |
| Abstract: Vortex particle methods, when combined with multipole-accelerated boundary element methods (BEM), become a complete tool for direct numerical simulation (DNS) of internal or external vortex-dominated flows. In previous work, we presented a method to accelerate the vorticity-velocity inversion at the heart of vortex particle methods by performing a multipole treecode N-body method on parallel graphics hardware. The resulting method achieved a 17-fold speedup over a dual-core CPU implementation. In the present work, we will demonstrate both an improved algorithm for the GPU vortex particle method, which outperforms an 8-core CPU by a factor of 43, and a GPU-accelerated multipole treecode method for the boundary element solution. The new BEM solves for the unknown source, dipole, or combined strengths over a triangulated surface using all available CPU cores and GPUs. Problems with up to 1.4 million unknowns can be solved on a single commodity desktop computer in one minute, and at that size the hybrid CPU/GPU method outperforms a quad-core CPU alone by a factor of 22.5. The method is exercised on DNS of impulsively-started flow over spheres at Re = 500, 1000, 2000, and 4000. | |
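The kernel being accelerated here is the Biot-Savart summation: every vortex particle induces a velocity on every other, an O(N^2) interaction that the multipole treecode reduces and the GPU evaluates in parallel. Below is a direct (un-accelerated) 2D version as a reference point; particle data and the smoothing radius are illustrative, and production vortex methods use higher-order regularized kernels.

```python
import numpy as np

def biot_savart_direct(x, y, gamma, eps=0.05):
    """Direct O(N^2) 2D Biot-Savart velocities for regularized vortex blobs."""
    dx = x[:, None] - x[None, :]            # pairwise separations
    dy = y[:, None] - y[None, :]
    r2 = dx * dx + dy * dy + eps * eps      # blob smoothing; also kills self-term
    k = gamma[None, :] / (2.0 * np.pi * r2)
    u = np.sum(-dy * k, axis=1)             # u_i = sum_j -G_j dy_ij / (2 pi r^2)
    v = np.sum(dx * k, axis=1)
    return u, v

rng = np.random.default_rng(1)
n = 2000
x, y = rng.uniform(-1, 1, n), rng.uniform(-1, 1, n)
gamma = rng.normal(scale=1.0 / n, size=n)   # vortex strengths
u, v = biot_savart_direct(x, y, gamma)
print(u[:3], v[:3])
```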
| Mark J. Stock (Applied Scientific Research) | Algorithmic Fluid Art – Influences, Process, and Works |
| Abstract: In addition to my research into vortex particle methods, parallel N-body methods, and GPU programming, I create artwork using these same computer programs. The work consists of imagery and animations of fluid forms and other shapes and patterns in nature. Using relatively simple algorithms reflecting the origins of their underlying processes, many of these patterns can be recreated and their inherent beauty exposed. In this talk, I will discuss the technical aspects of my work, but mainly plan to distract attention with the works themselves. Biography: Mark Stock earned his PhD in Aerospace Engineering from the University of Michigan in 2006 and has been working for Applied Scientific Research in Santa Ana, CA since then. He has been creating computer imagery and numerical simulations for over 25 years, and started exhibiting his artwork in 2001. | |
| Robert Strzodka (Max-Planck-Institut für Informatik) | Everyday Parallelism |
| Abstract: Parallelism is largely seen as a necessary evil to cope with the power restrictions on a chip, and most programmers would prefer to continue writing sequential programs rather than deal with alien and error-prone parallel programming. This talk will question this view and point out how the allegedly unfamiliar parallel processing is utilized by millions of people every day. Parallelism appears as a curse only when looking at it from the crooked illusion of sequential processing. Admittedly, there are critical decisions associated with specialization, data movement, or synchronization, but we also have lots of experience in making them, because we make them every day. Presented results will demonstrate that the drawn analogies are not just theoretical. | |
| Keita Teranishi (CRAY Inc) | Locally-Self-Consistent Multiple-Scattering code (LSMS) for GPUs |
| Abstract: Locally-Self-Consistent Multiple-Scattering (LSMS) is one of the major petascale applications and is highly tuned for supercomputer systems such as the Cray XT5 Jaguar. We present our recent effort on porting and tuning the major computational routine of LSMS to GPU-based systems, to demonstrate the feasibility of LSMS beyond petaflops. In particular, we discuss our techniques, including the auto-tuning of dense matrix kernels and computation-communication overlap. | |
| Jonas Tölke (Ingrain) | Lattice Boltzmann Multi-Phase Simulations in Porous Media using GPUs |
| Abstract: We present a very efficient implementation of a multiphase lattice Boltzmann method (LBM) based on CUDA. This technology delivers significant benefits for predictions of properties in rocks. The simulator on NVIDIA hardware enables us to perform pore-scale multi-phase (oil-water-matrix) simulations in natural porous media and to predict important rock properties like absolute permeability, relative permeabilities, and capillary pressure. We will show videos of these simulations in complex real-world porous media and rocks. | |
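For readers unfamiliar with the method, a lattice Boltzmann time step is just "collide, then stream": relax the particle distributions toward a local equilibrium, then shift each one along its lattice velocity. The single-phase D2Q9 sketch below shows that structure in numpy (grid size, relaxation time, and perturbation are placeholders); the multiphase pore-scale simulator discussed in the talk is far more elaborate, but this per-node locality is exactly what maps so well onto GPUs.

```python
import numpy as np

# D2Q9 lattice: 9 velocities and their weights.
c = np.array([[0,0],[1,0],[0,1],[-1,0],[0,-1],[1,1],[-1,1],[-1,-1],[1,-1]])
w = np.array([4/9] + [1/9]*4 + [1/36]*4)
tau = 0.8                                    # BGK relaxation time
nx, ny = 64, 64
f = np.ones((9, nx, ny)) * w[:, None, None]  # fluid at rest, density 1

def lbm_step(f):
    rho = f.sum(axis=0)                      # macroscopic density
    ux = (c[:, 0, None, None] * f).sum(axis=0) / rho
    uy = (c[:, 1, None, None] * f).sum(axis=0) / rho
    # Collision: relax toward the discrete Maxwellian equilibrium.
    cu = c[:, 0, None, None] * ux + c[:, 1, None, None] * uy
    feq = w[:, None, None] * rho * (1 + 3*cu + 4.5*cu**2 - 1.5*(ux**2 + uy**2))
    f = f - (f - feq) / tau
    # Streaming: shift each population along its velocity (periodic box).
    for i in range(9):
        f[i] = np.roll(np.roll(f[i], c[i, 0], axis=0), c[i, 1], axis=1)
    return f

f[1, nx // 2, ny // 2] += 0.01               # tiny perturbation to start motion
for _ in range(10):
    f = lbm_step(f)
print(f.sum())                               # total mass is conserved exactly
```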
| Jonas Tölke (Ingrain) | Digital rocks physics: fluid flow in rocks |
| Abstract: We show how Ingrain's digital rock physics technology works to predict fluid flow properties in rocks. NVIDIA CUDA technology delivers significant acceleration for this technology. The simulator on NVIDIA hardware enables us to perform pore-scale multi-phase (oil-water-matrix) simulations in natural porous media and to predict important rock properties like absolute permeability, relative permeabilities, and capillary pressure. | |
| Ulrike Meier Yang (Lawrence Livermore National Laboratory) | Preparing Algebraic Multigrid for Exascale |
| Abstract: Algebraic Multigrid (AMG) solvers are an essential component of many large-scale scientific simulation codes. Their continued numerical scalability and efficient implementation are critical for preparing these codes for exascale. Our experiences on modern multi-core machines show that significant challenges must be addressed for AMG to perform well on such machines. We discuss our experiences and describe the techniques we have used to overcome scalability challenges for AMG on hybrid architectures in preparation for exascale. | |
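The multigrid idea that AMG generalizes fits in a screenful: damp the high-frequency error with a cheap smoother, correct the remaining smooth error on a coarser grid, and smooth again. The geometric two-grid sketch below for the 1D Poisson equation is only an illustration of that cycle; AMG constructs the coarse levels and transfer operators algebraically from the matrix, with no grid in sight.

```python
import numpy as np

n = 127                                      # fine interior points, h = 1/128
h = 1.0 / (n + 1)
b = np.full(n, 1.0)                          # -u'' = 1, u(0) = u(1) = 0

def residual(u):                             # r = b - A u, A = tridiag(-1,2,-1)/h^2
    return b - (2*u - np.r_[0.0, u[:-1]] - np.r_[u[1:], 0.0]) / h**2

def jacobi(u, sweeps, omega=2/3):            # weighted-Jacobi smoother
    for _ in range(sweeps):
        u = u + omega * (h**2 / 2.0) * residual(u)
    return u

def two_grid(u):
    u = jacobi(u, 3)                                     # pre-smooth
    r = residual(u)
    rc = 0.25 * (r[0:-2:2] + 2*r[1:-1:2] + r[2::2])      # full-weighting restriction
    nc = rc.size                                         # coarse grid, spacing 2h
    Ac = (2*np.eye(nc) - np.eye(nc, k=1) - np.eye(nc, k=-1)) / (2*h)**2
    ec = np.linalg.solve(Ac, rc)                         # exact coarse solve
    e = np.zeros(n)                                      # linear interpolation
    e[1:-1:2] = ec
    e[0:-2:2] += 0.5 * ec
    e[2::2] += 0.5 * ec
    return jacobi(u + e, 3)                              # post-smooth

u = np.zeros(n)
for k in range(8):
    u = two_grid(u)
    print(k, np.linalg.norm(residual(u)))    # residual shrinks every cycle
```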
| Rio Yokota (Boston University) | Fast Multipole Methods on large clusters of GPUs |
| Abstract: The combination of algorithmic acceleration and hardware acceleration can have tremendous impact. The FMM is a fast algorithm for calculating matrix-vector multiplications in O(N) time, and it runs very fast on GPUs. Its combination of a high degree of parallelism and O(N) complexity makes it an attractive solver for the petascale and exascale era. It has a wide range of applications, e.g. quantum mechanics, molecular dynamics, electrostatics, acoustics, structural mechanics, fluid mechanics, and astrophysics. | |
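The compression that underlies the FMM's O(N) bound can be seen at lowest order: the joint influence of a far-away cluster of sources is well approximated by a single equivalent source at its centroid, with error falling as the cluster recedes. The sketch below demonstrates this for the 2D Laplace (logarithmic) kernel with made-up source data; the full FMM stacks higher-order multipole and local expansions on a spatial tree to make the approximation uniformly cheap.

```python
import numpy as np

rng = np.random.default_rng(2)
m = rng.uniform(0.5, 1.5, 100)                 # source strengths
src = rng.uniform(-0.5, 0.5, (100, 2))         # a cluster around the origin

def exact_potential(target):
    """Direct sum of the 2D Laplace kernel log|r| over all sources."""
    return np.sum(m * np.log(np.linalg.norm(target - src, axis=1)))

centroid = np.average(src, axis=0, weights=m)  # monopole placed here
total = m.sum()

for d in (2.0, 4.0, 8.0, 16.0):                # move the target farther away
    target = np.array([d, 0.0])
    approx = total * np.log(np.linalg.norm(target - centroid))
    exact = exact_potential(target)
    print(d, abs(approx - exact) / abs(exact)) # relative error shrinks with d
```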
| Adebisi Agboola | University of California, Santa Barbara | 1/2/2011 - 1/6/2011 |
| Douglas N. Arnold | University of Minnesota | 9/1/2010 - 6/30/2011 |
| Gerard Michel Awanou | Northern Illinois University | 9/1/2010 - 6/10/2011 |
| Hasan Babaei | Auburn University | 1/9/2011 - 1/14/2011 |
| Matthew Baker | Georgia Institute of Technology | 1/3/2011 - 1/5/2011 |
| Nusret Balci | University of Minnesota | 9/1/2009 - 8/31/2011 |
| Lorena A. Barba | Boston University | 1/9/2011 - 1/15/2011 |
| Hyman Bass | University of Michigan | 1/2/2011 - 1/5/2011 |
| Alexander A. Beilinson | University of Chicago | 1/2/2011 - 1/5/2011 |
| Jonathan Bentz | CRAY Inc | 1/10/2011 - 1/14/2011 |
| John Frederick Bergdall | Brandeis University | 1/3/2011 - 1/5/2011 |
| Laurent Berger | École Normale Supérieure de Lyon | 1/1/2011 - 1/6/2011 |
| Vladimir Berkovich | Weizmann Institute of Science | 1/2/2011 - 1/6/2011 |
| Manjul Bhargava | Princeton University | 1/2/2011 - 1/5/2011 |
| Alexander Borisov | University of Pittsburgh | 1/2/2011 - 1/6/2011 |
| Nigel Boston | University of Wisconsin-Madison | 1/3/2011 - 1/5/2011 |
| Susanne C. Brenner | Louisiana State University | 9/1/2010 - 6/10/2011 |
| Richard C. Brower | Boston University | 1/9/2011 - 1/13/2011 |
| Armand Brumer | Fordham University | 1/2/2011 - 1/5/2011 |
| Joe Buhler | Center for Communications Research | 1/2/2011 - 1/5/2011 |
| Gregory Scott Call | Amherst College | 1/2/2011 - 1/5/2011 |
| Cris Cecka | Stanford University | 1/8/2011 - 1/14/2011 |
| Aycil Cesmelioglu | University of Minnesota | 9/30/2010 - 8/30/2011 |
| Byungchul Cha | Muhlenberg College | 1/2/2011 - 1/5/2011 |
| Chi Hin Chan | University of Minnesota | 9/1/2009 - 8/31/2011 |
| Jung Hee Cheon | Seoul National University | 1/3/2011 - 1/5/2011 |
| Ionut Ciocan-Fontanine | University of Minnesota | 1/3/2011 - 1/5/2011 |
| Dustin Clausen | Massachusetts Institute of Technology | 1/2/2011 - 1/5/2011 |
| Bernardo Cockburn | University of Minnesota | 9/1/2010 - 6/30/2011 |
| Jonathan M. Cohen | NVIDIA Corporation | 1/9/2011 - 1/14/2011 |
| Jintao Cui | University of Minnesota | 8/31/2010 - 8/30/2011 |
| Eric Felix Darve | Stanford University | 1/8/2011 - 1/9/2011 |
| Clint Dawson | University of Texas at Austin | 1/30/2011 - 2/5/2011 |
| Jack J. Dongarra | University of Tennessee | 1/9/2011 - 1/11/2011 |
| Geir Ellingsrud | University of Oslo | 1/2/2011 - 1/6/2011 |
| Anne C. Elster | Norwegian University of Science and Technology (NTNU) | 1/9/2011 - 1/15/2011 |
| Allan Peter Engsig-Karup | Technical University of Denmark | 1/9/2011 - 1/14/2011 |
| Carl Erickson | Harvard University | 1/2/2011 - 1/5/2011 |
| Selim Esedoglu | University of Michigan | 1/20/2011 - 6/10/2011 |
| Randy H. Ewoldt | University of Minnesota | 9/1/2009 - 8/31/2011 |
| Liwu Fan | Auburn University | 1/9/2011 - 1/14/2011 |
| Oscar E. Fernandez | University of Minnesota | 8/31/2010 - 8/30/2011 |
| Daniel Flath | Macalester College | 1/3/2011 - 1/5/2011 |
| Jean-Marc Fontaine | Université de Paris XI (Paris-Sud) | 1/2/2011 - 1/8/2011 |
| Geoffrey Charles Fox | Indiana University | 1/12/2011 - 1/14/2011 |
| Martin J. Gander | Université de Genève | 1/9/2011 - 1/15/2011 |
| Paul Garrett | University of Minnesota | 1/3/2011 - 1/5/2011 |
| Gaurav Gaurav | University of Minnesota | 1/9/2011 - 1/14/2011 |
| Toby Gee | Northwestern University | 1/3/2011 - 1/5/2011 |
| Mike Giles | University of Oxford | 1/8/2011 - 1/14/2011 |
| Dominik Göddeke | Universität Dortmund | 1/8/2011 - 1/15/2011 |
| Edray Herber Goins | Purdue University | 1/3/2011 - 1/5/2011 |
| Jay Gopalakrishnan | University of Florida | 9/1/2010 - 6/30/2011 |
| Vincent John Graziano | CRAY Inc | 1/9/2011 - 1/9/2011 |
| Leopold Grinberg | Brown University | 1/10/2011 - 1/15/2011 |
| Bobby Grizzard | University of Texas at Austin | 1/3/2011 - 1/5/2011 |
| Benedict Gross | Harvard University | 1/2/2011 - 1/5/2011 |
| Shiyuan Gu | Louisiana State University | 9/1/2010 - 6/30/2011 |
| Joseph Gunther | University of Texas at Austin | 1/2/2011 - 1/5/2011 |
| Ren Guo | University of Minnesota | 1/3/2011 - 1/5/2011 |
| Thomas C. Hales | University of Pittsburgh | 1/2/2011 - 1/5/2011 |
| Michael A. Heroux | Sandia National Laboratories | 1/9/2011 - 1/14/2011 |
| Wei Ho | Columbia University | 1/2/2011 - 1/5/2011 |
| Yulia Hristova | University of Minnesota | 9/1/2010 - 8/31/2011 |
| Luc Illusie | Université de Paris XI (Paris-Sud) | 1/2/2011 - 1/6/2011 |
| Elizabeth R. Jessup | University of Colorado | 1/8/2011 - 1/14/2011 |
| Dihua Jiang | University of Minnesota | 1/3/2011 - 1/5/2011 |
| Jennifer Johnson-Leung | University of Idaho | 1/2/2011 - 1/5/2011 |
| John Jones | Arizona State University | 1/2/2011 - 1/6/2011 |
| Nick Katz | Princeton University | 1/2/2011 - 1/5/2011 |
| Dinesh Kaushik | Argonne National Laboratory | 1/11/2011 - 1/14/2011 |
| Markus Keel | University of Minnesota | 7/21/2008 - 6/30/2011 |
| David E. Keyes | Columbia University | 1/8/2011 - 1/13/2011 |
| Mark Kisin | Harvard University | 1/2/2011 - 1/5/2011 |
| Andreas Klöckner | New York University | 1/9/2011 - 1/14/2011 |
| Matthew Gregg Knepley | University of Chicago | 1/9/2011 - 1/15/2011 |
| Pawel Konieczny | University of Minnesota | 9/1/2009 - 8/31/2011 |
| Kenneth Kramer | Queens College, CUNY | 1/2/2011 - 1/5/2011 |
| Hugo Leclerc | École Normale Supérieure de Cachan | 1/9/2011 - 1/14/2011 |
| Miriam Leeser | Northeastern University | 1/9/2011 - 1/14/2011 |
| Gilad Lerman | University of Minnesota | 9/1/2010 - 6/30/2011 |
| Hengguang Li | University of Minnesota | 8/16/2010 - 8/15/2011 |
| Lizao (Larry) Li | University of Minnesota | 1/3/2011 - 1/5/2011 |
| Peng Li | University of Minnesota | 1/9/2011 - 1/9/2011 |
| David J. Lilja | University of Minnesota | 1/10/2011 - 1/14/2011 |
| Zhi (George) Lin | University of Minnesota | 9/1/2009 - 8/31/2011 |
| Baiying Liu | University of Minnesota | 1/3/2011 - 1/5/2011 |
| Jonathan Lubin | Brown University | 1/1/2011 - 1/6/2011 |
| Benjamin E Lundell | Cornell University | 1/3/2011 - 1/6/2011 |
| Mitchell Luskin | University of Minnesota | 9/1/2010 - 6/30/2011 |
| Chris Lyons | University of Michigan | 1/2/2011 - 1/4/2011 |
| Kara Lee Maki | University of Minnesota | 9/1/2009 - 8/31/2011 |
| Yu (David) Mao | University of Minnesota | 8/31/2010 - 8/30/2011 |
| David Mayhew | Advanced Micro Devices | 1/14/2011 - 1/14/2011 |
| William McCallum | University of Arizona | 1/3/2011 - 1/5/2011 |
| Lois Curfman McInnes | Argonne National Laboratory | 1/10/2011 - 1/13/2011 |
| William Messing | University of Minnesota | 1/3/2011 - 1/5/2011 |
| Ina Mette | American Mathematical Society | 1/2/2011 - 1/5/2011 |
| Irina Mitrea | University of Minnesota | 8/16/2010 - 6/14/2011 |
| Dimitrios Mitsotakis | University of Minnesota | 10/27/2010 - 8/31/2011 |
| Kevin Mugo | Purdue University | 1/3/2011 - 1/5/2011 |
| Gregg Musiker | University of Minnesota | 1/3/2011 - 1/5/2011 |
| Dan Negrut | University of Wisconsin-Madison | 1/9/2011 - 1/14/2011 |
| Nicholas Switala | University of Minnesota | 1/3/2011 - 1/5/2011 |
| Sylvain Nintcheu Fata | Oak Ridge National Laboratory | 11/1/2010 - 1/29/2011 |
| Andrew Odlyzko | University of Minnesota | 1/3/2011 - 1/5/2011 |
| Peter J. Olver | University of Minnesota | 1/3/2011 - 1/5/2011 |
| Alexandra Ortan | University of Minnesota | 9/16/2010 - 6/15/2011 |
| Cecilia Ortiz-Duenas | University of Minnesota | 9/1/2009 - 8/31/2011 |
| Miguel Pauletti | Texas A & M University | 1/8/2011 - 1/14/2011 |
| Carl Pomerance | Dartmouth College | 1/2/2011 - 1/5/2011 |
| Bjorn Poonen | Massachusetts Institute of Technology | 1/2/2011 - 1/5/2011 |
| Cristian D Popescu | University of California, San Diego | 1/2/2011 - 1/5/2011 |
| Weifeng (Frederick) Qiu | University of Minnesota | 8/31/2010 - 8/30/2011 |
| Vincent Quenneville-Belair | University of Minnesota | 9/16/2010 - 6/15/2011 |
| Varun Ramesh | University of Minnesota | 1/9/2011 - 1/14/2011 |
| Wayne Raskind | Arizona State University | 1/2/2011 - 1/5/2011 |
| Michel Raynaud | Université de Paris XI (Paris-Sud) | 1/2/2011 - 1/6/2011 |
| Fernando Reitich | University of Minnesota | 9/1/2010 - 6/30/2011 |
| Kenneth A. Ribet | University of California, Berkeley | 1/2/2011 - 1/5/2011 |
| Eric Riedl | Harvard University | 1/2/2011 - 1/6/2011 |
| David Peter Roberts | University of Minnesota | 1/2/2011 - 1/5/2011 |
| Fernando Rodriguez Villegas | University of Texas at Austin | 1/2/2011 - 1/5/2011 |
| Michael I. Rosen | Brown University | 1/2/2011 - 1/5/2011 |
| Jeffrey A. Rosoff | Gustavus Adolphus College | 1/2/2011 - 1/5/2011 |
| Karl Rubin | University of California, Irvine | 1/2/2011 - 1/5/2011 |
| Hakizumwami Birali Runesha | University of Minnesota | 1/9/2011 - 1/14/2011 |
| Yousef Saad | University of Minnesota | 1/10/2011 - 1/14/2011 |
| David J Saltman | Institute for Defense Analyses (IDA) | 1/2/2011 - 1/5/2011 |
| Nayda G. Santiago | University of Puerto Rico | 1/9/2011 - 1/15/2011 |
| Fadil Santosa | University of Minnesota | 7/1/2008 - 6/30/2011 |
| Olaf Schenk | Universität Basel | 1/8/2011 - 1/15/2011 |
| Bertil Schmidt | Nanyang Technological University | 1/9/2011 - 1/15/2011 |
| Anthony Scudiero | CRAY Inc | 1/10/2011 - 1/14/2011 |
| Shankar Sen | Cornell University | 1/2/2011 - 1/5/2011 |
| Chehrzad Shakiban | University of Minnesota | 1/3/2011 - 1/5/2011 |
| Shuanglin Shao | University of Minnesota | 9/1/2009 - 8/31/2011 |
| Stephen S. Shatz | University of Pennsylvania | 1/2/2011 - 1/5/2011 |
| Alice Silverberg | University of California, Irvine | 1/2/2011 - 1/5/2011 |
| Ethan Smith | Michigan Technological University | 1/2/2011 - 1/5/2011 |
| Steven Sperber | University of Minnesota | 1/3/2011 - 1/5/2011 |
| Harold M. Stark | University of California, San Diego | 1/2/2011 - 1/5/2011 |
| William A. Stein | University of Washington | 1/2/2011 - 1/5/2011 |
| Panagiotis Stinis | University of Minnesota | 9/1/2010 - 6/30/2011 |
| Mark J. Stock | Applied Scientific Research | 1/8/2011 - 1/14/2011 |
| Michael Stopa | Harvard University | 1/9/2011 - 1/14/2011 |
| Allan Struthers | Michigan Technological University | 1/9/2011 - 1/15/2011 |
| Robert Strzodka | Max-Planck-Institut für Informatik | 1/8/2011 - 1/14/2011 |
| Li-yeng Sung | Louisiana State University | 9/1/2010 - 6/15/2011 |
| Habiballah Talavatifard | Texas A & M University | 1/8/2011 - 1/14/2011 |
| Nicolae Tarfulea | Purdue University, Calumet | 9/1/2010 - 6/15/2011 |
| John Tate | University of Texas at Austin | 1/1/2011 - 1/9/2011 |
| Jeremy Teitelbaum | University of Connecticut | 1/2/2011 - 1/5/2011 |
| Keita Teranishi | CRAY Inc | 1/9/2011 - 1/14/2011 |
| Dinesh S. Thakur | University of Arizona | 1/2/2011 - 1/5/2011 |
| Jonas Tölke | Ingrain | 1/9/2011 - 1/14/2011 |
| Dimitar Trenev | University of Minnesota | 9/1/2009 - 8/31/2011 |
| Zohra Tridane | University of Minnesota | 1/10/2011 - 1/14/2011 |
| Jerrold Tunnell | Rutgers University | 1/2/2011 - 1/5/2011 |
| Douglas Ulmer | Georgia Institute of Technology | 1/3/2011 - 1/5/2011 |
| Jeffrey Vaaler | University of Texas at Austin | 1/2/2011 - 1/5/2011 |
| Marie-France Vignéras | Institut de Mathématiques de Jussieu | 1/1/2011 - 1/5/2011 |
| Vasily Volkov | University of California, Berkeley | 1/9/2011 - 1/14/2011 |
| Jose Felipe Voloch | University of Texas at Austin | 1/2/2011 - 1/6/2011 |
| Jamie Emmanuel Weigandt | Purdue University | 1/3/2011 - 1/6/2011 |
| Melanie Matchett Wood | Stanford University | 1/2/2011 - 1/5/2011 |
| Ulrike Meier Yang | Lawrence Livermore National Laboratory | 1/10/2011 - 1/14/2011 |
| Yimu Yin | University of Pittsburgh | 1/2/2011 - 1/5/2011 |
| Rio Yokota | Boston University | 1/11/2011 - 1/15/2011 |
| Qing Chaney Zhang | Ohio State University | 1/2/2011 - 1/6/2011 |
| Shuxia Zhang | University of Minnesota | 1/10/2011 - 1/14/2011 |
| Xudong Zheng | University of Illinois | 1/2/2011 - 1/5/2011 |