Web: http://www.ima.umn.edu | Email: ima-staff@ima.umn.edu | Telephone: (612) 624-6066 | Fax: (612) 626-7370
Additional newsletters available at http://www.ima.umn.edu/newsletters

IMA Newsletter #411

January 2011

2010-2011 Program

See http://www.ima.umn.edu/2010-2011/ for a full description of the 2010-2011 program on Simulating Our Complex World: Modeling, Computation and Analysis.

News and Notes

IMA Events

IMA Workshop

First Abel Conference: A Mathematical Celebration of John Tate

January 3-5, 2011

Organizers: Joe Buhler (Center for Communications Research), Geir Ellingsrud (University of Oslo), Benedict Gross (Harvard University), Kenneth A. Ribet (University of California)

IMA Tutorial

Scientific Computing Using Graphics Processors

January 9, 2011

Speakers: Cris Cecka (Stanford University), David E. Keyes (Columbia University)

IMA Annual Program Year Workshop

High Performance Computing and Emerging Architectures

January 10-14, 2011

Organizers: Lorena A. Barba (Boston University), Eric Felix Darve (Stanford University), David E. Keyes (Columbia University)
Schedule

Monday, January 3

10:00am-11:00am | Registration and coffee | Keller Hall 3-176 | SW1.3-5.11
11:00am-11:15am | Welcome to the IMA | Fadil Santosa (University of Minnesota) | Keller Hall 3-180 | SW1.3-5.11
11:15am-12:15pm | Points on Shimura varieties mod p | Mark Kisin (Harvard University) | Keller Hall 3-180 | SW1.3-5.11
12:15pm-2:00pm | Lunch | SW1.3-5.11
2:00pm-3:00pm | Elliptic curves: problems and applications | Carl Pomerance (Dartmouth College) | Keller Hall 3-180 | SW1.3-5.11
3:00pm-3:10pm | Group photo | SW1.3-5.11
3:10pm-3:40pm | Coffee break | Keller Hall 3-176 | SW1.3-5.11
3:40pm-4:40pm | Selmer ranks of elliptic curves in families of quadratic twists | Karl Rubin (University of California, Irvine) | Keller Hall 3-180 | SW1.3-5.11
5:00pm-7:00pm | Reception at the Campus Club Bar Lounge | SW1.3-5.11

Tuesday, January 4

8:30am-9:00am | Coffee | Keller Hall 3-176 | SW1.3-5.11
9:00am-10:00am | An equivariant main conjecture in Iwasawa theory and applications | Cristian D Popescu (University of California, San Diego) | Keller Hall 3-180 | SW1.3-5.11
10:00am-10:30am | Coffee break | Keller Hall 3-176 | SW1.3-5.11
10:30am-11:30am | Permanence following Temkin | Michel Raynaud (Université de Paris XI (Paris-Sud)) | Keller Hall 3-180 | SW1.3-5.11
11:30am-2:00pm | Lunch | SW1.3-5.11
2:00pm-3:00pm | On the geometry of character varieties | Fernando Rodriguez Villegas (University of Texas at Austin) | Keller Hall 3-180 | SW1.3-5.11
3:00pm-3:30pm | Coffee break | Keller Hall 3-176 | SW1.3-5.11
3:30pm-4:30pm | The average rank of elliptic curves | Manjul Bhargava (Princeton University) | Keller Hall 3-180 | SW1.3-5.11
6:00pm-8:00pm | Banquet at McNamara Alumni Center, Heritage Gallery | 200 Oak Street S.E., Minneapolis, MN 55455 | 612-624-9831 | SW1.3-5.11

Wednesday, January 5

8:30am-9:00am | Coffee | Keller Hall 3-176 | SW1.3-5.11
9:00am-10:00am | P-adic periods and derived de Rham cohomology | Alexander A. Beilinson (University of Chicago) | Keller Hall 3-180 | SW1.3-5.11
10:00am-10:30am | Coffee break | Keller Hall 3-176 | SW1.3-5.11
10:30am-11:30am | Random maximal isotropic subspaces and Selmer groups | Bjorn Poonen (Massachusetts Institute of Technology) | Keller Hall 3-180 | SW1.3-5.11
11:30am-1:00pm | Lunch | SW1.3-5.11
1:00pm-2:00pm | Vector bundles and p-adic Galois representations | Jean-Marc Fontaine (Université de Paris XI (Paris-Sud)) | Keller Hall 3-180 | SW1.3-5.11
2:00pm-2:05pm | Closing remarks | Keller Hall 3-180 | SW1.3-5.11

Thursday, January 6

10:45am-11:15am | Coffee break | Lind Hall 400

Friday, January 7

10:45am-11:15am | Coffee break | Lind Hall 400

Sunday, January 9

All Day | Eric Darve (Stanford University) will chair the morning session and the opening of the afternoon session. David E. Keyes (KAUST/Columbia University) will preside over the end of Cris Cecka's afternoon session. | T1.9.11
9:00am-9:30am | Registration and coffee | Lind Hall 400 | T1.9.11
9:30am-10:30am | Lecture 1: Implications of the exascale roadmap for algorithms | David E. Keyes (King Abdullah University of Science & Technology, Columbia University) | Lind Hall 305 | T1.9.11
10:30am-11:00am | Break | Lind Hall 400 | T1.9.11
11:00am-12:00pm | Lecture 2: Implications of the exascale roadmap for algorithms | David E. Keyes (King Abdullah University of Science & Technology, Columbia University) | Lind Hall 305 | T1.9.11
12:00pm-1:30pm | Lunch | T1.9.11
1:30pm-2:30pm | Lecture 1: Scientific Computing using Graphics Processors | Cris Cecka (Stanford University) | Lind Hall 305 | T1.9.11
2:30pm-3:30pm | Lecture 2: Scientific Computing with Graphics Processors | Cris Cecka (Stanford University) | Lind Hall 305 | T1.9.11
3:30pm-4:00pm | Break | Lind Hall 400 | T1.9.11
4:00pm-4:30pm | Lecture 3: Introduction to heterogeneous computing with GPUs | Cris Cecka (Stanford University) | Lind Hall 305 | T1.9.11

Monday, January 10

All Day | Morning Chair: David E. Keyes (King Abdullah University of Science & Technology/Columbia University); Afternoon Chair: Lorena A. Barba (Boston University) | W1.10-14.11
8:15am-9:15am | Registration and coffee | Keller Hall 3-176 | W1.10-14.11
9:15am-9:30am | Welcome to IMA | Fadil Santosa (University of Minnesota) | Keller Hall 3-180 | W1.10-14.11
9:30am-10:30am | The Exascale: Why and How | David E. Keyes (King Abdullah University of Science & Technology, Columbia University) | Keller Hall 3-180 | W1.10-14.11
10:30am-11:00am | Break | Keller Hall 3-176 | W1.10-14.11
11:00am-12:00pm | Architecture-aware Algorithms and Software for Scalable Performance and Resilience on Heterogeneous Architectures | Jack J. Dongarra (University of Tennessee) | Keller Hall 3-180 | W1.10-14.11
12:00pm-2:00pm | Lunch | W1.10-14.11
2:00pm-3:00pm | Everyday Parallelism | Robert Strzodka (Max-Planck-Institut für Informatik) | Keller Hall 3-180 | W1.10-14.11
3:00pm-4:00pm | The Challenges of Writing Portable, Correct and High Performance Libraries for GPUs or How to Avoid the Heroics of GPU Programming | Miriam Leeser (Northeastern University) | Keller Hall 3-180 | W1.10-14.11
4:00pm-4:30pm | Break | Keller Hall 3-176 | W1.10-14.11
4:30pm-5:30pm | GPU programming from higher level representations | Matthew Gregg Knepley (University of Chicago) | Keller Hall 3-180 | W1.10-14.11

Tuesday, January 11

All Day | Chair: Matthew Gregg Knepley (University of Chicago) | W1.10-14.11
8:00am-8:30am | Coffee | Keller Hall 3-176 | W1.10-14.11
8:30am-9:30am | Thinking parallel: sparse iterative solvers with CUDA | Jonathan M. Cohen (NVIDIA Corporation) | Keller Hall 3-180 | W1.10-14.11
9:30am-10:30am | A Code Generation and Autotuning Framework For Parallel Iterative Stencil Computations on Modern Microarchitectures | Olaf Schenk (Universität Basel) | Keller Hall 3-180 | W1.10-14.11
10:30am-11:00am | Break | Keller Hall 3-176 | W1.10-14.11
11:00am-12:00pm | Large Scale Frictional Contact Dynamics on the GPU | Dan Negrut (University of Wisconsin-Madison) | Keller Hall 3-180 | W1.10-14.11
12:00pm-2:00pm | Lunch | W1.10-14.11
2:00pm-4:00pm | Group photo and Discussion | Keller Hall 3-180 | W1.10-14.11
4:00pm-4:30pm | Break | Keller Hall 3-176 | W1.10-14.11
4:30pm-6:00pm | Reception and Poster Session (poster submissions welcome from all participants) | Lind Hall 400 | W1.10-14.11
  • Algorithms for Lattice Field Theory at Extreme Scales | Richard C. Brower (Boston University)
  • Medical Imaging on the GPU Using OpenCL: 3D Surface Extraction and 3D Ultrasound Reconstruction | Anne C. Elster (Norwegian University of Science and Technology (NTNU))
  • Development of a new massively parallel tool for nonlinear free surface wave simulation | Allan Peter Engsig-Karup (Technical University of Denmark)
  • Development of Desktop Computing Applications and Engineering Tools on GPUs | Allan Peter Engsig-Karup (Technical University of Denmark)
  • A Domain Decomposition Method that Converges in Two Iterations for any Subdomain Decomposition and PDE | Martin J. Gander (Universite de Geneve)
  • Efficient Uncertainty Quantification using GPUs | Gaurav Gaurav (University of Minnesota)
  • Brain Perfusion: Multi-scale Simulations and Visualization | Leopold Grinberg (Brown University)
  • Mixed-Precision GPU-Multigrid Solvers with Strong Smoothers | Dominik Göddeke (Universität Dortmund), Robert Strzodka (Max-Planck-Institut für Informatik)
  • The Build to Order Compiler for Matrix Algebra Optimization | Elizabeth R. Jessup (University of Colorado)
  • Global symbolic manipulations and code generation for Finite Elements on SIM[DT] hardware | Hugo Leclerc (École Normale Supérieure de Cachan)
  • GPU Acceleration in a Modern Problem Solving Environment: SCIRun's Linear System Solvers | Miriam Leeser (Northeastern University)
  • Hyperspectral Image Analysis for Abundance Estimation using GPUs | Nayda G. Santiago (University of Puerto Rico)
  • A GPU-accelerated Boundary Element Method and Vortex Particle Method | Mark J. Stock (Applied Scientific Research)
  • Locally-Self-Consistent Multiple-Scattering code (LSMS) for GPUs | Keita Teranishi (CRAY Inc)
  • Digital rocks physics: fluid flow in rocks | Jonas Tölke (Ingrain)
  • Preparing Algebraic Multigrid for Exascale | Ulrike Meier Yang (Lawrence Livermore National Laboratory)
  • Fast Multipole Methods on large cluster of GPUs | Rio Yokota (Boston University)

Wednesday, January 12

All Day | Morning Chair: Jonathan M. Cohen (NVIDIA Corporation); Afternoon Chair: Susanne C. Brenner (Louisiana State University) | W1.10-14.11
8:00am-8:30am | Coffee | Keller Hall 3-176 | W1.10-14.11
8:30am-9:30am | Application of Assembly of Finite Element Methods on Graphics Processors for Real-Time Elastodynamics | Cris Cecka (Stanford University) | Keller Hall 3-180 | W1.10-14.11
9:30am-10:30am | Lattice Boltzmann Multi-Phase Simulations in Porous Media using GPUs | Jonas Tölke (Ingrain) | Keller Hall 3-180 | W1.10-14.11
10:30am-11:00am | Break | Keller Hall 3-176 | W1.10-14.11
11:00am-12:00pm | The basis and perspectives of an exascale algorithm: our ExaFMM project | Lorena A. Barba (Boston University) | Keller Hall 3-180 | W1.10-14.11
12:00pm-2:00pm | Lunch | W1.10-14.11
2:00pm-3:00pm | Discussion [note room assignment] | W1.10-14.11
  • Reproducible results and open source code | Lorena A. Barba (Boston University) | Keller Hall 3-180
  • Mixed precision computing | Richard C. Brower (Boston University) | Lind Hall 401
  • Exascale programming models | Michael A. Heroux (Sandia National Laboratories) | Lind Hall 409
3:00pm-4:00pm | High-order DG Wave Propagation on GPUs: Infrastructure, Implementation, Method Improvements | Andreas Klöckner (New York University) | Keller Hall 3-180 | W1.10-14.11
6:00pm-8:30pm | Workshop social reception at Stub and Herbs | 227 Oak St, Minneapolis, MN 55414 | (612) 379-0555 | W1.10-14.11

Thursday, January 13

All Day | Morning Chair: Ulrike Meier Yang (Lawrence Livermore National Laboratory); Afternoon Chair: Mike Giles (University of Oxford) | W1.10-14.11
8:00am-8:30am | Coffee | Keller Hall 3-176 | W1.10-14.11
8:30am-9:30am | Algorithms and Tools for Bioinformatics on GPUs | Bertil Schmidt (Nanyang Technological University) | Keller Hall 3-180 | W1.10-14.11
9:30am-10:30am | Algorithmic Fluid Art – Influences, Process, and Works | Mark J. Stock (Applied Scientific Research) | Keller Hall 3-180 | W1.10-14.11
10:30am-11:00am | Break | Keller Hall 3-176 | W1.10-14.11
11:00am-12:00pm | OP2: an open-source library for unstructured grid applications | Mike Giles (University of Oxford) | Keller Hall 3-180 | W1.10-14.11
12:00pm-2:00pm | Lunch | W1.10-14.11
2:00pm-3:00pm | Ultraparallel solvers for multi-scale brain blood flow simulations on exascale computers | Leopold Grinberg (Brown University) | Keller Hall 3-180 | W1.10-14.11
3:00pm-4:00pm | Clouds MapReduce and HPC | Geoffrey Charles Fox (Indiana University) | Keller Hall 3-180 | W1.10-14.11

Friday, January 14

All Day | Chair: Mark J. Stock (Applied Scientific Research) | W1.10-14.11
8:00am-8:30am | Coffee | Keller Hall 3-176 | W1.10-14.11
8:30am-9:30am | I See GPU Shapes in the Clouds | David Mayhew (Advanced Micro Devices) | Keller Hall 3-180 | W1.10-14.11
9:30am-10:30am | Real-Time Medical and Geological Processing on GPU-based Systems: Experiences and Challenges | Anne C. Elster (Norwegian University of Science and Technology (NTNU)) | Keller Hall 3-180 | W1.10-14.11
10:30am-11:00am | Break | Keller Hall 3-176 | W1.10-14.11
11:00am-12:00pm | Emerging Programming and Machine Models: Opportunities for Numerical Algorithms R&D | Michael A. Heroux (Sandia National Laboratories) | Keller Hall 3-180 | W1.10-14.11
12:00pm-12:05pm | Closing remarks | Keller Hall 3-180 | W1.10-14.11

Monday, January 17

All Day Martin Luther King, Jr. Day. The IMA is closed.

Tuesday, January 18

10:45am-11:15am | Coffee break | Lind Hall 400

Wednesday, January 19

10:45am-11:15am | Coffee break | Lind Hall 400

Thursday, January 20

10:45am-11:15am | Coffee break | Lind Hall 400

Friday, January 21

10:45am-11:15am | Coffee break | Lind Hall 400

Monday, January 24

10:45am-11:15am | Coffee break | Lind Hall 400

Tuesday, January 25

10:45am-11:15am | Coffee break | Lind Hall 400
11:15am-12:15pm | On dispersive effect of the Coriolis force for the stationary Navier-Stokes equations | Pawel Konieczny (University of Minnesota) | Lind Hall 305 | PS

Wednesday, January 26

10:45am-11:15am | Coffee break | Lind Hall 400
2:30pm-3:30pm | Math 8994: Discontinuous Galerkin methods: An introduction - The original method: linear scalar transport | Bernardo Cockburn (University of Minnesota) | Lind Hall 305

Thursday, January 27

10:45am-11:15am | Coffee break | Lind Hall 400

Friday, January 28

10:45am-11:15am | Coffee break | Lind Hall 400

Monday, January 31

10:45am-11:15am | Coffee break | Lind Hall 400
1:30pm-2:30pm | Tutorial Lectures: Modeling Hurricane Storm Surges - Lecture 1: Introduction to the shallow water equations | Clint Dawson (University of Texas at Austin) | Lind Hall 305
Abstracts
Lorena A. Barba (Boston University) The basis and perspectives of an exascale algorithm: our ExaFMM project
Abstract: Linearly scaling algorithms will be crucial for the problem sizes that will be tackled in capability exascale systems. It is interesting to note that many of the most successful algorithms are hierarchical in nature, such as multi-grid methods and fast multipole methods (FMM). We have been leading development efforts for open-source FMM software for some time, and recently produced GPU implementations of the various computational kernels involved in the FMM algorithm. Most recently, we have produced a multi-GPU code, and performed scalability studies showing high parallel efficiency in strong scaling. These results have pointed to several features of the FMM that make it a particularly favorable algorithm for the emerging heterogeneous, many-core architectural landscape. We propose that the FMM algorithm offers exceptional opportunities to enable exascale applications. Among its exascale-suitable features are: (i) it has intrinsic geometric locality, and access patterns are made local via particle indexing techniques; (ii) we can achieve temporal locality via an efficient queuing of GPU tasks before execution, and at a fine level by means of memory coalescing based on the natural index-sorting techniques; (iii) global data communication and synchronization, often a significant impediment to scalability, is a soft barrier for the FMM, where the most time-consuming kernels are, respectively, purely local (particle-to-particle interactions) and "hierarchically synchronized" (multipole-to-local interactions, which happen simultaneously at every level of the tree). 
In addition, we suggest a strategy for achieving the best algorithmic performance, based on two key ideas: (i) hybridize the FMM with treecode by choosing on-the-fly between particle-particle, particle-box, and box-box interactions, according to a work estimate; (ii) apply a dynamic error-control technique, effected on the treecode by means of a variable "box-opening angle" and on the FMM by means of a variable order of the multipole expansion. We have carried out preliminary implementation of these ideas/techniques, achieving a 14x speed-up with respect to our current published version of the FMM. Considering that this effort was only exploratory, we are certain to possess the potential for unprecedented performance with these algorithms.
Lorena A. Barba (Boston University) Reproducible results and open source code
Keller Hall 3-180
Abstract: No Abstract
Alexander A. Beilinson (University of Chicago) P-adic periods and derived de Rham cohomology
Abstract: I will show that Fontaine's ring of p-adic periods can be realized as the ring of universal p-adic constants in the sense of derived algebraic geometry, and discuss a possible new construction of the p-adic periods map.
Manjul Bhargava (Princeton University) The average rank of elliptic curves
Abstract: No Abstract
Richard C. Brower (Boston University) Algorithms for Lattice Field Theory at Extreme Scales
Abstract: Increases in computational power allow lattice field theories to resolve smaller scales, but to realize the full benefit for scientific discovery, new multi-scale algorithms must be developed to maximize efficiency. Examples of new trends in algorithms include adaptive multigrid solvers for the quark propagator and an improved symplectic Force Gradient integrator for the Hamiltonian evolution used to include the quark contribution to vacuum fluctuations in the quantum path integral. Future challenges to algorithms and software infrastructure targeting many-core GPU accelerators and heterogeneous extreme scale computing are discussed.
Richard C. Brower (Boston University) Mixed precision computing
Lind Hall 401
Abstract: No Abstract
Cris Cecka (Stanford University) Application of Assembly of Finite Element Methods on Graphics Processors for Real-Time Elastodynamics
Abstract: We discuss multiple strategies to perform general computations on unstructured grids using a GPU, with specific application to the assembly of systems of equations in finite element methods (FEMs). For each method, we discuss the GPU hardware's limiting resources, optimizations, key data structures, and dependence of the performance with respect to problem size, element size, and GPU hardware generation. These methods are applied to a nonlinear hyperelastic material model to develop a large-scale real-time interactive elastodynamic visualization. By performing the assembly, solution, update, and visualization stages solely on the GPU, the simulation benefits from speed-ups in each stage and avoids costly GPU-CPU transfers of data.
Cris Cecka (Stanford University) Lecture 1: Scientific Computing using Graphics Processors
Abstract: In this short course, we introduce the GPU as a coprocessor for scientific computing. The course will review modern hardware, CUDA programming, algorithm design, and optimization considerations for this unique compute environment. Introductory example codes and slides will be available to aid attendees in using GPUs to accelerate their applications.
Cris Cecka (Stanford University) Lecture 3: Introduction to heterogeneous computing with GPUs
Abstract: see abstract for Lecture 1
Cris Cecka (Stanford University) Lecture 2: Scientific Computing with Graphics Processors
Abstract: see abstract for Lecture 1
Jonathan M. Cohen (NVIDIA Corporation) Thinking parallel: sparse iterative solvers with CUDA
Abstract: Iterative sparse linear solvers are a critical component of a scientific computing platform. Developing effective preconditioning strategies is the main challenge in developing iterative sparse solvers on massively parallel systems. As computing systems become increasingly power-constrained, memory hierarchies for massively parallel systems will become deeper and more hierarchical. Parallel algorithms with all-to-all communication patterns that assume uniform memory access times will be inefficient on these systems. In this talk, I will outline the challenges of developing good parallel preconditioners, and demonstrate that domain decomposition methods have communication patterns that match emerging parallel platforms. I will present recent work to develop restricted additive Schwarz (RAS) preconditioners as part of the open source 'cusp' library of sparse parallel algorithms. On 2d Poisson problems, a RAS preconditioner is consistently faster than diagonal preconditioning in time-to-solution. Detailed analysis demonstrates that the communication pattern of RAS matches the on-chip bandwidths of a Fermi GPU. Line smoothing, which requires solving a large number of small tridiagonal linear systems in local memory, is another preconditioning approach with similar communication patterns. I will conclude with a roadmap for developing a range of preconditioners, smoothers, and linear solvers on massively parallel hardware based on the domain decomposition and line smoothing approaches.
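For readers unfamiliar with line smoothing, the small tridiagonal solves mentioned in the abstract above can be illustrated with the classical Thomas algorithm. The following is a minimal serial sketch in Python, not code from the talk or from the 'cusp' library; the function names solve_tridiagonal and smooth_lines are hypothetical, and the sketch assumes each system is diagonally dominant so that no pivoting is needed.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve one tridiagonal system with the Thomas algorithm.

    lower[i] multiplies x[i-1] in row i (lower[0] is unused),
    diag[i] multiplies x[i], and upper[i] multiplies x[i+1]
    (upper[-1] is unused). No pivoting is performed, so the matrix
    is assumed to be diagonally dominant.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side

    # Forward elimination.
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom

    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def smooth_lines(lines):
    """Solve one independent tridiagonal system per grid line.

    Each element of lines is a (lower, diag, upper, rhs) tuple. The
    systems do not depend on each other, which is what would allow
    them to be solved concurrently on parallel hardware.
    """
    return [solve_tridiagonal(*line) for line in lines]
```

Because every line is solved independently, the outer loop in smooth_lines is the part that a parallel implementation would distribute; the sketch runs it serially for clarity.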
Clint Dawson (University of Texas at Austin) Tutorial Lectures: Modeling Hurricane Storm Surges - Lecture 1: Introduction to the shallow water equations
Abstract: An overview of the two-dimensional, depth-averaged shallow water equations. I will give underlying assumptions, derivation from the Navier-Stokes equations, and discuss the relevant forcing terms, including tides, wind and atmospheric pressure, gravity, and bottom friction.
Jack J. Dongarra (University of Tennessee) Architecture-aware Algorithms and Software for Scalable Performance and Resilience on Heterogeneous Architectures
Abstract: In this talk we examine how high performance computing has changed over the last 10 years and look toward the future in terms of trends. These changes have had and will continue to have a major impact on our software. Some of the software and algorithm challenges have already been encountered, such as management of communication and memory hierarchies through a combination of compile-time and run-time techniques, but the increased scale of computation, depth of memory hierarchies, range of latencies, and increased run-time environment variability will make these problems much harder. We will look at five areas of research that will have an important impact on the development of software and algorithms. We will focus on the following themes:
  • Redesign of software to fit multicore architectures
  • Automatically tuned application software
  • Exploiting mixed precision for performance
  • The importance of fault tolerance
  • Communication avoiding algorithms
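As an illustration of the mixed-precision theme listed above, the sketch below shows iterative refinement: the linear solve is done in (emulated) single precision, while the residual and the solution update are kept in double precision. This is a minimal pure-Python sketch and not code from the talk; the names to_single, solve_low_precision, residual and refine are hypothetical, single precision is emulated with the standard struct module, and the system is re-factored at every step for brevity, whereas a practical code would reuse the factorization.

```python
import struct


def to_single(x):
    """Round a Python float (double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]


def solve_low_precision(a, b):
    """Gaussian elimination with partial pivoting, rounding every
    intermediate result to single precision."""
    n = len(b)
    # Augmented matrix [A | b], stored in single precision.
    m = [[to_single(v) for v in row] + [to_single(b[i])]
         for i, row in enumerate(a)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[p] = m[p], m[k]
        for r in range(k + 1, n):
            f = to_single(m[r][k] / m[k][k])
            for c in range(k, n + 1):
                m[r][c] = to_single(m[r][c] - to_single(f * m[k][c]))
    # Back substitution, also in single precision.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n]
        for c in range(r + 1, n):
            s = to_single(s - to_single(m[r][c] * x[c]))
        x[r] = to_single(s / m[r][r])
    return x


def residual(a, x, b):
    """Compute b - A x in double precision."""
    return [bi - sum(aij * xj for aij, xj in zip(row, x))
            for row, bi in zip(a, b)]


def refine(a, b, steps=5):
    """Mixed-precision iterative refinement."""
    x = solve_low_precision(a, b)               # cheap, low accuracy
    for _ in range(steps):
        r = residual(a, x, b)                   # double precision
        dx = solve_low_precision(a, r)          # single precision
        x = [xi + di for xi, di in zip(x, dx)]  # double precision
    return x
```

For a well-conditioned matrix, a few refinement steps are typically enough to bring the single-precision solution up to roughly double-precision accuracy, which is the reason the expensive part of the work can stay in the faster format.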
Anne C. Elster (Norwegian University of Science and Technology (NTNU)) Real-Time Medical and Geological Processing on GPU-based Systems: Experiences and Challenges
Abstract: GPUs are now massive floating-point stream processors that offer a source of energy-efficient compute power on our laptops and desktops. Recent development of tools such as CUDA and OpenCL has made it much easier to utilize the computational power these systems offer. However, in order to optimally harness the power of these GPU-based systems, there are still many challenges to overcome. In this talk, several issues related to our experiences with medical and geological processing applications that can benefit from real-time processing of data on GPUs will be discussed. These include real-time medical imaging, e.g. for ultrasound-guided discovery and surgery, real-time seismic CT image enhancement, and using GPUs for real-time compression of seismic data in order to lower I/O latency. This talk will highlight work our research group has been involved in from 2006 through today.
Anne C. Elster (Norwegian University of Science and Technology (NTNU)) Medical Imaging on the GPU Using OpenCL: 3D Surface Extraction and 3D Ultrasound Reconstruction
Abstract: Collaborators: Frank Linseth, Holger Ludvigsen, Erik Smistad and Thor Kristian Valgerhaug. GPUs offer a lot of compute power, enabling real-time processing of images. This poster depicts some of our group's recent work on image processing for medical applications on GPUs, including 3D surface extraction using marching cubes and 3D ultrasound reconstruction. We have previously developed Cg and CUDA codes for wavelet transforms and CUDA codes for surface extraction for seismic images.
Allan Peter Engsig-Karup (Technical University of Denmark) Development of a new massively parallel tool for nonlinear free surface wave simulation
Abstract: The research objective of this work is to develop a new dedicated and massively parallel tool for efficient simulation of unsteady nonlinear free surface waves. The tool will be used for applications in coastal and offshore engineering, e.g. in connection with prediction of wave kinematics and forces at or near human-made structures. The tool is based on a unified potential flow formulation which can account for fully nonlinear and dispersive wave motion over uneven depths under the assumptions of nonbreaking waves and irrotational, inviscid flow. This work is a continuation of earlier work and will continue to contribute to advancing the state of the art for efficient wave simulation. The tool is expected to be orders of magnitude faster than current tools due to efficient algorithms and utilization of available hardware resources.
Allan Peter Engsig-Karup (Technical University of Denmark) Development of Desktop Computing Applications and Engineering Tools on GPUs
Abstract: GPULab, a competence center and laboratory for research and collaboration within academia and with partners in industry, was established in 2008 at the section for Scientific Computing, DTU Informatics, Technical University of Denmark. In GPULab we focus on the utilization of Graphics Processing Units (GPUs) for high-performance computing applications and software tools in science and engineering, inverse problems, visualization, imaging, and dynamic optimization. The goals are to contribute to the development of new state-of-the-art mathematical models and algorithms for maximum throughput performance, improved performance profiling tools, and assimilation of results to academic and industrial partners in our network. Our approach calls for multi-disciplinary skills and understanding of hardware, software development, profiling tools and tuning techniques, analytical methods for analysis and development of new approaches, together with expert knowledge in specific application areas within science and engineering. We anticipate that our research will in the near future bring new algorithms and insight to engineering and science applications targeting practical engineering problems.
Jean-Marc Fontaine (Université de Paris XI (Paris-Sud)) Vector bundles and p-adic Galois representations
Abstract: Let $F$ be a perfect field of characteristic $p>0$ equipped with a non trivial absolute value, $E$ a non archimedean locally compact field whose residue field is contained in $F$ and $\pi$ a uniformizing parameter of $E$. We associate functorially to these data a separated integral noetherian regular scheme $X=X_{F,E,\pi}$ of dimension $1$ defined over $E$. There is an equivalence of categories between semi-stable vector bundles of slope $0$ over $X$ and continuous $E$-linear representations of the absolute Galois group $H_F$ of $F$. When $F$ is algebraically closed, the closed points of $F$ can be described in terms of the Lubin-Tate formal group of $E$ corresponding to $\pi$. If $C$ is the $p$-adic completion of $\overline{Q}_p$, one can associate to $C$ an algebraically closed field $F=F(C)$ as above and ${\rm Gal}(\overline{Q}_p/Q_p)$ acts on the curve $X=X_{F(C),Q_p,p}$. The two main results of $p$-adic Hodge theory can be recovered from the classification of vector bundles over $X$. (Joint work with Laurent Fargues.) Read more at http://www.math.u-psud.fr/~fargues/Prepublications.html.
Geoffrey Charles Fox (Indiana University) Clouds MapReduce and HPC
Abstract: 1) We analyze the different tradeoffs and goals of Grid, Cloud and parallel (cluster/supercomputer) computing. 2) They trade off performance, fault tolerance, ease of use (elasticity), cost, and interoperability. 3) Different application classes (characteristics) fit different architectures and we describe a hybrid model with Grids for data, traditional supercomputers for large scale simulations and clouds for broad based "capacity computing" including many data intensive problems. 4) We discuss the impressive features of cloud computing platforms and compare MapReduce and MPI. 5) We take most of our examples from the life science area. 6) We conclude with a description of FutureGrid -- a TeraGrid system for prototyping new middleware and applications.
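For readers new to the MapReduce model mentioned in the abstract above, the following is a minimal single-process sketch of its three phases (map, shuffle, reduce) using the usual word-count example. It is an illustration only and not code from the talk; all function names are hypothetical, and a real framework would distribute each phase across many machines and add fault tolerance.

```python
from collections import defaultdict


def map_phase(documents, mapper):
    """Apply the mapper to every input record, collecting (key, value) pairs."""
    pairs = []
    for doc in documents:
        pairs.extend(mapper(doc))
    return pairs


def shuffle_phase(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer independently to each key's list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def map_reduce(documents, mapper, reducer):
    return reduce_phase(shuffle_phase(map_phase(documents, mapper)), reducer)


def word_count_mapper(doc):
    """Emit (word, 1) for every whitespace-separated word."""
    return [(word.lower(), 1) for word in doc.split()]


def word_count_reducer(key, values):
    """Sum the counts emitted for one word."""
    return sum(values)
```

The user supplies only the mapper and the reducer; the only communication is the key-based grouping in the shuffle phase. In MPI, by contrast, the programmer writes the message exchanges between processes explicitly.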
Martin J. Gander (Universite de Geneve) A Domain Decomposition Method that Converges in Two Iterations for any Subdomain Decomposition and PDE
Abstract: Joint work with Felix Kwok. All domain decomposition methods are based on a decomposition of the physical domain into many subdomains and an iteration, which uses subdomain solutions only (and maybe a coarse grid), in order to compute an approximate solution of the problem on the entire domain. We show in this poster that it is possible to formulate such an iteration, only based on subdomain solutions, which converges in two steps to the solution of the underlying problem, independently of the number of subdomains and the PDE solved. This method is mainly of theoretical interest, since it contains sophisticated non-local operators (and a natural coarse grid component), which need to be approximated in order to obtain a practical method.
Gaurav Gaurav (University of Minnesota) Efficient Uncertainty Quantification using GPUs
Abstract: Joint work with Steven F. Wojtkiewicz (Department of Civil Engineering, University of Minnesota, Minneapolis, MN 55414, USA. bykvich@umn.edu). Graphics processing units (GPUs) have emerged as a much more economical and highly competitive alternative to CPU-based parallel computing. Recent studies have shown that GPUs consistently outperform their best corresponding CPU-based parallel computing equivalents by up to two orders of magnitude in certain applications. Moreover, the portability of GPUs enables even a desktop computer to provide a teraflop ($10^{12}$ floating point operations per second) of computing power. This study presents the gains in computational efficiency obtained using GPU-based implementations of five types of algorithms frequently used in uncertainty quantification problems arising in the analysis of dynamical systems with uncertain parameters and/or inputs.
Mike Giles (University of Oxford) OP2: an open-source library for unstructured grid applications
Abstract: Based on an MPI library written over 10 years ago, OP2 is a new open-source library which is aimed at application developers using unstructured grids. Using a single API, it targets a variety of backend architectures, including both manycore GPUs and multicore CPUs with vector units. The talk will cover the API design, key aspects of the parallel implementation on the different platforms, and preliminary performance results on a small but representative CFD test code.
Leopold Grinberg (Brown University) Ultraparallel solvers for multi-scale brain blood flow simulations on exascale computers
Abstract: Solvers for coupled multi-scale (multi-physics) problems may be constructed by coupling an array of existing and well tested parallel numerical solvers, each designed to tackle a problem at a different spatial and temporal scale. Each solver can be optimized/designed for a different computer architecture. Future supercomputers may be composed of heterogeneous processing units, i.e., CPU/GPU. To make efficient use of computational resources, the coupled solvers must support topology-aware mapping of tasks to the processing units where the best parallel efficiency can be achieved. Arterial blood circulation is a multi-scale process where time and space scales range from nanoseconds (nanometers) to seconds (meters), respectively. The macro-vascular scales describing the flow dynamics in larger vessels are coupled to the meso-vascular scales unfolding dynamics of individual blood cells. The meso-vascular events are coupled to the micro-vascular ones accounting for blood perfusion, clot formation, adhesion of the blood cells to the arterial walls, etc. Besides the multi-scale nature of the problem, its size often presents a substantial computational challenge even for simulations considering a single scale. In this talk we will try to envision the design of a multi-scale solver for blood flow simulations, tailored to heterogeneous computer architecture.
Leopold Grinberg (Brown University) Brain Perfusion: Multi-scale Simulations and Visualization
Abstract: Joint work with J. Insley, M. Papka, and G. E. Karniadakis. Interactions of blood flow in the human brain occur between different scales, determined by flow features in the large arteries (above 0.5mm diameter), arterioles, and the capillaries (of 5E-3 mm). To simulate such multi-scale flow we develop mathematical models, numerical methods, scalable solvers and visualization tools. Our poster will present NektarG - a research code developed at Brown University for continuum and atomistic simulations. NektarG is based on a high-order spectral/hp element discretization featuring multi-patch domain decomposition for continuum flow simulations, and modified DPD-LAMMPS for mesoscopic simulations. The continuum and atomistic solvers are coupled via Multi-level Communicating Interface to exchange data required by interface conditions. The visualization software is based on ParaView and NektarG utilities accessed through the ParaView GUI. The new visualization software allows data computed in coupled (multi-scale) simulations to be presented simultaneously. The software automatically synchronizes the display of time evolution of solutions at multiple scales.
Dominik Göddeke (Universität Dortmund), Robert Strzodka (Max-Planck-Institut für Informatik) Mixed-Precision GPU-Multigrid Solvers with Strong Smoothers
Abstract: We present efficient fine-grained parallelization techniques for robust multigrid solvers and Krylov subspace schemes, in particular for numerically strong smoothing and preconditioning operators. We apply them to sparse ill-conditioned linear systems of equations that arise from grid-based discretization techniques like finite differences, volumes and elements; the systems are notoriously hard to solve due to severe anisotropies in the underlying mesh and differential operator. These strong smoothers are characterized by sequential data dependencies, and do not parallelize in a straightforward manner. For linewise preconditioners, exact parallel algorithms exist, and we present a novel, efficient implementation of a cyclic reduction tridiagonal solver. For other preconditioners, traditional wavefront techniques can be applied, but their irregular and limited parallelism makes them a bad match for GPUs. Therefore, we discuss multicoloring techniques to recover parallelism in these preconditioners, by decoupling some of the dependencies at the expense of initially reduced numerical performance. However, by carefully balancing the coupling strength (more colors) with the parallelization benefits, the multicolored variants retain almost all of the sequential numerical performance. Further improvements are achieved by merging the tridiagonal and Gauß-Seidel approach into a smoothing operator that combines their advantages, and by employing an alternating direction implicit scheme to gain independence of the numbering of the unknowns. Due to their advantageous numerical properties, multigrid solvers equipped with strong smoothers are between four and eight times more efficient than with simple Gauß-Seidel preconditioners, and we achieve speedup factors between six and 18 with the GPU implementations over carefully tuned CPU variants.
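The multicoloring idea mentioned in the abstract above can be illustrated with its simplest instance, a two-color (red-black) Gauss-Seidel sweep for the 1D Poisson problem. This is only a minimal sketch under stated assumptions and not the authors' GPU implementation: the function name `red_black_gauss_seidel`, the model problem, and the grid size are all chosen here for illustration.

```python
def red_black_gauss_seidel(u, f, h, sweeps):
    """Two-color Gauss-Seidel for -u'' = f on a uniform 1D grid.

    u[0] and u[-1] hold the boundary values; interior entries are
    updated in place. (Illustrative helper, not from the talk.)
    """
    n = len(u)
    for _ in range(sweeps):
        for start in (1, 2):  # first the odd-indexed points, then the even ones
            idx = range(start, n - 1, 2)
            # A point of one color depends only on neighbors of the other
            # color, so all updates in this list are mutually independent
            # and could be executed as a single parallel step.
            new_vals = [0.5 * (u[i - 1] + u[i + 1] + h * h * f[i]) for i in idx]
            for i, v in zip(idx, new_vals):
                u[i] = v
    return u


n = 17
h = 1.0 / (n - 1)
f = [2.0] * n  # -u'' = 2 with u(0) = u(1) = 0 has the solution u(x) = x (1 - x)
u = red_black_gauss_seidel([0.0] * n, f, h, sweeps=2000)
exact = [i * h * (1.0 - i * h) for i in range(n)]
max_err = max(abs(a - b) for a, b in zip(u, exact))
```

Within each color the new values are computed before any of them is written back, which is what makes the step safe to parallelize. A lexicographic Gauss-Seidel sweep, by contrast, needs the freshly updated left neighbor at every point.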
Michael A. Heroux (Sandia National Laboratories) Emerging Programming and Machine Models: Opportunities for Numerical Algorithms R&D
Abstract: After 15-20 years of architectural stability, we are in the midst of a dramatic change in high performance computing systems design. In this talk we discuss the commonalities across the viable systems of today, and look at opportunities for numerical algorithms research and development. In particular, we explore possible programming and machine abstractions and how we can develop effective algorithms based on these abstractions, addressing, among other things, robustness issues for preconditioned iterative methods and resilience of algorithms in the presence of soft errors.
Michael A. Heroux (Sandia National Laboratories) Exascale programming models
Lind Hall 409
Abstract: No Abstract
Elizabeth R. Jessup (University of Colorado) The Build to Order Compiler for Matrix Algebra Optimization
Abstract: The performance of many high performance computing applications is limited by data movement from memory to the processor. Often their cost is more accurately expressed in terms of memory traffic rather than floating-point operations and, to improve performance, data movement must be reduced. One technique to reduce memory traffic is the fusion of loops that access the same data. We have built the Build to Order (BTO) compiler to automate the fusion of loops in matrix algebra kernels. Loop fusion often produces speedups proportional to the reduction in memory traffic, but it can also lead to negative effects in cache and register use. We present the results of experiments with BTO that help us to understand the workings of loop fusion.
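As a rough illustration of the loop fusion described in the abstract above, the sketch below computes w = A x and z = A^T y first with two separate passes over the matrix and then with one fused pass. It is a hand-written example, not output of the BTO compiler. The function names are hypothetical, and the `loads` counter is only a crude proxy for memory traffic that ignores caches and registers.

```python
def unfused(A, x, y):
    """Compute w = A x and z = A^T y with two separate passes over A."""
    rows, cols = len(A), len(A[0])
    loads = 0  # number of matrix entries read
    w = [0.0] * rows
    for i in range(rows):
        for j in range(cols):
            w[i] += A[i][j] * x[j]
            loads += 1
    z = [0.0] * cols
    for i in range(rows):
        for j in range(cols):
            z[j] += A[i][j] * y[i]
            loads += 1
    return w, z, loads


def fused(A, x, y):
    """Compute the same w and z while reading each entry of A only once."""
    rows, cols = len(A), len(A[0])
    loads = 0
    w = [0.0] * rows
    z = [0.0] * cols
    for i in range(rows):
        for j in range(cols):
            a = A[i][j]  # one read feeds both products
            loads += 1
            w[i] += a * x[j]
            z[j] += a * y[i]
    return w, z, loads


A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
x = [1.0, 0.0, -1.0]
y = [2.0, 1.0]
w1, z1, loads_unfused = unfused(A, x, y)
w2, z2, loads_fused = fused(A, x, y)
```

Both versions perform the same floating-point operations in the same order, so the results are identical; only the number of reads of A changes, from two per entry to one.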
David E. Keyes (Columbia University) The Exascale: Why and How
Abstract: Sustained floating-point computation rates on real applications, as tracked by the ACM Gordon Bell Prize, increased by three orders of magnitude from 1988 (1 Gigaflop/s) to 1998 (1 Teraflop/s), and by another three orders of magnitude to 2008 (1 Petaflop/s). Computer engineering provided only a couple of orders of magnitude of improvement for individual cores over that period; the remaining factor came from concurrency, which is approaching one million-fold. Algorithmic improvements contributed meanwhile to making each flop more valuable scientifically. As the semiconductor industry now slips relative to its own roadmap for silicon-based logic and memory, concurrency, especially on-chip many-core concurrency and GPGPU SIMD-type concurrency, will play an increasing role in the next few orders of magnitude, to arrive at the ambitious target of 1 Exaflop/s, extrapolated for 2018. An important question is whether today's best algorithms are efficiently hosted on such hardware and how much co-design of algorithms and architecture will be required. From the applications perspective, we illustrate eight reasons why today's computational scientists have an insatiable appetite for such performance: resolution, fidelity, dimension, artificial boundaries, parameter inversion, optimal control, uncertainty quantification, and the statistics of ensembles. The paths to the exascale summit are debated, but all are narrow and treacherous, constrained by fundamental laws of physics, cost, power consumption, programmability, and reliability. Drawing on recent reports, workshops, vendor projections, and experiences with scientific codes on contemporary platforms, we propose roles for today's researchers in one of the great global scientific quests of the next decade.
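The extrapolation in the abstract above amounts to simple arithmetic, sketched below. The three milestone values are taken from the abstract; the assumption of a constant annual growth factor and the variable names are introduced here for illustration only.

```python
import math

# Sustained rates quoted in the abstract, in flop/s.
milestones = {1988: 1e9, 1998: 1e12, 2008: 1e15}

# Constant annual growth factor implied by six orders of magnitude in 20 years.
annual_factor = (milestones[2008] / milestones[1988]) ** (1.0 / (2008 - 1988))

# Carrying the same trend forward one more decade gives the 2018 figure.
projected_2018 = milestones[2008] * annual_factor ** 10

# Time needed for the sustained rate to double under this trend.
doubling_time_years = math.log(2) / math.log(annual_factor)

print(f"annual growth factor: {annual_factor:.3f}")
print(f"projected 2018 rate:  {projected_2018:.3e} flop/s")
print(f"doubling time:        {doubling_time_years:.3f} years")
```

Under this assumption the sustained rate roughly doubles every year, and a further decade at the same pace leads to the 1 Exaflop/s target named in the abstract.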
David E. Keyes (Columbia University) Lecture 1: Implications of the exascale roadmap for algorithms
Abstract: The central challenge in progressing from petascale to exascale supercomputing is the same as that in progressing from gigascale to terascale personal computing: strong scaling within shared memory on a single node of up to 1K simultaneously active computational threads. Many issues in algorithmic design and implementation are identical in these two simultaneous quests; however, the exascale quest has additional challenges due to practical limits on total power consumption (which come at the expense of resilience and node performance uniformity), to system-scale reliability (due to more points of failure), and to the need to merge the on-node programming environment with a million others (a weak scaling that is not in itself difficult, but will lead to challenges of coordination). This lecture series presents the issues, as digested from recent US Department of Energy roadmapping exercises, and focuses attention on some new issues that require mathematical attention. It is intended to provide those new to exascale computing with a working background for the week ahead, and motivation for the GPU scientific programming unit of the tutorial.
David E. Keyes (Columbia University) Lecture 2: Implications of the exascale roadmap for algorithms
Abstract: see abstract for Lecture 1
Mark Kisin (Harvard University) Points on Shimura varieties mod p
Abstract: I will explain some results towards the Langlands-Rapoport conjecture, which predicts the structure of the mod p points of a Shimura variety. A consequence of the conjecture is that the isogeny class of every mod p point contains a point which admits a lifting to a special (i.e., CM) point of the Shimura variety. One of the roots of the subject is the work of John Tate on CM liftings and endomorphisms of abelian varieties mod p.
Andreas Klöckner (New York University) High-order DG Wave Propagation on GPUs: Infrastructure, Implementation, Method Improvements
Abstract: Having recently shown that high-order unstructured discontinuous Galerkin (DG) methods are a discretization method for systems of hyperbolic conservation laws that is well-matched to execution on GPUs, in this talk I will explore both core and supporting components of high-order DG solvers for their suitability for and performance on modern, massively parallel architectures. Components examined range from software components facilitating implementation to strategies for automated tuning and, time permitting, numerical tweaks to the method itself. In concluding, I will present a selection of further design considerations and performance data.
Matthew Gregg Knepley (University of Chicago) GPU programming from higher level representations
Abstract: We discuss the construction and execution of GPU kernels from higher level specifications. Examples will be shown using low-order finite elements and the fast multipole method.
Pawel Konieczny (University of Minnesota) On dispersive effect of the Coriolis force for the stationary Navier-Stokes equations
Abstract: The dispersive effect of the Coriolis force for the stationary and nonstationary Navier-Stokes equations is investigated. Existence of a unique stationary solution is shown for an arbitrarily large external force, provided the Coriolis force is large enough. In addition to the stationary case, counterparts of several classical results for the nonstationary Navier-Stokes problem have been proven. The analysis is carried out in a new framework of the Fourier-Besov spaces.
Hugo Leclerc (École Normale Supérieure de Cachan) Global symbolic manipulations and code generation for Finite Elements on SIM[DT] hardware
Abstract: Tools have been developed to generate code to solve partial differential equations from high level descriptions (manipulation of files, global operators, ...). The successive symbolic transformations lead to a macroscopic description of the code to be executed, which can thus be translated into x86 (SSEx), C++ or CUDA code. The point emphasized here is that the different processes can be adapted to the target hardware, taking into account the ratio gflops / gbps (making e.g. the choice between re-computations or cache), the SIM[DT] abilities, ... The poster will present the gains (compared to classical CPU/GPU implementations) for two implementations of a 3D unstructured FEM solver, using respectively a conjugate gradient and a domain decomposition method with repetitive patterns.
Miriam Leeser (Northeastern University) The Challenges of Writing Portable, Correct and High Performance Libraries for GPUs or How to Avoid the Heroics of GPU Programming
Abstract: We live in the age of heroic programming for scientific applications on Graphics Processing Units (GPUs). Typically a scientist chooses an application to accelerate and a target platform, and through great effort maps their application to that platform. If they are a true hero, they achieve two or three orders of magnitude speedup for that application and target hardware pair. The effort required includes a deep understanding of the application, its implementation and the target architecture. When a new, higher performance architecture becomes available additional heroic acts are required. There is another group of scientists who prefer to spend their time focused on the application level rather than lower levels. These scientists would like to use GPUs for their applications, but would prefer to have parameterized library components available that deliver high performance without requiring heroic efforts on their part. The library components should be easy to use and should support a wide range of user input parameters. They should exhibit good performance on a range of different GPU platforms, including future architectures. Our research focuses on creating such libraries. We have been investigating parameterized library components for use with Matlab/Simulink and with the SCIRun Biomedical Problem Solving Environment from the University of Utah. In this talk I will discuss our library development efforts and challenges to achieving high performance across a range of both application and architectural parameters. I will also focus on issues that arise in achieving correct behavior of GPU kernels. One issue is correct behavior with respect to thread synchronization. Another is knowing whether or not your scientific application that uses floating point is correct when the results differ depending on the target architecture and order of computation.
Miriam Leeser (Northeastern University) GPU Acceleration in a Modern Problem Solving Environment: SCIRun's Linear System Solvers
Abstract: This research demonstrates the incorporation of the GPU's parallel processing architecture into the SCIRun biomedical problem solving environment with minimal changes to the environment or user experience. SCIRun, developed at the University of Utah, allows scientists to interactively construct many different types of biomedical simulations. We use this environment to demonstrate the effectiveness of the GPU by accelerating time-consuming algorithms present in these simulations. Specifically, we target the linear solver module, which contains multiple solvers that benefit from GPU hardware. We have created a class to accelerate the conjugate gradient, Jacobi and minimal residual linear solvers; the results demonstrate that the GPU can provide acceleration in this environment. A principal focus was to remain transparent by retaining the user-friendly experience for the scientist using SCIRun's graphical user interface. NVIDIA's CUDA C language is used to enable performance on NVIDIA GPUs. Challenges include manipulating the sparse data processed by these algorithms and communicating with the SCIRun interface amidst computation. Our solution makes it possible to implement GPU versions of the existing SCIRun algorithms easily and can be applied to other parallel algorithms in the application. The GPU executes the matrix and vector arithmetic to achieve acceleration of up to 16x on the algorithms in comparison to SCIRun's existing multithreaded CPU implementation. The source code will contain single and double precision versions to utilize a wide variety of GPU hardware and will be incorporated and publicly available in future versions of SCIRun.
David Mayhew (Advanced Micro Devices) I See GPU Shapes in the Clouds
Abstract: Fusion (the integration of CPU and GPU into a single processing entity) is here. Cloud based software services are here. Large processing clusters are running massively parallel Hadoop programs now. Can large-scale, commercial, enterprise, server solutions be dynamically repurposed to run HPC problem sets? The future of HPC may well be a massive set of virtual machines running in "curve of the earth" sized data centers. The cost of HPC processing sponges (HPC problem sets that consume otherwise wasted processing cycles in scale-out server clusters) will probably make all but the most extreme purpose-built HPC systems obsolete.
Dan Negrut (University of Wisconsin-Madison) Large Scale Frictional Contact Dynamics on the GPU
Abstract: This talk summarizes an effort at the Modeling, Simulation and Visualization Center at the University of Wisconsin-Madison to model and simulate large scale discrete dynamics problems. This effort is motivated by a desire to address unsolved challenges posed by granular dynamics problems, mobility of tracked and wheeled vehicles on granular terrain, and digging into granular material, to name a few. In the context of simulating the dynamics of large systems of interacting rigid bodies, we briefly outline a method for solving large cone complementarity problems by means of a fixed-point iteration algorithm. The method is an extension of the Gauss-Jacobi algorithms with over-relaxation for symmetric convex complementarity problems. Convergent under fairly standard assumptions, the method is implemented in a scalable parallel computational framework by using a single instruction multiple data (SIMD) execution paradigm supported by the Compute Unified Device Architecture (CUDA) library for programming on the graphical processing unit (GPU). The simulation framework developed supports the analysis of problems with more than one million rigid bodies that interact through contact and friction forces, and whose dynamics are constrained by either unilateral or bilateral kinematic constraints. Simulation thus becomes a viable tool for investigating in the near future the dynamics of complex systems such as the Mars Rover operating on granular terrain, powder composites, and granular material flow. The talk concludes with a short summary of other applications that stand to benefit from the computational power available on today’s GPUs.
Carl Pomerance (Dartmouth College) Elliptic curves: problems and applications
Abstract: In the past three decades there have been some exciting applications of elliptic curves over finite fields to integer factoring, primality testing, and cryptography. These applications in turn have raised some interesting problems often of an unconventional flavor. For example, how often is the order of an elliptic curve group prime, or how often does it have all small prime factors? In this talk we will visit problems such as these, as well as other analytic-type problems relating to ranks of elliptic curves over function fields and to elliptic divisibility sequences.
Bjorn Poonen (Massachusetts Institute of Technology) Random maximal isotropic subspaces and Selmer groups
Abstract: We show that the p-Selmer group of an elliptic curve is naturally the intersection of two maximal isotropic subspaces in an infinite-dimensional locally compact quadratic space over F_p.  By modeling this intersection as the intersection of a random maximal isotropic subspace with a fixed compact open maximal isotropic subspace, we can explain the known phenomena regarding distribution of Selmer ranks, such as the theorems of Heath-Brown, Swinnerton-Dyer, and Kane for 2-Selmer groups in certain families of quadratic twists, and the average size of 2- and 3-Selmer groups as computed by Bhargava and Shankar.  The only distribution on Mordell-Weil ranks compatible with both our random model and Delaunay's heuristics for Sha[p] is the distribution in which 50% of elliptic curves have rank 0, and 50% have rank 1.  We generalize many of our results to abelian varieties over global fields.  This is joint work with Eric Rains.
Cristian D Popescu (University of California, San Diego) An equivariant main conjecture in Iwasawa theory and applications
Abstract: I will discuss the statement and proof of an Equivariant Main Conjecture (EMC) in the Iwasawa theory of arbitrary global fields. This will be followed by applications of the EMC (via Iwasawa co-descent) towards proving various well known conjectures on special values of global L-functions. In the process, an important role will be played by an explicit construction of ell-adic Tate sequences. This is based on joint work with Cornelius Greither (Munich).
Michel Raynaud (Université de Paris XI (Paris-Sud)) Permanence following Temkin
Abstract: If we specialize algebraic equations having good properties, we usually face degeneracies. Starting with a bad specialization, we can try to improve it by performing modifications under control. If we succeed in getting a new specialization with the initial good properties preserved, we get a permanence statement. We shall present examples of permanence, with particular interest in semi-stable models.
Fernando Rodriguez Villegas (University of Texas at Austin) On the geometry of character varieties
Abstract: We know, thanks to the Weil conjectures, that counting points of varieties over finite fields yields purely topological information about them. In this talk I will first describe how we may count the number of points over finite fields on the character varieties parameterizing certain representations of the fundamental group of a Riemann surface into GL_n. The calculation involves an array of techniques from combinatorics to the representation theory of finite groups of Lie type. I will then discuss the geometric implications of this computation and the conjectures it has led to. This is joint work with T. Hausel and E. Letellier.
Karl Rubin (University of California, Irvine) Selmer ranks of elliptic curves in families of quadratic twists
Abstract:  In joint work with Barry Mazur, we investigate the 2-Selmer rank in families of quadratic twists of elliptic curves over arbitrary number fields.  We give sufficient conditions for an elliptic curve to have twists of arbitrary 2-Selmer rank, and we give lower bounds for the number of twists (with bounded conductor) with a given 2-Selmer rank.  As a consequence, under appropriate hypotheses there are many twists with Mordell-Weil rank zero, and (assuming the Shafarevich-Tate conjecture) many others with Mordell-Weil rank one.  Another application of our methods, using ideas of Poonen and Shlapentokh, is that if the Shafarevich-Tate conjecture holds then Hilbert's 10th problem has a negative answer over the ring of integers of any number field.  
Nayda G. Santiago (University of Puerto Rico) Hyperspectral Image Analysis for Abundance Estimation using GPUs
Abstract: Hyperspectral images can be used for abundance estimation and anomaly detection; however, the algorithms involved tend to be I/O intensive. Parallelizing these algorithms can enable their use in real-time applications. A method of overcoming these limitations involves selecting parallelizable algorithms and implementing them using GPUs. GPUs are designed as throughput engines, built to process large amounts of dense data in a parallel fashion. RX's detectors and estimators of abundance will be parallelized and tested for correctness and performance.
Fadil Santosa (University of Minnesota) Welcome to the IMA
Abstract: No Abstract
Olaf Schenk (Universität Basel) A Code Generation and Autotuning Framework For Parallel Iterative Stencil Computations on Modern Microarchitectures
Abstract: Stencil calculations comprise an important class of kernels in many scientific computing applications ranging from simple PDE solvers to constituent kernels in multigrid methods as well as image processing applications. In such types of solvers, stencil kernels are often the dominant part of the computation, and an efficient parallel implementation of the kernel is therefore crucial in order to reduce the time to solution. However, in the current complex hardware microarchitectures, meticulous architecture-specific tuning is required to elicit the machine's full compute power. We present a code generation and auto-tuning framework PATUS for stencil computations targeted at multi- and manycore processors, such as multicore CPUs and graphics processing units, which makes it possible to generate compute kernels from a specification of the stencil operation and a parallelization and optimization strategy, and leverages the autotuning methodology to optimize strategy-dependent parameters for the given hardware architecture.
Bertil Schmidt (Nanyang Technological University) Algorithms and Tools for Bioinformatics on GPUs
Abstract: The enormous growth of biological sequence data has caused bioinformatics to move rapidly towards a data-intensive, computational science. As a result, the computational power needed by bioinformatics applications is growing rapidly as well. The recent emergence of parallel accelerator technologies such as GPUs has made it possible to significantly reduce the execution times of many bioinformatics applications. In this talk I will present the design and implementation of scalable GPU algorithms based on the CUDA programming model in order to accelerate important bioinformatics applications. In particular, I will focus on algorithms and tools for next-generation sequencing (NGS) using error correction as an example. Detection and correction of sequencing errors is an important but time-consuming pre-processing step for de-novo genome assembly or read mapping. In this talk, I discuss the parallel algorithm design used for the CUDA-EC and DecGPU tools. I will also give an overview of other CUDA-enabled tools developed by my research group.
Mark J. Stock (Applied Scientific Research) A GPU-accelerated Boundary Element Method and Vortex Particle Method
Abstract: Vortex particle methods, when combined with multipole-accelerated boundary element methods (BEM), become a complete tool for direct numerical simulation (DNS) of internal or external vortex-dominated flows. In previous work, we presented a method to accelerate the vorticity-velocity inversion at the heart of vortex particle methods by performing a multipole treecode N-body method on parallel graphics hardware. The resulting method achieved a 17-fold speedup over a dual-core CPU implementation. In the present work, we will demonstrate not only an improved algorithm for the GPU vortex particle method that outperforms an 8-core CPU by a factor of 43, but also a GPU-accelerated multipole treecode method for the boundary element solution. The new BEM solves for the unknown source, dipole, or combined strengths over a triangulated surface using all available CPU cores and GPUs. Problems with up to 1.4 million unknowns can be solved on a single commodity desktop computer in one minute, and at that size the hybrid CPU/GPU outperforms a quad-core CPU alone by 22.5 times. The method is exercised on DNS of impulsively-started flow over spheres at Re=500, 1000, 2000, and 4000.
Mark J. Stock (Applied Scientific Research) Algorithmic Fluid Art – Influences, Process, and Works
Abstract: In addition to my research into vortex particle methods, parallel N-body methods, and GPU programming, I create artwork using these same computer programs. The work consists of imagery and animations of fluid forms and other shapes and patterns in nature. Using relatively simple algorithms reflecting the origins of their underlying processes, many of these patterns can be recreated and their inherent beauty exposed. In this talk, I will discuss the technical aspects of my work, but mainly plan to distract attention with the works themselves. Biography: Mark Stock earned his PhD from Aerospace Engineering at the University of Michigan in 2006, and has been working for Applied Scientific Research in Santa Ana, CA since then. He has been creating computer imagery and numerical simulations for over 25 years, and started exhibiting his artwork in 2001.
Robert Strzodka (Max-Planck-Institut für Informatik) Everyday Parallelism
Abstract: Parallelism is largely seen as a necessary evil to cope with the power restrictions on a chip, and most programmers would prefer to continue writing sequential programs rather than dealing with the alien and error-prone parallel programming. This talk will question this view and point out how the allegedly unfamiliar parallel processing is utilized by millions of people every day. Parallelism appears as a curse only when looking at it from the crooked illusion of sequential processing. Admittedly, there are critical decisions associated with specialization, data movement or synchronization, but we also have lots of experience in taking them because they are performed every day. Presented results will demonstrate that the drawn analogies are not just theoretic.
Keita Teranishi (CRAY Inc) Locally-Self-Consistent Multiple-Scattering code (LSMS) for GPUs
Abstract: Locally-Self-Consistent Multiple-Scattering (LSMS) is one of the major petascale applications and is highly tuned for supercomputer systems like Cray XT5 Jaguar. We present our recent effort on porting and tuning the major computational routine of LSMS to GPU based systems to demonstrate the feasibility of LSMS beyond petaflops. In particular, we discuss the techniques, including auto-tuning of dense matrix kernels and computation-communication overlap.
Jonas Tölke (Ingrain) Lattice Boltzmann Multi-Phase Simulations in Porous Media using GPUs
Abstract: We present a very efficient implementation of a multiphase lattice Boltzmann method (LBM) based on CUDA. This technology delivers significant benefits for predictions of properties in rocks. The simulator on NVIDIA hardware enables us to perform pore scale multi-phase (oil-water-matrix) simulations in natural porous media and to predict important rock properties like absolute permeability, relative permeabilities, and capillary pressure. We will show videos of these simulations in complex real world porous media and rocks.
Jonas Tölke (Ingrain) Digital rocks physics: fluid flow in rocks
Abstract: We show how Ingrain's digital rock physics technology works to predict fluid flow properties in rocks. NVIDIA CUDA technology delivers significant acceleration for this technology. The simulator on NVIDIA hardware enables us to perform pore scale multi-phase (oil-water-matrix) simulations in natural porous media and to predict important rock properties like absolute permeability, relative permeabilities, and capillary pressure.
Ulrike Meier Yang (Lawrence Livermore National Laboratory) Preparing Algebraic Multigrid for Exascale
Abstract: Algebraic Multigrid (AMG) solvers are an essential component of many large-scale scientific simulation codes. Their continued numerical scalability and efficient implementation are critical for preparing these codes for exascale. Our experiences on modern multi-core machines show that significant challenges must be addressed for AMG to perform well on such machines. We discuss our experiences and describe the techniques we have used to overcome scalability challenges for AMG on hybrid architectures in preparation for exascale.
Rio Yokota (Boston University) Fast Multipole Methods on large cluster of GPUs
Abstract: The combination of algorithmic acceleration and hardware acceleration can have tremendous impact. The FMM is a fast algorithm for calculating matrix-vector multiplications in O(N) time, and it runs very fast on GPUs. Its combination of a high degree of parallelism and O(N) complexity makes it an attractive solver for the Peta-scale and Exa-scale era. It has a wide range of applications, e.g. quantum mechanics, molecular dynamics, electrostatics, acoustics, structural mechanics, fluid mechanics, and astrophysics.
Visitors in Residence
Adebisi Agboola University of California, Santa Barbara 1/2/2011 - 1/6/2011
Douglas N. Arnold University of Minnesota 9/1/2010 - 6/30/2011
Gerard Michel Awanou Northern Illinois University 9/1/2010 - 6/10/2011
Hasan Babaei Auburn University 1/9/2011 - 1/14/2011
Matthew Baker Georgia Institute of Technology 1/3/2011 - 1/5/2011
Nusret Balci University of Minnesota 9/1/2009 - 8/31/2011
Lorena A. Barba Boston University 1/9/2011 - 1/15/2011
Hyman Bass University of Michigan 1/2/2011 - 1/5/2011
Alexander A. Beilinson University of Chicago 1/2/2011 - 1/5/2011
Jonathan Bentz CRAY Inc 1/10/2011 - 1/14/2011
John Ferderick Bergdall Brandeis University 1/3/2011 - 1/5/2011
Laurent Berger École Normale Supérieure de Lyon 1/1/2011 - 1/6/2011
Vladimir Berkovich Weizmann Institute of Science 1/2/2011 - 1/6/2011
Manjul Bhargava Princeton University 1/2/2011 - 1/5/2011
Alexander Borisov University of Pittsburgh 1/2/2011 - 1/6/2011
Nigel Boston University of Wisconsin-Madison 1/3/2011 - 1/5/2011
Susanne C. Brenner Louisiana State University 9/1/2010 - 6/10/2011
Richard C. Brower Boston University 1/9/2011 - 1/13/2011
Armand Brumer Fordham University 1/2/2011 - 1/5/2011
Joe Buhler Center for Communications Research 1/2/2011 - 1/5/2011
Gregory Scott Call Amherst College 1/2/2011 - 1/5/2011
Cris Cecka Stanford University 1/8/2011 - 1/14/2011
Aycil Cesmelioglu University of Minnesota 9/30/2010 - 8/30/2011
Byungchul Cha Muhlenberg College 1/2/2011 - 1/5/2011
Chi Hin Chan University of Minnesota 9/1/2009 - 8/31/2011
Jung Hee Cheon Seoul National University 1/3/2011 - 1/5/2011
Ionut Ciocan-Fontanine University of Minnesota 1/3/2011 - 1/5/2011
Dustin Clausen Massachusetts Institute of Technology 1/2/2011 - 1/5/2011
Bernardo Cockburn University of Minnesota 9/1/2010 - 6/30/2011
Jonathan M. Cohen NVIDIA Corporation 1/9/2011 - 1/14/2011
Jintao Cui University of Minnesota 8/31/2010 - 8/30/2011
Eric Felix Darve Stanford University 1/8/2011 - 1/9/2011
Clint Dawson University of Texas at Austin 1/30/2011 - 2/5/2011
Jack J. Dongarra University of Tennessee 1/9/2011 - 1/11/2011
Geir Ellingsrud University of Oslo 1/2/2011 - 1/6/2011
Anne C. Elster Norwegian University of Science and Technology (NTNU) 1/9/2011 - 1/15/2011
Allan Peter Engsig-Karup Technical University of Denmark 1/9/2011 - 1/14/2011
Carl Erickson Harvard University 1/2/2011 - 1/5/2011
Selim Esedoglu University of Michigan 1/20/2011 - 6/10/2011
Randy H. Ewoldt University of Minnesota 9/1/2009 - 8/31/2011
Liwu Fan Auburn University 1/9/2011 - 1/14/2011
Oscar E. Fernandez University of Minnesota 8/31/2010 - 8/30/2011
Daniel Flath Macalester College 1/3/2011 - 1/5/2011
Jean-Marc Fontaine Université de Paris XI (Paris-Sud) 1/2/2011 - 1/8/2011
Geoffrey Charles Fox Indiana University 1/12/2011 - 1/14/2011
Martin J. Gander Universite de Geneve 1/9/2011 - 1/15/2011
Paul Garrett University of Minnesota 1/3/2011 - 1/5/2011
Gaurav Gaurav University of Minnesota 1/9/2011 - 1/14/2011
Toby Gee Northwestern University 1/3/2011 - 1/5/2011
Mike Giles University of Oxford 1/8/2011 - 1/14/2011
Dominik Göddeke Universität Dortmund 1/8/2011 - 1/15/2011
Edray Herber Goins Purdue University 1/3/2011 - 1/5/2011
Jay Gopalakrishnan University of Florida 9/1/2010 - 6/30/2011
Vincent John Graziano CRAY Inc 1/9/2011 - 1/9/2011
Leopold Grinberg Brown University 1/10/2011 - 1/15/2011
Bobby Grizzard University of Texas at Austin 1/3/2011 - 1/5/2011
Benedict Gross Harvard University 1/2/2011 - 1/5/2011
Shiyuan Gu Louisiana State University 9/1/2010 - 6/30/2011
Joseph Gunther University of Texas at Austin 1/2/2011 - 1/5/2011
Ren Guo University of Minnesota 1/3/2011 - 1/5/2011
Thomas C. Hales University of Pittsburgh 1/2/2011 - 1/5/2011
Michael A. Heroux Sandia National Laboratories 1/9/2011 - 1/14/2011
Wei Ho Columbia University 1/2/2011 - 1/5/2011
Yulia Hristova University of Minnesota 9/1/2010 - 8/31/2011
Luc Illusie Université de Paris XI (Paris-Sud) 1/2/2011 - 1/6/2011
Elizabeth R. Jessup University of Colorado 1/8/2011 - 1/14/2011
Dihua Jiang University of Minnesota 1/3/2011 - 1/5/2011
Jennifer Johnson-Leung University of Idaho 1/2/2011 - 1/5/2011
John Jones Arizona State University 1/2/2011 - 1/6/2011
Nick Katz Princeton University 1/2/2011 - 1/5/2011
Dinesh Kaushik Argonne National Laboratory 1/11/2011 - 1/14/2011
Markus Keel University of Minnesota 7/21/2008 - 6/30/2011
David E. Keyes Columbia University 1/8/2011 - 1/13/2011
Mark Kisin Harvard University 1/2/2011 - 1/5/2011
Andreas Klöckner New York University 1/9/2011 - 1/14/2011
Matthew Gregg Knepley University of Chicago 1/9/2011 - 1/15/2011
Pawel Konieczny University of Minnesota 9/1/2009 - 8/31/2011
Kenneth Kramer Queens College, CUNY 1/2/2011 - 1/5/2011
Hugo Leclerc École Normale Supérieure de Cachan 1/9/2011 - 1/14/2011
Miriam Leeser Northeastern University 1/9/2011 - 1/14/2011
Gilad Lerman University of Minnesota 9/1/2010 - 6/30/2011
Hengguang Li University of Minnesota 8/16/2010 - 8/15/2011
Lizao (Larry) Li University of Minnesota 1/3/2011 - 1/5/2011
Peng Li University of Minnesota 1/9/2011 - 1/9/2011
David J. Lilja University of Minnesota 1/10/2011 - 1/14/2011
Zhi (George) Lin University of Minnesota 9/1/2009 - 8/31/2011
Baiying Liu University of Minnesota 1/3/2011 - 1/5/2011
Jonathan Lubin Brown University 1/1/2011 - 1/6/2011
Benjamin E Lundell Cornell University 1/3/2011 - 1/6/2011
Mitchell Luskin University of Minnesota 9/1/2010 - 6/30/2011
Chris Lyons University of Michigan 1/2/2011 - 1/4/2011
Kara Lee Maki University of Minnesota 9/1/2009 - 8/31/2011
Yu (David) Mao University of Minnesota 8/31/2010 - 8/30/2011
David Mayhew Advanced Micro Devices 1/14/2011 - 1/14/2011
William McCallum University of Arizona 1/3/2011 - 1/5/2011
Lois Curfman McInnes Argonne National Laboratory 1/10/2011 - 1/13/2011
William Messing University of Minnesota 1/3/2011 - 1/5/2011
Ina Mette American Mathematical Society 1/2/2011 - 1/5/2011
Irina Mitrea University of Minnesota 8/16/2010 - 6/14/2011
Dimitrios Mitsotakis University of Minnesota 10/27/2010 - 8/31/2011
Kevin Mugo Purdue University 1/3/2011 - 1/5/2011
Gregg Musiker University of Minnesota 1/3/2011 - 1/5/2011
Dan Negrut University of Wisconsin-Madison 1/9/2011 - 1/14/2011
Switala Nicholas University of Minnesota 1/3/2011 - 1/5/2011
Sylvain Nintcheu Fata Oak Ridge National Laboratory 11/1/2010 - 1/29/2011
Andrew Odlyzko University of Minnesota 1/3/2011 - 1/5/2011
Peter J. Olver University of Minnesota 1/3/2011 - 1/5/2011
Alexandra Ortan University of Minnesota 9/16/2010 - 6/15/2011
Cecilia Ortiz-Duenas University of Minnesota 9/1/2009 - 8/31/2011
Miguel Pauletti Texas A & M University 1/8/2011 - 1/14/2011
Carl Pomerance Dartmouth College 1/2/2011 - 1/5/2011
Bjorn Poonen Massachusetts Institute of Technology 1/2/2011 - 1/5/2011
Cristian D Popescu University of California, San Diego 1/2/2011 - 1/5/2011
Weifeng (Frederick) Qiu University of Minnesota 8/31/2010 - 8/30/2011
Vincent Quenneville-Belair University of Minnesota 9/16/2010 - 6/15/2011
Varun Ramesh University of Minnesota 1/9/2011 - 1/14/2011
Wayne Raskind Arizona State University 1/2/2011 - 1/5/2011
Michel Raynaud Université de Paris XI (Paris-Sud) 1/2/2011 - 1/6/2011
Fernando Reitich University of Minnesota 9/1/2010 - 6/30/2011
Kenneth A. Ribet University of California, Berkeley 1/2/2011 - 1/5/2011
Eric Riedl Harvard University 1/2/2011 - 1/6/2011
David Peter Roberts University of Minnesota 1/2/2011 - 1/5/2011
Fernando Rodriguez Villegas University of Texas at Austin 1/2/2011 - 1/5/2011
Michael I. Rosen Brown University 1/2/2011 - 1/5/2011
Jeffrey A. Rosoff Gustavus Adolphus College 1/2/2011 - 1/5/2011
Karl Rubin University of California, Irvine 1/2/2011 - 1/5/2011
Hakizumwami Birali Runesha University of Minnesota 1/9/2011 - 1/14/2011
Yousef Saad University of Minnesota 1/10/2011 - 1/14/2011
David J Saltman Institute for Defense Analyses (IDA) 1/2/2011 - 1/5/2011
Nayda G. Santiago University of Puerto Rico 1/9/2011 - 1/15/2011
Fadil Santosa University of Minnesota 7/1/2008 - 6/30/2011
Olaf Schenk Universität Basel 1/8/2011 - 1/15/2011
Bertil Schmidt Nanyang Technological University 1/9/2011 - 1/15/2011
Anthony Scudiero CRAY Inc 1/10/2011 - 1/14/2011
Shankar Sen Cornell University 1/2/2011 - 1/5/2011
Chehrzad Shakiban University of Minnesota 1/3/2011 - 1/5/2011
Shuanglin Shao University of Minnesota 9/1/2009 - 8/31/2011
Stephen S. Shatz University of Pennsylvania 1/2/2011 - 1/5/2011
Alice Silverberg University of California, Irvine 1/2/2011 - 1/5/2011
Ethan Smith Michigan Technological University 1/2/2011 - 1/5/2011
Steven Sperber University of Minnesota 1/3/2011 - 1/5/2011
Harold M. Stark University of California, San Diego 1/2/2011 - 1/5/2011
William A. Stein University of Washington 1/2/2011 - 1/5/2011
Panagiotis Stinis University of Minnesota 9/1/2010 - 6/30/2011
Mark J. Stock Applied Scientific Research 1/8/2011 - 1/14/2011
Michael Stopa Harvard University 1/9/2011 - 1/14/2011
Allan Struthers Michigan Technological University 1/9/2011 - 1/15/2011
Robert Strzodka Max-Planck-Institut für Informatik 1/8/2011 - 1/14/2011
Li-yeng Sung Louisiana State University 9/1/2010 - 6/15/2011
Habiballah Talavatifard Texas A & M University 1/8/2011 - 1/14/2011
Nicolae Tarfulea Purdue University, Calumet 9/1/2010 - 6/15/2011
John Tate University of Texas at Austin 1/1/2011 - 1/9/2011
Jeremy Teitelbaum University of Connecticut 1/2/2011 - 1/5/2011
Keita Teranishi CRAY Inc 1/9/2011 - 1/14/2011
Dinesh S. Thakur University of Arizona 1/2/2011 - 1/5/2011
Jonas Tölke Ingrain 1/9/2011 - 1/14/2011
Dimitar Trenev University of Minnesota 9/1/2009 - 8/31/2011
Zohra Tridane University of Minnesota 1/10/2011 - 1/14/2011
Jerrold Tunnell Rutgers University 1/2/2011 - 1/5/2011
Douglas Ulmer Georgia Institute of Technology 1/3/2011 - 1/5/2011
Jeffrey Vaaler University of Texas at Austin 1/2/2011 - 1/5/2011
Marie-France Vigneras Institut de Mathematiques de Jussieu 1/1/2011 - 1/5/2011
Vasily Volkov University of California, Berkeley 1/9/2011 - 1/14/2011
Jose Felipe Voloch University of Texas at Austin 1/2/2011 - 1/6/2011
Jamie Emmanuel Weigandt Purdue University 1/3/2011 - 1/6/2011
Melanie Matchett Wood Stanford University 1/2/2011 - 1/5/2011
Ulrike Meier Yang Lawrence Livermore National Laboratory 1/10/2011 - 1/14/2011
Yimu Yin University of Pittsburgh 1/2/2011 - 1/5/2011
Rio Yokota Boston University 1/11/2011 - 1/15/2011
Qing Chaney Zhang Ohio State University 1/2/2011 - 1/6/2011
Shuxia Zhang University of Minnesota 1/10/2011 - 1/14/2011
Xudong Zheng University of Illinois 1/2/2011 - 1/5/2011
Legend: Postdoc or Industrial Postdoc | Long-term Visitor

IMA Affiliates:
Arizona State University, Boeing, Corning Incorporated, ExxonMobil, Ford, General Motors, Georgia Institute of Technology, Honeywell, IBM, Indiana University, Iowa State University, Korea Advanced Institute of Science and Technology (KAIST), Lawrence Livermore National Laboratory, Lockheed Martin, Los Alamos National Laboratory, Medtronic, Michigan State University, Michigan Technological University, Mississippi State University, Northern Illinois University, Ohio State University, Pennsylvania State University, Portland State University, Purdue University, Rice University, Rutgers University, Sandia National Laboratories, Schlumberger Cambridge Research, Schlumberger-Doll, Seoul National University, Siemens, Telcordia, Texas A & M University, University of Central Florida, University of Chicago, University of Delaware, University of Houston, University of Illinois at Urbana-Champaign, University of Iowa, University of Kentucky, University of Maryland, University of Michigan, University of Minnesota, University of Notre Dame, University of Pennsylvania, University of Pittsburgh, University of Tennessee, University of Wisconsin-Madison, University of Wyoming, US Air Force Research Laboratory, Wayne State University, Worcester Polytechnic Institute