Institute for Mathematics and its Applications, University of Minnesota, 114 Lind Hall, 207 Church Street SE, Minneapolis, MN 55455
2010–2011 Program
See http://www.ima.umn.edu/2010-2011/ for a full description of the 2010–2011 program on Simulating Our Complex World: Modeling, Computation and Analysis.
10:00am–11:00am  Registration and coffee  Keller Hall 3176  SW1.3-5.11
11:00am–11:15am  Welcome to the IMA  Fadil Santosa (University of Minnesota)  Keller Hall 3180  SW1.3-5.11
11:15am–12:15pm  Points on Shimura varieties mod p  Mark Kisin (Harvard University)  Keller Hall 3180  SW1.3-5.11
12:15pm–2:00pm  Lunch  SW1.3-5.11
2:00pm–3:00pm  Elliptic curves: problems and applications  Carl Pomerance (Dartmouth College)  Keller Hall 3180  SW1.3-5.11
3:00pm–3:10pm  Group photo  SW1.3-5.11
3:10pm–3:40pm  Coffee break  Keller Hall 3176  SW1.3-5.11
3:40pm–4:40pm  Selmer ranks of elliptic curves in families of quadratic twists  Karl Rubin (University of California, Irvine)  Keller Hall 3180  SW1.3-5.11
5:00pm–7:00pm  Reception at the Campus Club Bar Lounge  Campus Club Bar Lounge  SW1.3-5.11
8:30am–9:00am  Coffee  Keller Hall 3176  SW1.3-5.11
9:00am–10:00am  An equivariant main conjecture in Iwasawa theory and applications  Cristian D. Popescu (University of California, San Diego)  Keller Hall 3180  SW1.3-5.11
10:00am–10:30am  Coffee break  Keller Hall 3176  SW1.3-5.11
10:30am–11:30am  Permanence following Temkin  Michel Raynaud (Université de Paris XI (Paris-Sud))  Keller Hall 3180  SW1.3-5.11
11:30am–2:00pm  Lunch  SW1.3-5.11
2:00pm–3:00pm  On the geometry of character varieties  Fernando Rodriguez Villegas (University of Texas at Austin)  Keller Hall 3180  SW1.3-5.11
3:00pm–3:30pm  Coffee break  Keller Hall 3176  SW1.3-5.11
3:30pm–4:30pm  The average rank of elliptic curves  Manjul Bhargava (Princeton University)  Keller Hall 3180  SW1.3-5.11
6:00pm–8:00pm  Banquet  McNamara Alumni Center, Heritage Gallery, 200 Oak Street S.E., Minneapolis, MN 55455, 612-624-9831  SW1.3-5.11
8:30am–9:00am  Coffee  Keller Hall 3176  SW1.3-5.11
9:00am–10:00am  p-adic periods and derived de Rham cohomology  Alexander A. Beilinson (University of Chicago)  Keller Hall 3180  SW1.3-5.11
10:00am–10:30am  Coffee break  Keller Hall 3176  SW1.3-5.11
10:30am–11:30am  Random maximal isotropic subspaces and Selmer groups  Bjorn Poonen (Massachusetts Institute of Technology)  Keller Hall 3180  SW1.3-5.11
11:30am–1:00pm  Lunch  SW1.3-5.11
1:00pm–2:00pm  Vector bundles and p-adic Galois representations  Jean-Marc Fontaine (Université de Paris XI (Paris-Sud))  Keller Hall 3180  SW1.3-5.11
2:00pm–2:05pm  Closing remarks  Keller Hall 3180  SW1.3-5.11
10:45am–11:15am  Coffee break  Lind Hall 400
All Day  Eric Darve (Stanford University) will chair the morning session and the opening of the afternoon session; David E. Keyes (KAUST/Columbia University) will preside over the end of Cris Cecka's afternoon session.  T1.9.11
9:00am–9:30am  Registration and coffee  Lind Hall 400  T1.9.11
9:30am–10:30am  Lecture 1: Implications of the exascale roadmap for algorithms  David E. Keyes (King Abdullah University of Science & Technology / Columbia University)  Lind Hall 305  T1.9.11
10:30am–11:00am  Break  Lind Hall 400  T1.9.11
11:00am–12:00pm  Lecture 2: Implications of the exascale roadmap for algorithms  David E. Keyes (King Abdullah University of Science & Technology / Columbia University)  Lind Hall 305  T1.9.11
12:00pm–1:30pm  Lunch  T1.9.11
1:30pm–2:30pm  Lecture 1: Scientific Computing using Graphics Processors  Cris Cecka (Stanford University)  Lind Hall 305  T1.9.11
2:30pm–3:30pm  Lecture 2: Scientific Computing with Graphics Processors  Cris Cecka (Stanford University)  Lind Hall 305  T1.9.11
3:30pm–4:00pm  Break  Lind Hall 400  T1.9.11
4:00pm–4:30pm  Lecture 3: Introduction to heterogeneous computing with GPUs  Cris Cecka (Stanford University)  Lind Hall 305  T1.9.11
All Day  Morning Chair: David E. Keyes (King Abdullah University of Science & Technology / Columbia University); Afternoon Chair: Lorena A. Barba (Boston University)  W1.10-14.11
8:15am–9:15am  Registration and coffee  Keller Hall 3176  W1.10-14.11
9:15am–9:30am  Welcome to IMA  Fadil Santosa (University of Minnesota)  Keller Hall 3180  W1.10-14.11
9:30am–10:30am  The Exascale: Why and How  David E. Keyes (King Abdullah University of Science & Technology / Columbia University)  Keller Hall 3180  W1.10-14.11
10:30am–11:00am  Break  Keller Hall 3176  W1.10-14.11
11:00am–12:00pm  Architecture-aware Algorithms and Software for Scalable Performance and Resilience on Heterogeneous Architectures  Jack J. Dongarra (University of Tennessee)  Keller Hall 3180  W1.10-14.11
12:00pm–2:00pm  Lunch  W1.10-14.11
2:00pm–3:00pm  Everyday Parallelism  Robert Strzodka (Max-Planck-Institut für Informatik)  Keller Hall 3180  W1.10-14.11
3:00pm–4:00pm  The Challenges of Writing Portable, Correct and High Performance Libraries for GPUs, or How to Avoid the Heroics of GPU Programming  Miriam Leeser (Northeastern University)  Keller Hall 3180  W1.10-14.11
4:00pm–4:30pm  Break  Keller Hall 3176  W1.10-14.11
4:30pm–5:30pm  GPU programming from higher level representations  Matthew Gregg Knepley (University of Chicago)  Keller Hall 3180  W1.10-14.11
All Day  Chair: Matthew Gregg Knepley (University of Chicago)  W1.10-14.11
8:00am–8:30am  Coffee  Keller Hall 3176  W1.10-14.11
8:30am–9:30am  Thinking parallel: sparse iterative solvers with CUDA  Jonathan M. Cohen (NVIDIA Corporation)  Keller Hall 3180  W1.10-14.11
9:30am–10:30am  A Code Generation and Autotuning Framework for Parallel Iterative Stencil Computations on Modern Microarchitectures  Olaf Schenk (Universität Basel)  Keller Hall 3180  W1.10-14.11
10:30am–11:00am  Break  Keller Hall 3176  W1.10-14.11
11:00am–12:00pm  Large Scale Frictional Contact Dynamics on the GPU  Dan Negrut (University of Wisconsin-Madison)  Keller Hall 3180  W1.10-14.11
12:00pm–2:00pm  Lunch  W1.10-14.11
2:00pm–4:00pm  Group photo and discussion  Keller Hall 3180  W1.10-14.11
4:00pm–4:30pm  Break  Keller Hall 3176  W1.10-14.11
4:30pm–6:00pm  Reception and Poster Session (poster submissions welcome from all participants)  Lind Hall 400  W1.10-14.11
Algorithms for Lattice Field Theory at Extreme Scales  Richard C. Brower (Boston University)
Medical Imaging on the GPU Using OpenCL: 3D Surface Extraction and 3D Ultrasound Reconstruction  Anne C. Elster (Norwegian University of Science and Technology (NTNU))
Development of a new massively parallel tool for nonlinear free surface wave simulation  Allan Peter Engsig-Karup (Technical University of Denmark)
Development of Desktop Computing Applications and Engineering Tools on GPUs  Allan Peter Engsig-Karup (Technical University of Denmark)
A Domain Decomposition Method that Converges in Two Iterations for any Subdomain Decomposition and PDE  Martin J. Gander (Université de Genève)
Efficient Uncertainty Quantification using GPUs  Gaurav Gaurav (University of Minnesota)
Brain Perfusion: Multiscale Simulations and Visualization  Leopold Grinberg (Brown University)
Mixed-Precision GPU-Multigrid Solvers with Strong Smoothers  Dominik Göddeke (Universität Dortmund), Robert Strzodka (Max-Planck-Institut für Informatik)
The Build to Order Compiler for Matrix Algebra Optimization  Elizabeth R. Jessup (University of Colorado)
Global symbolic manipulations and code generation for Finite Elements on SIM[DT] hardware  Hugo Leclerc (École Normale Supérieure de Cachan)
GPU Acceleration in a Modern Problem Solving Environment: SCIRun's Linear System Solvers  Miriam Leeser (Northeastern University)
Hyperspectral Image Analysis for Abundance Estimation using GPUs  Nayda G. Santiago (University of Puerto Rico)
A GPU-accelerated Boundary Element Method and Vortex Particle Method  Mark J. Stock (Applied Scientific Research)
Locally Self-Consistent Multiple-Scattering code (LSMS) for GPUs  Keita Teranishi (Cray Inc.)
Digital rocks physics: fluid flow in rocks  Jonas Tölke (Ingrain)
Preparing Algebraic Multigrid for Exascale  Ulrike Meier Yang (Lawrence Livermore National Laboratory)
Fast Multipole Methods on large clusters of GPUs  Rio Yokota (Boston University)
All Day  Morning Chair: Jonathan M. Cohen (NVIDIA Corporation); Afternoon Chair: Susanne C. Brenner (Louisiana State University)  W1.10-14.11
8:00am–8:30am  Coffee  Keller Hall 3176  W1.10-14.11
8:30am–9:30am  Application of Assembly of Finite Element Methods on Graphics Processors for Real-Time Elastodynamics  Cris Cecka (Stanford University)  Keller Hall 3180  W1.10-14.11
9:30am–10:30am  Lattice Boltzmann Multi-Phase Simulations in Porous Media using GPUs  Jonas Tölke (Ingrain)  Keller Hall 3180  W1.10-14.11
10:30am–11:00am  Break  Keller Hall 3176  W1.10-14.11
11:00am–12:00pm  The basis and perspectives of an exascale algorithm: our ExaFMM project  Lorena A. Barba (Boston University)  Keller Hall 3180  W1.10-14.11
12:00pm–2:00pm  Lunch  W1.10-14.11
2:00pm–3:00pm  Discussion [note room assignment]  W1.10-14.11
Reproducible results and open source code  Keller Hall 3180  Lorena A. Barba (Boston University)
Mixed precision computing  Lind Hall 401  Richard C. Brower (Boston University)
Exascale programming models  Lind Hall 409  Michael A. Heroux (Sandia National Laboratories)
3:00pm–4:00pm  High-order DG Wave Propagation on GPUs: Infrastructure, Implementation, Method Improvements  Andreas Klöckner (New York University)  Keller Hall 3180  W1.10-14.11
6:00pm–8:30pm  Workshop social reception  Stub and Herbs, 227 Oak St, Minneapolis, MN 55414, (612) 379-0555  W1.10-14.11
All Day  Morning Chair: Ulrike Meier Yang (Lawrence Livermore National Laboratory); Afternoon Chair: Mike Giles (University of Oxford)  W1.10-14.11
8:00am–8:30am  Coffee  Keller Hall 3176  W1.10-14.11
8:30am–9:30am  Algorithms and Tools for Bioinformatics on GPUs  Bertil Schmidt (Nanyang Technological University)  Keller Hall 3180  W1.10-14.11
9:30am–10:30am  Algorithmic Fluid Art – Influences, Process, and Works  Mark J. Stock (Applied Scientific Research)  Keller Hall 3180  W1.10-14.11
10:30am–11:00am  Break  Keller Hall 3176  W1.10-14.11
11:00am–12:00pm  OP2: an open-source library for unstructured grid applications  Mike Giles (University of Oxford)  Keller Hall 3180  W1.10-14.11
12:00pm–2:00pm  Lunch  W1.10-14.11
2:00pm–3:00pm  Ultra-parallel solvers for multiscale brain blood flow simulations on exascale computers  Leopold Grinberg (Brown University)  Keller Hall 3180  W1.10-14.11
3:00pm–4:00pm  Clouds, MapReduce and HPC  Geoffrey Charles Fox (Indiana University)  Keller Hall 3180  W1.10-14.11
All Day  Chair: Mark J. Stock (Applied Scientific Research)  W1.10-14.11
8:00am–8:30am  Coffee  Keller Hall 3176  W1.10-14.11
8:30am–9:30am  I See GPU Shapes in the Clouds  David Mayhew (Advanced Micro Devices)  Keller Hall 3180  W1.10-14.11
9:30am–10:30am  Real-Time Medical and Geological Processing on GPU-based Systems: Experiences and Challenges  Anne C. Elster (Norwegian University of Science and Technology (NTNU))  Keller Hall 3180  W1.10-14.11
10:30am–11:00am  Break  Keller Hall 3176  W1.10-14.11
11:00am–12:00pm  Emerging Programming and Machine Models: Opportunities for Numerical Algorithms R&D  Michael A. Heroux (Sandia National Laboratories)  Keller Hall 3180  W1.10-14.11
12:00pm–12:05pm  Closing remarks  Keller Hall 3180  W1.10-14.11
All Day  Martin Luther King, Jr. Day. The IMA is closed. 
10:45am–11:15am  Coffee break  Lind Hall 400
11:15am–12:15pm  On dispersive effect of the Coriolis force for the stationary Navier-Stokes equations  Pawel Konieczny (University of Minnesota)  Lind Hall 305  PS
10:45am–11:15am  Coffee break  Lind Hall 400
2:30pm–3:30pm  Math 8994: Discontinuous Galerkin methods: An introduction – The original method: linear scalar transport  Bernardo Cockburn (University of Minnesota)  Lind Hall 305
10:45am–11:15am  Coffee break  Lind Hall 400
1:30pm–2:30pm  Tutorial Lectures: Modeling Hurricane Storm Surges – Lecture 1: Introduction to the shallow water equations  Clint Dawson (University of Texas at Austin)  Lind Hall 305
Event Legend: 

PS  IMA Postdoc Seminar 
SW1.3-5.11  First Abel Conference: A Mathematical Celebration of John Tate
T1.9.11  Scientific Computing Using Graphics Processors
W1.10-14.11  High Performance Computing and Emerging Architectures
Lorena A. Barba (Boston University)  The basis and perspectives of an exascale algorithm: our ExaFMM project 
Abstract: Linearly scaling algorithms will be crucial for the problem sizes that will be tackled on capability exascale systems. It is interesting to note that many of the most successful algorithms are hierarchical in nature, such as multigrid methods and fast multipole methods (FMM). We have been leading development efforts for open-source FMM software for some time, and recently produced GPU implementations of the various computational kernels involved in the FMM algorithm. Most recently, we have produced a multi-GPU code and performed scalability studies showing high parallel efficiency in strong scaling. These results have pointed to several features of the FMM that make it a particularly favorable algorithm for the emerging heterogeneous, many-core architectural landscape. We propose that the FMM algorithm offers exceptional opportunities to enable exascale applications. Among its exascale-suitable features are: (i) it has intrinsic geometric locality, and access patterns are made local via particle indexing techniques; (ii) we can achieve temporal locality via an efficient queuing of GPU tasks before execution, and at a fine level by means of memory coalescing based on the natural index-sorting techniques; (iii) global data communication and synchronization, often a significant impediment to scalability, is a soft barrier for the FMM, where the most time-consuming kernels are, respectively, purely local (particle-to-particle interactions) and "hierarchically synchronized" (multipole-to-local interactions, which happen simultaneously at every level of the tree).
In addition, we suggest a strategy for achieving the best algorithmic performance, based on two key ideas: (i) hybridize the FMM with a treecode by choosing on the fly between particle-particle, particle-box, and box-box interactions, according to a work estimate; (ii) apply a dynamic error-control technique, effected on the treecode by means of a variable "box-opening angle" and on the FMM by means of a variable order of the multipole expansion. We have carried out a preliminary implementation of these ideas, achieving a 14x speedup with respect to our current published version of the FMM. Considering that this effort was only exploratory, we see potential for unprecedented performance with these algorithms.
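The "box-opening angle" criterion mentioned above can be illustrated with a minimal 1D treecode sketch: a box's aggregate (here just a monopole, its total charge) is used whenever the box is well separated from the target, otherwise the code recurses or falls back to direct particle-particle sums. All names and the logarithmic kernel are illustrative assumptions, not the ExaFMM code.

```python
import numpy as np

class Box:
    """A node of a 1D binary tree over charged particles (monopole only)."""
    def __init__(self, x, q, lo, hi, leaf_size=8):
        self.center = 0.5 * (lo + hi)
        self.radius = 0.5 * (hi - lo)
        self.x, self.q = x, q
        self.mass = q.sum()                      # monopole coefficient
        self.children = []
        if len(x) > leaf_size:                   # split until boxes are small
            mid = self.center
            left = x < mid
            for m, a, b in ((left, lo, mid), (~left, mid, hi)):
                if m.any():
                    self.children.append(Box(x[m], q[m], a, b, leaf_size))

def potential(box, xi, theta=0.3):
    """Treecode evaluation of -sum_j q_j log|xi - x_j| at a point xi.
    Smaller theta = stricter opening criterion = more accuracy, more work."""
    d = abs(xi - box.center)
    if box.radius < theta * d:                   # well separated: use the box
        return -box.mass * np.log(d)
    if not box.children:                         # leaf: particle-particle sum
        r = np.abs(xi - box.x)
        keep = r > 1e-12                         # skip self-interaction
        return -(box.q[keep] * np.log(r[keep])).sum()
    return sum(potential(c, xi, theta) for c in box.children)
```

Shrinking theta toward zero forces particle-particle evaluation everywhere and recovers the direct sum, which is exactly the dynamic error-control knob the abstract describes.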
Lorena A. Barba (Boston University)  Reproducible results and open source code  Keller Hall 3180 
Abstract: No Abstract  
Alexander A. Beilinson (University of Chicago)  p-adic periods and derived de Rham cohomology 
Abstract: I will show that Fontaine's ring of p-adic periods can be realized as the ring of universal p-adic constants in the sense of derived algebraic geometry, and discuss a possible new construction of the p-adic period map.  
Manjul Bhargava (Princeton University)  The average rank of elliptic curves 
Abstract: No Abstract  
Richard C. Brower (Boston University)  Algorithms for Lattice Field Theory at Extreme Scales 
Abstract: Increases in computational power allow lattice field theories to resolve smaller scales, but to realize the full benefit for scientific discovery, new multiscale algorithms must be developed to maximize efficiency. Examples of new trends in algorithms include adaptive multigrid solvers for the quark propagator and an improved symplectic Force Gradient integrator for the Hamiltonian evolution used to include the quark contribution to vacuum fluctuations in the quantum path integral. Future challenges to algorithms and software infrastructure targeting many-core GPU accelerators and heterogeneous extreme-scale computing are discussed.  
Richard C. Brower (Boston University)  Mixed precision computing  Lind Hall 401 
Abstract: No Abstract  
Cris Cecka (Stanford University)  Application of Assembly of Finite Element Methods on Graphics Processors for Real-Time Elastodynamics 
Abstract: We discuss multiple strategies to perform general computations on unstructured grids using a GPU, with specific application to the assembly of systems of equations in finite element methods (FEMs). For each method, we discuss the GPU hardware's limiting resources, optimizations, key data structures, and the dependence of performance on problem size, element size, and GPU hardware generation. These methods are applied to a nonlinear hyperelastic material model to develop a large-scale real-time interactive elastodynamic visualization. By performing the assembly, solution, update, and visualization stages solely on the GPU, the simulation benefits from speedups in each stage and avoids costly GPU-CPU data transfers.  
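The assembly pattern at the heart of this topic can be sketched in a few lines of NumPy (an illustrative sketch, not the GPU code from the talk): each element contributes a small local matrix that is scatter-added into the global system. The `np.add.at` scatter is precisely the step that, on a GPU, requires atomic operations or an element-coloring scheme to avoid write conflicts between threads.

```python
import numpy as np

def assemble_1d_laplace(n_elems):
    """Assemble the stiffness matrix of the 1D Laplacian with linear
    elements on a uniform mesh of n_elems elements (dense for clarity)."""
    h = 1.0 / n_elems
    # local 2x2 stiffness matrix of a single linear element
    k_local = (1.0 / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])
    # connectivity: element e touches global nodes e and e+1
    conn = np.stack([np.arange(n_elems), np.arange(n_elems) + 1], axis=1)
    K = np.zeros((n_elems + 1, n_elems + 1))
    # scatter-add over all elements at once; this is where a GPU
    # implementation needs atomics or graph coloring
    for a in range(2):
        for b in range(2):
            np.add.at(K, (conn[:, a], conn[:, b]), k_local[a, b])
    return K
```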
Cris Cecka (Stanford University)  Lecture 1: Scientific Computing using Graphics Processors 
Abstract: In this short course, we introduce the GPU as a coprocessor for scientific computing. The course will review modern hardware, CUDA programming, algorithm design, and optimization considerations for this unique compute environment. Introductory example codes and slides will be available to aid attendees in using GPUs to accelerate their applications.  
Cris Cecka (Stanford University)  Lecture 3: Introduction to heterogeneous computing with GPUs 
Abstract: see abstract for Lecture 1  
Cris Cecka (Stanford University)  Lecture 2: Scientific Computing with Graphics Processors 
Abstract: see abstract for Lecture 1  
Jonathan M. Cohen (NVIDIA Corporation)  Thinking parallel: sparse iterative solvers with CUDA 
Abstract: Iterative sparse linear solvers are a critical component of a scientific computing platform. Developing effective preconditioning strategies is the main challenge in developing iterative sparse solvers on massively parallel systems. As computing systems become increasingly power-constrained, memory hierarchies for massively parallel systems will become deeper and more hierarchical. Parallel algorithms with all-to-all communication patterns that assume uniform memory access times will be inefficient on these systems. In this talk, I will outline the challenges of developing good parallel preconditioners, and demonstrate that domain decomposition methods have communication patterns that match emerging parallel platforms. I will present recent work to develop restricted additive Schwarz (RAS) preconditioners as part of the open source 'cusp' library of sparse parallel algorithms. On 2D Poisson problems, a RAS preconditioner is consistently faster than diagonal preconditioning in time-to-solution. Detailed analysis demonstrates that the communication pattern of RAS matches the on-chip bandwidths of a Fermi GPU. Line smoothing, which requires solving a large number of small tridiagonal linear systems in local memory, is another preconditioning approach with similar communication patterns. I will conclude with a roadmap for developing a range of preconditioners, smoothers, and linear solvers on massively parallel hardware based on the domain decomposition and line smoothing approaches.  
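The restricted additive Schwarz idea named in this abstract can be sketched compactly: solve the problem on each overlapping block, but scatter back only the non-overlapping ("owned") part of each local solution, so the overlap is never double-counted. This is a dense NumPy illustration of the general method, not the 'cusp' implementation; the block layout and names are assumptions.

```python
import numpy as np

def ras_apply(A, r, blocks, overlap=1):
    """One application z = M^{-1} r of a one-level restricted additive
    Schwarz preconditioner. `blocks` lists disjoint owned index ranges
    [lo, hi) that together cover all unknowns; each local solve is
    extended by `overlap` indices on both sides."""
    n = len(r)
    z = np.zeros(n)
    for lo, hi in blocks:
        olo, ohi = max(0, lo - overlap), min(n, hi + overlap)
        idx = np.arange(olo, ohi)
        # local subdomain solve on the overlapping index set
        z_loc = np.linalg.solve(A[np.ix_(idx, idx)], r[idx])
        # restricted scatter: keep only the owned part of the solution
        own = (idx >= lo) & (idx < hi)
        z[idx[own]] = z_loc[own]
    return z
```

Used inside a Richardson or Krylov iteration, each block's solve is independent, which is the communication pattern the talk argues matches emerging hardware.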
Clint Dawson (University of Texas at Austin)  Tutorial Lectures: Modeling Hurricane Storm Surges  Lecture 1: Introduction to the shallow water equations 
Abstract: An overview of the two-dimensional, depth-averaged shallow water equations. I will give the underlying assumptions and the derivation from the Navier-Stokes equations, and discuss the relevant forcing terms, including tides, wind and atmospheric pressure, gravity, and bottom friction.  
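For reference, one common conservative form of these equations (H: total water depth; u, v: depth-averaged velocities; g: gravitational acceleration; z_b: bed elevation; c_f: bottom-friction coefficient; F_x, F_y: the remaining forcings such as wind stress, atmospheric pressure gradients, tidal potential, and Coriolis terms; sign and friction conventions vary by author):

```latex
\begin{aligned}
&\partial_t H + \partial_x (Hu) + \partial_y (Hv) = 0,\\
&\partial_t (Hu) + \partial_x\!\left(Hu^2 + \tfrac{1}{2}gH^2\right) + \partial_y (Huv)
  = -gH\,\partial_x z_b - c_f\, u\sqrt{u^2+v^2} + F_x,\\
&\partial_t (Hv) + \partial_x (Huv) + \partial_y\!\left(Hv^2 + \tfrac{1}{2}gH^2\right)
  = -gH\,\partial_y z_b - c_f\, v\sqrt{u^2+v^2} + F_y.
\end{aligned}
```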
Jack J. Dongarra (University of Tennessee)  Architecture-aware Algorithms and Software for Scalable Performance and Resilience on Heterogeneous Architectures 
Abstract: In this talk we examine how high performance computing has changed over the last ten years and look toward the future in terms of trends. These changes have had, and will continue to have, a major impact on our software. Some of the software and algorithm challenges have already been encountered, such as management of communication and memory hierarchies through a combination of compile-time and run-time techniques, but the increased scale of computation, depth of memory hierarchies, range of latencies, and increased run-time environment variability will make these problems much harder. We will look at five areas of research that will have an important impact on the development of software and algorithms. We will focus on the following themes:


Anne C. Elster (Norwegian University of Science and Technology (NTNU))  Real-Time Medical and Geological Processing on GPU-based Systems: Experiences and Challenges 
Abstract: GPUs are now massive floating-point stream processors that offer a source of energy-efficient compute power on our laptops and desktops. Recent development of tools such as CUDA and OpenCL has made it much easier to utilize the computational power these systems offer. However, in order to optimally harness the power of these GPU-based systems, there are still many challenges to overcome. In this talk, I will discuss several issues related to our experiences with medical and geological processing applications that can benefit from real-time processing of data on GPUs. These include real-time medical imaging, e.g. for ultrasound-guided discovery and surgery, real-time seismic CT image enhancement, and using GPUs for real-time compression of seismic data in order to lower I/O latency. The talk will highlight work our research group has been involved in from 2006 through today.  
Anne C. Elster (Norwegian University of Science and Technology (NTNU))  Medical Imaging on the GPU Using OpenCL: 3D Surface Extraction and 3D Ultrasound Reconstruction 
Abstract: Collaborators: Frank Linseth, Holger Ludvigsen, Erik Smistad and Thor Kristian Valgerhaug. GPUs offer a lot of compute power, enabling real-time processing of images. This poster depicts some of our group's recent work on image processing for medical applications on GPUs, including 3D surface extraction using marching cubes and 3D ultrasound reconstruction. We have previously developed Cg and CUDA codes for wavelet transforms and CUDA codes for surface extraction for seismic images.  
Allan Peter Engsig-Karup (Technical University of Denmark)  Development of a new massively parallel tool for nonlinear free surface wave simulation 
Abstract: The research objective of this work is to develop a new dedicated and massively parallel tool for efficient simulation of unsteady nonlinear free surface waves. The tool will be used for applications in coastal and offshore engineering, e.g. in connection with prediction of wave kinematics and forces at or near human-made structures. The tool is based on a unified potential flow formulation which can account for fully nonlinear and dispersive wave motion over uneven depths under the assumptions of non-breaking waves and irrotational, inviscid flow. This work is a continuation of earlier work and will continue to advance the state of the art for efficient wave simulation. The tool is expected to be orders of magnitude faster than current tools due to efficient algorithms and utilization of available hardware resources.  
Allan Peter Engsig-Karup (Technical University of Denmark)  Development of Desktop Computing Applications and Engineering Tools on GPUs 
Abstract: GPULab, a competence center and laboratory for research and collaboration with partners in academia and industry, was established in 2008 at the Section for Scientific Computing, DTU Informatics, Technical University of Denmark. In GPULab we focus on the utilization of graphics processing units (GPUs) for high-performance computing applications and software tools in science and engineering, inverse problems, visualization, imaging, and dynamic optimization. The goals are to contribute to the development of new state-of-the-art mathematical models and algorithms for maximum throughput performance, improved performance-profiling tools, and dissemination of results to the academic and industrial partners in our network. Our approach calls for multidisciplinary skills and understanding of hardware, software development, profiling tools and tuning techniques, and analytical methods for analysis and development of new approaches, together with expert knowledge in specific application areas within science and engineering. We anticipate that our research will in the near future bring new algorithms and insight to engineering and science applications targeting practical engineering problems.  
Jean-Marc Fontaine (Université de Paris XI (Paris-Sud))  Vector bundles and p-adic Galois representations 
Abstract: Let $F$ be a perfect field of characteristic $p>0$ equipped with a non-trivial absolute value, $E$ a non-archimedean locally compact field whose residue field is contained in $F$, and $\pi$ a uniformizing parameter of $E$. We associate functorially to these data a separated, integral, noetherian, regular scheme $X=X_{F,E,\pi}$ of dimension $1$ defined over $E$. There is an equivalence of categories between semistable vector bundles of slope $0$ over $X$ and continuous $E$-linear representations of the absolute Galois group $H_F$ of $F$. When $F$ is algebraically closed, the closed points of $X$ can be described in terms of the Lubin-Tate formal group of $E$ corresponding to $\pi$. If $C$ is the $p$-adic completion of $\overline{\mathbf{Q}}_p$, one can associate to $C$ an algebraically closed field $F=F(C)$ as above, and $\mathrm{Gal}(\overline{\mathbf{Q}}_p/\mathbf{Q}_p)$ acts on the curve $X=X_{F(C),\mathbf{Q}_p,p}$. The two main results of $p$-adic Hodge theory can be recovered from the classification of vector bundles over $X$. (Joint work with Laurent Fargues.) Read more at http://www.math.u-psud.fr/~fargues/Prepublications.html.  
Geoffrey Charles Fox (Indiana University)  Clouds, MapReduce and HPC 
Abstract: 1) We analyze the different tradeoffs and goals of Grid, Cloud and parallel (cluster/supercomputer) computing. 2) They trade off performance, fault tolerance, ease of use (elasticity), cost, and interoperability. 3) Different application classes (characteristics) fit different architectures, and we describe a hybrid model with Grids for data, traditional supercomputers for large-scale simulations, and clouds for broad-based "capacity computing", including many data-intensive problems. 4) We discuss the impressive features of cloud computing platforms and compare MapReduce and MPI. 5) We take most of our examples from the life science area. 6) We conclude with a description of FutureGrid, a TeraGrid system for prototyping new middleware and applications.  
Martin J. Gander (Université de Genève)  A Domain Decomposition Method that Converges in Two Iterations for any Subdomain Decomposition and PDE 
Abstract: Joint work with Felix Kwok. All domain decomposition methods are based on a decomposition of the physical domain into many subdomains and an iteration, which uses subdomain solutions only (and maybe a coarse grid), in order to compute an approximate solution of the problem on the entire domain. We show in this poster that it is possible to formulate such an iteration, based only on subdomain solutions, which converges in two steps to the solution of the underlying problem, independently of the number of subdomains and the PDE solved. This method is mainly of theoretical interest, since it contains sophisticated non-local operators (and a natural coarse-grid component), which need to be approximated in order to obtain a practical method.  
Gaurav Gaurav (University of Minnesota)  Efficient Uncertainty Quantification using GPUs 
Abstract: Joint work with Steven F. Wojtkiewicz (Department of Civil Engineering, University of Minnesota, Minneapolis, MN 55414, USA; bykvich@umn.edu). Graphics processing units (GPUs) have emerged as a far more economical and highly competitive alternative to CPU-based parallel computing. Recent studies have shown that GPUs consistently outperform their best corresponding CPU-based parallel computing equivalents by up to two orders of magnitude in certain applications. Moreover, the portability of GPUs enables even a desktop computer to provide a teraflop (10^12 floating-point operations per second) of computing power. This study presents the gains in computational efficiency obtained using GPU-based implementations of five types of algorithms frequently used in uncertainty quantification problems arising in the analysis of dynamical systems with uncertain parameters and/or inputs.  
Mike Giles (University of Oxford)  OP2: an open-source library for unstructured grid applications 
Abstract: Based on an MPI library written over 10 years ago, OP2 is a new open-source library aimed at application developers using unstructured grids. Using a single API, it targets a variety of backend architectures, including both many-core GPUs and multi-core CPUs with vector units. The talk will cover the API design, key aspects of the parallel implementation on the different platforms, and preliminary performance results on a small but representative CFD test code.  
Leopold Grinberg (Brown University)  Ultra-parallel solvers for multiscale brain blood flow simulations on exascale computers 
Abstract: Solvers for coupled multiscale (multiphysics) problems may be constructed by coupling an array of existing and well-tested parallel numerical solvers, each designed to tackle the problem at a different spatial and temporal scale. Each solver can be optimized and designed for a different computer architecture. Future supercomputers may be composed of heterogeneous processing units, i.e., CPU/GPU. To make efficient use of computational resources, the coupled solvers must support topology-aware mapping of tasks to the processing units where the best parallel efficiency can be achieved. Arterial blood circulation is a multiscale process where time and space scales range from nanoseconds (nanometers) to seconds (meters), respectively. The macrovascular scales describing the flow dynamics in larger vessels are coupled to the mesovascular scales unfolding the dynamics of individual blood cells. The mesovascular events are coupled to the microvascular ones, accounting for blood perfusion, clot formation, adhesion of blood cells to the arterial walls, etc. Besides the multiscale nature of the problem, its size often presents a substantial computational challenge even for simulations considering a single scale. In this talk we will try to envision the design of a multiscale solver for blood flow simulations, tailored to heterogeneous computer architectures.  
Leopold Grinberg (Brown University)  Brain Perfusion: Multiscale Simulations and Visualization 
Abstract: Joint work with J. Insley, M. Papka, and G. E. Karniadakis. Interactions of blood flow in the human brain occur between different scales, determined by flow features in the large arteries (above 0.5 mm diameter), the arterioles, and the capillaries (of 5E-3 mm). To simulate such multiscale flow we develop mathematical models, numerical methods, scalable solvers and visualization tools. Our poster will present NektarG, a research code developed at Brown University for continuum and atomistic simulations. NektarG is based on a high-order spectral/hp element discretization featuring multi-patch domain decomposition for continuum flow simulations, and modified DPD-LAMMPS for mesoscopic simulations. The continuum and atomistic solvers are coupled via a Multilevel Communicating Interface to exchange data required by interface conditions. The visualization software is based on ParaView and NektarG utilities accessed through the ParaView GUI. The new visualization software allows data computed in coupled (multiscale) simulations to be presented simultaneously, and automatically synchronizes the display of the time evolution of solutions at multiple scales.  
Dominik Göddeke (Universität Dortmund), Robert Strzodka (Max-Planck-Institut für Informatik)  Mixed-Precision GPU-Multigrid Solvers with Strong Smoothers 
Abstract: We present efficient fine-grained parallelization techniques for robust multigrid solvers and Krylov subspace schemes, in particular for numerically strong smoothing and preconditioning operators. We apply them to sparse ill-conditioned linear systems of equations that arise from grid-based discretization techniques like finite differences, finite volumes and finite elements; such systems are notoriously hard to solve due to severe anisotropies in the underlying mesh and differential operator. These strong smoothers are characterized by sequential data dependencies and do not parallelize in a straightforward manner. For line-wise preconditioners, exact parallel algorithms exist, and we present a novel, efficient implementation of a cyclic reduction tridiagonal solver. For other preconditioners, traditional wavefront techniques can be applied, but their irregular and limited parallelism makes them a bad match for GPUs. Therefore, we discuss multicoloring techniques to recover parallelism in these preconditioners by decoupling some of the dependencies, at the expense of initially reduced numerical performance. However, by carefully balancing the coupling strength (more colors) with the parallelization benefits, the multicolored variants retain almost all of the sequential numerical performance. Further improvements are achieved by merging the tridiagonal and Gauß-Seidel approaches into a smoothing operator that combines their advantages, and by employing an alternating direction implicit scheme to gain independence from the numbering of the unknowns. Due to their advantageous numerical properties, multigrid solvers equipped with strong smoothers are between four and eight times more efficient than with simple Gauß-Seidel preconditioners, and we achieve speedup factors between six and 18 with the GPU implementations over carefully tuned CPU variants.
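The multicoloring idea in the abstract is easy to illustrate. The sketch below (our illustration, not the authors' code) applies a two-color (red-black) Gauss-Seidel sweep to a 1D Poisson problem: every point of one color depends only on points of the other color, so each half-sweep is a single data-parallel update, which is what makes such smoothers GPU-friendly.

```python
import numpy as np

def redblack_gauss_seidel(b, h, sweeps=2000):
    """Two-color (red-black) Gauss-Seidel for -u'' = b on a 1D grid.

    Points of one color depend only on points of the other color,
    so each half-sweep below is one fully parallel (vectorized) update.
    """
    n = len(b)
    u = np.zeros(n + 2)          # unknowns u[1..n], zero Dirichlet ends
    f = h * h * b
    for _ in range(sweeps):
        # "red" points (odd interior indices), updated simultaneously
        u[1:n+1:2] = 0.5 * (u[0:n:2] + u[2:n+2:2] + f[0::2])
        # "black" points (even interior indices), updated simultaneously
        u[2:n+1:2] = 0.5 * (u[1:n:2] + u[3:n+2:2] + f[1::2])
    return u[1:n+1]
```

For a constant source the second-order scheme is exact, so the iterate converges to u(x) = x(1-x)/2 at the grid points.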
Michael A. Heroux (Sandia National Laboratories)  Emerging Programming and Machine Models: Opportunities for Numerical Algorithms R&D 
Abstract: After 15-20 years of architectural stability, we are in the midst of a dramatic change in the design of high performance computing systems. In this talk we discuss the commonalities across the viable systems of today and look at opportunities for numerical algorithms research and development. In particular, we explore possible programming and machine abstractions and how we can develop effective algorithms based on these abstractions, addressing, among other things, robustness issues for preconditioned iterative methods and the resilience of algorithms in the presence of soft errors.
Michael A. Heroux (Sandia National Laboratories)  Exascale programming models Lind Hall 409 
Abstract: No Abstract  
Elizabeth R. Jessup (University of Colorado)  The Build to Order Compiler for Matrix Algebra Optimization 
Abstract: The performance of many high performance computing applications is limited by data movement from memory to the processor. Often their cost is more accurately expressed in terms of memory traffic rather than floating-point operations, and, to improve performance, data movement must be reduced. One technique to reduce memory traffic is the fusion of loops that access the same data. We have built the Build to Order (BTO) compiler to automate the fusion of loops in matrix algebra kernels. Loop fusion often produces speedups proportional to the reduction in memory traffic, but it can also lead to negative effects in cache and register use. We present the results of experiments with BTO that help us understand the workings of loop fusion.
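As a toy illustration of the idea (ours, not BTO output), consider an AXPY followed by a dot product that consumes its result: the unfused version streams the vectors through memory twice, while the fused loop touches each element exactly once.

```python
import numpy as np

def axpy_dot_unfused(alpha, x, y):
    """Two passes over the data: y = alpha*x + y, then r = x . y."""
    y = alpha * x + y        # pass 1: reads x and y, writes y
    r = float(np.dot(x, y))  # pass 2: reads x and y again
    return y, r

def axpy_dot_fused(alpha, x, y):
    """One pass: each element of x and y is read once, the partial
    dot product is accumulated alongside the AXPY update."""
    r = 0.0
    out = np.empty_like(y)
    for i in range(len(x)):
        out[i] = alpha * x[i] + y[i]
        r += x[i] * out[i]
    return out, r
```

In a compiled language the fused loop roughly halves the memory traffic; in this Python sketch only the equivalence of the results is meant to be demonstrated.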
David E. Keyes (Columbia University)  The Exascale: Why and How 
Abstract: Sustained floating-point computation rates on real applications, as tracked by the ACM Gordon Bell Prize, increased by three orders of magnitude from 1988 (1 Gigaflop/s) to 1998 (1 Teraflop/s), and by another three orders of magnitude to 2008 (1 Petaflop/s). Computer engineering provided only a couple of orders of magnitude of improvement for individual cores over that period; the remaining factor came from concurrency, which is approaching one millionfold. Algorithmic improvements meanwhile contributed to making each flop more valuable scientifically. As the semiconductor industry now slips relative to its own roadmap for silicon-based logic and memory, concurrency, especially on-chip many-core concurrency and GPGPU SIMD-type concurrency, will play an increasing role in the next few orders of magnitude, to arrive at the ambitious target of 1 Exaflop/s, extrapolated for 2018. An important question is whether today's best algorithms are efficiently hosted on such hardware and how much co-design of algorithms and architecture will be required. From the applications perspective, we illustrate eight reasons why today's computational scientists have an insatiable appetite for such performance: resolution, fidelity, dimension, artificial boundaries, parameter inversion, optimal control, uncertainty quantification, and the statistics of ensembles. The paths to the exascale summit are debated, but all are narrow and treacherous, constrained by fundamental laws of physics, cost, power consumption, programmability, and reliability. Drawing on recent reports, workshops, vendor projections, and experiences with scientific codes on contemporary platforms, we propose roles for today's researchers in one of the great global scientific quests of the next decade.
David E. Keyes (Columbia University)  Lecture 1: Implications of the exascale roadmap for algorithms 
Abstract: The central challenge in progressing from petascale to exascale supercomputing is the same as that in progressing from gigascale to terascale personal computing: strong scaling within shared memory on a single node of up to 1K simultaneously active computational threads. Many issues in algorithmic design and implementation are identical in these two simultaneous quests; however, the exascale quest has additional challenges due to practical limits on total power consumption (which come at the expense of resilience and node performance uniformity), to system-scale reliability (due to more points of failure), and to the need to merge the on-node programming environment with a million others (a weak scaling that is not in itself difficult, but will lead to challenges of coordination). This lecture series presents the issues, as digested from recent US Department of Energy roadmapping exercises, and focuses attention on some new issues that require mathematical attention. It is intended to provide those new to exascale computing with a working background for the week ahead, and motivation for the GPU scientific programming unit of the tutorial.
David E. Keyes (Columbia University)  Lecture 2: Implications of the exascale roadmap for algorithms 
Abstract: see abstract for Lecture 1  
Mark Kisin (Harvard University)  Points on Shimura varieties mod p 
Abstract: I will explain some results towards the Langlands-Rapoport conjecture, which predicts the structure of the mod p points of a Shimura variety. A consequence of the conjecture is that the isogeny class of every mod p point contains a point which admits a lifting to a special (i.e. CM) point of the Shimura variety. One of the roots of the subject is the work of John Tate on CM liftings and endomorphisms of abelian varieties mod p.
Andreas Klöckner (New York University)  High-order DG Wave Propagation on GPUs: Infrastructure, Implementation, Method Improvements 
Abstract: Having recently shown that high-order unstructured discontinuous Galerkin (DG) methods are a discretization method for systems of hyperbolic conservation laws that is well-matched to execution on GPUs, in this talk I will explore both core and supporting components of high-order DG solvers for their suitability for and performance on modern, massively parallel architectures. Components examined range from software components facilitating implementation to strategies for automated tuning and, time permitting, numerical tweaks to the method itself. In concluding, I will present a selection of further design considerations and performance data.
Matthew Gregg Knepley (University of Chicago)  GPU programming from higher level representations 
Abstract: We discuss the construction and execution of GPU kernels from higher-level specifications. Examples will be shown using low-order finite elements and the fast multipole method.
Pawel Konieczny (University of Minnesota)  On the dispersive effect of the Coriolis force for the stationary Navier-Stokes equations 
Abstract: The dispersive effect of the Coriolis force for the stationary and non-stationary Navier-Stokes equations is investigated. Existence of a unique stationary solution is shown for arbitrarily large external forces, provided the Coriolis force is large enough. In addition to the stationary case, counterparts of several classical results for the non-stationary Navier-Stokes problem have been proven. The analysis is carried out in a new framework of Fourier-Besov spaces.
Hugo Leclerc (École Normale Supérieure de Cachan)  Global symbolic manipulations and code generation for Finite Elements on SIM[DT] hardware 
Abstract: Tools have been developed to generate code to solve partial differential equations from high-level descriptions (manipulation of files, global operators, ...). The successive symbolic transformations lead to a macroscopic description of the code to be executed, which can then be translated into x86 (SSEx), C++ or CUDA code. The point emphasized here is that the different processes can be adapted to the target hardware, taking into account the ratio of gflops to gbps (making, e.g., the choice between recomputation and caching), the SIM[DT] abilities, etc. The poster will present the gains (compared to classical CPU/GPU implementations) for two implementations of a 3D unstructured FEM solver, using respectively a conjugate gradient method and a domain decomposition method with repetitive patterns.
Miriam Leeser (Northeastern University)  The Challenges of Writing Portable, Correct and High Performance Libraries for GPUs or How to Avoid the Heroics of GPU Programming 
Abstract: We live in the age of heroic programming for scientific applications on Graphics Processing Units (GPUs). Typically a scientist chooses an application to accelerate and a target platform, and through great effort maps their application to that platform. If they are a true hero, they achieve two or three orders of magnitude speedup for that application and target hardware pair. The effort required includes a deep understanding of the application, its implementation and the target architecture. When a new, higher performance architecture becomes available additional heroic acts are required. There is another group of scientists who prefer to spend their time focused on the application level rather than lower levels. These scientists would like to use GPUs for their applications, but would prefer to have parameterized library components available that deliver high performance without requiring heroic efforts on their part. The library components should be easy to use and should support a wide range of user input parameters. They should exhibit good performance on a range of different GPU platforms, including future architectures. Our research focuses on creating such libraries. We have been investigating parameterized library components for use with Matlab/Simulink and with the SCIRun Biomedical Problem Solving Environment from the University of Utah. In this talk I will discuss our library development efforts and challenges to achieving high performance across a range of both application and architectural parameters. I will also focus on issues that arise in achieving correct behavior of GPU kernels. One issue is correct behavior with respect to thread synchronization. Another is knowing whether or not your scientific application that uses floating point is correct when the results differ depending on the target architecture and order of computation.  
Miriam Leeser (Northeastern University)  GPU Acceleration in a Modern Problem Solving Environment: SCIRun's Linear System Solvers 
Abstract: This research demonstrates the incorporation of the GPU's parallel processing architecture into the SCIRun biomedical problem solving environment with minimal changes to the environment or user experience. SCIRun, developed at the University of Utah, allows scientists to interactively construct many different types of biomedical simulations. We use this environment to demonstrate the effectiveness of the GPU by accelerating time-consuming algorithms present in these simulations. Specifically, we target the linear solver module, which contains multiple solvers that benefit from GPU hardware. We have created a class to accelerate the conjugate gradient, Jacobi and minimal residual linear solvers; the results demonstrate that the GPU can provide acceleration in this environment. A principal focus was to remain transparent by retaining the user-friendly experience for scientists using SCIRun's graphical user interface. NVIDIA's CUDA C language is used to enable performance on NVIDIA GPUs. Challenges include manipulating the sparse data processed by these algorithms and communicating with the SCIRun interface amidst computation. Our solution makes it possible to implement GPU versions of the existing SCIRun algorithms easily and can be applied to other parallel algorithms in the application. The GPU executes the matrix and vector arithmetic to achieve speedups of up to 16x on these algorithms in comparison to SCIRun's existing multithreaded CPU implementation. The source code will contain single and double precision versions to utilize a wide variety of GPU hardware, and will be incorporated and made publicly available in future versions of SCIRun.
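For reference, a minimal dense-matrix conjugate gradient in plain Python (a sketch of the textbook algorithm, not SCIRun's implementation) shows the operations such a solver module offloads to the GPU: one matrix-vector product plus a handful of dot products and AXPY updates per iteration.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Textbook conjugate gradient for a symmetric positive definite A.

    The per-iteration work -- one matrix-vector product, two dot
    products, three AXPYs -- is exactly the vector arithmetic that
    maps well onto GPU hardware.
    """
    x = np.zeros_like(b)
    r = b - A @ x            # initial residual
    p = r.copy()             # initial search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```

In a production setting A would be stored in a sparse format and the `A @ p` product would dominate the cost.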
David Mayhew (Advanced Micro Devices)  I See GPU Shapes in the Clouds 
Abstract: Fusion (the integration of CPU and GPU into a single processing entity) is here. Cloud-based software services are here. Large processing clusters are running massively parallel Hadoop programs now. Can large-scale, commercial, enterprise server solutions be dynamically repurposed to run HPC problem sets? The future of HPC may well be a massive set of virtual machines running in "curve of the earth"-sized data centers. The cost of HPC processing sponges (HPC problem sets that consume otherwise wasted processing cycles in scale-out server clusters) will probably make all but the most extreme purpose-built HPC systems obsolete.
Dan Negrut (University of Wisconsin-Madison)  Large Scale Frictional Contact Dynamics on the GPU 
Abstract: This talk summarizes an effort at the Modeling, Simulation and Visualization Center at the University of Wisconsin-Madison to model and simulate large-scale discrete dynamics problems. This effort is motivated by a desire to address unsolved challenges posed by granular dynamics problems, the mobility of tracked and wheeled vehicles on granular terrain, and digging into granular material, to name a few. In the context of simulating the dynamics of large systems of interacting rigid bodies, we briefly outline a method for solving large cone complementarity problems by means of a fixed-point iteration algorithm. The method is an extension of the Gauss-Jacobi algorithm with over-relaxation for symmetric convex complementarity problems. Convergent under fairly standard assumptions, the method is implemented in a scalable parallel computational framework using the single instruction multiple data (SIMD) execution paradigm supported by the Compute Unified Device Architecture (CUDA) library for programming on the graphics processing unit (GPU). The simulation framework developed supports the analysis of problems with more than one million rigid bodies that interact through contact and friction forces, and whose dynamics are constrained by either unilateral or bilateral kinematic constraints. Simulation thus becomes a viable tool for investigating in the near future the dynamics of complex systems such as the Mars Rover operating on granular terrain, powder composites, and granular material flow. The talk concludes with a short summary of other applications that stand to benefit from the computational power available on today's GPUs.
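In the frictionless special case the cone complementarity problem reduces to a linear complementarity problem (LCP), and the fixed-point scheme reduces to a projected Jacobi iteration with under-relaxation. The sketch below is an illustrative simplification under that assumption, not the authors' solver; note that every component update is independent, which matches the SIMD execution model the abstract describes.

```python
import numpy as np

def projected_jacobi_lcp(M, q, omega=0.5, iters=500):
    """Projected Jacobi with under-relaxation for the LCP
        w = M z + q,  z >= 0,  w >= 0,  z . w = 0
    with M symmetric positive definite.  All components of z are
    updated simultaneously, so the sweep is embarrassingly parallel.
    """
    d = np.diag(M)                       # diagonal preconditioner
    z = np.zeros_like(q)
    for _ in range(iters):
        # Jacobi step followed by projection onto the nonnegative cone
        z = np.maximum(0.0, z - omega * (M @ z + q) / d)
    return z
```

For the friction cones of the actual method the `np.maximum` projection is replaced by a per-contact projection onto the cone, but the structure of the sweep is the same.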
Carl Pomerance (Dartmouth College)  Elliptic curves: problems and applications 
Abstract: In the past three decades there have been some exciting applications of elliptic curves over finite fields to integer factoring, primality testing, and cryptography. These applications in turn have raised some interesting problems often of an unconventional flavor. For example, how often is the order of an elliptic curve group prime, or how often does it have all small prime factors? In this talk we will visit problems such as these, as well as other analytic-type problems relating to ranks of elliptic curves over function fields and to elliptic divisibility sequences.
Bjorn Poonen (Massachusetts Institute of Technology)  Random maximal isotropic subspaces and Selmer groups 
Abstract: We show that the p-Selmer group of an elliptic curve is naturally the intersection of two maximal isotropic subspaces in an infinite-dimensional locally compact quadratic space over F_p. By modeling this intersection as the intersection of a random maximal isotropic subspace with a fixed compact open maximal isotropic subspace, we can explain the known phenomena regarding the distribution of Selmer ranks, such as the theorems of Heath-Brown, Swinnerton-Dyer, and Kane for 2-Selmer groups in certain families of quadratic twists, and the average size of 2- and 3-Selmer groups as computed by Bhargava and Shankar. The only distribution on Mordell-Weil ranks compatible with both our random model and Delaunay's heuristics for Sha[p] is the distribution in which 50% of elliptic curves have rank 0, and 50% have rank 1. We generalize many of our results to abelian varieties over global fields. This is joint work with Eric Rains.
Cristian D Popescu (University of California, San Diego)  An equivariant main conjecture in Iwasawa theory and applications 
Abstract: I will discuss the statement and proof of an Equivariant Main Conjecture (EMC) in the Iwasawa theory of arbitrary global fields. This will be followed by applications of the EMC (via Iwasawa codescent) towards proving various well-known conjectures on special values of global L-functions. In the process, an important role will be played by an explicit construction of ℓ-adic Tate sequences. This is based on joint work with Cornelius Greither (Munich).
Michel Raynaud (Université de Paris XI (Paris-Sud))  Permanence following Temkin 
Abstract: If we specialize algebraic equations having good properties, we usually face degeneracies. Starting with a bad specialization, we can try to improve it, performing modifications under control. If we succeed in getting a new specialization with the initial good properties preserved, we get a permanence statement. We shall present examples of permanence, with particular interest in semistable models.
Fernando Rodriguez Villegas (University of Texas at Austin)  On the geometry of character varieties 
Abstract: We know, thanks to the Weil conjectures, that counting points of varieties over finite fields yields purely topological information about them. In this talk I will first describe how we may count the number of points over finite fields on the character varieties parameterizing certain representations of the fundamental group of a Riemann surface into GL_n. The calculation involves an array of techniques from combinatorics to the representation theory of finite groups of Lie type. I will then discuss the geometric implications of this computation and the conjectures it has led to. This is joint work with T. Hausel and E. Letellier.
Karl Rubin (University of California, Irvine)  Selmer ranks of elliptic curves in families of quadratic twists 
Abstract: In joint work with Barry Mazur, we investigate the 2-Selmer rank in families of quadratic twists of elliptic curves over arbitrary number fields. We give sufficient conditions for an elliptic curve to have twists of arbitrary 2-Selmer rank, and we give lower bounds for the number of twists (with bounded conductor) with a given 2-Selmer rank. As a consequence, under appropriate hypotheses there are many twists with Mordell-Weil rank zero, and (assuming the Shafarevich-Tate conjecture) many others with Mordell-Weil rank one. Another application of our methods, using ideas of Poonen and Shlapentokh, is that if the Shafarevich-Tate conjecture holds then Hilbert's 10th problem has a negative answer over the ring of integers of any number field.
Nayda G. Santiago (University of Puerto Rico)  Hyperspectral Image Analysis for Abundance Estimation using GPUs 
Abstract: Hyperspectral images can be used for abundance estimation and anomaly detection; however, the algorithms involved tend to be I/O intensive. Parallelizing these algorithms can enable their use in real-time applications. A method of overcoming these limitations involves selecting parallelizable algorithms and implementing them using GPUs. GPUs are designed as throughput engines, built to process large amounts of dense data in a parallel fashion. The RX detector and abundance estimators will be parallelized and tested for correctness and performance.
Fadil Santosa (University of Minnesota)  Welcome to the IMA 
Abstract: No Abstract  
Olaf Schenk (Universität Basel)  A Code Generation and Autotuning Framework For Parallel Iterative Stencil Computations on Modern Microarchitectures 
Abstract: Stencil calculations comprise an important class of kernels in many scientific computing applications, ranging from simple PDE solvers to constituent kernels in multigrid methods as well as image processing applications. In such solvers, stencil kernels are often the dominant part of the computation, and an efficient parallel implementation of the kernel is therefore crucial in order to reduce the time to solution. However, on current complex hardware microarchitectures, meticulous architecture-specific tuning is required to elicit the machine's full compute power. We present PATUS, a code generation and autotuning framework for stencil computations targeted at multi- and many-core processors, such as multicore CPUs and graphics processing units, which makes it possible to generate compute kernels from a specification of the stencil operation and a parallelization and optimization strategy, and leverages the autotuning methodology to optimize strategy-dependent parameters for the given hardware architecture.
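A stencil specification itself is tiny; what such a framework automates is the machinery around it. A plain NumPy Jacobi sweep of the classic 5-point Laplacian stencil (our illustrative example, not PATUS syntax) shows the operation that would then be blocked, vectorized and tuned per architecture:

```python
import numpy as np

def jacobi_5pt(u0, f, h, sweeps):
    """Repeated Jacobi sweeps of the 5-point Laplacian stencil for
    the Poisson problem -Lap(u) = f with fixed boundary values.

    NumPy evaluates the whole right-hand side before assigning, so
    each slice assignment is one correct Jacobi sweep over the grid.
    """
    u = u0.copy()
    for _ in range(sweeps):
        u[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1]
                                + u[1:-1, :-2] + u[1:-1, 2:]
                                + h * h * f[1:-1, 1:-1])
    return u
```

The tuning problem PATUS addresses is how to traverse this index space (tiling, unrolling, thread mapping) on a given machine without changing the arithmetic.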
Bertil Schmidt (Nanyang Technological University)  Algorithms and Tools for Bioinformatics on GPUs 
Abstract: The enormous growth of biological sequence data has caused bioinformatics to rapidly move towards a data-intensive, computational science. As a result, the computational power needed by bioinformatics applications is growing rapidly as well. The recent emergence of parallel accelerator technologies such as GPUs has made it possible to significantly reduce the execution times of many bioinformatics applications. In this talk I will present the design and implementation of scalable GPU algorithms based on the CUDA programming model in order to accelerate important bioinformatics applications. In particular, I will focus on algorithms and tools for next-generation sequencing (NGS), using error correction as an example. Detection and correction of sequencing errors is an important but time-consuming preprocessing step for de novo genome assembly or read mapping. In this talk, I discuss the parallel algorithm design used for the CUDA-EC and DecGPU tools. I will also give an overview of other CUDA-enabled tools developed by my research group.
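The spectrum-based error-detection idea that such tools build on can be stated in a few lines (an illustrative sketch in plain Python, not the tools' CUDA pipeline): count all k-mers across the reads and flag those whose coverage falls below a cutoff as likely sequencing errors. Both the counting and the filtering are embarrassingly parallel, which is what makes the approach a good GPU fit.

```python
from collections import Counter

def weak_kmers(reads, k=5, cutoff=2):
    """Return the set of k-mers seen fewer than `cutoff` times.

    In spectrum-based error correction, a k-mer that appears only
    rarely across high-coverage reads is presumed to contain a
    sequencing error; correction then edits reads so that all their
    k-mers become "solid" (frequent).
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i+k]] += 1
    return {kmer for kmer, c in counts.items() if c < cutoff}
```

Here a single substitution at the end of one read out of three produces exactly one low-coverage k-mer.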
Mark J. Stock (Applied Scientific Research)  A GPU-accelerated Boundary Element Method and Vortex Particle Method 
Abstract: Vortex particle methods, when combined with multipole-accelerated boundary element methods (BEM), become a complete tool for direct numerical simulation (DNS) of internal or external vortex-dominated flows. In previous work, we presented a method to accelerate the vorticity-velocity inversion at the heart of vortex particle methods by performing a multipole treecode N-body method on parallel graphics hardware. The resulting method achieved a 17-fold speedup over a dual-core CPU implementation. In the present work, we will demonstrate both an improved algorithm for the GPU vortex particle method that outperforms an 8-core CPU by a factor of 43, and a GPU-accelerated multipole treecode method for the boundary element solution. The new BEM solves for the unknown source, dipole, or combined strengths over a triangulated surface using all available CPU cores and GPUs. Problems with up to 1.4 million unknowns can be solved on a single commodity desktop computer in one minute, and at that size the hybrid CPU/GPU implementation outperforms a quad-core CPU alone by 22.5 times. The method is exercised on DNS of impulsively-started flow over spheres at Re=500, 1000, 2000, and 4000.
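The acceleration in both solvers rests on the same treecode idea: a well-separated cluster of sources is replaced by a low-order multipole expansion. A minimal sketch of that idea (monopole term only, for a 1/r kernel; illustrative, not the paper's vortex kernel):

```python
import numpy as np

def direct_potential(targets, sources, charges):
    """O(N*M) direct sum of 1/r potentials -- the cost a treecode avoids."""
    phi = np.zeros(len(targets))
    for y, q in zip(sources, charges):
        phi += q / np.linalg.norm(targets - y, axis=1)
    return phi

def monopole_potential(targets, sources, charges):
    """Far-field approximation: the whole cluster is collapsed to a
    single monopole at its charge-weighted center.  Treecodes and the
    FMM apply this (with higher-order terms) recursively to every
    cell that is well separated from the evaluation point."""
    Q = charges.sum()
    center = (charges[:, None] * sources).sum(axis=0) / Q
    return Q / np.linalg.norm(targets - center, axis=1)
```

For a cluster of diameter a evaluated at distance d, the relative error of the truncated expansion scales like a power of a/d, so distant cells can be approximated very cheaply.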
Mark J. Stock (Applied Scientific Research)  Algorithmic Fluid Art – Influences, Process, and Works 
Abstract: In addition to my research into vortex particle methods, parallel N-body methods, and GPU programming, I create artwork using these same computer programs. The work consists of imagery and animations of fluid forms and other shapes and patterns in nature. Using relatively simple algorithms reflecting the origins of their underlying processes, many of these patterns can be recreated and their inherent beauty exposed. In this talk, I will discuss the technical aspects of my work, but mainly plan to distract attention with the works themselves. Biography: Mark Stock earned his PhD in Aerospace Engineering from the University of Michigan in 2006, and has been working for Applied Scientific Research in Santa Ana, CA since then. He has been creating computer imagery and numerical simulations for over 25 years, and started exhibiting his artwork in 2001.
Robert Strzodka (Max-Planck-Institut für Informatik)  Everyday Parallelism 
Abstract: Parallelism is largely seen as a necessary evil to cope with the power restrictions on a chip, and most programmers would prefer to continue writing sequential programs rather than deal with alien and error-prone parallel programming. This talk will question this view and point out how the allegedly unfamiliar parallel processing is utilized by millions of people every day. Parallelism appears as a curse only when looking at it from the crooked illusion of sequential processing. Admittedly, there are critical decisions associated with specialization, data movement or synchronization, but we also have lots of experience in making them, because they are made every day. Presented results will demonstrate that the analogies drawn are not just theoretical.
Keita Teranishi (CRAY Inc)  Locally Self-Consistent Multiple-Scattering code (LSMS) for GPUs 
Abstract: The Locally Self-Consistent Multiple-Scattering (LSMS) code is one of the major petascale applications and is highly tuned for supercomputer systems like the Cray XT5 Jaguar. We present our recent effort on porting and tuning the major computational routine of LSMS to GPU-based systems to demonstrate the feasibility of LSMS beyond the petaflop scale. In particular, we discuss the techniques used, including autotuning of dense matrix kernels and computation-communication overlap.
Jonas Tölke (Ingrain)  Lattice Boltzmann Multi-Phase Simulations in Porous Media using GPUs 
Abstract: We present a very efficient implementation of a multiphase lattice Boltzmann method (LBM) based on CUDA. This technology delivers significant benefits for predictions of properties in rocks. The simulator on NVIDIA hardware enables us to perform pore-scale multiphase (oil-water-matrix) simulations in natural porous media and to predict important rock properties like absolute permeability, relative permeabilities, and capillary pressure. We will show videos of these simulations in complex real-world porous media and rocks.
Jonas Tölke (Ingrain)  Digital rocks physics: fluid flow in rocks 
Abstract: We show how Ingrain's digital rock physics technology works to predict fluid flow properties in rocks. NVIDIA CUDA technology delivers significant acceleration for this technology. The simulator on NVIDIA hardware enables us to perform pore-scale multiphase (oil-water-matrix) simulations in natural porous media and to predict important rock properties like absolute permeability, relative permeabilities, and capillary pressure.
Ulrike Meier Yang (Lawrence Livermore National Laboratory)  Preparing Algebraic Multigrid for Exascale 
Abstract: Algebraic Multigrid (AMG) solvers are an essential component of many large-scale scientific simulation codes. Their continued numerical scalability and efficient implementation is critical for preparing these codes for exascale. Our experiences on modern multicore machines show that significant challenges must be addressed for AMG to perform well on such machines. We discuss our experiences and describe the techniques we have used to overcome scalability challenges for AMG on hybrid architectures in preparation for exascale.
Rio Yokota (Boston University)  Fast Multipole Methods on large clusters of GPUs 
Abstract: The combination of algorithmic acceleration and hardware acceleration can have a tremendous impact. The FMM is a fast algorithm for calculating matrix-vector multiplications in O(N) time, and it runs very fast on GPUs. Its combination of a high degree of parallelism and O(N) complexity makes it an attractive solver for the Petascale and Exascale era. It has a wide range of applications, e.g. quantum mechanics, molecular dynamics, electrostatics, acoustics, structural mechanics, fluid mechanics, and astrophysics. 
Adebisi Agboola  University of California, Santa Barbara  1/2/2011  1/6/2011 
Douglas N. Arnold  University of Minnesota  9/1/2010  6/30/2011 
Gerard Michel Awanou  Northern Illinois University  9/1/2010  6/10/2011 
Hasan Babaei  Auburn University  1/9/2011  1/14/2011 
Matthew Baker  Georgia Institute of Technology  1/3/2011  1/5/2011 
Nusret Balci  University of Minnesota  9/1/2009  8/31/2011 
Lorena A. Barba  Boston University  1/9/2011  1/15/2011 
Hyman Bass  University of Michigan  1/2/2011  1/5/2011 
Alexander A. Beilinson  University of Chicago  1/2/2011  1/5/2011 
Jonathan Bentz  CRAY Inc  1/10/2011  1/14/2011 
John Ferderick Bergdall  Brandeis University  1/3/2011  1/5/2011 
Laurent Berger  École Normale Supérieure de Lyon  1/1/2011  1/6/2011 
Vladimir Berkovich  Weizmann Institute of Science  1/2/2011  1/6/2011 
Manjul Bhargava  Princeton University  1/2/2011  1/5/2011 
Alexander Borisov  University of Pittsburgh  1/2/2011  1/6/2011 
Nigel Boston  University of Wisconsin-Madison  1/3/2011  1/5/2011 
Susanne C. Brenner  Louisiana State University  9/1/2010  6/10/2011 
Richard C. Brower  Boston University  1/9/2011  1/13/2011 
Armand Brumer  Fordham University  1/2/2011  1/5/2011 
Joe Buhler  Center for Communications Research  1/2/2011  1/5/2011 
Gregory Scott Call  Amherst College  1/2/2011  1/5/2011 
Cris Cecka  Stanford University  1/8/2011  1/14/2011 
Aycil Cesmelioglu  University of Minnesota  9/30/2010  8/30/2011 
Byungchul Cha  Muhlenberg College  1/2/2011  1/5/2011 
Chi Hin Chan  University of Minnesota  9/1/2009  8/31/2011 
Jung Hee Cheon  Seoul National University  1/3/2011  1/5/2011 
Ionut Ciocan-Fontanine  University of Minnesota  1/3/2011  1/5/2011 
Dustin Clausen  Massachusetts Institute of Technology  1/2/2011  1/5/2011 
Bernardo Cockburn  University of Minnesota  9/1/2010  6/30/2011 
Jonathan M. Cohen  NVIDIA Corporation  1/9/2011  1/14/2011 
Jintao Cui  University of Minnesota  8/31/2010  8/30/2011 
Eric Felix Darve  Stanford University  1/8/2011  1/9/2011 
Clint Dawson  University of Texas at Austin  1/30/2011  2/5/2011 
Jack J. Dongarra  University of Tennessee  1/9/2011  1/11/2011 
Geir Ellingsrud  University of Oslo  1/2/2011  1/6/2011 
Anne C. Elster  Norwegian University of Science and Technology (NTNU)  1/9/2011  1/15/2011 
Allan Peter Engsig-Karup  Technical University of Denmark  1/9/2011  1/14/2011 
Carl Erickson  Harvard University  1/2/2011  1/5/2011 
Selim Esedoglu  University of Michigan  1/20/2011  6/10/2011 
Randy H. Ewoldt  University of Minnesota  9/1/2009  8/31/2011 
Liwu Fan  Auburn University  1/9/2011  1/14/2011 
Oscar E. Fernandez  University of Minnesota  8/31/2010  8/30/2011 
Daniel Flath  Macalester College  1/3/2011  1/5/2011 
Jean-Marc Fontaine  Université de Paris XI (Paris-Sud)  1/2/2011  1/8/2011 
Geoffrey Charles Fox  Indiana University  1/12/2011  1/14/2011 
Martin J. Gander  Université de Genève  1/9/2011  1/15/2011 
Paul Garrett  University of Minnesota  1/3/2011  1/5/2011 
Gaurav Gaurav  University of Minnesota  1/9/2011  1/14/2011 
Toby Gee  Northwestern University  1/3/2011  1/5/2011 
Mike Giles  University of Oxford  1/8/2011  1/14/2011 
Dominik Göddeke  Universität Dortmund  1/8/2011  1/15/2011 
Edray Herber Goins  Purdue University  1/3/2011  1/5/2011 
Jay Gopalakrishnan  University of Florida  9/1/2010  6/30/2011 
Vincent John Graziano  CRAY Inc  1/9/2011  1/9/2011 
Leopold Grinberg  Brown University  1/10/2011  1/15/2011 
Bobby Grizzard  University of Texas at Austin  1/3/2011  1/5/2011 
Benedict Gross  Harvard University  1/2/2011  1/5/2011 
Shiyuan Gu  Louisiana State University  9/1/2010  6/30/2011 
Joseph Gunther  University of Texas at Austin  1/2/2011  1/5/2011 
Ren Guo  University of Minnesota  1/3/2011  1/5/2011 
Thomas C. Hales  University of Pittsburgh  1/2/2011  1/5/2011 
Michael A. Heroux  Sandia National Laboratories  1/9/2011  1/14/2011 
Wei Ho  Columbia University  1/2/2011  1/5/2011 
Yulia Hristova  University of Minnesota  9/1/2010  8/31/2011 
Luc Illusie  Université de Paris XI (Paris-Sud)  1/2/2011  1/6/2011 
Elizabeth R. Jessup  University of Colorado  1/8/2011  1/14/2011 
Dihua Jiang  University of Minnesota  1/3/2011  1/5/2011 
Jennifer Johnson-Leung  University of Idaho  1/2/2011  1/5/2011 
John Jones  Arizona State University  1/2/2011  1/6/2011 
Nick Katz  Princeton University  1/2/2011  1/5/2011 
Dinesh Kaushik  Argonne National Laboratory  1/11/2011  1/14/2011 
Markus Keel  University of Minnesota  7/21/2008  6/30/2011 
David E. Keyes  Columbia University  1/8/2011  1/13/2011 
Mark Kisin  Harvard University  1/2/2011  1/5/2011 
Andreas Klöckner  New York University  1/9/2011  1/14/2011 
Matthew Gregg Knepley  University of Chicago  1/9/2011  1/15/2011 
Pawel Konieczny  University of Minnesota  9/1/2009  8/31/2011 
Kenneth Kramer  Queens College, CUNY  1/2/2011  1/5/2011 
Hugo Leclerc  École Normale Supérieure de Cachan  1/9/2011  1/14/2011 
Miriam Leeser  Northeastern University  1/9/2011  1/14/2011 
Gilad Lerman  University of Minnesota  9/1/2010  6/30/2011 
Hengguang Li  University of Minnesota  8/16/2010  8/15/2011 
Lizao (Larry) Li  University of Minnesota  1/3/2011  1/5/2011 
Peng Li  University of Minnesota  1/9/2011  1/9/2011 
David J. Lilja  University of Minnesota  1/10/2011  1/14/2011 
Zhi (George) Lin  University of Minnesota  9/1/2009  8/31/2011 
Baiying Liu  University of Minnesota  1/3/2011  1/5/2011 
Jonathan Lubin  Brown University  1/1/2011  1/6/2011 
Benjamin E Lundell  Cornell University  1/3/2011  1/6/2011 
Mitchell Luskin  University of Minnesota  9/1/2010  6/30/2011 
Chris Lyons  University of Michigan  1/2/2011  1/4/2011 
Kara Lee Maki  University of Minnesota  9/1/2009  8/31/2011 
Yu (David) Mao  University of Minnesota  8/31/2010  8/30/2011 
David Mayhew  Advanced Micro Devices  1/14/2011  1/14/2011 
William McCallum  University of Arizona  1/3/2011  1/5/2011 
Lois Curfman McInnes  Argonne National Laboratory  1/10/2011  1/13/2011 
William Messing  University of Minnesota  1/3/2011  1/5/2011 
Ina Mette  American Mathematical Society  1/2/2011  1/5/2011 
Irina Mitrea  University of Minnesota  8/16/2010  6/14/2011 
Dimitrios Mitsotakis  University of Minnesota  10/27/2010  8/31/2011 
Kevin Mugo  Purdue University  1/3/2011  1/5/2011 
Gregg Musiker  University of Minnesota  1/3/2011  1/5/2011 
Dan Negrut  University of Wisconsin-Madison  1/9/2011  1/14/2011 
Nicholas Switala  University of Minnesota  1/3/2011  1/5/2011 
Sylvain Nintcheu Fata  Oak Ridge National Laboratory  11/1/2010  1/29/2011 
Andrew Odlyzko  University of Minnesota  1/3/2011  1/5/2011 
Peter J. Olver  University of Minnesota  1/3/2011  1/5/2011 
Alexandra Ortan  University of Minnesota  9/16/2010  6/15/2011 
Cecilia Ortiz-Duenas  University of Minnesota  9/1/2009  8/31/2011 
Miguel Pauletti  Texas A & M University  1/8/2011  1/14/2011 
Carl Pomerance  Dartmouth College  1/2/2011  1/5/2011 
Bjorn Poonen  Massachusetts Institute of Technology  1/2/2011  1/5/2011 
Cristian D Popescu  University of California, San Diego  1/2/2011  1/5/2011 
Weifeng (Frederick) Qiu  University of Minnesota  8/31/2010  8/30/2011 
Vincent Quenneville-Belair  University of Minnesota  9/16/2010  6/15/2011 
Varun Ramesh  University of Minnesota  1/9/2011  1/14/2011 
Wayne Raskind  Arizona State University  1/2/2011  1/5/2011 
Michel Raynaud  Université de Paris XI (Paris-Sud)  1/2/2011  1/6/2011 
Fernando Reitich  University of Minnesota  9/1/2010  6/30/2011 
Kenneth A. Ribet  University of California, Berkeley  1/2/2011  1/5/2011 
Eric Riedl  Harvard University  1/2/2011  1/6/2011 
David Peter Roberts  University of Minnesota  1/2/2011  1/5/2011 
Fernando Rodriguez Villegas  University of Texas at Austin  1/2/2011  1/5/2011 
Michael I. Rosen  Brown University  1/2/2011  1/5/2011 
Jeffrey A. Rosoff  Gustavus Adolphus College  1/2/2011  1/5/2011 
Karl Rubin  University of California, Irvine  1/2/2011  1/5/2011 
Hakizumwami Birali Runesha  University of Minnesota  1/9/2011  1/14/2011 
Yousef Saad  University of Minnesota  1/10/2011  1/14/2011 
David J Saltman  Institute for Defense Analyses (IDA)  1/2/2011  1/5/2011 
Nayda G. Santiago  University of Puerto Rico  1/9/2011  1/15/2011 
Fadil Santosa  University of Minnesota  7/1/2008  6/30/2011 
Olaf Schenk  Universität Basel  1/8/2011  1/15/2011 
Bertil Schmidt  Nanyang Technological University  1/9/2011  1/15/2011 
Anthony Scudiero  CRAY Inc  1/10/2011  1/14/2011 
Shankar Sen  Cornell University  1/2/2011  1/5/2011 
Chehrzad Shakiban  University of Minnesota  1/3/2011  1/5/2011 
Shuanglin Shao  University of Minnesota  9/1/2009  8/31/2011 
Stephen S. Shatz  University of Pennsylvania  1/2/2011  1/5/2011 
Alice Silverberg  University of California, Irvine  1/2/2011  1/5/2011 
Ethan Smith  Michigan Technological University  1/2/2011  1/5/2011 
Steven Sperber  University of Minnesota  1/3/2011  1/5/2011 
Harold M. Stark  University of California, San Diego  1/2/2011  1/5/2011 
William A. Stein  University of Washington  1/2/2011  1/5/2011 
Panagiotis Stinis  University of Minnesota  9/1/2010  6/30/2011 
Mark J. Stock  Applied Scientific Research  1/8/2011  1/14/2011 
Michael Stopa  Harvard University  1/9/2011  1/14/2011 
Allan Struthers  Michigan Technological University  1/9/2011  1/15/2011 
Robert Strzodka  Max-Planck-Institut für Informatik  1/8/2011  1/14/2011 
Liyeng Sung  Louisiana State University  9/1/2010  6/15/2011 
Habiballah Talavatifard  Texas A & M University  1/8/2011  1/14/2011 
Nicolae Tarfulea  Purdue University, Calumet  9/1/2010  6/15/2011 
John Tate  University of Texas at Austin  1/1/2011  1/9/2011 
Jeremy Teitelbaum  University of Connecticut  1/2/2011  1/5/2011 
Keita Teranishi  CRAY Inc  1/9/2011  1/14/2011 
Dinesh S. Thakur  University of Arizona  1/2/2011  1/5/2011 
Jonas Tölke  Ingrain  1/9/2011  1/14/2011 
Dimitar Trenev  University of Minnesota  9/1/2009  8/31/2011 
Zohra Tridane  University of Minnesota  1/10/2011  1/14/2011 
Jerrold Tunnell  Rutgers University  1/2/2011  1/5/2011 
Douglas Ulmer  Georgia Institute of Technology  1/3/2011  1/5/2011 
Jeffrey Vaaler  University of Texas at Austin  1/2/2011  1/5/2011 
Marie-France Vignéras  Institut de Mathématiques de Jussieu  1/1/2011  1/5/2011 
Vasily Volkov  University of California, Berkeley  1/9/2011  1/14/2011 
Jose Felipe Voloch  University of Texas at Austin  1/2/2011  1/6/2011 
Jamie Emmanuel Weigandt  Purdue University  1/3/2011  1/6/2011 
Melanie Matchett Wood  Stanford University  1/2/2011  1/5/2011 
Ulrike Meier Yang  Lawrence Livermore National Laboratory  1/10/2011  1/14/2011 
Yimu Yin  University of Pittsburgh  1/2/2011  1/5/2011 
Rio Yokota  Boston University  1/11/2011  1/15/2011 
Qing Chaney Zhang  Ohio State University  1/2/2011  1/6/2011 
Shuxia Zhang  University of Minnesota  1/10/2011  1/14/2011 
Xudong Zheng  University of Illinois  1/2/2011  1/5/2011 