Jaekyu Lee, Nagesh B. Lakshminarayana, Hyesoon Kim, and Richard Vuduc.
Hardware and software prefetching mechanisms for GPGPU
applications.
In Proc. IEEE/ACM Int'l. Symp. Microarchitecture (MICRO),
Atlanta, GA, USA, December 2010.
(accepted).
→ BibTeX
Abtin Rahimian, Ilya Lashuk, Aparna Chandramowlishwaran, Dhairya Malhotra,
Logan Moon, Rahul Sampath, Aashay Shringarpure, Shravan Veerapaneni, Jeffrey
Vetter, Richard Vuduc, Denis Zorin, and George Biros.
Petascale direct numerical simulation of blood flow on 200k cores and
heterogeneous architectures.
In Proc. ACM/IEEE Conf. Supercomputing (SC), New Orleans, LA,
USA, November 2010.
(to appear).
Finalist, Gordon Bell Prize.
→ BibTeX
Aparna Chandramowlishwaran, Kamesh Madduri, and Richard Vuduc.
Diagnosis, tuning, and redesign for multicore performance: A case
study of the fast multipole method.
In Proc. ACM/IEEE Conf. Supercomputing (SC), New Orleans, LA,
USA, November 2010.
(to appear).
→ PDF, BibTeX, Topics: multicore; n-body; performance analysis; performance optimization
Richard Vuduc, Aparna Chandramowlishwaran, Jee Whan Choi, Murat Efe Guney, and
Aashay Shringarpure.
On the limits of GPU acceleration.
In Proc. USENIX Wkshp. Hot Topics in Parallelism (HotPar),
Berkeley, CA, USA, June 2010.
→ PDF, BibTeX, Topics: multicore; GPGPU; sparse linear algebra; n-body
Sooraj Bhat, Ashish Agarwal, Alexander Gray, and Richard Vuduc.
Toward interactive statistical modeling.
Procedia Computer Science, 1(1):1829-1838, May-June 2010.
Proc. Int'l. Conf. Computational Science
(ICCS), Wkshp. Automated Program Generation for Computational Science
(APGCS).
→ PDF, BibTeX, Topics: machine learning; algorithm derivation; interactive modeling; type theory
Aparna Chandramowlishwaran, Kathleen Knobe, and Richard Vuduc.
Performance evaluation of Concurrent Collections on
high-performance multicore computing systems.
Technical Report GT-CSE-10-01, Georgia Institute of Technology,
Atlanta, GA, USA, February 2010.
→ BibTeX, Topics: parallel programming models; dense linear algebra; multicore
Aparna Chandramowlishwaran, Samuel Williams, Leonid Oliker, Ilya Lashuk, George
Biros, and Richard Vuduc.
Optimizing and tuning the fast multipole method for state-of-the-art
multicore architectures.
In Proc. IEEE Int'l. Parallel and Distributed Processing Symp.
(IPDPS), Atlanta, GA, USA, April 2010.
→ PDF, BibTeX, Topics: n-body; multicore; performance analysis; performance optimization
Aparna Chandramowlishwaran, Kathleen Knobe, and Richard Vuduc.
Performance evaluation of Concurrent Collections on
high-performance multicore computing systems.
In Proc. IEEE Int'l. Parallel and Distributed Processing Symp.
(IPDPS), Atlanta, GA, USA, April 2010.
Winner, Best Paper (software track).
→ PDF, BibTeX, Topics: CnC; parallel programming models; dense linear algebra; multicore
Aparna Chandramowlishwaran, Kathleen Knobe, and Richard Vuduc.
Applying the Concurrent Collections programming model to
asynchronous parallel dense linear algebra.
In Proc. ACM SIGPLAN Symp. Principles and Practice of Parallel
Programming (PPoPP), Bangalore, India, January 2010.
(poster).
http://dx.doi.org/10.1145/1693453.1693506.
→ PDF, BibTeX, Topics: parallel programming models; dense linear algebra; multicore
Sangmin Park, Richard W. Vuduc, and Mary Jean Harrold.
FALCON: Fault localization for concurrent programs.
In Proc. ACM/IEEE Int'l. Conf. Software Eng., Cape Town, South
Africa, May 2010.
→ PDF, BibTeX, Topics: testing; debugging; fault-localization; concurrency
Jee Whan Choi, Amik Singh, and Richard W. Vuduc.
Model-driven autotuning of sparse matrix-vector multiply on GPUs.
In Proc. ACM SIGPLAN Symp. Principles and Practice of Parallel
Programming (PPoPP), Bangalore, India, January 2010.
http://dx.doi.org/10.1145/1693453.1693471.
→ PDF, BibTeX, Topics: sparse linear algebra; autotuning; GPGPU; performance modeling; performance optimization
Chunhua Liao, Daniel J. Quinlan, Richard Vuduc, and Thomas Panas.
Effective source-to-source outlining to support whole program
empirical optimization.
In Proc. Int'l. Wkshp. Languages and Compilers for Parallel
Computing (LCPC), volume LNCS, Newark, DE, USA, October 2009.
http://www.osti.gov/bridge/product.biblio.jsp?osti_id=966918.
→ BibTeX, Topics: compilers; autotuning; outlining
Nitin Arora, Ryan P. Russell, and Richard W. Vuduc.
Fast sensitivity computations for numerical optimizations.
In Proc. AAS/AIAA Astrodynamics Specialist Conference, AAS
09-435, Pittsburgh, PA, USA, August 2009.
http://soliton.ae.gatech.edu/people/rrussell/FinalPublications/ConferencePapers/09AugAAS_09-392_p2pLowthrust.pdf.
→ PDF, BibTeX, Topics: numerical optimization; sensitivity; GPGPU; astrodynamics
Manisha Gajbe, Andrew Canning, John Shalf, Lin-Wang Wang, Harvey Wasserman, and
Richard Vuduc.
Optimization and auto-tuning of 3D FFTs on the Cray XT4.
In Proc. Cray User's Group (CUG) Meeting, Atlanta, GA, USA, May
2009.
→ BibTeX, Topics: autotuning; performance analysis; performance optimization; FFT
Sundaresan Venkatasubramanian and Richard W. Vuduc.
Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU
platforms.
In Proc. ACM Int'l. Conf. Supercomputing (ICS), New York, NY,
USA, June 2009.
http://dx.doi.org/10.1145/1542275.1542312.
→ PDF, BibTeX, Topics: asynchronous iteration; GPGPU; heterogeneous architectures; performance optimization
Nitin Arora, Aashay Shringarpure, and Richard Vuduc.
Direct n-body kernels for multicore platforms.
In Proc. Int'l. Conf. Parallel Processing (ICPP), Vienna,
Austria, September 2009.
http://dx.doi.org/10.1109/ICPP.2009.71.
→ PDF, BibTeX, Topics: multicore; n-body; performance analysis; performance optimization
Ilya Lashuk, Aparna Chandramowlishwaran, Harper Langston, Tuan-Anh Nguyen,
Rahul Sampath, Aashay Shringarpure, Richard Vuduc, Lexing Ying, Denis Zorin,
and George Biros.
A massively parallel adaptive fast multipole method on heterogeneous
architectures.
In Proc. ACM/IEEE Conf. Supercomputing (SC), Portland, OR, USA,
November 2009.
Finalist, Best Paper.
http://doi.acm.org/10.1145/1654059.1654118.
→ PDF, BibTeX, Topics: n-body; multicore; GPGPU; MPI; parallel algorithms
Seunghwa Kang, David Bader, and Richard Vuduc.
Understanding the design trade-offs among current multicore systems
for numerical computations.
In Proc. IEEE Int'l. Parallel and Distributed Processing Symp.
(IPDPS), Rome, Italy, May 2009.
http://doi.ieeecomputersociety.org/10.1109/IPDPS.2009.5161055.
→ PDF, BibTeX, Topics: statistical models; n-body; multicore; performance analysis; performance optimization
Sam Williams, Richard Vuduc, Leonid Oliker, John Shalf, Katherine Yelick, and
James Demmel.
Optimizing sparse matrix-vector multiply on emerging multicore
platforms.
Parallel Computing (ParCo), 35(3):178-194, March 2009.
Extends conference version:
http://dx.doi.org/10.1145/1362622.1362674.
→ PDF, BibTeX, Topics: sparse linear algebra; multicore; autotuning
Aparna Chandramowlishwaran, Abhinav Karhu, Ketan Umare, and Richard Vuduc.
Numerical algorithms with tunable parallelism.
In Proc. Wkshp. Software Tools for Multicore Systems (STMCS), at
IEEE/ACM Int'l. Symp. Code Generation and Optimization (CGO), Boston, MA,
USA, April 2008.
http://people.csail.mit.edu/rabbah/conferences/08/cgo/stmcs/papers/vuduc-stmcs08.pdf.
→ PDF, BibTeX, Topics: autotuning; asynchronous variational integration; asynchronous iteration
Thomas Panas, Dan Quinlan, and Richard Vuduc.
Tool support for inspecting the code quality of HPC applications.
In Proc. Wkshp. Software Eng. for High-Performance Computing
Applications (SE-HPC), at ACM/IEEE Int'l. Conf. Software Eng. (ICSE),
Minneapolis, MN, USA, May 2007.
http://dx.doi.org/10.1109/SE-HPC.2007.8.
→ PDF, BibTeX, Topics: program visualization; software engineering
Thomas Panas, Dan Quinlan, and Richard Vuduc.
Analyzing and visualizing whole program architectures.
In Proc. Wkshp. Aerospace Software Engineering (AeroSE), at
ACM/IEEE Int'l. Conf. Software Eng. (ICSE), Minneapolis, MN, USA, May 2007.
Also: Lawrence Livermore National Laboratory
Technical Report UCRL-PROC-231453.
http://www.osti.gov/bridge/servlets/purl/909924-c8K5TR/909924.pdf.
→ PDF, BibTeX, Topics: program visualization; software engineering
Dan Quinlan, Richard Vuduc, and Ghassan Misherghi.
Techniques for specifying bug patterns.
In Proc. ACM Wkshp. Parallel and Distributed Systems: Testing
and Debugging (PADTAD), at Int'l. Symp. Software Testing and Analysis
(ISSTA), Portland, ME, USA, July 2007.
Winner, Best Paper.
http://doi.acm.org/10.1145/1273647.1273654.
→ PDF, BibTeX, Topics: software security; compilers; debugging
Sam Williams, Leonid Oliker, Richard Vuduc, John Shalf, Katherine Yelick, and
James Demmel.
Optimization of sparse matrix-vector multiplication on emerging
multicore platforms.
In Proc. ACM/IEEE Conf. Supercomputing (SC), 2007.
http://dx.doi.org/10.1145/1362622.1362674.
→ BibTeX, Topics: sparse linear algebra; autotuning; multicore; performance analysis; performance optimization
Rajesh Nishtala, Richard Vuduc, James W. Demmel, and Katherine A. Yelick.
When cache blocking sparse matrix vector multiply works and why.
Applicable Algebra in Engineering, Communication, and Computing:
Special Issue on Computational Linear Algebra and Sparse Matrix
Computations, March 2007.
→ BibTeX, Topics: sparse linear algebra; performance analysis; performance optimization
Qing Yi, Keith Seymour, Haihang You, Richard Vuduc, and Dan Quinlan.
POET: Parameterized Optimizations for Empirical Tuning.
In Proc. Wkshp. Performance Optimization of High-level Languages
and Libraries (POHLL), at IEEE Int'l. Par. Distrib. Processing Symp.
(IPDPS), pages 1-8, Long Beach, CA, USA, March 2007.
http://dx.doi.org/10.1109/IPDPS.2007.370637.
→ PDF, BibTeX, Topics: compilers; autotuning; program generation
Dan Quinlan, Markus Schordan, Richard Vuduc, and Qing Yi.
Annotating user-defined abstractions for optimization.
In Proc. Wkshp. Performance Optimization of High-level Languages
and Libraries (POHLL), at IEEE Int'l. Par. Distrib. Processing Symp.
(IPDPS), Rhodes, Greece, April 2006.
http://dx.doi.org/10.1109/IPDPS.2006.1639722.
→ BibTeX, Topic: compilers
Dan Quinlan, Richard Vuduc, Thomas Panas, Jochen Härdtlein, and Andreas
Sæbjørnsen.
Support for whole-program analysis and the verification of the
one-definition rule in C++.
In Proc. Static Analysis Summit (SAS), volume NIST Special
Publication 500-262, pages 27-35, 2006.
http://samate.nist.gov/docs/NIST_Special_Publication_500-262.pdf.
→ PDF, BibTeX, Topics: program analysis; C++; one-definition rule; software security; compilers
Richard Vuduc, Martin Schulz, Dan Quinlan, and Bronis de Supinski.
Improving distributed memory applications testing by message
perturbation.
In Proc. ACM Wkshp. Parallel and Distributed Systems: Testing
and Debugging (PADTAD), at Int'l. Symp. Software Testing and Analysis
(ISSTA), Portland, ME, USA, July 2006.
Winner, Best Paper.
→ PDF, BibTeX, Topics: MPI; testing; debugging; irritators
Yuan Zhao, Qing Yi, Ken Kennedy, Dan Quinlan, and Richard Vuduc.
Parameterizing loop fusion for automated empirical tuning.
Technical Report UCRL-TR-217808, Center for Applied Scientific
Computing, Lawrence Livermore National Laboratory, California, USA, December
2005.
http://dx.doi.org/10.2172/890608.
→ BibTeX, Topics: compilers; autotuning
Dan Quinlan, Shmuel Ur, and Richard Vuduc.
An extensible open-source compiler infrastructure for testing.
In Proc. IBM Haifa Verification Conf. (VC), volume LNCS 3875,
pages 116-133, Haifa, Israel, November 2005. Springer Berlin / Heidelberg.
http://dx.doi.org/10.1007/11678779_9.
→ PDF, BibTeX, Topics: compilers; testing
Richard Vuduc, James W. Demmel, and Katherine A. Yelick.
OSKI: A library of automatically tuned sparse matrix kernels.
In Proc. SciDAC, J. Physics: Conf. Ser., volume 16, pages
521-530, 2005.
http://dx.doi.org/10.1088/1742-6596/16/1/071.
→ BibTeX, Topics: sparse linear algebra; autotuning; performance optimization
Richard W. Vuduc and Hyun-Jin Moon.
Fast sparse matrix-vector multiplication by exploiting variable block
structure.
In Proc. High-Performance Computing and Communications Conf.
(HPCC), volume LNCS 3726, pages 807-816, Sorrento, Italy, September 2005.
Springer.
http://dx.doi.org/10.1007/11557654_91.
→ BibTeX, Topics: sparse linear algebra; autotuning; performance optimization
James Demmel, Jack Dongarra, Viktor Eijkhout, Erika Fuentes, Antoine Petitet,
Richard Vuduc, R. Clint Whaley, and Katherine Yelick.
Self-adapting linear algebra algorithms and software.
Proc. IEEE, 93(2):293-312, February 2005.
→ BibTeX, Topics: dense linear algebra; sparse linear algebra; autotuning
Benjamin C. Lee, Richard Vuduc, James Demmel, and Katherine Yelick.
Performance models for evaluation and automatic tuning of symmetric
sparse matrix-vector multiply.
In Proc. Int'l. Conf. Parallel Processing (ICPP), Montreal,
Canada, August 2004.
Winner, Best Paper.
http://dx.doi.org/10.1109/ICPP.2004.1327917.
→ BibTeX, Topics: sparse linear algebra; performance modeling; autotuning
Eun-Jin Im, Katherine Yelick, and Richard Vuduc.
SPARSITY: Optimization framework for sparse matrix kernels.
Int'l. J. High Performance Computing Applications (IJHPCA),
18(1):135-158, February 2004.
→ BibTeX, Topics: sparse linear algebra; autotuning; performance modeling; performance optimization
Richard W. Vuduc.
Automatic performance tuning of sparse matrix kernels.
PhD thesis, University of California, Berkeley, CA, USA, January
2004.
http://bebop.cs.berkeley.edu/pubs/vuduc2003-dissertation.pdf.
→ BibTeX, Topics: performance analysis; performance modeling; performance optimization; autotuning; sparse linear algebra; statistical models
Richard Vuduc, James Demmel, and Jeff Bilmes.
Statistical models for empirical search-based performance tuning.
Int'l. J. High Performance Computing Applications (IJHPCA),
18(1):65-94, 2004.
Extends conference version:
http://dx.doi.org/10.1007/3-540-45545-0_21.
→ BibTeX, Topics: statistical models; autotuning; survey; dense linear algebra; performance analysis
Richard Vuduc, Attila Gyulassy, James W. Demmel, and Katherine A. Yelick.
Memory hierarchy optimizations and bounds for sparse A^T A x.
In Proc. Wkshp. Parallel Linear Algebra (PLA), at Int'l. Conf.
Computational Sci. (ICCS), volume LNCS 2659, pages 705-714, Melbourne,
Australia, June 2003. Springer Berlin / Heidelberg.
http://dx.doi.org/10.1007/3-540-44863-2_69.
→ BibTeX, Topics: sparse linear algebra; autotuning; performance modeling
Richard Vuduc, James W. Demmel, Katherine A. Yelick, Shoaib Kamil, Rajesh
Nishtala, and Benjamin Lee.
Performance optimizations and bounds for sparse matrix-vector
multiply.
In Proc. ACM/IEEE Conf. Supercomputing (SC), Baltimore, MD,
USA, November 2002.
Finalist, Best Student Paper.
http://portal.acm.org/citation.cfm?id=762822.
→ BibTeX, Topics: sparse linear algebra; performance modeling; autotuning
Richard Vuduc, Shoaib Kamil, Jen Hsu, Rajesh Nishtala, James W. Demmel, and
Katherine A. Yelick.
Automatic performance tuning and analysis of sparse triangular solve.
In Proc. Wkshp. Performance Optimization of High-level Languages
and Libraries (POHLL), at ACM Int'l. Conf. Supercomputing (ICS), New York,
USA, June 2002.
Winner, Best Presentation; Winner, Best
Student Paper.
http://www.ece.lsu.edu/jxr/pohll-02/papers/vuduc.pdf.
→ BibTeX, Topics: sparse linear algebra; performance modeling; autotuning
Richard Vuduc, James W. Demmel, and Jeff A. Bilmes.
Statistical models for empirical search-based performance tuning.
In Proc. Int'l. Conf. Computational Science (ICCS), volume LNCS
2073, pages 117-126, San Francisco, CA, USA, May 2001. Springer Berlin /
Heidelberg.
Extends workshop version:
http://www.eecs.harvard.edu/~smith/fddo3/papers/107.ps.
http://dx.doi.org/10.1007/3-540-45545-0_21.
→ BibTeX, Topics: statistical models; autotuning; dense linear algebra
Richard Vuduc and James W. Demmel.
Code generators for automatic tuning of numerical kernels:
Experiences with FFTW.
In Proc. Semantics, Applications, and Implementation of Program
Generation (SAIG), at ACM SIGPLAN Int'l. Conf. Functional Programming
(ICFP), Montréal, Canada, September 2000.
http://dx.doi.org/10.1007/3-540-45350-4_14.
→ PDF, BibTeX, Topics: program generation; signal processing; autotuning; FFT
Richard Vuduc, James Demmel, and Jeff Bilmes.
Statistical modeling of feedback data in an automatic tuning system.
In Proc. ACM Wkshp. Feedback-Directed Dynamic Optimization
(FDDO), at Int'l. Symp. Microarchitecture (MICRO), Monterey, CA, USA,
December 2000.
Winner, Best Presentation.
http://www.eecs.harvard.edu/~smith/fddo3/papers/107.ps.
→ BibTeX, Topics: dense linear algebra; statistical models; autotuning; performance modeling
Danyel Fisher, Kris Hildrum, Jason Hong, Mark Newman, Megan Thomas, and Richard
Vuduc.
SWAMI: A framework for collaborative filtering algorithm
development and evaluation.
In Proc. ACM Conf. Research and Development in Information
Retrieval (SIGIR), pages 366-368, Athens, Greece, July 2000.
(poster).
http://dx.doi.org/10.1145/345508.345658.
→ PDF, BibTeX, Topic: collaborative filtering
E. Jason Riedy and Richard Vuduc.
Microbenchmarking the Tera MTA.
http://vuduc.org/pubs/riedy99-tera-report.pdf, May 1998.
→ PDF, BibTeX, Topics: multithreaded architectures; benchmarking; performance analysis
Bohdan Balko, Irvin W. Kay, Richard Vuduc, and John W. Neuberger.
Recovery of superfluorescence in inhomogeneously broadened systems
through rapid relaxation.
Phys. Rev. B, 55(18):12079-12085, May 1997.
→ PDF, BibTeX, Topic: gamma-ray lasers
Bohdan Balko, Irvin W. Kay, James D. Silk, Richard Vuduc, and John W.
Neuberger.
Superfluorescence in the presence of inhomogeneous broadening.
Hyperfine Interactions: Special Issue on the Gamma-Ray Laser,
107(1-4):369-379, June 1997.
→ BibTeX, Topic: gamma-ray lasers
Bohdan Balko, Irvin Kay, Richard Vuduc, and John Neuberger.
An investigation of the possible enhancement of nuclear
superfluorescence.
In Proc. Lasers '95, page 308, 1996.
→ BibTeX, Topic: gamma-ray lasers