I broke my write-many, read-never Perl script. (Curse you, modern versions of Perl!) However, my CV is more current, so please check that out instead.
Kenneth Czechowski, Victor W. Lee, Ed Grochowski, Ronny Ronen, Ronak Singhal,
Pradeep Dubey, and Richard Vuduc.
Improving the energy efficiency of big cores.
In Proc. ACM/IEEE Int'l. Symp. on Computer Architecture (ISCA),
Minneapolis, MN, USA, June 2014.
(accepted).
→ BibTeX
Sangmin Park, Richard Vuduc, and Mary Jean Harrold.
UNICORN: a unified approach for localizing non-deadlock
concurrency bugs.
Software Testing, Verification, and Reliability (Softw. Test.
Verif. Reliab.), 2014.
doi:10.1002/stvr.1523.
→ DOI, BibTeX
Jee Choi, Aparna Chandramowlishwaran, Kamesh Madduri, and Richard Vuduc.
A CPU-GPU hybrid implementation and model-driven scheduling of
the fast multipole method.
In Proc. 7th Wkshp. on General-purpose Processing using GPUs
(GPGPU-7), Salt Lake City, UT, USA, March 2014. ACM.
http://www.ece.neu.edu/groups/nucar/GPGPU/GPGPU7/.
→ PDF, BibTeX, Topics: performance analysis; performance evaluation; performance modeling; performance programming; GPGPU
Jee Choi, Marat Dukhan, Xing Liu, and Richard Vuduc.
Algorithmic time, energy, and power on candidate HPC compute
building blocks.
In Proc. IEEE Int'l. Parallel and Distributed Processing Symp.
(IPDPS), Phoenix, AZ, USA, May 2014.
(to appear).
→ PDF, BibTeX, Topics: energy and power; co-design; performance analysis; performance modeling; performance evaluation
Piyush Sao and Richard Vuduc.
Self-stabilizing iterative solvers.
In Proc. 4th Wkshp. Latest Advances in Scalable Algorithms for
Large-scale Systems (ScalA), Denver, CO, USA, November 2013.
http://www.csm.ornl.gov/srt/conferences/Scala/2013/.
→ BibTeX, Topics: exascale; fault-tolerance; numerical algorithms
Marat Dukhan and Richard Vuduc.
Methods for high-throughput computation of elementary functions.
In Proc. 10th Int'l. Conf. Parallel Processing and Applied
Mathematics (PPAM), September 2013.
→ BibTeX, Topics: numerical algorithms; microarchitecture; performance programming
Sangmin Park, Mary Jean Harrold, and Richard Vuduc.
Griffin: Grouping suspicious memory-access patterns to improve
understanding of concurrency bugs.
In Proc. Int'l. Symp. Software Testing and Analysis (ISSTA),
Lugano, Switzerland, July 2013.
→ PDF, BibTeX, Topics: testing and debugging; fault localization; concurrency bugs
Agata Rozga, Tricia Z. King, Richard W. Vuduc, and Diana L. Robins.
Undifferentiated facial electromyography responses to dynamic,
audio-visual emotion displays in individuals with autism spectrum disorders.
Developmental Science, 2013.
doi:10.1111/desc.12062.
→ DOI, BibTeX, Topics: autism; EMG; emotion; psychology
Jee Choi, Dan Bedard, Rob Fowler, and Richard Vuduc.
A roofline model of energy.
In Proc. IEEE Int'l. Parallel and Distributed Processing Symp.
(IPDPS), Boston, MA, USA, May 2013.
This paper is a short peer-reviewed
conference version of the following technical report:
https://smartech.gatech.edu/xmlui/handle/1853/45737.
doi:10.1109/IPDPS.2013.77.
→ PDF, DOI, BibTeX, Topics: energy and power; co-design; performance analysis; performance modeling
Kenneth Czechowski and Richard Vuduc.
A theoretical framework for algorithm-architecture co-design.
In Proc. IEEE Int'l. Parallel and Distributed Processing Symp.
(IPDPS), Boston, MA, USA, May 2013.
doi:10.1109/IPDPS.2013.99.
→ PDF, DOI, BibTeX, Topics: energy and power; co-design; performance modeling
Jee Whan Choi and Richard Vuduc.
A roofline model of energy.
Technical Report GT-CSE-12-01, Georgia Institute of Technology,
School of Computational Science and Engineering, Atlanta, GA, USA, December
2012.
https://smartech.gatech.edu/xmlui/handle/1853/45737.
→ BibTeX, Topics: energy and power; co-design; performance analysis; performance modeling
Hyesoon Kim, Richard Vuduc, Sara Baghsorkhi, Jee Choi, and Wen-mei Hwu.
Performance analysis and tuning for general purpose graphics
processing units (GPGPU).
Synthesis Lectures on Computer Architecture. Morgan & Claypool
Publishers, San Rafael, CA, USA, November 2012.
→ DOI, BibTeX, Topics: microarchitecture; performance analysis; performance modeling
William B. March, Kenneth Czechowski, Marat Dukhan, Thomas Benson, Dongryeol
Lee, Andrew J. Connolly, Richard Vuduc, Edmond Chow, and Alexander G. Gray.
Optimizing the computation of n-point correlations on large-scale
astronomical data.
In Proc. ACM/IEEE Conf. Supercomputing (SC), November 2012.
http://dl.acm.org/citation.cfm?id=2389097.
→ PDF, BibTeX, Topics: performance evaluation; performance programming; n-body
Ilya Lashuk, Aparna Chandramowlishwaran, Harper Langston, Tuan-Anh Nguyen,
Rahul Sampath, Aashay Shringarpure, Richard Vuduc, Lexing Ying, Denis Zorin,
and George Biros.
A massively parallel adaptive fast multipole method on heterogeneous
architectures.
Communications of the ACM (CACM), 55(5):101-109, May 2012.
doi:10.1145/2160718.2160740.
Extends conference version:
http://doi.acm.org/10.1145/1654059.1654118.
→ PDF, DOI, BibTeX, Topics: n-body; GPGPU; parallel algorithms; performance optimization; performance analysis; fast multipole method; performance evaluation
Cong Hou, Daniel Quinlan, David Jefferson, Richard Fujimoto, and Richard Vuduc.
Loop synthesis for program inversion.
In Proc. 4th Wkshp. Reversible Computation, Copenhagen,
Denmark, July 2012.
http://www.reversible-computation.org/2012/cms.
→ Talk, PDF, BibTeX, Topics: program inversion; compilers
Aparna Chandramowlishwaran, Jee Whan Choi, Kamesh Madduri, and Richard Vuduc.
Towards a communication optimal fast multipole method and its
implications for exascale.
In Proc. ACM Symp. Parallel Algorithms and Architectures
(SPAA), Pittsburgh, PA, USA, June 2012.
Brief announcement.
doi:10.1145/2312005.2312039.
→ PDF, DOI, BibTeX, Topics: performance analysis; performance modeling; performance optimization; fast multipole method; exascale; co-design; parallel algorithms; n-body
Kenneth Czechowski, Chris McClanahan, Casey Battaglino, Kartik Iyer, P.-K.
Yeung, and Richard Vuduc.
On the communication complexity of 3D FFTs and its implications
for exascale.
In Proc. ACM Int'l. Conf. Supercomputing (ICS), San Servolo
Island, Venice, Italy, June 2012.
doi:10.1145/2304576.2304604.
→ Talk, PDF, DOI, BibTeX, Topics: FFT; exascale; performance modeling; co-design
Richard Vuduc, Kenneth Czechowski, Aparna Chandramowlishwaran, and Jee Whan
Choi.
Courses in high-performance computing for scientists and engineers.
In Proc. NSF/TCPP Wkshp. Parallel and Distributed Computing
Education (EduPar), co-located with IPDPS'12, Shanghai, China, May 2012.
→ Talk, PDF, BibTeX, Topic: education
Cong Hou, George Vulov, Daniel Quinlan, David Jefferson, Richard Fujimoto, and
Richard Vuduc.
A new method for program inversion.
In Proc. Int'l. Conf. Compiler Construction (CC), Tallinn,
Estonia, March 2012.
http://www.cc.gatech.edu/~chou3/ProgramInversion.pdf.
→ Talk, BibTeX, Topics: program inversion; compilers; reverse computation; parallel discrete-event simulation
Sangmin Park, Richard Vuduc, and Mary Jean Harrold.
A unified approach for localizing non-deadlock concurrency bugs.
In Proc. IEEE Int'l. Conf. Software Testing, Verification, and
Validation (ICST), Montréal, Canada, April 2012.
doi:10.1109/ICST.2012.85.
→ PDF, DOI, BibTeX, Topics: fault-localization; software engineering; debugging; testing
Dongryeol Lee, Richard Vuduc, and Alexander G. Gray.
A distributed kernel summation framework for general-dimension
machine learning.
In Proc. SIAM Int'l. Conf. Data Mining (SDM), Anaheim, CA, USA,
April 2012.
Winner, Best Paper.
→ PDF, BibTeX, Topics: distributed memory parallelism; statistical machine learning; data mining
Jaewoong Sim, Aniruddha Dasgupta, Hyesoon Kim, and Richard Vuduc.
A performance analysis framework for identifying performance benefits
in GPGPU applications.
In Proc. ACM SIGPLAN Symp. Principles and Practice of Parallel
Programming (PPoPP), New Orleans, LA, USA, February 2012.
doi:10.1145/2145816.2145819.
→ PDF, DOI, BibTeX, Topics: GPGPU; performance analysis; performance modeling; microarchitecture; performance optimization
Jaekyu Lee, Hyesoon Kim, and Richard Vuduc.
When prefetching works, when it doesn't, and why.
ACM Trans. Architecture and Code Optimization (TACO), 9(1),
March 2012.
doi:10.1145/2133382.2133384.
→ PDF, DOI, BibTeX, Topics: microarchitecture; prefetching
Sooraj Bhat, Ashish Agarwal, Richard Vuduc, and Alexander Gray.
A type theory for probability density functions.
In ACM SIGACT-SIGPLAN Symp. Principles of Programming Languages
(POPL 2012), Philadelphia, PA, USA, January 2012.
doi:10.1145/2103656.2103721.
→ PDF, DOI, BibTeX, Topics: statistical machine learning; type theory; probability theory; programming languages
Richard W. Vuduc.
Autotuning (definition).
In David Padua, editor, Encyclopedia of Parallel Computing.
Springer, 2011.
→ BibTeX, Topic: autotuning
George Vulov, Cong Hou, Richard Vuduc, Daniel Quinlan, Richard Fujimoto, and
David Jefferson.
The Backstroke framework for source level reverse computation
applied to parallel discrete event simulation.
In S. Jain, R. R. Creasey, J. Himmelspach, K.P. White, and M. Fu,
editors, Proc. Winter Simulation Conf. (WSC), Phoenix, AZ, USA,
December 2011. IEEE.
http://www.informs-sim.org/wsc11papers/264.pdf.
→ BibTeX, Topics: compilers; program inversion; parallel discrete-event simulation
Kenneth Czechowski, Chris McClanahan, Casey Battaglino, Kartik Iyer, P.-K.
Yeung, and Richard Vuduc.
Prospects for scalable 3D FFTs on heterogeneous exascale systems.
In Proc. ACM/IEEE Conf. Supercomputing (SC), November 2011.
(poster; extended version available as
Georgia Tech report GT-CSE-11-02).
→ BibTeX, Topics: exascale; FFT; GPGPU; performance analysis; performance modeling
Richard Vuduc and Kenneth Czechowski.
What GPU computing means for high-end systems.
IEEE Micro, 31(4):74-78, July/August 2011.
doi:10.1109/MM.2011.78.
→ PDF, DOI, BibTeX, Topics: GPGPU; performance analysis; performance modeling; exascale
Raghul Gunasekaran, David Dillow, Galen Shipman, Richard Vuduc, and Edmond
Chow.
Characterizing application runtime behavior from system logs and
metrics.
In Proc. Int'l. Wkshp. Characterizing Applications for
Heterogeneous Exascale Systems (CACHES), Tucson, AZ, USA, June 2011.
→ BibTeX, Topics: performance analysis; performance modeling; systems software; job scheduling
Kenneth Czechowski, Casey Battaglino, Chris McClanahan, Aparna
Chandramowlishwaran, and Richard Vuduc.
Balance principles for algorithm-architecture co-design.
In Proc. USENIX Wkshp. Hot Topics in Parallelism (HotPar),
Berkeley, CA, USA, May 2011.
http://www.usenix.org/events/hotpar11/tech/final_files/Czechowski.pdf.
→ Talk, PDF, BibTeX, Topics: parallel algorithms; architecture; co-design; performance analysis
Sam Williams, Nathan Bell, Jee Choi, Michael Garland, Leonid Oliker, and
Richard Vuduc.
Sparse matrix vector multiplication on multicore and accelerator
systems.
In Jakub Kurzak, David A. Bader, and Jack Dongarra, editors, Scientific Computing with Multicore Processors and Accelerators. CRC Press,
2010.
→ BibTeX
Jaekyu Lee, Nagesh B. Lakshminarayana, Hyesoon Kim, and Richard Vuduc.
Many-thread aware prefetching mechanisms for GPGPU applications.
In Proc. IEEE/ACM Int'l. Symp. Microarchitecture (MICRO),
Atlanta, GA, USA, December 2010.
doi:10.1109/MICRO.2010.44.
→ PDF, DOI, BibTeX, Topics: GPGPU; performance evaluation; performance programming; prefetching; co-design
Abtin Rahimian, Ilya Lashuk, Aparna Chandramowlishwaran, Dhairya Malhotra,
Logan Moon, Rahul Sampath, Aashay Shringarpure, Shravan Veerapaneni, Jeffrey
Vetter, Richard Vuduc, Denis Zorin, and George Biros.
Petascale direct numerical simulation of blood flow on 200k cores and
heterogeneous architectures.
In Proc. ACM/IEEE Conf. Supercomputing (SC), New Orleans, LA,
USA, November 2010.
doi:10.1109/SC.2010.42.
Winner, Gordon Bell Prize.
→ PDF, DOI, BibTeX, Topics: performance evaluation; performance programming; GPGPU; multicore; MPI; parallel algorithms; heterogeneous architectures; fast multipole method
Aparna Chandramowlishwaran, Kamesh Madduri, and Richard Vuduc.
Diagnosis, tuning, and redesign for multicore performance: A case
study of the fast multipole method.
In Proc. ACM/IEEE Conf. Supercomputing (SC), New Orleans, LA,
USA, November 2010.
doi:10.1109/SC.2010.19.
→ PDF, DOI, BibTeX, Topics: multicore; n-body; performance analysis; performance optimization
Richard Vuduc, Aparna Chandramowlishwaran, Jee Whan Choi, Murat Efe Guney, and
Aashay Shringarpure.
On the limits of GPU acceleration.
In Proc. USENIX Wkshp. Hot Topics in Parallelism (HotPar),
Berkeley, CA, USA, June 2010.
→ PDF, BibTeX, Topics: multicore; GPGPU; sparse linear algebra; n-body
Sooraj Bhat, Ashish Agarwal, Alexander Gray, and Richard Vuduc.
Toward interactive statistical modeling.
Procedia Computer Science, 1(1):1829-1838, May-June 2010.
doi:10.1016/j.procs.2010.04.205.
Proc. Int'l. Conf. Computational Science
(ICCS), Wkshp. Automated Program Generation for Computational Science
(APGCS).
→ PDF, DOI, BibTeX, Topics: machine learning; algorithm derivation; interactive modeling; type theory
Aparna Chandramowlishwaran, Kathleen Knobe, and Richard Vuduc.
Performance evaluation of Concurrent Collections on
high-performance multicore computing systems.
Technical Report GT-CSE-10-01, Georgia Institute of Technology,
Atlanta, GA, USA, February 2010.
→ BibTeX, Topics: parallel programming models; dense linear algebra; multicore
Aparna Chandramowlishwaran, Samuel Williams, Leonid Oliker, Ilya Lashuk, George
Biros, and Richard Vuduc.
Optimizing and tuning the fast multipole method for state-of-the-art
multicore architectures.
In Proc. IEEE Int'l. Parallel and Distributed Processing Symp.
(IPDPS), Atlanta, GA, USA, April 2010.
→ PDF, BibTeX, Topics: n-body; multicore; performance analysis; performance optimization
Aparna Chandramowlishwaran, Kathleen Knobe, and Richard Vuduc.
Performance evaluation of Concurrent Collections on
high-performance multicore computing systems.
In Proc. IEEE Int'l. Parallel and Distributed Processing Symp.
(IPDPS), Atlanta, GA, USA, April 2010.
doi:10.1109/IPDPS.2010.5470404.
Winner, Best Paper (software track).
→ PDF, DOI, BibTeX, Topics: CnC; parallel programming models; dense linear algebra; multicore
Aparna Chandramowlishwaran, Kathleen Knobe, and Richard Vuduc.
Applying the Concurrent Collections programming model to
asynchronous parallel dense linear algebra.
In Proc. ACM SIGPLAN Symp. Principles and Practice of Parallel
Programming (PPoPP), Bangalore, India, January 2010.
(poster).
doi:10.1145/1693453.1693506.
→ PDF, DOI, BibTeX, Topics: parallel programming models; dense linear algebra; multicore
Sangmin Park, Richard W. Vuduc, and Mary Jean Harrold.
FALCON: Fault localization for concurrent programs.
In Proc. ACM/IEEE Int'l. Conf. Software Eng., Cape Town, South
Africa, May 2010.
doi:10.1145/1806799.1806838.
→ PDF, DOI, BibTeX, Topics: testing; debugging; fault-localization; concurrency
Jee Whan Choi, Amik Singh, and Richard W. Vuduc.
Model-driven autotuning of sparse matrix-vector multiply on GPUs.
In Proc. ACM SIGPLAN Symp. Principles and Practice of Parallel
Programming (PPoPP), Bangalore, India, January 2010.
doi:10.1145/1693453.1693471.
→ PDF, DOI, BibTeX, Topics: sparse linear algebra; autotuning; GPGPU; performance modeling; performance optimization
Chunhua Liao, Daniel J. Quinlan, Richard Vuduc, and Thomas Panas.
Effective source-to-source outlining to support whole program
empirical optimization.
In Proc. Int'l. Wkshp. Languages and Compilers for Parallel
Computing (LCPC), volume LNCS, Newark, DE, USA, October 2009.
doi:10.1007/978-3-642-13374-9_21.
→ DOI, BibTeX, Topics: compilers; autotuning; outlining
Nitin Arora, Ryan P. Russell, and Richard W. Vuduc.
Fast sensitivity computations for numerical optimizations.
In Proc. AAS/AIAA Astrodynamics Specialist Conference, AAS
09-435, Pittsburgh, PA, USA, August 2009.
http://soliton.ae.gatech.edu/people/rrussell/FinalPublications/ConferencePapers/09AugAAS_09-392_p2pLowthrust.pdf.
→ PDF, BibTeX, Topics: numerical optimization; sensitivity; GPGPU; astrodynamics
Manisha Gajbe, Andrew Canning, John Shalf, Lin-Wang Wang, Harvey Wasserman, and
Richard Vuduc.
Auto-tuning distributed-memory 3-dimensional fast Fourier
transforms on the Cray XT4.
In Proc. Cray User's Group (CUG) Meeting, Atlanta, GA, USA, May
2009.
http://www.cug.org/5-publications/proceedings_attendee_lists/CUG09CD/S09_Proceedings/pages/authors/11-15Wednesday/14C-Gajbe/GAJBE-paper.pdf.
→ BibTeX, Topics: autotuning; performance analysis; performance optimization; FFT
Sundaresan Venkatasubramanian and Richard W. Vuduc.
Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU
platforms.
In Proc. ACM Int'l. Conf. Supercomputing (ICS), New York, NY,
USA, June 2009.
doi:10.1145/1542275.1542312.
→ PDF, DOI, BibTeX, Topics: asynchronous iteration; GPGPU; heterogeneous architectures; performance optimization
Nitin Arora, Aashay Shringarpure, and Richard Vuduc.
Direct n-body kernels for multicore platforms.
In Proc. Int'l. Conf. Parallel Processing (ICPP), Vienna,
Austria, September 2009.
doi:10.1109/ICPP.2009.71.
→ PDF, DOI, BibTeX, Topics: multicore; n-body; performance analysis; performance optimization
Ilya Lashuk, Aparna Chandramowlishwaran, Harper Langston, Tuan-Anh Nguyen,
Rahul Sampath, Aashay Shringarpure, Richard Vuduc, Lexing Ying, Denis Zorin,
and George Biros.
A massively parallel adaptive fast multipole method on heterogeneous
architectures.
In Proc. ACM/IEEE Conf. Supercomputing (SC), Portland, OR, USA,
November 2009.
doi:10.1145/1654059.1654118.
Finalist, Best Paper.
→ PDF, DOI, BibTeX, Topics: n-body; multicore; GPGPU; MPI; parallel algorithms
Seunghwa Kang, David Bader, and Richard Vuduc.
Understanding the design trade-offs among current multicore systems
for numerical computations.
In Proc. IEEE Int'l. Parallel and Distributed Processing Symp.
(IPDPS), Rome, Italy, May 2009.
doi:10.1109/IPDPS.2009.5161055.
→ PDF, DOI, BibTeX, Topics: statistical models; n-body; multicore; performance analysis; performance optimization
Sam Williams, Richard Vuduc, Leonid Oliker, John Shalf, Katherine Yelick, and
James Demmel.
Optimizing sparse matrix-vector multiply on emerging multicore
platforms.
Parallel Computing (ParCo), 35(3):178-194, March 2009.
doi:10.1016/j.parco.2008.12.006.
Extends conference version:
http://dx.doi.org/10.1145/1362622.1362674.
→ PDF, DOI, BibTeX, Topics: sparse linear algebra; multicore; autotuning
Aparna Chandramowlishwaran, Abhinav Karhu, Ketan Umare, and Richard Vuduc.
Numerical algorithms with tunable parallelism.
In Proc. Wkshp. Software Tools for Multicore Systems (STMCS), at
IEEE/ACM Int'l. Symp. Code Generation and Optimization (CGO), Boston, MA,
USA, April 2008.
http://people.csail.mit.edu/rabbah/conferences/08/cgo/stmcs/papers/vuduc-stmcs08.pdf.
→ PDF, BibTeX, Topics: autotuning; asynchronous variational integration; asynchronous iteration
Thomas Panas, Dan Quinlan, and Richard Vuduc.
Tool support for inspecting the code quality of HPC applications.
In Proc. Wkshp. Software Eng. for High-Performance Computing
Applications (SE-HPC), at ACM/IEEE Int'l. Conf. Software Eng. (ICSE),
Minneapolis, MN, USA, May 2007.
doi:10.1109/SE-HPC.2007.8.
→ PDF, DOI, BibTeX, Topics: program visualization; software engineering
Thomas Panas, Dan Quinlan, and Richard Vuduc.
Analyzing and visualizing whole program architectures.
In Proc. Wkshp. Aerospace Software Engineering (AeroSE), at
ACM/IEEE Int'l. Conf. Software Eng. (ICSE), Minneapolis, MN, USA, May 2007.
Also: Lawrence Livermore National Laboratory
Technical Report UCRL-PROC-231453.
http://www.osti.gov/bridge/servlets/purl/909924-c8K5TR/909924.pdf.
→ PDF, BibTeX, Topics: program visualization; software engineering
Dan Quinlan, Richard Vuduc, and Ghassan Misherghi.
Techniques for specifying bug patterns.
In Proc. ACM Wkshp. Parallel and Distributed Systems: Testing
and Debugging (PADTAD), at Int'l. Symp. Software Testing and Analysis
(ISSTA), Portland, ME, USA, July 2007.
doi:10.1145/1273647.1273654.
Winner, Best Paper.
→ PDF, DOI, BibTeX, Topics: software security; compilers; debugging
Sam Williams, Leonid Oliker, Richard Vuduc, John Shalf, Katherine Yelick, and
James Demmel.
Optimization of sparse matrix-vector multiplication on emerging
multicore platforms.
In Proc. ACM/IEEE Conf. Supercomputing (SC), 2007.
doi:10.1145/1362622.1362674.
→ DOI, BibTeX, Topics: sparse linear algebra; autotuning; multicore; performance analysis; performance optimization
Rajesh Nishtala, Richard Vuduc, James W. Demmel, and Katherine A. Yelick.
When cache blocking sparse matrix vector multiply works and why.
Applicable Algebra in Engineering, Communication, and Computing:
Special Issue on Computational Linear Algebra and Sparse Matrix
Computations, March 2007.
doi:10.1007/s00200-007-0038-9.
→ DOI, BibTeX, Topics: sparse linear algebra; performance analysis; performance optimization
Qing Yi, Keith Seymour, Haihang You, Richard Vuduc, and Dan Quinlan.
POET: Parameterized Optimizations for Empirical Tuning.
In Proc. Wkshp. Performance Optimization of High-level Languages
and Libraries (POHLL), at IEEE Int'l. Par. Distrib. Processing Symp.
(IPDPS), pages 1-8, Long Beach, CA, USA, March 2007.
doi:10.1109/IPDPS.2007.370637.
→ PDF, DOI, BibTeX, Topics: compilers; autotuning; program generation
Dan Quinlan, Markus Schordan, Richard Vuduc, and Qing Yi.
Annotating user-defined abstractions for optimization.
In Proc. Wkshp. Performance Optimization of High-level Languages
and Libraries (POHLL), at IEEE Int'l. Par. Distrib. Processing Symp.
(IPDPS), Rhodes, Greece, April 2006.
doi:10.1109/IPDPS.2006.1639722.
→ DOI, BibTeX, Topic: compilers
Dan Quinlan, Richard Vuduc, Thomas Panas, Jochen Härdtlein, and Andreas
Sæbjørnsen.
Support for whole-program analysis and the verification of the
one-definition rule in C++.
In Proc. Static Analysis Summit (SAS), volume NIST Special
Publication 500-262, pages 27-35, 2006.
http://samate.nist.gov/docs/NIST_Special_Publication_500-262.pdf.
→ PDF, BibTeX, Topics: program analysis; C++; one-definition rule; software security; compilers
Richard Vuduc, Martin Schulz, Dan Quinlan, and Bronis de Supinski.
Improving distributed memory applications testing by message
perturbation.
In Proc. ACM Wkshp. Parallel and Distributed Systems: Testing
and Debugging (PADTAD), at Int'l. Symp. Software Testing and Analysis
(ISSTA), Portland, ME, USA, July 2006.
doi:10.1145/1147403.1147409.
Winner, Best Paper.
→ PDF, DOI, BibTeX, Topics: MPI; testing; debugging; irritators
Yuan Zhao, Qing Yi, Ken Kennedy, Dan Quinlan, and Richard Vuduc.
Parameterizing loop fusion for automated empirical tuning.
Technical Report UCRL-TR-217808, Center for Applied Scientific
Computing, Lawrence Livermore National Laboratory, California, USA, December
2005.
doi:10.2172/890608.
→ DOI, BibTeX, Topics: compilers; autotuning
Dan Quinlan, Shmuel Ur, and Richard Vuduc.
An extensible open-source compiler infrastructure for testing.
In Proc. IBM Haifa Verification Conf. (VC), volume LNCS 3875,
pages 116-133, Haifa, Israel, November 2005. Springer Berlin / Heidelberg.
doi:10.1007/11678779_9.
→ PDF, DOI, BibTeX, Topics: compilers; testing
Richard Vuduc, James W. Demmel, and Katherine A. Yelick.
OSKI: A library of automatically tuned sparse matrix kernels.
In Proc. SciDAC, J. Physics: Conf. Ser., volume 16, pages
521-530, 2005.
doi:10.1088/1742-6596/16/1/071.
→ DOI, BibTeX, Topics: sparse linear algebra; autotuning; performance optimization
Richard W. Vuduc and Hyun-Jin Moon.
Fast sparse matrix-vector multiplication by exploiting variable block
structure.
In Proc. High-Performance Computing and Communications Conf.
(HPCC), volume LNCS 3726, pages 807-816, Sorrento, Italy, September 2005.
Springer.
doi:10.1007/11557654_91.
→ DOI, BibTeX, Topics: sparse linear algebra; autotuning; performance optimization
James Demmel, Jack Dongarra, Viktor Eijkhout, Erika Fuentes, Antoine Petitet,
Richard Vuduc, R. Clint Whaley, and Katherine Yelick.
Self-adapting linear algebra algorithms and software.
Proc. IEEE, 93(2):293-312, February 2005.
doi:10.1109/JPROC.2004.840848.
→ DOI, BibTeX, Topics: dense linear algebra; sparse linear algebra; autotuning
Benjamin C. Lee, Richard Vuduc, James Demmel, and Katherine Yelick.
Performance models for evaluation and automatic tuning of symmetric
sparse matrix-vector multiply.
In Proc. Int'l. Conf. Parallel Processing (ICPP), Montreal,
Canada, August 2004.
doi:10.1109/ICPP.2004.1327917.
Winner, Best Paper.
→ DOI, BibTeX, Topics: sparse linear algebra; performance modeling; autotuning
Eun-Jin Im, Katherine Yelick, and Richard Vuduc.
SPARSITY: Optimization framework for sparse matrix kernels.
Int'l. J. High Performance Computing Applications (IJHPCA),
18(1):135-158, February 2004.
doi:10.1177/1094342004041296.
→ DOI, BibTeX, Topics: sparse linear algebra; autotuning; performance modeling; performance optimization
Richard W. Vuduc.
Automatic performance tuning of sparse matrix kernels.
PhD thesis, University of California, Berkeley, CA, USA, January
2004.
http://bebop.cs.berkeley.edu/pubs/vuduc2003-dissertation.pdf.
→ BibTeX, Topics: performance analysis; performance modeling; performance optimization; autotuning; sparse linear algebra; statistical models
Richard Vuduc, James Demmel, and Jeff Bilmes.
Statistical models for empirical search-based performance tuning.
Int'l. J. High Performance Computing Applications (IJHPCA),
18(1):65-94, 2004.
doi:10.1177/1094342004041293.
Extends conference version:
http://dx.doi.org/10.1007/3-540-45545-0_21.
→ DOI, BibTeX, Topics: statistical models; autotuning; survey; dense linear algebra; performance analysis
Richard Vuduc, Attila Gyulassy, James W. Demmel, and Katherine A. Yelick.
Memory hierarchy optimizations and bounds for sparse A^T Ax.
In Proc. Wkshp. Parallel Linear Algebra (PLA), at Int'l. Conf.
Computational Sci. (ICCS), volume LNCS 2659, pages 705-714, Melbourne,
Australia, June 2003. Springer Berlin / Heidelberg.
doi:10.1007/3-540-44863-2_69.
→ DOI, BibTeX, Topics: sparse linear algebra; autotuning; performance modeling
Richard Vuduc, James W. Demmel, Katherine A. Yelick, Shoaib Kamil, Rajesh
Nishtala, and Benjamin Lee.
Performance optimizations and bounds for sparse matrix-vector
multiply.
In Proc. ACM/IEEE Conf. Supercomputing (SC), Baltimore, MD,
USA, November 2002.
Finalist, Best Student Paper.
http://portal.acm.org/citation.cfm?id=762822.
→ BibTeX, Topics: sparse linear algebra; performance modeling; autotuning
Richard Vuduc, Shoaib Kamil, Jen Hsu, Rajesh Nishtala, James W. Demmel, and
Katherine A. Yelick.
Automatic performance tuning and analysis of sparse triangular solve.
In Proc. Wkshp. Performance Optimization of High-level Languages
and Libraries (POHLL), at ACM Int'l. Conf. Supercomputing (ICS), New York,
USA, June 2002.
Winner, Best Presentation; Winner, Best
Student Paper.
http://www.ece.lsu.edu/jxr/pohll-02/papers/vuduc.pdf.
→ BibTeX, Topics: sparse linear algebra; performance modeling; autotuning
Richard Vuduc, James W. Demmel, and Jeff A. Bilmes.
Statistical models for empirical search-based performance tuning.
In Proc. Int'l. Conf. Computational Science (ICCS), volume LNCS
2073, pages 117-126, San Francisco, CA, USA, May 2001. Springer Berlin /
Heidelberg.
Extends workshop version:
http://www.eecs.harvard.edu/~smith/fddo3/papers/107.ps.
doi:10.1007/3-540-45545-0_21.
→ DOI, BibTeX, Topics: statistical models; autotuning; dense linear algebra
Richard Vuduc and James W. Demmel.
Code generators for automatic tuning of numerical kernels:
Experiences with FFTW.
In Proc. Semantics, Applications, and Implementation of Program
Generation (SAIG), at ACM SIGPLAN Int'l. Conf. Functional Programming
(ICFP), Montréal, Canada, September 2000.
doi:10.1007/3-540-45350-4_14.
→ PDF, DOI, BibTeX, Topics: program generation; signal processing; autotuning; FFT
Richard Vuduc, James Demmel, and Jeff Bilmes.
Statistical modeling of feedback data in an automatic tuning system.
In Proc. ACM Wkshp. Feedback-Directed Dynamic Optimization
(FDDO), at Int'l. Symp. Microarchitecture (MICRO), Monterey, CA, USA,
December 2000.
Winner, Best Presentation.
http://www.eecs.harvard.edu/~smith/fddo3/papers/107.ps.
→ BibTeX, Topics: dense linear algebra; statistical models; autotuning; performance modeling
Danyel Fisher, Kris Hildrum, Jason Hong, Mark Newman, Megan Thomas, and Richard
Vuduc.
SWAMI: A framework for collaborative filtering algorithm
development and evaluation.
In Proc. ACM Conf. Research and Development in Information
Retrieval (SIGIR), pages 366-368, Athens, Greece, July 2000.
(poster).
doi:10.1145/345508.345658.
→ PDF, DOI, BibTeX, Topic: collaborative filtering
E. Jason Riedy and Richard Vuduc.
Microbenchmarking the Tera MTA.
http://vuduc.org/pubs/riedy99-tera-report.pdf, May 1998.
→ PDF, BibTeX, Topics: multithreaded architectures; benchmarking; performance analysis
Bohdan Balko, Irvin W. Kay, Richard Vuduc, and John W. Neuberger.
Recovery of superfluorescence in inhomogeneously broadened systems
through rapid relaxation.
Phys. Rev. B, 55(18):12079-12085, May 1997.
doi:10.1103/PhysRevB.55.12079.
→ PDF, DOI, BibTeX, Topic: gamma-ray lasers
Bohdan Balko, Irvin W. Kay, James D. Silk, Richard Vuduc, and John W.
Neuberger.
Superfluorescence in the presence of inhomogeneous broadening.
Hyperfine Interactions: Special Issue on the Gamma-Ray Laser,
107(1-4):369-379, June 1997.
doi:10.1023/A:1012020225589.
→ DOI, BibTeX, Topic: gamma-ray lasers
Bohdan Balko, Irvin Kay, Richard Vuduc, and John Neuberger.
An investigation of the possible enhancement of nuclear
superfluorescence through crystalline and hyperfine interaction effects.
In Proc. Lasers '95, page 308, 1996.
→ BibTeX, Topic: gamma-ray lasers
Automatically created on Sun Mar 30 21:08:47 2014 by yab2web.