Richard Vuduc: Publications

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.

NOTE: This page is woefully out of date!

I broke my write-many read-never perl script. (Curse you, modern versions of perl!) However, my CV is more current so please check that out instead.

Kenneth Czechowski, Victor W. Lee, Ed Grochowski, Ronny Ronen, Ronak Singhal, Pradeep Dubey, and Richard Vuduc. Improving the energy efficiency of big cores. In Proc. ACM/IEEE Int'l. Symp. on Computer Architecture (ISCA), Minneapolis, MN, USA, June 2014. (accepted).
→ BibTeX

Sangmin Park, Richard Vuduc, and Mary Jean Harrold. UNICORN: a unified approach for localizing non-deadlock concurrency bugs. Software Testing, Verification, and Reliability (Softw. Test. Verif. Reliab.), 2014. doi:10.1002/stvr.1523.
→ DOI, BibTeX

Jee Choi, Aparna Chandramowlishwaran, Kamesh Madduri, and Richard Vuduc. A CPU-GPU hybrid implementation and model-driven scheduling of the fast multipole method. In Proc. 7th Wkshp. on General-purpose Processing using GPUs (GPGPU-7), Salt Lake City, UT, USA, March 2014. ACM. http://www.ece.neu.edu/groups/nucar/GPGPU/GPGPU7/.
→ PDF, BibTeX, Topics: performance analysis; performance evaluation; performance modeling; performance programming; GPGPU

Jee Choi, Marat Dukhan, Xing Liu, and Richard Vuduc. Algorithmic time, energy, and power on candidate HPC compute building blocks. In Proc. IEEE Int'l. Parallel and Distributed Processing Symp. (IPDPS), Phoenix, AZ, USA, May 2014. (to appear).
→ PDF, BibTeX, Topics: energy and power; co-design; performance analysis; performance modeling; performance evaluation

Piyush Sao and Richard Vuduc. Self-stabilizing iterative solvers. In Proc. 4th Wkshp. Latest Advances in Scalable Algorithms for Large-scale Systems (ScalA), Denver, CO, USA, November 2013. http://www.csm.ornl.gov/srt/conferences/Scala/2013/.
→ BibTeX, Topics: exascale; fault-tolerance; numerical algorithms

Marat Dukhan and Richard Vuduc. Methods for high-throughput computation of elementary functions. In Proc. 10th Int'l. Conf. Parallel Processing and Applied Mathematics (PPAM), September 2013.
→ BibTeX, Topics: numerical algorithms; microarchitecture; performance programming

Sangmin Park, Mary Jean Harrold, and Richard Vuduc. Griffin: Grouping suspicious memory-access patterns to improve understanding of concurrency bugs. In Proc. Int'l. Symp. Software Testing and Analysis (ISSTA), Lugano, Switzerland, July 2013.
→ PDF, BibTeX, Topics: testing and debugging; fault localization; concurrency bugs, Acceptance rate: [32/124=25.8%]

Agata Rozga, Tricia Z. King, Richard W. Vuduc, and Diana L. Robins. Undifferentiated facial electromyography responses to dynamic, audio-visual emotion displays in individuals with autism spectrum disorders. Developmental Science, 2013. doi:10.1111/desc.12062.
→ DOI, BibTeX, Topics: autism; EMG; emotion; psychology

Jee Choi, Dan Bedard, Rob Fowler, and Richard Vuduc. A roofline model of energy. In Proc. IEEE Int'l. Parallel and Distributed Processing Symp. (IPDPS), Boston, MA, USA, May 2013. This paper is a short peer-reviewed conference version of the following technical report: https://smartech.gatech.edu/xmlui/handle/1853/45737. doi:10.1109/IPDPS.2013.77.
→ PDF, DOI, BibTeX, Topics: energy and power; co-design; performance analysis; performance modeling, Acceptance rate: [106/494=21.5%]

Kenneth Czechowski and Richard Vuduc. A theoretical framework for algorithm-architecture co-design. In Proc. IEEE Int'l. Parallel and Distributed Processing Symp. (IPDPS), Boston, MA, USA, May 2013. doi:10.1109/IPDPS.2013.99.
→ PDF, DOI, BibTeX, Topics: energy and power; co-design; performance modeling, Acceptance rate: [106/494=21.5%]

Jee Whan Choi and Richard Vuduc. A roofline model of energy. Technical Report GT-CSE-12-01, Georgia Institute of Technology, School of Computational Science and Engineering, Atlanta, GA, USA, December 2012. https://smartech.gatech.edu/xmlui/handle/1853/45737.
→ BibTeX, Topics: energy and power; co-design; performance analysis; performance modeling

Hyesoon Kim, Richard Vuduc, Sara Baghsorkhi, Jee Choi, and Wen-mei Hwu. Performance analysis and tuning for general purpose graphics processing units (GPGPU). Synthesis Lectures on Computer Architecture. Morgan & Claypool Publishers, San Rafael, CA, USA, November 2012.
→ DOI, BibTeX, Topics: microarchitecture; performance analysis; performance modeling

William B. March, Kenneth Czechowski, Marat Dukhan, Thomas Benson, Dongryeol Lee, Andrew J. Connolly, Richard Vuduc, Edmond Chow, and Alexander G. Gray. Optimizing the computation of n-point correlations on large-scale astronomical data. In Proc. ACM/IEEE Conf. Supercomputing (SC), November 2012. http://dl.acm.org/citation.cfm?id=2389097.
→ PDF, BibTeX, Topics: performance evaluation; performance programming; n-body, Acceptance rate: [100/472=21.2%]

Ilya Lashuk, Aparna Chandramowlishwaran, Harper Langston, Tuan-Anh Nguyen, Rahul Sampath, Aashay Shringarpure, Richard Vuduc, Lexing Ying, Denis Zorin, and George Biros. A massively parallel adaptive fast multipole method on heterogeneous architectures. Communications of the ACM (CACM), 55(5):101-109, May 2012. doi:10.1145/2160718.2160740. Extends conference version: http://doi.acm.org/10.1145/1654059.1654118.
→ PDF, DOI, BibTeX, Topics: n-body; GPGPU; parallel algorithms; performance optimization; performance analysis; fast multipole method; performance evaluation

Cong Hou, Daniel Quinlan, David Jefferson, Richard Fujimoto, and Richard Vuduc. Loop synthesis for program inversion. In Proc. 4th Wkshp. Reversible Computation, Copenhagen, Denmark, July 2012. http://www.reversible-computation.org/2012/cms.
→ Talk, PDF, BibTeX, Topics: program inversion; compilers, Acceptance rate: [23/46=50%]

Aparna Chandramowlishwaran, Jee Whan Choi, Kamesh Madduri, and Richard Vuduc. Towards a communication optimal fast multipole method and its implications for exascale. In Proc. ACM Symp. Parallel Algorithms and Architectures (SPAA), Pittsburgh, PA, USA, June 2012. Brief announcement. doi:10.1145/2312005.2312039.
→ PDF, DOI, BibTeX, Topics: performance analysis; performance modeling; performance optimization; fast multipole method; exascale; co-design; parallel algorithms; n-body

Kenneth Czechowski, Chris McClanahan, Casey Battaglino, Kartik Iyer, P.-K. Yeung, and Richard Vuduc. On the communication complexity of 3D FFTs and its implications for exascale. In Proc. ACM Int'l. Conf. Supercomputing (ICS), San Servolo Island, Venice, Italy, June 2012. doi:10.1145/2304576.2304604.
→ Talk, PDF, DOI, BibTeX, Topics: FFT; exascale; performance modeling; co-design, Acceptance rate: [36/161=22.4%]

Richard Vuduc, Kenneth Czechowski, Aparna Chandramowlishwaran, and Jee Whan Choi. Courses in high-performance computing for scientists and engineers. In Proc. NSF/TCPP Wkshp. Parallel and Distributed Computing Education (EduPar), co-located with IPDPS'12, Shanghai, China, May 2012.
→ Talk, PDF, BibTeX, Topic: education

Cong Hou, George Vulov, Daniel Quinlan, David Jefferson, Richard Fujimoto, and Richard Vuduc. A new method for program inversion. In Proc. Int'l. Conf. Compiler Construction (CC), Tallinn, Estonia, March 2012. http://www.cc.gatech.edu/~chou3/ProgramInversion.pdf.
→ Talk, BibTeX, Topics: program inversion; compilers; reverse computation; parallel discrete-event simulation, Acceptance rate: [13/51=25.5%]

Sangmin Park, Richard Vuduc, and Mary Jean Harrold. A unified approach for localizing non-deadlock concurrency bugs. In Proc. IEEE Int'l. Conf. Software Testing, Verification, and Validation (ICST), Montréal, Canada, April 2012. doi:10.1109/ICST.2012.85.
→ PDF, DOI, BibTeX, Topics: fault-localization; software engineering; debugging; testing, Acceptance rate: [39/145=26.9%]

Dongryeol Lee, Richard Vuduc, and Alexander G. Gray. A distributed kernel summation framework for general-dimension machine learning. In Proc. SIAM Int'l. Conf. Data Mining (SDM), Anaheim, CA, USA, April 2012. Winner, Best Paper.
→ PDF, BibTeX, Topics: distributed memory parallelism; statistical machine learning; data mining

Jaewoong Sim, Aniruddha Dasgupta, Hyesoon Kim, and Richard Vuduc. A performance analysis framework for identifying performance benefits in GPGPU applications. In Proc. ACM SIGPLAN Symp. Principles and Practice of Parallel Programming (PPoPP), New Orleans, LA, USA, February 2012. doi:10.1145/2145816.2145819.
→ PDF, DOI, BibTeX, Topics: GPGPU; performance analysis; performance modeling; microarchitecture; performance optimization, Acceptance rate: [26/175=14.9%]

Jaekyu Lee, Hyesoon Kim, and Richard Vuduc. When prefetching works, when it doesn't, and why. ACM Trans. Architecture and Code Optimization (TACO), 9(1), March 2012. doi:10.1145/2133382.2133384.
→ PDF, DOI, BibTeX, Topics: microarchitecture; prefetching

Sooraj Bhat, Ashish Agarwal, Richard Vuduc, and Alexander Gray. A type theory for probability density functions. In ACM SIGACT-SIGPLAN Symp. Principles of Programming Languages (POPL 2012), Philadelphia, PA, USA, January 2012. doi:10.1145/2103656.2103721.
→ PDF, DOI, BibTeX, Topics: statistical machine learning; type theory; probability theory; programming languages, Acceptance rate: [44/205=21.5%]

Richard W. Vuduc. Autotuning (definition). In David Padua, editor, Encyclopedia of Parallel Computing. Springer, 2011.
→ BibTeX, Topic: autotuning

George Vulov, Cong Hou, Richard Vuduc, Daniel Quinlan, Richard Fujimoto, and David Jefferson. The Backstroke framework for source level reverse computation applied to parallel discrete event simulation. In S. Jain, R. R. Creasey, J. Himmelspach, K.P. White, and M. Fu, editors, Proc. Winter Simulation Conf. (WSC), Phoenix, AZ, USA, December 2011. IEEE. http://www.informs-sim.org/wsc11papers/264.pdf.
→ BibTeX, Topics: compilers; program inversion; parallel discrete-event simulation

Kenneth Czechowski, Chris McClanahan, Casey Battaglino, Kartik Iyer, P.-K. Yeung, and Richard Vuduc. Prospects for scalable 3D FFTs on heterogeneous exascale systems. In Proc. ACM/IEEE Conf. Supercomputing (SC), November 2011. (poster; extended version available as Georgia Tech report GT-CSE-11-02).
→ BibTeX, Topics: exascale; FFT; GPGPU; performance analysis; performance modeling

Richard Vuduc and Kenneth Czechowski. What GPU computing means for high-end systems. IEEE Micro, 31(4):74-78, July/August 2011. doi:10.1109/MM.2011.78.
→ PDF, DOI, BibTeX, Topics: GPGPU; performance analysis; performance modeling; exascale

Raghul Gunasekaran, David Dillow, Galen Shipman, Richard Vuduc, and Edmond Chow. Characterizing application runtime behavior from system logs and metrics. In Proc. Int'l. Wkshp. Characterizing Applications for Heterogeneous Exascale Systems (CACHES), Tucson, AZ, USA, June 2011.
→ BibTeX, Topics: performance analysis; performance modeling; systems software; job scheduling

Kenneth Czechowski, Casey Battaglino, Chris McClanahan, Aparna Chandramowlishwaran, and Richard Vuduc. Balance principles for algorithm-architecture co-design. In Proc. USENIX Wkshp. Hot Topics in Parallelism (HotPar), Berkeley, CA, USA, May 2011. http://www.usenix.org/events/hotpar11/tech/final_files/Czechowski.pdf.
→ Talk, PDF, BibTeX, Topics: parallel algorithms; architecture; co-design; performance analysis, Acceptance rate: [Talks: 16/45=35.6%]

Sam Williams, Nathan Bell, Jee Choi, Michael Garland, Leonid Oliker, and Richard Vuduc. Sparse matrix vector multiplication on multicore and accelerator systems. In Jakub Kurzak, David A. Bader, and Jack Dongarra, editors, Scientific Computing with Multicore Processors and Accelerators. CRC Press, 2010.
→ BibTeX

Jaekyu Lee, Nagesh B. Lakshminarayana, Hyesoon Kim, and Richard Vuduc. Many-thread aware prefetching mechanisms for GPGPU applications. In Proc. IEEE/ACM Int'l. Symp. Microarchitecture (MICRO), Atlanta, GA, USA, December 2010. doi:10.1109/MICRO.2010.44.
→ PDF, DOI, BibTeX, Topics: GPGPU; performance evaluation; performance programming; prefetching; co-design, Acceptance rate: [45/248=18.1%]

Abtin Rahimian, Ilya Lashuk, Aparna Chandramowlishwaran, Dhairya Malhotra, Logan Moon, Rahul Sampath, Aashay Shringarpure, Shravan Veerapaneni, Jeffrey Vetter, Richard Vuduc, Denis Zorin, and George Biros. Petascale direct numerical simulation of blood flow on 200k cores and heterogeneous architectures. In Proc. ACM/IEEE Conf. Supercomputing (SC), New Orleans, LA, USA, November 2010. doi:10.1109/SC.2010.42. Winner, Gordon Bell Prize.
→ PDF, DOI, BibTeX, Topics: performance evaluation; performance programming; GPGPU; multicore; MPI; parallel algorithms; heterogeneous architectures; fast multipole method, Acceptance rate: [51/253=20.2%]

Aparna Chandramowlishwaran, Kamesh Madduri, and Richard Vuduc. Diagnosis, tuning, and redesign for multicore performance: A case study of the fast multipole method. In Proc. ACM/IEEE Conf. Supercomputing (SC), New Orleans, LA, USA, November 2010. doi:10.1109/SC.2010.19.
→ PDF, DOI, BibTeX, Topics: multicore; n-body; performance analysis; performance optimization, Acceptance rate: [51/253=20.2%]

Richard Vuduc, Aparna Chandramowlishwaran, Jee Whan Choi, Murat Efe Guney, and Aashay Shringarpure. On the limits of GPU acceleration. In Proc. USENIX Wkshp. Hot Topics in Parallelism (HotPar), Berkeley, CA, USA, June 2010.
→ PDF, BibTeX, Topics: multicore; GPGPU; sparse linear algebra; n-body, Acceptance rate: [Talks: 16/68=23.5%]

Sooraj Bhat, Ashish Agarwal, Alexander Gray, and Richard Vuduc. Toward interactive statistical modeling. Procedia Computer Science, 1(1):1829-1838, May-June 2010. doi:10.1016/j.procs.2010.04.205. Proc. Int'l. Conf. Computational Science (ICCS), Wkshp. Automated Program Generation for Computational Science (APGCS).
→ PDF, DOI, BibTeX, Topics: machine learning; algorithm derivation; interactive modeling; type theory, Acceptance rate: [10/21=47.6%]

Aparna Chandramowlishwaran, Kathleen Knobe, and Richard Vuduc. Performance evaluation of Concurrent Collections on high-performance multicore computing systems. Technical Report GT-CSE-10-01, Georgia Institute of Technology, Atlanta, GA, USA, February 2010.
→ BibTeX, Topics: parallel programming models; dense linear algebra; multicore

Aparna Chandramowlishwaran, Samuel Williams, Leonid Oliker, Ilya Lashuk, George Biros, and Richard Vuduc. Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures. In Proc. IEEE Int'l. Parallel and Distributed Processing Symp. (IPDPS), Atlanta, GA, USA, April 2010.
→ PDF, BibTeX, Topics: n-body; multicore; performance analysis; performance optimization, Acceptance rate: [127/527=24.1%]

Aparna Chandramowlishwaran, Kathleen Knobe, and Richard Vuduc. Performance evaluation of Concurrent Collections on high-performance multicore computing systems. In Proc. IEEE Int'l. Parallel and Distributed Processing Symp. (IPDPS), Atlanta, GA, USA, April 2010. doi:10.1109/IPDPS.2010.5470404. Winner, Best Paper (software track).
→ PDF, DOI, BibTeX, Topics: CnC; parallel programming models; dense linear algebra; multicore, Acceptance rate: [127/527=24.1%]

Aparna Chandramowlishwaran, Kathleen Knobe, and Richard Vuduc. Applying the Concurrent Collections programming model to asynchronous parallel dense linear algebra. In Proc. ACM SIGPLAN Symp. Principles and Practice of Parallel Programming (PPoPP), Bangalore, India, January 2010. (poster). doi:10.1145/1693453.1693506.
→ PDF, DOI, BibTeX, Topics: parallel programming models; dense linear algebra; multicore

Sangmin Park, Richard W. Vuduc, and Mary Jean Harrold. FALCON: Fault localization for concurrent programs. In Proc. ACM/IEEE Int'l. Conf. Software Eng., Cape Town, South Africa, May 2010. doi:10.1145/1806799.1806838.
→ PDF, DOI, BibTeX, Topics: testing; debugging; fault-localization; concurrency, Acceptance rate: [52/380=13.7%]

Jee Whan Choi, Amik Singh, and Richard W. Vuduc. Model-driven autotuning of sparse matrix-vector multiply on GPUs. In Proc. ACM SIGPLAN Symp. Principles and Practice of Parallel Programming (PPoPP), Bangalore, India, January 2010. doi:10.1145/1693453.1693471.
→ PDF, DOI, BibTeX, Topics: sparse linear algebra; autotuning; GPGPU; performance modeling; performance optimization, Acceptance rate: [29/173=16.8%]

Chunhua Liao, Daniel J. Quinlan, Richard Vuduc, and Thomas Panas. Effective source-to-source outlining to support whole program empirical optimization. In Proc. Int'l. Wkshp. Languages and Compilers for Parallel Computing (LCPC), volume LNCS, Newark, DE, USA, October 2009. doi:10.1007/978-3-642-13374-9_21.
→ DOI, BibTeX, Topics: compilers; autotuning; outlining

Nitin Arora, Ryan P. Russell, and Richard W. Vuduc. Fast sensitivity computations for numerical optimizations. In Proc. AAS/AIAA Astrodynamics Specialist Conference, AAS 09-435, Pittsburgh, PA, USA, August 2009. http://soliton.ae.gatech.edu/people/rrussell/FinalPublications/ConferencePapers/09AugAAS_09-392_p2pLowthrust.pdf.
→ PDF, BibTeX, Topics: numerical optimization; sensitivity; GPGPU; astrodynamics

Manisha Gajbe, Andrew Canning, John Shalf, Lin-Wang Wang, Harvey Wasserman, and Richard Vuduc. Auto-tuning distributed-memory 3-dimensional fast Fourier transforms on the Cray XT4. In Proc. Cray User's Group (CUG) Meeting, Atlanta, GA, USA, May 2009. http://www.cug.org/5-publications/proceedings_attendee_lists/CUG09CD/S09_Proceedings/pages/authors/11-15Wednesday/14C-Gajbe/GAJBE-paper.pdf.
→ BibTeX, Topics: autotuning; performance analysis; performance optimization; FFT

Sundaresan Venkatasubramanian and Richard W. Vuduc. Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU platforms. In Proc. ACM Int'l. Conf. Supercomputing (ICS), New York, NY, USA, June 2009. doi:http://dx.doi.org/10.1145/1542275.1542312.
→ PDF, DOI, BibTeX, Topics: asynchronous iteration; GPGPU; heterogeneous architectures; performance optimization, Acceptance rate: [47/191=25%]

Nitin Arora, Aashay Shringarpure, and Richard Vuduc. Direct n-body kernels for multicore platforms. In Proc. Int'l. Conf. Parallel Processing (ICPP), Vienna, Austria, September 2009. doi:http://dx.doi.org/10.1109/ICPP.2009.71.
→ PDF, DOI, BibTeX, Topics: multicore; n-body; performance analysis; performance optimization, Acceptance rate: [71/220=32.3%]

Ilya Lashuk, Aparna Chandramowlishwaran, Harper Langston, Tuan-Anh Nguyen, Rahul Sampath, Aashay Shringarpure, Richard Vuduc, Lexing Ying, Denis Zorin, and George Biros. A massively parallel adaptive fast multipole method on heterogeneous architectures. In Proc. ACM/IEEE Conf. Supercomputing (SC), Portland, OR, USA, November 2009. doi:http://doi.acm.org/10.1145/1654059.1654118. Finalist, Best Paper.
→ PDF, DOI, BibTeX, Topics: n-body; multicore; GPGPU; MPI; parallel algorithms, Acceptance rate: [59/261=22.6%]

Seunghwa Kang, David Bader, and Richard Vuduc. Understanding the design trade-offs among current multicore systems for numerical computations. In Proc. IEEE Int'l. Parallel and Distributed Processing Symp. (IPDPS), Rome, Italy, May 2009. doi:http://doi.ieeecomputersociety.org/10.1109/IPDPS.2009.5161055.
→ PDF, DOI, BibTeX, Topics: statistical models; n-body; multicore; performance analysis; performance optimization, Acceptance rate: [101/440=23.0%]

Sam Williams, Richard Vuduc, Leonid Oliker, John Shalf, Katherine Yelick, and James Demmel. Optimizing sparse matrix-vector multiply on emerging multicore platforms. Parallel Computing (ParCo), 35(3):178-194, March 2009. doi:10.1016/j.parco.2008.12.006. Extends conference version: http://dx.doi.org/10.1145/1362622.1362674.
→ PDF, DOI, BibTeX, Topics: sparse linear algebra; multicore; autotuning

Aparna Chandramowlishwaran, Abhinav Karhu, Ketan Umare, and Richard Vuduc. Numerical algorithms with tunable parallelism. In Proc. Wkshp. Software Tools for Multicore Systems (STMCS), at IEEE/ACM Int'l. Symp. Code Generation and Optimization (CGO), Boston, MA, USA, April 2008. http://people.csail.mit.edu/rabbah/conferences/08/cgo/stmcs/papers/vuduc-stmcs08.pdf.
→ PDF, BibTeX, Topics: autotuning; asynchronous variational integration; asynchronous iteration

Thomas Panas, Dan Quinlan, and Richard Vuduc. Tool support for inspecting the code quality of HPC applications. In Proc. Wkshp. Software Eng. for High-Performance Computing Applications (SE-HPC), at ACM/IEEE Int'l. Conf. Software Eng. (ICSE), Minneapolis, MN, USA, May 2007. doi:http://dx.doi.org/10.1109/SE-HPC.2007.8.
→ PDF, DOI, BibTeX, Topics: program visualization; software engineering

Thomas Panas, Dan Quinlan, and Richard Vuduc. Analyzing and visualizing whole program architectures. In Proc. Wkshp. Aerospace Software Engineering (AeroSE), at ACM/IEEE Int'l. Conf. Software Eng. (ICSE), Minneapolis, MN, USA, May 2007. Also: Lawrence Livermore National Laboratory Technical Report UCRL-PROC-231453. http://www.osti.gov/bridge/servlets/purl/909924-c8K5TR/909924.pdf.
→ PDF, BibTeX, Topics: program visualization; software engineering

Dan Quinlan, Richard Vuduc, and Ghassan Misherghi. Techniques for specifying bug patterns. In Proc. ACM Wkshp. Parallel and Distributed Systems: Testing and Debugging (PADTAD), at Int'l. Symp. Software Testing and Analysis (ISSTA), Portland, ME, USA, July 2007. doi:http://doi.acm.org/10.1145/1273647.1273654. Winner, Best Paper.
→ PDF, DOI, BibTeX, Topics: software security; compilers; debugging

Sam Williams, Leonid Oliker, Richard Vuduc, John Shalf, Katherine Yelick, and James Demmel. Optimization of sparse matrix-vector multiplication on emerging multicore platforms. In Proc. ACM/IEEE Conf. Supercomputing (SC), 2007. doi:http://dx.doi.org/10.1145/1362622.1362674.
→ DOI, BibTeX, Topics: sparse linear algebra; autotuning; multicore; performance analysis; performance optimization, Acceptance rate: [54/268=20.1%]

Rajesh Nishtala, Richard Vuduc, James W. Demmel, and Katherine A. Yelick. When cache blocking sparse matrix vector multiply works and why. Applicable Algebra in Engineering, Communication, and Computing: Special Issue on Computational Linear Algebra and Sparse Matrix Computations, March 2007. doi:http://dx.doi.org/10.1007/s00200-007-0038-9.
→ DOI, BibTeX, Topics: sparse linear algebra; performance analysis; performance optimization

Qing Yi, Keith Seymour, Haihang You, Richard Vuduc, and Dan Quinlan. POET: Parameterized Optimizations for Empirical Tuning. In Proc. Wkshp. Performance Optimization of High-level Languages and Libraries (POHLL), at IEEE Int'l. Par. Distrib. Processing Symp. (IPDPS), pages 1-8, Long Beach, CA, USA, March 2007. doi:http://dx.doi.org/10.1109/IPDPS.2007.370637.
→ PDF, DOI, BibTeX, Topics: compilers; autotuning; program generation

Dan Quinlan, Markus Schordan, Richard Vuduc, and Qing Yi. Annotating user-defined abstractions for optimization. In Proc. Wkshp. Performance Optimization of High-level Languages and Libraries (POHLL), at IEEE Int'l. Par. Distrib. Processing Symp. (IPDPS), Rhodes, Greece, April 2006. doi:http://dx.doi.org/10.1109/IPDPS.2006.1639722.
→ DOI, BibTeX, Topic: compilers

Dan Quinlan, Richard Vuduc, Thomas Panas, Jochen Härdtlein, and Andreas Sæbjørnsen. Support for whole-program analysis and the verification of the one-definition rule in C++. In Proc. Static Analysis Summit (SAS), volume NIST Special Publication 500-262, pages 27-35, 2006. http://samate.nist.gov/docs/NIST_Special_Publication_500-262.pdf.
→ PDF, BibTeX, Topics: program analysis; C++; one-definition rule; software security; compilers

Richard Vuduc, Martin Schulz, Dan Quinlan, and Bronis de Supinski. Improving distributed memory applications testing by message perturbation. In Proc. ACM Wkshp. Parallel and Distributed Systems: Testing and Debugging (PADTAD), at Int'l. Symp. Software Testing and Analysis (ISSTA), Portland, ME, USA, July 2006. doi:http://dx.doi.org/10.1145/1147403.1147409. Winner, Best Paper.
→ PDF, DOI, BibTeX, Topics: MPI; testing; debugging; irritators

Yuan Zhao, Qing Yi, Ken Kennedy, Dan Quinlan, and Richard Vuduc. Parameterizing loop fusion for automated empirical tuning. Technical Report UCRL-TR-217808, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, California, USA, December 2005. doi:http://dx.doi.org/10.2172/890608.
→ DOI, BibTeX, Topics: compilers; autotuning

Dan Quinlan, Shmuel Ur, and Richard Vuduc. An extensible open-source compiler infrastructure for testing. In Proc. IBM Haifa Verification Conf. (VC), volume LNCS 3875, pages 116-133, Haifa, Israel, November 2005. Springer Berlin / Heidelberg. doi:http://dx.doi.org/10.1007/11678779_9.
→ PDF, DOI, BibTeX, Topics: compilers; testing

Richard Vuduc, James W. Demmel, and Katherine A. Yelick. OSKI: A library of automatically tuned sparse matrix kernels. In Proc. SciDAC, J. Physics: Conf. Ser., volume 16, pages 521-530, 2005. doi:http://dx.doi.org/10.1088/1742-6596/16/1/071.
→ DOI, BibTeX, Topics: sparse linear algebra; autotuning; performance optimization

Richard W. Vuduc and Hyun-Jin Moon. Fast sparse matrix-vector multiplication by exploiting variable block structure. In Proc. High-Performance Computing and Communications Conf. (HPCC), volume LNCS 3726, pages 807-816, Sorrento, Italy, September 2005. Springer. doi:http://dx.doi.org/10.1007/11557654_91.
→ DOI, BibTeX, Topics: sparse linear algebra; autotuning; performance optimization, Acceptance rate: [116/387=30%]

James Demmel, Jack Dongarra, Viktor Eijkhout, Erika Fuentes, Antoine Petitet, Richard Vuduc, R. Clint Whaley, and Katherine Yelick. Self-adapting linear algebra algorithms and software. Proc. IEEE, 93(2):293-312, February 2005. doi:http://dx.doi.org/10.1109/JPROC.2004.840848.
→ DOI, BibTeX, Topics: dense linear algebra; sparse linear algebra; autotuning

Benjamin C. Lee, Richard Vuduc, James Demmel, and Katherine Yelick. Performance models for evaluation and automatic tuning of symmetric sparse matrix-vector multiply. In Proc. Int'l. Conf. Parallel Processing (ICPP), Montreal, Canada, August 2004. doi:http://dx.doi.org/10.1109/ICPP.2004.1327917. Winner, Best Paper.
→ DOI, BibTeX, Topics: sparse linear algebra; performance modeling; autotuning, Acceptance rate: [65/190=34.2%]

Eun-Jin Im, Katherine Yelick, and Richard Vuduc. SPARSITY: Optimization framework for sparse matrix kernels. Int'l. J. High Performance Computing Applications (IJHPCA), 18(1):135-158, February 2004. doi:http://dx.doi.org/10.1177/1094342004041296.
→ DOI, BibTeX, Topics: sparse linear algebra; autotuning; performance modeling; performance optimization

Richard W. Vuduc. Automatic performance tuning of sparse matrix kernels. PhD thesis, University of California, Berkeley, CA, USA, January 2004. http://bebop.cs.berkeley.edu/pubs/vuduc2003-dissertation.pdf.
→ BibTeX, Topics: performance analysis; performance modeling; performance optimization; autotuning; sparse linear algebra; statistical models

Richard Vuduc, James Demmel, and Jeff Bilmes. Statistical models for empirical search-based performance tuning. Int'l. J. High Performance Computing Applications (IJHPCA), 18(1):65-94, 2004. doi:10.1177/1094342004041293. Extends conference version: http://dx.doi.org/10.1007/3-540-45545-0_21.
→ DOI, BibTeX, Topics: statistical models; autotuning; survey; dense linear algebra; performance analysis

Richard Vuduc, Attila Gyulassy, James W. Demmel, and Katherine A. Yelick. Memory hierarchy optimizations and bounds for sparse A^T A x. In Proc. Wkshp. Parallel Linear Algebra (PLA), at Int'l. Conf. Computational Sci. (ICCS), volume LNCS 2659, pages 705-714, Melbourne, Australia, June 2003. Springer Berlin / Heidelberg. doi:http://dx.doi.org/10.1007/3-540-44863-2_69.
→ DOI, BibTeX, Topics: sparse linear algebra; autotuning; performance modeling

Richard Vuduc, James W. Demmel, Katherine A. Yelick, Shoaib Kamil, Rajesh Nishtala, and Benjamin Lee. Performance optimizations and bounds for sparse matrix-vector multiply. In Proc. ACM/IEEE Conf. Supercomputing (SC), Baltimore, MD, USA, November 2002. Finalist, Best Student Paper. http://portal.acm.org/citation.cfm?id=762822.
→ BibTeX, Topics: sparse linear algebra; performance modeling; autotuning

Richard Vuduc, Shoaib Kamil, Jen Hsu, Rajesh Nishtala, James W. Demmel, and Katherine A. Yelick. Automatic performance tuning and analysis of sparse triangular solve. In Proc. Wkshp. Performance Optimization of High-level Languages and Libraries (POHLL), at ACM Int'l. Conf. Supercomputing (ICS), New York, USA, June 2002. Winner, Best Presentation; Winner, Best Student Paper. http://www.ece.lsu.edu/jxr/pohll-02/papers/vuduc.pdf.
→ BibTeX, Topics: sparse linear algebra; performance modeling; autotuning

Richard Vuduc, James W. Demmel, and Jeff A. Bilmes. Statistical models for empirical search-based performance tuning. In Proc. Int'l. Conf. Computational Science (ICCS), volume LNCS 2073, pages 117-126, San Francisco, CA, USA, May 2001. Springer Berlin / Heidelberg. Extends workshop version: http://www.eecs.harvard.edu/~smith/fddo3/papers/107.ps. doi:http://dx.doi.org/10.1007/3-540-45545-0_21.
→ DOI, BibTeX, Topics: statistical models; autotuning; dense linear algebra

Richard Vuduc and James W. Demmel. Code generators for automatic tuning of numerical kernels: Experiences with FFTW. In Proc. Semantics, Applications, and Implementation of Program Generation (SAIG), at ACM SIGPLAN Int'l. Conf. Functional Programming (ICFP), Montréal, Canada, September 2000. doi:http://dx.doi.org/10.1007/3-540-45350-4_14.
→ PDF, DOI, BibTeX, Topics: program generation; signal processing; autotuning; FFT

Richard Vuduc, James Demmel, and Jeff Bilmes. Statistical modeling of feedback data in an automatic tuning system. In Proc. ACM Wkshp. Feedback-Directed Dynamic Optimization (FDDO), at Int'l. Symp. Microarchitecture (MICRO), Monterey, CA, USA, December 2000. Winner, Best Presentation. http://www.eecs.harvard.edu/~smith/fddo3/papers/107.ps.
→ BibTeX, Topics: dense linear algebra; statistical models; autotuning; performance modeling

Danyel Fisher, Kris Hildrum, Jason Hong, Mark Newman, Megan Thomas, and Richard Vuduc. SWAMI: A framework for collaborative filtering algorithm development and evaluation. In Proc. ACM Conf. Research and Development in Information Retrieval (SIGIR), pages 366-368, Athens, Greece, July 2000. (poster). doi:http://dx.doi.org/10.1145/345508.345658.
→ PDF, DOI, BibTeX, Topic: collaborative filtering

E. Jason Riedy and Richard Vuduc. Microbenchmarking the Tera MTA. http://vuduc.org/pubs/riedy99-tera-report.pdf, May 1998.
→ PDF, BibTeX, Topics: multithreaded architectures; benchmarking; performance analysis

Bohdan Balko, Irvin W. Kay, Richard Vuduc, and John W. Neuberger. Recovery of superfluorescence in inhomogeneously broadened systems through rapid relaxation. Phys. Rev. B, 55(18):12079-12085, May 1997. doi:http://dx.doi.org/10.1103/PhysRevB.55.12079.
→ PDF, DOI, BibTeX, Topic: gamma-ray lasers

Bohdan Balko, Irvin W. Kay, James D. Silk, Richard Vuduc, and John W. Neuberger. Superfluorescence in the presence of inhomogeneous broadening. Hyperfine Interactions: Special Issue on the Gamma-Ray Laser, 107(1-4):369-379, June 1997. doi:http://dx.doi.org/10.1023/A:1012020225589.
→ DOI, BibTeX, Topic: gamma-ray lasers

Bohdan Balko, Irvin Kay, Richard Vuduc, and John Neuberger. An investigation of the possible enhancement of nuclear superfluorescence through crystalline and hyperfine interaction effects. In Proc. Lasers '95, page 308, 1996.
→ BibTeX, Topic: gamma-ray lasers

Automatically created on Sun Mar 30 21:08:47 2014 by yab2web.