Increased computational performance for vector operations on BLAS-1
Keywords:
Scientific computing, BLAS-1, unroll technique, vector programming

Abstract
The function library known as the Basic Linear Algebra Subprograms (BLAS-1) is considered a programming standard in scientific computing. In this work, we analyze various code optimization techniques for increasing the computational performance of BLAS-1. In particular, we take a combinational approach, exploring coding methods that pair the unroll technique, at different depths of unrolling, with vector programming using MMX and SSE on Intel processors. For the main BLAS-1 functions, we numerically determined a performance increase, expressed in megaflops, of up to 52% over the optimized BLAS-1 ATLAS library.
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
The opinions expressed by the authors do not necessarily reflect the position of the journal's publisher or of UCLA. Total or partial reproduction of the texts published here is authorized, provided the complete source and the electronic address of this journal are cited.
The authors fully retain the rights to their works, granting the journal the right of first publication of the article. The authors may use their articles for any non-commercial purpose. Authors are encouraged to disseminate the final published version of their articles through the digital media of the institutions with which they are affiliated or through personal digital media.