Large Language Models (LLMs) have significantly advanced natural language processing (NLP), establishing new benchmarks across a wide range of tasks. However, their large parameter counts make deployment on resource-constrained devices difficult. Current compression techniques often treat all layers uniformly, disregarding the operational differences between layers, so the same compression can affect performance very differently from one layer to the next. In this paper, we introduce a novel method named Layer-Selective Pruning via low-Rank Approximation and Sparsification Method (LS-PRISM), which efficiently compresses LLMs while preserving their performance on key NLP benchmarks such as BoolQ, RTE, and ARC-Challenge. LS-PRISM dynamically applies low-rank approximation to selected matrices within each model layer based on their impact on accuracy and loss, with ranks adaptively determined using a Dynamic Rank Selection method; approximations that improve performance are retained and the other matrices are left unaltered. Additionally, we employ unstructured pruning on the remaining matrices to further sparsify the model, followed by optional fine-tuning using LoRA to recover lost performance. Experimental results demonstrate that LS-PRISM achieves significant reductions in both parameter count and storage, with minimal degradation in accuracy. Specifically, for a 2.5B parameter LLM, we achieve up to a 12% reduction in parameters while maintaining performance comparable to the original model. We also explore the method's applicability to even smaller models and discuss the observed performance differences. LS-PRISM offers a scalable and effective solution for compressing LLMs in resource-constrained environments.
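The two compression steps named in the abstract, low-rank factorization and unstructured pruning, can be illustrated with a small sketch. This is not the LS-PRISM implementation: the function names, the magnitude-based pruning criterion, and the example matrix sizes are assumptions chosen for illustration, and the abstract's Dynamic Rank Selection step is not modeled here.

```python
def low_rank_param_count(rows, cols, rank):
    """Parameters needed to store a rows x cols matrix as two factors
    of shapes (rows x rank) and (rank x cols)."""
    return rank * (rows + cols)


def break_even_rank(rows, cols):
    """Largest rank at which the factored form is strictly smaller
    than the dense matrix, i.e. rank * (rows + cols) < rows * cols."""
    return (rows * cols - 1) // (rows + cols)


def magnitude_prune(matrix, sparsity):
    """Unstructured pruning (assumed criterion: smallest magnitude).

    Zeroes a `sparsity` fraction of the entries of `matrix`
    (a list of lists) and returns a new matrix.
    """
    magnitudes = sorted(abs(v) for row in matrix for v in row)
    k = int(len(magnitudes) * sparsity)  # number of entries to zero
    if k == 0:
        return [row[:] for row in matrix]
    threshold = magnitudes[k - 1]
    removed = 0
    pruned = []
    for row in matrix:
        new_row = []
        for v in row:
            # Cap at k removals so ties at the threshold do not over-prune.
            if removed < k and abs(v) <= threshold:
                new_row.append(0.0)
                removed += 1
            else:
                new_row.append(v)
        pruned.append(new_row)
    return pruned


if __name__ == "__main__":
    # Hypothetical 4096 x 4096 weight matrix factored at rank 256.
    dense = 4096 * 4096
    factored = low_rank_param_count(4096, 4096, 256)
    print(f"dense={dense}, factored={factored}, ratio={factored / dense:.3f}")
    print("break-even rank:", break_even_rank(4096, 4096))
    print(magnitude_prune([[0.1, -2.0], [0.05, 3.0]], 0.5))
```

The break-even rank shows why rank selection matters: a factorization only saves parameters when the chosen rank is below `rows * cols / (rows + cols)`, so a matrix that needs a higher rank to keep accuracy is a better candidate for pruning than for factorization.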
DOI: http://dx.doi.org/10.1016/j.neunet.2025.107909
Neural Netw
August 2025
Faculty of Electronics, Photonics, and Microsystems, Wroclaw University of Science and Technology, Wybrzeze Wyspianskiego 27, Wroclaw, 50-370, Poland.
Convolutional neural networks (CNNs) are among the most widely used machine learning models for computer vision tasks, such as image classification. To improve the efficiency of CNNs, many compression approaches have been developed. Low-rank methods approximate the original convolutional kernel with a sequence of smaller convolutional kernels, leading to reduced storage and time complexities.
J Chem Theory Comput
August 2025
State Key Laboratory of Precision and Intelligent Chemistry, and Department of Chemical Physics, University of Science and Technology of China, Hefei, Anhui 230026, China.
The second-order Møller-Plesset perturbation (MP2) theory is a post-Hartree-Fock method widely used to describe weak correlation energies in solids and molecules, but its computational cost is high, conventionally scaling as $O(N^5)$ with system size. Herein, we present an accurate and efficient implementation of MP2 within the plane-wave (PW) basis set for both periodic and molecular systems, which incorporates the interpolative separable density fitting (ISDF) decomposition and the Laplace transformation (LT) of the energy denominator. These innovations avoid the direct construction of electron repulsion integrals (ERIs) and reduce the computational complexity of MP2 to a lower order.
IFAC Pap OnLine
September 2024
ECE Dept., Northeastern University, Boston, MA 02115 USA.
This paper considers the problem of non-parametric identification of low-order models from time-domain experimental data using a combination of Carathéodory-Fejér and Loewner-based interpolation, followed by a Loewner matrix Balanced Reduction (LBR) step. As we show in the paper, the Loewner matrix is an estimator for the trace norm of a system, playing a role similar to that of the Hankel matrix. However, using Zolotarev numbers to establish decay rate bounds for singular values reveals that the singular values of the Loewner matrix decay considerably faster than those of the Hankel matrix.
Phys Med Biol
August 2025
Department of Radiation Oncology, UT Southwestern Medical Center, 2280 Inwood Road, Dallas, Texas 75235, United States.
Based on a 3D pre-treatment magnetic resonance (MR) scan, we developed DREME-MR to jointly reconstruct the reference patient anatomy and solve a data-driven, patient-specific cardiorespiratory motion model. Via a motion encoder simultaneously learned during the reconstruction, DREME-MR further enables real-time volumetric MR imaging and cardiorespiratory motion tracking with minimal intra-treatment k-space data. Approach: DREME-MR integrates dynamic MRI reconstruction and real-time MR imaging into a unified, dual-task learning framework.
Biophys J
August 2025
LPTMS, CNRS, Université Paris-Saclay, Orsay, France.
Fluorescence recovery after photobleaching (FRAP) is broadly used to investigate the dynamics of molecules in cells and tissues, notably to quantify diffusion coefficients. FRAP is based on the spatiotemporal imaging of fluorescent molecules after an initial bleaching of fluorescence in a region of the sample. Although a large number of methods have been developed to infer kinetic parameters from experiments, it is still a challenge to fully characterize molecular dynamics from noisy experiments in which diffusion is coupled to other molecular processes or in which the initial bleaching profile is not perfectly controlled.