Volume 32, Number 12, December 2021
Modeling and Analyzing Waiting Policies for Cloud-Enabled Schedulers.

Pradeep Ambati Noman Bashir David E. Irwin Prashant J. Shenoy

An Elastic Task Scheduling Scheme on Coarse-Grained Reconfigurable Architectures.

Longlong Chen Jianfeng Zhu Yangdong Deng Zhaoshi Li Jian Chen Xiaowei Jiang Shouyi Yin Shaojun Wei Leibo Liu

Parallel Fine-Grained Comparison of Long DNA Sequences in Homogeneous and Heterogeneous GPU Platforms With Pruning.

Marco Antonio C. de Figueiredo João Paulo Navarro Edans Flavius de Oliveira Sandes George Teodoro Alba C. M. A. Melo

Logically Parallel Communication for Fast MPI+Threads Applications.

Rohit Zambre Damodar Sahasrabudhe Hui Zhou Martin Berzins Aparna Chandramowlishwaran Pavan Balaji

A Unified Framework for Flexible Playback Latency Control in Live Video Streaming.

Guanghui Zhang Jack Y. B. Lee Ke Liu Haibo Hu Vaneet Aggarwal

Self-Stabilizing Population Protocols With Global Knowledge.

Yuichi Sudo Masahiro Shibata Junya Nakamura Yonghwan Kim Toshimitsu Masuzawa

ReliableBox: Secure and Verifiable Cloud Storage With Location-Aware Backup.

Tao Jiang Wenjuan Meng Xu Yuan Liangmin Wang Jianhua Ge Jianfeng Ma

Fast, Accurate Processor Evaluation Through Heterogeneous, Sample-Based Benchmarking.

Pablo Prieto Pablo Abad Fidalgo José-Ángel Gregorio Valentin Puente

Scalable Energy Games Solvers on GPUs.

Andrea Formisano Raffaella Gentilini Flavio Vella

Architectural Adaptation and Performance-Energy Optimization for CFD Application on AMD EPYC Rome.

Lukasz Szustak Roman Wyrzykowski Lukasz Kuczynski Tomasz Olas

Towards Efficient Distributed Subgraph Enumeration Via Backtracking-Based Framework.

Zhaokang Wang Weiwei Hu Guowang Chen Chunfeng Yuan Rong Gu Yihua Huang

Identifying Degree and Sources of Non-Determinism in MPI Applications Via Graph Kernels.

Dylan Chapp Nigel Tan Sanjukta Bhowmick Michela Taufer

GML: Efficiently Auto-Tuning Flink's Configurations Via Guided Machine Learning.

Yijin Guo Huasong Shan Shixin Huang Kai Hwang Jianping Fan Zhibin Yu

RENDA: Resource and Network Aware Data Placement Algorithm for Periodic Workloads in Cloud.

Hiren Kumar Thakkar Prasan Kumar Sahoo Bharadwaj Veeravalli

A Quantum Approach Towards the Adaptive Prediction of Cloud Workloads.

Ashutosh Kumar Singh Deepika Saxena Jitendra Kumar Vrinda Gupta

ARENA: Asynchronous Reconfigurable Accelerator Ring to Enable Data-Centric Parallel Computing.

Cheng Tan Chenhao Xie Tong Geng Andres Marquez Antonino Tumeo Kevin J. Barker Ang Li

Tardiness Bounds for Sporadic Gang Tasks Under Preemptive Global EDF Scheduling.

Zheng Dong Kecheng Yang Nathan Fisher Cong Liu


Volume 32, Number 11, November 2021
High Performance Multivariate Geospatial Statistics on Manycore Systems.

Mary Lai O. Salvaña Sameh Abdulah Huang Huang Hatem Ltaief Ying Sun Marc G. Genton David E. Keyes

Multi-Queue Request Scheduling for Profit Maximization in IaaS Clouds.

Shuang Wang Xiaoping Li Quan Z. Sheng Rubén Ruiz Jinquan Zhang Amin Beheshti

An Incremental Iterative Acceleration Architecture in Distributed Heterogeneous Environments With GPUs for Deep Learning.

Xuedong Zhang Zhuo Tang Lifan Du Li Yang

A Split Execution Model for SpTRSV.

Najeeb Ahmad Buse Yilmaz Didem Unat

Efficient Virtual Network Embedding of Cloud-Based Data Center Networks into Optical Networks.

Weibei Fan Fu Xiao Xiaobai Chen Lei Cui Shui Yu

Offloading Tasks With Dependency and Service Caching in Mobile Edge Computing.

Gongming Zhao Hongli Xu Yangming Zhao Chunming Qiao Liusheng Huang

Improved MPC Algorithms for Edit Distance and Ulam Distance.

Mahdi Boroujeni Mohammad Ghodsi Saeed Seddighin

WindFlow: High-Speed Continuous Stream Processing With Parallel Building Blocks.

Gabriele Mencagli Massimo Torquati Andrea Cardaci Alessandra Fais Luca Rinaldi Marco Danelutto

Memory-Side Prefetching Scheme Incorporating Dynamic Page Mode in 3D-Stacked DRAM.

Muhammad M. Rafique Zhichun Zhu

MCFsyn: A Multi-Party Set Reconciliation Protocol With the Marked Cuckoo Filter.

Lailong Luo Deke Guo Yawei Zhao Ori Rottenstreich Richard T. B. Ma Xueshan Luo

Timestamped State Sharing for Stream Analytics.

Yunjian Zhao Zhi Liu Yidi Wu Guanxian Jiang James Cheng Kunlong Liu Xiao Yan

Efficient Forwarding Anomaly Detection in Software-Defined Networks.

Qi Li Yunpeng Liu Zhuotao Liu Peng Zhang Chunhui Pang

Coflow Scheduling in Data Centers: Routing and Bandwidth Allocation.

Li Shi Yang Liu Junwei Zhang Thomas G. Robertazzi

Trust: Triangle Counting Reloaded on GPUs.

Santosh Pandey Zhibin Wang Sheng Zhong Chen Tian Bolong Zheng Xiaoye Li Lingda Li Adolfy Hoisie Caiwen Ding Dong Li Hang Liu

Critique of "Planetary Normal Mode Computation: Parallel Algorithms, Performance, and Reproducibility" by SCC Team From Peking University.

Yihua Cheng Zejia Fan Jing Mai Yifan Wu Pengcheng Xu Yuxuan Yan Zhenxin Fu Yun Liang

Critique of "Planetary Normal Mode Computation: Parallel Algorithms, Performance, and Reproducibility" by SCC Team From University of Washington.

David Liu Matthew Cinnamon Thorne Garvin Andrei Karavanov Sungchan Park Darius Strobeck Andrew Lumsdaine

Critique of "Planetary Normal Mode Computation: Parallel Algorithms, Performance, and Reproducibility" by SCC Team From University of Warsaw.

Marek Masiak Iwona Kotlarska Łukasz Kondraciuk Maciej Szpindler

Critique of "Planetary Normal Mode Computation: Parallel Algorithms, Performance, and Reproducibility" by SCC Team From Tsinghua University.

Chen Zhang Chenggang Zhao Jiaao He Shengqi Chen Liyan Zheng Kezhao Huang Wentao Han Jidong Zhai

Critique of "Planetary Normal Mode Computation: Parallel Algorithms, Performance, and Reproducibility" by SCC Team From ETH Zurich.

Manuel Burger Jan Kleine

Critique of "Planetary Normal Mode Computation: Parallel Algorithms, Performance, and Reproducibility" by SCC Team From National Tsing Hua University.

Wei-Fang Sun Hung-Hsin Chen ShaoFu Lin YuanChing Lin Jing-Wei Wu En-Te Lin Jerry Chou

Planetary Normal Mode Computation: Parallel Algorithms, Performance, and Reproducibility.

Jia Shi Ruipeng Li Yuanzhe Xi Yousef Saad Maarten V. de Hoop

Transparency and Reproducibility Practice in Large-Scale Computational Science: A Preface to the Special Section.

Beth Plale Stephen Lien Harrell

Guest Editorial: Special Section on SC19 Student Cluster Competition.

Manish Parashar


Volume 32, Number 10, October 2021
Decentralized Dual Proximal Gradient Algorithms for Non-Smooth Constrained Composite Optimization Problems.

Huaqing Li Jinhui Hu Liang Ran Zheng Wang Qingguo Lü Zhenyuan Du Tingwen Huang

LightChain: Scalable DHT-Based Blockchain.

Yahya Hassanzadeh-Nazarabadi Alptekin Küpçü Öznur Özkasap

Data Life Aware Model Updating Strategy for Stream-Based Online Deep Learning.

Wei Rang Donglin Yang Dazhao Cheng Yu Wang

Virtualization Overhead of Multithreading in X86 State-of-the-Art & Remaining Challenges.

Stijn Schildermans Jianchen Shan Kris Aerts Jason Jackrel Xiaoning Ding

Efficient Data Loader for Fast Sampling-Based GNN Training on Large Graphs.

Youhui Bai Cheng Li Zhiqi Lin Yufei Wu Youshan Miao Yunxin Liu Yinlong Xu

VeriML: Enabling Integrity Assurances and Fair Payments for Machine Learning as a Service.

Lingchen Zhao Qian Wang Cong Wang Qi Li Chao Shen Bo Feng

DTransE: Distributed Translating Embedding for Knowledge Graph.

Dandan Song Feng Zhang Meiyan Lu Sicheng Yang Heyan Huang

FRATO: Fog Resource Based Adaptive Task Offloading for Delay-Minimizing IoT Service Provisioning.

Hoa Tran-Dang Dong-Seong Kim

Group Reassignment for Dynamic Edge Partitioning.

He Li Hang Yuan Jianbin Huang Jiangtao Cui Xiaoke Ma Senzhang Wang Jaesoo Yoo Philip S. Yu

The Case for Cross-Component Power Coordination on Power Bounded Systems.

Rong Ge Xizhou Feng Tyler Allen Pengfei Zou

Spartan: A Sparsity-Adaptive Framework to Accelerate Deep Neural Network Training on GPUs.

Shi Dong Yifan Sun Nicolas Bohm Agostini Elmira Karimi Daniel Lowell Jing Zhou José Cano José L. Abellán David R. Kaeli

3D Perception With Slanted Stixels on GPU.

Daniel Hernández Juárez Antonio Espinosa David Vázquez Antonio M. López Juan C. Moure

ETICA: Efficient Two-Level I/O Caching Architecture for Virtualized Platforms.

Saba Ahmadian Reza Salkhordeh Onur Mutlu Hossein Asadi

Analysis of GPU Data Access Patterns on Complex Geometries for the D3Q19 Lattice Boltzmann Algorithm.

Gregory J. Herschlag Seyong Lee Jeffrey S. Vetter Amanda Randles

gIM: GPU Accelerated RIS-Based Influence Maximization Algorithm.

Soheil Shahrouz Saber Salehkaleybar Matin Hashemi

Editor's Note.

Manish Parashar


Volume 32, Number 9, September 2021
Optimizing the LINPACK Algorithm for Large-Scale PCIe-Based CPU-GPU Heterogeneous Systems.

Guangming Tan Chaoyang Shui Yinshan Wang Xianzhi Yu Yujin Yan

Accelerating the Bron-Kerbosch Algorithm for Maximal Clique Enumeration Using GPUs.

Yi-Wen Wei Wei-Mei Chen Hsin-Hung Tsai

OWebSync: Seamless Synchronization of Distributed Web Clients.

Kristof Jannes Bert Lagaisse Wouter Joosen

YuenyeungSpTRSV: A Thread-Level and Warp-Level Fusion Synchronization-Free Sparse Triangular Solve.

Feng Zhang Jiya Su Weifeng Liu Bingsheng He Ruofan Wu Xiaoyong Du Rujia Wang

Fine-Grained Multi-Query Stream Processing on Integrated Architectures.

Feng Zhang Chenyang Zhang Lin Yang Shuhao Zhang Bingsheng He Wei Lu Xiaoyong Du

BALS: Blocked Alternating Least Squares for Parallel Sparse Matrix Factorization on GPUs.

Jing Chen Jianbin Fang Weifeng Liu Canqun Yang

PISTIS: An Event-Triggered Real-Time Byzantine-Resilient Protocol Suite.

David Kozhaya Jérémie Decouchant Vincent Rahli Paulo Esteves Veríssimo

An Efficient Parallel Secure Machine Learning Framework on GPUs.

Feng Zhang Zheng Chen Chenyang Zhang Amelie Chi Zhou Jidong Zhai Xiaoyong Du

Structured Allocation-Based Consistent Hashing With Improved Balancing for Cloud Infrastructure.

Yuichi Nakatani

Accurate Differentially Private Deep Learning on the Edge.

Rui Han Dong Li Junyan Ouyang Chi Harold Liu Guoren Wang Dapeng Wu Lydia Y. Chen

A Survey of System Architectures and Techniques for FPGA Virtualization.

Masudul Hassan Quraishi Erfan Bank Tavakoli Fengbo Ren

Octans: Optimal Placement of Service Function Chains in Many-Core Systems.

Heng Yu Zhilong Zheng Junxian Shen Congcong Miao Chen Sun Hongxin Hu Jun Bi Jianping Wu Jilong Wang

Optimizing Resource Allocation for Data-Parallel Jobs Via GCN-Based Prediction.

Zhiyao Hu Dongsheng Li Dongxiang Zhang Yiming Zhang Baoyun Peng

DeepSlicing: Collaborative and Adaptive CNN Inference With Low Latency.

Shuai Zhang Sheng Zhang Zhuzhong Qian Jie Wu Yibo Jin Sanglu Lu

Retargeting Tensor Accelerators for Epistasis Detection.

Ricardo Nobre Aleksandar Ilic Sergio Santander-Jiménez Leonel Sousa

Overlapping Communication With Computation in Parameter Server for Scalable DL Training.

Shaoqi Wang Aidi Pi Xiaobo Zhou Jun Wang Cheng-Zhong Xu


Volume 32, Number 8, August 2021
Joint SFC Deployment and Resource Management in Heterogeneous Edge for Latency Minimization.

Yu Liu Xiaojun Shang Yuanyuan Yang

Online Scheduling Technique To Handle Data Velocity Changes in Stream Workflows.

Mutaz Barika Saurabh Garg Albert Y. Zomaya Rajiv Ranjan

High-Performance Computing Implementations of Agent-Based Economic Models for Realizing 1:1 Scale Simulations of Large Economies.

Amit Gill Lalith Maddegedara Sebastian Poledna Muneo Hori Kohei Fujita Tsuyoshi Ichimura

Joint Task Scheduling and Containerizing for Efficient Edge Computing.

Jiawei Zhang Xiaochen Zhou Tianyi Ge Xudong Wang Taewon Hwang

Proof of Federated Learning: A Novel Energy-Recycling Consensus Algorithm.

Xidi Qu Shengling Wang Qin Hu Xiuzhen Cheng

A Fault-Tolerant Distributed Framework for Asynchronous Iterative Computations.

Tian Zhou Lixin Gao Xiaohong Guan

Silhouette: Efficient Cloud Configuration Exploration for Large-Scale Analytics.

Yanjiao Chen Long Lin Baochun Li Qian Wang Qian Zhang

Hardware Accelerator Integration Tradeoffs for High-Performance Computing: A Case Study of GEMM Acceleration in N-Body Methods.

Mochamad Asri Dhairya Malhotra Jiajun Wang George Biros Lizy K. John Andreas Gerstlauer

Hone: Mitigating Stragglers in Distributed Stream Processing With Tuple Scheduling.

Wenxin Li Duowen Liu Kai Chen Keqiu Li Heng Qi

Pebbles: Leveraging Sketches for Processing Voluminous, High Velocity Data Streams.

Thilina Buddhika Sangmi Lee Pallickara Shrideep Pallickara

A GPU Acceleration Framework for Motif and Discord Based Pattern Mining.

Biru Zhu Youyou Jiang Ming Gu Yangdong Deng

True Load Balancing for Matricized Tensor Times Khatri-Rao Product.

Nabil Abubaker Seher Acer Cevdet Aykanat

e-PoS: Making Proof-of-Stake Decentralized and Fair.

Muhammad Saad Zhan Qin Kui Ren DaeHun Nyang David Mohaisen

DL2: A Deep Learning-Driven Scheduler for Deep Learning Clusters.

Yanghua Peng Yixin Bao Yangrui Chen Chuan Wu Chen Meng Wei Lin

An Optimized Weighted Average Makespan in Fault-Tolerant Heterogeneous MPSoCs.

Hassan A. Youness Aly Omar Mohammed Moness

Burst Load Evacuation Based on Dispatching and Scheduling in Distributed Edge Networks.

Shuiguang Deng Cheng Zhang Chang Li Jianwei Yin Schahram Dustdar Albert Y. Zomaya

MG-WFBP: Merging Gradients Wisely for Efficient Communication in Distributed Deep Learning.

Shaohuai Shi Xiaowen Chu Bo Li


Volume 32, Number 7, July 2021
EDGES: An Efficient Distributed Graph Embedding System on GPU Clusters.

Dongxu Yang Junhong Liu Junjie Lai

Accelerating Binarized Neural Networks via Bit-Tensor-Cores in Turing GPUs.

Ang Li Simon Su

Efficient Methods for Mapping Neural Machine Translator on FPGAs.

Qin Li Xiaofan Zhang Jinjun Xiong Wen-Mei Hwu Deming Chen

Improving HW/SW Adaptability for Accelerating CNNs on FPGAs Through A Dynamic/Static Co-Reconfiguration Approach.

Lei Gong Chao Wang Xi Li Xuehai Zhou

Adaptive SpMV/SpMSpV on GPUs for Input Vectors of Varied Sparsity.

Min Li Yulong Ao Chao Yang

SGD_Tucker: A Novel Stochastic Optimization Strategy for Parallel Sparse Tucker Decomposition.

Hao Li Zixuan Li Kenli Li Jan S. Rellermeyer Lydia Y. Chen Keqin Li

A Probabilistic Machine Learning Approach to Scheduling Parallel Loops With Bayesian Optimization.

Khu-rai Kim Youngjae Kim Sungyong Park

Accelerating End-to-End Deep Learning Workflow With Codesign of Data Preprocessing and Scheduling.

Yang Cheng Dan Li Zhiyuan Guo Binyao Jiang Jinkun Geng Wei Bai Jianping Wu Yongqiang Xiong

Fine-Grained Powercap Allocation for Power-Constrained Systems Based on Multi-Objective Machine Learning.

Meng Hao Weizhe Zhang Yiming Wang Gangzhao Lu Farui Wang Athanasios V. Vasilakos

Privacy-Preserving Computation Offloading for Parallel Deep Neural Networks Training.

Yunlong Mao Wenbo Hong Heng Wang Qun Li Sheng Zhong

Parallel Blockwise Knowledge Distillation for Deep Neural Network Compression.

Cody Blakeney Xiaomin Li Yan Yan Ziliang Zong

A Distributed Framework for EA-Based NAS.

Qing Ye Yanan Sun Jixin Zhang Jiancheng Lv

iMLBench: A Machine Learning Benchmark Suite for CPU-GPU Integrated Architectures.

Chenyang Zhang Feng Zhang Xiaoguang Guo Bingsheng He Xiao Zhang Xiaoyong Du

Breaking (Global) Barriers in Parallel Stochastic Optimization With Wait-Avoiding Group Averaging.

Shigang Li Tal Ben-Nun Giorgi Nadiradze Salvatore Di Girolamo Nikoli Dryden Dan Alistarh Torsten Hoefler

A Runtime and Non-Intrusive Approach to Optimize EDP by Tuning Threads and CPU Frequency for OpenMP Applications.

Janaina Schwarzrock Charles Cardoso De Oliveira Marcus Ritt Arthur Francisco Lorenzon Antonio Carlos Schneider Beck

Why Dataset Properties Bound the Scalability of Parallel Machine Learning Training Algorithms.

Daning Cheng Shigang Li Hanping Zhang Fen Xia Yunquan Zhang

SmartTuning: Selecting Hyper-Parameters of a ConvNet System for Fast Training and Small Working Memory.

Xiaqing Li Guangyan Zhang Weimin Zheng

FT-CNN: Algorithm-Based Fault Tolerance for Convolutional Neural Networks.

Kai Zhao Sheng Di Sihuan Li Xin Liang Yujia Zhai Jieyang Chen Kaiming Ouyang Franck Cappello Zizhong Chen

Model Parallelism Optimization for Distributed Inference Via Decoupled CNN Structure.

Jiangsu Du Xin Zhu Minghua Shen Yunfei Du Yutong Lu Nong Xiao Xiangke Liao

A Hybrid Fuzzy Convolutional Neural Network Based Mechanism for Photovoltaic Cell Defect Detection With Electroluminescence Images.

Chunpeng Ge Zhe Liu Liming Fang Huading Ling Aiping Zhang Changchun Yin

The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs With Hybrid Parallelism.

Yosuke Oyama Naoya Maruyama Nikoli Dryden Erin McCarthy Peter Harrington Jan Balewski Satoshi Matsuoka Peter Nugent Brian Van Essen

A Game-Based Approach for Cost-Aware Task Assignment With QoS Constraint in Collaborative Edge and Cloud Environments.

Saiqin Long Weifan Long Zhetao Li Kenli Li Yuanqing Xia Zhuo Tang

Systematically Landing Machine Learning onto Market-Scale Mobile Malware Detection.

Liangyi Gong Hao Lin Zhenhua Li Feng Qian Yang Li Xiaobo Ma Yunhao Liu

Distributed Task Migration Optimization in MEC by Extending Multi-Agent Deep Reinforcement Learning Approach.

Chubo Liu Fan Tang Yikun Hu Kenli Li Zhuo Tang Keqin Li

Accelerating Gossip-Based Deep Learning in Heterogeneous Edge Computing Platforms.

Rui Han Shilin Li Xiangwei Wang Chi Harold Liu Gaofeng Xin Lydia Y. Chen

Learning Spatiotemporal Failure Dependencies for Resilient Edge Computing Services.

Atakan Aral Ivona Brandic

FedSCR: Structure-Based Communication Reduction for Federated Learning.

Xueyu Wu Xin Yao Cho-Li Wang

An Efficiency-Boosting Client Selection Scheme for Federated Learning With Fairness Guarantee.

Tiansheng Huang Weiwei Lin Wentai Wu Ligang He Keqin Li Albert Y. Zomaya

Accelerating Federated Learning Over Reliability-Agnostic Clients in Mobile Edge Computing Systems.

Wentai Wu Ligang He Weiwei Lin Rui Mao

Mutual Information Driven Federated Learning.

Md. Palash Uddin Yong Xiang Xuequan Lu John Yearwood Longxiang Gao

Biscotti: A Blockchain System for Private and Secure Federated Learning.

Muhammad Shayan Clement Fung Chris J. M. Yoon Ivan Beschastnikh

Guest Editorial.

Pavan Balaji Jidong Zhai Min Si


Volume 32, Number 6, June 2021
Network-Aware Locality Scheduling for Distributed Data Operators in Data Centers.

Long Cheng Ying Wang Qingzhi Liu Dick H. J. Epema Cheng Liu Ying Mao John Murphy

A Scalable Platform for Distributed Object Tracking Across a Many-Camera Network.

Aakash Khochare Aravindhan Krishnan Yogesh Simmhan

A High-Throughput FPGA Accelerator for Short-Read Mapping of the Whole Human Genome.

Yen-Lung Chen Bo-Yi Chang Chia-Hsiang Yang Tzi-Dar Chiueh

A Parallel Jacobi-Embedded Gauss-Seidel Method.

Afshin Ahmadi Felice Manganiello Amin Khademi Melissa C. Smith

Co-Active: A Workload-Aware Collaborative Cache Management Scheme for NVMe SSDs.

Hui Sun Shangshang Dai Jianzhong Huang Xiao Qin

Reversible CSP Computations.

Carlos Galindo Naoki Nishida Josep Silva Salvador Tamarit

A Machine-Learning-Based Framework for Productive Locality Exploitation.

Engin Kayraklioglu Erwan Favry Tarek A. El-Ghazawi

Optimised Lambda Architecture for Monitoring Scientific Infrastructure.

Uthayanath Suthakar Luca Magnoni David Ryan Smith Akram Khan

A Scalable Stateful Approach for Virtual Security Functions Orchestration.

Niloofar Moradi Alireza Shameli Sendi Alireza Khajouei

On Consortium Blockchain Consistency: A Queueing Network Model Approach.

Tianhui Meng Yubin Zhao Katinka Wolter Cheng-Zhong Xu

Reliability and Confidentiality Co-Verification for Parallel Applications in Distributed Systems.

Guoqi Xie Kehua Yang Haibo Luo Renfa Li Shiyan Hu

Rings for Privacy: An Architecture for Large Scale Privacy-Preserving Data Mining.

Maria Luisa Merani Daniele Croce Ilenia Tinnirello

Partitioning-Based Scheduling of OpenMP Task Systems With Tied Tasks.

Yang Wang Xu Jiang Nan Guan Zhishan Guo Xue Liu Wang Yi

E2bird: Enhanced Elastic Batch for Improving Responsiveness and Throughput of Deep Learning Services.

Weihao Cui Quan Chen Han Zhao Mengze Wei Xiaoxin Tang Minyi Guo

Distributed Adaptive Consensus Tracking Control for Multi-Agent System With Communication Constraints.

Pu Zhang Huifeng Xue Shan Gao Jialong Zhang

Distributed and Dynamic Service Placement in Pervasive Edge Computing Networks.

Zhaolong Ning Peiran Dong Xiaojie Wang Shupeng Wang Xiping Hu Song Guo Tie Qiu Bin Hu Ricky Yu-Kwong Kwok


Volume 32, Number 5, May 2021
A Case for Pricing Bandwidth: Sharing Datacenter Networks With Cost Dominant Fairness.

Li Chen Yuan Feng Baochun Li Bo Li

On the Effective Parallelization and Near-Optimal Deployment of Service Function Chains.

Jianzhen Luo Jun Li Lei Jiao Jun Cai

Collaborative Heterogeneity-Aware OS Scheduler for Asymmetric Multicore Processors.

Teng Yu Runxin Zhong Vladimir Janjic Pavlos Petoumenos Jidong Zhai Hugh Leather John Thomson

Auditing Cache Data Integrity in the Edge Computing Environment.

Bo Li Qiang He Feifei Chen Hai Jin Yang Xiang Yun Yang

PaKman: A Scalable Algorithm for Generating Genomic Contigs on Distributed Memory Machines.

Priyanka Ghosh Sriram Krishnamoorthy Ananth Kalyanaraman

Profiles of Upcoming HPC Applications and Their Impact on Reservation Strategies.

Ana Gainaru Brice Goglin Valentin Honoré Guillaume Pallez

Efficient Buffer Overflow Detection on GPU.

Bang Di Jianhua Sun Hao Chen Dong Li

A Scalable Multi-Layer PBFT Consensus for Blockchain.

Wenyu Li Chenglin Feng Lei Zhang Hao Xu Bin Cao Muhammad Ali Imran

Multi-Hop Multi-Task Partial Computation Offloading in Collaborative Edge Computing.

Yuvraj Sahni Jiannong Cao Lei Yang Yusheng Ji

Design and Implementation of a Criticality- and Heterogeneity-Aware Runtime System for Task-Parallel Applications.

Myeonggyun Han Jinsu Park Woongki Baek

Subutai: Speeding Up Legacy Parallel Applications Through Data Synchronization.

Rodrigo Cataldo Ramon Fernandes Kevin J. M. Martin Jarbas Silveira Gustavo Sanchez Johanna Sepúlveda César A. M. Marcon Jean-Philippe Diguet

Distributed and Collective Deep Reinforcement Learning for Computation Offloading: A Practical Perspective.

Xiaoyu Qiu Weikun Zhang Wuhui Chen Zibin Zheng

Privacy-Preserving Similarity Search With Efficient Updates in Distributed Key-Value Stores.

Wanyu Lin Helei Cui Baochun Li Cong Wang

IPPTS: An Efficient Algorithm for Scientific Workflow Scheduling in Heterogeneous Computing Systems.

Hamza Djigal Jun Feng Jiamin Lu Jidong Ge

Thermal Prediction for Efficient Energy Management of Clouds Using Machine Learning.

Shashikant Ilager Kotagiri Ramamohanarao Rajkumar Buyya

Petrel: Heterogeneity-Aware Distributed Deep Learning Via Hybrid Synchronization.

Qihua Zhou Song Guo Zhihao Qu Peng Li Li Li Minyi Guo Kun Wang

Transformations of High-Level Synthesis Codes for High-Performance Computing.

Johannes de Fine Licht Maciej Besta Simon Meierhans Torsten Hoefler

Boosting Parallel Influence-Maximization Kernels for Undirected Networks With Fusing and Vectorization.

Gökhan Göktürk Kamer Kaya

Analysis of Global and Local Synchronization in Parallel Computing.

Franco Cicirelli Andrea Giordano Carlo Mastroianni


Volume 32, Number 4, April 2021
Parallelization and Optimization of NSGA-II on Sunway TaihuLight System.

Xin Liu Jun Sun Lin Zheng Su Wang Yao Liu Tongquan Wei

A Generic Stochastic Model for Resource Availability in Fog Computing Environments.

Sudheer Kumar Battula Malgorzata M. O'Reilly Saurabh Garg James Montgomery

High-Performance Routing With Multipathing and Path Diversity in Ethernet and HPC Networks.

Maciej Besta Jens Domke Marcel Schneider Marek Konieczny Salvatore Di Girolamo Timo Schneider Ankit Singla Torsten Hoefler

Towards Greening MapReduce Clusters Considering Both Computation Energy and Cooling Energy.

Tarik Reza Toha A. S. M. Rizvi Jannatun Noor Muhammad Abdullah Adnan A. B. M. Alim Al Islam

Large-Scale Analysis of Docker Images and Performance Implications for Container Storage Systems.

Nannan Zhao Vasily Tarasov Hadeel Albahar Ali Anwar Lukas Rupprecht Dimitrios Skourtis Arnab Kumar Paul Keren Chen Ali Raza Butt

Canary: Decentralized Distributed Deep Learning Via Gradient Sketch and Partition in Multi-Interface Networks.

Qihua Zhou Kun Wang Haodong Lu Wenyao Xu Yanfei Sun Song Guo

Coarse-Grained Parallel Routing With Recursive Partitioning for FPGAs.

Minghua Shen Guojie Luo Nong Xiao

Towards Efficient Large-Scale Interprocedural Program Static Analysis on Distributed Data-Parallel Computation.

Rong Gu Zhiqiang Zuo Xi Jiang Han Yin Zhaokang Wang Linzhang Wang Xuandong Li Yihua Huang

MO-Tree: An Efficient Forwarding Engine for Spatiotemporal-Aware Pub/Sub Systems.

Tianchen Ding Shiyou Qian Jian Cao Guangtao Xue Yanmin Zhu Jiadi Yu Minglu Li

SEIZE: Runtime Inspection for Parallel Dataflow Systems.

Youfu Li Matteo Interlandi Fotis Psallidas Wei Wang Carlo Zaniolo

Cuttlefish: Neural Configuration Adaptation for Video Analysis in Live Augmented Reality.

Ning Chen Siyi Quan Sheng Zhang Zhuzhong Qian Yibo Jin Jie Wu Wenzhong Li Sanglu Lu

Achieving Probabilistic Atomicity With Well-Bounded Staleness and Low Read Latency in Distributed Datastores.

Lingzhi Ouyang Yu Huang Hengfeng Wei Jian Lu

Energy-Aware Inference Offloading for DNN-Driven Applications in Mobile Edge Clouds.

Zichuan Xu Liqian Zhao Weifa Liang Omer F. Rana Pan Zhou Qiufen Xia Wenzheng Xu Guowei Wu

BOSSA: A Decentralized System for Proofs of Data Retrievability and Replication.

Dian Chen Haobo Yuan Shengshan Hu Qian Wang Cong Wang

Resettable Encoded Vector Clock for Causality Analysis With an Application to Dynamic Race Detection.

Tommaso Pozzetti Ajay D. Kshemkalyani

Homomorphic Sorting With Better Scalability.

Gizem S. Çetin Erkay Savas Berk Sunar

Accelerating Large-Scale Prioritized Graph Computations by Hotness Balanced Partition.

Shufeng Gong Yanfeng Zhang Ge Yu

Editor's Note.

Manish Parashar


Volume 32, Number 3, March 2021
Achieving Fine-Grained Flow Management Through Hybrid Rule Placement in SDNs.

Gongming Zhao Hongli Xu Jingyuan Fan Liusheng Huang Chunming Qiao

The Deep Learning Compiler: A Comprehensive Survey.

Mingzhen Li Yi Liu Xiaoyan Liu Qingxiao Sun Xin You Hailong Yang Zhongzhi Luan Lin Gan Guangwen Yang Depei Qian

Hierarchical Multi-Agent Optimization for Resource Allocation in Cloud Computing.

Xiangqiang Gao Rongke Liu Aryan Kaushik

Cryptomining Detection in Container Clouds Using System Calls and Explainable Machine Learning.

Rupesh Raj Karn Prabhakar Kudva Hai Huang Sahil Suneja Ibrahim M. Elfadel

Constructing Completely Independent Spanning Trees in Data Center Network Based on Augmented Cube.

Guo Chen Baolei Cheng Dajin Wang

Lewat: A Lightweight, Efficient, and Wear-Aware Transactional Persistent Memory System.

Kaixin Huang Sumin Li Linpeng Huang Kian-Lee Tan Hong Mei

Energy-Efficient Hardware-Accelerated Synchronization for Shared-L1-Memory Multiprocessor Clusters.

Florian Glaser Giuseppe Tagliavini Davide Rossi Germain Haugou Qiuting Huang Luca Benini

Modeling and Optimization of Performance and Cost of Serverless Applications.

Changyuan Lin Hamzeh Khazaei

Failure-Atomic Byte-Addressable R-tree for Persistent Memory.

Soojeong Cho Wonbae Kim Sehyeon Oh Changdae Kim Kwangwon Koh Beomseok Nam

Investigating the Adoption of Hybrid Encrypted Cloud Data Deduplication With Game Theory.

Xueqin Liang Zheng Yan Robert H. Deng Qinghua Zheng

PQC Acceleration Using GPUs: FrodoKEM, NewHope, and Kyber.

Naina Gupta Arpan Jati Amit Kumar Chauhan Anupam Chattopadhyay

Privacy-Preserving Multi-Keyword Searchable Encryption for Distributed Systems.

Xueqiao Liu Guomin Yang Willy Susilo Joseph Tonien Ximeng Liu Jian Shen

Bi-Objective Optimization of Data-Parallel Applications on Heterogeneous HPC Platforms for Performance and Energy Through Workload Distribution.

Hamidreza Khaleghzadeh Muhammad Fahad Arsalan Shahid Ravi Reddy Manumachu Alexey L. Lastovetsky

Optimistic Causal Consistency for Geo-Replicated Key-Value Stores.

Kristina Spirovska Diego Didona Willy Zwaenepoel

ADRL: A Hybrid Anomaly-Aware Deep Reinforcement Learning-Based Resource Scaling in Clouds.

Sara Kardani-Moghaddam Rajkumar Buyya Kotagiri Ramamohanarao

A Thread Level SLO-Aware I/O Framework for Embedded Virtualization.

Xiaoli Gong Dingyuan Cao Yuxuan Li Ximing Liu Yusen Li Jin Zhang Tao Li


Volume 32, Number 2, February 2021
QShield: Protecting Outsourced Cloud Data Queries With Multi-User Access Control Based on SGX.

Yaxing Chen Qinghua Zheng Zheng Yan Dan Liu

Dynamic Load Balancing in Parallel Execution of Cellular Automata.

Andrea Giordano Alessio De Rango Rocco Rongo Donato D'Ambrosio William Spataro

Minimizing Coflow Completion Time in Optical Circuit Switched Networks.

Tong Zhang Fengyuan Ren Jiakun Bao Ran Shu Wenxue Cheng

Adaptive Preference-Aware Co-Location for Improving Resource Utilization of Power Constrained Datacenters.

Pu Pang Quan Chen Deze Zeng Minyi Guo

Towards Minimizing Resource Usage With QoS Guarantee in Cloud Gaming.

Yusen Li Changjian Zhao Xueyan Tang Wentong Cai Xiaoguang Liu Gang Wang Xiaoli Gong

Multi-Agent Imitation Learning for Pervasive Edge Computing: A Decentralized Computation Offloading Algorithm.

Xiaojie Wang Zhaolong Ning Song Guo

Towards Efficient Scheduling of Federated Mobile Devices Under Computational and Statistical Heterogeneity.

Cong Wang Yuanyuan Yang Pengzhan Zhou

Comment on "Circuit Ciphertext-Policy Attribute-Based Hybrid Encryption With Verifiable Delegation in Cloud Computing".

Zhengjun Cao Olivier Markowitch

Multi-GPU Design and Performance Evaluation of Homomorphic Encryption on GPU Clusters.

Ahmad Al Badawi Bharadwaj Veeravalli Jie Lin Xiao Nan Kazuaki Matsumura Khin Mi Mi Aung

A Parallel Structured Divide-and-Conquer Algorithm for Symmetric Tridiagonal Eigenvalue Problems.

Xia Liao Shengguo Li Yutong Lu José E. Román

A Resource and Performance Optimization Reduction Circuit on FPGAs.

Linhuai Tang Gang Cai Yong Zheng Jiamin Chen

CPDE: A Methodology for the Transparent Distribution of Centralized Smart Grid Programs.

Thi-Thanh-Quynh Nguyen Christophe Bobineau Vincent Debusschere Quang-Huy Giap Nouredine Hadjsaid

An Automatic Synthesizer of Advising Tools for High Performance Computing.

Hui Guan Xipeng Shen Hamid Krim

Realizing Best Checkpointing Control in Computing Systems.

Purushottam Sigdel Xu Yuan Nian-Feng Tzeng

Recent Advances of Resource Allocation in Network Function Virtualization.

Song Yang Fan Li Stojan Trajanovski Ramin Yahyapour Xiaoming Fu

Online Collaborative Data Caching in Edge Computing.

Xiaoyu Xia Feifei Chen Qiang He John Grundy Mohamed Abdelrazek Hai Jin

A Two-Phase Dynamic Throughput Optimization Model for Big Data Transfers.

Md. S. Q. Zulkar Nine Tevfik Kosar

Middleware to Manage Fault Tolerance Using Semi-Coordinated Checkpoints.

Alvaro Wong Elisa Heymann Dolores Rexachs Emilio Luque


Volume 32, Number 1, January 2021
Fast Adaptive Task Offloading in Edge Computing Based on Meta Reinforcement Learning.

Jin Wang Jia Hu Geyong Min Albert Y. Zomaya Nektarios Georgalas

Multi-GPU Parallelization of the NAS Multi-Zone Parallel Benchmarks.

Marc González Enric Morancho

Improving the Performance of Deduplication-Based Storage Cache via Content-Driven Cache Management Methods.

Yujuan Tan Congcong Xu Jing Xie Zhichao Yan Hong Jiang Witawas Srisa-an Xianzhang Chen Duo Liu

O3BNN-R: An Out-of-Order Architecture for High-Performance and Regularized BNN Inference.

Tong Geng Ang Li Tianqi Wang Chunshu Wu Yanfei Li Runbin Shi Wei Wu Martin C. Herbordt

Rusty: Runtime Interference-Aware Predictive Monitoring for Modern Multi-Tenant Systems.

Dimosthenis Masouros Sotirios Xydis Dimitrios Soudris

Blockchain at the Edge: Performance of Resource-Constrained IoT Networks.

Sudip Misra Anandarup Mukherjee Arijit Roy Nishant Saurabh Yogachandran Rahulamathavan Muttukrishnan Rajarajan

Feluca: A Two-Stage Graph Coloring Algorithm With Color-Centric Paradigm on GPU.

Zhigao Zheng Xuanhua Shi Ligang He Hai Jin Shuo Wei Hulin Dai Xuan Peng

Partitioning Models for General Medium-Grain Parallel Sparse Tensor Decomposition.

M. Ozan Karsavuran Seher Acer Cevdet Aykanat

CASpMV: A Customized and Accelerative SpMV Framework for the Sunway TaihuLight.

Guoqing Xiao Kenli Li Yuedan Chen Wangquan He Albert Y. Zomaya Tao Li

Sova: A Software-Defined Autonomic Framework for Virtual Network Allocations.

Zhiyong Ye Yang Wang Shuibing He Chengzhong Xu Xian-He Sun

Elastic Scheduling for Microservice Applications in Clouds.

Sheng Wang Zhijun Ding Changjun Jiang

K-Athena: A Performance Portable Structured Grid Finite Volume Magnetohydrodynamics Code.

Philipp Grete Forrest Wolfgang Glines Brian W. O'Shea

GPU Tensor Cores for Fast Arithmetic Reductions.

Cristóbal A. Navarro Roberto Carrasco Ricardo J. Barrientos Javier A. Riquelme Raimundo Vega

Self-Balancing Federated Learning With Global Imbalanced Data in Mobile Systems.

Moming Duan Duo Liu Xianzhang Chen Renping Liu Yujuan Tan Liang Liang

PredCom: A Predictive Approach to Collecting Approximated Communication Traces.

Shinobu Miwa Ignacio Laguna Martin Schulz

Cost-Effective App Data Distribution in Edge Computing.

Xiaoyu Xia Feifei Chen Qiang He John C. Grundy Mohamed Abdelrazek Hai Jin

Design and Evaluation of a Risk-Aware Failure Identification Scheme for Improved RAS in Erasure-Coded Data Centers.

Weichen Huang Juntao Fang Shenggang Wan Changsheng Xie Xubin He

Learning-Driven Interference-Aware Workload Parallelization for Streaming Applications in Heterogeneous Cluster.

Haitao Zhang Xin Geng Huadong Ma