
Apache Spark research papers

Applications of Image Processing using Apache Spark



Fast Data Processing with Spark
free download

Fast Data Processing with Spark. Spark's EC2 scripts use AMIs (Amazon Machine Images) provided by the Spark team. These AMIs may not always … of HDFS) for Spark, they will not be included in the machine image. At present

GraySort on Apache Spark by Databricks
free download

Apache Spark is a general cluster compute engine for scalable data processing. It was originally developed by researchers at the UC Berkeley AMPLab [2]. The engine is fault-tolerant and is designed to run on commodity hardware. It generalizes two-stage Map/Reduce to
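
As a rough illustration of the kind of job a GraySort entry exercises, here is a minimal Scala sketch of a globally sorted dataset using Spark's sortByKey, which range-partitions and shuffles the records. The input path and the tab-separated record format are assumptions for illustration, not the benchmark's actual 100-byte record layout.

    import org.apache.spark.sql.SparkSession

    object DistributedSortSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("DistributedSortSketch").getOrCreate()
        val sc = spark.sparkContext

        // Assumed input format: one record per line, "key<TAB>value".
        val records = sc.textFile("hdfs:///data/records.txt")
          .map { line =>
            val Array(k, v) = line.split("\t", 2)
            (k, v)
          }

        // sortByKey range-partitions the keys, shuffles the records, and
        // yields a dataset that is sorted across all output partitions.
        val sorted = records.sortByKey(ascending = true)

        sorted.saveAsTextFile("hdfs:///data/records_sorted")
        spark.stop()
      }
    }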

Alternating Direction Method of Multipliers Implementation Using Apache Spark
free download

Many application areas in optimization have benefited from recent trends towards massive datasets. Financial optimization problems ingest decades of fine-grained stock history, and recent energy grid optimization techniques optimize hundreds of millions of variables
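
To make the ADMM-on-Spark idea concrete, below is a minimal consensus-ADMM sketch in Scala. It solves a deliberately trivial problem (each element contributes a quadratic term, so the local x-update has a closed form) purely to show the x-, z- and u-update structure over an RDD. The input path, the penalty rho and the fixed iteration count are assumptions; a real implementation would plug in its own local solver and a convergence test.

    import org.apache.spark.{SparkConf, SparkContext}

    object ConsensusAdmmSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("ConsensusAdmmSketch"))

        // Toy objective: minimise sum_i 1/2 (x - a_i)^2 over a shared x.
        // The closed-form local update stands in for an arbitrary solver.
        val rho = 1.0
        val a = sc.textFile("hdfs:///data/values.txt").map(_.toDouble)  // assumed: one number per line
        var state = a.map(ai => (ai, 0.0, 0.0)).cache()                 // (a_i, x_i, u_i)
        var z = 0.0

        for (_ <- 1 to 30) {
          // x-update: independent proximal step on every element.
          val updated = state.map { case (ai, _, ui) =>
            val xi = (ai + rho * (z - ui)) / (1.0 + rho)
            (ai, xi, ui)
          }.cache()
          // z-update: the consensus step, an average of x_i + u_i.
          z = updated.map { case (_, xi, ui) => xi + ui }.mean()
          // u-update: dual ascent on the consensus constraint x_i = z.
          state = updated.map { case (ai, xi, ui) => (ai, xi, ui + xi - z) }.cache()
        }
        println(s"consensus value: $z")
        sc.stop()
      }
    }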

Performance Improvement in Apache Spark through Shuffling
free download

Abstract: Apache Spark is a fast and general engine for large-scale data processing. The shuffle phase refers to the partitioning and aggregation of data during an all-to-all operation. Spark shuffle performance is improved in Sort-based Shuffle. Spark Shuffle
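
For orientation, here is a small Scala sketch of a job whose reduceByKey step goes through Spark's sort-based shuffle. The input path, the partition count and the compression setting are illustrative assumptions, not tuning advice from the paper.

    import org.apache.spark.{SparkConf, SparkContext}

    object ShuffleSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("ShuffleSketch")
          .set("spark.shuffle.compress", "true")   // compress shuffle output files
        val sc = new SparkContext(conf)

        val counts = sc.textFile("hdfs:///data/corpus.txt")   // assumed input path
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          // reduceByKey triggers the shuffle: partial aggregation on the map
          // side, then an all-to-all exchange of partitioned blocks.
          .reduceByKey(_ + _, numPartitions = 200)

        counts.saveAsTextFile("hdfs:///data/wordcounts")
        sc.stop()
      }
    }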

Performance Improvement Approaches for Apache Spark
free download

Abstract: Apache Spark, a new big data processing framework, caches data in memory and then processes it. Spark creates Resilient Distributed Datasets (RDDs) from data, which are cached in memory. Although Spark is popular for its performance in iterative applications,
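
A minimal Scala sketch of the in-memory caching behaviour the abstract refers to: the RDD is cached once and reused across iterations of a toy loop, so only the first action re-reads the input. The file path and the loop itself are assumptions for illustration.

    import org.apache.spark.{SparkConf, SparkContext}

    object CachingSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("CachingSketch"))

        // Assumed input: one numeric value per line.
        val points = sc.textFile("hdfs:///data/values.txt")
          .map(_.toDouble)
          .cache()   // keep the parsed RDD in executor memory for reuse

        // Toy iterative loop converging on the mean; without cache() every
        // iteration would re-read and re-parse the text file from storage.
        var centre = 0.0
        for (_ <- 1 to 10) {
          centre += 0.5 * points.map(x => x - centre).mean()
        }
        println(s"estimated centre: $centre")
        sc.stop()
      }
    }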

DduP–Towards a Deduplication Framework utilising Apache Spark
free download

Abstract: This paper is about a new framework called DeduPlication (DduP). DduP aims to solve large-scale deduplication problems on arbitrary data tuples. DduP tries to bridge the gap between big data, high performance and duplicate detection. At the moment a first
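
A toy Scala sketch of the blocking-plus-pairwise-comparison pattern commonly used for large-scale deduplication on Spark; the Person schema, input path, blocking key and match rule are all invented for illustration and are not DduP's actual design.

    import org.apache.spark.sql.SparkSession

    object DedupSketch {
      case class Person(id: Long, name: String, city: String)

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("DedupSketch").getOrCreate()
        import spark.implicits._

        val records = spark.read.json("hdfs:///data/people.json").as[Person]

        // Blocking: only records sharing a cheap key (here the lowercased
        // first letter of the city) are compared, avoiding a full O(n^2) join.
        val blocked = records.rdd.keyBy(p => p.city.take(1).toLowerCase)

        val duplicates = blocked.join(blocked)                        // pairs within a block
          .values
          .filter { case (a, b) => a.id < b.id }                      // keep each pair once
          .filter { case (a, b) => a.name.equalsIgnoreCase(b.name) }  // naive match rule

        duplicates.take(10).foreach(println)
        spark.stop()
      }
    }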

RefineOnSpark: a simple and scalable ETL based on Apache Spark and OpenRefine
free download

Abstract: Over the last decade, big data has become a catch-all term for anything that handles non-trivial sizes of data. It is used to describe the industry challenge posed by having data-harvesting abilities that far outstrip the ability to process, interpret and act on that data.

Computational Geometry Leveraged by Apache Spark
free download

Abstract: Apache Spark, a cluster computing framework, is widely used for solving big data problems in distributed environments. Unfortunately, this framework's efficiency has not been analyzed thoroughly for different numbers of nodes or for processing different
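
To give a sense of what such geometric workloads look like on Spark, here is a small Scala sketch that computes the axis-aligned bounding box of a large 2-D point set with a single distributed reduce and answers one rectangular range query with a filter. The CSV input path and the query rectangle are assumptions.

    import org.apache.spark.{SparkConf, SparkContext}

    object GeometrySketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("GeometrySketch"))

        // Assumed input format: "x,y" per line.
        val points = sc.textFile("hdfs:///data/points.csv")
          .map { line =>
            val Array(x, y) = line.split(",")
            (x.toDouble, y.toDouble)
          }
          .cache()

        // Axis-aligned bounding box of the whole set via one reduce.
        val (minX, minY, maxX, maxY) = points
          .map { case (x, y) => (x, y, x, y) }
          .reduce { case ((ax, ay, bx, by), (cx, cy, dx, dy)) =>
            (math.min(ax, cx), math.min(ay, cy), math.max(bx, dx), math.max(by, dy))
          }

        // Range query: count the points inside a given rectangle.
        val inRect = points.filter { case (x, y) => x >= 0 && x <= 10 && y >= 0 && y <= 10 }.count()
        println(s"bounding box: ($minX,$minY)-($maxX,$maxY), points in [0,10]^2: $inRect")
        sc.stop()
      }
    }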

Analysing expression quantitative trait loci in Apache Spark
free download

Abstract: A major challenge in current genomic research is the development of computational and statistical tools that are capable of analysing the ever-increasing amount of data provided by next-generation sequencing methods. Here we investigate the

Target Prediction in Drug Discovery using Apache Spark
free download

Abstract: In the context of drug discovery, a key problem is the identification of candidate molecules that affect proteins associated with diseases. Within the Chemogenomics project, the aim is to derive new candidates from existing experiments through

Distributed analysis of expression quantitative trait loci in Apache Spark
free download

Abstract: A major challenge in current genomic research is the development of computational and statistical tools that are capable of analysing the ever-increasing amount of data provided by next-generation sequencing methods. Here we investigate the

