IEEE Transactions on Big Data
From the October-December 2018 issue
Content-Aware Partial Compression for Textual Big Data Analysis in Hadoop
By Dapeng Dong and John Herbert
A substantial amount of information in companies and on the Internet is present in the form of text. The value of this semi-structured and unstructured data has been widely acknowledged, with consequent scientific and commercial exploitation. The ever-increasing data production, however, pushes data analytic platforms to their limit. Compression, an effective means of reducing data size, has been employed by many emerging data analytic platforms, where its main purpose is to save storage space and reduce data transmission cost over the network. Because general-purpose compression methods endeavour to achieve higher compression ratios by leveraging data transformation techniques and contextual data, this context-dependency forces access to the compressed data to be sequential. Processing such compressed data in parallel, as is desirable in a distributed environment, is extremely challenging. This work proposes techniques for more efficient textual big data analysis, with an emphasis on content-aware compression schemes suitable for the Hadoop analytic platform. The compression schemes have been evaluated for a number of standard MapReduce analysis tasks using a collection of public and private real-world datasets. In comparison with existing solutions, they have shown substantial improvement in performance and significant reduction in system resource requirements.
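The sequential-access problem described in the abstract can be illustrated with a small sketch. This is not the content-aware scheme proposed in the article; it only shows the general idea that compressing records in independent blocks, with an index of block offsets, lets a worker decompress its own block without reading the bytes before it. The function names, the record format, and the `block_size` parameter are hypothetical and chosen for illustration.

```python
import zlib


def compress_blocks(records, block_size):
    """Compress records in independent blocks and return (blob, index).

    index holds the (offset, length) of every compressed block inside blob,
    so a reader can jump straight to one block. Names are illustrative only.
    """
    blob = bytearray()
    index = []
    for start in range(0, len(records), block_size):
        chunk = b"".join(records[start:start + block_size])
        packed = zlib.compress(chunk)  # each block is a self-contained stream
        index.append((len(blob), len(packed)))
        blob += packed
    return bytes(blob), index


def read_block(blob, index, block_no):
    """Decompress a single block without touching any other block."""
    offset, length = index[block_no]
    return zlib.decompress(blob[offset:offset + length])


# Hypothetical textual records, e.g. lines of a log file.
records = [f"user{i},click,page{i % 7}\n".encode() for i in range(1000)]
data = b"".join(records)

blob, index = compress_blocks(records, block_size=100)

# A worker assigned the last block reads only that block.
last_block = read_block(blob, index, len(index) - 1)

# For comparison: a single stream over all the data. It usually compresses
# better, but a reader has to start decoding from the first byte.
whole_stream = zlib.compress(data)
print(len(index), "blocks;", len(blob), "bytes vs", len(whole_stream), "bytes")
```

The trade-off is the one the abstract points to: independent blocks give up some of the shared context, and with it some compression ratio, in exchange for access that can be split across parallel tasks.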
Editorials and Announcements
Announcements
- In order to promote timely publication of regular paper submissions, please note that TBD is not currently accepting proposals for new special issues until the existing publication queue has been cleared.
- TBD is pleased to participate in a free trial offering of the new IEEE DataPort data repository, which supports authors in hosting and referring to their datasets during the article submission process. Learn more about this exciting opportunity.
- We're pleased to announce that Qiang Yang, head of the Huawei Noah's Ark Research Lab and a professor at the Hong Kong University of Science and Technology, has accepted the position of inaugural Editor-in-Chief beginning 1 Jan. 2015. Read more.
Editorials
- State of the Journal Editorial (Jan-March 2018)
- State of the Journal Editorial (Jan-March 2017)
- Welcome to the IEEE Transactions on Big Data (Jan-March 2015)
- Introduction to the IEEE Transactions on Big Data (Jan-March 2015)
Guest Editorials
- Guest Editorial: Big Data Infrastructure II (July-September 2018)
- Guest Editorial: Big Data Infrastructure I (April-June 2018)
- Special Issue on Biomedical Big Data: Understanding, Learning and Applications (Oct-Dec 2017)
- Urban Computing (April-June 2017)
- Big Scholar Data Discovery and Collaboration (Jan-March 2017)
- Big Data Analytics and the Web (July-Sept 2016)
- Big Scholar Data Discovery and Collaboration (Continued) (April-June 2016)
- Big Scholar Data Discovery and Collaboration (Jan-March 2016)
- Big Data Analytics and the Web (Oct-Dec 2015)
- Big Media Data: Understanding, Search, and Mining (Part 2) (Oct-Dec 2015)
- Big Media Data: Understanding, Search, and Mining (July-Sept 2015)
Call for Papers
General Call for Papers
TBD Call-for-Papers Flyer Version 1
TBD Call-for-Papers Flyer Version 2
Reviewers List
Annual Index
Access Recently Published TBD Articles
Subscribe to the RSS feed of recently published TBD content
Sign up for e-mail notifications through IEEE Xplore Content Alerts
View TBD preprints in the Computer Society Digital Library