№1, 2016

"BIG DATA" ANALYTICS: AVAILABLE APPROACHES, PROBLEMS AND SOLUTIONS

Rena T. Gasimova

The growing volume of data and the rising demand for ad hoc analysis have made Big Data analytics one of the central problems of Big Data. This article examines the current problems and the most frequently used methods of Big Data analysis and offers recommendations. It also reviews the technological stages of Big Data processing and the basic characteristics and features of Big Data (pp. 62-78).

Keywords: data warehouse, cloud, database management systems, data processing, big data, big data analytics, NoSQL, MapReduce, Hadoop, OLAP.