Fulfilling big data’s promise with small and wide data

By Jim Hare, Distinguished VP Analyst, Gartner

Disruptions such as the COVID-19 pandemic cause historical data that reflect past conditions to become obsolete quickly. As organisations experience the limitations of big data as a critical enabler of analytics and AI, new approaches known as ‘small data’ and ‘wide data’ are emerging.

The big data era achieved the ability to store and manage data but fell short of enabling organisations to derive value from it. This is where small and wide data come in to deliver on that promise.

The wide data approach enables organisations to analyse and combine a variety of small and large, structured and unstructured data sources. The small data approach, on the other hand, applies analytical techniques that require less data but still offer useful insights.

Both approaches enable more robust analytics and AI, reducing an organisation’s dependency on big data and enabling a richer, more complete situational awareness or 360-degree view. Organisations can then apply analytics for better decision making in the increasingly complex context of disruptions, market dynamics and demanding customers.

According to Gartner, 70% of organisations will be compelled to shift their focus from big to small and wide data by 2025, providing more context for analytics and making AI less data-hungry.

Data and analytics (D&A) leaders must envisage a strategy that empowers their organisations to use small, wide and synthetic data to drive business transformation via analytics augmented with AI and machine learning (ML). This will help them tackle challenges such as the low availability of training data, and develop more robust models by using a wider variety of data.

Why are small and wide data important?

Analytics and AI need to be able to work with more recent and less voluminous data. In addition, collecting sufficiently large volumes of historical or labelled data for analytics and AI is a challenge for many organisations.

Data sourcing, data quality, bias and privacy protection are common challenges. But even if big data is available, the costs, time and energy to implement conventional supervised ML can still be prohibitive. In addition, decision making by humans and AI has become more complex and demanding, requiring a greater variety of data for better situational awareness.

Taken together, this means that there’s a growing need for analytical techniques that can leverage available data more effectively, either by reducing the required volume or by extracting more value from unstructured, diverse data sources.

What is the impact?

The wide data approach applies X analytics, with X standing for finding links between data sources, as well as for a diversity of data formats. These formats include tabular, text, image, video, audio, voice, temperature, or even smell and vibration. The data itself comes from an increasing range of internal and external data sources, such as data marketplaces, brokers, social media, IoT sensors and digital twins.
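As a rough illustration of linking sources in different formats, the sketch below joins a tabular sales source, free-text reviews and IoT temperature readings on a shared store key. All names, sample values and the toy word lexicon are hypothetical and chosen only for illustration; a real wide data pipeline would use proper text analytics and far richer sources.

```python
from collections import defaultdict

# Hypothetical inputs: three small sources in different formats, keyed by store.
sales_rows = [  # tabular data
    {"store": "A", "units": 120},
    {"store": "B", "units": 95},
]
review_texts = [  # unstructured text
    ("A", "great service, fast checkout"),
    ("A", "slow queue today"),
    ("B", "great range, great staff"),
]
sensor_readings = [  # IoT temperature readings in degrees Celsius
    ("A", 21.5), ("A", 22.5), ("B", 26.0),
]

# Toy sentiment lexicon: an assumption, not a real sentiment model.
POSITIVE_WORDS = {"great", "fast"}
NEGATIVE_WORDS = {"slow"}


def text_score(text):
    """Return positive-word count minus negative-word count for one review."""
    words = text.replace(",", " ").split()
    positives = sum(word in POSITIVE_WORDS for word in words)
    negatives = sum(word in NEGATIVE_WORDS for word in words)
    return positives - negatives


def build_wide_view(sales, reviews, sensors):
    """Link the three sources on the store key into one record per store."""
    view = {row["store"]: {"units": row["units"]} for row in sales}

    scores = defaultdict(list)
    for store, text in reviews:
        scores[store].append(text_score(text))

    temps = defaultdict(list)
    for store, temp in sensors:
        temps[store].append(temp)

    for store, record in view.items():
        record["review_score"] = sum(scores[store])
        store_temps = temps[store]
        record["avg_temp_c"] = (
            sum(store_temps) / len(store_temps) if store_temps else None
        )
    return view


wide_view = build_wide_view(sales_rows, review_texts, sensor_readings)
print(wide_view)
```

Each store ends up with one record that combines signals from all three formats, which is the kind of richer context the wide data approach aims for.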

The small data approach includes the tailored use of less data-hungry models, such as certain time-series analysis techniques, rather than applying more data-hungry deep learning techniques in a one-size-fits-all manner. Other techniques include few-shot learning, synthetic data and self-supervised learning. The need for data can be reduced further through techniques such as collaborative or federated learning, adaptive learning, reinforcement learning and transfer learning.
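To make the idea of a less data-hungry time-series technique concrete, here is a minimal sketch of simple exponential smoothing, which can produce a one-step-ahead forecast from only a handful of observations. It is one example of such a technique and not a method the article prescribes; the function name, the smoothing factor and the weekly demand figures are hypothetical.

```python
def exponential_smoothing_forecast(series, alpha=0.5):
    """Return a one-step-ahead forecast using simple exponential smoothing.

    series: observations in time order (at least one value).
    alpha:  smoothing factor in (0, 1]; higher values weight recent data more.
    """
    if not series:
        raise ValueError("series must contain at least one observation")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")

    level = series[0]  # initialise the level with the first observation
    for value in series[1:]:
        # Blend the newest observation with the running level.
        level = alpha * value + (1 - alpha) * level
    return level


# Hypothetical weekly demand figures: only four data points are available.
weekly_demand = [100, 110, 105, 120]
forecast = exponential_smoothing_forecast(weekly_demand, alpha=0.5)
print(forecast)  # 112.5
```

Because the model has a single parameter, it can be fitted or tuned by hand on very short histories, which is useful when recent data matter more than a long archive.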

Potential areas for innovation with small and wide data include, but are not limited to, demand forecasting in retail, real-time behavioural and emotional intelligence in customer service applied to hyper-personalisation, and customer experience improvement.

Other areas include physical security, fraud detection and adaptive autonomous systems, such as robots, which constantly learn by analysing correlations in time and space between events in different sensory channels.

How to get started

Explore small and wide data approaches to lower the barrier to entry for advanced analytics and AI that a real or perceived lack of data creates, rather than relying too heavily on data-hungry deep learning approaches.

Extend the toolbox of your D&A teams with techniques to provide a richer context for more accurate business decision making, leveraging the growing availability of external data sources through data sharing and marketplaces.

Finally, enrich and improve the predictive power of data by incorporating a greater variety of structured and unstructured data sources.
