Finding Lost Information in Big Data
Big Data is only as valuable as the information it contains. Information removes uncertainty from whatever answers your data provides, whether it’s about business, weather, sports, or quantum physics. With better information, people make smarter decisions and technology applies algorithms more effectively.
Many factors determine how much information your data contains, from the quality of the data collection, to how well it is encoded and transformed, to how accurately it is analyzed and used. In large datasets, the amount of information can even decrease as the amount of data grows, because uncertainty grows along with it, and the result is poor decisions. This is especially true if your data is dirty and fragmented.
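One common way to put a number on uncertainty is Shannon entropy, measured in bits. The article does not name a specific measure, so treat the following Python sketch as an illustrative assumption: the function name `entropy_bits` and the example probabilities are hypothetical. It shows how information gained can be read as the drop in uncertainty about an answer.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution.

    Assumption: `probabilities` sums to 1. Zero-probability outcomes are
    skipped because they contribute nothing to the sum.
    """
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Hypothetical example: a yes/no question answered with and without good data.
uncertainty_before = entropy_bits([0.5, 0.5])  # no idea which answer is right
uncertainty_after = entropy_bits([0.9, 0.1])   # data points strongly to one answer
information_gained = uncertainty_before - uncertainty_after

print(f"before: {uncertainty_before:.3f} bits")
print(f"after:  {uncertainty_after:.3f} bits")
print(f"gained: {information_gained:.3f} bits")
```

Here a 50/50 guess carries 1 bit of uncertainty. Data that moves the odds to 90/10 leaves less than half a bit, and the difference is the information gained.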
Can You Have Too Much Information?
First, you should answer the question, “How much information do I need?” More information exists than is practical to use. In Geneva, Switzerland, sensitive detectors at the Large Hadron Collider measure — with incredible precision — the motions of extremely small objects, but sometimes, all you need to know is whether the pitch was a ball or a strike.
Once you know how much information is needed, you should examine where information is being lost from your data. To help you preserve as much information as possible, we’ve provided the following image that displays the major causes of information loss.
Measurement Produces Data as People and Technology Make Observations
Measurements taken with high-quality instrumentation produce more information, with greater precision and less uncertainty. Better instruments capture a larger share of the events you want to record, and they describe objects in greater detail. You can see how measurement quality affects the value of information in everyday examples like:
- The speed of a passing car,
- The number of videos played by a digital media player,
- Atmospheric pressure, or
- The number of steps taken in a day.
In each of these examples, different instruments can take measurements with greater or lesser accuracy. For example, a speedometer uses tire rotation to calculate a car’s velocity. Under normal conditions, a speedometer is within 2 percent of the actual speed; however, changes in tire diameter and environmental conditions can put the reading off by more than 10 percent. Law enforcement officers use instruments such as radar, which rely on electromagnetic waves, to determine the speed of your vehicle, but these instruments are also subject to variations in measurement quality. It’s easy to see why better information about your actual speed could prove valuable.
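The effect of tire diameter on a speedometer reading can be sketched with simple arithmetic. The Python below is a minimal illustration, assuming the speedometer converts wheel rotations to speed using a fixed, calibrated tire diameter. The function names and the diameters used are hypothetical and do not come from any real vehicle.

```python
def actual_speed(indicated_speed, calibrated_diameter, actual_diameter):
    """Estimate true speed when the tire differs from the calibrated size.

    Assumption: distance per wheel rotation is proportional to tire diameter,
    so true speed scales by actual_diameter / calibrated_diameter.
    """
    return indicated_speed * actual_diameter / calibrated_diameter


def percent_error(indicated_speed, true_speed):
    """Speedometer error as a percentage of the true speed."""
    return 100.0 * (indicated_speed - true_speed) / true_speed


# Hypothetical values: calibrated for a 0.65 m tire, now running a 0.60 m tire.
indicated = 100.0
true_speed = actual_speed(indicated, calibrated_diameter=0.65, actual_diameter=0.60)
error = percent_error(indicated, true_speed)

print(f"indicated: {indicated:.1f}, actual: {true_speed:.1f}, error: {error:.1f}%")
```

With these made-up numbers, a tire about 8 percent smaller than the calibrated one makes the speedometer read roughly 8 percent high, which shows how a modest change in diameter can push the error well past the 2 percent seen under normal conditions.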
Information quantity grows as measurement becomes more granular. Examples of measurement scale include timeframes (daily, hourly, milliseconds, etc.) and locations (state, city, geospatial coordinates, etc.). More information about time and place is usually better, but data quantities grow rapidly when measuring objects in finer detail. Take yourself, for example. Your name, address, and payment method are enough information to receive most of the services you use every day, including utilities and media subscriptions, and to make most purchases. A deeper level of measurement reveals information that marketers find especially valuable, including age, gender, hobbies, media consumption, and spending habits. But information about you can get much more granular still. If the purpose of measurement is to diagnose your health, an MRI or CT scan may be used to capture enough information to prescribe the right treatment. The right measurement granularity is determined by your purpose for collecting the data.
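To get a feel for how quickly data quantities grow at finer time scales, the following Python sketch counts the readings a single source would produce in a year at several granularities. The granularity labels, the `readings_per_year` helper, and the one-reading-per-interval assumption are all illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: exactly one reading is stored per interval, per source.
READINGS_PER_DAY = {
    "daily": 1,
    "hourly": 24,
    "per second": SECONDS_PER_DAY,
    "per millisecond": SECONDS_PER_DAY * 1000,
}


def readings_per_year(granularity, sources=1, days=365):
    """Number of readings collected in `days` days from `sources` sources."""
    return READINGS_PER_DAY[granularity] * sources * days


for granularity in READINGS_PER_DAY:
    print(f"{granularity:>16}: {readings_per_year(granularity):,} readings per year")
```

Moving from hourly to millisecond readings multiplies the volume by 3.6 million for the same source, which is why the purpose of the measurement should decide the granularity.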
You will obtain the most information from your data by using higher-precision instrumentation and measuring at the right level of granularity. Doing so decreases uncertainty and makes your data as valuable as it can be.
Source: Adobe Blogs
Kalyan Banga
I am Kalyan Banga, a postgraduate in Business Analytics from the Indian Institute of Management (IIM) Calcutta, a premier management institute ranked the best B-school in Asia in the FT Masters in Management global rankings. I have spent 6 years in the field of analytics.