Big data primarily refers to data sets that are too large or complex to be dealt with by traditional data-processing software. Data with many entries (rows) offer greater statistical power, while data with higher complexity (more attributes or columns) may lead to a higher false discovery rate.
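The link between many attributes and false discoveries can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: every column and the target are pure Gaussian noise, so any "significant" correlation is false by construction, and the threshold 0.279 is an approximate two-sided 5% critical value for the Pearson coefficient with 50 rows. The function names `pearson` and `spurious_hits` are hypothetical helpers, not part of any library.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spurious_hits(n_rows, n_columns, threshold=0.279, seed=0):
    """Count noise columns whose |r| with a noise target exceeds the threshold.

    Assumption: threshold=0.279 approximates the 5% two-sided critical
    value of r for n_rows=50; it is not recomputed for other row counts.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    hits = 0
    for _ in range(n_columns):
        column = [rng.gauss(0, 1) for _ in range(n_rows)]
        if abs(pearson(column, target)) > threshold:
            hits += 1
    return hits


few = spurious_hits(n_rows=50, n_columns=10)
many = spurious_hits(n_rows=50, n_columns=1000)
print(f"10 noise columns:   {few} apparent correlations")
print(f"1000 noise columns: {many} apparent correlations")
```

With the same number of rows, testing more attributes yields more chance correlations that pass the threshold, which is why wide data sets usually call for a multiple-comparison correction.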
Big data analysis challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, information privacy, and data source. Big data was originally associated with three key concepts: volume, variety, and velocity. The analysis of big data presents challenges in sampling, which previously limited analysis to observations and samples. A fourth concept, veracity, refers to the quality or insightfulness of the data. Without sufficient investment in expertise for big data veracity, the volume and variety of data can produce costs and risks that exceed an organization's capacity to create and capture value from big data.
Current usage of the term big data tends to refer to the use of predictive analytics, user behavior analytics, or certain other advanced data analytics methods that extract value from big data, and seldom to a particular size of data set. "There is little doubt that the quantities of data now available are indeed large, but that's not the most relevant characteristic of this new data ecosystem."
Analysis of data sets can find new correlations to "spot business trends, prevent diseases, combat crime and so on". Scientists, business executives, medical practitioners, advertisers and governments alike regularly meet difficulties with large data sets in areas including Internet searches, fintech, healthcare analytics, geographic information systems, urban informatics, and business informatics. Scientists encounter limitations in e-Science work, including meteorology, genomics, connectomics, complex physics simulations, biology, and environmental research.
The size and number of available data sets have grown rapidly as data is collected by mobile devices, cheap and numerous information-sensing Internet of things devices, aerial (remote sensing) equipment, software logs, cameras, microphones, radio-frequency identification (RFID) readers and wireless sensor networks. The world's technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s; as of 2012, 2.5 exabytes (2.17×2^60 bytes) of data are generated every day. An IDC report predicted that the global data volume would grow exponentially from 4.4 zettabytes to 44 zettabytes between 2013 and 2020. By 2025, IDC predicts there will be 163 zettabytes of data. According to IDC, global spending on big data and business analytics (BDA) solutions is estimated to reach $215.7 billion in 2021. Statista, meanwhile, reports that the global big data market is forecast to grow to $103 billion by 2027. In 2011 McKinsey & Company reported that if US healthcare were to use big data creatively and effectively to drive efficiency and quality, the sector could create more than $300 billion in value every year. In the developed economies of Europe, government administrators could save more than €100 billion ($149 billion) in operational efficiency improvements alone by using big data. And users of services enabled by personal-location data could capture $600 billion in consumer surplus. One question for large enterprises is determining who should own big-data initiatives that affect the entire organization.
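The growth figures quoted above can be put on a common footing with a little arithmetic. The following sketch uses only the numbers stated in the text (a 40-month doubling time, 2.5 exabytes per day, and 4.4 to 44 zettabytes between 2013 and 2020); the helper names are hypothetical, and a decimal exabyte of 10^18 bytes is assumed.

```python
import math

EXABYTE = 10**18  # assumption: decimal (SI) exabyte


def annual_growth_from_doubling(months):
    """Yearly growth rate implied by a doubling time given in months."""
    return 2 ** (12 / months) - 1


def annual_growth_between(start, end, years):
    """Compound annual growth rate from a start value to an end value."""
    return (end / start) ** (1 / years) - 1


def doubling_time_months(annual_growth):
    """Doubling time in months implied by a yearly growth rate."""
    return 12 * math.log(2) / math.log(1 + annual_growth)


# 2.5 exabytes expressed as a multiple of 2^60 bytes
daily_in_binary_units = 2.5 * EXABYTE / 2**60

# Storage capacity doubling every 40 months
storage_growth = annual_growth_from_doubling(40)

# IDC prediction: 4.4 ZB (2013) to 44 ZB (2020), i.e. tenfold in 7 years
idc_growth = annual_growth_between(4.4, 44, 2020 - 2013)
idc_doubling = doubling_time_months(idc_growth)

print(f"2.5 EB = {daily_in_binary_units:.2f} x 2^60 bytes")
print(f"40-month doubling = {storage_growth:.1%} per year")
print(f"IDC 2013-2020 = {idc_growth:.1%} per year, "
      f"doubling every {idc_doubling:.1f} months")
```

Under these assumptions, a 40-month doubling time corresponds to roughly 23% growth per year, while the IDC prediction implies roughly 39% per year, or a doubling about every 25 months.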
Relational database management systems and desktop statistical software packages used to visualize data often have difficulty processing and analyzing big data. The processing and analysis of big data may require "massively parallel software running on tens, hundreds, or even thousands of servers". What qualifies as "big data" varies depending on the capabilities of those analyzing it and their tools. Furthermore, expanding capabilities make big data a moving target. "For some organizations, facing hundreds of gigabytes of data for the first time may trigger a need to reconsider data management options. For others, it may take tens or hundreds of terabytes before data size becomes a significant consideration."
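The general shape of parallel processing (split the data, process the pieces independently, then combine the partial results) can be sketched on a single machine. The example below is a toy illustration, not a description of any particular big data system: it uses worker threads from Python's standard library in place of servers, the function names are hypothetical, and in CPython threads mainly illustrate the structure rather than deliver real speed-ups for this kind of work.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, n_chunks):
    """Split a list into at most n_chunks contiguous pieces."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def count_chunk(chunk):
    """Map step: count the words in one chunk, independently of the others."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(records, n_workers=4):
    """Count words by mapping over chunks in parallel, then merging."""
    chunks = split_into_chunks(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(count_chunk, chunks))
    # Reduce step: merge the per-chunk counts into one result.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


logs = [
    "error disk full",
    "warning disk slow",
    "error network down",
    "info disk ok",
    "error disk full",
]
word_counts = parallel_word_count(logs)
print(word_counts.most_common(2))
```

Because each chunk is counted without reference to the others, the same structure extends in principle to many machines, with the merge step run wherever the partial results are gathered.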