Browsing All Posts filed under »Data Science«

LinkedIn’s Data Infrastructure

August 4, 2010


Jay Kreps of LinkedIn presented details of how the company processes data at the recent Hadoop Summit. Kreps described how LinkedIn crunches 120 billion relationships per day and blends large-scale data computation with high-volume, low-latency site serving. Much of LinkedIn’s important data is offline – it moves fairly slowly. So they […]

Facebook on Hadoop, Hive, HBase, and A/B Testing

July 14, 2010


The Hadoop Summit of 2010 included presentations from a number of large-scale users of Hadoop and related technologies. Notably, Facebook presented a keynote and detailed information about its use of Hive for analytics. Mike Schroepfer, Facebook’s VP of Engineering, delivered a keynote describing the scale of their data processing with Hadoop. Schroepfer gave an […]

GigaOm Structure Highlights

July 9, 2010


The GigaOM Structure conference a couple of weeks ago addressed many areas of cloud computing. One of the key themes of the event was the emergence of new data architectures. Throughout the panels, interviews, and presentations, many speakers identified significant upcoming changes in how data is handled. Paul Maritz, CEO of VMware, […]