Netflix Amped Up Recommendations with its own Big Data. What that means for wine.

This is #5 in Wine Industry Insight’s in-depth series about the quest for the Netflix of wine

 

A note to readers:

Our Daily News Fetch email briefing is free to you and that’s made possible by our advertisers. When we spend substantial time researching and writing original articles, those are usually only available to premium subscribers to our Wine Executive News. Redacted versions of those are edited for free consumption.

 

Sometimes, the premium articles contain significant information needed by the entire industry. In those cases, we offer premium subscribers the first look at the entire article, then later make it available for free after a substantial delay.

 

This article was originally published on March 23, 2021.

 

Premium subscribers are important to you and the entire industry because they make it possible for us to do original reporting: articles that no other publication covers, but which can affect your business.



 

In article #3 of this series (Reviews and 5-Star ratings are so useless for recommendations that Netflix tossed its prized $1-million algorithm. They’re even worse for wine), we took a deep look at why Netflix was so eager to move beyond a system that had previously worked so well for it.

In short, Netflix abandoned ratings and reviews because it recognized that the opinions they express are highly personal perceptions, both conscious and subconscious, shaped by genetic, environmental, psychological, educational, and other factors.

 

This is the article that was planned to follow #3, but it got bumped because readers wanted more information on why genetics makes it impossible for current methods to make accurate wine recommendations, especially new versions of profile matching systems that have been in play (badly) for decades.

 

Next in this series: How Netflix has a gigantic sensory recommendation advantage over wine + what wine can do about that.


Collaborative Filtering: This 25-year-old paradigm frustrates but still beats ratings and reviews

As we explored in our last article, genetic variations play a key role in wine recommendation failures. But also at fault are the unsolved problems in Collaborative Filtering, which uses data on previous behavior to make recommendations: “people who bought/liked this also bought/liked this.”

 

According to a scholarly paper (free PDF) published in 2015 by two of the key Netflix developers of the recommendation system (Carlos Gomez-Uribe, VP of product innovation, and Chief Product Officer Neil Hunt) the rating and recommendation systems lacked the necessary accuracy because of psychological, semantic, personal, and other intractable reasons.

“Now, we stream the content, and have vast amounts of data that describe what each Netflix member watches, how each member watches (e.g., the device, time of day, day of week, intensity of watching), the place in our product in which each video was discovered, and even the recommendations that were shown but not played in each session. These data and our resulting experiences improving the Netflix product have taught us that there are much better ways to help people find videos to watch than focusing only on those with a high predicted star rating.”

What the Gomez-Uribe/Hunt paper mentioned, but never elaborated on, is that the company moved beyond the fairly basic Collaborative Filtering system it had been using to organize its ratings and reviews, and expanded its use of the Big Data techniques of hoovering up personal information from all over the Internet. This Big Data move was designed to improve recommendation accuracy, which Netflix has demonstrated is a key element in customer retention.

Despite failures, Collaborative Filtering dominates recommendations.

In general, Big Data Collaborative Filtering systems spew irrelevant recommendations more often than useful ones.

 

This is evidenced by the reality that only 7% of consumers think that today’s “Big Data” recommendations are useful or relevant.

 

[Chart: consumer opinion of recommendations]

But despite this abysmal performance, the 7% who do click on a recommendation account for 24% of e-commerce revenues in general, and 35% of Amazon’s (McKinsey). This is even more vital to wine, given its marginalization relative to Netflix.

 


Despite the disdain for current recommendations, our previous article on the stress of choosing among too many options (the Paradox of Choice) explains why people will click on recommendations, regardless of how frustrating they can be. Even a lame recommendation may save time and relieve the stress of decisions. It’s important to realize that Collaborative Filtering (CF) is not an inherently bad paradigm on which to base recommendations.

 

[Graphic: collaborative filtering example]

In the pre-ecommerce, pre-social media era, products were bought in stores. No accurate, cost-effective way existed for a merchant or producer to sample or directly measure every consumer’s opinion of a physical product purchased or consumed.

 

Netflix was no different. It sold digital products (DVDs) that were imprisoned in plastic. For that reason, ratings and reviews flourished for music (vinyl, tape, or CDs) and movies (tape or DVD) because they were the only methods available, usually in print media. But beginning in the mid-1990s, the first digital recommendation system, Collaborative Filtering, was invented. That was an epic epoch of change, especially when you consider that e-commerce as we now know it had not yet been invented:

 

  • Web users were scarce.
  • Online was dominated by closed user sites like America Online and Compuserve.
  • Bandwidth was measured in a few thousand bits per second.
  • Video was dominated by physical merchandise (VHS tapes and, later, DVDs) that was rented or bought at physical stores.

 

It was a digital stone age when almost nothing was for sale online and bandwidth was so limited that listening to music on the Internet or watching an online movie was inconceivable.

 

Reviews of music, movies, books, wine, and more were available mostly in print, except for the few people who even had email. Like wine, all had their own experts, but in the era when dead-tree media, traditional radio, and television dominated, the average consumer lacked the ability to express their own views other than by “word of mouth” to friends and family.

Enter Ringo

To understand how this started, let’s take the Wayback Machine to Boston in 1994 and the early academic beginnings of the recommendation method now known as collaborative filtering. It began, ironically enough, as a way to make it easier for music lovers to find new songs they would enjoy.

 

One of the earliest collaborative filtering systems was Ringo, a project invented at MIT in the 1993/94 era. Ringo allowed people to rate music on a scale of 1-7, then email their ratings to a central server where the ratings were automatically entered in a database. Then, on an hourly basis, the server processed the ratings and sent recommendations.
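The rate-then-recommend loop described above can be sketched in a few lines. This is a minimal illustration, not Ringo's actual code: the article does not say which similarity measure Ringo used, so the Pearson correlation here is an assumption, and the users, artists, ratings, and the `min_rating` cutoff are all hypothetical.

```python
from math import sqrt

# Hypothetical ratings on a 1-7 scale: user -> {artist: rating}
ratings = {
    "ann":  {"Beatles": 7, "Miles Davis": 2, "Nirvana": 6, "Bach": 1},
    "bob":  {"Beatles": 6, "Miles Davis": 1, "Nirvana": 7, "R.E.M.": 6},
    "cara": {"Beatles": 2, "Miles Davis": 7, "Bach": 6, "Coltrane": 7},
}

def pearson(a, b):
    """Pearson correlation over the artists both users have rated."""
    shared = sorted(set(a) & set(b))
    if len(shared) < 2:
        return 0.0
    xs = [a[k] for k in shared]
    ys = [b[k] for k in shared]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def recommend(user, ratings, min_rating=5):
    """Suggest unrated artists that positively correlated users rated highly."""
    mine = ratings[user]
    scores = {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = pearson(mine, theirs)
        if sim <= 0:
            continue  # ignore users whose tastes run opposite or unrelated
        for artist, rating in theirs.items():
            if artist not in mine and rating >= min_rating:
                scores[artist] = scores.get(artist, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("ann", ratings))  # ['R.E.M.']
```

In this toy data, ann's ratings track bob's and run opposite to cara's, so only bob's favorites are passed along. Note that everything rests on stated ratings, which is exactly the kind of input the article argues is weak.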

 

Ringo is fully described in this 1994 academic paper: “Social Information Filtering for Music Recommendation,” by Upendra Shardanand as part of his requirements for his BS and MS degrees in computer science and electrical engineering.


 

Around this same time, there were other efforts to filter information to help selections (mostly for movies). These included Paramount’s “Movie Select,” but none of these were long-lived.

 

This account of collaborative filtering by professional archivist and historian Moya K. Mason is an astonishingly comprehensive history of the practice.

 

Significantly for the development of Collaborative Filtering, Ringo’s somewhat kludgy success begat an internet-based project (also at MIT) called Firefly, whose core principles still rule the world of collaborative filtering.

In today’s online world, Collaborative Filtering Attempts To Eat All The World’s Data

The Ringo/Firefly era of CF worked well, especially before e-commerce began to flourish, when Internet bandwidth was several orders of magnitude slower than today’s gigabit speeds.

 

At that point, reviews and ratings were the only methods of recommending a product because there was no effective way for companies to spy on users by collecting personal data. In that era, reviews and ratings worked well for Netflix which was selling physical products that were episodically being discussed in emails, bulletin board systems, and increasingly on the original online dinosaurs like Compuserve and America Online.

 

Netflix began to track those reviews and ratings on its own computer systems and developed ways of using CF to start making recommendations from them. But beyond those early efforts, the realization began to dawn that users were generating useful data, not only about their opinions but, more importantly, about measurable behavioral activities like purchases, along with the very personal data attached to those activities, like age, demographics, finances, credit ratings, and a lot more (see graphic, below).

[Graphic: personal data ecosystem, NSTIC privacy workshop]

All of this data was being hoovered up, stored, and manipulated to make more effective recommendations. Data scientists reasoned that they needed every sort of information about what people bought, regardless of whether that information seemed connected. Then they set out to connect the data in novel ways they thought might indicate purchasing desires.

 

At its best, CF relies on observable behavior of individuals with the goal of teasing out strains of commonality in order to make valid recommendations. This is why “people who bought this also bought this” offers a greater chance of accuracy than, “people who liked this also liked this.”
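A minimal sketch of the “people who bought this also bought this” idea is to count how often items appear together in the same purchase history. The baskets, item names, and function names below are hypothetical, and production systems normalize and weight these counts in ways this sketch ignores.

```python
from collections import Counter
from itertools import permutations

# Hypothetical purchase histories, one set of items per customer
baskets = [
    {"Pinot Noir", "Brie", "Crackers"},
    {"Pinot Noir", "Brie"},
    {"Pinot Noir", "Corkscrew"},
    {"Chardonnay", "Brie"},
]

def co_purchase_counts(baskets):
    """Map each item to a Counter of the items bought alongside it."""
    counts = {}
    for basket in baskets:
        for item, other in permutations(sorted(basket), 2):
            counts.setdefault(item, Counter())[other] += 1
    return counts

def also_bought(item, baskets, top_n=3):
    """Return the items most often bought together with `item`."""
    counts = co_purchase_counts(baskets)
    return [other for other, _ in counts.get(item, Counter()).most_common(top_n)]

print(also_bought("Pinot Noir", baskets, top_n=1))  # ['Brie']
```

The input here is observed behavior only: nobody was asked what they liked. That is the distinction the paragraph above draws between “bought” and “liked” data.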

Quality data needed. Ratings & reviews = GIGO

Significantly, “people who bought this also bought this” reflects a measurable action or behavior in the real world. As data, it reflects some portion of reality.

 

A rating or review, on the other hand, is not a behavioral measure. It reflects attitudes and opinions, psychological states, or emotions that may or may not predict future opinions, attitudes, behaviors, or purchases.

 

If you start with ratings and reviews, you start with poor quality information and that makes for GIGO – Garbage In, Garbage Out.

 

Frequently, the “more data is better data” paradigm relies on many unrelated purchases or behaviors that algorithms manipulate to produce a recommendation.

 

The reasoning is that people who bought blue shirts, Birkenstock sandals, Craftsman tools, and Jerry Garcia albums would also like to know about fixer-upper real estate in San Francisco, driving a Prius, meeting people on Craigslist, and fermented vegetables. This is not always the case, as indicated by the previous chart (above) on consumer opinion of recommendations.

 

A feeble algorithm that fails to elevate relevant data from the collection bin also can produce GIGO recommendations.

 

But that has not stopped the rabid collection and purchase of personal data for recommendations. The graphic, below, shows some of the many types of personal data collected and algorithmically manipulated in an attempt to sell almost anything to almost anybody.

 

This has become a global controversy over data theft and the wholesale collection and use of personal data by Big Data.

The Monster that Ringo Begat

Ringo, with its simplicity and naive accuracy, begat an omnipresent and omnivorous consumer of data from billions of individuals and trillions of products, services, and attitudes (like voting behavior or sexual orientation).

 

The current paradigm for Big Data is to hoover up as much personal data as possible about any and all persons using the Internet. The great masses of data are then sorted and stored so that other algorithms can organize a system of similar actions and behaviors that can create a recommendation about a specific product or service to deliver to a specific person.

How Ringo and Collaborative Filtering Led Netflix into Big Data

The chart below is a data disclosure cloud composed of exact quotes excerpted from disclosures and privacy notices posted in late December 2020 at netflix.com. It shows not only the data collected from within the Netflix system, but also offers hints about, and discloses, data purchased or obtained from outside sources.

[Chart: Netflix data disclosure cloud]

Every move you make

It’s worth remembering what Netflix’s developers told us at the beginning of this article:

“Now, we stream the content, and have vast amounts of data that describe what each Netflix member watches, how each member watches (e.g., the device, time of day, day of week, intensity of watching), the place in our product in which each video was discovered, and even the recommendations that were shown but not played in each session.”

As the data disclosure cloud (above) indicates, Netflix knows everything.

 

To paraphrase The Police: “Every move you make … every click you take, they’ll be watching you.”

 

More details on how Netflix does that at this link.

Big Data works for Netflix because of its market environment

Netflix has an enormous advantage over wine and other physical products of taste: Its customers inhabit a closed and heavily monitored digital environment.

 

In addition, as discussed in article 2 of this series, Netflix has:

 

  • U.S. Users – 197 million (60% of the population)
  • Product Inventory – 5,879 video streams
  • Customers Per Product – 33,509

 

That means that in addition to the data gathered on each user, the relatively small and stable base of available products means that Netflix has a lot of data and feedback on each product.

 

Significantly, by going beyond its own walled garden and acquiring data from many other sources, it can create even more detailed profiles of each user. Those details can not only solve the “paradox of choice” and make recommendations more accurate, but the added data can help Netflix determine which sorts of original content to produce.

 

By contrast, wine has approximately 28 times more products, but only about 1/8 the number of people in the customer base to work with:

 

  • U.S. Customers – 37.3 million (11% of the population)
  • U.S. Product Inventory – probably 300,000+, with up to 160,000+ new wines per year

 

This means that only a very limited number of people have ever tasted a given wine. That severely limits the valid data that could be used for crowdsourcing.
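The same “customers per product” arithmetic used for Netflix can be applied to wine as a rough check. This sketch uses only the figures listed above and assumes the low end of the “300,000+” inventory estimate; the variable and function names are illustrative.

```python
# Figures as listed in the article
netflix_users = 197_000_000
netflix_titles = 5_879
wine_customers = 37_300_000
wine_products = 300_000  # assumption: low end of the "300,000+" estimate

def customers_per_product(customers, products):
    """Average number of potential customers behind each product."""
    return customers / products

netflix_density = customers_per_product(netflix_users, netflix_titles)
wine_density = customers_per_product(wine_customers, wine_products)

print(round(netflix_density))  # 33509
print(round(wine_density))     # 124
print(netflix_density / wine_density)  # how many times denser Netflix's data is
```

On these inputs, each wine has on the order of a hundred potential customers behind it, against tens of thousands per Netflix title. The exact multiples depend on which inventory figure is used, so treat the output as an order-of-magnitude comparison.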

 

Perhaps more significantly, a major flaw of depending on reviewers is that an estimated 75% of wines have never been rated by a critic.

Ratings and reviews fail because wine ages

In addition, wine has a perpetually changing product base because a substantial portion of wines disappears every year as vintages head over the hill or inventories run out. This means that a customer who likes or buys a Chateau LaPlonk 2019 may not like the 2021 vintage, or may not like a 2019 that has aged for two years.

Big Data’s BIGGEST Fatal Flaw

As we’ve seen above, most consumers find big data recommendations an irrelevant failure. Collaborative Filtering is mostly a gateway drug to Machine Learning and Artificial Intelligence. However, none of those digital data disciplines can approach an IQ of more than 50 because none of them have a clue as to the WHY of the data that they gather and analyze for recommendations.

 

There are two reasons for this:

 

(1) Machine Learning systems do not — and (so far) cannot — know why a person bought/liked all the products they bought or visited all the online sites they visited.

 

Indeed, many people really don’t know the answer themselves because the answers are hiding in various consciousness states and perceptions. In addition, those perceptions are shaped by genetics and by the mental state and conditions at a given moment of decision. Capturing this decision-making would require a valid mechanism to accurately measure perception and intent. This would be called mind reading.

 

(2) Association is not causation. The most accurate recommendations will come from systems that know why a consumer makes a given decision or behaves in a certain manner.

 

Thus far, Machine Learning systems lack a mechanism to measure causality. ML systems do not know why data interacts in a given way, what effects one data point has on another, or in what temporal order those data points interact.
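A small simulation can illustrate why association is not causation. In this hypothetical sketch, a hidden factor (hosting a dinner party) drives both wine and cheese purchases. The probabilities, names, and scenario are invented for illustration and are not drawn from any real data.

```python
import random

def simulate(n=20_000, force_wine=None, seed=7):
    """Return (wine, cheese) purchase flags for n hypothetical shoppers.

    A hidden factor, hosting a dinner party, raises the odds of both purchases.
    Setting force_wine to True or False overrides the wine purchase, which
    mimics an intervention rather than a passive observation.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        hosting = rng.random() < 0.3
        wine_draw = rng.random()
        cheese_draw = rng.random()
        wine = wine_draw < (0.8 if hosting else 0.1)
        if force_wine is not None:
            wine = force_wine
        # Cheese depends only on hosting, never on the wine purchase itself
        cheese = cheese_draw < (0.7 if hosting else 0.1)
        rows.append((wine, cheese))
    return rows

def cheese_rate(rows, wine_value):
    """Share of shoppers buying cheese among those with the given wine flag."""
    picked = [cheese for wine, cheese in rows if wine == wine_value]
    return sum(picked) / len(picked)

observed = simulate()
gap_observed = cheese_rate(observed, True) - cheese_rate(observed, False)

gap_forced = (cheese_rate(simulate(force_wine=True), True)
              - cheese_rate(simulate(force_wine=False), False))

print(gap_observed)  # large: wine buyers look far more likely to buy cheese
print(gap_forced)    # 0.0: forcing the wine purchase changes nothing
```

A system that sees only the observed rows would happily recommend cheese to every wine buyer. It has no way to tell that the dinner party, which it never recorded, is doing the work.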

 

This article from VentureBeat — Why machine learning struggles with causality — offers an example that aptly illustrates the problem:

“When you look at a baseball player hitting the ball, you can make inferences about causal relations between different elements.

“For instance, you can see the bat and the baseball player’s arm moving in unison, but you also know that it is the player’s arm causing the bat’s movement and not the other way around. You also don’t need to be told that the bat is causing the sudden change in the ball’s direction.


“Likewise, you can think about counterfactuals, such as what would happen if the ball flew a bit higher and didn’t hit the bat.

“Such inferences come to us humans intuitively.

“We learn them at a very early age, without being explicitly instructed by anyone and just by observing the world. But for machine learning algorithms, which have managed to outperform humans in complicated tasks such as Go and chess, causality remains a challenge. Machine learning algorithms, especially deep neural networks, are especially good at ferreting out subtle patterns in huge sets of data. They can transcribe audio in real time, label thousands of images and video frames per second, and examine X-ray and MRI scans for cancerous patterns. But they struggle to make simple causal inferences like the ones we just saw in the baseball example above.

“In a paper titled Towards Causal Representation Learning, researchers at the Max Planck Institute for Intelligent Systems, the Montreal Institute for Learning Algorithms (Mila), and Google Research discuss the challenges arising from the lack of causal representations in machine learning models and provide directions for creating artificial intelligence systems that can learn causal representations.”

Netflix gets it partially right

As Netflix wisely recognized, there’s no way to tease causality from a review or a rating. And as Towards Causal Representation Learning explains, even massive amounts of big data lack the necessary causality for precise inferences.

 

More on why ratings and reviews fail

All of this means that “like this” data for wine is close to meaningless (see links below) and the value of “bought this” data for wine depreciates rapidly.

 

The articles linked below were written six years ago, but are still valid. Many of these topics (especially regarding genetics) have been updated in this series about Netflix and wine.

 

Next in this series:

A “Netflix of wine” is impossible with current recommendation methods because Sight & Sound dominate Smell & Taste

Netflix has a leg up on wine because its product experiences are all about sight and sound, while wine’s perception is mostly smell and taste (seasoned by sight, sound, and haptics). This final article will explain why no one currently in the market can become the “Netflix of wine” without implementing a radically new structure for measuring perception designed to drive recommendations in an open and digitally streamlined sales system.