data products

Full list of words from this list:

  1. datum
    an item of factual information from measurement or research
    Join the Data Revolution.
  2. optimization
    the act of rendering optimal
    There are many different optimization techniques to choose from (see sidebar, below), but it is a well-understood field with robust and accessible solutions.
  3. modeler
    a person who creates models
    The Modeler takes the raw data and converts it into slightly more refined predicted data.
  4. optimize
    make optimal; get the most out of; use best
    Optimizing for an actionable outcome over the right predictive models can be a company’s most important strategic decision.
  5. weather condition
    the atmospheric conditions that comprise the state of the atmosphere in terms of temperature and wind and clouds and precipitation
    If we want a more sophisticated system, we can build another model for traffic congestion and yet another model to forecast weather conditions and their effect on the safest maximum speed.
  6. randomize
    arrange or organize by chance, without any order or plan
    This will require conducting many randomized experiments in order to collect data about a wide range of recommendations for a wide range of customers.
  7. predictive
    relating to prediction
    By Jeremy Howard, Margit Zwemer and Mike Loukides

    In the past few years, we’ve seen many data products based on predictive modeling.
  8. petabyte
    a unit of information equal to 1000 terabytes or 10^15 bytes
    Someone using Google’s self-driving car is completely unaware of the hundreds (if not thousands) of models and the petabytes of data that make it work.
  9. iterate
    say, state, or perform again
    Note here the different levels: models of individual components, tied together in a simulation given a set of inputs, iterated through over different input sets in a search optimizer.”
  10. simulator
    machine that models an environment for training or research
    The next machine on the assembly line is a Simulator, which lets ODG ask the “what if” questions to see how the levers affect the distribution of the final outcome.
  11. neural network
    computer architecture in which processors are connected in a manner suggestive of connections between neurons; can learn by trial and error
    Industrial engineers were among the first to begin using neural networks, applying them to problems like the optimal design of assembly lines and quality control.
  12. vehicle traffic
    the aggregation of vehicles coming and going in a particular locality
    For motor vehicle traffic, IBM performed a project with the city of Stockholm to optimize traffic flows that reduced congestion by nearly a quarter, and increased the air quality in the inner city by 25%.
  13. expected value
    the probability-weighted average of all the values a random variable can take
    Their actuaries could build models to predict a customer’s likelihood of being in an accident and the expected value of claims.
  14. traversal
    travel across
    Step 4 of the Drivetrain Approach for Google is now part of tech history: Larry Page and Sergey Brin invented the graph traversal algorithm PageRank and built an engine on top of it that revolutionized search.
  15. iterative
    marked by repetition
    Many optimization procedures are iterative; they can be thought of as taking a small step, checking our elevation and then taking another small uphill step until we reach a point from which there is no direction in which we can climb any higher.
  16. algorithm
    a precise rule specifying how to solve some problem
    Step 4 of the Drivetrain Approach for Google is now part of tech history: Larry Page and Sergey Brin invented the graph traversal algorithm PageRank and built an engine on top of it that revolutionized search.
  17. algorithmic
    of or relating to or having the characteristics of an algorithm
    Back in 1997, AltaVista was king of the algorithmic search world.
  18. actionable
    capable of being acted on; giving a practical basis for a decision
    We are entering the era of data as drivetrain, where we use data not just to generate more data (in the form of predictions), but use data to produce actionable outcomes.
  19. opportunity cost
    the benefits lost by choosing one option over another
    Making the wrong choices comes at a cost to the retailer in the form of reduced margins (discounts that do not drive extra sales), opportunity costs for the scarce real-estate on their homepage (taking up space in the recommendation feed with products the customer doesn’t like or would have bought without a recommendation) or the customer tuning out (sending so many unhelpful email promotions that the customer filters all future communications as spam).
  20. assembly line
    series of machines and workers that build step-by-step
    For an insurance company, policy price is the product, so an optimal pricing model is to them what the assembly line is to automobile manufacturing.
  21. sophisticate
    a person who is cultured and has worldly experience
    Great predictive modeling is an important part of the solution, but it no longer stands on its own; as products become more sophisticated, it disappears into the plumbing.
  22. electrical system
    utility that provides electricity
    There may be one detailed model for mechanical systems, a separate model for thermal systems, and yet another for electrical systems, etc.
  23. optimal
    most desirable possible under a restriction
    For an insurance company, policy price is the product, so an optimal pricing model is to them what the assembly line is to automobile manufacturing.
  24. input
    signal going into an electronic system
    Once we have specified the goal, the second step is to specify what inputs of the system we can control, the levers we can pull to influence the final outcome.
  25. arrival time
    the time at which a public conveyance is scheduled to arrive at a given destination
    We could just build a simple model of distance / speed-limit to predict arrival time with little more than a ruler and a road map.
  26. anneal
    bring to a desired consistency by heating and cooling
    Optimization is a process we are all familiar with in our daily lives, even if we have never used algorithms like gradient descent or simulated annealing.
  27. raw data
    unanalyzed data; data not yet subjected to analysis
    Picture a Model Assembly Line for data products that transforms the raw data into an actionable outcome.
  28. Toni Morrison
    United States writer whose novels describe the lives of African-Americans (born in 1931)
    He went into Strand bookstore in New York City and asked for a book similar to Toni Morrison’s “Beloved.”
  29. customer
    someone who pays for goods or services
    Google realized that the objective was to show the most relevant search result; for other companies, it might be increasing profit, improving the customer experience, finding the best path for a robot, or balancing the load in a data center.
  30. stoplight
    a visual signal to control the flow of traffic at intersections
    Any city with metered stoplights already has all the necessary information; they just haven’t found a way to suck the meaning out of it.
  31. responder
    someone who responds
    In another area where objective-based data products have the power to change lives, the CMU extension in Silicon Valley has an active project for building data products to help first responders after natural or man-made disasters.
  32. mechanical system
    a system of elements that interact on mechanical principles
    There may be one detailed model for mechanical systems, a separate model for thermal systems, and yet another for electrical systems, etc.
  33. recommendation
    praise of a person or thing as worthy or desirable
    These products range from weather forecasting to recommendation engines to services that predict airline flight times more accurately than the airline itself.
  34. physical property
    any property used to characterize matter and energy and their interactions
    The data is in the wing materials’ physical properties; costs are listed in another tab of the application.
  35. weather forecasting
    predicting what the weather will be
    These products range from weather forecasting to recommendation engines to services that predict airline flight times more accurately than the airline itself.
  36. crowd control
    activity of controlling a crowd
    Models developed to simulate fluid dynamics and turbulence have been applied to improving traffic and pedestrian flows by using the placement of exits and crowd control barriers as levers.
  37. body type
    a category of physique
    Zafu’s approach is not to send their customers directly to the clothes, but to begin by asking a series of simple questions about the customers’ body type, how well their other jeans fit, and their fashion preferences.
  38. heuristic
    a commonsense rule to help solve some problem
    These days, it is trivial to use some type of heuristic search algorithm to predict the drive times along various routes (a Simulator) and then pick the shortest one (an Optimizer) subject to constraints like avoiding bridge tolls or maximizing gas mileage.
  39. sidebar
    a short, boxed section of text accompanying the main text
    There are many different optimization techniques to choose from (see sidebar, below), but it is a well-understood field with robust and accessible solutions.
  40. possible action
    a possible alternative
    The vehicle needs to use a simulator to examine the results of the possible actions it could take.
  41. simulate
    model a process or system in order to study how it behaves
    I think of this as a complicated machine (full-system) where the curtain is withdrawn and you get to model each significant part of the machine under controlled experiments and then simulate the interactions.
  42. simulation
    the act of imitating the behavior of some situation
    Because the simulation is at a per-policy level, the insurer can view the impact of a given set of price changes on revenue, market share, and other metrics over time.
  43. data
    a collection of facts from which conclusions may be drawn
    Join the Data Revolution.
  44. model
    a representation of something, often on a smaller scale
    Someone using Google’s self-driving car is completely unaware of the hundreds (if not thousands) of models and the petabytes of data that make it work.
  45. insurance company
    a financial institution that sells insurance
    For an insurance company, policy price is the product, so an optimal pricing model is to them what the assembly line is to automobile manufacturing.
  46. logistic
    relating to necessary details of operation
    Jeannie Stamberger of Carnegie Mellon University Silicon Valley explained to us many of the possible applications of predictive algorithms to disaster response, from text-mining and sentiment analysis of tweets to determine the extent of the damage, to swarms of autonomous robots for reconnaissance and rescue, to logistic optimization tools that help multiple jurisdictions coordinate their responses.
  47. gas mileage
    the ratio of the number of miles traveled to the number of gallons of gasoline burned
    These days, it is trivial to use some type of heuristic search algorithm to predict the drive times along various routes (a Simulator) and then pick the shortest one (an Optimizer) subject to constraints like avoiding bridge tolls or maximizing gas mileage.
  48. objective
    the goal intended to be attained
    Engineers start by defining a clear objective: They want a car to drive safely from point A to point B without human intervention.
  49. modeling
    the act of representing something
    In the past few years, we’ve seen many data products based on predictive modeling.
  50. Gauss
    German mathematician who developed the theory of numbers and who applied mathematics to electricity and magnetism and astronomy and geodesy (1777-1855)
    Sidebar: Optimization in the real world

    Optimization is a classic problem that has been studied by Newton and Gauss all the way up to mathematicians and engineers in the present day.
  51. spam
    unwanted e-mail
    Making the wrong choices comes at a cost to the retailer in the form of reduced margins (discounts that do not drive extra sales), opportunity costs for the scarce real-estate on their homepage (taking up space in the recommendation feed with products the customer doesn’t like or would have bought without a recommendation) or the customer tuning out (sending so many unhelpful email promotions that the customer filters all future communications as spam).
  52. predict
    make a guess about what will happen in the future
    These products range from weather forecasting to recommendation engines to services that predict airline flight times more accurately than the airline itself.
  53. taxonomy
    a system for classifying things into ordered categories
    Irfan Ahmed of CloudPhysics provides a good taxonomy of predictive modeling that describes this entire assembly line process:

    “When dealing with hundreds or thousands of individual components models to understand the behavior of the full-system, a ‘search’ has to be done.
  54. aerodynamics
    the branch of mechanics that deals with the motion of gases
    There is a Modeler for aerodynamics and mechanical structure that can then be fed to a Simulator to produce the Key Wing Outputs of cost, weight, lift coefficient and induced drag.
  55. product
    an artifact that has been created by someone or some process
    In the past few years, we’ve seen many data products based on predictive modeling.
  56. stickiness
    the tendency of customers to stay with a product or provider
    They also considered inputs outside of their control, like competitors’ strategies, macroeconomic conditions, natural disasters, and customer “stickiness.”
  57. insurer
    a financial institution that sells insurance
    Insurers have centuries of experience in prediction, but as recently as 10 years ago, the insurance companies often failed to make optimal business decisions about what price to charge each new customer.
  58. component
    one of the individual parts making up a larger entity
    The first component of ODG’s Modeler was a model of price elasticity (the probability that a customer will accept a given price) for new policies and for renewals.
  59. add-on
    a supplementary component that improves capability
    The data collection and recommendation steps are not an add-on; they are Zafu’s entire business model — women’s jeans are now a data product.
  60. lever
    a simple machine giving a mechanical advantage on a fulcrum
    Once we have specified the goal, the second step is to specify what inputs of the system we can control, the levers we can pull to influence the final outcome.
  61. dog food
    food prepared for dogs
    (“If Hulu shows me that same dog food ad one more time, I’m gonna stop watching!”)
  62. takeaway
    the main point or lesson to be drawn from something
    The takeaway, whether you are a tiny startup or a giant insurance company, is that we unconsciously use optimization whenever we decide how to get to where we want to go.
  63. leading edge
    forward edge of an airfoil
    Engineers are often quietly on the leading edge of algorithmic applications because they have long been thinking about their own modeling challenges in an objective-based way.
  64. randomized
    set up or distributed in a deliberately random way
    This will require conducting many randomized experiments in order to collect data about a wide range of recommendations for a wide range of customers.
  65. maximizing
    making as great as possible
    These days, it is trivial to use some type of heuristic search algorithm to predict the drive times along various routes (a Simulator) and then pick the shortest one (an Optimizer) subject to constraints like avoiding bridge tolls or maximizing gas mileage.
  66. thermostat
    a regulator for automatically regulating temperature
    Nest is designing smart thermostats that learn the home-owner’s temperature preferences and then optimize their energy consumption.
  67. aerodynamic
    of or relating to the study of air
    There is a Modeler for aerodynamics and mechanical structure that can then be fed to a Simulator to produce the Key Wing Outputs of cost, weight, lift coefficient and induced drag.
  68. foothill
    a relatively low hill on the lower slope of a mountain
    The danger in this hill-climbing approach is that if the steps are too small, we may get stuck at one of the many local maxima in the foothills, which will not tell us the best set of controllable inputs.
  69. collaborative
    accomplished by working jointly
    Once they have the data in this format, data scientists apply some form of collaborative filtering to “fill in the matrix.”
  70. unhelpful
    providing no assistance
    Making the wrong choices comes at a cost to the retailer in the form of reduced margins (discounts that do not drive extra sales), opportunity costs for the scarce real-estate on their homepage (taking up space in the recommendation feed with products the customer doesn’t like or would have bought without a recommendation) or the customer tuning out (sending so many unhelpful email promotions that the customer filters all future communications as spam).
  71. Strand
    a large independent bookstore in New York City
    He went into Strand bookstore in New York City and asked for a book similar to Toni Morrison’s “Beloved.”
  72. annealing
    hardening something by heat treatment
    Optimization is a process we are all familiar with in our daily lives, even if we have never used algorithms like gradient descent or simulated annealing.
  73. prediction
    a statement made about the future
    But these products are still just making predictions, rather than asking what action they want someone to take as a result of a prediction.
  74. outcome
    something that results
    We are entering the era of data as drivetrain, where we use data not just to generate more data (in the form of predictions), but use data to produce actionable outcomes.
  75. controllable
    capable of being controlled
    The danger in this hill-climbing approach is that if the steps are too small, we may get stuck at one of the many local maxima in the foothills, which will not tell us the best set of controllable inputs.
  76. inner city
    the older and more populated and (usually) poorer central section of a city
    For motor vehicle traffic, IBM performed a project with the city of Stockholm to optimize traffic flows that reduced congestion by nearly a quarter, and increased the air quality in the inner city by 25%.
  77. spreadsheet
    a screen-oriented interactive program enabling a user to lay out financial data on the screen
    It is easy to stumble into the trap of thinking that since data exists somewhere abstract, on a spreadsheet or in the cloud, that data products are just abstract algorithms.
  78. maximize
    make as big or large as possible
    They began by defining the objective that the insurance company was trying to achieve: setting a price that maximizes the net-present value of the profit from a new customer over a multi-year time horizon, subject to certain constraints such as maintaining market share.
  79. actuary
    someone versed in the interpretation of numerical data
    Their actuaries could build models to predict a customer’s likelihood of being in an accident and the expected value of claims.
  80. PhD
    a doctorate usually based on at least 3 years graduate study
    We will show a systematic approach to step 4 that doesn’t require a PhD in computer science.
  81. revolutionize
    change radically
    The technology exists to build data products that can revolutionize entire industries.
  82. causality
    the relation between reasons and effects
    We can keep the “like” model that we have already built as well as the causality model for purchases with and without recommendations, and then take a staged approach to adding additional models that we think will improve the marketing effectiveness.
  83. motor vehicle
    a self-propelled wheeled vehicle that does not run on rails
    For motor vehicle traffic, IBM performed a project with the city of Stockholm to optimize traffic flows that reduced congestion by nearly a quarter, and increased the air quality in the inner city by 25%.
  84. pricing
    the evaluation of something in terms of its price
    For an insurance company, policy price is the product, so an optimal pricing model is to them what the assembly line is to automobile manufacturing.
  85. embed
    fix or set securely or deeply
    Full video from that session is embedded below:

    © 2012, O'Reilly Media, Inc.
  86. filter
    device that removes something from what passes through it
    Once they have the data in this format, data scientists apply some form of collaborative filtering to “fill in the matrix.”
  87. William Faulkner
    United States novelist (originally Falkner) who wrote about people in the southern United States (1897-1962)
    The girl behind the counter recommended William Faulkner’s “Absalom, Absalom!”
  88. jump-start
    start by connecting it to another car's battery
    To jump-start this process, we suggest a four-step approach that has already transformed the insurance industry.
  89. Amazon
    a large online retailer
    A company like Amazon represents every purchase that has ever been made as a giant sparse matrix, with customers as the rows and products as the columns.
  90. quality control
    maintenance of standards of quality of manufactured goods
    Industrial engineers were among the first to begin using neural networks, applying them to problems like the optimal design of assembly lines and quality control.
  91. simulated
    reproduced or made to resemble; imitative in character
    Optimization is a process we are all familiar with in our daily lives, even if we have never used algorithms like gradient descent or simulated annealing.
  92. build
    make by combining materials and parts
    The technology exists to build data products that can revolutionize entire industries.
  93. computer science
    the branch of engineering science that studies (with the aid of computers) computable processes and structures
    We will show a systematic approach to step 4 that doesn’t require a PhD in computer science.
  94. cost-effective
    productive relative to the cost
    These outcomes can be fed to an Optimizer to build a functioning and cost-effective airplane wing.
  95. Santa Clara
    a city of west central California
    Suppose we wanted to get from San Francisco to the Strata 2012 Conference in Santa Clara.
  96. integration
    the act of combining into a whole
    When designing a product or manufacturing process, a drivetrain-like process followed by model integration, simulation and optimization is a familiar part of the toolkit of systems engineers.
  97. conditional
    imposing or depending on or containing an assumption
    The price elasticity model is a curve of price versus the probability of the customer accepting the policy conditional on that price.
  98. interaction
    mutual or reciprocal dealings or influence
    I think of this as a complicated machine (full-system) where the curtain is withdrawn and you get to model each significant part of the machine under controlled experiments and then simulate the interactions.
  99. metrics
    standards of measurement used to track or assess performance
    Because the simulation is at a per-policy level, the insurer can view the impact of a given set of price changes on revenue, market share, and other metrics over time.
  100. retailer
    a merchant who sells goods directly to consumers
    This encompasses all the interactions that a retailer has with its customers outside of the actual buy-sell transaction, whether making a product recommendation, encouraging the customer to check out a new feature of the online store, or sending sales promotions.
  101. dialog
    a conversation between two persons
    We don’t claim that the Drivetrain Approach is the best or only method; our goal is to start a dialog within the data science and business communities to advance our collective vision.
  102. competitor
    the contestant you hope to defeat
    ODG identified which levers the insurance company could control: what price to charge each customer, what types of accidents to cover, how much to spend on marketing and customer service, and how to react to their competitors’ pricing decisions.
  103. startup
    a newly established company or business venture
    One of the authors of this paper was explaining an iterative optimization technique, and the host says, “So, in a sense Jeremy, your approach was like that of doing a startup, which is just get something out there and iterate and iterate and iterate.”
  104. road map
    a map showing roads (for automobile travel)
    We could just build a simple model of distance / speed-limit to predict arrival time with little more than a ruler and a road map.
  105. in tandem
    with one beside or behind the other
    In engineering, it is often necessary to link many component models together so that they can be simulated and optimized in tandem.
  106. sensor
    a device that responds to a signal or stimulus
    Next, we consider what data the car needs to collect; it needs sensors that gather data about the road as well as cameras that can detect road signs, red or green lights, and unexpected obstacles (including pedestrians).
  107. thermal
    relating to or associated with heat
    There may be one detailed model for mechanical systems, a separate model for thermal systems, and yet another for electrical systems, etc.
  108. get stuck
    be unable to move further
    The danger in this hill-climbing approach is that if the steps are too small, we may get stuck at one of the many local maxima in the foothills, which will not tell us the best set of controllable inputs.
  109. gradient
    a graded change in the magnitude of something
    Optimization is a process we are all familiar with in our daily lives, even if we have never used algorithms like gradient descent or simulated annealing.
  110. teaser
    advertisement that offers something free to arouse interest
    The operator can adjust the input levers to answer specific questions like, “What will happen if our company offers the customer a low teaser price in year one but then raises the premiums in year two?”
  111. robot
    a mechanism that can move automatically
    Google realized that the objective was to show the most relevant search result; for other companies, it might be increasing profit, improving the customer experience, finding the best path for a robot, or balancing the load in a data center.
  112. price
    the amount of money needed to purchase something
    For an insurance company, policy price is the product, so an optimal pricing model is to them what the assembly line is to automobile manufacturing.
  113. identifiable
    capable of being recognized
    The final curve has a clearly identifiable local maximum that represents the best price to charge a customer for the first year.
  114. matrix
    a rectangular array of values arranged in rows and columns
    A company like Amazon represents every purchase that has ever been made as a giant sparse matrix, with customers as the rows and products as the columns.
  115. neural
    of or relating to the nervous system
    Industrial engineers were among the first to begin using neural networks, applying them to problems like the optimal design of assembly lines and quality control.
  116. Carnegie Mellon University
    an engineering university in Pittsburgh
    Jeannie Stamberger of Carnegie Mellon University Silicon Valley explained to us many of the possible applications of predictive algorithms to disaster response, from text-mining and sentiment analysis of tweets to determine the extent of the damage, to swarms of autonomous robots for reconnaissance and rescue, to logistic optimization tools that help multiple jurisdictions coordinate their responses.
  117. pedestrian
    a person who travels by foot
    Next, we consider what data the car needs to collect; it needs sensors that gather data about the road as well as cameras that can detect road signs, red or green lights, and unexpected obstacles (including pedestrians).
  118. Silicon Valley
    a region in California to the south of San Francisco that is noted for its concentration of high-technology industries
    In another area where objective-based data products have the power to change lives, the CMU extension in Silicon Valley has an active project for building data products to help first responders after natural or man-made disasters.
  119. coefficient
    a constant number that serves as a measure of some property
    There is a Modeler for aerodynamics and mechanical structure that can then be fed to a Simulator to produce the Key Wing Outputs of cost, weight, lift coefficient and induced drag.
  120. engineer
    a person who uses scientific knowledge to solve problems
    Engineers start by defining a clear objective: They want a car to drive safely from point A to point B without human intervention.
  121. personalized
    made for or directed or adjusted to a particular individual
    The objective is to escape a recommendation filter bubble, a term which was originally coined by Eli Pariser to describe the tendency of personalized news feeds to only display articles that are blandly popular or further confirm the readers’ existing biases.
  122. elasticity
    the degree to which one quantity, such as demand, responds to a change in another, such as price
    The first component of ODG’s Modeler was a model of price elasticity (the probability that a customer will accept a given price) for new policies and for renewals.
  123. disrupt
    make a break in
    As predictive modeling and optimization become more vital to a wide variety of activities, look out for the engineers to disrupt industries that wouldn’t immediately appear to be in the data business.
  124. macroeconomic
    of or relating to macroeconomics
    They also considered inputs outside of their control, like competitors’ strategies, macroeconomic conditions, natural disasters, and customer “stickiness.”
  125. approach
    move towards
    To jump-start this process, we suggest a four-step approach that has already transformed the insurance industry.
  126. transform
    change or alter in appearance or nature
    To jump-start this process, we suggest a four-step approach that has already transformed the insurance industry.
  127. servicing
    the act of providing ongoing support or maintenance to a customer or account
    The profit for a very low price will be in the red by the value of expected claims in the first year, plus any overhead for acquiring and servicing the new customer.
  128. black box
    equipment that records information about the performance of an aircraft during flight
    Why not bundle simulation and optimization engines with a physical engine, all inside the black box of a car?
  129. engine
    motor that converts energy into work or motion
    These products range from weather forecasting to recommendation engines to services that predict airline flight times more accurately than the airline itself.
  130. defining
    the process of determining the form or meaning of something
    Engineers start by defining a clear objective: They want a car to drive safely from point A to point B without human intervention.
  131. define
    show the form or outline of
    Engineers start by defining a clear objective: They want a car to drive safely from point A to point B without human intervention.
  132. guesswork
    an estimate based on little or no information
    But those models did not solve the pricing problem, so the insurance companies would set a price based on a combination of guesswork and market studies.
  133. coalesce
    fuse or cause to come together
    We don’t know what design approaches will be developed in the future, but right now, there is a need for the data science community to coalesce around a shared vocabulary and product design process that can be used to educate others on how to derive value from their predictive models.
  134. scientist
    a person with advanced knowledge of empirical fields
    But as data scientists build increasingly sophisticated products, they need a systematic design approach.
  135. real world
    the practical world as opposed to the academic world
    Sidebar: Optimization in the real world

    Optimization is a classic problem that has been studied by Newton and Gauss all the way up to mathematicians and engineers in the present day.
  136. steering wheel
    a handwheel that is used for steering
    The levers are the vehicle controls we are all familiar with: steering wheel, accelerator, brakes, etc.
  137. denim
    a coarse durable cotton fabric used to make jeans
    Plenty of websites sell designer denim, but for many women, high-end jeans are the one item of clothing they never buy online because it’s hard to find the right pair without trying them on.
  138. randomly
    in a random manner
    It was necessary to build this dataset by randomly changing the prices of hundreds of thousands of policies over many months.
  139. congestion
    excessive crowding
    If we want a more sophisticated system, we can build another model for traffic congestion and yet another model to forecast weather conditions and their effect on the safest maximum speed.
  140. business community
    the body of individuals who manage businesses
    We don’t claim that the Drivetrain Approach is the best or only method; our goal is to start a dialog within the data science and business communities to advance our collective vision.
  141. bottom line
    the last line in an audit
    From there, they developed an optimized pricing process that added hundreds of millions of dollars to the insurers’ bottom lines.
  142. graph
    a visual representation of the relations between quantities
    Step 4 of the Drivetrain Approach for Google is now part of tech history: Larry Page and Sergey Brin invented the graph traversal algorithm PageRank and built an engine on top of it that revolutionized search.
  143. specify
    be particular about
    Once we have specified the goal, the second step is to specify what inputs of the system we can control, the levers we can pull to influence the final outcome.
  144. Morrison
    United States writer whose novels describe the lives of African-Americans (born in 1931)
    He went into Strand bookstore in New York City and asked for a book similar to Toni Morrison’s “Beloved.”
  145. technique
    a practical method or art applied to some particular task
    There are many different optimization techniques to choose from (see sidebar, below), but it is a well-understood field with robust and accessible solutions.
  146. designing
    the act of working out the form of something
    When designing a product or manufacturing process, a drivetrain-like process followed by model integration, simulation and optimization is a familiar part of the toolkit of systems engineers.
  147. stratum
    one of several parallel layers of material
    Suppose we wanted to get from San Francisco to the Strata 2012 Conference in Santa Clara.
  148. Google
    a widely used search engine that uses text-matching techniques to find web pages that are important and relevant to a user's search
    Someone using Google’s self-driving car is completely unaware of the hundreds (if not thousands) of models and the petabytes of data that make it work.
  149. ranking
    position on a scale in relation to others
    In Google’s case, they could control the ranking of the search results.
  150. wart
    any small rounded protuberance
    Amazon’s recommendation engine is probably the best one out there, but it’s easy to get it to show its warts.
  151. marketing
    the commercial processes in promoting and selling something
    ODG identified which levers the insurance company could control: what price to charge each customer, what types of accidents to cover, how much to spend on marketing and customer service, and how to react to their competitors’ pricing decisions.
  152. curve
    the trace of a point whose direction of motion changes
    The price elasticity model is a curve of price versus the probability of the customer accepting the policy conditional on that price.
  153. acceleration
    an increase in rate of change
    We need to define the models we will need, such as physics models to predict the effects of steering, braking and acceleration, and pattern recognition algorithms to interpret data from the road signs.
  154. silicon
    a tetravalent nonmetallic element
    In another area where objective-based data products have the power to change lives, the CMU extension in Silicon Valley has an active project for building data products to help first responders after natural or man-made disasters.
  155. design
    the act of working out the form of something
    But as data scientists build increasingly sophisticated products, they need a systematic design approach.
  156. mathematically
    with respect to mathematics
    Prediction technology can be interesting and mathematically elegant, but we need to take the next step.
  157. homepage
    the main starting point for a website
    Making the wrong choices comes at a cost to the retailer in the form of reduced margins (discounts that do not drive extra sales), opportunity costs for the scarce real-estate on their homepage (taking up space in the recommendation feed with products the customer doesn’t like or would have bought without a recommendation) or the customer tuning out (sending so many unhelpful email promotions that the customer filters all future communications as spam).
  158. interface
    a surface forming a common boundary between two things
    These disaster applications are a particularly good example of why data products need simple, well-designed interfaces that produce concrete recommendations.
  159. skid
    a plank used to make a track for rolling or sliding objects
    If it makes a right turn at 55 mph in these weather conditions, will it skid off the road?
  160. personalize
    make personal or more personal
    The objective is to escape a recommendation filter bubble, a term which was originally coined by Eli Pariser to describe the tendency of personalized news feeds to only display articles that are blandly popular or further confirm the readers’ existing biases.
  161. seminal
    influential and providing a basis for later development
    Brian Ripley’s seminal book on pattern recognition gives credit for many ideas and techniques to largely forgotten engineering papers from the 1970s.
  162. identify
    recognize as being
    ODG identified which levers the insurance company could control: what price to charge each customer, what types of accidents to cover, how much to spend on marketing and customer service, and how to react to their competitors’ pricing decisions.
  163. encompass
    include in scope
    This encompasses all the interactions that a retailer has with its customers outside of the actual buy-sell transaction, whether making a product recommendation, encouraging the customer to check out a new feature of the online store, or sending sales promotions.
  164. accelerator
    a pedal that controls the throttle valve
    The levers are the vehicle controls we are all familiar with: steering wheel, accelerator, brakes, etc.
  165. enlarge
    make bigger
    Click to enlarge.
  166. base
    lowest support of a structure
    By Jeremy Howard, Margit Zwemer and Mike Loukides

    In the past few years, we’ve seen many data products based on predictive modeling.
  167. typing
    writing done with a typewriter
    Then, Google came along and transformed online search by beginning with a simple question: What is the user’s main objective in typing in a search query?
  168. green light
    a signal to proceed
    Next, we consider what data the car needs to collect; it needs sensors that gather data about the road as well as cameras that can detect road signs, red or green lights, and unexpected obstacles (including pedestrians).
  169. recommend
    express a good opinion of
    For example, if customer A buys products 1 and 10, and customer B buys products 1, 2, 4, and 10, the engine will recommend that A buy 2 and 4.
  170. search engine
    a computer program that retrieves documents or files or data from a database or from a computer network (especially from the internet)
    The best way to illustrate this process is with a familiar data product: search engines.
  171. co-author
    be a co-author on (a book, a paper)
    [Note: Co-author Jeremy Howard founded ODG.]
  172. brake
    a restraint used to slow or stop a vehicle
    The levers are the vehicle controls we are all familiar with: steering wheel, accelerator, brakes, etc.
  173. bookstore
    a shop where books are sold
    He went into Strand bookstore in New York City and asked for a book similar to Toni Morrison’s “Beloved.”
  174. modelling
    a preliminary sculpture in wax or clay from which a finished work can be copied
    Jeremy Howard examined these questions in his Strata CA 12 session, “From Predictive Modelling to Optimization: The Next Frontier.”
  175. profit
    the advantageous quality of being beneficial
    Google realized that the objective was to show the most relevant search result; for other companies, it might be increasing profit, improving the customer experience, finding the best path for a robot, or balancing the load in a data center.
  176. sophisticated
    having worldly knowledge and refinement
    Great predictive modeling is an important part of the solution, but it no longer stands on its own; as products become more sophisticated, it disappears into the plumbing.
  177. forecasting
    a statement made about the future
    These products range from weather forecasting to recommendation engines to services that predict airline flight times more accurately than the airline itself.
  178. common good
    the good of a community
    Data scientists now have the predictive tools to build products that increase the common good, but they need to be aware that building the models is not enough if they do not also produce optimized, implementable outcomes.
  179. react
    show a response to something
    ODG identified which levers the insurance company could control: what price to charge each customer, what types of accidents to cover, how much to spend on marketing and customer service, and how to react to their competitors’ pricing decisions.
  180. insurance
    protection against future loss
    To jump-start this process, we suggest a four-step approach that has already transformed the insurance industry.
  181. achieve
    gain with effort
    They began by defining the objective that the insurance company was trying to achieve: setting a price that maximizes the net-present value of the profit from a new customer over a multi-year time horizon, subject to certain constraints such as maintaining market share.
  182. complicate
    make less simple
    I think of this as a complicated machine (full-system) where the curtain is withdrawn and you get to model each significant part of the machine under controlled experiments and then simulate the interactions.
  183. combine
    put or add together
    The models will take both the levers and any uncontrollable variables as their inputs; the outputs from the models can be combined to predict the final state for our objective.
  184. pervade
    spread or diffuse through
    Data science is beginning to pervade even the most bricks-and-mortar elements of our lives.
  185. mileage
    distance measured in miles
    These days, it is trivial to use some type of heuristic search algorithm to predict the drive times along various routes (a Simulator) and then pick the shortest one (an Optimizer) subject to constraints like avoiding bridge tolls or maximizing gas mileage.
  186. constraint
    the state of being physically limited
    They began by defining the objective that the insurance company was trying to achieve: setting a price that maximizes the net-present value of the profit from a new customer over a multi-year time horizon, subject to certain constraints such as maintaining market share.
  187. mph
    the ratio of the distance traveled (in miles) to the time spent traveling (in hours)
    If it makes a right turn at 55 mph in these weather conditions, will it skid off the road?
  188. man-made
    not of natural origin; prepared or made artificially
    In another area where objective-based data products have the power to change lives, the CMU extension in Silicon Valley has an active project for building data products to help first responders after natural or man-made disasters.
  189. Phoenix
    the state capital and largest city located in south central Arizona; situated in a former desert that has become a prosperous agricultural area thanks to irrigation
    The screenshot below is taken from a model integration tool designed by Phoenix Integration.
  190. refine
    reduce to a pure state
    The Modeler takes the raw data and converts it into slightly more refined predicted data.
  191. placement
    the spatial property of the way in which something is placed
    Models developed to simulate fluid dynamics and turbulence have been applied to improving traffic and pedestrian flows by using the placement of exits and crowd control barriers as levers.
  192. improve
    to make better
    Google realized that the objective was to show the most relevant search result; for other companies, it might be increasing profit, improving the customer experience, finding the best path for a robot, or balancing the load in a data center.
  193. tandem
    an arrangement of objects or persons one behind another
    In engineering, it is often necessary to link many component models together so that they can be simulated and optimized in tandem.
  194. coordinate
    of equal importance, rank, or degree
    Jeannie Stamberger of Carnegie Mellon University Silicon Valley explained to us many of the possible applications of predictive algorithms to disaster response, from text-mining and sentiment analysis of tweets to determine the extent of the damage, to swarms of autonomous robots for reconnaissance and rescue, to logistic optimization tools that help multiple jurisdictions coordinate their responses.
  195. systematic
    characterized by order and planning
    But as data scientists build increasingly sophisticated products, they need a systematic design approach.
  196. clothe
    provide with clothes or put clothes on
    Plenty of websites sell designer denim, but for many women, high-end jeans are the one item of clothing they never buy online because it’s hard to find the right pair without trying them on.
  197. based
    having a base
    In the past few years, we’ve seen many data products based on predictive modeling.
  198. steering
    the act of guiding or showing the way
    The levers are the vehicle controls we are all familiar with: steering wheel, accelerator, brakes, etc.
  199. plumbing
    utility consisting of the pipes and fixtures for the distribution of water or gas in a building and for the disposal of sewage
    Great predictive modeling is an important part of the solution, but it no longer stands on its own; as products become more sophisticated, it disappears into the plumbing.
  200. GPS
    a navigational system involving satellites and computers that can determine the latitude and longitude of a receiver on Earth by computing the time difference for signals from different satellites to reach the receiver
    Instead of the femme-bot voice of the GPS unit telling us which route to take and where to turn, what would it take to build a car that would make those decisions by itself?
  201. server
    a person who waits on tables in a restaurant
    These firms have plenty of experience building models of each of the components and systems in their final product, whether they’re building a server farm or a fighter jet.
  202. in stock
    available for use or sale
    We can build a Simulator to test the utility of each of the many possible books we have in stock, or perhaps just over all the outputs of a collaborative filtering model of similar customer purchases, and then build a simple Optimizer that ranks and displays the recommended books based on their simulated utility.
  203. improving
    getting higher or more vigorous
    Google realized that the objective was to show the most relevant search result; for other companies, it might be increasing profit, improving the customer experience, finding the best path for a robot, or balancing the load in a data center.
  204. bookseller
    the proprietor of a bookstore
    The Strand bookseller made a brilliant but far-fetched recommendation probably based more on the character of Morrison’s writing than superficial similarities between Morrison and other authors.
  205. probability
    a measure of how likely it is that some event will occur
    The first component of ODG’s Modeler was a model of price elasticity (the probability that a customer will accept a given price) for new policies and for renewals.
  206. feed
    provide as food
    The Simulator’s result is fed to an Optimizer, which takes the surface of possible outcomes and identifies the highest point.
  207. query
    an instance of questioning
    Then, Google came along and transformed online search by beginning with a simple question: What is the user’s main objective in typing in a search query?
  208. airplane
    a fixed-wing aircraft powered by propellers or jets
    The objective is clearly defined: build an airplane wing.
  209. click
    a short light metallic sound
    Click to enlarge.
  210. turbulence
    instability in the atmosphere
    Models developed to simulate fluid dynamics and turbulence have been applied to improving traffic and pedestrian flows by using the placement of exits and crowd control barriers as levers.
  211. signaling
    any nonverbal action or gesture that encodes a message
    The self-driving car needs to take the next step: after simulating all the possibilities, it must optimize the results of the simulation to pick the best combination of acceleration and braking, steering and signaling, to get us safely to Santa Clara.
  212. catastrophic
    extremely harmful; bringing physical or financial ruin
    The Optimizer not only finds the best outcomes, it can also identify catastrophic outcomes and show how to avoid them.
  213. podcast
    a digital audio file made available on the internet
    A great image for optimization in the real world comes up in a recent TechZing podcast with the co-founders of data-mining competition platform Kaggle.
  214. tab
    a short strip of material attached to or projecting from something in order to facilitate opening or identifying or handling it
    The data is in the wing materials’ physical properties; costs are listed in another tab of the application.
  215. step
    the act of changing location by raising the foot and setting it down
    Prediction technology can be interesting and mathematically elegant, but we need to take the next step.
  216. paired
    used of gloves, socks, etc.
    For example, a pair of jeans that is often paired with a particular top, or the first part of a series of novels that often leads to a sale of the whole set.
  217. drive
    operate or control a vehicle
    We call it the Drivetrain Approach, inspired by the emerging field of self-driving vehicles.
  218. collect
    gather
    Our objective and available levers, what data we already have and what additional data we will need to collect, determine the models we can build.
  219. metric
    based on a decimal unit of measurement
    Because the simulation is at a per-policy level, the insurer can view the impact of a given set of price changes on revenue, market share, and other metrics over time.
  220. purchase
    acquire by means of a financial transaction
    The current algorithms predict what products a customer will like, based on purchase history and the histories of similar customers.
  221. tuning
    calibrating something to a standard frequency
    Making the wrong choices comes at a cost to the retailer in the form of reduced margins (discounts that do not drive extra sales), opportunity costs for the scarce real-estate on their homepage (taking up space in the recommendation feed with products the customer doesn’t like or would have bought without a recommendation) or the customer tuning out (sending so many unhelpful email promotions that the customer filters all future communications as spam).
  222. trademark
    a registered symbol identifying a product's manufacturer
    © 2012, O'Reilly Media, Inc.

    All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
  223. autonomous
    existing as an independent entity
    Jeannie Stamberger of Carnegie Mellon University Silicon Valley explained to us many of the possible applications of predictive algorithms to disaster response, from text-mining and sentiment analysis of tweets to determine the extent of the damage, to swarms of autonomous robots for reconnaissance and rescue, to logistic optimization tools that help multiple jurisdictions coordinate their responses.
  224. steer
    be a guiding or motivating force or drive
    The levers are the vehicle controls we are all familiar with: steering wheel, accelerator, brakes, etc.
  225. generate
    bring into existence
    We are entering the era of data as drivetrain, where we use data not just to generate more data (in the form of predictions), but use data to produce actionable outcomes.
  226. apply
    employ for a particular purpose
    ODG approached this problem with an early use of the Drivetrain Approach and a practical take on step 4 that can be applied to a wide range of problems.
  227. allow for
    make a possibility or provide opportunity for
    These models predicted whether customers would renew their policies in one year, allowing for changes in price and willingness to jump to a competitor.
  228. strategy
    an elaborate and systematic plan of action
    They also considered inputs outside of their control, like competitors’ strategies, macroeconomic conditions, natural disasters, and customer “stickiness.”
  229. output
    production of a certain amount
    The models will take both the levers and any uncontrollable variables as their inputs; the outputs from the models can be combined to predict the final state for our objective.
  230. discount
    an amount or percentage deducted
    Making the wrong choices comes at a cost to the retailer in the form of reduced margins (discounts that do not drive extra sales), opportunity costs for the scarce real-estate on their homepage (taking up space in the recommendation feed with products the customer doesn’t like or would have bought without a recommendation) or the customer tuning out (sending so many unhelpful email promotions that the customer filters all future communications as spam).
  231. business school
    a graduate school offering study leading to a degree of Master in Business Administration
    In the future, we hope to see optimization taught in business schools as well as in statistics departments.
  232. sparse
    not dense or plentiful
    A company like Amazon represents every purchase that has ever been made as a giant sparse matrix, with customers as the rows and products as the columns.
  233. similarity
    the quality of being alike
    The Strand bookseller made a brilliant but far-fetched recommendation probably based more on the character of Morrison’s writing than superficial similarities between Morrison and other authors.
  234. search
    look or seek
    The best way to illustrate this process is with a familiar data product: search engines.
  235. dynamics
    mechanics concerned with forces that cause motions of bodies
    Models developed to simulate fluid dynamics and turbulence have been applied to improving traffic and pedestrian flows by using the placement of exits and crowd control barriers as levers.
  236. utility
    the quality of being of practical use
    The difference between these two probabilities is a utility function for a given recommendation to a customer (see Recommendation Engine figure, below).
  237. Faulkner
    United States novelist (originally Falkner) who wrote about people in the southern United States (1897-1962)
    The girl behind the counter recommended William Faulkner’s “Absalom, Absalom!”
  238. construct
    make by combining materials and parts
    We could construct a patience model for the customers’ tolerance for poorly targeted communications: When do they tune them out and filter our messages straight to spam?
  239. user
    someone who employs or takes advantage of something
    While their models were good at finding relevant websites, the answer the user was most interested in was often buried on page 100 of the search results.
  240. uphill
    upward on a hill or incline
    Many optimization procedures are iterative; they can be thought of as taking a small step, checking our elevation and then taking another small uphill step until we reach a point from which there is no direction in which we can climb any higher.
  241. emulate
    strive to equal or match, especially by imitating
    What we would really like to do is emulate the experience of Mark Johnson, CEO of Zite, who gave a perfect example of what a customer’s recommendation experience should be like in a recent TOC talk.
  242. check out
    examine so as to determine accuracy, quality, or condition
    This encompasses all the interactions that a retailer has with its customers outside of the actual buy-sell transaction, whether making a product recommendation, encouraging the customer to check out a new feature of the online store, or sending sales promotions.
  243. engineering
    applying scientific knowledge to practical problems
    Brian Ripley’s seminal book on pattern recognition gives credit for many ideas and techniques to largely forgotten engineering papers from the 1970s.
  244. good example
    a person or thing to be imitated; ideal model
    These disaster applications are a particularly good example of why data products need simple, well-designed interfaces that produce concrete recommendations.
  245. reconsider
    think about again, usually with a view to changing the mind
    Instead, let’s design an improved recommendation engine using the Drivetrain Approach, starting by reconsidering our objective.
  246. cooling
    the process of becoming cooler; a falling temperature
    There are many techniques to avoid this problem, some based on statistics and spreading our bets widely, and others based on systems seen in nature, like biological evolution or the cooling of atoms in glass.
  247. retention
    the act of keeping something
    ODG also built models for customer retention.
  248. mitigate
    lessen or to try to lessen the seriousness or extent of
    We will show how to go about building an optimized marketing strategy that mitigates these effects.
  249. invent
    come up with after a mental effort
    Step 4 of the Drivetrain Approach for Google is now part of tech history: Larry Page and Sergey Brin invented the graph traversal algorithm PageRank and built an engine on top of it that revolutionized search.
  250. produce
    bring forth or yield
    We are entering the era of data as drivetrain, where we use data not just to generate more data (in the form of predictions), but use data to produce actionable outcomes.
  251. tool
    an implement used to perform a task or job
    The screenshot below is taken from a model integration tool designed by Phoenix Integration.
  252. feed on
    be sustained by
    Here is a screenshot of the “Customers Who Bought This Item Also Bought” feed on Amazon from a search for the latest book in Terry Pratchett’s “Discworld” series:

    All of the recommendations are for other books in the same series, but it’s a good assumption that a customer who searched for “Terry Pratchett” is already aware of these books.
  253. item
    a distinct part that can be specified separately in a group
    Here is a screenshot of the “Customers Who Bought This Item Also Bought” feed on Amazon from a search for the latest book in Terry Pratchett’s “Discworld” series:

    All of the recommendations are for other books in the same series, but it’s a good assumption that a customer who searched for “Terry Pratchett” is already aware of these books.
  254. assembly
    a group of persons gathered together for a common purpose
    For an insurance company, policy price is the product, so an optimal pricing model is to them what the assembly line is to automobile manufacturing.
  255. maximum
    the greatest or most complete or best possible
    The final curve has a clearly identifiable local maximum that represents the best price to charge a customer for the first year.
  256. process
    a particular course of action intended to achieve a result
    To jump-start this process, we suggest a four-step approach that has already transformed the insurance industry.
  257. reconnaissance
    the act of scouting, especially to gain information
    Jeannie Stamberger of Carnegie Mellon University Silicon Valley explained to us many of the possible applications of predictive algorithms to disaster response, from text-mining and sentiment analysis of tweets to determine the extent of the damage, to swarms of autonomous robots for reconnaissance and rescue, to logistic optimization tools that help multiple jurisdictions coordinate their responses.
  258. online
    connected to a computer network or accessible by computer
    Then, Google came along and transformed online search by beginning with a simple question: What is the user’s main objective in typing in a search query?
  259. vehicle
    a conveyance that transports people or objects
    We call it the Drivetrain Approach, inspired by the emerging field of self-driving vehicles.
  260. bubble
    a hollow globule of gas (e.g., air or carbon dioxide)
    The objective is to escape a recommendation filter bubble, a term which was originally coined by Eli Pariser to describe the tendency of personalized news feeds to only display articles that are blandly popular or further confirm the readers’ existing biases.
  261. function
    a mathematical relation that assigns an output to each input
    The difference between these two probabilities is a utility function for a given recommendation to a customer (see Recommendation Engine figure, below).
  262. fill in
    supply with information on a specific topic
    Once they have the data in this format, data scientists apply some form of collaborative filtering to “fill in the matrix.”
  263. functioning
    performing or able to perform its regular purpose
    These outcomes can be fed to an Optimizer to build a functioning and cost-effective airplane wing.
  264. mechanical
    using tools or devices
    There may be one detailed model for mechanical systems, a separate model for thermal systems, and yet another for electrical systems, etc.
  265. designed
    done or made or performed with purpose and intent
    What is most important about these examples is that the engineers who designed these data products didn’t start by building a neato robot and then looking for something to do with it.
  266. stampede
    a wild headlong rush of frightened animals
    This has improved emergency evacuation procedures for subway stations and reduced the danger of crowd stampedes and trampling during sporting events.
  267. diffusion
    the act of dispersing something
    For example, resistance in the electrical system produces heat, which needs to be included as an input for the thermal diffusion and cooling model.
  268. mathematician
    a person skilled in the logic of quantity and arrangement
    Sidebar: Optimization in the real world

    Optimization is a classic problem that has been studied by Newton and Gauss all the way up to mathematicians and engineers in the present day.
  269. meter
    measure or regulate with a metering device
    Any city with metered stoplights already has all the necessary information; they just haven’t found a way to suck the meaning out of it.
  270. airline
    a commercial business that provides scheduled flights
    These products range from weather forecasting to recommendation engines to services that predict airline flight times more accurately than the airline itself.
  271. buy
    obtain by purchase
    For example, if customer A buys products 1 and 10, and customer B buys products 1, 2, 4, and 10, the engine will recommend that A buy 2 and 4.
  272. Terry
    English actress (1847-1928)
    Here is a screenshot of the “Customers Who Bought This Item Also Bought” feed on Amazon from a search for the latest book in Terry Pratchett’s “Discworld series”:

    All of the recommendations are for other books in the same series, but it’s a good assumption that a customer who searched for “Terry Pratchett” is already aware of these books.
  273. book
    an object consisting of a number of pages bound together
    Here is a screenshot of the “Customers Who Bought This Item Also Bought” feed on Amazon from a search for the latest book in Terry Pratchett’s “Discworld series”:

    All of the recommendations are for other books in the same series, but it’s a good assumption that a customer who searched for “Terry Pratchett” is already aware of these books.
  274. Mellon
    United States financier and philanthropist (1855-1937)
    Jeannie Stamberger of Carnegie Mellon University Silicon Valley explained to us many of the possible applications of predictive algorithms to disaster response, from text-mining and sentiment analysis of tweets to determine the extent of the damage, to swarms of autonomous robots for reconnaissance and rescue, to logistic optimization tools that help multiple jurisdictions coordinate their responses.
  275. blandly
    in a bland manner
    The objective is to escape a recommendation filter bubble, a term which was originally coined by Eli Pariser to describe the tendency of personalized news feeds to only display articles that are blandly popular or further confirm the readers’ existing biases.
  276. example
    an item of information that is typical of a class or group
    We begin by applying the Drivetrain Approach to a familiar example, recommendation engines, and then building this up into an entire optimized marketing strategy.
  277. adept
    having or showing knowledge and skill and aptitude
    As scientists and engineers become more adept at applying prediction and optimization to everyday problems, they are expanding the art of the possible, optimizing everything from our personal health to the houses and cities we live in.
  278. wing
    one of the horizontal airfoils on either side of an aircraft's fuselage
    The objective is clearly defined: build an airplane wing.
  279. uncontrollable
    incapable of being restrained or managed
    The models will take both the levers and any uncontrollable variables as their inputs; the outputs from the models can be combined to predict the final state for our objective.
  280. jean
    close-fitting trousers worn for manual work or casual wear
    Plenty of websites sell designer denim, but for many women, high-end jeans are the one item of clothing they never buy online because it’s hard to find the right pair without trying them on.
  281. sale
    the general activity of selling
    The objective of a recommendation engine is to drive additional sales by surprising and delighting the customer with books he or she would not have purchased without the recommendation.
  282. warp
    bend or twist out of shape
    That excess heat could cause mechanical components to warp, producing stresses that should be inputs to the mechanical models.
  283. application
    the action of putting something into operation
    Engineers are often quietly on the leading edge of algorithmic applications because they have long been thinking about their own modeling challenges in an objective-based way.
  284. market
    the world of commercial activity where goods and services are bought and sold
    But those models did not solve the pricing problem, so the insurance companies would set a price based on a combination of guesswork and market studies.
  285. embedded
    enclosed firmly in a surrounding mass
    Full video from that session is embedded below:

    © 2012, O'Reilly Media, Inc.

  286. taper
    diminish gradually
    The wing box includes the design levers like span, taper ratio and sweep.
  287. brakes
    a device that works to slow a motor vehicle
    The levers are the vehicle controls we are all familiar with: steering wheel, accelerator, brakes, etc.
  288. plumb
    exactly vertical
    Great predictive modeling is an important part of the solution, but it no longer stands on its own; as products become more sophisticated, it disappears into the plumbing.
  289. begin
    set in motion, cause to start
    Then, Google came along and transformed online search by beginning with a simple question: What is the user’s main objective in typing in a search query?
  290. control
    power to direct or determine
    Once we have specified the goal, the second step is to specify what inputs of the system we can control, the levers we can pull to influence the final outcome.
  291. additional
    further or extra
    Our objective and available levers, what data we already have and what additional data we will need to collect, determine the models we can build.
  292. electrical
    relating to or concerned with electricity
    There may be one detailed model for mechanical systems, a separate model for thermal systems, and yet another for electrical systems, etc.
  293. disaster
    an event resulting in great loss and misfortune
    They also considered inputs outside of their control, like competitors’ strategies, macroeconomic conditions, natural disasters, and customer “stickiness.”
  294. need
    require or want
    Prediction technology can be interesting and mathematically elegant, but we need to take the next step.
  295. step in
    act as a substitute
    The four steps in the Drivetrain Approach.
  296. traffic
    vehicles or pedestrians traveling in a particular locality
    If we want a more sophisticated system, we can build another model for traffic congestion and yet another model to forecast weather conditions and their effect on the safest maximum speed.
  297. building
    the act of constructing something
    So, why aren’t we building them?
  298. evacuation
    the act of leaving a dangerous place in an orderly fashion
    This has improved emergency evacuation procedures for subway stations and reduced the danger of crowd stampedes and trampling during sporting events.
  299. car
    a motor vehicle with four wheels
    Engineers start by defining a clear objective: They want a car to drive safely from point A to point B without human intervention.
  300. forecast
    a prediction about how something will develop
    These products range from weather forecasting to recommendation engines to services that predict airline flight times more accurately than the airline itself.
  301. system
    a group of independent elements comprising a unified whole
    Once we have specified the goal, the second step is to specify what inputs of the system we can control, the levers we can pull to influence the final outcome.
  302. wired
    tied or bound with wire
    As one engineer on the Google self-driving car project put it in a recent Wired article, “We’re analyzing and predicting the world 20 times a second.”
  303. problem
    a question raised for consideration or solution
    But those models did not solve the pricing problem, so the insurance companies would set a price based on a combination of guesswork and market studies.
  304. trampling
    the sound of heavy treading or stomping
    This has improved emergency evacuation procedures for subway stations and reduced the danger of crowd stampedes and trampling during sporting events.
  305. cut through
    travel across or pass over
    She cut through the chaff of the obvious to make a recommendation that will send the customer home with a new book, and returning to Strand again and again in the future.
  306. effectiveness
    power to be effective
    We can keep the “like” model that we have already built as well as the causality model for purchases with and without recommendations, and then take a staged approach to adding additional models that we think will improve the marketing effectiveness.
  307. chaff
    material consisting of seed coverings and pieces of stem
    She cut through the chaff of the obvious to make a recommendation that will send the customer home with a new book, and returning to Strand again and again in the future.
  308. Stockholm
    the capital and largest city of Sweden
    For motor vehicle traffic, IBM performed a project with the city of Stockholm to optimize traffic flows that reduced congestion by nearly a quarter, and increased the air quality in the inner city by 25%.
  309. implicit
    suggested though not directly expressed
    The third step was to consider what new data they would need to produce such a ranking; they realized that the implicit information regarding which pages linked to which other pages could be used for this purpose.
  310. driving
    the act of controlling and steering the movement of a vehicle or animal
    We call it the Drivetrain Approach, inspired by the emerging field of self-driving vehicles.
  311. Key
    United States lawyer and poet who wrote a poem after witnessing the British attack on Baltimore during the War of 1812; the poem was later set to music and entitled `The Star-Spangled Banner' (1779-1843)
    There is a Modeler for aerodynamics and mechanical structure that can then be fed to a Simulator to produce the Key Wing Outputs of cost, weight, lift coefficient and induced drag.
  312. suck
    draw into the mouth by creating a vacuum in the mouth
    Any city with metered stoplights already has all the necessary information; they just haven’t found a way to suck the meaning out of it.
  313. analyze
    break down into components or essential features
    As one engineer on the Google self-driving car project put it in a recent Wired article, “We’re analyzing and predicting the world 20 times a second.”
  314. relevant
    having a bearing on or connection with the subject at issue
    While their models were good at finding relevant websites, the answer the user was most interested in was often buried on page 100 of the search results.
  315. using
    the act of putting something into service
    Someone using Google’s self-driving car is completely unaware of the hundreds (if not thousands) of models and the petabytes of data that make it work.
  316. use
    put into service
    Someone using Google’s self-driving car is completely unaware of the hundreds (if not thousands) of models and the petabytes of data that make it work.
  317. stumble
    miss a step and fall or nearly fall
    It is easy to stumble into the trap of thinking that since data exists somewhere abstract, on a spreadsheet or in the cloud, that data products are just abstract algorithms.
  318. familiar
    well known or easily recognized
    The best way to illustrate this process is with a familiar data product: search engines.
  319. for example
    as an example
    For example, if customer A buys products 1 and 10, and customer B buys products 1, 2, 4, and 10, the engine will recommend that A buy 2 and 4.
  320. emphasize
    stress or single out as important
    Improving the data collection and predictive models is very important, but we want to emphasize the importance of beginning by defining a clear objective with levers that produce actionable outcomes.
  321. promotion
    the act of raising in rank or position
    This encompasses all the interactions that a retailer has with its customers outside of the actual buy-sell transaction, whether making a product recommendation, encouraging the customer to check out a new feature of the online store, or sending sales promotions.
  322. balancing
    getting two things to correspond
    Google realized that the objective was to show the most relevant search result; for other companies, it might be increasing profit, improving the customer experience, finding the best path for a robot, or balancing the load in a data center.
  323. result
    something that follows as a consequence
    But these products are still just making predictions, rather than asking what action they want someone to take as a result of a prediction.
  324. final
    occurring at or forming an end or conclusion
    Once we have specified the goal, the second step is to specify what inputs of the system we can control, the levers we can pull to influence the final outcome.
  325. format
    the way in which data is arranged or organized
    Once they have the data in this format, data scientists apply some form of collaborative filtering to “fill in the matrix.”
  326. inventory
    a detailed list of all the items in stock
    Only then does the customer get to browse a recommended selection of Zafu’s inventory.
  327. subway
    a rail system operating below the surface of the ground
    This has improved emergency evacuation procedures for subway stations and reduced the danger of crowd stampedes and trampling during sporting events.
  328. combined
    made or joined or united into one
    The models will take both the levers and any uncontrollable variables as their inputs; the outputs from the models can be combined to predict the final state for our objective.
  329. procedure
    a particular course of action intended to achieve a result
    Many optimization procedures are iterative; they can be thought of as taking a small step, checking our elevation and then taking another small uphill step until we reach a point from which there is no direction in which we can climb any higher.
  330. statistics
    a branch of mathematics concerned with quantitative data
    There are many techniques to avoid this problem, some based on statistics and spreading our bets widely, and others based on systems seen in nature, like biological evolution or the cooling of atoms in glass.
  331. staged
    carried out in successive stages
    We can keep the “like” model that we have already built as well as the causality model for purchases with and without recommendations, and then take a staged approach to adding additional models that we think will improve the marketing effectiveness.
  332. tweet
    a short message posted on the Twitter service
    Jeannie Stamberger of Carnegie Mellon University Silicon Valley explained to us many of the possible applications of predictive algorithms to disaster response, from text-mining and sentiment analysis of tweets to determine the extent of the damage, to swarms of autonomous robots for reconnaissance and rescue, to logistic optimization tools that help multiple jurisdictions coordinate their responses.
  333. cancel
    declare null and void
    It will be low in cases where the algorithm recommends a familiar book that the customer has already rejected (both components are small) or a book that he or she would have bought even without the recommendation (both components are large and cancel each other out).
  334. accepting
    tolerating without protest
    The price elasticity model is a curve of price versus the probability of the customer accepting the policy conditional on that price.
  335. diagram
    a drawing intended to explain how something works
    Although it’s from a completely different engineering discipline, this diagram is very similar to the Drivetrain Approach we’ve recommended for data products.
  336. bury
    cover or hide from sight
    While their models were good at finding relevant websites, the answer the user was most interested in was often buried on page 100 of the search results.
  337. tolerance
    the power or capacity to endure something unpleasant
    We could construct a patience model for the customers’ tolerance for poorly targeted communications: When do they tune them out and filter our messages straight to spam?
  338. sales
    income (at invoice values) received for goods and services over some given period of time
    The objective of a recommendation engine is to drive additional sales by surprising and delighting the customer with books he or she would not have purchased without the recommendation.
  339. preference
    the right or chance to choose
    Zafu’s approach is not to send their customers directly to the clothes, but to begin by asking a series of simple questions about the customers’ body type, how well their other jeans fit, and their fashion preferences.
  340. Howard
    Queen of England as the fifth wife of Henry VIII who was accused of adultery and executed (1520-1542)
    By Jeremy Howard, Margit Zwemer and Mike Loukides

    In the past few years, we’ve seen many data products based on predictive modeling.
  341. transformed
    given a completely different form or appearance
    To jump-start this process, we suggest a four-step approach that has already transformed the insurance industry.
  342. Carnegie
    United States industrialist and philanthropist who endowed education and public libraries and research trusts (1835-1919)
    Jeannie Stamberger of Carnegie Mellon University Silicon Valley explained to us many of the possible applications of predictive algorithms to disaster response, from text-mining and sentiment analysis of tweets to determine the extent of the damage, to swarms of autonomous robots for reconnaissance and rescue, to logistic optimization tools that help multiple jurisdictions coordinate their responses.
  343. reduce
    make smaller
    Making the wrong choices comes at a cost to the retailer in the form of reduced margins (discounts that do not drive extra sales), opportunity costs for the scarce real-estate on their homepage (taking up space in the recommendation feed with products the customer doesn’t like or would have bought without a recommendation) or the customer tuning out (sending so many unhelpful email promotions that the customer filters all future communications as spam).
  344. dynamic
    characterized by action or forcefulness of personality
    Models developed to simulate fluid dynamics and turbulence have been applied to improving traffic and pedestrian flows by using the placement of exits and crowd control barriers as levers.
  345. educate
    give knowledge acquired by learning and instruction
    We don’t know what design approaches will be developed in the future, but right now, there is a need for the data science community to coalesce around a shared vocabulary and product design process that can be used to educate others on how to derive value from their predictive models.
  346. develop
    create or work out by deliberate effort over time
    From there, they developed an optimized pricing process that added hundreds of millions of dollars to the insurers’ bottom lines.
  347. likelihood
    the probability of a specified outcome
    Their actuaries could build models to predict a customer’s likelihood of being in an accident and the expected value of claims.
  348. tangible
    perceptible by the senses, especially the sense of touch
    So, we would like to conclude by showing you how objective-based data products are already a part of the tangible world.
  349. 1970s
    the decade from 1970 to 1979
    Brian Ripley’s seminal book on pattern recognition gives credit for many ideas and techniques to largely forgotten engineering papers from the 1970s.
  350. covert
    secret or hidden
    They started with an objective like, “I want my car to drive me places,” and then designed a covert data product to accomplish that task.
  351. biological
    pertaining to life and living things
    There are many techniques to avoid this problem, some based on statistics and spreading our bets widely, and others based on systems seen in nature, like biological evolution or the cooling of atoms in glass.
  352. abstract
    existing only in the mind
    It is easy to stumble into the trap of thinking that since data exists somewhere abstract, on a spreadsheet or in the cloud, that data products are just abstract algorithms.
  353. multiply
    combine by adding the same number repeatedly
    Multiplying these two curves creates a final curve that shows price versus expected profit (see Expected Profit figure, below).
  354. manufacture
    put together out of artificial or natural components
    For an insurance company, policy price is the product, so an optimal pricing model is to them what the assembly line is to automobile manufacturing.
  355. page
    one side of one leaf of a book or other document
    While their models were good at finding relevant websites, the answer the user was most interested in was often buried on page 100 of the search results.
  356. think about
    have on one's mind, think about actively
    Only after these first three steps do we begin thinking about building the predictive models.
  357. get to
    arrive at the point of
    I think of this as a complicated machine (full-system) where the curtain is withdrawn and you get to model each significant part of the machine under controlled experiments and then simulate the interactions.
  358. variable
    something that is likely to change
    The models will take both the levers and any uncontrollable variables as their inputs; the outputs from the models can be combined to predict the final state for our objective.
  359. bias
    a partiality preventing objective consideration of an issue
    The objective is to escape a recommendation filter bubble, a term which was originally coined by Eli Pariser to describe the tendency of personalized news feeds to only display articles that are blandly popular or further confirm the readers’ existing biases.
  360. self
    your consciousness of your own identity
    We call it the Drivetrain Approach, inspired by the emerging field of self-driving vehicles.
  361. experiment
    the act of conducting a controlled test or investigation
    While the insurers were reluctant to conduct these experiments on real customers, as they’d certainly lose some customers as a result, they were swayed by the huge gains that optimized policy pricing might deliver.
  362. bot
    a mechanism that can move independently of external control
    Instead of the femme-bot voice of the GPS unit telling us which route to take and where to turn, what would it take to build a car that would make those decisions by itself?
  363. manufacturing
    the act of making something (a product) from raw materials
    For an insurance company, policy price is the product, so an optimal pricing model is to them what the assembly line is to automobile manufacturing.
  364. entire
    constituting the full quantity or extent; complete
    The technology exists to build data products that can revolutionize entire industries.
  365. relate
    establish a connection between
    The second component of ODG’s Modeler related price to the insurance company’s profit, conditional on the customer accepting this price.
  366. adjust
    alter or regulate so as to conform to a standard
    The operator can adjust the input levers to answer specific questions like, “What will happen if our company offers the customer a low teaser price in year one but then raises the premiums in year two?”
  367. accident
    an unfortunate mishap
    Their actuaries could build models to predict a customer’s likelihood of being in an accident and the expected value of claims.
  368. start
    take the first step or steps in carrying out an action
    To jump-start this process, we suggest a four-step approach that has already transformed the insurance industry.
  369. Newton
    English mathematician and physicist
    Sidebar: Optimization in the real world

    Optimization is a classic problem that has been studied by Newton and Gauss all the way up to mathematicians and engineers in the present day.
  370. specified
    clearly and explicitly stated
    Once we have specified the goal, the second step is to specify what inputs of the system we can control, the levers we can pull to influence the final outcome.
  371. can
    be able to; have the ability or possibility to
    Prediction technology can be interesting and mathematically elegant, but we need to take the next step.
  372. deliver
    bring to a destination
    While the insurers were reluctant to conduct these experiments on real customers, as they’d certainly lose some customers as a result, they were swayed by the huge gains that optimized policy pricing might deliver.
  373. range
    a variety of different things or activities
    These products range from weather forecasting to recommendation engines to services that predict airline flight times more accurately than the airline itself.
  374. vocabulary
    a language user's knowledge of words
    We don’t know what design approaches will be developed in the future, but right now, there is a need for the data science community to coalesce around a shared vocabulary and product design process that can be used to educate others on how to derive value from their predictive models.
  375. collection
    the act of gathering something together
    Online fashion retailer Zafu shows how to encourage the customer to participate in this collection process.
  376. defined
    showing clearly the outline or profile or boundary
    The objective is clearly defined: build an airplane wing.
  377. mining
    the act of extracting ores or coal from the earth
    A great image for optimization in the real world comes up in a recent TechZing podcast with the co-founders of data-mining competition platform Kaggle.
  378. policy
    a written contract of insurance
    For an insurance company, policy price is the product, so an optimal pricing model is to them what the assembly line is to automobile manufacturing.
  379. website
    a set of pages on the internet organized as a single unit
    While their models were good at finding relevant websites, the answer the user was most interested in was often buried on page 100 of the search results.
  380. framework
    the underlying structure
    We introduced the Drivetrain Approach to provide a framework for designing the next generation of great data products and described how it relies at its heart on optimization.
  381. atom
    the smallest component of an element
    There are many techniques to avoid this problem, some based on statistics and spreading our bets widely, and others based on systems seen in nature, like biological evolution or the cooling of atoms in glass.
  382. consider
    think about carefully; weigh
    The third step was to consider what new data they would need to produce such a ranking; they realized that the implicit information regarding which pages linked to which other pages could be used for this purpose.
  383. describe
    give a statement representing something
    Irfan Ahmed of CloudPhysics provides a good taxonomy of predictive modeling that describes this entire assembly line process:

    “When dealing with hundreds or thousands of individual components models to understand the behavior of the full-system, a ‘search’ has to be done.
  384. decision
    a position or opinion reached after consideration
    Optimizing for an actionable outcome over the right predictive models can be a company’s most important strategic decision.
  385. expect
    regard something as probable or likely
    Their actuaries could build models to predict a customer’s likelihood of being in an accident and the expected value of claims.
  386. renewal
    the act of renewing
    The first component of ODG’s Modeler was a model of price elasticity (the probability that a customer will accept a given price) for new policies and for renewals.
  387. unaware
    not having or showing knowledge or understanding
    Someone using Google’s self-driving car is completely unaware of the hundreds (if not thousands) of models and the petabytes of data that make it work.
  388. company
    an institution created to conduct business
    Google realized that the objective was to show the most relevant search result; for other companies, it might be increasing profit, improving the customer experience, finding the best path for a robot, or balancing the load in a data center.
  389. encourage
    inspire with confidence
    This encompasses all the interactions that a retailer has with its customers outside of the actual buy-sell transaction, whether making a product recommendation, encouraging the customer to check out a new feature of the online store, or sending sales promotions.
  390. mortar
    a building material used to bind bricks or stones together
    Data science is beginning to pervade even the most bricks-and-mortar elements of our lives.
  391. needs
    requires as useful, just, or proper
    For example, resistance in the electrical system produces heat, which needs to be included as an input for the thermal diffusion and cooling model.
  392. tune
    a succession of notes forming a distinctive sequence
    Making the wrong choices comes at a cost to the retailer in the form of reduced margins (discounts that do not drive extra sales), opportunity costs for the scarce real-estate on their homepage (taking up space in the recommendation feed with products the customer doesn’t like or would have bought without a recommendation) or the customer tuning out (sending so many unhelpful email promotions that the customer filters all future communications as spam).
  393. physics
    the science of matter and energy and their interactions
    We need to define the models we will need, such as physics models to predict the effects of steering, braking and acceleration, and pattern recognition algorithms to interpret data from the road signs.
  394. impact
    the striking of one body against another
    If a new competitor enters the market and our company does not react, what will be the impact on our bottom line?”
  395. acquiring
    the act of coming into possession of something
    The profit for a very low price will be in the red by the value of expected claims in the first year, plus any overhead for acquiring and servicing the new customer.
  396. swarm
    a group of many things in the air or on the ground
    Jeannie Stamberger of Carnegie Mellon University Silicon Valley explained to us many of the possible applications of predictive algorithms to disaster response, from text-mining and sentiment analysis of tweets to determine the extent of the damage, to swarms of autonomous robots for reconnaissance and rescue, to logistic optimization tools that help multiple jurisdictions coordinate their responses.
  397. link
    connect, fasten, or put together two or more pieces
    The third step was to consider what new data they would need to produce such a ranking; they realized that the implicit information regarding which pages linked to which other pages could be used for this purpose.
  398. interpret
    make sense of; assign a meaning to
    We need to define the models we will need, such as physics models to predict the effects of steering, braking and acceleration, and pattern recognition algorithms to interpret data from the road signs.
  399. asking
    the verbal act of requesting
    But these products are still just making predictions, rather than asking what action they want someone to take as a result of a prediction.
  400. everyday
    commonplace and ordinary
    As scientists and engineers become more adept at applying prediction and optimization to everyday problems, they are expanding the art of the possible, optimizing everything from our personal health to the houses and cities we live in.
  401. participate
    be involved in
    Online fashion retailer Zafu shows how to encourage the customer to participate in this collection process.
  402. go about
    begin to deal with
    We will show how to go about building an optimized marketing strategy that mitigates these effects.
  403. span
    the distance or interval between two points
    The wing box includes the design levers like span, taper ratio and sweep.
  404. rank
    relative status
    In Google’s case, they could control the ranking of the search results.
  405. Santa
    the legendary patron saint of children
    Suppose we wanted to get from San Francisco to the Strata 2012 Conference in Santa Clara.
  406. fighter
    someone who fights (or is fighting)
    These firms have plenty of experience building models of each of the components and systems in their final product, whether they’re building a server farm or a fighter jet.
  407. sequence
    a following of one thing after another in time
    A purchase sequence causality model can be used to identify key “entry products.”
  408. pattern
    a repeated design, structure, or arrangement
    Brian Ripley’s seminal book on pattern recognition gives credit for many ideas and techniques to largely forgotten engineering papers from the 1970s.
  409. accessible
    capable of being reached
    There are many different optimization techniques to choose from (see sidebar, below), but it is a well-understood field with robust and accessible solutions.
  410. poorly
    in a poor or improper or unsatisfactory manner; not well
    We could construct a patience model for the customers’ tolerance for poorly targeted communications: When do they tune them out and filter our messages straight to spam?
  411. avoid
    stay away from
    The Optimizer not only finds the best outcomes, it can also identify catastrophic outcomes and show how to avoid them.
  412. create
    bring into existence
    Multiplying these two curves creates a final curve that shows price versus expected profit (see Expected Profit figure, below).
  413. toll
    a fee levied for the use of roads or bridges
    These days, it is trivial to use some type of heuristic search algorithm to predict the drive times along various routes (a Simulator) and then pick the shortest one (an Optimizer) subject to constraints like avoiding bridge tolls or maximizing gas mileage.
  414. emergency
    a sudden unforeseen crisis that requires immediate action
    This has improved emergency evacuation procedures for subway stations and reduced the danger of crowd stampedes and trampling during sporting events.
  415. communications
    the discipline that studies transmitting information
    Making the wrong choices comes at a cost to the retailer in the form of reduced margins (discounts that do not drive extra sales), opportunity costs for the scarce real-estate on their homepage (taking up space in the recommendation feed with products the customer doesn’t like or would have bought without a recommendation) or the customer tuning out (sending so many unhelpful email promotions that the customer filters all future communications as spam).
  416. designer
    someone who creates plans to be used in making something
    Plenty of websites sell designer denim, but for many women, high-end jeans are the one item of clothing they never buy online because it’s hard to find the right pair without trying them on.
  417. include
    have as a part; be made up out of
    For example, resistance in the electrical system produces heat, which needs to be included as an input for the thermal diffusion and cooling model.
  418. take on
    take on titles, offices, duties, responsibilities
    ODG approached this problem with an early use of the Drivetrain Approach and a practical take on step 4 that can be applied to a wide range of problems.
  419. willingness
    cheerful compliance
    These models predicted whether customers would renew their policies in one year, allowing for changes in price and willingness to jump to a competitor.
  420. browse
    feed as in a meadow or pasture
    Only then does the customer get to browse a recommended selection of Zafu’s inventory.
  421. goal
    the state of affairs that a plan is intended to achieve
    We don’t claim that the Drivetrain Approach is the best or only method; our goal is to start a dialog within the data science and business communities to advance our collective vision.
  422. emerge
    come out into view, as from concealment
    We call it the Drivetrain Approach, inspired by the emerging field of self-driving vehicles.
  423. robust
    sturdy and strong in form, constitution, or construction
    There are many different optimization techniques to choose from (see sidebar, below), but it is a well-understood field with robust and accessible solutions.
  424. raw
    not treated with heat to prepare it for eating
    Picture a Model Assembly Line for data products that transforms the raw data into an actionable outcome.
  425. line
    a length between two points
    For an insurance company, policy price is the product, so an optimal pricing model is to them what the assembly line is to automobile manufacturing.
  426. industry
    the action of making of goods and services for sale
    The technology exists to build data products that can revolutionize entire industries.
  427. exist
    have a presence
    The technology exists to build data products that can revolutionize entire industries.
  428. slice
    a thin flat piece cut off of some object
    The expected profit curve is just a slice of the surface of possible outcomes.
  429. type
    a subdivision of a particular kind of thing
    Then, Google came along and transformed online search by beginning with a simple question: What is the user’s main objective in typing in a search query?
  430. listed
    on a list
    The data is in the wing materials’ physical properties; costs are listed in another tab of the application.
  431. derive
    come from
    We don’t know what design approaches will be developed in the future, but right now, there is a need for the data science community to coalesce around a shared vocabulary and product design process that can be used to educate others on how to derive value from their predictive models.
  432. add
    join or combine or unite with others
    From there, they developed an optimized pricing process that added hundreds of millions of dollars to the insurers’ bottom lines.
  433. simple
    having few parts; not complex or complicated or involved
    Then, Google came along and transformed online search by beginning with a simple question: What is the user’s main objective in typing in a search query?
  434. collective
    done by or characteristic of individuals acting together
    We don’t claim that the Drivetrain Approach is the best or only method; our goal is to start a dialog within the data science and business communities to advance our collective vision.
  435. climb
    go up or advance
    Many optimization procedures are iterative; they can be thought of as taking a small step, checking our elevation and then taking another small uphill step until we reach a point from which there is no direction in which we can climb any higher.
  436. helpful
    providing assistance or serving a useful function
    This is not to say that Amazon’s recommendation engine could not have made the same connection; the problem is that this helpful recommendation will be buried far down in the recommendation feed, beneath books that have more obvious similarities to “Beloved.”
  437. superficial
    of, affecting, or being on or near the surface
    The Strand bookseller made a brilliant but far-fetched recommendation probably based more on the character of Morrison’s writing than superficial similarities between Morrison and other authors.
  438. premium
    having or reflecting superior quality or value
    The operator can adjust the input levers to answer specific questions like, “What will happen if our company offers the customer a low teaser price in year one but then raises the premiums in year two?”
  439. reduced
    made less in size or amount or degree
    Making the wrong choices comes at a cost to the retailer in the form of reduced margins (discounts that do not drive extra sales), opportunity costs for the scarce real-estate on their homepage (taking up space in the recommendation feed with products the customer doesn’t like or would have bought without a recommendation) or the customer tuning out (sending so many unhelpful email promotions that the customer filters all future communications as spam).
  440. CEO
    the corporate executive responsible for the operations of the firm; reports to a board of directors; may appoint other managers (including a president)
    What we would really like to do is emulate the experience of Mark Johnson, CEO of Zite, who gave a perfect example of what a customer’s recommendation experience should be like in a recent TOC talk.
  441. conducting
    the way of administering a business
    This will require conducting many randomized experiments in order to collect data about a wide range of recommendations for a wide range of customers.
  442. sporting
    relating to or used in sports
    This has improved emergency evacuation procedures for subway stations and reduced the danger of crowd stampedes and trampling during sporting events.
  443. explore
    travel to or penetrate into
    They can also explore how the distribution of profit is shaped by the inputs outside of the insurer’s control: “What if the economy crashes and the customer loses his job?
  444. science
    a branch of study or knowledge involving the observation, investigation, and discovery of general laws or truths that can be tested systematically
    We don’t claim that the Drivetrain Approach is the best or only method; our goal is to start a dialog within the data science and business communities to advance our collective vision.
  445. already
    prior to a specified or implied time
    To jump-start this process, we suggest a four-step approach that has already transformed the insurance industry.
  446. cost
    be priced at
    Making the wrong choices comes at a cost to the retailer in the form of reduced margins (discounts that do not drive extra sales), opportunity costs for the scarce real-estate on their homepage (taking up space in the recommendation feed with products the customer doesn’t like or would have bought without a recommendation) or the customer tuning out (sending so many unhelpful email promotions that the customer filters all future communications as spam).
  447. experience
    the content of observation or participation in an event
    Google realized that the objective was to show the most relevant search result; for other companies, it might be increasing profit, improving the customer experience, finding the best path for a robot, or balancing the load in a data center.
  448. recognition
    identifying something or someone by remembering
    Brian Ripley’s seminal book on pattern recognition gives credit for many ideas and techniques to largely forgotten engineering papers from the 1970s.
  449. determine
    find out or learn with certainty, as by making an inquiry
    Our objective and available levers, what data we already have and what additional data we will need to collect, determine the models we can build.
  450. multiple
    having or involving more than one part or entity
    Jeannie Stamberger of Carnegie Mellon University Silicon Valley explained to us many of the possible applications of predictive algorithms to disaster response, from text-mining and sentiment analysis of tweets to determine the extent of the damage, to swarms of autonomous robots for reconnaissance and rescue, to logistic optimization tools that help multiple jurisdictions coordinate their responses.
  451. renew
    reestablish on an improved basis
    These models predicted whether customers would renew their policies in one year, allowing for changes in price and willingness to jump to a competitor.
  452. ratio
    relation with respect to comparative quantity or magnitude
    The wing box includes the design levers like span, taper ratio and sweep.
  453. examine
    observe, check out, and look over carefully or inspect
    The vehicle needs to use a simulator to examine the results of the possible actions it could take.
  454. combination
    the act of bringing things together to form a new whole
    But those models did not solve the pricing problem, so the insurance companies would set a price based on a combination of guesswork and market studies.
  455. similar
    having the same or nearly the same characteristics
    The current algorithms predict what products a customer will like, based on purchase history and the histories of similar customers.
  456. detect
    discover or determine the existence, presence, or fact of
    Next, we consider what data the car needs to collect; it needs sensors that gather data about the road as well as cameras that can detect road signs, red or green lights, and unexpected obstacles (including pedestrians).
  457. tailor
    a person whose occupation is making and altering garments
    Zafu can tailor their recommendations to fit as well as their jeans because their system is asking the right questions.
  458. assumption
    the act of taking something for granted
    Here is a screenshot of the “Customers Who Bought This Item Also Bought” feed on Amazon from a search for the latest book in Terry Pratchett’s “Discworld series:”

    All of the recommendations are for other books in the same series, but it’s a good assumption that a customer who searched for “Terry Pratchett” is already aware of these books.
  459. trivial
    (informal) small and of little importance
    These days, it is trivial to use some type of heuristic search algorithm to predict the drive times along various routes (a Simulator) and then pick the shortest one (an Optimizer) subject to constraints like avoiding bridge tolls or maximizing gas mileage.
  460. inspire
    serve as the inciting cause of
    We call it the Drivetrain Approach, inspired by the emerging field of self-driving vehicles.
  461. developed
    being changed over time, as to be stronger or more complete
    From there, they developed an optimized pricing process that added hundreds of millions of dollars to the insurers’ bottom lines.
  462. illustrate
    depict with a visual representation
    The best way to illustrate this process is with a familiar data product: search engines.
  463. wide
    having great extent from one side to the other
    ODG approached this problem with an early use of the Drivetrain Approach and a practical take on step 4 that can be applied to a wide range of problems.
  464. operator
    an agent that operates some apparatus or machine
    The operator can adjust the input levers to answer specific questions like, “What will happen if our company offers the customer a low teaser price in year one but then raises the premiums in year two?”
  465. session
    a meeting for execution of a group's functions
    Jeremy Howard examined these questions in his Strata CA 12 session, “From Predictive Modelling to Optimization: The Next Frontier.”
  466. reject
    refuse to accept or acknowledge
    It will be low in cases where the algorithm recommends a familiar book that the customer has already rejected (both components are small) or a book that he or she would have bought even without the recommendation (both components are large and cancel each other out).
  467. expected
    considered likely or probable to happen or arrive
    Their actuaries could build models to predict a customer’s likelihood of being in an accident and the expected value of claims.
  468. accurately
    strictly correctly
    These products range from weather forecasting to recommendation engines to services that predict airline flight times more accurately than the airline itself.
  469. intervention
    the act of putting something between two things
    Engineers start by defining a clear objective: They want a car to drive safely from point A to point B without human intervention.
  470. improved
    become or made better in quality
    Instead, let’s design an improved recommendation engine using the Drivetrain Approach, starting by reconsidering our objective.
  471. unexpected
    not anticipated or planned for
    There may be some unexpected recommendations on pages 2 through 14 of the feed, but how many customers are going to bother clicking through?
  472. distribution
    the act of spreading or apportioning
    The next machine on the assembly line is a Simulator, which lets ODG ask the “what if” questions to see how the levers affect the distribution of the final outcome.
  473. download
    transfer a file or program to a smaller computer
    By Jeremy Howard, Margit Zwemer and Mike Loukides

    Sections

    Download this free report

    In the past few years, we’ve seen many data products based on predictive modeling.
  474. show
    make visible or noticeable
    Google realized that the objective was to show the most relevant search result; for other companies, it might be increasing profit, improving the customer experience, finding the best path for a robot, or balancing the load in a data center.
  475. display
    something intended to communicate a particular impression
    The objective is to escape a recommendation filter bubble, a term which was originally coined by Eli Pariser to describe the tendency of personalized news feeds to only display articles that are blandly popular or further confirm the readers’ existing biases.
  476. jet
    a hard black form of lignite that takes a brilliant polish
    These firms have plenty of experience building models of each of the components and systems in their final product, whether they’re building a server farm or a fighter jet.
  477. suite
    the group following and attending to some important person
    This new suite of models is not a final answer because it only identifies the outcome for a given set of inputs.
  478. machine
    a mechanical or electrical device that transmits energy
    The next machine on the assembly line is a Simulator, which lets ODG ask the “what if” questions to see how the levers affect the distribution of the final outcome.
  479. value
    the quality that renders something desirable
    Their actuaries could build models to predict a customer’s likelihood of being in an accident and the expected value of claims.
  480. tech
    a school teaching mechanical and industrial arts and the applied sciences
    Step 4 of the Drivetrain Approach for Google is now part of tech history: Larry Page and Sergey Brin invented the graph traversal algorithm PageRank and built an engine on top of it that revolutionized search.
  481. expand
    extend in one or more directions
    As scientists and engineers become more adept at applying prediction and optimization to everyday problems, they are expanding the art of the possible, optimizing everything from our personal health to the houses and cities we live in.
  482. convert
    change the nature, purpose, or function of something
    The Modeler takes the raw data and converts it into slightly more refined predicted data.
  483. transaction
    conducting business within or between groups
    This encompasses all the interactions that a retailer has with its customers outside of the actual buy-sell transaction, whether making a product recommendation, encouraging the customer to check out a new feature of the online store, or sending sales promotions.
  484. new
    not of long duration
    The third step was to consider what new data they would need to produce such a ranking; they realized that the implicit information regarding which pages linked to which other pages could be used for this purpose.
  485. fluid
    continuous amorphous matter that tends to flow
    Models developed to simulate fluid dynamics and turbulence have been applied to improving traffic and pedestrian flows by using the placement of exits and crowd control barriers as levers.
  486. vary
    become different in some particular way
    This can vary case by case, but a few online retailers are taking creative approaches to this step.
  487. challenge
    a call to engage in a contest or fight
    Engineers are often quietly on the leading edge of algorithmic applications because they have long been thinking about their own modeling challenges in an objective-based way.
  488. change
    become different in some particular way
    This situation changed in 1999 with a company called Optimal Decisions Group (ODG).
  489. represent
    be a delegate or spokesperson for
    The final curve has a clearly identifiable local maximum that represents the best price to charge a customer for the first year.
  490. emerging
    coming into existence
    We call it the Drivetrain Approach, inspired by the emerging field of self-driving vehicles.
  491. reluctant
    not eager
    While the insurers were reluctant to conduct these experiments on real customers, as they’d certainly lose some customers as a result, they were swayed by the huge gains that optimized policy pricing might deliver.
  492. obstacle
    something that stands in the way and must be surmounted
    Next, we consider what data the car needs to collect; it needs sensors that gather data about the road as well as cameras that can detect road signs, red or green lights, and unexpected obstacles (including pedestrians).
  493. take to
    have a fancy or particular liking or desire for
    Instead of the femme-bot voice of the GPS unit telling us which route to take and where to turn, what would it take to build a car that would make those decisions by itself?
  494. safely
    in a manner unlikely to cause damage or harm
    Engineers start by defining a clear objective: They want a car to drive safely from point A to point B without human intervention.
  495. route
    an established line of travel or access
    These days, it is trivial to use some type of heuristic search algorithm to predict the drive times along various routes (a Simulator) and then pick the shortest one (an Optimizer) subject to constraints like avoiding bridge tolls or maximizing gas mileage.
  496. take
    get into one's hands
    But these products are still just making predictions, rather than asking what action they want someone to take as a result of a prediction.
  497. will
    the capability of conscious choice and decision
    The third step was to consider what new data they would need to produce such a ranking; they realized that the implicit information regarding which pages linked to which other pages could be used for this purpose.
  498. realize
    be fully aware or cognizant of
    Google realized that the objective was to show the most relevant search result; for other companies, it might be increasing profit, improving the customer experience, finding the best path for a robot, or balancing the load in a data center.
  499. flow
    move along, of liquids
    Models developed to simulate fluid dynamics and turbulence have been applied to improving traffic and pedestrian flows by using the placement of exits and crowd control barriers as levers.
  500. suggest
    make a proposal; declare a plan for something
    To jump-start this process, we suggest a four-step approach that has already transformed the insurance industry.
  501. shipping
    the commercial enterprise of moving goods and materials
    ODG’s competitors use different techniques to find an optimal price, but they are shipping the same over-all data product.
  502. sway
    move back and forth
    While the insurers were reluctant to conduct these experiments on real customers, as they’d certainly lose some customers as a result, they were swayed by the huge gains that optimized policy pricing might deliver.
  503. communication
    the activity of conveying information
    Making the wrong choices comes at a cost to the retailer in the form of reduced margins (discounts that do not drive extra sales), opportunity costs for the scarce real-estate on their homepage (taking up space in the recommendation feed with products the customer doesn’t like or would have bought without a recommendation) or the customer tuning out (sending so many unhelpful email promotions that the customer filters all future communications as spam).
  504. project
    a planned undertaking
    As one engineer on the Google self-driving car project put it in a recent Wired article, “We’re analyzing and predicting the world 20 times a second.”
  505. strategic
    relating to an elaborate and systematic plan of action
    Optimizing for an actionable outcome over the right predictive models can be a company’s most important strategic decision.
  506. plenty
    a full supply
    Plenty of websites sell designer denim, but for many women, high-end jeans are the one item of clothing they never buy online because it’s hard to find the right pair without trying them on.
  507. solve
    find the answer to or understand the meaning of
    But those models did not solve the pricing problem, so the insurance companies would set a price based on a combination of guesswork and market studies.
  508. steps
    the course along which a person has walked or is walking in
    The four steps in the Drivetrain Approach.
  509. solution
    a homogeneous mixture of two or more substances
    Great predictive modeling is an important part of the solution, but it no longer stands on its own; as products become more sophisticated, it disappears into the plumbing.
  510. automobile
    a motor vehicle with four wheels
    For an insurance company, policy price is the product, so an optimal pricing model is to them what the assembly line is to automobile manufacturing.
  511. linked
    connected, as railway cars or trailer trucks
    The third step was to consider what new data they would need to produce such a ranking; they realized that the implicit information regarding which pages linked to which other pages could be used for this purpose.
  512. costs
    pecuniary reimbursement to the winning party for the expenses of litigation
    Making the wrong choices comes at a cost to the retailer in the form of reduced margins (discounts that do not drive extra sales), opportunity costs for the scarce real-estate on their homepage (taking up space in the recommendation feed with products the customer doesn’t like or would have bought without a recommendation) or the customer tuning out (sending so many unhelpful email promotions that the customer filters all future communications as spam).
  513. confirm
    strengthen
    The objective is to escape a recommendation filter bubble, a term which was originally coined by Eli Pariser to describe the tendency of personalized news feeds to only display articles that are blandly popular or further confirm the readers’ existing biases.
  514. come along
    come into being or existence, or appear on the scene
    Then, Google came along and transformed online search by beginning with a simple question: What is the user’s main objective in typing in a search query?
  515. shaped
    having the shape of
    They can also explore how the distribution of profit is shaped by the inputs outside of the insurer’s control: “What if the economy crashes and the customer loses his job?
  516. response
    the speech act of continuing a conversational exchange
    Jeannie Stamberger of Carnegie Mellon University Silicon Valley explained to us many of the possible applications of predictive algorithms to disaster response, from text-mining and sentiment analysis of tweets to determine the extent of the damage, to swarms of autonomous robots for reconnaissance and rescue, to logistic optimization tools that help multiple jurisdictions coordinate their responses.
  517. jump
    move forward by leaps and bounds
    To jump-start this process, we suggest a four-step approach that has already transformed the insurance industry.
  518. series
    similar things placed in order or one after another
    Here is a screenshot of the “Customers Who Bought This Item Also Bought” feed on Amazon from a search for the latest book in Terry Pratchett’s “Discworld series:”

    All of the recommendations are for other books in the same series, but it’s a good assumption that a customer who searched for “Terry Pratchett” is already aware of these books.
  519. technology
    the practical application of science to commerce or industry
    Prediction technology can be interesting and mathematically elegant, but we need to take the next step.
  520. era
    a period marked by distinctive character
    We are entering the era of data as drivetrain, where we use data not just to generate more data (in the form of predictions), but use data to produce actionable outcomes.
  521. obvious
    easily perceived by the senses or grasped by the mind
    She cut through the chaff of the obvious to make a recommendation that will send the customer home with a new book, and returning to Strand again and again in the future.
  522. pair
    a set of two similar things considered as a unit
    Plenty of websites sell designer denim, but for many women, high-end jeans are the one item of clothing they never buy online because it’s hard to find the right pair without trying them on.
  523. accept
    receive willingly something given or offered
    The first component of ODG’s Modeler was a model of price elasticity (the probability that a customer will accept a given price) for new policies and for renewals.
  524. below
    in or to a place that is lower
    Multiplying these two curves creates a final curve that shows price versus expected profit (see Expected Profit figure, below).
  525. good enough
    adequately good for the circumstances
    Merely predicting what will happen isn’t good enough.
  526. decide
    reach, make, or come to a conclusion about something
    The takeaway, whether you are a tiny startup or a giant insurance company, is that we unconsciously use optimization whenever we decide how to get to where we want to go.
  527. bother
    disturb, especially by minor irritations
    There may be some unexpected recommendations on pages 2 through 14 of the feed, but how many customers are going to bother clicking through?
  528. destination
    the place designated as the end, as of a race or journey
    There are plenty of cool challenges in building these models, but by themselves, they do not take us to our destination.
  529. encouraging
    giving courage or confidence or hope
    This encompasses all the interactions that a retailer has with its customers outside of the actual buy-sell transaction, whether making a product recommendation, encouraging the customer to check out a new feature of the online store, or sending sales promotions.
  530. beloved
    dearly loved
    He went into Strand bookstore in New York City and asked for a book similar to Toni Morrison’s “Beloved.”
  531. want
    the state of needing something that is absent or unavailable
    But these products are still just making predictions, rather than asking what action they want someone to take as a result of a prediction.
  532. unconsciously
    without awareness
    The takeaway, whether you are a tiny startup or a giant insurance company, is that we unconsciously use optimization whenever we decide how to get to where we want to go.
  533. happen
    come to pass
    The operator can adjust the input levers to answer specific questions like, “What will happen if our company offers the customer a low teaser price in year one but then raises the premiums in year two?”
  534. disposal
    the act or means of getting rid of something
    Second question: "What levers do we have at our disposal to achieve this objective?"
  535. claim
    assert or affirm strongly
    We don’t claim that the Drivetrain Approach is the best or only method; our goal is to start a dialog within the data science and business communities to advance our collective vision.
  536. rely
    have confidence or faith in
    We introduced the Drivetrain Approach to provide a framework for designing the next generation of great data products and described how it relies at its heart on optimization.
  537. realized
    successfully completed or brought to an end
    Google realized that the objective was to show the most relevant search result; for other companies, it might be increasing profit, improving the customer experience, finding the best path for a robot, or balancing the load in a data center.
  538. lifetime
    the period during which something is functional
    Simple: we want to optimize the lifetime value from each customer.
  539. jurisdiction
    the territory within which power can be exercised
    Jeannie Stamberger of Carnegie Mellon University Silicon Valley explained to us many of the possible applications of predictive algorithms to disaster response, from text-mining and sentiment analysis of tweets to determine the extent of the damage, to swarms of autonomous robots for reconnaissance and rescue, to logistic optimization tools that help multiple jurisdictions coordinate their responses.
  540. built
    having a substance added to increase effectiveness
    Step 4 of the Drivetrain Approach for Google is now part of tech history: Larry Page and Sergey Brin invented the graph traversal algorithm PageRank and built an engine on top of it that revolutionized search.
  541. disappear
    become invisible or unnoticeable
    Great predictive modeling is an important part of the solution, but it no longer stands on its own; as products become more sophisticated, it disappears into the plumbing.
  542. barrier
    a structure or object that impedes free movement
    Models developed to simulate fluid dynamics and turbulence have been applied to improving traffic and pedestrian flows by using the placement of exits and crowd control barriers as levers.
  543. sending
    the act of causing something to go (especially messages)
    This encompasses all the interactions that a retailer has with its customers outside of the actual buy-sell transaction, whether making a product recommendation, encouraging the customer to check out a new feature of the online store, or sending sales promotions.
  544. quote
    repeat a passage from
    What gets lost in the quote is what happens as a result of that prediction.
  545. acceptance
    the state of being satisfactory
    This curve moves from almost certain acceptance at very low prices to almost never at high prices.
  546. camera
    equipment for taking photographs
    Next, we consider what data the car needs to collect; it needs sensors that gather data about the road as well as cameras that can detect road signs, red or green lights, and unexpected obstacles (including pedestrians).
  547. withdrawn
    tending to be reserved, quiet, or introspective
    I think of this as a complicated machine (full-system) where the curtain is withdrawn and you get to model each significant part of the machine under controlled experiments and then simulate the interactions.
  548. check
    examine to determine accuracy or quality
    Many optimization procedures are iterative; they can be thought of as taking a small step, checking our elevation and then taking another small uphill step until we reach a point from which there is no direction in which we can climb any higher.
  549. question
    a sentence of inquiry that asks for a reply
    Then, Google came along and transformed online search by beginning with a simple question: What is the user’s main objective in typing in a search query?
  550. ask
    make a request or demand for something to somebody
    But these products are still just making predictions, rather than asking what action they want someone to take as a result of a prediction.
  551. author
    a person who writes professionally
    [Note: Co-author Jeremy Howard founded ODG.]
  552. giant
    any creature of exceptional size
    The takeaway, whether you are a tiny startup or a giant insurance company, is that we unconsciously use optimization whenever we decide how to get to where we want to go.
  553. creative
    having the ability or power to invent or make something
    This can vary case by case, but a few online retailers are taking creative approaches to this step.
  554. refined
    cultivated and genteel
    The Modeler takes the raw data and converts it into slightly more refined predicted data.
  555. coin
    make up or invent (a new word or phrase)
    The objective is to escape a recommendation filter bubble, a term that was originally coined by Eli Pariser to describe the tendency of personalized news feeds to only display articles that are blandly popular or further confirm the readers’ existing biases.
  556. concrete
    capable of being perceived by the senses
    These disaster applications are a particularly good example of why data products need simple, well-designed interfaces that produce concrete recommendations.
  557. bundle
    a collection of things wrapped or boxed together
    Why not bundle simulation and optimization engines with a physical engine, all inside the black box of a car?
  558. respective
    considered individually
    Full video from that session is embedded below:

    Related:

    © 2012, O'Reilly Media, Inc.

    (800) 889-8969 or (707) 827-7019 Monday-Friday 7:30am-5pm PT

    All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
  559. increase
    a process of becoming larger or longer or more numerous
    Google realized that the objective was to show the most relevant search result; for other companies, it might be increasing profit, improving the customer experience, finding the best path for a robot, or balancing the load in a data center.
  560. recent
    of the immediate past or just previous to the present time
    A great image for optimization in the real world comes up in a recent TechZing podcast with the co-founders of data-mining competition platform Kaggle.
  561. ruler
    a person who governs or commands
    We could just build a simple model of distance / speed-limit to predict arrival time with little more than a ruler and a road map.
  562. climbing
    an event that involves rising to a higher point
    The danger in this hill-climbing approach is that if the steps are too small, we may get stuck at one of the many local maxima in the foothills, which will not tell us the best set of controllable inputs.
  563. weather
    atmospheric conditions such as temperature and precipitation
    These products range from weather forecasting to recommendation engines to services that predict airline flight times more accurately than the airline itself.
  564. counter
    a table or surface over which business is transacted
    The girl behind the counter recommended William Faulkner’s “Absalom, Absalom!”
  565. related
    connected logically or causally or by shared characteristics
    The second component of ODG’s Modeler related price to the insurance company’s profit, conditional on the customer accepting this price.
  566. increasingly
    advancing in amount or intensity
    But as data scientists build increasingly sophisticated products, they need a systematic design approach.
  567. stress
    special emphasis attached to something
    That excess heat could cause mechanical components to warp, producing stresses that should be inputs to the mechanical models.
  568. surface
    the outer boundary of an artifact or a material layer
    The expected profit curve is just a slice of the surface of possible outcomes.
  569. acquire
    come into the possession of something concrete or abstract
    The profit for a very low price will be in the red by the value of expected claims in the first year, plus any overhead for acquiring and servicing the new customer.
  570. owner
    a person who owns something
    Nest is designing smart thermostats that learn the home-owner’s temperature preferences and then optimize their energy consumption.
  571. detailed
    developed with careful treatment of particulars
    There may be one detailed model for mechanical systems, a separate model for thermal systems, and yet another for electrical systems, etc.
  572. look out
    be vigilant, be on the lookout or be careful
    As predictive modeling and optimization become more vital to a wide variety of activities, look out for the engineers to disrupt industries that wouldn’t immediately appear to be in the data business.
  573. starting
    appropriate to the beginning or start of an event
    Instead, let’s design an improved recommendation engine using the Drivetrain Approach, starting by reconsidering our objective.
  574. overhead
    located or originating from above
    The profit for a very low price will be in the red by the value of expected claims in the first year, plus any overhead for acquiring and servicing the new customer.
  575. lose
    fail to keep or to maintain
    While the insurers were reluctant to conduct these experiments on real customers, as they’d certainly lose some customers as a result, they were swayed by the huge gains that optimized policy pricing might deliver.
  576. pick
    look for and gather
    These days, it is trivial to use some type of heuristic search algorithm to predict the drive times along various routes (a Simulator) and then pick the shortest one (an Optimizer) subject to constraints like avoiding bridge tolls or maximizing gas mileage.
  577. extension
    act of expanding in scope
    In another area where objective-based data products have the power to change lives, the CMU extension in Silicon Valley has an active project for building data products to help first responders after natural or man-made disasters.
  578. emphasis
    intensity or forcefulness of expression
    In general, when choosing an objective function to optimize, we need less emphasis on the “function” and more on the “objective.”
  579. margin
    the boundary line or area immediately inside the boundary
    Making the wrong choices comes at a cost to the retailer in the form of reduced margins (discounts that do not drive extra sales), opportunity costs for the scarce real-estate on their homepage (taking up space in the recommendation feed with products the customer doesn’t like or would have bought without a recommendation) or the customer tuning out (sending so many unhelpful email promotions that the customer filters all future communications as spam).
  580. identified
    having the identity known or established
    ODG identified which levers the insurance company could control: what price to charge each customer, what types of accidents to cover, how much to spend on marketing and customer service, and how to react to their competitors’ pricing decisions.
  581. ask for
    make a request for something
    He went into Strand bookstore in New York City and asked for a book similar to Toni Morrison’s “Beloved.”
  582. drag
    pull, as against a resistance
    There is a Modeler for aerodynamics and mechanical structure that can then be fed to a Simulator to produce the Key Wing Outputs of cost, weight, lift coefficient and induced drag.
  583. best
    having the most positive qualities
    We don’t claim that the Drivetrain Approach is the best or only method; our goal is to start a dialog within the data science and business communities to advance our collective vision.
  584. conclude
    bring to a close
    So, we would like to conclude by showing you how objective-based data products are already a part of the tangible world.
  585. behavior
    the way a person acts toward other people
    Irfan Ahmed of CloudPhysics provides a good taxonomy of predictive modeling that describes this entire assembly line process:

    “When dealing with hundreds or thousands of individual components models to understand the behavior of the full-system, a ‘search’ has to be done.
  586. gap
    an open or empty space in or between things
    What matters is that using a Drivetrain Approach combined with a Model Assembly Line bridges the gap between predictive models and actionable outcomes.
  587. induce
    cause to act in a specified manner
    There is a Modeler for aerodynamics and mechanical structure that can then be fed to a Simulator to produce the Key Wing Outputs of cost, weight, lift coefficient and induced drag.
  588. low
    less than normal in degree or intensity or amount
    This curve moves from almost certain acceptance at very low prices to almost never at high prices.
  589. someone
    a human being
    But these products are still just making predictions, rather than asking what action they want someone to take as a result of a prediction.
  590. elevation
    distance of something above a reference point
    Many optimization procedures are iterative; they can be thought of as taking a small step, checking our elevation and then taking another small uphill step until we reach a point from which there is no direction in which we can climb any higher.
  591. require
    have need of
    We will show a systematic approach to step 4 that doesn’t require a PhD in computer science.
  592. appearing
    formal attendance of a party in an action
    All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
  593. used to
    in the habit
    Finally, ODG started to design the models that could be used to optimize the insurer’s profit.
  594. controlled
    restrained or managed or kept within certain bounds
    I think of this as a complicated machine (full-system) where the curtain is withdrawn and you get to model each significant part of the machine under controlled experiments and then simulate the interactions.
  595. and then
    subsequently or soon afterward
    I think of this as a complicated machine (full-system) where the curtain is withdrawn and you get to model each significant part of the machine under controlled experiments and then simulate the interactions.
  596. again and again
    repeatedly
    She cut through the chaff of the obvious to make a recommendation that will send the customer home with a new book, and returning to Strand again and again in the future.
  597. all the way
    completely
    Sidebar: Optimization in the real world

    Optimization is a classic problem that has been studied by Newton and Gauss all the way up to mathematicians and engineers in the present day.
  598. complicated
    difficult to analyze or understand
    I think of this as a complicated machine (full-system) where the curtain is withdrawn and you get to model each significant part of the machine under controlled experiments and then simulate the interactions.
  599. spreading
    act of extending over a wider scope or expanse of space or time
    There are many techniques to avoid this problem, some based on statistics and spreading our bets widely, and others based on systems seen in nature, like biological evolution or the cooling of atoms in glass.
  600. founder
    a person who establishes some institution
    A great image for optimization in the real world comes up in a recent TechZing podcast with the co-founders of data-mining competition platform Kaggle.
  601. possible
    capable of happening or existing
    The expected profit curve is just a slice of the surface of possible outcomes.
  602. road
    an open way (generally public) for travel or transportation
    We could just build a simple model of distance / speed-limit to predict arrival time with little more than a ruler and a road map.
  603. set
    put into a certain place or abstract location
    But those models did not solve the pricing problem, so the insurance companies would set a price based on a combination of guesswork and market studies.
  604. year
    the period of time that it takes for a planet (as, e.g., Earth or Mars) to make a complete revolution around the sun
    By Jeremy Howard, Margit Zwemer and Mike Loukides

    In the past few years, we’ve seen many data products based on predictive modeling.
  605. exit
    move out of or depart from
    Models developed to simulate fluid dynamics and turbulence have been applied to improving traffic and pedestrian flows by using the placement of exits and crowd control barriers as levers.
  606. registered
    listed or recorded officially
    All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
  607. send
    cause to go somewhere
    She cut through the chaff of the obvious to make a recommendation that will send the customer home with a new book, and returning to Strand again and again in the future.
  608. provide
    give something useful or necessary to
    Irfan Ahmed of CloudPhysics provides a good taxonomy of predictive modeling that describes this entire assembly line process:

    “When dealing with hundreds or thousands of individual components models to understand the behavior of the full-system, a ‘search’ has to be done.
  609. find
    discover or determine the existence, presence, or fact of
    While their models were good at finding relevant websites, the answer the user was most interested in was often buried on page 100 of the search results.
  610. accomplish
    achieve with effort
    They started with an objective like, “I want my car to drive me places,” and then designed a covert data product to accomplish that task.
  611. often
    many times at short intervals
    While their models were good at finding relevant websites, the answer the user was most interested in was often buried on page 100 of the search results.
  612. entry
    the act of going in
    A purchase sequence causality model can be used to identify key “entry products.”
  613. effects
    property of a personal character that is portable
    We will show how to go about building an optimized marketing strategy that mitigates these effects.
  614. buried
    placed in a grave
    While their models were good at finding relevant websites, the answer the user was most interested in was often buried on page 100 of the search results.
  615. brick
    rectangular block of clay baked by the sun or in a kiln
    Data science is beginning to pervade even the most bricks-and-mortar elements of our lives.
  616. applied
    concerned with concrete problems or data
    ODG approached this problem with an early use of the Drivetrain Approach and a practical take on step 4 that can be applied to a wide range of problems.
  617. choose
    pick out from a number of alternatives
    There are many different optimization techniques to choose from (see sidebar, below), but it is a well-understood field with robust and accessible solutions.
  618. many
    a large number of the persons or things being discussed
    In the past few years, we’ve seen many data products based on predictive modeling.
  619. conditions
    the context that influences the performance of a process
    They also considered inputs outside of their control, like competitors’ strategies, macroeconomic conditions, natural disasters, and customer “stickiness.”
  620. excess
    the state of being more than full
    That excess heat could cause mechanical components to warp, producing stresses that should be inputs to the mechanical models.
  621. classic
    of recognized authority or excellence
    Sidebar: Optimization in the real world

    Optimization is a classic problem that has been studied by Newton and Gauss all the way up to mathematicians and engineers in the present day.
  622. trap
    a device in which something can be caught and penned
    It is easy to stumble into the trap of thinking that, since data exists somewhere abstract, on a spreadsheet or in the cloud, data products are just abstract algorithms.
  623. introduce
    bring something new to an environment
    We introduced the Drivetrain Approach to provide a framework for designing the next generation of great data products and described how it relies at its heart on optimization.
  624. elaborate
    marked by complexity and richness of detail
    What is particularly interesting is that there was no need to build an elaborate new data collection system.
  625. test
    standardized procedure for measuring sensitivity or aptitude
    We can build a Simulator to test the utility of each of the many possible books we have in stock, or perhaps just over all the outputs of a collaborative filtering model of similar customer purchases, and then build a simple Optimizer that ranks and displays the recommended books based on their simulated utility.
  626. consumption
    the act of using something up
    Nest is designing smart thermostats that learn the home-owner’s temperature preferences and then optimize their energy consumption.
  627. good
    having desirable or positive qualities
    We don’t claim that the Drivetrain Approach is the best or only method; our goal is to start a dialog within the data science and business communities to advance our collective vision.
  628. make
    perform or carry out
    But these products are still just making predictions, rather than asking what action they want someone to take as a result of a prediction.
  629. evolution
    sequence of events involved in the development of a species
    There are many techniques to avoid this problem, some based on statistics and spreading our bets widely, and others based on systems seen in nature, like biological evolution or the cooling of atoms in glass.
  630. elegant
    refined and tasteful in appearance, behavior, or style
    Prediction technology can be interesting and mathematically elegant, but we need to take the next step.
  631. next
    immediately following in time or order
    Prediction technology can be interesting and mathematically elegant, but we need to take the next step.
  632. target
    a reference point to shoot at
    We could construct a patience model for the customers’ tolerance for poorly targeted communications: When do they tune them out and filter our messages straight to spam?
  633. future
    the time yet to come
    She cut through the chaff of the obvious to make a recommendation that will send the customer home with a new book, and returning to Strand again and again in the future.
  634. take up
    turn one's interest to
    Making the wrong choices comes at a cost to the retailer in the form of reduced margins (discounts that do not drive extra sales), opportunity costs for the scarce real-estate on their homepage (taking up space in the recommendation feed with products the customer doesn’t like or would have bought without a recommendation) or the customer tuning out (sending so many unhelpful email promotions that the customer filters all future communications as spam).
  635. like
    having the same or similar characteristics
    They also considered inputs outside of their control, like competitors’ strategies, macroeconomic conditions, natural disasters, and customer “stickiness.”
  636. are
    a unit of surface area equal to 100 square meters
    But these products are still just making predictions, rather than asking what action they want someone to take as a result of a prediction.
  637. crash
    break violently or noisily
    They can also explore how the distribution of profit is shaped by the inputs outside of the insurer’s control: “What if the economy crashes and the customer loses his job?
  638. sweep
    clean by using a broom or as if with a broom
    The wing box includes the design levers like span, taper ratio and sweep.
  639. reaction
    an idea evoked by some experience
    They considered what additional data they would need to predict a customer’s reaction to changes in price.
  640. desirable
    worth having or seeking or achieving
    We hope to see data scientists ship products that are designed to produce desirable business outcomes.
  641. descent
    a movement downward
    Optimization is a process we are all familiar with in our daily lives, even if we have never used algorithms like gradient descent or simulated annealing.
  642. fetch
    go or come after and bring or take back
    The Strand bookseller made a brilliant but far-fetched recommendation probably based more on the character of Morrison’s writing than superficial similarities between Morrison and other authors.
  643. plus
    on the positive side or higher end of a scale
    The profit for a very low price will be in the red by the value of expected claims in the first year, plus any overhead for acquiring and servicing the new customer.
  644. unit
    a single undivided whole
    Instead of the femme-bot voice of the GPS unit telling us which route to take and where to turn, what would it take to build a car that would make those decisions by itself?
  645. inspiration
    arousal of the mind to unusual activity or creativity
    The inspiration for the phrase “Drivetrain Approach,” for example, is already on the streets of Mountain View.
  646. service
    an act of help or assistance
    These products range from weather forecasting to recommendation engines to services that predict airline flight times more accurately than the airline itself.
  647. important
    significant in effect or meaning
    Great predictive modeling is an important part of the solution, but it no longer stands on its own; as products become more sophisticated, it disappears into the plumbing.
  648. set in
    enter a particular state
    Note here the different levels: models of individual components, tied together in a simulation given a set of inputs, iterated through over different input sets in a search optimizer.”
  649. curtain
    hanging cloth used as a blind (especially for a window)
    I think of this as a complicated machine (full-system) where the curtain is withdrawn and you get to model each significant part of the machine under controlled experiments and then simulate the interactions.
  650. share
    assets belonging to an individual person or group
    They began by defining the objective that the insurance company was trying to achieve: setting a price that maximizes the net-present value of the profit from a new customer over a multi-year time horizon, subject to certain constraints such as maintaining market share.
  651. register
    an official written record of names or events
    All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
  652. speed
    a rate at which something happens
    We could just build a simple model of distance / speed-limit to predict arrival time with little more than a ruler and a road map.
  653. affect
    have an influence upon
    The next machine on the assembly line is a Simulator, which lets ODG ask the “what if” questions to see how the levers affect the distribution of the final outcome.
  654. computer
    a machine for performing calculations automatically
    We will show a systematic approach to step 4 that doesn’t require a PhD in computer science.
  655. surprising
    causing surprise or wonder or amazement
    The objective of a recommendation engine is to drive additional sales by surprising and delighting the customer with books he or she would not have purchased without the recommendation.
  656. well-known
    widely or fully known
    On Amazon, the top results for a similar query lead to another book by Toni Morrison and several books by well-known female authors of color.
  657. frontier
    a wilderness at the edge of a settled area of a country
    Jeremy Howard examined these questions in his Strata CA 12 session, “From Predictive Modelling to Optimization: The Next Frontier.”
  658. sell
    exchange or deliver for money or its equivalent
    This encompasses all the interactions that a retailer has with its customers outside of the actual buy-sell transaction, whether making a product recommendation, encouraging the customer to check out a new feature of the online store, or sending sales promotions.
  659. explain
    make plain and comprehensible
    One of the authors of this paper was explaining an iterative optimization technique, and the host says, “So, in a sense Jeremy, your approach was like that of doing a startup, which is just get something out there and iterate and iterate and iterate.”
  660. charge
    assign a duty, responsibility or obligation to
    Insurers have centuries of experience in prediction, but as recently as 10 years ago, the insurance companies often failed to make optimal business decisions about what price to charge each new customer.
  661. choice
    the act of selecting
    What choice are we actually helping him or her make?
  662. quality
    an essential and distinguishing attribute of something
    Industrial engineers were among the first to begin using neural networks, applying them to problems like the optimal design of assembly lines and quality control.
  663. different
    unlike in nature, quality, form, or degree
    There are many different optimization techniques to choose from (see sidebar, below), but it is a well-understood field with robust and accessible solutions.
  664. specific
    stated explicitly or in detail
    The operator can adjust the input levers to answer specific questions like, “What will happen if our company offers the customer a low teaser price in year one but then raises the premiums in year two?”
  665. motor
    machine that creates mechanical energy and imparts movement
    For motor vehicle traffic, IBM performed a project with the city of Stockholm to optimize traffic flows that reduced congestion by nearly a quarter, and increased the air quality in the inner city by 25%.
  666. aware
    having or showing knowledge or understanding or realization
    Here is a screenshot of the “Customers Who Bought This Item Also Bought” feed on Amazon from a search for the latest book in Terry Pratchett’s “Discworld series:”

    All of the recommendations are for other books in the same series, but it’s a good assumption that a customer who searched for “Terry Pratchett” is already aware of these books.
  667. business
    the principal activity in one's life to earn money
    We don’t claim that the Drivetrain Approach is the best or only method; our goal is to start a dialog within the data science and business communities to advance our collective vision.
  668. New York City
    the largest city in New York State and in the United States
    He went into Strand bookstore in New York City and asked for a book similar to Toni Morrison’s “Beloved.”
  669. load
    weight to be borne or conveyed
    Google realized that the objective was to show the most relevant search result; for other companies, it might be increasing profit, improving the customer experience, finding the best path for a robot, or balancing the load in a data center.
  670. completely
    with everything necessary
    Someone using Google’s self-driving car is completely unaware of the hundreds (if not thousands) of models and the petabytes of data that make it work.
  671. analysis
    abstract separation of something into its various parts
    Jeannie Stamberger of Carnegie Mellon University Silicon Valley explained to us many of the possible applications of predictive algorithms to disaster response, from text-mining and sentiment analysis of tweets to determine the extent of the damage, to swarms of autonomous robots for reconnaissance and rescue, to logistic optimization tools that help multiple jurisdictions coordinate their responses.
  672. physical
    involving the body as distinguished from the mind or spirit
    The data is in the wing materials’ physical properties; costs are listed in another tab of the application.
  673. shared
    have in common; held or experienced in common
    We don’t know what design approaches will be developed in the future, but right now, there is a need for the data science community to coalesce around a shared vocabulary and product design process that can be used to educate others on how to derive value from their predictive models.
  674. used
    previously owned by another
    The third step was to consider what new data they would need to produce such a ranking; they realized that the implicit information regarding which pages linked to which other pages could be used for this purpose.
  675. induced
    brought about or caused; not spontaneous
    There is a Modeler for aerodynamics and mechanical structure that can then be fed to a Simulator to produce the Key Wing Outputs of cost, weight, lift coefficient and induced drag.
  676. medium
    the surrounding environment
    © 2012, O'Reilly Media, Inc.
  677. bet
    stake on the outcome of an issue
    There are many techniques to avoid this problem, some based on statistics and spreading our bets widely, and others based on systems seen in nature, like biological evolution or the cooling of atoms in glass.
  678. condition
    a mode of being or form of existence of a person or thing
    They also considered inputs outside of their control, like competitors’ strategies, macroeconomic conditions, natural disasters, and customer “stickiness.”
  679. bigger
    large or big relative to something else
    But why not think bigger?
  680. dealing
    method or manner of conduct in relation to others
    Irfan Ahmed of CloudPhysics provides a good taxonomy of predictive modeling that describes this entire assembly line process:

    “When dealing with hundreds or thousands of individual components models to understand the behavior of the full-system, a ‘search’ has to be done.
  681. look for
    try to locate or discover, or try to establish the existence of
    What is most important about these examples is that the engineers who designed these data products didn’t start by building a neato robot and then looking for something to do with it.
  682. get it
    understand, usually after some initial difficulty
    Amazon’s recommendation engine is probably the best one out there, but it’s easy to get it to show its warts.
  683. lead
    take somebody somewhere
    On Amazon, the top results for a similar query lead to another book by Toni Morrison and several books by well-known female authors of color.
  684. rejected
    rebuffed (by a lover) without warning
    It will be low in cases where the algorithm recommends a familiar book that the customer has already rejected (both components are small) or a book that he or she would have bought even without the recommendation (both components are large and cancel each other out).
  685. selection
    the act of choosing
    Only then does the customer get to browse a recommended selection of Zafu’s inventory.
  686. temperature
    the degree of hotness or coldness of a body or environment
    Nest is designing smart thermostats that learn the home-owner’s temperature preferences and then optimize their energy consumption.
  687. clearly
    without doubt or question
    The final curve has a clearly identifiable local maximum that represents the best price to charge a customer for the first year.
  688. etc.
    continuing in the same way
    There may be one detailed model for mechanical systems, a separate model for thermal systems, and yet another for electrical systems, etc.
  689. tie
    fasten or secure with a rope, string, or cord
    Note here the different levels: models of individual components, tied together in a simulation given a set of inputs, iterated through over different input sets in a search optimizer.”
  690. go into
    to come or go into
    He went into Strand bookstore in New York City and asked for a book similar to Toni Morrison’s “Beloved.”
  691. give
    transfer possession of something concrete or abstract
    The first component of ODG’s Modeler was a model of price elasticity (the probability that a customer will accept a given price) for new policies and for renewals.
  692. community
    a group of people living in a particular local area
    We don’t claim that the Drivetrain Approach is the best or only method; our goal is to start a dialog within the data science and business communities to advance our collective vision.
  693. given
    acknowledged as a supposition
    The first component of ODG’s Modeler was a model of price elasticity (the probability that a customer will accept a given price) for new policies and for renewals.
  694. each
    separately for every person or thing
    Insurers have centuries of experience in prediction, but as recently as 10 years ago, the insurance companies often failed to make optimal business decisions about what price to charge each new customer.
  695. trying
    hard to endure
    They began by defining the objective that the insurance company was trying to achieve: setting a price that maximizes the net-present value of the profit from a new customer over a multi-year time horizon, subject to certain constraints such as maintaining market share.
  696. balance
    harmonious arrangement or relation of parts within a whole
    Google realized that the objective was to show the most relevant search result; for other companies, it might be increasing profit, improving the customer experience, finding the best path for a robot, or balancing the load in a data center.
  697. taking
    the act of someone who picks up or takes something
    Many optimization procedures are iterative; they can be thought of as taking a small step, checking our elevation and then taking another small uphill step until we reach a point from which there is no direction in which we can climb any higher.
  698. have
    possess, either in a concrete or an abstract sense
    To jump-start this process, we suggest a four-step approach that has already transformed the insurance industry.
  699. beginning
    the act of starting something
    Then, Google came along and transformed online search by beginning with a simple question: What is the user’s main objective in typing in a search query?
  700. thinking
    endowed with the capacity to reason
    Only after these first three steps do we begin thinking about building the predictive models.
  701. heat
    a form of energy transferred by a difference in temperature
    For example, resistance in the electrical system produces heat, which needs to be included as an input for the thermal diffusion and cooling model.
  702. finding
    something that is discovered
    While their models were good at finding relevant websites, the answer the user was most interested in was often buried on page 100 of the search results.
  703. smart
    characterized by quickness and ease in learning
    Nest is designing smart thermostats that learn the home-owner’s temperature preferences and then optimize their energy consumption.
  704. San Francisco
    a port in western California near the Golden Gate that is one of the major industrial and transportation centers; it has one of the world's finest harbors; site of the Golden Gate Bridge
    Suppose we wanted to get from San Francisco to the Strata 2012 Conference in Santa Clara.
  705. vital
    performing an essential function in the living body
    As predictive modeling and optimization become more vital to a wide variety of activities, look out for the engineers to disrupt industries that wouldn’t immediately appear to be in the data business.
  706. helping
    an individual quantity of food or drink taken as part of a meal
    What choice are we actually helping him or her make?
  707. mike
    a device for changing sound waves into electrical energy
    By Jeremy Howard, Margit Zwemer and Mike Loukides

    In the past few years, we’ve seen many data products based on predictive modeling.
  708. fashion
    the latest and most admired style in clothes or behavior
    Online fashion retailer Zafu shows how to encourage the customer to participate in this collection process.
  709. level
    a relative position or degree of value in a graded group
    Because the simulation is at a per-policy level, the insurer can view the impact of a given set of price changes on revenue, market share, and other metrics over time.
  710. network
    an open fabric woven together at regular intervals
    Industrial engineers were among the first to begin using neural networks, applying them to problems like the optimal design of assembly lines and quality control.
  711. outside
    the region that is outside of something
    They also considered inputs outside of their control, like competitors’ strategies, macroeconomic conditions, natural disasters, and customer “stickiness.”
  712. nest
    a structure in which animals lay eggs or give birth to their young
    Nest is designing smart thermostats that learn the home-owner’s temperature preferences and then optimize their energy consumption.
  713. 100
    ten 10s
    While their models were good at finding relevant websites, the answer the user was most interested in was often buried on page 100 of the search results.
  714. get
    come into the possession of something concrete or abstract
    I think of this as a complicated machine (full-system) where the curtain is withdrawn and you get to model each significant part of the machine under controlled experiments and then simulate the interactions.
  715. looking for
    the act of searching visually
    What is most important about these examples is that the engineers who designed these data products didn’t start by building a neato robot and then looking for something to do with it.
  716. widely
    to a great degree
    There are many techniques to avoid this problem, some based on statistics and spreading our bets widely, and others based on systems seen in nature, like biological evolution or the cooling of atoms in glass.
  717. limit
    as far as something can go
    We could just build a simple model of distance / speed-limit to predict arrival time with little more than a ruler and a road map.
  718. come up
    move upward
    A great image for optimization in the real world comes up in a recent TechZing podcast with the co-founders of data-mining competition platform Kaggle.
  719. damage
    the occurrence of a change for the worse
    Jeannie Stamberger of Carnegie Mellon University Silicon Valley explained to us many of the possible applications of predictive algorithms to disaster response, from text-mining and sentiment analysis of tweets to determine the extent of the damage, to swarms of autonomous robots for reconnaissance and rescue, to logistic optimization tools that help multiple jurisdictions coordinate their responses.
  720. bottom
    the lower side of anything
    From there, they developed an optimized pricing process that added hundreds of millions of dollars to the insurers’ bottom lines.
  721. stuck
    caught or fixed
    The danger in this hill-climbing approach is that if the steps are too small, we may get stuck at one of the many local maxima in the foothills, which will not tell us the best set of controllable inputs.
  722. discipline
    a system of rules of conduct or method of practice
    Although it’s from a completely different engineering discipline, this diagram is very similar to the Drivetrain Approach we’ve recommended for data products.
  723. wire
    ligament made of metal and used to fasten things or make cages or fences etc
    As one engineer on the Google self-driving car project put it in a recent Wired article, “We’re analyzing and predicting the world 20 times a second.”
  724. horizon
    the line at which the sky and Earth appear to meet
    They began by defining the objective that the insurance company was trying to achieve: setting a price that maximizes the net-present value of the profit from a new customer over a multi-year time horizon, subject to certain constraints such as maintaining market share.
  725. box
    a (usually rectangular) container; may have a lid
    The wing box includes the design levers like span, taper ratio and sweep.
  726. wheel
    a simple machine consisting of a circular frame with spokes (or a solid disc) that can rotate on a shaft or axle (as in vehicles or other machines)
    The levers are the vehicle controls we are all familiar with: steering wheel, accelerator, brakes, etc.
  727. real
    being or occurring in fact or actuality
    While the insurers were reluctant to conduct these experiments on real customers, as they’d certainly lose some customers as a result, they were swayed by the huge gains that optimized policy pricing might deliver.
  728. Page
    English industrialist who pioneered in the design and manufacture of aircraft (1885-1962)
    Step 4 of the Drivetrain Approach for Google is now part of tech history: Larry Page and Sergey Brin invented the graph traversal algorithm PageRank and built an engine on top of it that revolutionized search.
  729. but then
    (contrastive) from another point of view
    The operator can adjust the input levers to answer specific questions like, “What will happen if our company offers the customer a low teaser price in year one but then raises the premiums in year two?”
  730. phrase
    an expression consisting of one or more words
    The inspiration for the phrase “Drivetrain Approach,” for example, is already on the streets of Mountain View.
  731. rescue
    free from harm or evil
    Jeannie Stamberger of Carnegie Mellon University Silicon Valley explained to us many of the possible applications of predictive algorithms to disaster response, from text-mining and sentiment analysis of tweets to determine the extent of the damage, to swarms of autonomous robots for reconnaissance and rescue, to logistic optimization tools that help multiple jurisdictions coordinate their responses.
  732. scarce
    deficient in quantity or number compared with the demand
    Making the wrong choices comes at a cost to the retailer in the form of reduced margins (discounts that do not drive extra sales), opportunity costs for the scarce real-estate on their homepage (taking up space in the recommendation feed with products the customer doesn’t like or would have bought without a recommendation) or the customer tuning out (sending so many unhelpful email promotions that the customer filters all future communications as spam).
  733. gather
    assemble or get together
    Next, we consider what data the car needs to collect; it needs sensors that gather data about the road as well as cameras that can detect road signs, red or green lights, and unexpected obstacles (including pedestrians).
  734. changing
    marked by continuous modification or effective action
    It was necessary to build this dataset by randomly changing the prices of hundreds of thousands of policies over many months.
  735. tendency
    an inclination to do something
    The objective is to escape a recommendation filter bubble, a term which was originally coined by Eli Pariser to describe the tendency of personalized news feeds to only display articles that are blandly popular or further confirm the readers’ existing biases.
  736. conduct
    the way a person behaves toward other people
    While the insurers were reluctant to conduct these experiments on real customers, as they’d certainly lose some customers as a result, they were swayed by the huge gains that optimized policy pricing might deliver.
  737. fit
    meeting adequate standards for a purpose
    Zafu’s approach is not to send their customers directly to the clothes, but to begin by asking a series of simple questions about the customers’ body type, how well their other jeans fit, and their fashion preferences.
  738. revenue
    the entire amount of income before any deductions are made
    Because the simulation is at a per-policy level, the insurer can view the impact of a given set of price changes on revenue, market share, and other metrics over time.
  739. flood
    the rising of a body of water and its overflowing onto land
    What if a 100-year flood hits his home?
  740. article
    one of a class of artifacts
    The objective is to escape a recommendation filter bubble, a term which was originally coined by Eli Pariser to describe the tendency of personalized news feeds to only display articles that are blandly popular or further confirm the readers’ existing biases.
  741. clothing
    a covering designed to be worn on a person's body
    Plenty of websites sell designer denim, but for many women, high-end jeans are the one item of clothing they never buy online because it’s hard to find the right pair without trying them on.
  742. particularly
    to a distinctly greater extent or degree than is common
    What is particularly interesting is that there was no need to build an elaborate new data collection system.
  743. valley
    a long depression in the surface of the land
    In another area where objective-based data products have the power to change lives, the CMU extension in Silicon Valley has an active project for building data products to help first responders after natural or man-made disasters.
  744. perform
    get done
    For motor vehicle traffic, IBM performed a project with the city of Stockholm to optimize traffic flows that reduced congestion by nearly a quarter, and increased the air quality in the inner city by 25%.
  745. hit
    deal a blow to, either with the hand or with an instrument
    What if a 100-year flood hits his home?
  746. originally
    with reference to the origin or beginning
    The objective is to escape a recommendation filter bubble, a term which was originally coined by Eli Pariser to describe the tendency of personalized news feeds to only display articles that are blandly popular or further confirm the readers’ existing biases.
  747. allow
    make it possible for something to happen
    These models predicted whether customers would renew their policies in one year, allowing for changes in price and willingness to jump to a competitor.
  748. individual
    being or characteristic of a single thing or person
    Irfan Ahmed of CloudPhysics provides a good taxonomy of predictive modeling that describes this entire assembly line process:

    “When dealing with hundreds or thousands of individual components models to understand the behavior of the full-system, a ‘search’ has to be done.
  749. feature
    a prominent attribute or aspect of something
    This encompasses all the interactions that a retailer has with its customers outside of the actual buy-sell transaction, whether making a product recommendation, encouraging the customer to check out a new feature of the online store, or sending sales promotions.
  750. interesting
    catching or holding your attention
    Prediction technology can be interesting and mathematically elegant, but we need to take the next step.
  751. competition
    the act of contending with others for rewards or resources
    A great image for optimization in the real world comes up in a recent TechZing podcast with the co-founders of data-mining competition platform Kaggle.
  752. action
    something done (usually as opposed to something said)
    But these products are still just making predictions, rather than asking what action they want someone to take as a result of a prediction.
  753. inside
    relating to or being on the side closer to the center or within a defined space
    A look inside the Modeler.
  754. column
    a line of units following one after another
    A company like Amazon represents every purchase that has ever been made as a giant sparse matrix, with customers as the rows and products as the columns.
  755. inspired
    of surpassing excellence
    We call it the Drivetrain Approach, inspired by the emerging field of self-driving vehicles.
  756. as well
    in addition
    Zafu can tailor their recommendations to fit as well as their jeans because their system is asking the right questions.
  757. another
    an additional or different one
    Many optimization procedures are iterative; they can be thought of as taking a small step, checking our elevation and then taking another small uphill step until we reach a point from which there is no direction in which we can climb any higher.
  758. existing
    having being or actuality
    The objective is to escape a recommendation filter bubble, a term which was originally coined by Eli Pariser to describe the tendency of personalized news feeds to only display articles that are blandly popular or further confirm the readers’ existing biases.
  759. try
    make an effort or attempt
    They began by defining the objective that the insurance company was trying to achieve: setting a price that maximizes the net-present value of the profit from a new customer over a multi-year time horizon, subject to certain constraints such as maintaining market share.
  760. significant
    rich in implication
    I think of this as a complicated machine (full-system) where the curtain is withdrawn and you get to model each significant part of the machine under controlled experiments and then simulate the interactions.
  761. host
    a person who invites guests to a social event
    One of the authors of this paper was explaining an iterative optimization technique, and the host says, “So, in a sense Jeremy, your approach was like that of doing a startup, which is just get something out there and iterate and iterate and iterate.”
  762. platform
    a raised horizontal surface
    A great image for optimization in the real world comes up in a recent TechZing podcast with the co-founders of data-mining competition platform Kaggle.
  763. contain
    hold or have within
    One way to escape the recommendation bubble would be to build a Modeler containing two models for purchase probabilities, conditional on seeing or not seeing a recommendation.
  764. map
    a diagrammatic representation of the earth's surface
    We could just build a simple model of distance / speed-limit to predict arrival time with little more than a ruler and a road map.
  765. instead
    in place of, or as an alternative to
    Instead, let’s design an improved recommendation engine using the Drivetrain Approach, starting by reconsidering our objective.
  766. crowd
    a large number of things or people considered together
    Models developed to simulate fluid dynamics and turbulence have been applied to improving traffic and pedestrian flows by using the placement of exits and crowd control barriers as levers.
  767. extra
    more than is needed, desired, or required
    Making the wrong choices comes at a cost to the retailer in the form of reduced margins (discounts that do not drive extra sales), opportunity costs for the scarce real-estate on their homepage (taking up space in the recommendation feed with products the customer doesn’t like or would have bought without a recommendation) or the customer tuning out (sending so many unhelpful email promotions that the customer filters all future communications as spam).
  768. effect
    a phenomenon that is caused by some previous phenomenon
    We will show how to go about building an optimized marketing strategy that mitigates these effects.
  769. lift
    raise from a lower to a higher position
    There is a Modeler for aerodynamics and mechanical structure that can then be fed to a Simulator to produce the Key Wing Outputs of cost, weight, lift coefficient and induced drag.
  770. hundred
    ten 10s
    Someone using Google’s self-driving car is completely unaware of the hundreds (if not thousands) of models and the petabytes of data that make it work.
  771. local
    of or belonging to or characteristic of a particular area
    The final curve has a clearly identifiable local maximum that represents the best price to charge a customer for the first year.
  772. city
    a large and densely populated urban area
    He went into Strand bookstore in New York City and asked for a book similar to Toni Morrison’s “Beloved.”
  773. quite a
    of an unusually noticeable or exceptional or remarkable kind
    Quite a few.
  774. critical
    of a serious examination and judgment of something
    All of these systems have critical interactions.
  775. detail
    a small part considered separately from the whole
    There may be one detailed model for mechanical systems, a separate model for thermal systems, and yet another for electrical systems, etc.
  776. element
    a substance that cannot be separated into simpler substances
    Data science is beginning to pervade even the most bricks-and-mortar elements of our lives.
  777. just
    and nothing more
    But these products are still just making predictions, rather than asking what action they want someone to take as a result of a prediction.
  778. in general
    without distinction of one from others
    In general, when choosing an objective function to optimize, we need less emphasis on the “function” and more on the “objective.”
  779. escape
    run away from confinement
    The objective is to escape a recommendation filter bubble, a term which was originally coined by Eli Pariser to describe the tendency of personalized news feeds to only display articles that are blandly popular or further confirm the readers’ existing biases.
  780. effective
    producing or capable of producing an intended result
    These outcomes can be fed to an Optimizer to build a functioning and cost-effective airplane wing.
  781. inner
    located inward
    For motor vehicle traffic, IBM performed a project with the city of Stockholm to optimize traffic flows that reduced congestion by nearly a quarter, and increased the air quality in the inner city by 25%.
  782. top
    the upper part of anything
    Step 4 of the Drivetrain Approach for Google is now part of tech history: Larry Page and Sergey Brin invented the graph traversal algorithm PageRank and built an engine on top of it that revolutionized search.
  783. let
    actively cause something to happen
    The next machine on the assembly line is a Simulator, which lets ODG ask the “what if” questions to see how the levers affect the distribution of the final outcome.
  784. resistance
    any mechanical force that tends to slow or oppose motion
    For example, resistance in the electrical system produces heat, which needs to be included as an input for the thermal diffusion and cooling model.
  785. sign
    a visible clue that something has happened or is present
    Next, we consider what data the car needs to collect; it needs sensors that gather data about the road as well as cameras that can detect road signs, red or green lights, and unexpected obstacles (including pedestrians).
  786. video
    broadcasting visual images of stationary or moving objects
    Full video from that session is embedded below:
  787. offer
    present for acceptance or rejection
    The operator can adjust the input levers to answer specific questions like, “What will happen if our company offers the customer a low teaser price in year one but then raises the premiums in year two?”
  788. Mark
    Apostle and companion of Saint Peter
    What we would really like to do is emulate the experience of Mark Johnson, CEO of Zite, who gave a perfect example of what a customer’s recommendation experience should be like in a recent TOC talk.
  789. more
    greater in size or amount or extent or degree
    These products range from weather forecasting to recommendation engines to services that predict airline flight times more accurately than the airline itself.
  790. studied
    produced or marked by conscious design or premeditation
    Sidebar: Optimization in the real world

    Optimization is a classic problem that has been studied by Newton and Gauss all the way up to mathematicians and engineers in the present day.
  791. find out
    find out, learn, or determine with certainty, usually by making an inquiry or other effort
    Once we have these models, we construct a Simulator and an Optimizer and run them over the combined models to find out what recommendations will achieve our objectives: driving sales and improving the customer experience.
  792. sentiment
    a personal belief or judgment
    Jeannie Stamberger of Carnegie Mellon University Silicon Valley explained to us many of the possible applications of predictive algorithms to disaster response, from text-mining and sentiment analysis of tweets to determine the extent of the damage, to swarms of autonomous robots for reconnaissance and rescue, to logistic optimization tools that help multiple jurisdictions coordinate their responses.
  793. dollar
    the basic monetary unit in many countries
    From there, they developed an optimized pricing process that added hundreds of millions of dollars to the insurers’ bottom lines.
  794. industrial
    of or relating to commercial enterprise
    Industrial engineers were among the first to begin using neural networks, applying them to problems like the optimal design of assembly lines and quality control.
  795. enter
    to come or go into
    We are entering the era of data as drivetrain, where we use data not just to generate more data (in the form of predictions), but use data to produce actionable outcomes.
  796. entering
    the act of entering
    We are entering the era of data as drivetrain, where we use data not just to generate more data (in the form of predictions), but use data to produce actionable outcomes.
  797. see
    perceive by sight or have the power to perceive by sight
    In the past few years, we’ve seen many data products based on predictive modeling.
  798. collected
    brought together in one place
    New data must also be collected to generate recommendations that will cause new sales.
  799. property
    something owned
    The data is in the wing materials’ physical properties; costs are listed in another tab of the application.
  800. hill
    a local and well-defined elevation of the land
    The danger in this hill-climbing approach is that if the steps are too small, we may get stuck at one of the many local maxima in the foothills, which will not tell us the best set of controllable inputs.
  801. structure
    a complex entity made of many parts
    There is a Modeler for aerodynamics and mechanical structure that can then be fed to a Simulator to produce the Key Wing Outputs of cost, weight, lift coefficient and induced drag.
  802. maintain
    keep in a certain state, position, or activity
    They began by defining the objective that the insurance company was trying to achieve: setting a price that maximizes the net-present value of the profit from a new customer over a multi-year time horizon, subject to certain constraints such as maintaining market share.
  803. necessary
    absolutely essential
    It was necessary to build this dataset by randomly changing the prices of hundreds of thousands of policies over many months.
  804. offering
    something put forward for acceptance
    We could add a price elasticity model to test how offering a discount might change the probability that the customer will buy the item.
  805. email
    (computer science) a system of world-wide electronic communication in which a computer user can compose a message at one terminal that can be regenerated at the recipient's terminal when the recipient logs in
    Making the wrong choices comes at a cost to the retailer in the form of reduced margins (discounts that do not drive extra sales), opportunity costs for the scarce real-estate on their homepage (taking up space in the recommendation feed with products the customer doesn’t like or would have bought without a recommendation) or the customer tuning out (sending so many unhelpful email promotions that the customer filters all future communications as spam).
  806. net
    an open fabric of string or rope or wire woven together
    They began by defining the objective that the insurance company was trying to achieve: setting a price that maximizes the net-present value of the profit from a new customer over a multi-year time horizon, subject to certain constraints such as maintaining market share.
  807. live in
    live in the house where one works
    As scientists and engineers become more adept at applying prediction and optimization to everyday problems, they are expanding the art of the possible, optimizing everything from our personal health to the houses and cities we live in.
  808. possibility
    capability of existing or happening or being true
    The self-driving car needs to take the next step: after simulating all the possibilities, it must optimize the results of the simulation to pick the best combination of acceleration and braking, steering and signaling, to get us safely to Santa Clara.
  809. thousand
    the cardinal number that is the product of 10 and 100
    Someone using Google’s self-driving car is completely unaware of the hundreds (if not thousands) of models and the petabytes of data that make it work.
  810. novel
    an extended fictional work in prose
    For example, a pair of jeans that is often paired with a particular top, or the first part of a series of novels that often leads to a sale of the whole set.
  811. increasing
    becoming greater or larger
    Google realized that the objective was to show the most relevant search result; for other companies, it might be increasing profit, improving the customer experience, finding the best path for a robot, or balancing the load in a data center.
  812. think of
    devise or invent
    I think of this as a complicated machine (full-system) where the curtain is withdrawn and you get to model each significant part of the machine under controlled experiments and then simulate the interactions.
  813. part
    one of the portions into which something is regarded as divided and which together constitute a whole
    Great predictive modeling is an important part of the solution, but it no longer stands on its own; as products become more sophisticated, it disappears into the plumbing.
  814. teach
    impart skills or knowledge to
    In the future, we hope to see optimization taught in business schools as well as in statistics departments.
  815. point
    a distinguishing or individuating characteristic
    Engineers start by defining a clear objective: They want a car to drive safely from point A to point B without human intervention.
  816. patience
    good-natured tolerance of delay or incompetence
    We could construct a patience model for the customers’ tolerance for poorly targeted communications: When do they tune them out and filter our messages straight to spam?
  817. elements
    violent or severe weather
    Data science is beginning to pervade even the most bricks-and-mortar elements of our lives.
  818. answer
    a statement made to reply to a question or criticism
    While their models were good at finding relevant websites, the answer the user was most interested in was often buried on page 100 of the search results.
  819. pull
    apply force so as to cause motion towards the source of the motion
    Once we have specified the goal, the second step is to specify what inputs of the system we can control, the levers we can pull to influence the final outcome.
  820. dawn
    the first light of day
    This is still the dawn of data science.
  821. considered
    carefully weighed
    They also considered inputs outside of their control, like competitors’ strategies, macroeconomic conditions, natural disasters, and customer “stickiness.”
  822. interest
    a sense of concern with and curiosity about something
    Prediction technology can be interesting and mathematically elegant, but we need to take the next step.
  823. tiny
    very small
    The takeaway, whether you are a tiny startup or a giant insurance company, is that we unconsciously use optimization whenever we decide how to get to where we want to go.
  824. danger
    the condition of being susceptible to harm or injury
    The danger in this hill-climbing approach is that if the steps are too small, we may get stuck at one of the many local maxima in the foothills, which will not tell us the best set of controllable inputs.
  825. second
    coming next after the first in position in space or time
    Once we have specified the goal, the second step is to specify what inputs of the system we can control, the levers we can pull to influence the final outcome.
  826. case
    an occurrence of something
    In Google’s case, they could control the ranking of the search results.
  827. activity
    any specific behavior
    As predictive modeling and optimization become more vital to a wide variety of activities, look out for the engineers to disrupt industries that wouldn’t immediately appear to be in the data business.
  828. appear
    come into sight or view
    As predictive modeling and optimization become more vital to a wide variety of activities, look out for the engineers to disrupt industries that wouldn’t immediately appear to be in the data business.
  829. then
    at that time
    Then, Google came along and transformed online search by beginning with a simple question: What is the user’s main objective in typing in a search query?
  830. tied
    bound or secured closely
    Note here the different levels: models of individual components, tied together in a simulation given a set of inputs, iterated through over different input sets in a search optimizer.”
  831. shape
    a perceptual structure
    They can also explore how the distribution of profit is shaped by the inputs outside of the insurer’s control: “What if the economy crashes and the customer loses his job?
  832. information
    knowledge acquired through study or experience
    The third step was to consider what new data they would need to produce such a ranking; they realized that the implicit information regarding which pages linked to which other pages could be used for this purpose.
  833. available
    obtainable or accessible and ready for use or service
    Our objective and available levers, what data we already have and what additional data we will need to collect, determine the models we can build.
  834. over
    beyond the top or upper surface or edge
    Optimizing for an actionable outcome over the right predictive models can be a company’s most important strategic decision.
  835. also
    in addition
    They also considered inputs outside of their control, like competitors’ strategies, macroeconomic conditions, natural disasters, and customer “stickiness.”
  836. included
    enclosed in the same envelope or package
    For example, resistance in the electrical system produces heat, which needs to be included as an input for the thermal diffusion and cooling model.
  837. annual
    occurring every year
    These additional models allow the annual models to be combined to predict profit from a new customer over the next five years.
  838. generation
    group of genetically related organisms in a line of descent
    We introduced the Drivetrain Approach to provide a framework for designing the next generation of great data products and described how it relies at its heart on optimization.
  839. think
    judge or regard; look upon; judge
    Only after these first three steps do we begin thinking about building the predictive models.
  840. key
    metal device that allows a lock's mechanism to be rotated
    A purchase sequence causality model can be used to identify key “entry products.”
  841. row
    an arrangement of objects or people side by side in a line
    A company like Amazon represents every purchase that has ever been made as a giant sparse matrix, with customers as the rows and products as the columns.
  842. seeing
    having vision, not blind
    One way to escape the recommendation bubble would be to build a Modeler containing two models for purchase probabilities, conditional on seeing or not seeing a recommendation.
  843. history
    a record or narrative description of past events
    Step 4 of the Drivetrain Approach for Google is now part of tech history: Larry Page and Sergey Brin invented the graph traversal algorithm PageRank and built an engine on top of it that revolutionized search.
  844. well
    in a good or satisfactory manner or to a high standard
    There are many different optimization techniques to choose from (see sidebar, below), but it is a well-understood field with robust and accessible solutions.
  845. turn
    move around an axis or a center
    Instead of the femme-bot voice of the GPS unit telling us which route to take and where to turn, what would it take to build a car that would make those decisions by itself?
  846. largely
    mainly or chiefly
    Brian Ripley’s seminal book on pattern recognition gives credit for many ideas and techniques to largely forgotten engineering papers from the 1970s.
  847. study
    applying the mind to learning and understanding a subject
    But those models did not solve the pricing problem, so the insurance companies would set a price based on a combination of guesswork and market studies.
  848. high
    being at or having a relatively great or specific elevation
    This curve moves from almost certain acceptance at very low prices to almost never at high prices.
  849. cloud
    a visible mass of water or ice particles suspended at a considerable altitude
    It is easy to stumble into the trap of thinking that, since data exists somewhere abstract, on a spreadsheet or in the cloud, data products are just abstract algorithms.
  850. easy
    posing no difficulty; requiring little effort
    Amazon’s recommendation engine is probably the best one out there, but it’s easy to get it to show its warts.
  851. figure
    alternate name for the body of a human being
    Multiplying these two curves creates a final curve that shows price versus expected profit (see Expected Profit figure, below).
  852. latest
    up to the immediate present; most recent or most up-to-date
    Here is a screenshot of the “Customers Who Bought This Item Also Bought” feed on Amazon from a search for the latest book in Terry Pratchett’s “Discworld series”: All of the recommendations are for other books in the same series, but it’s a good assumption that a customer who searched for “Terry Pratchett” is already aware of these books.
  853. sport
    active diversion requiring physical exertion and competition
    This has improved emergency evacuation procedures for subway stations and reduced the danger of crowd stampedes and trampling during sporting events.
  854. text
    the words of something written
    Jeannie Stamberger of Carnegie Mellon University Silicon Valley explained to us many of the possible applications of predictive algorithms to disaster response, from text-mining and sentiment analysis of tweets to determine the extent of the damage, to swarms of autonomous robots for reconnaissance and rescue, to logistic optimization tools that help multiple jurisdictions coordinate their responses.
  855. variety
    a category of things distinguished by a common quality
    As predictive modeling and optimization become more vital to a wide variety of activities, look out for the engineers to disrupt industries that wouldn’t immediately appear to be in the data business.
  856. somewhere
    in or at or to some place
    It is easy to stumble into the trap of thinking that, since data exists somewhere abstract, on a spreadsheet or in the cloud, data products are just abstract algorithms.
  857. energy
    forceful exertion
    Nest is designing smart thermostats that learn the home-owner’s temperature preferences and then optimize their energy consumption.
  858. gas
    state of matter distinguished from solid and liquid states
    These days, it is trivial to use some type of heuristic search algorithm to predict the drive times along various routes (a Simulator) and then pick the shortest one (an Optimizer) subject to constraints like avoiding bridge tolls or maximizing gas mileage.
  859. image
    a visual representation produced on a surface
    A great image for optimization in the real world comes up in a recent TechZing podcast with the co-founders of data-mining competition platform Kaggle.
  860. section
    one of several parts or pieces that fit with others
    By Jeremy Howard , Margit Zwemer and Mike Loukides

    Sections

    Download this free report

    In the past few years, we’ve seen many data products based on predictive modeling.
  861. red
    the chromatic color resembling the hue of blood
    The profit for a very low price will be in the red by the value of expected claims in the first year, plus any overhead for acquiring and servicing the new customer.
  862. setting
    the physical position of something
    They began by defining the objective that the insurance company was trying to achieve: setting a price that maximizes the net-present value of the profit from a new customer over a multi-year time horizon, subject to certain constraints such as maintaining market share.
  863. actual
    existing in fact
    This encompasses all the interactions that a retailer has with its customers outside of the actual buy-sell transaction, whether making a product recommendation, encouraging the customer to check out a new feature of the online store, or sending sales promotions.
  864. fill
    make full, also in a metaphorical sense
    Once they have the data in this format, data scientists apply some form of collaborative filtering to “fill in the matrix.”
  865. making
    the act that results in something coming to be
    But these products are still just making predictions, rather than asking what action they want someone to take as a result of a prediction.
  866. returning
    tending to be turned back
    She cut through the chaff of the obvious to make a recommendation that will send the customer home with a new book, and returning to Strand again and again in the future.
  867. vision
    the ability to see
    We don’t claim that the Drivetrain Approach is the best or only method; our goal is to start a dialog within the data science and business communities to advance our collective vision.
  868. field
    extensive tract of level open land
    We call it the Drivetrain Approach, inspired by the emerging field of self-driving vehicles.
  869. fail
    be unable
    Insurers have centuries of experience in prediction, but as recently as 10 years ago, the insurance companies often failed to make optimal business decisions about what price to charge each new customer.
  870. revolution
    a single complete turn
    Join the Data Revolution.
  871. but
    and nothing more
    But these products are still just making predictions, rather than asking what action they want someone to take as a result of a prediction.
  872. small
    limited or below average in number or quantity or magnitude
    Many optimization procedures are iterative; they can be thought of as taking a small step, checking our elevation and then taking another small uphill step until we reach a point from which there is no direction in which we can climb any higher.
  873. brilliant
    full of light; shining intensely
    The Strand bookseller made a brilliant but far-fetched recommendation probably based more on the character of Morrison’s writing than superficial similarities between Morrison and other authors.
  874. if not
    perhaps
    Someone using Google’s self-driving car is completely unaware of the hundreds (if not thousands) of models and the petabytes of data that make it work.
  875. separate
    standing apart; not attached to or supported by anything
    There may be one detailed model for mechanical systems, a separate model for thermal systems, and yet another for electrical systems, etc.
  876. connection
    a relation between things or events
    This is not to say that Amazon’s recommendation engine could not have made the same connection; the problem is that this helpful recommendation will be buried far down in the recommendation feed, beneath books that have more obvious similarities to “Beloved.”
  877. form
    a perceptual structure
    We are entering the era of data as drivetrain, where we use data not just to generate more data (in the form of predictions), but use data to produce actionable outcomes.
  878. paper
    a material made of cellulose pulp derived mainly from wood or rags or certain grasses
    One of the authors of this paper was explaining an iterative optimization technique, and the host says, “So, in a sense Jeremy, your approach was like that of doing a startup, which is just get something out there and iterate and iterate and iterate.”
  879. become
    come into existence
    Great predictive modeling is an important part of the solution, but it no longer stands on its own; as products become more sophisticated, it disappears into the plumbing.
  880. mine
    excavation from which ores and minerals are extracted
    A great image for optimization in the real world comes up in a recent TechZing podcast with the co-founders of data-mining competition platform Kaggle.
  881. job
    a specific piece of work required to be done as a duty
    They can also explore how the distribution of profit is shaped by the inputs outside of the insurer’s control: “What if the economy crashes and the customer loses his job?
  882. spend
    pass time in a specific way
    ODG identified which levers the insurance company could control: what price to charge each customer, what types of accidents to cover, how much to spend on marketing and customer service, and how to react to their competitors’ pricing decisions.
  883. cool
    neither warm nor very cold; giving relief from heat
    There are plenty of cool challenges in building these models, but by themselves, they do not take us to our destination.
  884. conference
    a prearranged meeting for consultation or discussion
    Suppose we wanted to get from San Francisco to the Strata 2012 Conference in Santa Clara.
  885. slightly
    to a small degree or extent
    The Modeler takes the raw data and converts it into slightly more refined predicted data.
  886. go to
    be present at (meetings, church services, university), etc.
    There may be some unexpected recommendations on pages 2 through 14 of the feed, but how many customers are going to bother clicking through?
  887. Friday
    the sixth day of the week; the fifth working day
    (800) 889-8969 or (707) 827-7019 Monday-Friday 7:30am-5pm PT
  888. note
    a brief written record
    [Note: Co-author Jeremy Howard founded ODG.]
  889. ship
    a vessel that carries passengers or freight
    ODG’s competitors use different techniques to find an optimal price, but they are shipping the same over-all data product.
  890. active
    characterized by energetic movement
    In another area where objective-based data products have the power to change lives, the CMU extension in Silicon Valley has an active project for building data products to help first responders after natural or man-made disasters.
  891. Monday
    the second day of the week; the first working day
    (800) 889-8969 or (707) 827-7019 Monday-Friday 7:30am-5pm PT
  892. gain
    obtain
    While the insurers were reluctant to conduct these experiments on real customers, as they’d certainly lose some customers as a result, they were swayed by the huge gains that optimized policy pricing might deliver.
  893. practical
    guided by experience and observation rather than theory
    ODG approached this problem with an early use of the Drivetrain Approach and a practical take on step 4 that can be applied to a wide range of problems.
  894. same
    same in identity
    ODG’s competitors use different techniques to find an optimal price, but they are shipping the same over-all data product.
  895. store
    a mercantile establishment for the sale of goods or services
    This encompasses all the interactions that a retailer has with its customers outside of the actual buy-sell transaction, whether making a product recommendation, encouraging the customer to check out a new feature of the online store, or sending sales promotions.
  896. arrival
    the act of coming to a certain place
    We could just build a simple model of distance / speed-limit to predict arrival time with little more than a ruler and a road map.
  897. right
    free from error; especially conforming to fact or truth
    Optimizing for an actionable outcome over the right predictive models can be a company’s most important strategic decision.
  898. estate
    extensive landed property retained by the owner
    Making the wrong choices comes at a cost to the retailer in the form of reduced margins (discounts that do not drive extra sales), opportunity costs for the scarce real-estate on their homepage (taking up space in the recommendation feed with products the customer doesn’t like or would have bought without a recommendation) or the customer tuning out (sending so many unhelpful email promotions that the customer filters all future communications as spam).
  899. make it
    succeed in a big way; get to the top
    Someone using Google’s self-driving car is completely unaware of the hundreds (if not thousands) of models and the petabytes of data that make it work.
  900. flight
    an instance of traveling by air
    These products range from weather forecasting to recommendation engines to services that predict airline flight times more accurately than the airline itself.
  901. not
    negation of a word or group of words
    Someone using Google’s self-driving car is completely unaware of the hundreds (if not thousands) of models and the petabytes of data that make it work.
  902. other
    not the same one or ones already mentioned or implied
    Google realized that the objective was to show the most relevant search result; for other companies, it might be increasing profit, improving the customer experience, finding the best path for a robot, or balancing the load in a data center.
  903. farm
    workplace or land used for growing crops or raising animals
    These firms have plenty of experience building models of each of the components and systems in their final product, whether they’re building a server farm or a fighter jet.
  904. raise
    move upwards
    The operator can adjust the input levers to answer specific questions like, “What will happen if our company offers the customer a low teaser price in year one but then raises the premiums in year two?”
  905. help
    give assistance; be of service
    What choice are we actually helping him or her make?
  906. showing
    the display of a motion picture
    So, we would like to conclude by showing you how objective-based data products are already a part of the tangible world.
  907. color
    a visual attribute of things from the light they emit
    On Amazon, the top results for a similar query lead to another book by Toni Morrison and several books by well-known female authors of color.
  908. recently
    in the recent past
    Insurers have centuries of experience in prediction, but as recently as 10 years ago, the insurance companies often failed to make optimal business decisions about what price to charge each new customer.
  909. driven
    compelled forcibly by an outside agency
    Instead of being data driven, we can now let the data drive us.
  910. bridge
    structure allowing passage across a river or other obstacle
    These days, it is trivial to use some type of heuristic search algorithm to predict the drive times along various routes (a Simulator) and then pick the shortest one (an Optimizer) subject to constraints like avoiding bridge tolls or maximizing gas mileage.
  911. natural
    relating to or concerning the physical world
    They also considered inputs outside of their control, like competitors’ strategies, macroeconomic conditions, natural disasters, and customer “stickiness.”
  912. method
    a way of doing something, especially a systematic way
    We don’t claim that the Drivetrain Approach is the best or only method; our goal is to start a dialog within the data science and business communities to advance our collective vision.
  913. reader
    a person who can read; a literate person
    The objective is to escape a recommendation filter bubble, a term which was originally coined by Eli Pariser to describe the tendency of personalized news feeds to only display articles that are blandly popular or further confirm the readers’ existing biases.
  914. area
    the extent of a two-dimensional surface within a boundary
    In another area where objective-based data products have the power to change lives, the CMU extension in Silicon Valley has an active project for building data products to help first responders after natural or man-made disasters.
  915. join
    cause to become joined or linked
    Join the Data Revolution.
  916. Johnson
    36th President of the United States
    What we would really like to do is emulate the experience of Mark Johnson, CEO of Zite, who gave a perfect example of what a customer’s recommendation experience should be like in a recent TOC talk.
  917. interested
    showing curiosity or fascination or concern
    While their models were good at finding relevant websites, the answer the user was most interested in was often buried on page 100 of the search results.
  918. about
    (of quantities) imprecise but fairly close to correct
    Only after these first three steps do we begin thinking about building the predictive models.
  919. term
    a limited period of time during which something lasts
    The objective is to escape a recommendation filter bubble, a term which was originally coined by Eli Pariser to describe the tendency of personalized news feeds to only display articles that are blandly popular or further confirm the readers’ existing biases.
  920. extent
    the point or degree to which something extends
    Jeannie Stamberger of Carnegie Mellon University Silicon Valley explained to us many of the possible applications of predictive algorithms to disaster response, from text-mining and sentiment analysis of tweets to determine the extent of the damage, to swarms of autonomous robots for reconnaissance and rescue, to logistic optimization tools that help multiple jurisdictions coordinate their responses.
  921. there
    in or at that place
    From there, they developed an optimized pricing process that added hundreds of millions of dollars to the insurers’ bottom lines.
  922. delight
    a feeling of extreme pleasure or satisfaction
    The objective of a recommendation engine is to drive additional sales by surprising and delighting the customer with books he or she would not have purchased without the recommendation.
  923. event
    something that happens at a given place and time
    This has improved emergency evacuation procedures for subway stations and reduced the danger of crowd stampedes and trampling during sporting events.
  924. times
    a more or less definite period of time now or previously present
    These products range from weather forecasting to recommendation engines to services that predict airline flight times more accurately than the airline itself.
  925. probably
    with considerable certainty; without much doubt
    Amazon’s recommendation engine is probably the best one out there, but it’s easy to get it to show its warts.
  926. look
    perceive with attention; direct one's gaze towards
    Let’s look at how we could apply this process to another industry: marketing.
  927. task
    any piece of work that is undertaken or attempted
    They started with an objective like, “I want my car to drive me places,” and then designed a covert data product to accomplish that task.
Created on Thu Oct 25 19:15:24 EDT 2012
