#SparkAISummit keynote on Databricks' Next Generation Data Science Workspace. The demo’s a great forecasting example using Facebook's Prophet and Databricks’ Koalas: https://youtu.be/HsfMmBfQtvI
1. Introduction

1.1 Background

"UK restaurant market facing fastest decline in seven years": a headline from last year[i], before the coronavirus pandemic. MCA’s UK Restaurant Market Report 2019[ii] indicated that "large falls in the sales value and outlet volumes of independent restaurants is the cause of the overall decline of the UK restaurant market. It attributes this to a “perfect storm” of rising costs, over-supply, and weakening consumer demand."

London's restaurant scene changes week on week, with openings and closures happening on a regular basis; it must be hard to keep up. The hyper-competitiveness of London's restaurant scene makes it one of the toughest cities in the world in which to launch a new venture. "With business rates up and footfall down, a winning formula is worth its weight in gold and although first-rate food is inevitably the focus, other factors can also affect a restaurant's success. Atmosphere is frequently cited in customer surveys as second only to food in an enjoyable restaurant visit and getting the vibe right is crucial."[iii]

Due to the coronavirus, most businesses have suffered even greater losses. As restrictions lift, businesses will be looking for ways to make up for lost time and earnings. Reopening a restaurant once lockdown is over is one thing, but knowing what to put on the menu when you haven't been in contact with a punter in months is another.

"Are there any grounds for hope? A wild optimist might point to some encouraging data about the overperformance of small chains while everyone else loses their shirts; a realist might make coughing noises about small sample sizes and growth from a low base. The queues snaking out of Soho’s recently opened Pastaio suggest one genuinely viable route to salvation – concepts may need to follow its lead and amp up the comfort food factor while dialling down prices. 
And while home delivery is a source of confidence for some parties (Deliveroo, for instance, recently listed its shares on the stock market) it may well end up a false friend: the increased volume of so-called “dark kitchens” presage a sinister vision of the future, where restaurants don’t exist to serve customers onsite at all, but just pump out takeaway meals for us to consume on our sofas. A little far-fetched, perhaps, but with lights going out at a faster rate than many can remember, it can’t be too long before whole tranches of…
Max Welling is a former physicist and the current VP Technologies at Qualcomm. He is also an ML researcher affiliated with UC Irvine, CIFAR and the University of Amsterdam. Max recently shared some great insights into the current state of ML research and the future direction of the field:

“Computations cost energy, and drain phone batteries quickly, so machine learning engineers and chipmakers need to come up with clever ways to reduce the computational cost of running deep learning algorithms. One way this is achieved is by compressing neural networks, or identifying neurons that can be removed with minimal consequences for performance, and another is to reduce the number of bits used to represent each network parameter (sometimes all the way down to one bit!). These strategies tend to be used together, and they’re related in some fairly profound ways.”

“Currently, machine learning models are trained on very specific problems (like classifying images into a few hundred categories, or translating from one language to another), and they immediately fail if they’re applied even slightly outside of the domain they were trained for. A computer vision model trained to recognize facial expressions on a dataset featuring people with darker skin will underperform when tested on a different dataset featuring people with lighter skin, for example. Life experience teaches humans that skin tone shouldn’t affect interpretations of facial features, yet this minor difference is enough to throw off even cutting-edge algorithms today.”

“So the real challenge is generalizability — something that humans still do much better than machines. But how can we train machine learning algorithms to generalize? 
Max believes that the answer has to do with the way humans learn: unlike machines, our brains seem to focus on learning physical principles, like “when I take one thing and throw it at another thing, those things bounce off each other.” This reasoning is somewhat independent of what those two things are. By contrast, machines tend to learn in the other direction, reasoning not in terms of universal patterns or laws, but rather in terms of patterns that hold for a very particular problem class.”

“For that reason, Max feels that the most promising future areas of progress in machine learning will concentrate on learning logical and physical laws, rather than specific applications of those laws or principles.”

(Jeremy Harris, Towards Data Science, Jun 3, 2020: https://towardsdatascience.com/the-future-of-machine-learning-cd5b8b6e43cd)

Hear the full discussion on Spotify: https://open.spotify.com/episode/20flI9imCj9YhW7HVUL92Z?si=glb6JLwzR86KKc6Yc-LvRQ
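The two compression ideas in the first quote, removing the parts of a network that matter least and cutting the number of bits per parameter, can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not Qualcomm's or any library's method: `magnitude_prune`, `quantize_uniform`, `dequantize` and `binarize` are hypothetical helper names; pruning here zeroes individual weights by smallest magnitude (a simple stand-in for "minimal consequences for performance") rather than removing whole neurons; and quantization is symmetric and per-tensor.

```python
def magnitude_prune(weights, sparsity):
    """Hypothetical helper: zero the fraction `sparsity` of weights with the smallest |w|."""
    k = int(len(weights) * sparsity)
    # Rank indices by absolute value; the first k are "least important" under this heuristic.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned


def quantize_uniform(weights, bits):
    """Hypothetical helper: symmetric per-tensor quantization to signed `bits`-bit integers.

    Returns (codes, scale); an approximate weight is recovered as code * scale.
    """
    if bits < 2:
        raise ValueError("use binarize() for 1-bit weights")
    qmax = 2 ** (bits - 1) - 1  # largest code, e.g. 7 for 4 bits, 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp it to [-qmax, qmax].
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


def binarize(weights):
    """Hypothetical helper: 1-bit weights, keeping only the sign, scaled by mean |w|."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    return [alpha if w >= 0 else -alpha for w in weights]


# Toy weight vector standing in for one layer's parameters.
w = [0.8, -0.05, 0.3, -0.6, 0.02, -0.9]

pruned = magnitude_prune(w, 0.5)                  # [0.8, 0.0, 0.0, -0.6, 0.0, -0.9]
codes, scale = quantize_uniform(pruned, bits=4)   # codes [6, 0, 0, -5, 0, -7]
approx = dequantize(codes, scale)
binary = binarize(w)                              # every weight becomes +/- mean |w|

print("pruned:", pruned)
print("4-bit codes:", codes, "scale:", round(scale, 4))
print("1-bit:", binary)
```

Combining the two steps, as the quote suggests, means storing mostly-zero 4-bit codes plus one float scale instead of 32-bit floats, roughly an eightfold cut in weight storage before exploiting the zeros. Practical toolkits work on tensors and usually fine-tune the network after pruning or quantizing to recover accuracy.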
Audi releases a 2.3TB self-driving car dataset. Niceoo! From the deepai.org summary (04/14/20): "Research in machine learning, mobile robotics, and autonomous driving is accelerated by the availability of high quality annotated..." Read more at deepai.org/publication/a2d2-audi-autonomous-driving-dataset
In linear regression residual analysis, heteroscedastic residuals mean that the variance of the errors is not constant across the range of input values (see Graphs 1 and 2). A good linear regression model should instead show constant variance: a random scatter of residuals with no particular pattern. This is called homoscedasticity (see Graph 3).

Graphs 1, 2 and 3: residual plots.

If your residual analysis results look like Graphs 1 and 2, then the model is not a good fit. To fix this, one could transform the data, or add a variable to the model that helps account for the relationship between the errors and the input values. In the example above for Graphs 1 and 2, this could be the number of people at a table or the time of day, since larger groups sometimes tip less because they assume everyone else will tip, and people may be more generous later in the day after some vino in the evening!

But remember, as the statistician George Box put it, "Essentially, all models are wrong, but some are useful." Now that’s what I call statistical bombasticity!
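A quick way to spot heteroscedasticity without a plot, and to see how a data transform can remove it, is to compare the residual spread for small inputs with the spread for large inputs. The sketch below is illustrative only: the data are synthetic (not the data behind Graphs 1 to 3), the noise is assumed to be multiplicative so that a log transform is the appropriate fix, and `spread_ratio` is a crude hypothetical check in the spirit of the Goldfeld-Quandt test rather than a formal test.

```python
import math
import random


def fit_ols(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def residuals(xs, ys):
    """Residuals (observed minus fitted) from a simple linear fit."""
    a, b = fit_ols(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def spread_ratio(xs, res):
    """Hypothetical check: mean squared residual for the upper half of x / lower half.

    A value near 1 is consistent with homoscedasticity; a value well above 1
    suggests the error variance grows with x (heteroscedasticity).
    """
    pairs = sorted(zip(xs, res))
    half = len(pairs) // 2

    def mean_square(rs):
        return sum(r * r for r in rs) / len(rs)

    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[half:]]
    return mean_square(high) / mean_square(low)


rng = random.Random(42)  # fixed seed so the run is reproducible
xs = [rng.uniform(1, 10) for _ in range(400)]
# Synthetic data with multiplicative noise: the scatter of y grows with its level.
ys = [math.exp(0.5 + 0.3 * x + rng.gauss(0, 0.3)) for x in xs]

# Fitting y directly: residuals fan out as x grows (like Graphs 1 and 2).
raw_ratio = spread_ratio(xs, residuals(xs, ys))

# Data transform: on the log scale the noise is additive with constant variance.
log_ys = [math.log(y) for y in ys]
log_ratio = spread_ratio(xs, residuals(xs, log_ys))

print(f"raw y: upper/lower residual spread = {raw_ratio:.2f}")
print(f"log y: upper/lower residual spread = {log_ratio:.2f}")
```

On data like this the raw fit's ratio should come out well above 1, while the log-scale fit's ratio stays close to 1. For real analyses a formal test such as Breusch-Pagan (available in statsmodels) is a better choice than this rough split-half comparison.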