I was walking around a book store in Fremont and found an elementary math education book. While flipping through the book, I found the following neat problem.
Suppose you find a dinosaur skeleton that looks like an alligator, except that it is twice as big. A typical alligator weighs 500 pounds. Estimate how much the dinosaur weighed.
The answer is not 1000 pounds! This is not how things scale. If I double a length, areas increase by a factor of 4 and volumes by a factor of 8 (imagine doubling the side length of a square or cube). In $d$ dimensions, if lengths are scaled by $s$, volumes scale by $s^d$.
Weight is proportional to volume, and the dinosaur had $2^3 = 8$ times the volume of the alligator. Thus the dinosaur weighed about 4000 pounds. Another way to see why 1000 pounds is incorrect is to consider the following: if I scale the alligator by 2, not only is the length doubled, but the width and height are also doubled. Three dimensions are doubled, which leads to an 8-fold increase in volume.
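To make the arithmetic concrete, here is a minimal sketch of the scaling rule. The function name `scaled_weight` is my own, and it assumes the two animals have the same shape and density, so that weight scales exactly like volume.

```python
def scaled_weight(base_weight, length_scale, dimension=3):
    """Weight of a same-shaped object whose lengths are multiplied by length_scale.

    Assumes weight is proportional to volume, and that volume scales as
    length_scale ** dimension.
    """
    return base_weight * length_scale ** dimension


alligator_weight = 500  # pounds, from the problem statement
dinosaur_weight = scaled_weight(alligator_weight, 2)
print(dinosaur_weight)  # 4000
```

Setting `dimension=2` gives the factor of 4 for areas, and `dimension=1` gives the naive (and wrong) answer of 1000 pounds.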
We can use these ideas to determine the dimension of a dataset. High-dimensional data often lies on (or near) a low-dimensional manifold. The dimension of this manifold is the “true” dimension of the data. Imagine I make a ball around one of the data points, and count the number of points in the ball. If I double the radius of the ball, the number of points will increase by a factor of about $2^d$, where $d$ is the “true” dimension of the data (assuming the points are spread roughly evenly near that point). We can estimate $d$ with
$$\hat{d} = \log_2\left(\frac{\#\text{ points in big ball}}{\#\text{ points in small ball}}\right).$$

Written on April 13th, 2018 by Scott Roy
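The ball-counting estimate can be tried on synthetic data. The sketch below is only an illustration under my own assumptions: the function name `estimate_dimension`, the radius, and the test data (a flat 2-dimensional square embedded in 10 dimensions, sampled uniformly) are all made up for the example.

```python
import numpy as np


def estimate_dimension(points, center, radius):
    """Estimate the local dimension as log2(# points in big ball / # points in small ball).

    The small ball has the given radius and the big ball has twice that radius.
    """
    distances = np.linalg.norm(points - center, axis=1)
    n_small = np.count_nonzero(distances <= radius)
    n_big = np.count_nonzero(distances <= 2 * radius)
    if n_small == 0:
        raise ValueError("small ball contains no points; increase the radius")
    return np.log2(n_big / n_small)


rng = np.random.default_rng(0)

# Hypothetical data: points uniform on a 2-dimensional square ...
latent = rng.uniform(-1.0, 1.0, size=(20000, 2))

# ... embedded in 10 dimensions by a matrix with orthonormal columns,
# so distances within the square are preserved.
basis, _ = np.linalg.qr(rng.normal(size=(10, 2)))
points = latent @ basis.T  # shape (20000, 10)

# Center the balls on the data point closest to the middle of the square,
# so that neither ball runs off the edge of the data.
center = points[np.argmin(np.linalg.norm(points, axis=1))]

d_hat = estimate_dimension(points, center, radius=0.25)
print(d_hat)  # should be close to 2, not 10
```

The estimate is local and noisy. It depends on the chosen center and radius, and it degrades when the ball is so small that it holds few points or so large that it reaches the boundary of the data. Averaging over several centers is one way to steady it.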