impossiblewizardry: (Default)

Rick Durrett:

The results for E[η_{n,m}] are useful for population genetics, but are not really relevant to cancer modeling. To investigate genetic diversity in the exponentially growing population of humans, you would sequence the DNA of a sample of individuals from the population. However, in the study of cancer each patient has their own exponentially growing cell population, so it is more interesting to have the information provided by Theorem 1 about the fraction of cells in the population with a given mutation.

The results he seems to think are so useless still seem to be the only results in the paper that have been used in data analysis (in this paper).

What Rick Durrett doesn’t seem to have realized is that, two years prior to the publication of this paper, people started doing DNA sequencing of individual cells from patients’ tumors. So the study of a single patient became a population genetics problem, and results from a population genetics perspective were exactly what was needed.

Rick Durrett can be forgiven for not noticing that. He’s a mathematician, and can’t be expected to keep up with the latest advances in genomics. But as a consequence he should worry less about whether his results are applicable to the few problems he’s familiar with from his collaborators.

I prefer the spirit he demonstrated in this paper with Foo and Leder, where they provide extensive information about the growth of cancer in a space of 3 or more dimensions.

impossiblewizardry: (Default)

Sobel and Frankowski:

The special case in which all p_i are equal and all r_i (i=1, 2, …, b) are equal is the most important application and we have…

The part of the paper by “applied” mathematicians where they tell me that my application isn’t important, as a rhetorical device to transition to a special case that they happen to have results for.

impossiblewizardry: (Default)

Zeilberger:

Ewens and Wilf are very right when they claim that P(r, n,m) and Q(r, n,m) are very far apart around the “tail” of the distribution, but who cares about the tail? Definitely not a scientist and even not an applied mathematician. It turns out, empirically (and we did extensive numerical testing, see Procedure HowGoodPA1(R0,N0,Incr,M0,m,eps) in BallsInBoxes), that whenever P(r, n,m) is not extremely small, it is very well approximated by Q(r, n,m), and using the latter (it is so much faster!) gives very good approximations, and enables one to construct the “center” of the probability distribution (i.e. ignoring the tails) very accurately.

Statistical geneticists care about the tails. Imagine a gene association study, where you have some disease, and you want to know, for each of the 20,000 or so genes in the human genome, whether rare variants in those genes increase risk of the disease. You use the Bonferroni correction: instead of using 0.05 as a p-value cutoff, you use 0.05/20,000. So your question is about whether a probability is above or below a number which is around 1 in a million.

The approximation Zeilberger is considering does seem to be good even down to probabilities around 10^-7 in the example he looks at, so it can apparently handle another order of magnitude of tests. But a study with a larger scope could certainly push past that: for example, instead of one disease, we could consider 10, or 100.
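To make the orders of magnitude concrete, a quick sketch (20,000 genes is the ballpark figure from above; the multi-disease study sizes are hypothetical):

```python
# Bonferroni cutoffs: 0.05 divided by the number of tests.
genes = 20_000
for diseases in (1, 10, 100):
    tests = genes * diseases
    print(f"{tests:>9,} tests -> p-value cutoff {0.05 / tests:.1e}")
# output:
#    20,000 tests -> p-value cutoff 2.5e-06
#   200,000 tests -> p-value cutoff 2.5e-07
# 2,000,000 tests -> p-value cutoff 2.5e-08
```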

Ewens knows this, because he works in statistical genetics. Zeilberger has no reason to think he knows more than Ewens about what matters in scientific practice. Zeilberger knows very little about what matters in practice, and doesn’t need to. He just needs to keep making his solutions more general, so that whatever does come up in practice will turn out to be a special case of something.

Which is what he does in fact do, so I have no problem with how Zeilberger manages his research; it’s just ridiculous for him to pepper his papers with baseless value judgments.

impossiblewizardry: (Default)

Let X be a random variable which can be 0, 1, 2, etc. (that is, with support on the non-negative integers.)

Consider a positive integer n. The probability that n divides X is

(1/n) ∑_{k=0}^{n-1} Ψ(2πk/n)

where Ψ(t) is the characteristic function of X, that is, the expected value of exp(i t X).

Deriving this formula was a good exercise in using the fact that the discrete Fourier transform is unitary. The basic idea is: expectations are dot products, dot products are preserved under unitary transformations. So instead of taking a dot product between a function and a probability distribution, you can take it between the Fourier transform of that function, and the Fourier transform of the probability distribution, which is given by values of the characteristic function.

The formula itself is also cool: I didn’t really expect it to be this easy, since it’s related to divisibility, which I think of as a stubborn discrete math concept. For most of the distributions I think about, the characteristic function is easily obtained. In fact, I have the characteristic function more often than I have the probabilities themselves. So I can evaluate this probability pretty easily for small n.

For large n, though, this formula is not so easy. The sum is finite, but its size scales with n, and for no distribution that I’ve tried have I been able to simplify it to a constant-size formula. (Although for the geometric distribution, I can get a constant-size formula more directly.)
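Still, for small n it’s easy to check numerically. A minimal sketch, using a Poisson(λ) variable as a stand-in (its characteristic function is exp(λ(e^{it} - 1))):

```python
# Check (1/n) sum_{k=0}^{n-1} Psi(2*pi*k/n) against a direct sum of P(X = mn),
# for X ~ Poisson(lam). The formula works because sum_k e^{2*pi*i*k*X/n}
# equals n when n divides X and 0 otherwise.
import numpy as np
from math import exp, factorial

lam, n = 3.0, 7

def psi(t):
    # characteristic function of Poisson(lam): E[exp(itX)]
    return np.exp(lam * (np.exp(1j * t) - 1))

via_formula = np.mean(psi(2 * np.pi * np.arange(n) / n)).real

# direct sum P(X = 0) + P(X = n) + P(X = 2n) + ... (tail is negligible)
via_direct = sum(exp(-lam) * lam**x / factorial(x) for x in range(0, 100, n))

print(via_formula, via_direct)  # agree to machine precision, both ~ 0.0714
```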

impossiblewizardry: (Default)
I keep thinking that Timothy Gowers should be gay because he looks kind of like Anderson Cooper
impossiblewizardry: (Default)

In a geometric distribution, the probability of being a multiple of n is p / (1 - (1-p)^n).

The definition I’m using here is the number of failures before the first success, when the success probability is p.
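As a sanity check, this closed form agrees with the characteristic-function formula from the entry above; for this geometric distribution, Ψ(t) = p / (1 - (1-p) e^{it}). A quick numpy sketch:

```python
import numpy as np

p, n = 0.3, 5
k = np.arange(n)

# characteristic function of the geometric (failures before first success)
psi = p / (1 - (1 - p) * np.exp(2j * np.pi * k / n))

print(np.mean(psi).real)        # the (1/n) sum of Psi(2*pi*k/n) formula
print(p / (1 - (1 - p) ** n))   # the closed form; both ~ 0.3606
```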

impossiblewizardry: (Default)

I think of rotations as, in a way, not real transformations. Instead of imagining the object itself rotating, I just imagine the axes rotating. I’m just choosing to describe the same object, in a different way.

Recently I’ve had to get used to thinking of reflections in the same way too. Imagine a map of a city, and you describe points on the map as how far east of the center they are, and how far north of the center they are. Now you decide to describe them instead in terms of how far east and how far south. What you’ve done to the coordinates is a reflection. But you have not changed the map; it’s not a mirror image of the map; instead it’s only one of the basis vectors that has been reflected.

This perspective has helped me understand the terminology people use in principal component analysis. In PCA, you diagonalize the correlation matrix:

C = P D P’

The matrix P is described as the rotation matrix. But rotations all have determinant 1; might this have determinant -1 instead? At first I tried to prove somehow that the determinant must be 1 instead of -1, to explain why they call it a rotation.

But I realized, they probably make the determinant 1 by convention. You can always flip the sign of the determinant by flipping the sign of one of the columns of P. And you can always do this because P must be a basis made of eigenvectors of C, and that doesn’t change if you flip the sign of one of the basis vectors.

A symbolic way to see this is

P₂ = P₁ F

where F flips the sign of a column. And what is F? An identity matrix, but with one of the diagonal elements changed from 1 to -1. And

det P₂ = det P₁ det F

And det F is -1. So flipping the sign of a column flips the sign of the determinant.

And to see that it still works as a diagonalization:

P₂ D P₂’ = P₁ F D F’ P₁’ = P₁ F D F P₁’ = P₁ D P₁’

using F’ = F, and F D F = D, since diagonal matrices commute and F² = I.
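A minimal numpy sketch of this sign-flip convention, on random data:

```python
# Diagonalize a correlation matrix; if det(P) = -1, flip one eigenvector's
# sign so det(P) = +1, and check that C = P D P' still holds.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 4))
C = np.corrcoef(X, rowvar=False)

eigvals, P = np.linalg.eigh(C)
D = np.diag(eigvals)

if np.linalg.det(P) < 0:
    P[:, 0] *= -1                       # P2 = P1 F

print(np.linalg.det(P))                 # ~ 1.0
print(np.allclose(P @ D @ P.T, C))      # True: still diagonalizes C
```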

impossiblewizardry: (Default)

Sextus Empiricus:

Some of the natural philosophers, Epicurus being one of them, said that the motion whereby things change is one particular type of the process whereby things move from one place to another: because the admixture which undergoes qualitative changes, always and in every respect, according to and because of the transpositional motion of its constituents. And we are to identify those constituents by reasoning about them. For example, when something changes and becomes bitter after it has initially been sweet, or black after it has been white; for this to happen, it must needs be the case that its constituent masses have shifted around so that their relative order and arrangements have changed and received new ordering structures. And this could not happen in any other way except by the masses moving from place to place <relative to each other.> Similarly, again, for something to become soft after it has been hard, or hard after it has been soft, it must needs be the case that the particles, of which it is made up, have moved from place to place. Indeed, when the particles move apart, a thing becomes soft, and when they gather closer together, it becomes hard. From all this it follows that the motion whereby change is effectuated is not different, in genus, from the motion by means of which something moves from one place to another.

This is a pretty good summary of the modern schoolbook perspective on non-nuclear chemistry, although the electrons are not really said to have positions.

This kind of stuff takes science down a peg, I think, in terms of what it can take credit for, just as similar stuff takes Christianity down a peg. Reading pre-Christian philosophy confirms my belief that Christians give God credit for informing us of certain ethical principles, when in fact people figure out those principles just fine on their own, without needing them divinely revealed through sacred texts. And stuff like this shows me that some of what science takes credit for, such as this perspective on matter, was already there. Science turned it into a predictive model, and it gets credit for that, but not for the underlying concepts. I used to think the mortality of the soul was something I could credit the scientific perspective for. Science provided evidence for it, certainly, but Epicurus shows there was a very good argument to be made for it before science as well. Science gets no credit for adding it to the library of thinkable or believable things, that’s for certain.

impossiblewizardry: (Default)

C wants to become a rich doctor, her boyfriend wants to live on the moon.

Unlike C, I want a life off the beaten path, rather than something where you know pretty much exactly how it’s going to go and a lot of people have done it before, so you have confidence in your chances of success because you’ve seen other people do the same thing. But unlike C’s boyfriend, I acknowledge that the consequence of this is that I have no idea where I’ll be in 20 years, the moon or whatever.

impossiblewizardry: (Default)

One thing you can do to compress an image is do a wavelet transform, delete low coefficients, and transform back.

Suppose you’re using an orthogonal wavelet basis, and let’s call the matrix of the transformation W. I’ll denote the transpose by W’.

You have an image which is represented as a vector x. What is the squared error if we reconstruct an image from wavelet coefficients v?

||W’ v - x||²

||W’ v - W’ W x||²

(W’ v - W’ W x)’ (W’ v - W’ W x)

(v - W x)’ W W’ (v - W x)

||v - W x||²

So, whatever error you introduce to the wavelet coefficients by deleting small ones, that’s exactly how much error you introduce to the image reconstructed from those coefficients. And that’s why, if you must delete a certain number of coefficients, it is best to delete the ones that are smallest in absolute value.
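A minimal numpy sketch of that identity; any orthogonal matrix will do, so a random one stands in for the wavelet transform here:

```python
import numpy as np

rng = np.random.default_rng(1)
W, _ = np.linalg.qr(rng.standard_normal((64, 64)))  # orthogonal: W' W = W W' = I

x = rng.standard_normal(64)                      # the "image"
coeffs = W @ x
v = np.where(np.abs(coeffs) > 0.5, coeffs, 0.0)  # delete small coefficients

err_image = np.linalg.norm(W.T @ v - x)   # ||W' v - x||, error in the image
err_coeff = np.linalg.norm(v - W @ x)     # ||v - W x||, error in the coefficients
print(err_image, err_coeff)               # equal, up to rounding
```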

I don’t think people really do compression like this, because of course you don’t have to be so binary about whether you keep a coefficient. You instead decide how many bits you want to use to store each coefficient. JPEG uses a cosine transform rather than a wavelet transform, and uses fewer bits to store higher-frequency coefficients. JPEG 2000, which is based on wavelets, presumably does something similar.

BUT if we imagine that we’re doing it like this (deleting small coefficients and keeping the rest as is), what we’re basically doing is projecting the image onto a subset of the wavelet basis. And what I showed above implies that, to minimize the squared reconstruction error, the optimal subset of size k is the k vectors whose coefficients are largest in absolute value.

This framing seems analogous to principal component analysis. But in principal component analysis, I would start with a large set of images. And I would decide to use the same basis of size k for each of them (instead of, for each image, using whichever k basis vectors had the largest coefficients in that specific image). And then, instead of picking these vectors from a predefined basis like a wavelet basis, I would just find the optimal k vectors, out of all possible vectors, to minimize the summed reconstruction error across all images.

impossiblewizardry: (Default)

on a sunny day, you’ll have a shadow. Until you enter the shadow of a building, where you’re illuminated not directly by the sun, but by reverberating light reflecting from surfaces all around you, which has no single source, and thus will not cast a shadow.

Except in Houston, with all of these reflective glass buildings, your shadow will actually change direction as you walk around. When you’re in the shadow of a building, and can’t see the sun, you may still be illuminated from the other direction by the reflection of the sun from a glass building, which is bright enough and coherent enough to cast its own shadow.

impossiblewizardry: (Default)

There’s this really specific feeling of, “jesus, that’s what you guys are appreciating from Western culture?” Like when you realize how influential Coldplay is in Taiwan.

I figure that’s how Asians feel when they see a Western philosopher writing “in our silly Western philosophy, we do this. But in enlightened Asian philosophy, they do that.” There are a lot of Asian philosophers, and I completely trust the kind of people who make statements like this to pick the ones that Asians feel kind of embarrassed about.

impossiblewizardry: (Default)

my grandma was telling me about a guy who would have maybe been fired for sexual harassment today, a salesman who sold to the store she worked at, who was always trying to touch her, and making comments. Her solution was, she stopped going to work the days he was going to be there. It seems like the slimes are punished more efficiently now, but how much has the average person changed? My grandma was saying, the average guy wasn’t that bad, but was patronizing, like... I don’t quite get the meaning, but like you need a lot of help and he knows what’s best for you.

impossiblewizardry: (Default)

I think there’s probably a program you could write that finds low values of SHA256(SHA256(x)) really fast. I mean, a program you could write if you knew it; I don’t think anybody can actually think of such a program. But since it’s possible, and solves the proof of work for adding Bitcoin blocks, it’s like a real-life cheat code. You enter the code into your computer and then you have a lot of money.
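The slow, honest version of the cheat code looks something like this (the difficulty target here is made up, and vastly easier than Bitcoin’s):

```python
import hashlib

TARGET = 2 ** 240  # hypothetical difficulty: the hash must be below this

nonce = 0
while True:
    # double SHA-256, as in Bitcoin's proof of work
    digest = hashlib.sha256(hashlib.sha256(str(nonce).encode()).digest()).digest()
    if int.from_bytes(digest, "big") < TARGET:
        print(nonce, digest.hex())  # expect roughly 2^16 tries at this target
        break
    nonce += 1
```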

impossiblewizardry: (Default)

fucked up with google maps today. Went to the right address in the wrong town. Ended up taking a lyft which cost >$30, and the whole trip took like 3 hours when it should have taken like 30 min to 1 hour

impossiblewizardry: (Default)

daniel dennett is conscious but david chalmers is a p-zombie.

impossiblewizardry: (Default)

so suppose a transformation from Rⁿ to Rⁿ preserves Euclidean distances (I’ll write v’ for the image of v):

||v - w|| = ||v’ - w’||

If the transformation also fixes the origin (0’ = 0, which I’ll assume; without that you’d only get an affine map, not a linear one), then taking w = 0 shows it preserves Euclidean norms of vectors:

||v|| = ||v’||

And then you expand ||v’ - w’||² and see that it preserves dot products:

v’ . w’ = v . w

Geometrically, that means if you preserve all distances, you also have to preserve all angles, which makes sense. Then... OK, it has to be linear. You can find that

||(s v)’ - s v’||² = 0

just by expanding the left side and canceling. Then you show that

||v’ + w’||² = ||v + w||²

which you do just by expanding and unexpanding, and that allows you to prove

||(v + w)’ - (v’ + w’)||² = 0

again by just expanding and canceling. So that means this transformation is linear.

And then... well, it also turns out that it’s orthogonal. Because, ok let’s call the matrix of this transformation A. Then,

v^T v = ||v||² = ||v’||² = ||Av||² = v^T A^T A v

v^T (A^T A - I) v = 0

Alright? So, that last thing on the bottom: it’s a quadratic polynomial in the n coordinates of v, and it’s zero for every v, so it must be the zero polynomial, meaning all the coefficients are zero. And guess which matrix holds the coefficients. The diagonal entries of A^T A - I are the coefficients of the v_i² terms, and, since A^T A - I is symmetric, twice its off-diagonal entries are the coefficients of the v_i v_j terms. So every entry of it is zero, and

A^T A = I

A is orthogonal. Alright. Well, it turns out it’s got to have a determinant of 1 or -1. Why? Well, because of some determinant rules that I don’t know how to prove: the det of a product is the product of the dets, and det(A^T) = det(A).

1 = det I = det(A^T A) = det(A^T) det(A) = det(A) det(A)

So, det(A)² = 1, so it’s either 1 or -1.
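A numpy sketch checking the whole chain on a random orthogonal A (built via QR, so orthogonality is by construction):

```python
import numpy as np

rng = np.random.default_rng(2)
A, _ = np.linalg.qr(rng.standard_normal((5, 5)))  # a random orthogonal matrix
v, w = rng.standard_normal(5), rng.standard_normal(5)

print(np.isclose(np.linalg.norm(A @ v - A @ w), np.linalg.norm(v - w)))  # distances preserved
print(np.isclose((A @ v) @ (A @ w), v @ w))                              # dot products preserved
print(np.allclose(A.T @ A, np.eye(5)))                                   # A^T A = I
print(np.isclose(abs(np.linalg.det(A)), 1.0))                            # det(A) = 1 or -1
```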

impossiblewizardry: (Default)

So in linear algebra, a big fact about linear transformations from Rⁿ to Rⁿ is that some of them are diagonalizable, and what this means is that in some ways these matrix multiplications act just like ordinary multiplications.

The eigenvalues of the rotation matrix make too much sense. You can sort of do rotation with ordinary multiplication: you can rotate in the complex plane. Turns out the eigenvalues of the matrix of rotation clockwise by θ are exp(iθ) and exp(-iθ), which are exactly the numbers that rotate by θ (one way or the other) in the complex plane.
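A quick check with numpy:

```python
import numpy as np

theta = 0.7
R = np.array([[ np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])  # rotation clockwise by theta

print(np.sort_complex(np.linalg.eigvals(R)))     # exp(-i*theta) and exp(i*theta)
print(np.exp(-1j * theta), np.exp(1j * theta))
```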

impossiblewizardry: (Default)

jadagul reblogged your post and added:

The special orthogonal group.

Wikipedia:

In mathematics, the orthogonal group in dimension n, denoted O(n), is the group of distance-preserving transformations of a Euclidean space of dimension n that preserve a fixed point, where the group operation is given by composing transformations. ...

An important subgroup of O(n) is the special orthogonal group, denoted SO(n), of the orthogonal matrices of determinant 1. This group is also called the rotation group, because, in dimensions 2 and 3, its elements are the usual rotations around a point (in dimension 2) or a line (in dimension 3).

So THAT’S the rotations!!

impossiblewizardry: (Default)

You might ask, "What does it mean to become a supple leopard?" It's a good question that warrants an explanation. I've long been fascinated with the idea of a leopard: powerful, fast, adaptable, stealthy...badass.

Kelly Starrett & Glen Cordoza - Becoming a Supple Leopard: The Ultimate Guide to Resolving Pain, Preventing Injury, and Optimizing Athletic Performance
