Sum-product in finite fields via entropy

Hi everyone! After a long break from writing here, I thought that 2026, the era of ChatGPT and AI-generated content, would be a good time to revive my blog. Inspired by a paper of Mathe–O’Regan, I want to kick things off in the new year by explaining a neat way to use entropy to prove that

\displaystyle \max\{|A+A|,\ |AA|\}\ \ge\ |A|^{6/5-o(1)}

holds for all subsets A \subset \mathbb{F}_{p} with |A| \ll \sqrt p. Here \mathbb{F}_p is the prime field, and the o(1) is hiding the usual polylogarithmic losses. I’ll actually be a bit sloppy about logs in this post, and perhaps not just about logs: e.g. adding accents to names is a pain in WordPress, so I won’t do that (many apologies in advance to Mathe, Erdos, Halasz, Kollar, Renyi, Szemeredi, Szonyi, if any of you are seeing this).

Long story short, the 6/5 exponent is an “old friend” in the finite-field sum–product world: it was established by Roche–Newton, Rudnev, and Shkredov, who obtained it using a remarkable point–plane incidence theorem in \mathbb{P}^3(\mathbb{F}_p) due to Rudnev. In turn, this incidence theorem relied on a customized version of a line–line incidence result of Guth and Katz for points and lines in 3D space, the main ingredient in their celebrated work on the Erdos distinct distances problem. The latter is a result over the reals, but it was shown by Kollar that this incidence theorem in 3D also holds over prime fields. I won’t get into this here, but a very nice short proof/exposition of Rudnev’s theorem can be found in a note of de Zeeuw, where a more direct reduction to the Guth–Katz–Kollar incidence results is provided.

In the continuous setting, Mathe–O’Regan used Shannon entropy inequalities in combination with a strong radial direction set theorem of Orponen-Shmerkin-Wang to prove quantitative results for the so-called discretized sum–product problem in the plane (you might have heard me mention the OSW theorem before, in connection with my work with Alex Cohen and Dmitrii Zakharov on the Heilbronn triangle problem). These discretized sum–product bounds have since been superseded by recent work of Ren and Wang on the Furstenberg set conjecture. However, I find the entropy viewpoint extremely attractive for many reasons, so the goal here will be to advertise this viewpoint a bit by rederiving the 6/5 sum–product exponent in \mathbb{F}_p from:

(1) a “collision entropy” bound coming from the Guth–Katz incidence theorem, and

(2) an entropy inequality in the spirit of Mathe–O’Regan.

Of course, given (1), the fact that this eventually gives the same exponent as Roche–Newton–Rudnev–Shkredov is not a coincidence. However, I hope this new packaging of the Guth–Katz incidence input in the sum–product story (circumventing the need for Rudnev’s point–plane incidence theorem or the Stevens–de Zeeuw quantitative Szemeredi–Trotter theorem) will be interesting to someone (and hopefully that this someone will reach out to work with me on pushing things further…).

1. Shannon entropy and collision entropy

First, some basic preliminaries. Let U be a discrete random variable with distribution \mu. Its Shannon entropy is

\displaystyle H(U):=-\sum_u \mu(u)\log\mu(u),

and its collision entropy (also called the Renyi 2-entropy, as I found out) is

\displaystyle H_2(U):=-\log\sum_u \mu(u)^2 = -\log\mathbb{P}(U=U'),

where U' is an independent copy of U. One always has

\displaystyle H(U)\ \ge\ H_2(U).

Also, if U is supported on a finite set S, then H(U)\le \log|S|. In particular, if X is uniform on A then H(X)=\log|A|.
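If you like to see such things numerically, here is a quick Python sanity check of these definitions on a couple of toy distributions (the helper names are mine, and I use natural logs throughout):

```python
# A quick numerical sanity check of Shannon vs collision entropy.
import math

def shannon_entropy(mu):
    """H(U) = -sum_u mu(u) log mu(u)."""
    return -sum(q * math.log(q) for q in mu.values() if q > 0)

def collision_entropy(mu):
    """H_2(U) = -log sum_u mu(u)^2 = -log P(U = U')."""
    return -math.log(sum(q * q for q in mu.values()))

uniform = {u: 0.1 for u in range(10)}          # uniform on a 10-element set
assert abs(shannon_entropy(uniform) - math.log(10)) < 1e-12
assert abs(collision_entropy(uniform) - math.log(10)) < 1e-12

skewed = {0: 0.7, 1: 0.1, 2: 0.1, 3: 0.1}      # here H(U) > H_2(U) strictly
assert shannon_entropy(skewed) > collision_entropy(skewed)
```

For the uniform distribution the two entropies agree (both equal \log|S|); any non-uniform example exhibits the strict inequality H(U) > H_2(U).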

In a few words, it turns out one can sometimes upper bound the collision probability \mathbb{P}(U=U') (and hence lower bound the entropy H(U)) using tools from incidence geometry in a very natural way, and this can be quite powerful. See for example a recent paper that Zach Hunter, Daniel Zhu and I wrote on a Halasz-type theorem for permutation anticoncentration, where we also do something like this in a completely different context.


2. Entropy lower bounds via Minkowski distance energy

Turns out, the right geometric parameter to consider in this story is the rectangular area (apologies in advance for the mild clash of notation below between the prime p and the points p,q)

\displaystyle \mathcal{R}(p,q):=(q_1-p_1)(q_2-p_2),\qquad p,q\in (\mathbb{F}_p)^2.

Given a finite point set \mathcal{P}\subset(\mathbb{F}_p)^2, define the multiplicity function

\displaystyle \nu_{\mathcal{P}}(t):=\bigl|\{(p,q)\in\mathcal{P}^2:\ \mathcal{R}(p,q)=t\}\bigr|,

and the corresponding “Minkowski distance energy”:

\displaystyle Q(\mathcal{P})\;:=\;\sum_{t\in \mathbb{F}_p\setminus\{0\}} \nu_{\mathcal{P}}(t)^2.

Over \mathbb{R}, Roche–Newton and Rudnev proved the bound Q(\mathcal{P})\ll N^3\log N for N=|\mathcal{P}| (this is Proposition 4 in their paper). The proof is exactly an “incidence in 3D” argument: they reduce Q(\mathcal{P}) to counting intersections in a family of \sim N^2 lines and apply the Guth–Katz line-intersection theorem.

The same reduction almost works over \mathbb{F}_p, and the needed line-line intersection bound over finite fields is available thanks to Kollar’s algebraic variant of Guth–Katz (in the regime where the line family is not too large compared to p). For more details, see Rudnev’s paper or de Zeeuw’s note linked above. In our application, the line family has size \sim |\mathcal{P}|^2\sim |A|^4, so the natural condition |L|\ll p^2 becomes exactly |A|\ll \sqrt p.

Now here is how Q(\mathcal{P}) controls collisions of P. Take

\displaystyle \mathcal{P}:=(A\times A)\ \cup\ (-A\times -A)\ \subset (\mathbb{F}_p)^2.

Sample a point p=(X_1,X_3)\in A\times A and sample q=(-X_2,-X_4)\in -A\times -A. Then

\displaystyle \mathcal{R}(p,q)=(-X_2-X_1)(-X_4-X_3)=(X_1+X_2)(X_3+X_4).

For convenience, let’s shift perspective slightly and consider the following setup instead. Start with a uniform random variable X on A, and let X_1,X_2,X_3,X_4 be i.i.d. copies of X. Then define P=(X_1+X_2)(X_3+X_4). Either way, P is the rectangular area between a random point in A\times A and a random point in -A\times -A. Now, let P' be an independent copy of P. Write n=|A|. The collision probability is

\displaystyle \mathbb{P}(P=P')=\sum_t \mathbb{P}(P=t)^2.

If we let \nu_{\mathrm{cross}}(t) denote the number of pairs (p,q)\in (A\times A)\times(-A\times -A) with \mathcal{R}(p,q)=t, then

\displaystyle \mathbb{P}(P=t)=\frac{\nu_{\mathrm{cross}}(t)}{n^4},\qquad\text{so}\qquad \mathbb{P}(P=P')=\frac{1}{n^8}\sum_t \nu_{\mathrm{cross}}(t)^2.

But \nu_{\mathrm{cross}}(t)\le \nu_{\mathcal{P}}(t) (since cross-pairs are a subset of all pairs in \mathcal{P}^2), hence

\displaystyle \sum_{t\ne 0}\nu_{\mathrm{cross}}(t)^2 \ \le\ \sum_{t\ne 0}\nu_{\mathcal{P}}(t)^2 \ =\ Q(\mathcal{P}).

The only remaining annoyance is the t=0 term. But

\displaystyle \mathbb{P}(P=0)\le \mathbb{P}(X_1+X_2=0)+\mathbb{P}(X_3+X_4=0).

Since X_1,X_2 are uniform on A, there are at most n pairs (a,b)\in A^2 with a+b=0, so \mathbb{P}(X_1+X_2=0)\le 1/n, and similarly for X_3+X_4. Thus \mathbb{P}(P=0)\le 2/n, so \mathbb{P}(P=0)^2\le 4/n^2.

Putting everything together, we get the clean bound

\displaystyle \mathbb{P}(P=P')\ \le\ \frac{Q(\mathcal{P})}{n^8}+\frac{4}{n^2}.
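This bound holds unconditionally (no incidence input yet), so it can be verified by brute force. Here is a minimal Python sketch; the prime p = 101 and the size n = 8 are arbitrary choices for illustration:

```python
# Brute-force check of P(P = P') <= Q(P)/n^8 + 4/n^2 for a random small A.
import random
from collections import Counter

p = 101
random.seed(0)
A = random.sample(range(1, p), 8)
n = len(A)

dist = Counter()                       # dist[t] = #{(x1,x2,x3,x4): P = t}
for x1 in A:
    for x2 in A:
        for x3 in A:
            for x4 in A:
                dist[((x1 + x2) * (x3 + x4)) % p] += 1
collision = sum(c * c for c in dist.values()) / n ** 8   # P(P = P')

# The point set (A x A) u (-A x -A) and its rectangular-area multiplicities.
pts = {(a, b) for a in A for b in A} | {(-a % p, -b % p) for a in A for b in A}
nu = Counter()
for u1, u2 in pts:
    for v1, v2 in pts:
        nu[((v1 - u1) * (v2 - u2)) % p] += 1
Q = sum(c * c for t, c in nu.items() if t != 0)

assert collision <= Q / n ** 8 + 4 / n ** 2
```

Here dist[t] is exactly \nu_{\mathrm{cross}}(t), since the quadruple (X_1,X_2,X_3,X_4) and the cross-pair (p,q) determine each other.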

Now insert the incidence input: in the range n\ll \sqrt p, the finite-field version of Roche–Newton–Rudnev’s rectangular quadruple bound should give Q(\mathcal{P})\ll |\mathcal{P}|^3\log|\mathcal{P}|\asymp n^6\log n. With that in hand, it follows that

\displaystyle \mathbb{P}(P=P')\ \ll\ \frac{\log n}{n^2},

and hence the collision entropy satisfies

\displaystyle H_2(P)=-\log\mathbb{P}(P=P')\ \ge\ 2\log n - O(\log\log n).

Finally, H(P)\ge H_2(P), so we also get

\displaystyle H(P)\ \ge\ 2\log n - O(\log\log n).

This is the only place where geometry enters. Everything after this is purely “entropy bookkeeping”.

As an aside, the same argument works to show that

\displaystyle H((A-A)/(A-A))\ \ge\ 2\log |A| - O(\log\log |A|).

This could be regarded as an entropy version of Szonyi’s direction set theorem (this and its counterpart over the reals might be interesting for independent reasons).

It would also be interesting to compare this more closely with the appropriate part of the Mathe-O’Regan argument from the continuous setting, where an inequality like the above is derived more directly from the Orponen-Shmerkin-Wang direction set theorem (the continuous analogue of Szonyi’s theorem).


3. An upper bound for H(P)

Next, I want to prove the second key entropy inequality that turns the lower bound on H(P) into a sum–product estimate.

Let X_1,X_2,X_3,X_4 be i.i.d. random variables taking values in a field F (we’ll apply this with F=\mathbb{F}_p and X uniform on A, but I want to emphasize for a moment that this part is field independent). Define

\displaystyle P:=(X_1+X_2)(X_3+X_4).

The inequality we want is given by the following proposition.

\displaystyle \textbf{Proposition.} Let X,Y be i.i.d. with the common distribution of the X_i. Then

\displaystyle H(P)+4H(X)\ \le\ 2H(X+Y)+3H(XY)+O(1).

The proof is a submodularity argument. I’ll first state a convenient lemma/gadget in a clean form, and then plug it in for a couple of different encodings.

A submodularity gadget/strategy

Lemma. Let U,V,Z,W be discrete random variables. Assume:

  • Z is a function of U and also a function of V (i.e. H(Z\mid U)=H(Z\mid V)=0), and
  • W is determined by (U,V) up to at most K possibilities (i.e. H(W\mid U,V)\le \log K).

Then

\displaystyle H(Z)+H(W)\ \le\ H(U)+H(V)+\log K.

Proof. Submodularity says H(U,Z)+H(V,Z)\ge H(Z)+H(U,V,Z). Since Z is a function of U and of V, we have H(U,Z)=H(U) and H(V,Z)=H(V), and also H(U,V,Z)=H(U,V). Thus

\displaystyle H(U)+H(V)\ \ge\ H(Z)+H(U,V).

On the other hand,

\displaystyle H(W)\le H(U,V)+H(W\mid U,V)\le H(U,V)+\log K.

Combine the two inequalities and rearrange. \square

Applying the gadget: two encodings of (X_1,X_2,X_3,X_4)

Set

\displaystyle S_1:=X_1+X_2,\qquad S_2:=X_3+X_4,\qquad Z:=P=S_1S_2.

We define two “encodings” of the underlying variables:

  • Encoding 1: U:=(S_1,S_2).
  • Encoding 2: V:=(X_1X_3,\ X_1X_4,\ X_2X_3).

Clearly Z is a function of U. The interesting point is that Z is essentially a function of V because of a little identity. Write V=(a,b,c) where

\displaystyle a=X_1X_3,\qquad b=X_1X_4,\qquad c=X_2X_3.

If a\ne 0, we can recover the missing cross-term X_2X_4 via

\displaystyle X_2X_4=\frac{(X_1X_4)(X_2X_3)}{X_1X_3}=\frac{bc}{a}.

Therefore, on the event a\ne 0, we can compute

\displaystyle Z=(X_1+X_2)(X_3+X_4)=X_1X_3+X_1X_4+X_2X_3+X_2X_4=a+b+c+\frac{bc}{a},

so indeed Z is a function of V on this “good” event.

Next we check reconstruction. Let

\displaystyle W:=(X_1,X_2,X_3,X_4).

Assume we are in the event

\displaystyle E:=\{S_1\ne 0,\ S_2\ne 0,\ a=X_1X_3\ne 0\}.

Then from (U,V) we can solve explicitly for X_1 and X_3:

\displaystyle X_1S_2=X_1(X_3+X_4)=X_1X_3+X_1X_4=a+b,

so if S_2\ne 0 then X_1=(a+b)/S_2. Similarly,

\displaystyle X_3S_1=(X_1+X_2)X_3=X_1X_3+X_2X_3=a+c,

so if S_1\ne 0 then X_3=(a+c)/S_1. Once you know X_1 and X_3, the remaining two are immediate:

\displaystyle X_2=S_1-X_1,\qquad X_4=S_2-X_3.

So on E, the tuple W is actually determined uniquely by (U,V). In other words, H(W\mid U,V,E)=0.
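Both the identity Z = a+b+c+bc/a and this reconstruction are easy to test numerically. A minimal Python sketch over \mathbb{F}_{101} (an arbitrary choice of prime):

```python
# Checking the identity Z = a + b + c + bc/a and the reconstruction of
# (X1,X2,X3,X4) from (U,V) on the good event E, over F_p with p = 101.
import random

p = 101
random.seed(1)

def inv(x):
    """Modular inverse in F_p via Fermat's little theorem (p prime)."""
    return pow(x, p - 2, p)

for _ in range(200):
    X1, X2, X3, X4 = (random.randrange(1, p) for _ in range(4))
    S1, S2 = (X1 + X2) % p, (X3 + X4) % p
    a, b, c = X1 * X3 % p, X1 * X4 % p, X2 * X3 % p
    if S1 == 0 or S2 == 0 or a == 0:
        continue                      # skip the bad event E^c
    # X2*X4 = bc/a, hence Z = S1*S2 is a function of V = (a, b, c):
    assert (a + b + c + b * c * inv(a)) % p == S1 * S2 % p
    # Reconstruction: X1 = (a+b)/S2, X3 = (a+c)/S1, then subtract:
    x1, x3 = (a + b) * inv(S2) % p, (a + c) * inv(S1) % p
    assert (x1, (S1 - x1) % p, x3, (S2 - x3) % p) == (X1, X2, X3, X4)
```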

What about the bad event E^c? Since we will apply this with X uniform on a finite set A, each degeneracy has probability O(1/|A|):

  • \mathbb{P}(S_1=0)\le 1/n and \mathbb{P}(S_2=0)\le 1/n (at most one choice of partner gives sum 0),
  • \mathbb{P}(a=0)=\mathbb{P}(X_1X_3=0)\le 2/n (only if 0\in A).

So \mathbb{P}(E)\ge 1-O(1/n), and in particular for n large it’s bounded away from 0. Conditioning away E^c therefore changes any Shannon entropy in this discussion by at most O(1) bits, which is harmless for power-type bounds. So, applying the submodularity gadget on the good event and absorbing the conditioning error into an O(1) term, we get

\displaystyle H(Z)+H(W)\ \le\ H(U)+H(V)+O(1).

Now we just bound each term:

  • Since X_1,X_2,X_3,X_4 are independent and identically distributed, H(W)=H(X_1,X_2,X_3,X_4)=4H(X).
  • Since (X_1,X_2) is independent of (X_3,X_4), the sums S_1 and S_2 are independent and each has the same law as X+Y. So H(U)=H(S_1,S_2)=H(S_1)+H(S_2)=2H(X+Y).
  • Finally, by subadditivity, H(V)\le H(X_1X_3)+H(X_1X_4)+H(X_2X_3)=3H(XY).

Putting everything together gives exactly

\displaystyle H(P)+4H(X)\ \le\ 2H(X+Y)+3H(XY)+O(1),

as claimed.
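The submodularity step at the heart of the proof can also be verified numerically by brute force: for a small random A, compute the exact joint distributions of U, V, Z and check H(U,Z)+H(V,Z) \ge H(Z)+H(U,V,Z), together with the fact that Z is a function of U. A sketch (the parameters p = 101, |A| = 6 are arbitrary):

```python
# Exhaustive check of the submodularity inequality for a small random A.
import math
import random
from collections import Counter

p = 101
random.seed(2)
A = random.sample(range(1, p), 6)

def H(counts, total):
    """Shannon entropy of an empirical (joint) distribution."""
    return -sum(c / total * math.log(c / total) for c in counts.values())

jU, jZ, jUZ, jVZ, jUVZ = Counter(), Counter(), Counter(), Counter(), Counter()
total = 0
for X1 in A:
    for X2 in A:
        for X3 in A:
            for X4 in A:
                U = ((X1 + X2) % p, (X3 + X4) % p)
                V = (X1 * X3 % p, X1 * X4 % p, X2 * X3 % p)
                Z = U[0] * U[1] % p
                jU[U] += 1; jZ[Z] += 1
                jUZ[U, Z] += 1; jVZ[V, Z] += 1; jUVZ[U, V, Z] += 1
                total += 1

assert abs(H(jU, total) - H(jUZ, total)) < 1e-9     # Z is a function of U
assert H(jUZ, total) + H(jVZ, total) >= H(jZ, total) + H(jUVZ, total) - 1e-9
```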


4. End-game: convert entropy bounds into sum-product bounds

Now we specialize to our setting: X is uniform on A, so H(X)=\log n. Also

\displaystyle H(X+Y)\le \log|A+A|,\qquad H(XY)\le \log|AA|,

simply because X+Y is supported on A+A and XY is supported on AA.

From the Minkowski/rectangular-energy input we have

\displaystyle H(P)\ \ge\ 2\log n - O(\log\log n).

Plug this into the entropy inequality from the previous section:

\displaystyle (2\log n)+4\log n \ \le\ 2\log|A+A|+3\log|AA|+O(\log\log n).

Rearranging and exponentiating gives

\displaystyle |A+A|^2\,|AA|^3 \ \gtrsim\ |A|^{6-o(1)}.

So

\displaystyle \max\{|A+A|,\ |AA|\}\ \gtrsim\ |A|^{6/5-o(1)},

which is what we wanted.


As a final remark, it is perhaps worth mentioning (better late than never) that the best known result for the sum-product problem over finite fields is that for all A \subset \mathbb{F}_{p} with |A| \lesssim p^{1/2}, we have that

\displaystyle \max\{|A+A|,\ |AA|\}\ \gtrsim\ |A|^{5/4},

and it is due to Mohammadi-Stevens. Funnily enough, this inequality can be proven for finite sets A \subset \mathbb{R} with a one-line application of the Szemeredi-Trotter theorem (the argument goes back to a beautiful paper of Elekes). For the 5/4 exponent in finite fields, however, one has to do a lot more work: the proof relies on the quantitative Szemeredi-Trotter theorem in \mathbb{F}_{p}, and thus also on Rudnev’s point-plane incidence bound and the Guth-Katz-Kollar technology, but also on some delicate higher energy estimates. As a next step, it seems quite natural to ask whether there’s a way to recast this result via entropy, in a way that unifies it with the above framework.

Edit: While the argument works seamlessly over the reals, it seems that one needs to be a bit more careful over finite fields, since Kollar indeed established the \mathbb{F}_{p} version of the Guth-Katz line-line incidence theorem (i.e. the 2-rich version), but the Minkowski distance energy bound of Roche-Newton and Rudnev might actually require the full Guth-Katz incidence theorem. That being said, to get the 6/5 exponent in finite fields one only needs the Rudnev point-plane incidence theorem (and de Zeeuw shows in his note that this rests solely on the line-line incidence theorem of Guth-Katz-Kollar). So there should be an easy fix of the whole thing. Will edit more when/if I figure out how to do the entropy version of 5/4. 🙂

Discrete Geometry REU at Yale

As the title already suggests, this is a different kind of blog post; nevertheless, I am hoping that some of you will still find it interesting!

I’m very happy to announce that Hong Liu and I have decided to put together a small REU on extremal problems in discrete geometry for next summer. This will take place at Yale between May 31st and July 28th, and it will be part of Yale’s larger SUMRY program.

Information regarding logistics and, in particular, the application process is available on the SUMRY website, but please do not hesitate to get in touch if you have any questions. Below is a very rough description of what it will be about.

Description. Questions in discrete geometry typically involve finite sets of points, lines, circles, planes, or other simple geometric objects, and many of them are very natural and worth studying for their own sake. Some of them, such as the structure of 3-dimensional convex polytopes, go back to antiquity, while others are motivated by connections with different areas of modern mathematics, in particular combinatorics, harmonic analysis, number theory, and topology. Every now and then, a new phenomenon is discovered in one place and this leads to surprising progress on all fronts. Over the past few years in particular, several breakthrough results have been discovered, answering or nearly answering some of the oldest problems in the field. In this REU project we will try to capitalize a little bit on this momentum. We will study some of these recent developments in the area (and some other remarkable developments from adjacent areas, e.g. extremal combinatorics), and then aim to make progress on a few new problems. Possible topics could include: incidence geometry, (hyper)graph theory, Ramsey theory, convex geometry, topological combinatorics, or coding theory.

We also hope to have several visitors and other fun activities, please consider joining us!

Difference sets without integers missing digits

One of the most famous problems in additive combinatorics is that of determining the size of the largest subset A \subset [N] whose difference set contains no nonzero perfect squares. Here [N] = \left\{1,\ldots,N\right\}. Answering a question of Lovasz, Sarkozy and Furstenberg independently showed that such a set must always have asymptotic density zero. Sarkozy’s proof was based on the circle method, and actually gave the quantitative bound

\displaystyle |A| \leq \frac{N}{(\log N)^{1/3+o(1)}},

whereas Furstenberg’s approach relied on ergodic theory. Nowadays, there are quite a variety of proofs of the qualitative result |A| = o(N), and the problem of finding strong quantitative estimates has become more or less known as the Furstenberg-Sarkozy problem.

This blog post is not going to be about this problem, but I would be remiss to proceed further to talk about a variant of it without mentioning a beautiful recent paper due to Thomas Bloom and James Maynard, where they show that all sets A whose difference set contains no nonzero perfect square must always satisfy

\displaystyle |A| \ll \frac{N}{(\log N)^{c \log \log \log N}},

for some absolute constant c > 0. Here \ll is the usual Vinogradov notation (i.e. you can read \ll as Big O). This result manages to improve a bit upon a remarkably stubborn previous bound due to Pintz, Steiger and Szemeredi, which only had an extra log in the exponent of the denominator but which had resisted improvement for many years, despite the multiple developments around the related theorem of Roth about subsets of [N] without three-term arithmetic progressions.

Without getting into too much detail about how the two problems are connected, it is perhaps at least worth discussing a couple of interesting distinguishing features. One first remarkable thing about these strong quantitative bounds for the Furstenberg-Sarkozy problem is that they are of noticeably better quality than what is currently known for the size of the largest subset of [N] without nontrivial three-term arithmetic progressions, despite both worlds following a Fourier-analytic density increment argument that shares the same core philosophy (see the recent exciting paper of Bloom and Sisask for more information on the latter). Furthermore, there is no Behrend-type construction available for the Furstenberg-Sarkozy problem: the largest known subset of [N] with no perfect squares in its difference set has size only \approx N^{0.7334}. This is due to Lewko, who refined a construction of Ruzsa. It is a popular belief among additive combinatorialists that there might exist an absolute constant \epsilon > 0 with the property that all sets A \subset [N] without nonzero perfect squares in A-A must satisfy

\displaystyle |A| = O\left(N^{1-\epsilon}\right),

but that currently seems well out of reach. Moreover, it is not entirely inconceivable that this (already rather bold) conjecture could even hold with \epsilon = 1/4.

There is a lot more to say of course, but, before I get completely carried away with this, I would now like to switch gears and discuss an intriguing variant of the above problem, which asks about the size of the largest A \subset [N] such that the difference set A-A contains no nonzero integer with only digits 0 and/or 1 when represented in base 3. This question was raised by Terence Tao and can be found as Problem 13 in a very nice survey by Thai Hoang Le.

Despite the seemingly haphazard nature of the set of integers without digits equal to 2 in base 3, which from now on I will denote by H_{3}, I’d like to begin by emphasizing that this is a very natural (and rather cool) variant of the Furstenberg-Sarkozy problem for several reasons. First and foremost, like the perfect squares and the many other “intersective” sets for which similar stories hold (such as the set of cubes or values of various classes of polynomials), H_{3} is also a sparse set that contains 0 in a rather natural way. It is also not difficult to note that its structure is perfectly suited for an application of the powerful Density Hales-Jewett theorem, which yields the (morally) qualitative result |A| = o(N). In the same fashion, the problem of finding a more reasonable quantitative bound therefore arises.

At the same time, unlike the squares and the other intersective sets, for which the Fourier transform exhibits analogous analytic behavior, the H_{3} situation is less algebraic in flavor, and it seems a bit more complicated to use its structure in order to adapt any sensible density increment strategy. Nevertheless, (unrelated) recent work of Maynard on primes with restricted digits comes to mind, where analytic properties of a similar set turn out to be manageable enough to establish that there are infinitely many primes without a fixed digit in base 10 or higher, so perhaps there is some hope. I started thinking about this problem a bit a while ago, and, while I didn’t have much luck with the Fourier analytic approach at the time (or rather didn’t have enough energy to work on it more seriously…), I had the pleasant surprise to realize that the quantitative problem exhibits rather different behavior than the Furstenberg-Sarkozy problem, and that establishing (some type of) quantitative upper bound is a much easier problem than it might initially look. I thought the other day that it might be nice to share this story here, since, as usual, it’s been a while since my last post!

Indeed, let me begin by sketching a surprisingly simple argument, in the style of Blichfeldt’s proof of Minkowski’s theorem, which readily shows that

\displaystyle |A| = O\left(\frac{N}{\log N}\right)

holds for all sets A \subset [N] with (A - A) \cap H_{3} = \left\{0\right\}. The main idea is to consider the following translates of the set A: for each i \in \left\{1,\ldots,\lfloor \log_{3}{N}\rfloor \right\}, define

\displaystyle A_{i}:=A+x_{i} = \left\{a+x_{i}: a \in A\right\},

where x_{i} = 1+3+\ldots+3^{i-1}. Note that for each i \leq \lfloor \log_{3}{N}\rfloor, the set A_{i} is contained in \left\{1,\ldots,2N\right\}. Moreover, the hypothesis on A implies that for each i \neq j, the sets A_{i} and A_{j} are disjoint: if a+x_{i}=a'+x_{j} for some i<j, then a-a'=x_{j}-x_{i}=3^{i}+\ldots+3^{j-1} is a nonzero element of H_{3}. Hence, \lfloor \log_{3}{N}\rfloor \cdot |A| \leq 2N, which implies the claim.
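The argument above can be checked end-to-end in Python for a small N (the choice N = 3^6 and the greedy construction of A are arbitrary, just for illustration):

```python
# Verify the translate argument: greedily build A in [N] with
# (A-A) ∩ H_3 = {0}, then check the translates A_i = A + x_i are disjoint.

def in_H3(m):
    """True if m > 0 and m has only digits 0 and 1 in base 3."""
    if m <= 0:
        return False
    while m:
        if m % 3 == 2:
            return False
        m //= 3
    return True

N = 3 ** 6
A = []
for a in range(1, N + 1):
    if all(not in_H3(abs(a - b)) for b in A):
        A.append(a)

K = 0
while 3 ** (K + 1) <= N:
    K += 1                                    # K = floor(log_3 N) = 6
xs = [(3 ** i - 1) // 2 for i in range(1, K + 1)]  # x_i = 1 + 3 + ... + 3^{i-1}
translates = [{a + x for a in A} for x in xs]
for i in range(K):
    for j in range(i + 1, K):
        assert in_H3(xs[j] - xs[i])           # x_j - x_i = 3^i + ... + 3^{j-1}
        assert not (translates[i] & translates[j])   # A_i and A_j disjoint
assert K * len(A) <= 2 * N                    # the final counting step
```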

Needless to say, this is not of the same quality as the bound of Bloom-Maynard (or even the one of Pintz, Steiger and Szemeredi) for the Furstenberg-Sarkozy problem, but it is already superior to what one may get by attempting to adapt the initial density increment strategies (e.g. the original one of Sarkozy). In fact, the upper bound already almost matches the quality of the best known quantitative result for Roth’s theorem (although that seems like a pure coincidence).

Regarding lower bounds, one can consider the set A \subset [N] which consists of all the numbers with digits 0,1,\ldots,\frac{3^{k}-1}{2} in base 3^{k} and with a fixed sum of digits, for some fixed k \geq 1. It is then not difficult to see that, for an appropriately chosen value of the digit sum, this yields a set whose difference set contains no nonzero integers missing the digit 2 in base 3, of size

\displaystyle |A| \approx \left(\tfrac{3^{k}-1}{2}\right)^{\log_{3^k}{N}}\Big/\left(3^{k}\log_{3^k}{N}\right).

Perhaps a bit less coincidentally this time, when k = \sqrt{\log_{3}{N}}, this yields a construction with

\displaystyle |A| \approx \frac{N}{\exp(\sqrt{\log_{3}{N}})},

which matches the size of the Behrend 3AP-free set.
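The construction can be tested by brute force for small parameters. The following Python sketch (with k = 2 and three base-9 digits, i.e. N = 729, arbitrary choices) checks that, for every fixed digit sum, the resulting set has no nonzero difference in H_3:

```python
# Brute-force check of the lower-bound construction for k = 2:
# base 9, digits in {0,...,4}, fixed digit sum, three digits.
from itertools import product

def in_H3(m):
    if m <= 0:
        return False
    while m:
        if m % 3 == 2:
            return False
        m //= 3
    return True

k, d = 2, 3
base, top = 3 ** k, (3 ** k - 1) // 2        # base 9, digits in {0,...,4}
best = 0
for s in range(d * top + 1):                 # every possible fixed digit sum
    A = [sum(c * base ** i for i, c in enumerate(digs))
         for digs in product(range(top + 1), repeat=d) if sum(digs) == s]
    for a in A:
        for b in A:
            # digits of a, b and of any h in H_3 are at most 4 in base 9,
            # so a = b + h would have no carries, forcing digitsum(h) = 0
            assert not in_H3(a - b)
    best = max(best, len(A))
```

The comment in the inner loop is the reason the construction works: a nonzero h \in H_3 has base-9 digits in \{0,1,3,4\}, so b + h has no base-9 carries, and the fixed digit sum then forces h = 0.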

All in all, this means that the size of the largest subset of [N] with A-A containing no nonzero integer without digit 2 in base 3 is somewhere between N/\exp(\sqrt{\log N}) and N / \log N. This is a smaller gap than the one between the best upper and lower bounds for the Furstenberg-Sarkozy problem, but it would still be very nice if we could close it even further, especially since it seems quite likely that one should be able to do better than N / \log N for the upper bound using some Fourier analysis.

Another intriguing direction seems to be the analogous question for longer progressions. For fixed k \geq 2, suppose A \subset [N] is a set which contains no nontrivial k-term arithmetic progressions with common difference in H_3, i.e. no pattern of the form

\displaystyle x, x+y,\ldots,x+(k-1)y,

where y is a nonzero element of H_3. A standard application of the Density Hales-Jewett theorem yields the more general result that

\displaystyle |A| = o_{k} (N).

However, can one prove any interesting quantitative bounds when k \geq 3? The analogous generalization of the Furstenberg-Sarkozy problem has been studied extensively, see for example this paper of Prendiville (and the relevant information therein). Based on the k=2 situation, it seems however quite likely that one may not need to use any kind of (higher order) Fourier analysis in order to come up with a reasonable upper bound (possibly even of the form N/(\log N)^{c} for some constant c=c(k) > 0).

Last but not least, it is perhaps worth mentioning that the finite field versions of all these problems are also quite interesting. The problem of determining the size r_{2,\left\{0,1\right\}^{n}}\left(\mathbb{F}_{3}^{n}\right) of the largest subset A \subset \mathbb{F}_{3}^{n} with A-A avoiding all the vertices of the hypercube \left\{0,1\right\}^{n} which have at least one nonzero coordinate is a classical problem in extremal combinatorics. The best known results can be summarized in the following one line:

\displaystyle \frac{2^{n-1}}{\sqrt{n}} \leq r_{2,\left\{0,1\right\}^{n}}\left(\mathbb{F}_{3}^{n}\right) \leq \frac{15}{16} \cdot 2^{n}.

The upper bound comes from a paper of Huang, Klurman, and myself, while the lower bound is due to Alon, and in fact is the main source of inspiration for the construction of the large A \subset [N] with (A-A) \cap H_{3} = \left\{0\right\}. See Theorem 5 in the survey of Le already mentioned above to find out how this goes, although, if you like the problem(s) so far or agadmator chess videos, you might also want to pause for a bit and try to find it yourself at this point! Showing that r_{2,\left\{0,1\right\}^{n}}\left(\mathbb{F}_{3}^{n}\right) \leq 2^{n} is also an elegant application of the polynomial method.

For k \geq 3, the size r_{k,\left\{0,1\right\}^{n}}\left(\mathbb{F}_{3}^{n}\right) of the largest subset A \subset \mathbb{F}_{3}^{n} with no nontrivial arithmetic progressions with common difference in the hypercube \left\{0,1\right\}^{n} is quite mysterious. An application of the Density Hales-Jewett theorem yields, like in the H_{3} story, a qualitative bound of the form r_{k,\left\{0,1\right\}^{n}}\left(\mathbb{F}_{3}^{n}\right) = o_{k}(3^n). That might be the best known result (as far as I can tell), however, in light of the case k=2 and the exciting developments around Croot-Lev-Pach and the cap set problem it seems quite reasonable to expect that one could do much better. For example, someone (Peter Pach?) told me a while ago at a conference that Seva Lev believes

r_{3,\left\{0,1\right\}^{n}}\left(\mathbb{F}_{3}^{n}\right) = O(c^n)

must hold for some absolute constant c < 3.

Later edit: I’d also like to sketch an alternate proof of the fact that if A \subset [N] is such that (A-A) \cap H_3 = \left\{0\right\}, then

\displaystyle |A| = O\left(N/\sqrt{\log_{3}{N}}\right).

This is a worse bound by a square root of the log in the denominator than the one discussed above, but perhaps it has better chances of generalizing to the problem about sets avoiding longer progressions with common difference in H_3. The idea is to proceed as in Erdos’ argument for the classical Littlewood-Offord problem. Note that the intersection of A with H_3 has size at most O\left(N^{\log_{3}{2}} / \sqrt{\log_{3}{N}}\right). This follows from Sperner’s theorem. Indeed, for each number x in H_{3} \cap [N], let (x_1,\ldots,x_{\lfloor\log_{3}{N}\rfloor}) be the vector of \left\{0,1\right\}-digits in the base 3 representation of x, and consider the subset

\displaystyle S_{x} = \left\{i \in \left\{ 1,\ldots,\lfloor\log_{3}{N}\rfloor\right\}: x_i = 1\right\}.

Since (A-A) \cap H_3= \left\{0\right\}, it is easy to see that for every two x \neq y in A \cap H_3, the corresponding sets S_{x} and S_{y} are not contained in one another; hence \left\{S_{x}: x \in A \cap H_3\right\} forms an antichain. It thus indeed follows from Sperner’s theorem that A \cap H_3 has size at most


\displaystyle |A \cap H_3| \leq {\log_{3}{N} \choose (\log_{3}{N})/2} \approx 2^{\log_{3}{N}} / \sqrt{\log_{3}{N}} = N^{\log_{3}{2}} / \sqrt{\log_{3}{N}}.


Next, note that the same inequality applies to every shift A \cap (H_{3}+x) by the same argument. In particular,

\displaystyle |A \cap (H_3+x)| \leq \frac{N^{\log_{3}{2}}}{ \sqrt{\log_{3}{N}}}.

Since we can cover A by \approx N/ N^{\log_{3}{2}} translates of H_3 (such a covering exists, for instance, by choosing random translates), the claim follows by applying this inequality to each piece.
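The antichain step can be checked directly in Python for small N (here with a greedily built H_3-difference-avoiding set; the parameter N = 3^6 is arbitrary):

```python
# Sperner step: for a set A with (A-A) ∩ H_3 = {0}, the digit-support
# sets S_x of the elements of A ∩ H_3 form an antichain.

def digits01_support(m):
    """Positions of digit 1 in base 3, or None if some digit equals 2."""
    S, i = set(), 0
    while m:
        d, m = m % 3, m // 3
        if d == 2:
            return None
        if d == 1:
            S.add(i)
        i += 1
    return S

def in_H3(m):
    return m > 0 and digits01_support(m) is not None

N = 3 ** 6
A = []
for a in range(1, N + 1):               # greedy H_3-difference-avoiding set
    if all(not in_H3(abs(a - b)) for b in A):
        A.append(a)

supports = [digits01_support(a) for a in A if in_H3(a)]
for i, S in enumerate(supports):
    for j, T in enumerate(supports):
        # a containment S_x ⊆ S_y would force y - x ∈ H_3 \ {0}
        assert i == j or not (S <= T)
```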

Sidon sets and sum-product phenomena

Happy new year to everyone who is still following this blog! For this new post (coming after a nontrivial break), I thought I should perhaps try something different, and so I decided that it might be a good idea to discuss an open problem which I like, and which has been on the back of my mind lately.

Long story short, it’s a problem that Oleksiy Klurman and I came up with together a few years ago, and which we excitedly shared with some of our friends at various workshops and conferences, but we never really got a chance to write it down anywhere (mostly because we couldn’t prove anything particularly interesting about it..). In the meantime, one of us started a blog, so we thought the other day that we might finally have an opportunity to say some things about this problem in a sufficiently formal way and perhaps voice some of our frustrations.

To put this into context, let me start by introducing some notation and discussing some classical results about Sidon sets. For a set of real numbers A, let s_{+}(A) denote the size of the largest subset A' of A with the property that there are no nontrivial solutions to the equation x+y=z+w, where x,y,z,w \in A' are pairwise distinct. Such a set A' is called an additive Sidon subset of A, and s_{+}(A) thus denotes the size of the largest additive Sidon subset of A. One of the most classical problems in additive combinatorics is to determine s_{+}(\left\{1,\ldots,N\right\}) for sufficiently large positive integers N. For brevity, we will denote this function by s_{+}(N).

Let’s discuss a few things about this s_{+}(N). First, since all the pairwise sums live in the interval \left\{1,\ldots,2N\right\}, it is easy to see that {s_{+}(N) \choose 2} \leq 2N, which implies that s_{+}(N) = O(N^{1/2}). In 1941, Erdos and Turan refined this simple observation and proved that s_{+}(N) \leq N^{1/2} + O(N^{1/4}). This was further sharpened by Lindstrom, who established the inequality

\displaystyle s_{+}(N) < N^{1/2} + N^{1/4} + 1,

which is the best known upper bound for s_{+}(N). In order to produce a good lower bound for s_{+}(N), one needs to construct a large subset of \left\{1,\ldots,N\right\} with all of its two-element subset sums distinct. In their paper already cited above, Erdos and Turan proceeded as follows. Fix a prime p, and let (k^2) be the unique integer in \left\{1,\ldots,p-1\right\} congruent to k^2 modulo p. It is not hard to check that \left\{2pk + (k^2): 1 \leq k < p\right\} is a Sidon set contained in the interval \left\{2p+1,\ldots,2p(p-1)+1\right\}. In particular, for N = 2p(p-1)+1 and p sufficiently large this construction already leads to the fact that

\displaystyle s_{+}(N) \geq \frac{N^{1/2}}{\sqrt{2}}(1+o(1))

holds for all N sufficiently large. This is essentially a Freiman isomorphism from the y=x^2 parabola in \mathbb{F}_{p}^{2}. Three different constructions due to Singer, Bose, and Ruzsa, all with a similar algebraic flavor, lead to a slightly better lower bound for s_{+}(N) than the one above, which is of the form

\displaystyle s_{+}(N) \geq N^{1/2}(1+o(1)).

While all of these results are quite classical, determining whether the correct size of s_{+}(N) is closer to the upper bound or to the lower bound (i.e. whether the lower order term in Lindstrom’s upper bound is required) and what all these extremal Sidon sets might look like are still major open problems. We refer the interested reader to this nice survey of O’Bryant and this beautiful blog post of Gowers for more information.
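If you’d like to play with these objects, here is a quick brute-force sanity check of the Erdos–Turan construction described above (a throwaway Python sketch; the function names and the particular primes are my own illustrative choices):

```python
def is_sidon(s):
    """Check that all pairwise sums a + b with a <= b are distinct."""
    elems = sorted(s)
    sums = [a + b for i, a in enumerate(elems) for b in elems[i:]]
    return len(sums) == len(set(sums))

def erdos_turan(p):
    """The set {2pk + (k^2 mod p) : 1 <= k < p} from the construction above."""
    return {2 * p * k + (k * k) % p for k in range(1, p)}

for p in [5, 11, 31]:
    S = erdos_turan(p)
    # p - 1 elements, and no repeated pairwise sums
    assert len(S) == p - 1 and is_sidon(S)
print("ok")
```

For p = 5 the set is \{11, 24, 34, 41\}, and one can check the ten pairwise sums by hand; the brute-force check just repeats this for larger primes.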

Similarly, one can play the same game with multiplication. For a set of real numbers A, let us now define s_{\times}(A) to be the size of the largest subset A' of A with the property that there are no nontrivial solutions to the equation xy=zw, where x,y,z,w \in A' are pairwise distinct from each other. Naturally, we will call A' a multiplicative Sidon subset of A, so like before s_{\times}(A) simply stands for the size of the largest multiplicative Sidon subset of A.

When A = \left\{1,\ldots,N\right\}, the story of s_{\times}(N):=s_{\times}(\left\{1,\ldots,N\right\}) turns out to be a bit more pleasant than the story of s_{+}(N). At first glance, this is perhaps not too surprising. First of all, it is easy to see that s_{\times}(N) is much larger than s_{+}(N). For example, one can consider A' to be the set of prime numbers inside the interval \left\{1,\ldots,N\right\}. This is a multiplicative Sidon set and, by the Prime number theorem, |A'| = \Theta(N/\log N); therefore, we know that s_{\times}(N) = \Omega(N/\log N), which is somewhat promising. Indeed, in 1938 (three years before his additive Sidon paper with Turan), Erdos had in fact already started the study of this quantity and managed to further improve upon this simple construction by showing that

\displaystyle s_{\times}(N) \geq \pi(N) + c\frac{N^{3/4}}{(\log N)^{3/2}},

for some absolute constant c >0, where \pi(N) stands as usual for the number of primes in \left\{1,\ldots,N\right\}. Moreover, in the same paper he also proved that s_{\times}(N) \leq \pi(N) + O(N^{3/4}), an upper bound which he himself refined 31 years later to s_{\times}(N) \leq \pi(N) + O\left(\frac{N^{3/4}}{(\log N)^{3/2}}\right), thus establishing the correct order of magnitude of the lower order term, namely

\displaystyle s_{\times}(N) = \pi(N) + \Theta\left(\frac{N^{3/4}}{(\log N)^{3/2}}\right).

So, to (abruptly) summarize: while for s_{+}(N) we are still fairly troubled regarding the correct order of magnitude and the nature of the largest additive Sidon subsets in \left\{1,\ldots,N\right\}, for the multiplicative Sidon story we know quite well what s_{\times}(N) is, and we also know that these extremal sets are intimately connected with the primes.

But what do we know about s_{+}(A) and s_{\times}(A) for other sets of reals A? When A is multiplicatively structured, say A = \left\{2,2^{2},\ldots,2^{n}\right\}, we have that s_{+}(A) = n and s_{\times}(A) = s_{+}(\left\{1,\ldots,n\right\}) = \Theta(n^{1/2}), so the roles are in some sense reversed. Needless to say, this situation resembles quite well the dichotomy that motivated the celebrated sum-product conjecture of Erdos and Szemeredi, which roughly states that no set has a polynomial saving for both the size of the sum set and the size of the product set, over the trivial quadratic upper bound. So perhaps, one can formulate an analogous conjecture.. (and, of course, that’s what we did!).
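As a quick sanity check of the first claim (another throwaway Python snippet; the exponent range is arbitrary):

```python
def is_sidon(s):
    """Check that all pairwise sums a + b with a <= b are distinct."""
    elems = sorted(s)
    sums = [a + b for i, a in enumerate(elems) for b in elems[i:]]
    return len(sums) == len(set(sums))

# A = {2, 2^2, ..., 2^n}: the uniqueness of binary representations forces
# all pairwise sums of powers of two to be distinct, so A is Sidon itself
A = [2 ** i for i in range(1, 11)]
assert is_sidon(A)  # hence s_+(A) = |A| = n for this set
print("ok")
```

The multiplicative half of the dichotomy is immediate from taking logarithms: products of powers of 2 correspond to sums of exponents, so multiplicative quadruples in A correspond to additive quadruples in \left\{1,\ldots,n\right\}.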

Question. (KP) Is it true that for every \epsilon > 0 there exists a constant c = c(\epsilon) > 0 such that

\displaystyle \max\left\{s_{+}(A),s_{\times}(A)\right\} \geq c|A|^{1-\epsilon}

holds for every set of reals A?

It is perhaps important to mention that we couldn’t find this anywhere in the literature, but that we did not look terribly hard for a reference (as I said, we more or less just asked around!). So, not only would we be (very) excited if someone could solve it, but we’d also be quite grateful if anyone knows a reference where something along these lines has already been asked.

In the remaining time/space, I’d like to draw attention towards a particularly tantalizing special case, which seems already quite tricky. Suppose A is a set of reals with small doubling, namely |A+A| = K|A|, where K may possibly depend on |A|. We know that there are such sets with s_{+}(A) = \Theta(|A|^{1/2}), so the question here is: can one prove that we always have s_{\times}(A) = \Omega(|A|^{1-\epsilon}) for every \epsilon > 0?

By using Solymosi’s theorem that every set A satisfies E^{\times}(A) = O\left(|A+A|^2 \log |A|\right), where E^{\times}(A) stands for the number of quadruples (a,b,c,d) \in A^4 with ab=cd and a, b, c, d pairwise distinct, it is not difficult to prove that every set A with |A+A| = K|A| contains a multiplicative Sidon subset of size \Omega_{K}(|A|^{2/3-\epsilon}) for every \epsilon > 0. Indeed, one can take a random subset B of A, obtained by choosing each element of A to be part of B with probability p \in [0,1] independently, and then removing one element from each multiplicative quadruple in B. The end result is some smaller (random) set A' \subset B with no multiplicative quadruples, which on average will have size

\displaystyle \mathbb{E}[|A'|] \geq p|A| - c p^{4} |A+A|^{2}\log |A|,

for some absolute constant c > 0. Thus, there must exist a multiplicative Sidon subset of A of at least that size. Using the fact that |A+A| = K|A| and optimizing for p in terms of |A| then proves the claim. And surprise, surprise, this is more or less the only nontrivial thing that we could say! We think that one can maybe hope to use ideas from the proof of Solymosi’s sum-product theorem more directly rather than the result itself as a blackbox in order to do better than |A|^{2/3}, but we couldn’t find any such argument. Even when K is small enough to allow the application of Freiman’s theorem, we didn’t know how to use the generalized arithmetic progression structure of A to do any better (except perhaps in some situations where one adds some further assumptions on A). So, any ideas would be very welcome!
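For completeness, here is the optimization at the end of this argument spelled out (suppressing the exact values of the constants, and writing n for |A|):

```latex
% Deletion argument, optimized. With n = |A| and |A+A| = Kn:
f(p) = pn - c\,p^{4}K^{2}n^{2}\log n, \qquad
f'(p) = 0 \iff p = \bigl(4cK^{2}n\log n\bigr)^{-1/3},
\qquad
f(p) = \tfrac{3}{4}\,pn
     = \tfrac{3}{4}(4c)^{-1/3}\,\frac{n^{2/3}}{K^{2/3}(\log n)^{1/3}}
     = \Omega_{K}\bigl(|A|^{2/3-\epsilon}\bigr)
```

for every \epsilon > 0, with the chosen p indeed lying in (0,1] once |A| is large enough in terms of c and K.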

That being said, I’d also like to add that a friendlier direction which seems to be still worth investigating is the “dual” regime where |AA|=K|A|, where K is an absolute positive constant. By the multiplicative variant of Freiman’s theorem, A must in this case lie in a multiplicative subgroup of \mathbb{C}^{*} of small rank r = r(K) (bounded only in terms of K), so by quantitative variants of Schmidt’s subspace theorem such as the one due to Amoroso and Viada it is also not too difficult to find “almost Sidon sets” of size \Omega(|A|^{1-\epsilon}). To be more precise, for every \epsilon > 0, there exists some positive integer k=k(\epsilon) and a subset A' of A of size \Omega(|A|^{1-\epsilon}) with the property that each sum of two distinct elements in A' has multiplicity at most k. This also follows from a probabilistic deletion argument in the style of the one above, although it is a bit more technical. Nevertheless, it seems likely that finite subsets of multiplicative subgroups of \mathbb{C}^{*} of small rank might actually have linear size additive Sidon subsets for some “easier” reasons (e.g. the primordial subset \left\{2,2^2,\ldots,2^n\right\}, which is contained in a rank 1 multiplicative subgroup, is Sidon itself due to the uniqueness of binary representations).

Later edit: In a comment, Oliver Roche-Newton already provided a simple construction which shows that the answer to the original question cannot be yes for any \epsilon < 1/4: there exists a set A with no additive/multiplicative Sidon subsets of size greater than |A|^{3/4}. This set however does not have small additive or multiplicative doubling, so the two particular cases described above could still be true. It also still remains to see whether there exists \alpha > 0 such that

\displaystyle \max\left\{s_{+}(A),s_{\times}(A)\right\} \geq c|A|^{1/2+\alpha}

for some absolute constant c>0. This would already be quite interesting (and in the spirit of the sum-product theorem of Erdos and Szemeredi). That being said, it is perhaps also useful to add that

\displaystyle \min\left\{s_{+}(A),s_{\times}(A)\right\} = \Omega\left(|A|^{1/2}\right)

follows from a more general result of Komlos, Sulyok and Szemeredi which implies that s_{+}(A) = \Omega\left(s_{+}\left(\left\{1,\ldots,|A|\right\}\right)\right) holds for all sets of integers A. It is not difficult to see that once we have this over the integers, s_{+}(A) = \Omega(|A|^{1/2}) also holds for all sets of reals A (by using a diophantine approximation argument, for example). Since s_{\times}(A)=s_{+}\left(\left\{\log a\ :\ a \in A\right\}\right) (ignoring small issues), this also implies s_{\times}(A) = \Omega(|A|^{1/2}).

Later edit 2: In the meantime, I’ve also become aware of improved constructions by Oliver Roche-Newton+Audie Warren and (independently) Ben Green+Sarah Peluse which show that the original conjecture cannot be true as stated for any \epsilon < 1/3: there exists a set A with no additive/multiplicative Sidon subsets of size greater than |A|^{2/3}. Moreover, these constructions also have small additive doubling, so they also show the (rather curious) fact that the 2/3 exponent obtained via probabilistic deletion and Solymosi’s theorem which I sketched (for the size of the largest multiplicative Sidon subset in A when |A+A|=K|A|) is actually sharp. More updates to come. (I hope!)

Discrete Mathematics lecture notes

It’s been a while since I posted any mathematics on this blog, but, in my defense, it has been a crazy year! I hope everyone who reads this is well and that all of your friends and families are safe and healthy.

In a first attempt to revive the situation, I’d like to post some lecture notes I finished putting together recently for a somewhat atypical introductory discrete mathematics course I’ve been teaching (online) at Yale this semester. They are inspired by the material in the lovely book of Jiri Matousek and Jaroslav Nesetril entitled “Invitation to Discrete Mathematics” (2nd ed.), but they do not quite follow the same structure and contain many different twists, so I am hoping they will be of interest even to people who already have the book. That being said, it is perhaps important to also add that these lecture notes are not proofread very seriously and more often than not skip various stories and proofs (as an incentive for my students to show up to class!), but it is very likely that I will be updating these in the near future. In any case, I hope you enjoy!

Chapter 1: Introduction to combinatorics

Chapter 2: The Principle of Inclusion-Exclusion

Chapter 3: Intersecting sets and poset madness

Chapter 4: Introduction to graph theory

Chapter 5: Extremal graph / Ramsey theory

Chapter 6: The Probabilistic Method

If, like me, you prefer to see such materials altogether in one single file, you can do this by clicking here. Have fun!

If anyone also happens to be interested in using them for various purposes (e.g. to teach a similar course at some point) and would like to have the source files and other relevant things (e.g. the homeworks), I’d be happy to provide them. Just shoot me an e-mail!

Configurations with many incidences and no triangles

Given a set \mathcal{P} of points and a set \mathcal{L} of lines, both in \mathbb{R}^{2}, an incidence is a pair (p,\ell) \in \mathcal{P} \times \mathcal{L} with p \in \ell, and I(\mathcal{P},\mathcal{L}) denotes the total number of incidences. Answering a question of Erdos and Purdy, Szemeredi and Trotter proved the following important theorem:

\displaystyle I(\mathcal{P},\mathcal{L}) = O\left(|\mathcal{P}|^{2/3}|\mathcal{L}|^{2/3}+|\mathcal{P}|+|\mathcal{L}|\right).

The Szemeredi-Trotter theorem is one of the most remarkable results in incidence geometry not only because it is extremely useful in applications all across combinatorics, but also because it is one of the few optimal incidence theorems available. Indeed, there are two somewhat different constructions achieving the equality (up to constants): one due to Elekes and another one, less well-known, due to Erdos himself.

In this blog post, we will discuss an interesting result of Solymosi, which showcases something that these two constructions have in common (and as a matter of fact, something that all configurations with \Theta\left(|\mathcal{P}|^{2/3}|\mathcal{L}|^{2/3}+|\mathcal{P}|+|\mathcal{L}|\right) incidences have in common).

Theorem. (Solymosi) Let \mathcal{P} be a set of points and let \mathcal{L} be a set of lines, both in \mathbb{R}^{2}. Suppose that

\displaystyle I(\mathcal{P},\mathcal{L})=\Theta\left(|\mathcal{P}|^{2/3}|\mathcal{L}|^{2/3}+|\mathcal{P}|+|\mathcal{L}|\right).

Then, there must exist three non-collinear points in \mathcal{P}, with the property that every pair of points among them determines a line which is from the set \mathcal{L}.

From now on, we will also not-so-surprisingly refer to such local configurations simply as triangles, since that’s what they actually are in the plane. Note, however, that this could easily become confusing if one is not careful: graph theoretically, these configurations represent copies of a cycle of length six in the incidence graph on \mathcal{P} \times \mathcal{L}. Solymosi’s theorem can therefore be rephrased as saying that for every set of points \mathcal{P} and every set of lines \mathcal{L} which do not determine any triangles, we have that

\displaystyle I(\mathcal{P},\mathcal{L})=o\left(|\mathcal{P}|^{2/3}|\mathcal{L}|^{2/3}+|\mathcal{P}|+|\mathcal{L}|\right).

We will actually not dwell too much on the proof of Solymosi’s theorem, since in this post I would like to instead focus on a lower bound construction. Nonetheless, I’d like to at least record the strategy, since the idea is very simple and interesting. As a starting point, recall that the Szemeredi-Trotter theorem can be proved in a few ways by amplifying the trivial bounds I(\mathcal{P},\mathcal{L}) \leq |\mathcal{P}| + |\mathcal{L}|^{2} and I(\mathcal{P},\mathcal{L}) \leq |\mathcal{L}| + |\mathcal{P}|^{2}. Most famously, this has been done in an absolutely beautiful indirect manner by Szekely via the so-called crossing number inequality; more recently, this has also been achieved in various other more practical ways by using partitioning theorems (such as the Guth-Katz polynomial partitioning theorem, but also older partitioning results such as the simplicial cutting theorem of Matousek (used also by Solymosi); see for instance these lecture notes of Guth or this blog post of Tao). Solymosi’s main observation is that under the additional assumption that the incidence graph on \mathcal{P} \times \mathcal{L} contains no copy of a cycle of length six, one can improve on the trivial inequalities above even before the amplification step. More precisely, one can show for instance that I(\mathcal{P},\mathcal{L}) \leq 2|\mathcal{P}| + o(|\mathcal{L}|^{2}) as follows.

First, let \mathcal{P}' be the set of points in \mathcal{P} which lie on at most two lines from \mathcal{L}; note that

\displaystyle I(\mathcal{P},\mathcal{L}) = I(\mathcal{P}',\mathcal{L}) + I(\mathcal{P} \backslash \mathcal{P}',\mathcal{L}) \leq 2|\mathcal{P}| + I(\mathcal{P} \backslash \mathcal{P}',\mathcal{L}).

Then, consider the graph G(\mathcal{L}) with vertex set \mathcal{L} and with edges defined as follows: for each point x \in \mathcal{P} \backslash \mathcal{P}', consider the set of lines \mathcal{L}(x) which contain x, and cover the vertices corresponding to \mathcal{L}(x) in G(\mathcal{L}) by edge-disjoint triangles (up to a small error). The number of edges in G(\mathcal{L}) thus constructed is then roughly the same as the number of incidences I(\mathcal{P} \backslash \mathcal{P}',\mathcal{L}), so if for some reason I(\mathcal{P} \backslash \mathcal{P}',\mathcal{L}) = \Omega(|\mathcal{L}|^{2}), this would mean that G(\mathcal{L}) has \Omega(|\mathcal{L}|^2) edge-disjoint triangles. By the triangle removal lemma, having this many edge-disjoint triangles would further imply that our graph G(\mathcal{L}) must actually have \Omega(|\mathcal{L}|^3) triangles in total, and so some of these triangles would most definitely have to come from different points in \mathcal{P}, by design. It is easy to see that this would immediately be in conflict with the hypothesis that there are no cycles of length six in the incidence graph \mathcal{P} \times \mathcal{L}. Therefore, it follows that I(\mathcal{P} \backslash \mathcal{P}',\mathcal{L}) = o(|\mathcal{L}|^{2}), which gives the claim above. Amplifying this bound instead of the trivial bound, one can then recover Solymosi’s theorem.

We will now move on to describe a construction of a set of points and a set of lines with many incidences and no triangles, which shows that Solymosi’s theorem is in some sense sharp in the regime when |\mathcal{P}| \gg |\mathcal{L}|^{2-\epsilon} or |\mathcal{L}| \gg |\mathcal{P}|^{2-\epsilon} for every \epsilon > 0. The construction is in the style of Elekes’s construction for the original Szemeredi-Trotter theorem, but with an additive combinatorics twist to force out triangles.

A set of positive integers A is called a non-averaging set of order k if for each 1 \leq u,v \leq k, the equation

\displaystyle u x_{1} + v x_{2} = (u+v)x_{3}

has no solutions with x_{1}, x_{2}, x_{3} all in A and all pairwise distinct. Furthermore, let s_{k}(n) denote the size of the largest non-averaging set of order k inside the interval \left\{1,\ldots,n\right\}. In particular, note that s_{1}(n) \leq r_{3}(n), where the latter denotes as usual the size of the largest three-term progression free set inside \left\{1,\ldots,n\right\}. It is well-known that there exists some absolute constant c > 0 such that for every sufficiently large n one has

\displaystyle r_{3}(n) \geq n \exp{(-c \sqrt{\log n})},

thanks to a celebrated construction due to Behrend. Interestingly, it is not too hard to modify this construction to show that \left\{1,\ldots,n\right\} also contains very large non-averaging sets of order k, for every k which is reasonably small compared to n. Indeed, the following nice observation of Stanchescu holds.

Theorem (Stanchescu). For every k \geq 1, there exists some absolute constant c>0 such that

\displaystyle s_{k}(n) \geq n \exp{(-c \log k \sqrt{\log n})}

for every sufficiently large n.
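Since the definition of a non-averaging set has a few quantifiers in it, here is a brute-force checker (a small illustrative Python snippet; the example sets below are my own choices and have no special significance):

```python
from itertools import permutations

def is_non_averaging(A, k):
    """No solutions to u*x1 + v*x2 == (u+v)*x3 with 1 <= u, v <= k
    and x1, x2, x3 pairwise distinct elements of A."""
    for u in range(1, k + 1):
        for v in range(1, k + 1):
            for x1, x2, x3 in permutations(A, 3):
                if u * x1 + v * x2 == (u + v) * x3:
                    return False
    return True

assert not is_non_averaging([1, 2, 3], 1)  # 1 + 3 = 2*2 is a 3-term progression
assert is_non_averaging([1, 2, 7], 3)      # no weighted average lands back in the set
print("ok")
```

(Order 1 is exactly the 3-term-progression-free condition mentioned above.)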

We can now finally get to the construction. Let r and s denote some large positive integers, with a precise relationship between them to be specified later. For now, we shall only think of r as being much larger than s. Let A then be a non-averaging set of order s inside \left\{1,\ldots,r\right\} and satisfying

\displaystyle |A| \geq r \exp{(-c \log s \sqrt{\log r})}

for some absolute constant c > 0. Define the set of points \mathcal{P} \subset \mathbb{R}^{2} by

\displaystyle \mathcal{P} = \left\{(x,y):\ x \in A, y \in \left\{1,\ldots,2rs\right\}\right\}.

Let \ell_{m,n} be the line in \mathbb{R}^{2} defined by the equation y=mx+n, and let \mathcal{L} be the set of lines defined by

\displaystyle \mathcal{L} = \left\{\ell_{m,n}: m \in \left\{1,\ldots,s\right\}, n \in \left\{1,\ldots,rs\right\}\right\}.

Note that |\mathcal{P}| = 2|A|rs, whereas |\mathcal{L}| = rs^{2}. Furthermore, note that for every line \ell_{m,n} in \mathcal{L} and every x \in A, there is a unique element mx + n in \left\{1,\ldots,2rs\right\} such that (x,mx+n) lies on \ell_{m,n}. In particular, each line \ell_{m,n} determines at least |A| incidences with \mathcal{P}, so I(\mathcal{P},\mathcal{L}) \geq |A|rs^{2}.
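Here is a toy instance of the point/line counts just described (a Python sketch; A is taken to be all of \left\{1,\ldots,r\right\}, since at this stage only the incidence count matters and the non-averaging property plays no role yet):

```python
# Toy instance of the configuration: r = 4, s = 2, A = {1,...,r}
r, s = 4, 2
A = range(1, r + 1)
points = {(x, y) for x in A for y in range(1, 2 * r * s + 1)}
lines = [(m, n) for m in range(1, s + 1) for n in range(1, r * s + 1)]

# every line y = m x + n meets the point set in exactly |A| points,
# since 1 < m*x + n <= s*r + r*s = 2rs for each x in A
incidences = sum((x, m * x + n) in points for (m, n) in lines for x in A)
assert len(points) == 2 * len(A) * r * s and len(lines) == r * s ** 2
assert incidences == len(A) * len(lines)
print(incidences)
```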

In particular, if say s \ll \exp((\log r)^{c}) for some c < 1/2, then |A| \gg r^{1-\epsilon} for every \epsilon > 0, and

\displaystyle I(\mathcal{P},\mathcal{L}) \geq |A|rs^{2} \gg r^{2-\epsilon}s^{2} \gg (r^{3}s^{3})^{\frac{2}{3}-\frac{\epsilon}{3}} \geq (|\mathcal{P}||\mathcal{L}|)^{\frac{2}{3}-\frac{\epsilon}{3}}.

We next verify that in this configuration there are no three non-collinear points in \mathcal{P} with each pair lying on a line from \mathcal{L}. We argue this by contradiction. Suppose there are three lines \ell_{1},\ell_{2},\ell_{3} in \mathcal{L} and three points p_{1},p_{2},p_{3} in \mathcal{P} such that their vertices induce a cycle of length six in \mathcal{P} \times \mathcal{L}, and the points p_1, p_2, p_3 are not collinear. Without loss of generality, suppose the lines \ell_{1} and \ell_{2} meet at p_{1} := (x_1,y_1), the lines \ell_{2} and \ell_{3} meet at p_{2} := (x_2,y_2), and the lines \ell_{3} and \ell_{1} meet at p_{3} := (x_3,y_3).

Also, for each i \in \left\{1,2,3\right\}, let \ell_{i} be described by the equation y=m_{i}x+n_{i}, and without loss of generality assume that m_{1} > m_{2} > m_{3} (the slopes must be pairwise distinct, since each pair of lines intersects). It is easy to check that by definition of p_{1}, we have that (m_{1}-m_{2})x_{1} + n_{1}-n_{2}=0, and we also have similar equations for p_{2} and p_{3}.

Adding up these equations, it follows that

\displaystyle (m_{1}-m_{2})x_{1} + (m_{2}-m_{3})x_{2}=(m_{1}-m_{3})x_{3}.

But x_{1},x_{2},x_{3} are all in A, and recall that A is a non-averaging set of order s (note that 1 \leq m_{1}-m_{2}, m_{2}-m_{3} \leq s), so x_{1},x_{2},x_{3} cannot be pairwise distinct; moreover, if any two of them coincide, the equation above forces x_{1}=x_{2}=x_{3}, so the three points lie on a common vertical line. This is a contradiction, since we assumed that the points p_1, p_2, p_3 are not collinear.
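To see the whole argument in action, here is a brute-force verification of triangle-freeness on a tiny instance (a Python sketch; the set A = \{1,2,7\} is non-averaging of order 3, which can be checked by hand or with the checker from earlier, and r = 7 is just small enough to enumerate all triples of lines):

```python
from itertools import combinations

r, s = 7, 3
A = [1, 2, 7]  # non-averaging of order 3: no u*x1 + v*x2 = (u+v)*x3, u, v <= 3
points = {(x, y) for x in A for y in range(1, 2 * r * s + 1)}
lines = [(m, n) for m in range(1, s + 1) for n in range(1, r * s + 1)]

def meet(l1, l2):
    """Intersection of y = m1 x + n1 and y = m2 x + n2, if it has integer x."""
    (m1, n1), (m2, n2) = l1, l2
    if m1 == m2 or (n2 - n1) % (m1 - m2) != 0:
        return None
    x = (n2 - n1) // (m1 - m2)
    return (x, m1 * x + n1)

triangles = 0
for l1, l2, l3 in combinations(lines, 3):
    ps = [meet(l1, l2), meet(l2, l3), meet(l3, l1)]
    if any(p is None or p not in points for p in ps) or len(set(ps)) < 3:
        continue
    (x1, y1), (x2, y2), (x3, y3) = ps
    if (x2 - x1) * (y3 - y1) != (x3 - x1) * (y2 - y1):  # non-collinear
        triangles += 1
assert triangles == 0
print("no triangles")
```

With s = 3, any triangle must use all three available slopes, and the key identity above reduces to x_1 + x_2 = 2x_3, which has no solution in pairwise distinct elements of \{1,2,7\}.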

It would be interesting to see in the future (maybe in a future blog post!) whether the triangle removal lemma could be combined instead with the crossing number inequality proof of Szemeredi-Trotter (rather than the cutting method proof) to give an alternate proof for

\displaystyle I(\mathcal{P},\mathcal{L})=o\left(|\mathcal{P}|^{2/3}|\mathcal{L}|^{2/3}+|\mathcal{P}|+|\mathcal{L}|\right).

Another natural question also seems to be whether one can improve quantitatively on the bound above by proving new bounds for the triangle removal lemma which are specific to this geometric setting. This of course may be difficult since the only data point where this type of thing has been accomplished so far is the arithmetic setup in vector spaces over finite fields.

On the Caro-Tuza-Wei inequality

I’d like to start off this blog with a post that is written jointly with my good friend Fedor Petrov.

To set things up, recall that a k-uniform hypergraph H is a pair (V(H),E(H)), where V(H) is a finite set of vertices and E(H) is a family of k-element subsets of V(H). When k=2, H is simply called a graph. For v \in V(H), its degree in H, which we will denote by \deg_{H}(v), is defined to be the number of edges in E(H) which contain v. A subset S\subset V(H) is an independent set of the hypergraph H if there is no edge in E(H) all of whose k elements lie in S. Equivalently, this is sometimes written as {S \choose k} \cap E(H) = \emptyset. Furthermore, the independence number of H, denoted by \alpha(H), represents the maximum size of an independent set in H, and lower bounding it in terms of the degrees of the vertices in H is the main subject of this post.

Estimating the independence number of (hyper)graphs under various hypotheses is a classical and important topic in graph theory, since many extremal combinatorics problems can be rephrased as inequalities about \alpha(H) for carefully defined hypergraphs H. We will not aim to give a complete background for this general problem, so we refer the reader to The Probabilistic Method by Alon and Spencer or a nice paper of Dutta, Mubayi, and Subramanian for more appropriate accounts. The story I’d like to tell here begins with the beautiful result of Caro and Wei, who independently proved that for every graph G,

\displaystyle \alpha(G) \geq \sum_{v \in V} \frac{1}{\deg(v) + 1}.

In particular, after an application of the Cauchy-Schwarz inequality, this implies the well-known Turan’s theorem about the largest number of edges a graph without a complete subgraph of a fixed size can have. Before the Caro-Wei theorem, Spencer extended Turan’s theorem to k-uniform hypergraphs, so extending the above inequality to k-uniform hypergraphs (and thus improving Spencer’s theorem) became a natural next step. Indeed, Caro and Tuza were able to prove the following generalization, which I will take the liberty to refer to as the Caro-Tuza-Wei inequality.

Theorem 1 (Caro-Tuza). Let k \geq 1 be an integer and let H=(V,E) be a (k+1)-uniform hypergraph. Then,

\displaystyle \alpha(H)\geqslant \sum_{v\in V} \frac{1}{{\deg(v)+1/k\choose\deg(v)}}.

For k=1 this recovers the Caro-Wei theorem, but the proof is much more involved. In this post, we will give a short probabilistic proof of this theorem, by extending the celebrated proof of the Caro-Wei theorem from The Probabilistic Method by Alon and Spencer [Theorem 1, page 91]. It is important to mention that this is not the first time this kind of idea has been attempted. In fact, in the paper I mentioned above, Dutta, Mubayi and Subramanian generalized the proof of Caro-Wei from Alon and Spencer in an interesting (but rather complicated) way to show that there exists some positive constant d_{k} such that for every (k+1)-uniform hypergraph H, we have

\displaystyle \alpha(H) \geq d_{k} \sum_{v \in V(H)} \frac{1}{(\deg(v) + 1)^{1/k}}.

This is an easy (and valuable) consequence of the Caro-Tuza theorem, but it seems that the method of Dutta, Mubayi and Subramanian does not quite yield the precise version of Theorem 1. We present an alternative (simpler) approach which does.
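To get a feel for the numbers in Theorem 1, here is a small numerical comparison on the Fano plane, viewed as a 3-uniform hypergraph (so k = 2). The code is a quick illustrative sketch of my own, with the generalized binomial coefficient computed via the Gamma function:

```python
from itertools import combinations
from math import gamma, isclose

def caro_tuza_bound(n, edges, k):
    """RHS of Theorem 1: sum over vertices of 1 / C(deg(v) + 1/k, deg(v))."""
    deg = [0] * n
    for e in edges:
        for v in e:
            deg[v] += 1
    binom = lambda d: gamma(d + 1 / k + 1) / (gamma(d + 1) * gamma(1 / k + 1))
    return sum(1 / binom(d) for d in deg)

def alpha(n, edges):
    """Brute-force independence number of the hypergraph."""
    for size in range(n, 0, -1):
        if any(not any(set(e) <= set(S) for e in edges)
               for S in combinations(range(n), size)):
            return size
    return 0

# Fano plane: 7 vertices, 7 triples, every vertex has degree 3
fano = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5)]
bound = caro_tuza_bound(7, fano, k=2)
assert isclose(bound, 3.2)     # 7 / C(3.5, 3) = 7 / 2.1875
assert alpha(7, fano) == 4     # e.g. {3, 4, 5, 6} contains no line
assert bound <= alpha(7, fano)
print(bound)
```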

Probabilistic proof of the Caro-Tuza-Wei Inequality

For each v \in V, let \xi_v be an independent uniform random variable in [0,1]. Let \xi denote the |V|-tuple (\xi_{v})_{v \in V} \in [0,1]^{|V|}. For every edge e\in E in our (k+1)-uniform hypergraph H, remove a vertex v from e for which \xi_v=\max_{u\in e} \xi_u. Let S_{\xi} denote the remaining set of vertices. It is easy to check that S_{\xi} is an independent set in H. We claim that the expected value of |S_{\xi}| over all choices of \xi = (\xi_{v})_{v \in V} in [0,1]^{|V|} is at least the RHS expression from Theorem 1.

Let’s check this. For each vertex v \in V, we estimate the probability that v remains in S_{\xi}. Denote the degree of v by m. For a fixed value \tau \in [0,1] of \xi_v, and a given edge e containing v, the probability that e does not remove v is equal to the probability that there exists u\in e \setminus \{v\} with \xi_u>\tau, which equals 1-\tau^k. Clearly these events for different edges have positive correlation: if some of them hold, it may only help the others to hold. For a linear (k+1)-uniform hypergraph H (which is a (k+1)-uniform hypergraph with the extra property that every two edges have at most one common vertex) they are truly independent. Thus the probability that they hold simultaneously is not less than (1-\tau^k)^{m}, with equality in the linear case. Therefore, the probability that v survives in S_{\xi} is at least the integral \int_0^1 (1-\tau^k)^m d\tau, which we can compute as follows. After the change of variables \theta = \tau^{k}, note that

\displaystyle \int_0^1 (1-\tau^k)^m d\tau = \frac1k\int_0^1 (1-\theta)^m\theta^{1/k-1}d\theta = \frac1k \cdot B(m+1,1/k),

where the function B(x,y) stands for the usual beta function

\displaystyle B(x,y) = \int_{0}^{1} \Theta^{x-1}(1-\Theta)^{y-1} d\Theta,

defined for all complex numbers x and y with positive real parts. If

\displaystyle \Gamma(z) = \int_{0}^{\infty} \Theta^{z-1}e^{-\Theta} d\Theta

denotes the standard gamma function, then it is well-known and easy to check that

\displaystyle B(x,y) = \frac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)}.

Indeed, one can write

\displaystyle \Gamma(x)\Gamma(y)= \int_{0}^{\infty} \Theta^{x-1}e^{-\Theta}d\Theta \int_{0}^{\infty} \theta^{y-1}e^{-\theta}d\theta = \int_{0}^{\infty}\int_{0}^{\infty}\Theta^{x-1}\theta^{y-1}e^{-(\Theta+\theta)}d\Theta d\theta.

Using the substitutions \Theta = vt and \theta=v(1-t), we have

\displaystyle \Gamma(x)\Gamma(y) = \int_{0}^{1}t^{x-1}(1-t)^{y-1}dt \int_{0}^{\infty} v^{x+y-1}e^{-v}dv.

The latter expression is precisely B(x,y)\Gamma(x+y). It thus follows that

\displaystyle \frac{1}{k} \cdot B(m+1,1/k) = \frac{1}{k} \cdot \frac{\Gamma(m+1)\Gamma(1/k)}{\Gamma(m+1+1/k)} = \frac{1}{{m+1/k \choose m}}.

Putting these together, we conclude that for each v \in V,

\displaystyle \mathbb{P}\left(v\ \text{survives in}\ S_{\xi}\right) \geq \int_0^1 (1-\tau^k)^m d\tau = \frac1{{m+1/k\choose m}}.

By linearity of expectation, it then follows that

\displaystyle \mathbb{E}_{\xi \in [0,1]^{|V|}}[|S_{\xi}|] = \sum_{v \in V} \mathbb{P}\left(v\ \text{survives in}\ S_{\xi}\right) \geq \sum_{v\in V} \frac{1}{{\deg(v)+1/k\choose\deg(v)}},

so there is some choice of \xi = (\xi_{v})_{v \in V} \in [0,1]^{|V|} for which the corresponding independent set S:=S_{\xi} has size at least \sum_{v\in V} \frac{1}{{\deg(v)+1/k\choose\deg(v)}}. This concludes the proof.
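Finally, here is a quick numerical sanity check of the integral computation above (a Python sketch; the parameter values are arbitrary):

```python
from math import gamma, isclose

def survival_integral(m, k, steps=100000):
    """Midpoint-rule approximation of the integral of (1 - t^k)^m over [0,1]."""
    h = 1.0 / steps
    return h * sum((1 - ((i + 0.5) * h) ** k) ** m for i in range(steps))

def closed_form(m, k):
    """1 / C(m + 1/k, m), expanded via the Beta/Gamma identity from the post."""
    return gamma(m + 1) * gamma(1 / k + 1) / gamma(m + 1 + 1 / k)

for m, k in [(3, 2), (5, 3), (10, 4)]:
    assert isclose(survival_integral(m, k), closed_form(m, k), rel_tol=1e-6)
print("ok")
```

For instance, with m = 3 and k = 2 both sides equal 16/35, which one can also verify by expanding (1-\tau^2)^3 and integrating term by term.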