FFT and NTT

Multiplication of polynomials as list of coefficients.

Consider:
$$
\begin{align*}
A(x) &= a_0 + a_1 x + \cdots + a_n x^n = \sum_{i=0}^{n}a_i x^i \\
B(x) &= b_0 + b_1 x + \cdots + b_n x^n = \sum_{i=0}^{n}b_i x^i
\end{align*}
$$
To calculate:
$$
\begin{align*}
C(x) &= A(x) B(x) = c_0 + c_1 x + \cdots + c_{2n}x^{2n} =\sum_{i=0}^{2n}c_i x^i \\
\text{where }c_k &= a_0 b_k + a_1 b_{k-1} + \cdots + a_k b_0 = \sum_{i=0}^k a_i b_{k-i}
\end{align*}
$$

Complexity: For $c_k \rightarrow O(k)$, for $C(x) \rightarrow O(n^2)$

Can we do it faster? Like $O(n \log n)$?

Multiplication of polynomials as list of values.

Consider: A degree-$n$ polynomial is uniquely characterized by its values at any $n+1$ distinct points $x_0, x_1, \cdots, x_d$.

That is, if we fix any distinct $n+1$ points, we can specify any degree-$n$ polynomial $A(x) = a_0 + a_1 x + \cdots + a_d x^n$ by the $n+1$ values:
$$
A(x_0), A(x_1), \cdots, A(x_n)
$$
Then we can have:
$$
\begin{align*}
C(x) &= A(x) B(x) \\
C(x_i) &= A(x_i) B(x_i) \\
\text{where } i&=0,1,\cdots,2n
\end{align*}
$$
Polynomial multiplication is $O(n)$ using this representation!

But how can we get the coefficients of $C(x)$ ?

Evaluation and Interpolation

Evaluation: $a_0, a_1, \cdots, a_n \rightarrow A(x_0), A(x_1), \cdots, A(x_n)$

$$
A(x) = M_n(x) \cdot a \\
\begin{bmatrix}
A(x_0) \\
A(x_1) \\
\vdots \\
A(x_{n})
\end{bmatrix}=
\begin{bmatrix}
1 & x_0 & x_0^2 & \cdots & x_0^n \\
1 & x_1 & x_1^2 & \cdots & x_1^n \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^n
\end{bmatrix}
\cdot
\begin{bmatrix}
a_0 \\
a_1 \\
\vdots \\
a_{n}
\end{bmatrix}
$$

Interpolation: $A(x_0), A(x_1), \cdots, A(x_n) \rightarrow a_0, a_1, \cdots, a_n$
$$
a = M_n^{-1}(x) \cdot A(x) \\
\begin{bmatrix}
a_0 \\
a_1 \\
\vdots \\
a_n
\end{bmatrix}=
\begin{bmatrix}
1 & x_0 & x_0^2 & \cdots & x_0^n \\
1 & x_1 & x_1^2 & \cdots & x_1^n \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^n
\end{bmatrix} ^{-1}
\cdot
\begin{bmatrix}
A(x_0) \\
A(x_1) \\
\vdots \\
A(x_n)
\end{bmatrix}
$$

Here $M_n$ is Vandermonde matrix.

Consider $C(x_i) = A(x_i) B(x_i)$, from the two matrix multiplication we can know that: If we have every $A(x_i)$ and $B(x_i)$ we can use $O(n)$ to multiply and get $C(x_i)$, then transfer it into every coefficient $c_i$.

But, using matrix multiplication directly will need time complexity of $O(n^2)$. Can we optimize it to $O(n \log n)$? Yes, and here comes Fast Fourier Tranform (FFT) and Number Theoretic Tranform (NTT).

FFT

Let’s calculate evaluation using FFT first.

Consider a polynomial $P(x)$ with length of even number $n$ s.t.
$$
\begin{align*}
P(x) &= a_0 + a_1 x + a_2 x^2 + \cdots + a_{n-1} x^{n-1} \\
&= P_{\text{even}}(x^2) + x \cdot P_{\text{odd}}(x^2) \\
\text{where } P_{\text{even}}(x) &= a_0 + a_2 x + a_4 x^2 + \cdots + a_{n-2} x^{n/2} \\
P_{\text{odd}}(x) &= a_1 + a_3 x + a_5 x^2 + \cdots + a_{n-1} x^{n/2}
\end{align*}
$$

By doing this, we divide the the length $n$ to two length $n/2$. Then, we have to conquer them each and then combine them together.

Let $A_k$ denote $A(x_k)$.

The question is: How can we divide $A_k$?

And here’s the most essencial part: We use primitive root $\omega$.

Now, think about an complex equation:
$$
z^n = (re^{i\theta})^n = 1
$$
We can get the solutions:
$$
\begin{align*}
r &= 1 \\
\theta &= \frac{2k\pi}{n} \\
\text{where } k &= 0, 1, \cdots, n-1
\end{align*}
$$
Thus,
$$
\begin{align*}
\omega_n &= e^{\frac{2\pi}{n}i} \\
\omega_n^k &= e^{\frac{2\pi}{n}ki} = e^{i\theta} \\
(\omega_n^k)^n &= 1
\end{align*}
$$
To each $k$, we want to calculate:
$$
\begin{align*}
A_k &= P(\omega_n^k) \\
&= P_{\text{even}}(\omega_n^{2k}) + \omega_n^k \cdot P_{\text{odd}}(\omega_n^{2k}) \\
&= P_{\text{even}}(\omega_{n/2}^{k}) + \omega_n^k \cdot P_{\text{odd}}(\omega_{n/2}^{k})
\end{align*}
$$
Since
$$
\omega_n^{2k} = e^{\frac{2\pi}{n} 2ki} = e^{\frac{2\pi}{n/2} ki} = \omega_{n/2}^k
$$
This is called the Discrete Fourier Tranform (DFT) of $A_k$.

And we found that:
$$
\begin{align*}
A_{k+n/2} &= P(\omega_n^{k+n/2}) \\
&= P_{\text{even}}(\omega_{n/2}^{k+n/2}) + \omega_n^{k+n/2} \cdot P_{\text{odd}}(\omega_{n/2}^{k+n/2}) \\
&= P_{\text{even}}(\omega_{n/2}^{k} \cdot \omega_{n/2}^{n/2}) + \omega_n^k \cdot \omega_n^{n/2} \cdot P_{\text{odd}}(\omega_{n/2}^{k} \cdot \omega_{n/2}^{n/2}) \\
&= P_{\text{even}}(\omega_{n/2}^{k}) - \omega_n^k \cdot P_{\text{odd}}(\omega_{n/2}^{k})
\end{align*}
$$
Since
$$
\omega_n^n = e^{\frac{2\pi}{n}ni} = e^{2\pi i} = 1 \\
\omega_n^{n/2} = e^{\frac{2\pi}{n} (n/2)i} = e^{\pi i} = -1
$$

Combining $A_k$ and $A_{k+n/2}$ together for $0 \leq k < n/2$, we can do D&C here! Since $P_{\text{even}}(\omega_{n/2}^{k})$ and $P_{\text{odd}}(\omega_{n/2}^{k})$ can calculate from the half DFT.

The time complexity is $T(n) = 2T(n/2) + O(n) = O(n \log n)$. We get the whole evaluation.

IFFT

To do interpolation, we need Inverse Fast Fourier Tranfrom (IFFT).

Inverse FFT is easy to explain using Linear Algebra.

It’s easy since:
$$
a = M_n^{-1}(\omega) \cdot A(\omega) = \frac{1}{n} M_n(\omega^{-1}) \cdot A(\omega)
$$
And it’s over.

Let $A(\omega) = \text{FFT}(a, \omega)$, then $a = \frac{1}{n} \text{FFT}(A, \omega^{-1})$.

The Whole Progress

Filled $A(x)$ and $B(x)$ to length of $n=2^k$ by $0$ due to easier FFT D&C.
Calculate $A(\omega^k)$ and $B(\omega^k)$ for each $k=0,1,\cdots,n-1$, where $\omega = e^{\frac{2\pi}{n}i}$ using FFT.
Calculate $C(\omega^k) = A(\omega^k) \cdot B(\omega^k)$.
Calculate $c_i$ using IFFT.

C++ Code (FFT)

#include <bits/stdc++.h>
#define MAXN 3000001
#define LL long long
#define INF 0x7fffffff
using namespace std;
const LL MOD = 998244353;
const double Pi=acos(-1.0);

int l, r[MAXN];
int limit = 1;

vector <complex <double>> FFT(vector <complex <double>> F, int type) {
    for(int i = 0; i < limit; i++) 
        if(i < r[i]) swap(F[i], F[r[i]]);
    for(int mid = 1; mid < limit; mid <<= 1) {
        complex <double> Wn(cos(Pi/mid), type*sin(Pi/mid));
        for(int R = mid<<1, j = 0; j < limit; j += R) {
            complex <double> w(1, 0);
            for(int k = 0; k < mid; k++, w = w*Wn) {
                complex <double> x = F[j+k], y = w*F[j+mid+k];
                F[j+k] = x + y;
                F[j+mid+k] = x - y;
            }
        }
    }
    return F;
}

vector <complex <double>> polyMul(vector <complex <double>> A, vector <complex <double>> B) {
    limit = 1, l = 0;
    int n = A.size()-1, m = B.size()-1;
	while(limit <= n + m) limit <<= 1, l++;
    A.resize(limit+1);
    B.resize(limit+1);
	for(int i = 0; i < limit; i++) r[i] = (r[i>>1]>>1)|((i&1)<<(l-1)) ;
	A = FFT(A, 1); 
    B = FFT(B, 1);
    vector <complex <double>> C(limit+1);
	for(int i = 0; i < limit; i++) C[i] = A[i] * B[i];
	C = FFT(C, -1);
	for(int i = 0; i <= n + m; i++) C[i] = (int)(C[i].real()/limit + 0.5);
    return C;
}

int main() {
    int n, m;
	cin >> n >> m;
    vector <complex <double>> A(n+1), B(m+1);
	for(int i = 0; i <= n; i++) cin >> A[i];
	for(int i = 0; i <= m; i++) cin >> B[i];
    // cout << "FUCK" << endl;
	vector <complex <double>> C = polyMul(A, B);
	for(int i = 0; i <= n + m; i++) cout << int(C[i].real()) << " ";
	cout << endl;
    return 0;
}

NTT

Motivation

Still, we want something that can support:

$$
\begin{align*}
P(x) &= a_0 + a_1 x + a_2 x^2 + \cdots + a_{n-1} x^{n-1} \\
&= P_{\text{even}}(x^2) + x \cdot P_{\text{odd}}(x^2) \\
\text{where } P_{\text{even}}(x) &= a_0 + a_2 x + a_4 x^2 + \cdots + a_{n-2} x^{n/2} \\
P_{\text{odd}}(x) &= a_1 + a_3 x + a_5 x^2 + \cdots + a_{n-1} x^{n/2}
\end{align*}
$$

and

$$
\begin{align*}
A_{k+n/2} &= P(\omega_n^{k+n/2}) \\
&= P_{\text{even}}(\omega_{n/2}^{k+n/2}) + \omega_n^{k+n/2} \cdot P_{\text{odd}}(\omega_{n/2}^{k+n/2}) \\
&= P_{\text{even}}(\omega_{n/2}^{k} \cdot \omega_{n/2}^{n/2}) + \omega_n^k \cdot \omega_n^{n/2} \cdot P_{\text{odd}}(\omega_{n/2}^{k} \cdot \omega_{n/2}^{n/2}) \\
&= P_{\text{even}}(\omega_{n/2}^{k}) - \omega_n^k \cdot P_{\text{odd}}(\omega_{n/2}^{k})
\end{align*}
$$

This is based on
$$
\omega_n^n = 1 \\
\omega_n^{n/2} = -1
$$

This can make us do $O(n \log n)$ D&C, which is fast.

Modular Primitive Roots

We pick a prime $p$ such that
$$
p = k \cdot n + 1
$$

So that
$$
g^n \equiv 1 \pmod{p}
$$

has exactly $n$ distinct solutions. Here $g$ is a primitive root modulo $p$.

Thus, we can define the primitive $n$-th root of unity modulo $p$
$$
\omega_n = g^{(p-1)/n} \pmod{p}
$$

Which satisfies:
$$
\begin{align*}
\omega_n^n &\equiv 1 \pmod{p} \\
\omega_n^k &\not\equiv 1 \pmod{p} \\ &\text{for}\ \ 0 < k < n \\
\omega_n^{n/2} &\equiv -1 \pmod{p}
\end{align*}
$$

And we can evaluate:
$$
A_k = P(\omega_n^k) \pmod{p}
$$

NTT Constant

We usually use $p = 998244353$ and $g = 3$ for NTT. Because
$$
998244353 = 119 \cdot 2^{23} + 1
$$

And $3$ is the primitive root modulo $998244353$, which means

The detail of understanding why $\omega_n^{n/2} \equiv -1 \pmod{p}$ needs knowledge of multipicative group, and why $3$ is the primitive root modulo $998244353$ needs knowledge of Euler’s Function. Quite hard for this section.

INTT

Just like FFT, inverse NTT is obtained by using the inverse primitive root $\omega^{-1}$ and multiplying by $n^{-1} \pmod{p}$

Let
$$
A(\omega) = \text{NTT}(a, \omega)
$$

Then the inverse is
$$
a = \frac{1}{n} \text{NTT}(A, \omega^{-1}) \pmod{p}
$$

C++ Code (NTT)

The whole progress is highly similar to FFT, so the code is based on FFT’s code.

#include <bits/stdc++.h>
#define MAXN 3000001
#define LL long long
#define INF 0x7fffffff
using namespace std;
const LL MOD = 998244353;
const LL G = 3;

LL fastPow(LL a, LL n, LL m = MOD) {
	LL ans = 1;
	while(n) {
		if(n & 1) ans *= a, ans %= m;
		a *= a, a %= m, n >>= 1;
	}
	return ans % m;
}

int l, r[MAXN];
int limit = 1;

vector <LL> NTT(vector <LL> F, int type) {
    for(int i = 0; i < limit; i++) 
        if(i < r[i]) swap(F[i], F[r[i]]);
    for(int mid = 1; mid < limit; mid <<= 1) {
        LL Wn = fastPow(G, (MOD-1)/(mid<<1));
        if(type == -1) Wn = fastPow(Wn, MOD-2);
        for(int R = mid<<1, j = 0; j < limit; j += R) {
            LL w = 1;
            for(int k = 0; k < mid; k++, w = w*Wn % MOD) {
                LL x = F[j+k], y = w*F[j+mid+k] % MOD;
                F[j+k] = (x+y) % MOD;
                F[j+mid+k] = (x-y+MOD) % MOD;
            }
        }
    }
    if(type == -1) {
        LL inv = fastPow(limit, MOD-2);
        for(int i = 0; i < limit; i++) F[i] = F[i]*inv % MOD;
    }
    return F;
}

vector <LL> polyMulNTT(vector <LL> A, vector <LL> B) {
    limit = 1, l = 0;
    int n = A.size()-1, m = B.size()-1;
	while(limit <= n + m) limit <<= 1, l++;
    A.resize(limit+1);
    B.resize(limit+1);
	for(int i = 0; i < limit; i++) r[i] = (r[i>>1]>>1)|((i&1)<<(l-1)) ;
	A = NTT(A, 1); 
    B = NTT(B, 1);
    vector <LL> C(limit+1);
	for(int i = 0; i < limit; i++) C[i] = A[i]*B[i] % MOD;
	C = NTT(C, -1);
    return C;
}

int main() {
    int n, m;
	cin >> n >> m;
    vector <LL> A(n+1), B(m+1);
	for(int i = 0; i <= n; i++) cin >> A[i];
	for(int i = 0; i <= m; i++) cin >> B[i];
	vector <LL> C = polyMulNTT(A, B);
	for(int i = 0; i <= n + m; i++) cout << C[i] % MOD << " ";
	cout << endl;
    return 0;
}