520 |
|
|
|a We obtain randomized algorithms for factoring degree n univariate polynomials over $\mathbb{F}_q$ requiring $O(n^{1.5 + o(1)}\,{\rm log}^{1+o(1)} q+ n^{1 + o(1)}\,{\rm log}^{2+o(1)} q)$ bit operations. When ${\rm log}\, q < n$, this is asymptotically faster than the best previous algorithms [J. von zur Gathen and V. Shoup, Comput. Complexity, 2 (1992), pp. 187-224; E. Kaltofen and V. Shoup, Math. Comp., 67 (1998), pp. 1179-1197]; for ${\rm log}\, q \ge n$, it matches the asymptotic running time of the best known algorithms. The improvements come from new algorithms for modular composition of degree n univariate polynomials, which is the asymptotic bottleneck in fast algorithms for factoring polynomials over finite fields. The best previous algorithms for modular composition use $O(n^{(\omega + 1)/2})$ field operations, where $\omega$ is the exponent of matrix multiplication [R. P. Brent and H. T. Kung, J. Assoc. Comput. Mach., 25 (1978), pp. 581-595], with a slight improvement in the exponent achieved by employing fast rectangular matrix multiplication [X. Huang and V. Y. Pan, J. Complexity, 14 (1998), pp. 257-299]. We show that modular composition and multipoint evaluation of multivariate polynomials are essentially equivalent, in the sense that an algorithm for one achieving exponent $\alpha$ implies an algorithm for the other with exponent $\alpha + o(1)$, and vice versa. We then give two new algorithms that solve the problem near-optimally: an algebraic algorithm for fields of characteristic at most $n^{o(1)}$, and a nonalgebraic algorithm that works in arbitrary characteristic. The latter algorithm works by lifting to characteristic 0, applying a small number of rounds of multimodular reduction, and finishing with a small number of multidimensional FFTs. The final evaluations are reconstructed using the Chinese remainder theorem. As a bonus, this algorithm produces a very efficient data structure supporting polynomial evaluation queries, which is of independent interest. Our algorithms use techniques that are commonly employed in practice, in contrast to all previous subquadratic algorithms for these problems, which relied on fast matrix multiplication.
|