Polynomial Regression Calculator – Free Online Curve Fitting Tool with R-squared, P-Value & APA Results

📥 Step 1 · Enter Your Data

Paste your X (predictor) and Y (outcome) values, choose a sample dataset, upload a CSV/Excel file, or use manual entry. Comma-separated input is the default.

Variable Pair Name (editable)

X — Predictor Values (comma-separated)

Y — Outcome Values (comma-separated)

💡 Both lists must have the same length. Decimals (e.g., 1.23) are accepted.
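The input rules above (comma-separated values, decimals allowed, equal lengths) could be checked along the following lines. This is a minimal sketch rather than the calculator's own code; the function names `parse_values` and `parse_pairs` are hypothetical.

```python
def parse_values(text: str) -> list[float]:
    """Parse a comma-separated string such as "1, 2.5, 3" into floats."""
    # Skip empty fields so that a trailing comma does not raise an error.
    return [float(tok) for tok in text.split(",") if tok.strip()]


def parse_pairs(x_text: str, y_text: str) -> tuple[list[float], list[float]]:
    """Parse both lists and enforce the equal-length rule."""
    x, y = parse_values(x_text), parse_values(y_text)
    if len(x) != len(y):
        raise ValueError(f"X has {len(x)} values but Y has {len(y)}")
    return x, y


x, y = parse_pairs("10, 15, 20, 25", "8, 14, 22, 33")
```

Lists of different lengths raise a `ValueError` instead of being silently truncated.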

Upload File

Accepted: .csv .txt .xlsx .xls — first column = X, second column = Y.

Manual Entry — type values, click "Apply"

Choose a sample dataset

A sample dataset is auto-loaded on first visit. Switching loads new X and Y values into the textareas.

⚙️ Step 2 · Configure the Polynomial

Polynomial Degree (p)

Significance Level (α)

Centre X (reduce multicollinearity)

Confidence Level for Bands

📐 Technical Notes & Formulas

Sub-section A — Formulas Used

Polynomial Model
Y = β₀ + β₁X + β₂X² + β₃X³ + … + βₚXᵖ + ε
Where:
Y = response (outcome) variable
X = predictor (independent) variable
βᵢ = i-th polynomial coefficient (i = 0, 1, …, p)
p = polynomial degree (the order of the polynomial)
ε = error term, assumed i.i.d. N(0, σ²)

OLS Coefficient Estimator
β̂ = (XᵀX)⁻¹ XᵀY
Where:
β̂ = (p+1) × 1 vector of estimated coefficients
X = n × (p+1) design matrix with columns 1, X, X², …, Xᵖ (in this formula X denotes the matrix, not the raw predictor)
Y = n × 1 vector of observed responses
Xᵀ = transpose of X
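As an illustration of the estimator, the sketch below builds the design matrix and solves the least-squares problem with NumPy. It assumes NumPy is available; the helper name `fit_polynomial` and the toy data are hypothetical, and the calculator's own implementation is not shown in the source.

```python
import numpy as np


def fit_polynomial(x, y, degree):
    """Return OLS coefficients [b0, b1, ..., bp] for a degree-p polynomial."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix with columns 1, x, x^2, ..., x^p.
    design = np.vander(x, N=degree + 1, increasing=True)
    # lstsq solves the same least-squares problem as (X'X)^-1 X'Y but
    # avoids forming the inverse explicitly, which is numerically safer.
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


# Noise-free toy data generated from y = 1 + 2x + 3x^2
x = [0, 1, 2, 3, 4]
y = [1, 6, 17, 34, 57]
beta = fit_polynomial(x, y, degree=2)
```

Because the toy data contain no noise, the fitted coefficients recover 1, 2, and 3 up to floating-point rounding.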

Coefficient of Determination
R² = 1 − (SSres / SStot)
Adjusted R² = 1 − [(1 − R²)(n − 1) / (n − p − 1)]
Where:
SSres = Σ(yᵢ − ŷᵢ)² (residual sum of squares)
SStot = Σ(yᵢ − ȳ)² (total sum of squares)
n = sample size
p = polynomial degree (number of predictors)
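The two formulas can be worked through in a few lines of plain Python. This is a sketch only: the function name `r_squared` and the fitted values are hypothetical and chosen so the arithmetic is easy to follow.

```python
def r_squared(y, y_hat, p):
    """Return (R², adjusted R²) for observed y, fitted y_hat, and degree p."""
    n = len(y)
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual SS
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)               # total SS
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return r2, adj_r2


y = [2.0, 4.0, 6.0, 8.0, 10.0]
y_hat = [2.5, 3.5, 6.0, 8.5, 9.5]  # hypothetical fitted values
r2, adj_r2 = r_squared(y, y_hat, p=2)
```

Here SSres = 1 and SStot = 40, so R² = 0.975. With n = 5 and p = 2, the adjustment lowers this to 0.95.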

F-Statistic for Overall Model
F = (R² / p) / [(1 − R²) / (n − p − 1)]
df₁ = p, df₂ = n − p − 1
Where:
F = F-statistic for testing all βᵢ = 0 (i ≥ 1)
p = polynomial degree (numerator df)
n = sample size
n − p − 1 = denominator df (residual df)
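A short sketch of the overall F computation follows. The function name `f_statistic` and the input values are hypothetical.

```python
def f_statistic(r2, n, p):
    """Return (F, df1, df2) for the overall test that β1 = ... = βp = 0."""
    df1, df2 = p, n - p - 1
    if df2 <= 0:
        raise ValueError("need n > p + 1 observations")
    f = (r2 / df1) / ((1 - r2) / df2)
    return f, df1, df2


# Hypothetical quadratic fit: R² = 0.80 from n = 23 observations
f, df1, df2 = f_statistic(r2=0.80, n=23, p=2)
```

With these inputs F is 40 on (2, 20) degrees of freedom. The p-value is the upper tail of the F(df₁, df₂) distribution; if SciPy is installed, `scipy.stats.f.sf(f, df1, df2)` is one way to obtain it.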

Information Criteria
AIC = n · ln(SSres / n) + 2(p + 1)
BIC = n · ln(SSres / n) + ln(n) · (p + 1)
Where:
AIC = Akaike Information Criterion (lower = better balance of fit and complexity)
BIC = Bayesian Information Criterion (lower = better; its penalty is stricter than AIC's whenever ln(n) > 2, i.e. n ≥ 8)
ln = natural logarithm
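Both criteria share the same goodness-of-fit term and differ only in the penalty, as this sketch shows. The function name `information_criteria` and the input values are hypothetical.

```python
import math


def information_criteria(ss_res, n, p):
    """Return (AIC, BIC) using the formulas above, with k = p + 1 coefficients."""
    k = p + 1
    fit_term = n * math.log(ss_res / n)  # shared by both criteria
    aic = fit_term + 2 * k
    bic = fit_term + math.log(n) * k
    return aic, bic


# Hypothetical quadratic fit with SSres = 12 on n = 30 observations
aic, bic = information_criteria(ss_res=12.0, n=30, p=2)
```

Only differences in AIC or BIC between models fitted to the same data are meaningful; the absolute values carry no interpretation on their own.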

Sub-section B — Technical Notes

Linear in parameters: Although the curve is non-linear in X, the model is linear in the βᵢ, so OLS applies directly.
Multicollinearity: X, X², X³ … are highly correlated. Centre X (subtract mean) or use orthogonal polynomials to stabilise the fit.
Overfitting: Raising the degree can never increase SSres, so a lower SSres alone is not evidence of a better model. Higher degrees (p ≥ 5) rarely improve adjusted R² or BIC. Use BIC and cross-validation (CV) to choose p honestly.
Extrapolation: Predictions outside the observed X range are unsafe — polynomials swing wildly at the edges.
Assumption checks: Residuals-vs-fitted (linearity / homoscedasticity), Q-Q plot (normality), Durbin-Watson (autocorrelation).
Alternatives: If the curve is biologically meaningful, prefer a non-linear model (logistic, exponential, Michaelis-Menten). If you only need flexibility, splines or LOESS often outperform high-degree polynomials.
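To make the multicollinearity note concrete, the sketch below compares the correlation between the linear and quadratic terms before and after centring. It assumes NumPy is available and uses a hypothetical, evenly spaced predictor.

```python
import numpy as np

x = np.arange(1.0, 11.0)    # hypothetical predictor: 1, 2, ..., 10
x_centred = x - x.mean()    # subtract the mean (5.5)

# Correlation between the linear term and the quadratic term
raw_corr = np.corrcoef(x, x**2)[0, 1]
centred_corr = np.corrcoef(x_centred, x_centred**2)[0, 1]

print(f"raw: {raw_corr:.3f}, centred: {centred_corr:.3f}")
```

For this evenly spaced X the raw correlation is about 0.97, while the centred correlation is zero up to rounding because the centred values are symmetric about 0. With asymmetric data, centring typically reduces the correlation without removing it entirely.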

🎯 When to Use Polynomial Regression

This free polynomial regression calculator is designed for researchers, students, and analysts who need to fit a curved (non-linear) trend to two-variable data using ordinary least squares. It answers the question: "Does a polynomial of degree p fit my X-Y data better than a straight line, and how well?"

Decision Checklist

Your X (predictor) is continuous
Your Y (outcome) is continuous
A scatter plot shows a clear curved pattern (U, inverted-U, S-shape, or wavy)
You have at least 10–20 observations per polynomial term
You want a flexible smoother for description, not a mechanistic model
Do NOT use if a straight line already fits well — use simple linear regression
Do NOT use if you have a known non-linear functional form — use non-linear regression instead
Do NOT extrapolate beyond the observed range of X

Real-World Examples

Agriculture / Ecology — Modelling crop yield as a function of fertiliser dose, where yield rises, plateaus, and falls (inverted-U).
Biology / Physiology — Reaction time across the human lifespan: fast in young adults, slower in children and the elderly (U-shape, quadratic).
Pharmacology — Drug response curves where therapeutic effect rises with dose, peaks, then drops at toxic levels (cubic).
Economics / Marketing — Diminishing returns of advertising spend on revenue (concave quadratic).
Wildlife Ecology — Animal activity over time of day (bimodal or sinusoidal-like patterns approximated by a quartic).

Sample Size Guidance

Quadratic (p = 2): n ≥ 20
Cubic (p = 3): n ≥ 30–40
Quartic (p = 4): n ≥ 50–60
Higher (p ≥ 5): n ≥ 100, and only if a clear scientific case exists

Decision Tree

Two continuous variables, X and Y
  └─ Scatter plot is approximately linear?
        ├─ Yes → Simple Linear Regression
        └─ No (curved)?
              ├─ Curve is biologically/physically known? → Non-linear Regression
              ├─ Smooth wavy curve, no theory? → Polynomial Regression (THIS TOOL)
              └─ Highly local bumps? → Splines or LOESS

📚 How to Use This Polynomial Regression Calculator (10 Steps)

1. Enter Your Data. Type or paste comma-separated X and Y values, upload a CSV/Excel file, or use the Manual Entry table. Example: X = "10, 15, 20, 25, …", Y = "8, 14, 22, 33, …".
2. Choose a Sample Dataset. Five datasets are built in; start with the Plant Growth vs. Temperature curve (auto-loaded) for an inverted-U shape.
3. Configure the Polynomial. Choose degree (2–6), α (0.01 / 0.05 / 0.10), centring (recommended), and CI level (90 / 95 / 99 %).
4. Run the Analysis. Click "▶ Run Polynomial Regression". Results stream in immediately.
5. Read the Summary Cards. Green = significant; amber = borderline; red = not significant. R², adjusted R², F, and p are shown at a glance.
6. Read the Full Output. The coefficients table reports each βᵢ with SE, t, and p. The ANOVA panel shows the global F-test.
7. Examine Both Plots. The fitted polynomial curve shows the model overlay; the residuals-vs-fitted plot shows whether assumptions hold.
8. Check Assumptions. Linearity of residuals, normality, homoscedasticity, and autocorrelation are auto-flagged with pass / warn / fail badges.
9. Read the Interpretation. The dynamic interpretation paragraphs translate every number into plain English; the five writing-style cards generate APA, thesis, plain-language, abstract, and pre-registration text.
10. Export Your Results. Use Download Doc (.txt) for fast pasting into a manuscript, or Download PDF for a print-ready report.

❓ Frequently Asked Questions

Q1. What is polynomial regression?

Polynomial regression is a form of regression analysis that models the relationship between an independent variable X and a dependent variable Y as a polynomial of degree p. It is still a linear model in its parameters and is fitted using ordinary least squares.

Q2. When should I use polynomial regression instead of linear regression?

Use polynomial regression when a scatter plot of your data shows a clear curved (non-linear) pattern — for example, a U-shape, inverted U, or an S-shape — that a straight line cannot capture. A residuals-vs-fitted plot showing a systematic curve is also a strong signal.

Q3. What polynomial degree should I choose?

Start with degree 2 (quadratic) and increase only if the residuals still show a pattern AND the adjusted R² and BIC improve meaningfully. Higher degrees (≥ 5) often overfit and produce wild swings outside the observed data range.
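One way to apply this advice is to fit several degrees and compare BIC, as sketched below. It assumes NumPy is available. The data are synthetic (a quadratic plus Gaussian noise) and the seed is arbitrary, so the numbers illustrate the procedure and are not output from the calculator.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 21)
# Synthetic data: a true quadratic plus Gaussian noise (sd = 0.5)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(0.0, 0.5, size=x.size)

n = x.size
x_c = x - x.mean()  # centre X to keep the design matrix well conditioned
ss_res, bic = {}, {}
for p in range(1, 6):
    design = np.vander(x_c, N=p + 1, increasing=True)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    ss_res[p] = float(resid @ resid)
    bic[p] = float(n * np.log(ss_res[p] / n) + np.log(n) * (p + 1))

best_p = min(bic, key=bic.get)  # degree with the lowest BIC
for p in sorted(bic):
    print(f"p = {p}: SSres = {ss_res[p]:.3f}, BIC = {bic[p]:.2f}")
```

SSres shrinks (or stays flat) at every step, while BIC drops sharply from degree 1 to degree 2 and then changes little, because the extra terms only chase noise. Which degree attains the minimum BIC depends on the particular noise draw.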

Q4. How do I interpret R-squared in polynomial regression?

R² is the proportion of variance in Y explained by the polynomial model, ranging from 0 to 1. R² = 0.85 means the polynomial explains 85% of the variation in Y. Always also report adjusted R², which penalises adding higher-degree terms.

Q5. What does the polynomial regression equation look like?

The general form is Y = β₀ + β₁X + β₂X² + β₃X³ + … + βₚXᵖ + ε, where p is the polynomial degree and the βᵢ are coefficients estimated from the data using least squares.

Q6. How is polynomial regression different from non-linear regression?

Polynomial regression is linear in its parameters (the βᵢ enter the model linearly), even though the fitted curve is non-linear in X. True non-linear regression (e.g., Y = a · exp(bX)) is non-linear in the parameters themselves and requires iterative numerical optimisation.

Q7. What sample size do I need?

A practical rule is 10–20 observations per estimated parameter. A cubic model has 4 parameters (the intercept plus 3 polynomial coefficients), so aim for n ≥ 40–80. Smaller samples raise the risk of overfitting and unstable coefficient estimates.

Q8. How do I report polynomial regression in APA format?

Report the polynomial degree, the equation with all coefficients, R², adjusted R², F-statistic with degrees of freedom, the overall p-value, and a residuals diagnostic. Example: A quadratic regression was fitted, Y = 2.1 + 0.8X + 0.05X², F(2, 17) = 24.6, p < .001, R² = .74.
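A small formatter can assemble the statistics part of such a sentence. This is a sketch with hypothetical helper names. It assumes two decimals for F and the APA convention of dropping the leading zero for statistics that cannot exceed 1, such as p and R².

```python
def apa_number(value, decimals):
    """Format a statistic bounded by 1 (p, R²) without a leading zero."""
    text = f"{value:.{decimals}f}"
    return text[1:] if text.startswith("0.") else text


def apa_summary(df1, df2, f, p, r2):
    """Build a string such as 'F(2, 17) = 24.60, p < .001, R² = .74'."""
    p_text = "p < .001" if p < 0.001 else f"p = {apa_number(p, 3)}"
    return f"F({df1}, {df2}) = {f:.2f}, {p_text}, R² = {apa_number(r2, 2)}"


line = apa_summary(df1=2, df2=17, f=24.6, p=0.0004, r2=0.74)
print(line)
```

In a manuscript, the statistical symbols (F, p, R²) would additionally be set in italics, which plain text cannot show.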

Q9. Can polynomial regression be used for prediction?

Yes, but only within the observed range of X. Extrapolating beyond the observed range is risky because polynomial curves can swing dramatically outside the data, producing implausible predictions.
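A prediction helper can enforce this rule by refusing to evaluate the polynomial outside the observed range. The sketch below is hypothetical: the function name, the observed range of 0 to 20, and the reuse of the quadratic coefficients from the Q8 example are assumptions for illustration.

```python
def predict_within_range(coeffs, x_new, x_min, x_max):
    """Evaluate b0 + b1*x + ... + bp*x^p, refusing to extrapolate."""
    if not x_min <= x_new <= x_max:
        raise ValueError(
            f"x = {x_new} is outside the observed range [{x_min}, {x_max}]"
        )
    # Horner's method: work down from the highest-degree coefficient.
    result = 0.0
    for b in reversed(coeffs):
        result = result * x_new + b
    return result


coeffs = [2.1, 0.8, 0.05]  # b0, b1, b2 from the quadratic example in Q8
y_hat = predict_within_range(coeffs, x_new=10.0, x_min=0.0, x_max=20.0)
```

At X = 10 the prediction is 2.1 + 8 + 5 = 15.1, while a request at X = 25 raises a `ValueError` instead of returning an extrapolated value.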

Q10. What are the assumptions of polynomial regression?

The same OLS assumptions as linear regression: linearity in parameters, independence of residuals, homoscedasticity (constant variance), normality of residuals, and absence of severe multicollinearity (mitigated by centring X or using orthogonal polynomials).
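The independence assumption is commonly screened with the Durbin-Watson statistic, DW = Σ(eₜ − eₜ₋₁)² / Σeₜ², computed on residuals in observation order. The sketch below uses hypothetical residuals chosen to show both extremes; the function name `durbin_watson` is an assumption.

```python
def durbin_watson(residuals):
    """Return the Durbin-Watson statistic for residuals in observation order."""
    # Sum of squared differences between consecutive residuals
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    # Sum of squared residuals
    den = sum(e ** 2 for e in residuals)
    return num / den


dw_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])  # sign flips every step
dw_clustered = durbin_watson([1.0, 1.0, -1.0, -1.0])    # neighbours share a sign
```

The statistic ranges from 0 to 4. Values near 2 suggest no first-order autocorrelation, values well below 2 (here 1.0) point to positive autocorrelation, and values well above 2 (here 3.0) point to negative autocorrelation.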
