We're releasing PredictLM-Mini — the smallest open-weight tabular foundation model with calibrated uncertainty. 13M parameters, 54 MB on disk, runs on a laptop. It's distilled from a 26M-parameter PredictLM-Base teacher and stays statistically tied with Base on classification accuracy while losing only ~4 percentage points of R² on regression. Weights are on Hugging Face under Apache-2.0. The MCP server ships day one.
This post explains what's in the model, what it's for, what it isn't, and why we shipped the smaller checkpoint as the recommended default.
What it is
PredictLM-Mini is an in-context learning model for tabular data. You hand it a context — a small set of labeled rows — and a batch of unlabeled queries, and it returns predictions plus calibrated uncertainty in a single forward pass. No fine-tuning loop, no per-dataset hyperparameter search.
```python
from predictlm import PredictLMClassifier

model = PredictLMClassifier.from_pretrained(
    "zerooneresearch/predictlm-mini-13m"
)
# X_ctx, y_ctx: the labeled context rows; X_new: the unlabeled query rows
preds = model.fit(X_ctx, y_ctx).predict(X_new)
probs = model.predict_proba(X_new)
```

Same surface as scikit-learn. The fit call doesn't train; it caches the context. Inference is one forward pass over context + queries.
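To make the "fit caches, predict computes" contract concrete, here is a minimal sketch of an estimator with the same call shape. It is a toy stand-in, not PredictLM's implementation: the class name `ContextCachingClassifier` is hypothetical, and a k-nearest-neighbour vote stands in for the transformer forward pass.

```python
from collections import Counter
from math import dist


class ContextCachingClassifier:
    """Toy stand-in: fit() stores the context, predict() does all the work."""

    def __init__(self, k=3):
        self.k = k
        self._X_ctx = None
        self._y_ctx = None

    def fit(self, X_ctx, y_ctx):
        # No gradient steps and no parameter updates: just remember the rows.
        self._X_ctx = [tuple(row) for row in X_ctx]
        self._y_ctx = list(y_ctx)
        return self  # returning self allows fit(...).predict(...) chaining

    def predict_proba(self, X_new):
        # ASSUMPTION: a k-nearest-neighbour vote replaces the real forward pass.
        classes = sorted(set(self._y_ctx))
        out = []
        for query in X_new:
            ranked = sorted(
                zip(self._X_ctx, self._y_ctx),
                key=lambda pair: dist(pair[0], query),
            )
            votes = Counter(label for _, label in ranked[: self.k])
            out.append([votes[c] / self.k for c in classes])
        return out

    def predict(self, X_new):
        classes = sorted(set(self._y_ctx))
        return [classes[p.index(max(p))] for p in self.predict_proba(X_new)]


# Made-up context and query rows, for illustration only.
X_ctx = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (1.0, 0.9), (0.9, 1.1), (1.2, 1.0)]
y_ctx = [0, 0, 0, 1, 1, 1]
X_new = [(0.1, 0.1), (1.0, 1.0)]

model = ContextCachingClassifier(k=3)
preds = model.fit(X_ctx, y_ctx).predict(X_new)
probs = model.predict_proba(X_new)
```

Swapping the context is just another `fit` call, which is why there is no per-dataset training loop to manage.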
Architecture
- 12 layers, d_model = 256, n_heads = 8, d_ffn = 1024
- ALBERT-style cross-layer parameter sharing (share_factor = 2), which is where the 13M comes from
- Alternating feature-attention / datapoint-attention blocks
- BarDistribution regression head with 1024 quantile bins: full predictive distributions, not point estimates
- Yeo-Johnson preprocessing on continuous targets
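For readers unfamiliar with the Yeo-Johnson step, the sketch below implements the standard transform and its inverse for a single value. The function names and the value of `lam` are illustrative assumptions; the post does not say how PredictLM chooses lambda or where in the pipeline the inverse is applied.

```python
import math


def yeo_johnson(y, lam):
    """Standard Yeo-Johnson power transform of one value for a given lambda."""
    if y >= 0:
        if abs(lam) < 1e-12:
            return math.log1p(y)
        return ((y + 1.0) ** lam - 1.0) / lam
    if abs(lam - 2.0) < 1e-12:
        return -math.log1p(-y)
    return -(((1.0 - y) ** (2.0 - lam)) - 1.0) / (2.0 - lam)


def yeo_johnson_inverse(z, lam):
    """Map a transformed value back to the original target scale."""
    if z >= 0:
        if abs(lam) < 1e-12:
            return math.expm1(z)
        return (lam * z + 1.0) ** (1.0 / lam) - 1.0
    if abs(lam - 2.0) < 1e-12:
        return -math.expm1(-z)
    return 1.0 - (1.0 - (2.0 - lam) * z) ** (1.0 / (2.0 - lam))


# Made-up targets; lam is a hypothetical value picked by hand.
targets = [-2.5, -0.5, 0.0, 0.7, 12.0]
lam = 0.5
transformed = [yeo_johnson(y, lam) for y in targets]
recovered = [yeo_johnson_inverse(z, lam) for z in transformed]
```

Unlike Box-Cox, the transform is defined for negative targets, which is presumably why it suits arbitrary regression columns.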
The layer design is unchanged from Base; what differs is the cross-layer parameter sharing and the distillation procedure that produced Mini.
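As a rough illustration of cross-layer sharing, the sketch below maps each of the 12 layers to a shared weight block. The grouping rule (consecutive layers share one block) is an assumption; the post gives only share_factor = 2, not which layers are tied.

```python
N_LAYERS = 12      # from the post
SHARE_FACTOR = 2   # from the post


def shared_block_index(layer, share_factor=SHARE_FACTOR):
    # ASSUMPTION: consecutive layers are tied (0,1 -> block 0; 2,3 -> block 1; ...).
    return layer // share_factor


layer_to_block = [shared_block_index(i) for i in range(N_LAYERS)]
unique_blocks = len(set(layer_to_block))
```

Under this grouping the forward pass still runs 12 layers deep but stores only 6 distinct sets of layer weights, which is in line with the 26M to 13M reduction if most parameters sit in the layers.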
Evaluation
Benchmarked across 25 OpenML datasets covering regression and classification, seed = 42, paired-bootstrap CIs.
| Model | Params | Reg. R² (mean) | Reg. R² (median) | Cls. acc (mean) | Cls. acc (median) |
|---|---|---|---|---|---|
| PredictLM-Base | 26M | 0.589 | 0.755 | 0.685 | 0.799 |
| PredictLM-Mini | 13M | 0.549 | 0.731 | 0.684 | 0.802 |
| XGBoost | — | 0.561 | 0.744 | 0.679 | 0.793 |
| TabPFN v2 | ~25M | 0.584 | 0.751 | 0.682 | 0.796 |
Mini is statistically tied with Base on classification (delta −0.001, 95% CI [−0.027, +0.029]) and loses 4 percentage points of mean R² on regression (0.589 to 0.549; CI [−6.5, −1.5] points). Against XGBoost it trends ahead on classification and is within CI on regression.
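One common way to compute a paired-bootstrap interval like the ones above is to resample datasets with replacement and take percentiles of the mean per-dataset delta. The sketch below does that; the exact procedure used for the benchmark is not described in this post, and the per-dataset scores here are made-up toy numbers, not benchmark results.

```python
import random
from statistics import mean


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=42):
    """Percentile CI for mean(scores_a - scores_b), resampling datasets in pairs."""
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired per dataset")
    # Pairing: each delta compares the two models on the same dataset.
    deltas = [a - b for a, b in zip(scores_a, scores_b)]
    rng = random.Random(seed)
    n = len(deltas)
    boot_means = sorted(
        mean(rng.choices(deltas, k=n)) for _ in range(n_resamples)
    )
    lo = boot_means[int((alpha / 2) * n_resamples)]
    hi = boot_means[int((1 - alpha / 2) * n_resamples) - 1]
    return mean(deltas), lo, hi


# Toy accuracies for eight hypothetical datasets (illustrative only).
mini_acc = [0.81, 0.64, 0.92, 0.55, 0.78, 0.70, 0.88, 0.61]
base_acc = [0.82, 0.62, 0.95, 0.55, 0.77, 0.72, 0.85, 0.62]

delta, ci_lo, ci_hi = paired_bootstrap_ci(mini_acc, base_acc)
```

An interval that straddles zero is what "statistically tied" means here: the data cannot distinguish the two models' mean scores.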
Why we shipped Mini, not Base
Three reasons:
- CPU inference matters. A 54 MB checkpoint runs comfortably on an Apple M-series laptop in 2–3 seconds per prediction batch. The 105 MB Base checkpoint doesn't, not without a GPU. The agent-native use case (an LLM calling PredictLM as a tool) lives on developer machines first.
- Auditability. A 13M-parameter transformer is genuinely understandable. You can inspect attention patterns, ablate heads, and trace failures end-to-end in an afternoon.
- The cost. Distilling Mini from Base took 3.3 hours on a single Tesla T4. That cost us ~$1.30. We can iterate quickly because each downstream variant is cheap.
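The sizes and cost quoted above can be sanity-checked with back-of-envelope arithmetic. The sketch assumes float32 weights (4 bytes per parameter), which the post does not state; the hourly rate is simply what the quoted cost and duration imply.

```python
BYTES_PER_PARAM = 4  # ASSUMPTION: float32 weights; the dtype is not stated in the post


def checkpoint_mb(n_params, bytes_per_param=BYTES_PER_PARAM):
    """Raw weight size in megabytes, ignoring file metadata."""
    return n_params * bytes_per_param / 1e6


mini_mb = checkpoint_mb(13e6)  # compare with the reported 54 MB
base_mb = checkpoint_mb(26e6)  # compare with the reported 105 MB

# Implied GPU price from the quoted distillation run: ~$1.30 over 3.3 hours.
implied_hourly_rate = 1.30 / 3.3
```

The small gaps to the reported file sizes would be consistent with rounded parameter counts plus file metadata, though that is a guess.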
What this is not
- Not a general-purpose foundation model. PredictLM is trained on synthetic tabular distributions plus a small mix of real OpenML data. It will not write code, summarize text, or reason about anything that isn't a row of features.
- Not a replacement for a properly tuned XGBoost on a dataset you already understand. Where you have time to do feature engineering and run cross-validation, traditional ML usually wins by a small margin. The PredictLM advantage is the first 60 seconds on a new dataset.
- Not sufficient on its own for safety-critical decisions. Calibrated uncertainty is a property of the predictive distribution, not a substitute for human review on regulated decisions. See the EU AI Act page for our position on downstream high-risk use.
What's next
- The Base checkpoint follows next week with the same Apache-2.0 license.
- A LangChain integration writeup is queued for early June.
- An on-policy context distillation paper is under submission for Q3.
Get the model: huggingface.co/zerooneresearch/predictlm-mini-13m. Get the MCP server: pip install predictlm-mcp. File issues at github.com/matej-01RAI/predictlm-mcp.