Hollerith is a tabular foundation model trained by Monarcha. It is a transformer pretrained to predict directly from tables: fit places your labeled rows in the model’s context, and predict scores new rows against them in a forward pass. No gradient updates happen on your data.

Limits

A job is one fit + predict (or fit + evaluate) call. Jobs above these limits are rejected with a clear error before any compute is spent.
Limit                           Value
Max rows per job                1,000,000
Max columns (features) per job  500
Supported targets               classification and regression
Feature types                   numerical, categorical, missing values handled natively
The row limit counts the labeled training rows plus the rows you score in the same call. Classification supports up to 10 classes natively; higher-cardinality targets are handled automatically via hierarchical classification.

Per-day limits (reset at 00:00 UTC):
Limit          Value
Calls per day  10,000
Rows per day   1,000,000
Usage is metered in cells, where one cell is a single value (rows × columns for a job). Your plan includes a monthly cell allowance (the free tier includes 1,000,000 cells per month); usage beyond the allowance is billed as overage on paid plans. Track it on the Usage tab.
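
To make the arithmetic concrete, here is a minimal client-side sketch that estimates the rows and cells a job would consume and flags a job that exceeds the per-job limits. The helper job_footprint and its field names are hypothetical and not part of the Hollerith API. The sketch assumes cells use the same row count as the job limit (training rows plus scored rows) and count feature columns only; check the Usage tab for how your jobs are actually metered.

```python
from __future__ import annotations

from dataclasses import dataclass

# Values copied from the limits on this page.
MAX_ROWS_PER_JOB = 1_000_000
MAX_COLUMNS_PER_JOB = 500
FREE_TIER_CELLS_PER_MONTH = 1_000_000


@dataclass(frozen=True)
class JobFootprint:
    rows: int
    cells: int
    violations: tuple[str, ...]


def job_footprint(n_train_rows: int, n_score_rows: int, n_features: int) -> JobFootprint:
    """Estimate what one job consumes (hypothetical helper, not an API call)."""
    # Rows for a job = labeled training rows + rows scored in the same call.
    rows = n_train_rows + n_score_rows
    # Assumption: cells = rows x feature columns, using the same row count.
    cells = rows * n_features

    violations = []
    if rows > MAX_ROWS_PER_JOB:
        violations.append(f"rows {rows:,} exceeds {MAX_ROWS_PER_JOB:,}")
    if n_features > MAX_COLUMNS_PER_JOB:
        violations.append(f"columns {n_features:,} exceeds {MAX_COLUMNS_PER_JOB:,}")
    return JobFootprint(rows, cells, tuple(violations))


# 8,000 training rows + 2,000 scored rows, 25 features.
fp = job_footprint(8_000, 2_000, 25)
print(fp.rows, fp.cells, fp.violations)
print("jobs of this size within the free tier:", FREE_TIER_CELLS_PER_MONTH // fp.cells)
```

Under these assumptions, a job with 8,000 training rows, 2,000 scored rows, and 25 features uses 10,000 rows and 250,000 cells, so four such jobs would use up the free-tier monthly allowance.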

Where Hollerith is strong

  • Structured business tables. Churn, conversion, risk, pricing, quality control. Mixed numerical and categorical columns, missing values, no preprocessing required.
  • Cold starts and fast iteration. Strong accuracy on a fresh dataset in one call, with no tuning loop. This is where tabular foundation models beat gradient boosting most clearly.
  • Many small to mid-sized prediction problems. One API instead of training and maintaining a separate model per table, per segment, or per customer.

Known limitations

  • Free-text columns. Text columns are accepted but treated as categorical codes (ordinal encoding): the model sees category identity, not meaning. Heavy natural-language columns will underperform until embedding support ships.
  • Time series. Hollerith predicts row-wise. It does not model temporal structure, seasonality, or autocorrelation. You can include time-derived features (lag values, day of week), but it is not a forecasting model.
  • Very wide tables. Accuracy and latency degrade as feature count grows. Past a few hundred informative features, select features first.
  • Domain feature engineering still matters. Hollerith cannot invent signal that is not in the columns. Ratios, aggregates, and joins that encode domain knowledge improve results just as they do for any model.
  • Latency scales with table size. Inference cost grows with the number of context rows. Large fits take longer, and the first call after idle pays a warm-up cost.
  • Distribution shift. Predictions assume new rows are drawn from the same distribution as the training table. Re-fit when your data drifts; fits are cheap, so this is easy to do often.
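
The time-series bullet above mentions time-derived features. As a rough sketch of what that can look like, the function below adds a day-of-week column and lagged values to a daily series before the rows are sent for fitting. The function name, the column names, and the choice of lags are illustrative assumptions. It also assumes one row per consecutive day, sorted ascending, and uses None where a lag is not yet available; how missing values should be encoded for upload is not covered here.

```python
from datetime import date, timedelta


def add_time_features(rows, lags=(1, 7)):
    """Return copies of rows with day-of-week and lagged-value columns.

    rows: list of dicts with a 'day' (datetime.date) and a 'value',
    one per consecutive day, sorted by day ascending.
    """
    enriched = []
    for i, row in enumerate(rows):
        features = dict(row)
        # Monday is 0, Sunday is 6.
        features["day_of_week"] = row["day"].weekday()
        for lag in lags:
            # Only look backwards, so no future information leaks into a row.
            features[f"value_lag_{lag}"] = rows[i - lag]["value"] if i >= lag else None
        enriched.append(features)
    return enriched


# Ten consecutive days starting on a Monday.
series = [
    {"day": date(2024, 1, 1) + timedelta(days=i), "value": 100 + i}
    for i in range(10)
]
for features in add_time_features(series)[-2:]:
    print(features)
```

Each row then carries its own history as ordinary columns, which suits a row-wise model. Lagged values must be known at prediction time for the rows you score, so this does not turn Hollerith into a multi-step forecaster.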
If your problem hits several of these at once, a tuned gradient boosting pipeline may still win. Run evaluate on your own data and compare; that is what it is for.