Hollerith mirrors the scikit-learn estimator API, but there is no training step: fit hands Hollerith your labeled table as context, and predict scores new rows against it.

fit

fit accepts a labeled DataFrame and the name of the target column.
from hollerith import Hollerith
import pandas as pd

clf = Hollerith()
df = pd.read_csv("iris.csv")

clf.fit(df, target="species")   # in-context, data is purged after the job
The task (classification vs. regression) is inferred from the target column.
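This page does not say how the inference works. As a rough illustration only, a client-side check along the following lines could tell you which task to expect before you call fit. The function name guess_task and the max_classes threshold are hypothetical; they are not part of the Hollerith SDK and may not match its actual rule.

```python
from numbers import Real


def guess_task(values, max_classes=20):
    """Guess 'classification' or 'regression' from target values.

    Hypothetical heuristic for illustration, NOT Hollerith's actual rule.
    """
    values = [v for v in values if v is not None]
    if not values:
        raise ValueError("target column has no non-missing values")

    # Any non-numeric label (str, bool, ...) implies discrete classes.
    numeric = all(isinstance(v, Real) and not isinstance(v, bool) for v in values)
    if not numeric:
        return "classification"

    # A handful of distinct whole-number values is treated as class codes.
    distinct = set(values)
    integer_like = all(float(v).is_integer() for v in distinct)
    if integer_like and len(distinct) <= max_classes:
        return "classification"
    return "regression"


print(guess_task(["setosa", "versicolor", "virginica"]))  # classification
print(guess_task([5.1, 4.9, 6.3, 5.8]))                   # regression
```

If a numeric target with few distinct values should be treated as a regression target, a check like this is a cheap way to notice the ambiguity before submitting a job.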

predict

predict scores new, unlabeled rows. Output is row-aligned with your input.
preds = clf.predict(df.drop(columns="species"))
print(preds[:5])
# -> ['setosa' 'setosa' 'versicolor' 'virginica' 'setosa']
predict blocks while the job runs. The first call after the service has been idle may take longer while the worker warms up. The SDK waits through it automatically (pass on_warming=... to surface a “warming up” message).
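A minimal sketch of what an on_warming handler might look like. The callback's signature is not documented on this page, so the handler below accepts any arguments, and fake_predict is a stand-in (not the SDK) that exists only to make the example runnable.

```python
import sys


def notify_warming(*args, **kwargs):
    # The real callback signature is an assumption, so accept anything.
    print("Hollerith worker is warming up, please wait...", file=sys.stderr)


def fake_predict(rows, on_warming=None):
    """Stand-in for clf.predict; pretends the worker is always cold."""
    if on_warming is not None:
        on_warming()
    return ["setosa" for _ in rows]


preds = fake_predict([{"sepal_length": 5.1}], on_warming=notify_warming)
print(preds)
```

With the real SDK, the equivalent call would presumably be clf.predict(rows, on_warming=notify_warming), assuming the parameter takes a callable.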

evaluate

Pass evaluate=True to fit for a held-out metric on your own labeled data. Hollerith splits your set (hold-out or k-fold, chosen by the table's size) and returns a single metric.
clf.fit(df, target="species", evaluate=True)
print(clf.evaluation_)
# -> AUC 0.99  ·  5-fold CV on 150 rows
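To make the "hold-out or k-fold, chosen by size" idea concrete, here is a sketch of how such a choice could be made. The function name choose_splits, the 1,000-row cutoff, and the 20% hold-out fraction are all assumptions for illustration; the page does not state the thresholds Hollerith uses.

```python
import random


def choose_splits(n_rows, k=5, holdout_frac=0.2, kfold_max_rows=1000, seed=0):
    """Return a list of (train_indices, test_indices) pairs.

    Hypothetical rule: small tables get k-fold CV, larger tables get a
    single hold-out split. The thresholds are illustrative assumptions.
    """
    if n_rows < 2:
        raise ValueError("need at least 2 rows to split")

    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)  # shuffle so folds are not order-dependent

    if n_rows <= kfold_max_rows:
        k = min(k, n_rows)
        folds = [idx[i::k] for i in range(k)]  # k near-equal, disjoint folds
        return [
            ([j for m, fold in enumerate(folds) if m != i for j in fold], folds[i])
            for i in range(k)
        ]

    n_test = max(1, int(n_rows * holdout_frac))
    return [(idx[n_test:], idx[:n_test])]


splits = choose_splits(150)
print(len(splits), len(splits[0][1]))  # 5 30
```

Under these assumed settings, a 150-row table yields five folds of 30 test rows each, which is consistent with the "5-fold CV on 150 rows" summary in the example output.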

Good to know

  • Ephemeral data. Training rows and rows to score are purged once the job finishes; only hashes and counts are retained for metering.
  • Usage & billing. The console Usage tab shows recent jobs, calls/rows/cells metered this period, and included usage vs. overage spend.
  • Traceable errors. Every API error carries a requestId. Quote it when you contact support.
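As a sketch of the traceable-errors point, the snippet below records the requestId whenever a call fails so it is on hand for a support ticket. How the SDK exposes the ID is not documented here: FakeApiError is a stand-in exception, and the requestId attribute on it is an assumption.

```python
import logging

logger = logging.getLogger("hollerith_client")


class FakeApiError(Exception):
    """Stand-in for whatever exception the SDK raises (hypothetical)."""

    def __init__(self, message, requestId):
        super().__init__(message)
        self.requestId = requestId


def describe_error(exc):
    # Fall back to "unknown" if the exception carries no requestId.
    request_id = getattr(exc, "requestId", None) or "unknown"
    return f"{exc} (requestId: {request_id})"


try:
    raise FakeApiError("job failed", requestId="req_example_123")
except FakeApiError as exc:
    message = describe_error(exc)
    logger.error(message)
    print(message)  # job failed (requestId: req_example_123)
```

Using getattr with a fallback keeps the helper safe for exceptions that do not come from the API, such as network timeouts.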