lal.spark package

Submodules

lal.spark.model module

class lal.spark.model.LALGBSparkBinaryClassifier(**kwargs)

Bases: lal.spark.model._LALModelBase

This classifier is used when our training labels are binary.

predict(**kwargs)

We choose the most probable label for each sample in the testing dataset.

Parameters
  • sdf1 (pyspark.sql.dataframe.DataFrame) – The training dataset

  • sdf2 (pyspark.sql.dataframe.DataFrame) – The testing dataset

Returns

predict_proba(**kwargs)

This predicts, for each sample in the testing dataset, the probability of each label available in the training dataset.

Parameters
  • sdf1 (pyspark.sql.dataframe.DataFrame) – The training dataset

  • sdf2 (pyspark.sql.dataframe.DataFrame) – The testing dataset

Returns
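
A minimal usage sketch, not taken from the library's own documentation: the constructor keyword arguments and the DataFrames train_sdf and test_sdf are hypothetical, and only the sdf1/sdf2 parameter names come from the signatures above:

    from lal.spark.model import LALGBSparkBinaryClassifier

    # train_sdf and test_sdf are assumed to be existing pyspark.sql.DataFrame objects
    # with feature columns and a binary label column; the constructor keyword
    # arguments below are hypothetical placeholders.
    clf = LALGBSparkBinaryClassifier(feature_cols=["f1", "f2"], label_col="label")

    # Most probable label for each sample in the testing dataset.
    predictions = clf.predict(sdf1=train_sdf, sdf2=test_sdf)

    # Probability of each training label for each sample in the testing dataset.
    probabilities = clf.predict_proba(sdf1=train_sdf, sdf2=test_sdf)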

class lal.spark.model.LALGBSparkCategoricalClassifier(**kwargs)

Bases: lal.spark.model._LALModelBase

This classifier is used when our training labels are categorical.

predict(**kwargs)

We choose the most probable label for each sample in the testing dataset.

Parameters
  • sdf1 (pyspark.sql.dataframe.DataFrame) – The training dataset

  • sdf2 (pyspark.sql.dataframe.DataFrame) – The testing dataset

Returns

predict_proba(**kwargs)

This predicts, for each sample in the testing dataset, the probability of each label available in the training dataset.

Parameters
  • sdf1 (pyspark.sql.dataframe.DataFrame) – The training dataset

  • sdf2 (pyspark.sql.dataframe.DataFrame) – The testing dataset

Returns
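
The categorical case follows the same sketch as the binary classifier above, except that predict_proba returns one probability per class; the constructor arguments are again hypothetical:

    from lal.spark.model import LALGBSparkCategoricalClassifier

    # Hypothetical constructor arguments; "label" is a multiclass column here.
    clf = LALGBSparkCategoricalClassifier(feature_cols=["f1", "f2"], label_col="label")
    class_probabilities = clf.predict_proba(sdf1=train_sdf, sdf2=test_sdf)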

class lal.spark.model.LALGBSparkMultiBinaryClassifier(**kwargs)

Bases: lal.spark.model._LALSparkMultiBase

This is our Multioutput Binary Classifier, where our training labels are all binary.

predict_proba(sdf1, sdf2)

task_base

alias of LALGBSparkBinaryClassifier
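
A sketch of the multioutput pattern, assuming hypothetical constructor arguments: each target column is delegated to the task_base class (LALGBSparkBinaryClassifier here), and predict_proba takes the training and testing DataFrames positionally as documented above:

    from lal.spark.model import LALGBSparkMultiBinaryClassifier

    # Hypothetical constructor arguments; one binary target column per output.
    multi_clf = LALGBSparkMultiBinaryClassifier(feature_cols=["f1", "f2"],
                                                label_cols=["y1", "y2"])
    # Per-output label probabilities for the testing dataset.
    probabilities = multi_clf.predict_proba(train_sdf, test_sdf)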

class lal.spark.model.LALGBSparkMultiCategoricalClassifier(**kwargs)

Bases: lal.spark.model._LALSparkMultiBase

This is our Multioutput Categorical Classifier, where our training labels are all categorical.

predict_proba(sdf1, sdf2)

task_base

alias of LALGBSparkCategoricalClassifier

class lal.spark.model.LALGBSparkMultiRegressorClassifier(**kwargs)

Bases: lal.spark.model._LALSparkMultiBase

This is our Multioutput Regressor, where our training labels are all continuous.

task_base

alias of LALGBSparkRegressor

class lal.spark.model.LALGBSparkRegressor(**kwargs)

Bases: lal.spark.model._LALModelBase

This regressor is used when our training labels are continuous.

predict(**kwargs)

We predict the continuous value each sample in the testing dataset will take.

Parameters
  • sdf1 (pyspark.sql.dataframe.DataFrame) – The training dataset

  • sdf2 (pyspark.sql.dataframe.DataFrame) – The testing dataset

Returns
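
A minimal regression sketch under the same assumptions as above (hypothetical constructor arguments, existing train_sdf/test_sdf DataFrames with a continuous label):

    from lal.spark.model import LALGBSparkRegressor

    # Hypothetical constructor arguments; "target" is a continuous column.
    reg = LALGBSparkRegressor(feature_cols=["f1", "f2"], label_col="target")
    # Predicted continuous value for each sample in the testing dataset.
    predictions = reg.predict(sdf1=train_sdf, sdf2=test_sdf)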

lal.spark.nn module

class lal.spark.nn.KNNCosineMatcher(k)

Bases: lal.spark.nn._CosineDistance, lal.spark.nn._KNNMatcherBase

This is the K-Nearest Neighbor algorithm with the cosine distance measure.

class lal.spark.nn.KNNMahalanobisMatcher(k)

Bases: lal.spark.nn._MahalanobisDistance, lal.spark.nn._KNNMatcherBase

This is the K-Nearest Neighbor algorithm with the Mahalanobis distance measure.

class lal.spark.nn.KNNPowerMatcher(p, k)

Bases: lal.spark.nn._PowerDistance, lal.spark.nn._KNNMatcherBase

This is the K-Nearest Neighbor algorithm with the p-norm distance measure.
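
The constructor signatures above are the only part of the matcher API shown in this module, so the sketch below stops at construction; k is the number of neighbours and p the order of the p-norm:

    from lal.spark.nn import (KNNCosineMatcher, KNNMahalanobisMatcher,
                              KNNPowerMatcher)

    cosine_matcher = KNNCosineMatcher(k=5)
    mahalanobis_matcher = KNNMahalanobisMatcher(k=5)
    euclidean_matcher = KNNPowerMatcher(p=2, k=5)  # p=2 is the Euclidean distance
    manhattan_matcher = KNNPowerMatcher(p=1, k=5)  # p=1 is the Manhattan distance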

lal.spark.weights module

class lal.spark.weights.GBMWeightBinaryClassifier(**kwargs)

Bases: lal.spark.weights._LGBMWeightsBase

This object derives the feature importance weights for a binary output. It optimizes the Gradient Boosting Model against a classification metric and returns the featureImportances of the best-performing model.

class lal.spark.weights.GBMWeightMultiClassifier(**kwargs)

Bases: lal.spark.weights._LGBMWeightsBase

This object derives the feature importance weights for a multiclass output. It optimizes the Gradient Boosting Model against a classification metric and returns the featureImportances of the best-performing model.

class lal.spark.weights.GBMWeightRegressor(**kwargs)

Bases: lal.spark.weights._LGBMWeightsBase

This object derives the feature importance weights for a continuous output. It optimizes the Gradient Boosting Model against a regression metric and returns the featureImportances of the best-performing model.
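
The method for extracting the weights is not shown in this excerpt, so the sketch below illustrates only the underlying Spark ML mechanism these wrappers build on (a fitted gradient-boosted model exposing featureImportances); it uses plain pyspark.ml, not the lal.spark.weights API:

    from pyspark.ml.classification import GBTClassifier
    from pyspark.ml.feature import VectorAssembler

    # Illustrative only: plain Spark ML, not the lal.spark.weights interface.
    # train_sdf is assumed to have feature columns f1, f2 and a binary "label" column.
    assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
    train_vec = assembler.transform(train_sdf)

    gbt = GBTClassifier(featuresCol="features", labelCol="label", maxIter=20)
    model = gbt.fit(train_vec)

    # The per-feature importance vector that the GBMWeight* classes optimize and return.
    print(model.featureImportances)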

Module contents