lal.spark package
Submodules
lal.spark.model module
class lal.spark.model.LALGBSparkBinaryClassifier(**kwargs)
    Bases: lal.spark.model._LALModelBase

    Used when the training labels are binary.

    predict(**kwargs)
        Predicts the most probable label for each sample in the testing dataset.

        Parameters:
            sdf1 (pyspark.sql.dataframe.DataFrame) – The training dataset
            sdf2 (pyspark.sql.dataframe.DataFrame) – The testing dataset
        Returns:

    predict_proba(**kwargs)
        Predicts, for each sample in the testing dataset, the probability of each label available in the training dataset.

        Parameters:
            sdf1 (pyspark.sql.dataframe.DataFrame) – The training dataset
            sdf2 (pyspark.sql.dataframe.DataFrame) – The testing dataset
        Returns:
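The relationship between the two methods above can be illustrated outside of Spark: predict amounts to taking, per sample, the label whose predict_proba score is highest. A minimal plain-Python sketch (the per-label probabilities below are made up for illustration, not produced by lal):

```python
# Hypothetical per-sample label probabilities, shaped like the output of
# predict_proba (label -> probability); the values are made up.
proba_rows = [
    {"0": 0.21, "1": 0.79},
    {"0": 0.66, "1": 0.34},
]

def most_probable_label(row):
    """Pick the label with the highest predicted probability."""
    return max(row, key=row.get)

predictions = [most_probable_label(row) for row in proba_rows]
print(predictions)  # ['1', '0']
```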
class lal.spark.model.LALGBSparkCategoricalClassifier(**kwargs)
    Bases: lal.spark.model._LALModelBase

    predict(**kwargs)
        Predicts the most probable label for each sample in the testing dataset.

        Parameters:
            sdf1 (pyspark.sql.dataframe.DataFrame) – The training dataset
            sdf2 (pyspark.sql.dataframe.DataFrame) – The testing dataset
        Returns:

    predict_proba(**kwargs)
        Predicts, for each sample in the testing dataset, the probability of each label available in the training dataset.

        Parameters:
            sdf1 (pyspark.sql.dataframe.DataFrame) – The training dataset
            sdf2 (pyspark.sql.dataframe.DataFrame) – The testing dataset
        Returns:
class lal.spark.model.LALGBSparkMultiBinaryClassifier(**kwargs)
    Bases: lal.spark.model._LALSparkMultiBase

    The multioutput binary classifier, used when the training labels are all binary.

    predict_proba(sdf1, sdf2)

    task_base
        alias of LALGBSparkBinaryClassifier
class lal.spark.model.LALGBSparkMultiCategoricalClassifier(**kwargs)
    Bases: lal.spark.model._LALSparkMultiBase

    The multioutput categorical classifier, used when the training labels are all categorical.

    predict_proba(sdf1, sdf2)

    task_base
        alias of LALGBSparkCategoricalClassifier
class lal.spark.model.LALGBSparkMultiRegressorClassifier(**kwargs)
    Bases: lal.spark.model._LALSparkMultiBase

    The multioutput regressor, used when the training labels are all continuous.

    task_base
        alias of LALGBSparkRegressor
class lal.spark.model.LALGBSparkRegressor(**kwargs)
    Bases: lal.spark.model._LALModelBase

    Used when the training labels are continuous.

    predict(**kwargs)
        Predicts the value each sample in the testing dataset is likely to have, based on the continuous training labels.

        Parameters:
            sdf1 (pyspark.sql.dataframe.DataFrame) – The training dataset
            sdf2 (pyspark.sql.dataframe.DataFrame) – The testing dataset
        Returns:
lal.spark.nn module
class lal.spark.nn.KNNCosineMatcher(k)
    Bases: lal.spark.nn._CosineDistance, lal.spark.nn._KNNMatcherBase

    The k-nearest-neighbor algorithm with the cosine distance measure.
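The distance measure behind this matcher can be sketched in plain Python (a stdlib stand-in for illustration, not the Spark implementation): cosine distance is 1 minus the cosine of the angle between two feature vectors, and the k nearest neighbors are the k candidates at the smallest distance.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def knn(query, candidates, k, dist):
    """Indices of the k candidates nearest to query under dist."""
    order = sorted(range(len(candidates)),
                   key=lambda i: dist(query, candidates[i]))
    return order[:k]

candidates = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
print(knn([1.0, 0.0], candidates, k=2, dist=cosine_distance))  # [0, 2]
```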
class lal.spark.nn.KNNMahalanobisMatcher(k)
    Bases: lal.spark.nn._MahalanobisDistance, lal.spark.nn._KNNMatcherBase

    The k-nearest-neighbor algorithm with the Mahalanobis distance measure.
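The Mahalanobis distance is sqrt((x - y)^T S^-1 (x - y)), where S is the feature covariance matrix, so differences along high-variance features are down-weighted. A minimal two-dimensional sketch in plain Python (for illustration only; the covariance matrix here is made up, and the Spark matcher computes this internally):

```python
import math

def inverse_2x2(m):
    """Invert a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mahalanobis(x, y, cov_inv):
    """sqrt((x - y)^T * cov_inv * (x - y)) for 2-D points."""
    d = [x[0] - y[0], x[1] - y[1]]
    t = [cov_inv[0][0] * d[0] + cov_inv[0][1] * d[1],   # cov_inv @ d
         cov_inv[1][0] * d[0] + cov_inv[1][1] * d[1]]
    return math.sqrt(d[0] * t[0] + d[1] * t[1])

cov = [[4.0, 0.0], [0.0, 1.0]]  # feature 0 has 4x the variance of feature 1
cov_inv = inverse_2x2(cov)
# The same-sized displacement counts for less along the noisier axis.
print(mahalanobis([2.0, 0.0], [0.0, 0.0], cov_inv))  # 1.0
print(mahalanobis([0.0, 2.0], [0.0, 0.0], cov_inv))  # 2.0
```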
class lal.spark.nn.KNNPowerMatcher(p, k)
    Bases: lal.spark.nn._PowerDistance, lal.spark.nn._KNNMatcherBase

    The k-nearest-neighbor algorithm with the p-norm distance measure.
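The p-norm (Minkowski) distance generalizes the familiar cases: p=1 gives Manhattan distance and p=2 gives Euclidean distance. A plain-Python sketch for illustration (not the Spark implementation):

```python
def p_norm_distance(u, v, p):
    """Minkowski (p-norm) distance between two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

u, v = [0.0, 0.0], [3.0, 4.0]
print(p_norm_distance(u, v, 1))  # 7.0  (Manhattan)
print(p_norm_distance(u, v, 2))  # 5.0  (Euclidean)
```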
lal.spark.weights module
class lal.spark.weights.GBMWeightBinaryClassifier(**kwargs)
    Bases: lal.spark.weights._LGBMWeightsBase

    Derives feature-importance weights from a binary output. It optimizes the gradient-boosting model against a classification metric, then returns the featureImportances of the best-scoring result.
class lal.spark.weights.GBMWeightMultiClassifier(**kwargs)
    Bases: lal.spark.weights._LGBMWeightsBase

    Derives feature-importance weights from a multiclass output. It optimizes the gradient-boosting model against a classification metric, then returns the featureImportances of the best-scoring result.
class lal.spark.weights.GBMWeightRegressor(**kwargs)
    Bases: lal.spark.weights._LGBMWeightsBase

    Derives feature-importance weights from a continuous output. It optimizes the gradient-boosting model against a regression metric, then returns the featureImportances of the best-scoring result.
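The pattern shared by these three weight classes (tune the model, then read the feature importances off the best-scoring configuration) can be sketched generically. Everything below is a stand-in for illustration: the configuration grid, scoring function, and importance values are made up and are not the lal or Spark internals.

```python
def best_feature_importances(configs, fit_and_score):
    """fit_and_score(cfg) -> (metric, importances); higher metric is better.
    Return the importances of the best-scoring configuration."""
    results = [fit_and_score(cfg) for cfg in configs]
    best_metric, best_importances = max(results, key=lambda r: r[0])
    return best_importances

# Toy stand-in: pretend deeper trees score better and shift importance
# toward the second feature (all numbers are made up).
def fake_fit_and_score(cfg):
    depth = cfg["max_depth"]
    metric = 0.7 + 0.05 * depth
    importances = [10 - depth, depth]
    return metric, importances

configs = [{"max_depth": d} for d in (2, 3, 4)]
print(best_feature_importances(configs, fake_fit_and_score))  # [6, 4]
```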