LightGBM: why introduce LightGBM, what is LightGBM, GOSS, EFB, GBDT, GOSS implementation

In recent years, with the emergence of big data (in terms of both the number of features and the number of instances), GBDT is facing new challenges, especially in the tradeoff between accuracy and efficiency.

why introduce LightGBM

A major reason is that, for each feature, conventional GBDT implementations need to scan all the data instances to estimate the information gain of every possible split point, which is very time-consuming.

what is LightGBM

LightGBM is a GBDT implementation that speeds up training with two techniques:

GOSS: Gradient-based One-Side Sampling
EFB: Exclusive Feature Bundling

GOSS

Since the data instances with larger gradients play a more important role in the computation of information gain, GOSS can obtain a quite accurate estimate of the information gain with a much smaller data size.

EFB

With EFB, we bundle mutually exclusive features (i.e., features that rarely take nonzero values simultaneously) to reduce the number of effective features.
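A minimal sketch of the bundling idea, not LightGBM's actual implementation: greedily assign each column to the first bundle it (almost) never conflicts with, where a conflict means two features are nonzero on the same row. The function name and the `max_conflicts` parameter are illustrative choices, not LightGBM API.

```python
import numpy as np

def bundle_exclusive_features(X, max_conflicts=0):
    """Greedy EFB sketch: group columns that are (nearly) mutually exclusive.

    A feature joins an existing bundle only if its conflicts with that
    bundle (rows where both are nonzero) stay within max_conflicts;
    otherwise it starts a new bundle.
    """
    n_rows, n_cols = X.shape
    nonzero = X != 0
    bundles = []        # each bundle is a list of column indices
    bundle_masks = []   # rows already occupied by each bundle
    for j in range(n_cols):
        placed = False
        for k, mask in enumerate(bundle_masks):
            conflicts = np.count_nonzero(mask & nonzero[:, j])
            if conflicts <= max_conflicts:
                bundles[k].append(j)
                bundle_masks[k] = mask | nonzero[:, j]
                placed = True
                break
        if not placed:
            bundles.append([j])
            bundle_masks.append(nonzero[:, j].copy())
    return bundles
```

Because the features in a bundle rarely overlap, their values can later be packed into one column (by offsetting each feature's bin range), so histogram construction scales with the number of bundles instead of the number of raw features.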

GBDT

GBDT is an ensemble model of decision trees. In each iteration, GBDT learns a new decision tree by fitting the negative gradients of the loss (which, for squared error, are simply the residual errors).
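The fit-the-negative-gradient loop can be sketched in a few lines. This is a minimal GBDT for squared-error loss (where the negative gradient equals the residual), using scikit-learn's `DecisionTreeRegressor` as the base learner; the function names and defaults here are illustrative, not any library's API.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbdt_fit(X, y, n_trees=50, lr=0.1, max_depth=3):
    """Each tree fits the negative gradient of the loss; for
    1/2 * (y - pred)^2 that is just the residual y - pred."""
    pred = np.full(len(y), y.mean())
    base = float(y.mean())
    trees = []
    for _ in range(n_trees):
        residual = y - pred                    # negative gradient for L2 loss
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residual)
        pred += lr * tree.predict(X)           # shrunken additive update
        trees.append(tree)
    return base, trees

def gbdt_predict(base, trees, X, lr=0.1):
    pred = np.full(X.shape[0], base)
    for tree in trees:
        pred += lr * tree.predict(X)
    return pred
```

The learning rate `lr` shrinks each tree's contribution, which is the standard way to trade more iterations for better generalization.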

The main cost in GBDT lies in learning the decision trees, and the most time-consuming part of learning a decision tree is finding the best split points.
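To make the cost concrete, here is a histogram-style sketch of split finding on a single feature, assuming a variance-gain score of the form G_L²/n_L + G_R²/n_R over gradient sums (a simplified proxy for the gain formula used in GBDT; the bin count and function name are illustrative).

```python
import numpy as np

def best_split(x, grad, n_bins=16):
    """Bucket one feature into quantile bins, accumulate per-bin gradient
    sums, then score every bin boundary by G_L^2/n_L + G_R^2/n_R."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.searchsorted(edges, x)
    g_sum = np.bincount(bins, weights=grad, minlength=n_bins)
    n_cnt = np.bincount(bins, minlength=n_bins).astype(float)
    G, N = g_sum.sum(), n_cnt.sum()
    best_gain, best_bin = -np.inf, None
    gl = nl = 0.0
    for b in range(n_bins - 1):                # sweep candidate boundaries
        gl += g_sum[b]; nl += n_cnt[b]
        gr, nr = G - gl, N - nl
        if nl == 0 or nr == 0:
            continue
        gain = gl * gl / nl + gr * gr / nr
        if gain > best_gain:
            best_gain, best_bin = gain, b
    return best_bin, best_gain
```

Even with histograms, every instance still contributes to the bin statistics, which is exactly the per-instance cost that GOSS (fewer rows) and EFB (fewer columns) attack.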

GOSS implementation

GOSS keeps all the instances with large gradients and performs random sampling on the instances with small gradients; to keep the gain estimate unbiased, the sampled small-gradient instances are re-weighted by a constant factor (1 - a)/b, where a is the kept fraction and b is the sampling fraction.
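A minimal sketch of that sampling step, assuming the paper's a/b parameterization (keep the top-a fraction by |gradient|, sample a b fraction of the rest, amplify the sampled ones by (1 - a)/b); the function name and defaults are illustrative, not LightGBM's internal API.

```python
import numpy as np

def goss_sample(gradients, a=0.2, b=0.1, rng=None):
    """Return indices selected by GOSS and the per-instance weights
    that compensate for under-sampling the small-gradient instances."""
    rng = np.random.default_rng(rng)
    n = len(gradients)
    order = np.argsort(-np.abs(gradients))     # descending by |gradient|
    top_k = int(a * n)
    rand_k = int(b * n)
    top = order[:top_k]                        # large gradients: always kept
    rest = order[top_k:]                       # small gradients: sampled
    sampled = rng.choice(rest, size=rand_k, replace=False)
    idx = np.concatenate([top, sampled])
    weights = np.ones(len(idx))
    weights[top_k:] = (1.0 - a) / b            # unbias the gain estimate
    return idx, weights
```

The returned weights would multiply the gradients of the sampled instances when computing the information gain, so the downsampled set still estimates the gain over the full data.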