A 'secret weapon' that has served me very well for learning classifiers is to first learn a good linear classifier. I am almost hesitant to give this away (kidding).
Use the non-thresholded version of that linear classifier output as one additional feature-dimension over which you learn a decision tree. Then wrap this whole thing up as a system of boosted trees (that is, with more short trees added if needed).
One of the reasons this works so well is that it plays to each model's strengths:
(i) decision trees have a hard time fitting linear functions (they have to stair-step a lot, and therefore need many internal nodes), and
(ii) linear functions are terrible where equi-label regions have a recursively partitioned structure.
In the tree-building process, the first cut is usually on the added synthetic linear feature, which earns the tree the linear classifier's accuracy right away and leaves the DT algorithm free to work on the regions where the linear classifier struggles. The idea is not that different from boosting.
One could also consider different (random) rotations of the data to form a forest of trees built using the steps above, but this was usually not necessary. Alternatively, rotate the axes so that they are all orthogonal to the learned linear classifier.
One place where DTs struggle is when the features themselves are very (column-)sparse: there are not many places to put a cut.
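A minimal sketch of the recipe, assuming scikit-learn; the dataset and hyperparameters here are illustrative, not from the original comment:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# 1. First learn a good linear classifier.
lin = LogisticRegression(max_iter=1000).fit(X, y)

# 2. Append its non-thresholded output (the raw decision score)
#    as one additional feature dimension.
X_aug = np.column_stack([X, lin.decision_function(X)])

# 3. Boosted short trees on the augmented data; the first split
#    tends to land on the synthetic linear feature.
gbt = GradientBoostingClassifier(n_estimators=50, max_depth=2).fit(X_aug, y)
```

In a real pipeline the linear score for test points would come from the same fitted `lin`, so the augmentation applies identically at train and inference time.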
lokimedes
When I worked at CERN around 2010, Boosted Decision Trees were the most popular classifier, precisely because of their (potential for) explainability along with their expressive power.
We had a cultural aversion for neural networks back then, especially if the model was used in physics analysis directly.
Times have changed…
Fun fact: single-bit neural networks are decision trees.
In theory, this means you can 'compile' most neural networks into chains of if-else statements but it's not well understood when this sort of approach works well.
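As a toy illustration of that compilation (my own construction, not from the comment): a tiny network of step-activation "single bit" units computing XOR unrolls into plain if/else branches:

```python
def xor_net(x1: int, x2: int) -> int:
    # Hidden layer: two step-activation units, each a thresholded sum.
    h1 = 1 if x1 + x2 - 0.5 > 0 else 0   # fires if at least one input is 1
    h2 = 1 if x1 + x2 - 1.5 > 0 else 0   # fires only if both inputs are 1
    # Output unit: h1 AND NOT h2, which is exactly XOR.
    return 1 if h1 - h2 - 0.5 > 0 else 0
```

Each unit is one comparison, so the whole network is a small chain of if-else statements, i.e. a decision tree over the inputs.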
hkbuilds
Decision trees are underrated in the age of deep learning. They're interpretable, fast, and often good enough.
I've been using a scoring system for website analysis that's essentially a decision tree under the hood. Does the site have a meta description? Does it load in under 3 seconds? Is it mobile responsive? Each check produces a score, the tree aggregates them. Users understand why they got their score because the logic is transparent.
Try explaining why a neural network rated their website 73/100. Decision trees make that trivial.
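A sketch of what such a transparent scorer might look like; the checks and weights are hypothetical, not the commenter's actual system:

```python
def score_site(site: dict) -> tuple[int, list[str]]:
    # Each check is a readable branch; the reasons list explains the score.
    score, reasons = 0, []
    if site.get("meta_description"):
        score += 30
        reasons.append("+30 has meta description")
    if site.get("load_time_s", float("inf")) < 3:
        score += 40
        reasons.append("+40 loads in under 3 seconds")
    if site.get("mobile_responsive"):
        score += 30
        reasons.append("+30 mobile responsive")
    return score, reasons
```

Because the score is just a sum of named branches, the explanation comes for free with the prediction.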
jebarker
The killer feature of DTs is how fast they can be. I worked very hard on a project to try to replace DT-based classifiers with small NNs in a low-latency application. The NNs could achieve non-trivial gains in classification accuracy but remained two orders of magnitude higher in latency at inference time.
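Part of why tree inference is so cheap: a trained tree compiles down to a handful of comparisons and branches, with no matrix multiplies. A hypothetical depth-2 tree, unrolled by hand:

```python
def tree_predict(f0: float, f1: float, f2: float) -> int:
    # A depth-2 decision tree is at most two comparisons per prediction
    # (the thresholds here are made up for illustration).
    if f0 < 0.5:
        return 0 if f1 < 1.2 else 1
    else:
        return 1 if f2 < -0.3 else 0
```

A small NN on the same inputs would need at least one dense layer of multiply-adds per prediction, which is where the latency gap comes from.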
zelphirkalt
Decision trees are great. My favorite classical machine learning algorithm or group of algorithms, as there are many slight variations of decision trees. I wrote a purely functional (kind of naive) parallelized implementation in GNU Guile: https://codeberg.org/ZelphirKaltstahl/guile-ml/src/commit/25...
Why "naive"? Because there is no such thing as NumPy or data frames in the Guile ecosystem to my knowledge, and the data representation is therefore probably quite inefficient.
kqr
Experts' nebulous decision-making can often be modelled with simple decision trees, or even decision chains (linked lists). Even when the expert believes their decision-making is more complex, a simple decision tree often models their decisions better than the rules the experts propose themselves.
I had long dismissed decision trees because they seem so ham-fisted compared to regression and distance-based clustering techniques, but they are undoubtedly very effective.
See more in chapter seven of the Oxford Handbook of Expertise. It's fascinating!
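A decision chain in this sense is just an ordered list of rules where the first match wins. A sketch, with rules invented purely for illustration:

```python
def decide(chain, case, default):
    # Walk the chain like a linked list: the first matching condition wins.
    for condition, verdict in chain:
        if condition(case):
            return verdict
    return default

# Hypothetical triage rules an expert might articulate:
triage_chain = [
    (lambda c: c["severity"] >= 9, "escalate"),
    (lambda c: c["tier"] == "enterprise", "priority"),
    (lambda c: c["severity"] >= 5, "standard"),
]
```

The chain is trivially inspectable: the model of the expert is literally the ordered list of rules.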
I worked (professionally) on a product a few years ago based upon decision tree and random forest classifiers. I had no background in the math and had to learn this stuff, which has paid dividends as LLMs and AI have become hyped. This is one of the best explanations I've seen and has me super nostalgic for that project.
Gonna try to cook up something personal. It's amazing how people are now using regression models basically all the time and yet no one uses these things on their own.
xmprt
Interesting website and great presentation. My only note is that the color contrast of some of the text makes it hard to read.
ssttoo
I just wish we’d stop with the “unreasonable” click-bait. It cheapens an otherwise excellent article, like the “7 x (number 6 will surprise you)” of yesteryear.
moi2388
That was beautifully presented!
EGreg
Isn’t that exactly how humans (and even animals) operate?
Human societies look for actual major correlations and establish classifications. Except with scientific-minded humans, we often also want to know the why behind the correlations. David Hume got involved with that… https://brainly.com/question/50372476
Let me ask a provocative question. What, ultimately, is the difference between knowledge and bias?
bobek
Wow. This page is actually the product of an LLM [0]. So they can produce useful stuff after all :)
Random forests on the same site: https://mlu-explain.github.io/random-forest/
I am surprised r2d3's visual intro is not referenced here (https://r2d3.us/visual-intro-to-machine-learning-part-1/). I think it was the first (or, if not the first, maybe the most impactful) example of scroll-triggered explainers.
[0]: https://news.ycombinator.com/item?id=47195123