Advanced A/B Testing Series, Part 1

A question machine learning has long wanted to answer is "what if", in other words, how to guide decisions.

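Questions of this kind are hard because the ground truth cannot be observed in reality: a patient who took the drug shows lower blood pressure, but we have no way of knowing whether his blood pressure would also have dropped at that same moment had he not taken it.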
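The same point can be written in the potential-outcomes notation used by several of the papers listed below. The symbols here are an assumption for illustration and do not appear in the original post: $W_i$ is the treatment indicator and $Y_i(1)$, $Y_i(0)$ are the outcomes user $i$ would have with and without treatment.

```latex
% Notation assumed for illustration: W_i in {0,1}, potential outcomes Y_i(1), Y_i(0)
\begin{align*}
\tau_i &= Y_i(1) - Y_i(0) && \text{(individual treatment effect)} \\
Y_i^{\mathrm{obs}} &= W_i\, Y_i(1) + (1 - W_i)\, Y_i(0) && \text{(only one potential outcome is observed)}
\end{align*}
```

For the patient who took the drug, $W_i = 1$, so $Y_i(1)$ is observed and $Y_i(0)$ is missing, which is why $\tau_i$ can never be computed directly.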

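At this point, anyone who does analysis will say: run an A/B experiment! We estimate the overall difference; if it is significant the treatment works, and if it is not, it doesn't. But is that all we can do?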
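As a rough sketch of that standard analysis, the snippet below estimates the overall difference in means and a two-sided p-value from a normal approximation. The data are synthetic and the function name `ab_test` is hypothetical; a real experiment would also need checks such as sample-size planning that are not shown here.

```python
import math
import random
from statistics import mean, variance


def ab_test(control, treatment):
    """Difference in means with a two-sided z-test (normal approximation)."""
    diff = mean(treatment) - mean(control)
    # Standard error of the difference between two independent sample means.
    se = math.sqrt(variance(treatment) / len(treatment)
                   + variance(control) / len(control))
    z = diff / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return diff, se, p_value


# Synthetic data (assumption): the true average effect is +0.5.
rng = random.Random(0)
control = [rng.gauss(10.0, 2.0) for _ in range(2000)]
treatment = [rng.gauss(10.5, 2.0) for _ in range(2000)]

diff, se, p_value = ab_test(control, treatment)
print(f"estimated overall effect = {diff:.3f}, SE = {se:.3f}, p = {p_value:.4f}")
```

This gives a single number for the whole population, which is exactly the limitation the rest of the post is concerned with.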

Of course not! Every individual is different, so no effect overall does not mean no effect in a subgroup!

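The methods below approach this problem from different angles, but the basic idea is the same: we cannot observe each user's treatment effect, but we can find a group of similar users and estimate the experiment's effect on them.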
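A toy illustration of that idea, under loud assumptions: the segments `new_user` and `old_user` and all outcome values are made up, and the grouping is fixed by hand. The tree-based methods listed below instead learn the grouping from data; this sketch only shows why an overall effect of zero can hide opposite effects in subgroups.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical experiment log: (segment, treated?, outcome).
# Values are invented so that the two segments respond in opposite directions.
rows = [
    ("new_user", 1, 12.0), ("new_user", 1, 14.0),
    ("new_user", 0, 10.0), ("new_user", 0, 12.0),
    ("old_user", 1, 8.0), ("old_user", 1, 10.0),
    ("old_user", 0, 10.0), ("old_user", 0, 12.0),
]


def diff_in_means(records):
    """Mean outcome of treated users minus mean outcome of control users."""
    treated = [y for _, w, y in records if w == 1]
    control = [y for _, w, y in records if w == 0]
    return mean(treated) - mean(control)


overall = diff_in_means(rows)

# Group the records by segment and repeat the same estimate within each group.
by_segment = defaultdict(list)
for row in rows:
    by_segment[row[0]].append(row)
segment_effects = {seg: diff_in_means(recs) for seg, recs in by_segment.items()}

print("overall:", overall)              # overall: 0.0
print("by segment:", segment_effects)   # by segment: {'new_user': 2.0, 'old_user': -2.0}
```

With real data each subgroup estimate would also need its own uncertainty estimate, since smaller groups give noisier differences.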

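In later posts I will start from the second Causal Tree paper, Recursive partitioning for heterogeneous causal effects, and work through the similarities and differences among the methods below.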

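The whole field is still developing and several of the open-source implementations were released only recently, so this post will keep being updated. If you come across good papers or engineering implementations, feel free to share them in the comments~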

Uplift Modelling/Causal Tree

Nicholas J Radcliffe and Patrick D Surry. Real-world uplift modelling with significance based uplift trees. White Paper TR-2011-1, Stochastic Solutions, 2011. [Paper link]
Rzepakowski, P. and Jaroszewicz, S., 2012. Decision trees for uplift modeling with single and multiple treatments. Knowledge and Information Systems, 32(2), pp.303-327. [Paper link]
Yan Zhao, Xiao Fang, and David Simchi-Levi. Uplift modeling with multiple treatments and general response types. Proceedings of the 2017 SIAM International Conference on Data Mining, SIAM, 2017. [Paper link] [GitHub link]
Athey, S., and Imbens, G. W. 2015. Machine learning methods for estimating heterogeneous causal effects. stat 1050(5). [Paper link]
Athey, S., and Imbens, G. 2016. Recursive partitioning for heterogeneous causal effects. Proceedings of the National Academy of Sciences. [Paper link] [GitHub link]
C. Tran and E. Zheleva, “Learning triggers for heterogeneous treatment effects,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2019. [Paper link] [GitHub link]

Wager, S. & Athey, S. (2018). Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association.
M. Oprescu, V. Syrgkanis and Z. S. Wu. Orthogonal Random Forest for Causal Inference. Proceedings of the 36th International Conference on Machine Learning (ICML), 2019. [Paper link] [GitHub link]

V. Chernozhukov, D. Chetverikov, M. Demirer, E. Duflo, C. Hansen, and W. Newey. Double Machine Learning for Treatment and Causal Parameters. ArXiv e-prints. [Paper link] [GitHub link]
V. Chernozhukov, M. Goldman, V. Semenova, and M. Taddy. Orthogonal Machine Learning for Demand Estimation: High Dimensional Causal Inference in Dynamic Panels. ArXiv e-prints, December 2017.
V. Chernozhukov, D. Nekipelov, V. Semenova, and V. Syrgkanis. Two-Stage Estimation with a High-Dimensional Second Stage. 2018.
X. Nie and S. Wager. Quasi-Oracle Estimation of Heterogeneous Treatment Effects. arXiv preprint arXiv:1712.04912, 2017. [Paper link]
D. Foster and V. Syrgkanis. Orthogonal Statistical Learning. arXiv preprint arXiv:1901.09036, 2019. [Paper link]

C. Manahan, 2005. A proportional hazards approach to campaign list selection. In SAS User Group International (SUGI) 30 Proceedings.
Green DP, Kern HL (2012) Modeling heterogeneous treatment effects in survey experiments with Bayesian additive regression trees. Public Opinion Quarterly 76(3):491–511.
Sören R. Künzel, Jasjeet S. Sekhon, Peter J. Bickel, and Bin Yu. Metalearners for estimating heterogeneous treatment effects using machine learning. Proceedings of the National Academy of Sciences, 2019. [Paper link] [GitHub link]

Shalit, U., Johansson, F. D., & Sontag, D. (2017). Estimating individual treatment effect: generalization bounds and algorithms. Proceedings of the 34th International Conference on Machine Learning (ICML 2017). [Paper link]
Alaa, A. M., Weisz, M., & van der Schaar, M. (2017). Deep Counterfactual Networks with Propensity-Dropout. ArXiv E-Prints, arXiv:1706.05966. [Paper link]
Shi, C., Blei, D. M., & Veitch, V. (2019). Adapting Neural Networks for the Estimation of Treatment Effects. ArXiv:1906.02120. [Paper link] [GitHub link]

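It was Uber's blog that first helped me find my bearings in the vast sea of papers. Hearing now that their AI Lab is being disbanded makes me a little sad. As the maintainers of the most-starred open-source HTE project, they deserve a section of their own.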

Shuyang Du, James Lee, Farzin Ghaffarizadeh, 2017, Improve User Retention with Causal Learning [Paper link]
Zhenyu Zhao, Totte Harinen, 2020, Uplift Modeling for Multiple Treatments with Cost [Paper link]
Will Y. Zou, Smitha Shyam, Michael Mui, Mingshi Wang, 2020, Learning Continuous Treatment Policy and Bipartite Embeddings for Matching with Heterogeneous Causal Effects Optimization [Paper link]
Will Y. Zou, Shuyang Du, James Lee, Jan Pedersen, 2020, Heterogeneous Causal Learning for Effectiveness Optimization in User Marketing [Paper link]

Updates ongoing ~

This post is also published on the blog "有温度的Data Science~" (CNBlog).