Over the past few decades, computational notebooks, first introduced by Wolfram Mathematica, have evolved to support scientific research, exploration and educational workflows. Naturally, they also support data science workflows, and notebooks such as Jupyter notebooks and Databricks notebooks have become great companions, providing a simple and intuitive interactive computing environment that combines code for analyzing data with rich text and visualizations to tell a data story. Notebooks were designed as the ultimate medium for modern scientific communication and innovation. In recent years, however, we've noticed a trend of notebooks becoming the medium for running the kind of production-quality code that typically drives enterprise operations, and we see notebook platform vendors advertising the use of their exploratory notebooks in production. This is a case of good intentions, namely making programming easier for data scientists, implemented poorly and at the cost of scalability, maintainability, resiliency and all the other qualities that long-lived production code needs to support. We therefore don't recommend productionizing notebooks; instead, we encourage empowering data scientists to build production-ready code with the right programming frameworks, simplifying continuous delivery tooling and abstracting away complexity through end-to-end machine learning platforms.
Jupyter Notebooks have gained popularity among data scientists, who use them for exploratory analyses, early-stage development and knowledge sharing. This rise in popularity has led to a trend of productionizing Jupyter Notebooks by providing the tools and support to execute them at scale. Although we wouldn't want to discourage anyone from using their tools of choice, we don't recommend using Jupyter Notebooks for building scalable, maintainable and long-lived production code: they lack effective version control, error handling, modularity and extensibility, among other basic capabilities required for building scalable, production-ready code. Instead, we encourage developers and data scientists to work together to find solutions that empower data scientists to build production-ready machine learning models using continuous delivery practices with the right programming frameworks. We caution against the productionization of Jupyter Notebooks as a workaround for inefficiencies in continuous delivery pipelines for machine learning or for inadequate automated testing.
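To make the alternative concrete, here is a minimal sketch, assuming a hypothetical add_session_length transformation that was originally prototyped in a notebook. The file and function names are illustrative only; the point is that once the logic lives in a plain Python module with a pytest-style test, it can be versioned, code reviewed and executed by a continuous delivery pipeline instead of being re-run by hand in notebook cells.

```python
# features.py -- hypothetical module extracted from an exploratory notebook.
# Keeping the transformation in a plain, importable function makes it easy to
# version, review, unit test and reuse from a pipeline or a service.
import pandas as pd


def add_session_length(events: pd.DataFrame) -> pd.DataFrame:
    """Add a session_length column: seconds between the first and last event per session."""
    if not {"session_id", "timestamp"}.issubset(events.columns):
        raise ValueError("events must contain 'session_id' and 'timestamp' columns")
    spans = (
        events.groupby("session_id")["timestamp"]
        .agg(lambda ts: (ts.max() - ts.min()).total_seconds())
        .rename("session_length")
        .reset_index()
    )
    return events.merge(spans, on="session_id")


# test_features.py -- a pytest-style check that runs in CI on every commit,
# something a cell buried inside a notebook cannot easily offer.
def test_add_session_length():
    events = pd.DataFrame(
        {
            "session_id": ["a", "a", "b"],
            "timestamp": pd.to_datetime(
                ["2024-01-01 10:00:00", "2024-01-01 10:05:00", "2024-01-01 11:00:00"]
            ),
        }
    )
    result = add_session_length(events)
    assert result.loc[result.session_id == "a", "session_length"].iloc[0] == 300.0
    assert result.loc[result.session_id == "b", "session_length"].iloc[0] == 0.0
```

The notebook remains the place to explore the data and prototype the transformation; the extracted module is what the delivery pipeline tests, packages and deploys.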