How to keep Node.js from crashing

Question


Anonymous · October 15, 2022


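I started trying out Node.js only a few days ago. I've realized that the Node process is terminated whenever my program hits an unhandled exception. This is unlike the server containers I have worked with, where only the worker thread dies on an unhandled exception and the container can still receive requests. This raises a few questions:

Is process.on('uncaughtException') the only effective way to guard against it?
Will process.on('uncaughtException') also catch an unhandled exception thrown during the execution of asynchronous processes?
Is there a module already built (for example, one that sends email or writes to a file) that I could leverage when an uncaught exception occurs?

I would appreciate any pointers or articles that show the common best practices for handling uncaught exceptions in Node.js.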


2 Answers

Anonymous · Answer 1 · 2022-10-15T20:57:58+00:00

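The following is a summary and curation of many different sources on this topic, including code examples and quotes from selected blog posts. The complete list of best practices can be found here

Best practices for Node.JS error handling

Number1: Use promises for async error handling

TL;DR: Handling async errors in callback style is probably the fastest way to hell (a.k.a. the pyramid of doom). The best gift you can give your code is to use a reputable promise library instead, which provides compact and familiar code syntax similar to try-catch.

Otherwise: The Node.JS callback style, function(err, response), is a promising way to unmaintainable code because error handling gets mixed with regular code, nesting becomes excessive, and the coding patterns are awkward.

Code example - good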

doWork()
.then(doWork)
.then(doError)
.then(doWork)
.catch(errorHandler)
.then(verify);

Code example anti-pattern - callback style error handling

getData(someParameter, function(err, result){
    if(err != null) {
        //do something like calling the given callback function and pass the error
    }
    getMoreData(a, function(err, result){
        if(err != null) {
            //do something like calling the given callback function and pass the error
        }
        getMoreData(b, function(c){
            getMoreData(d, function(e){
                ...
            });
        });
    });
});

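Blog quote: "We have a problem with promises"
(From the blog PouchDB, ranked 11 for the keywords "Node Promises")

"…And in fact, callbacks do something even more sinister: they deprive us of the stack, which is something we usually take for granted in programming languages. Writing code without a stack is a lot like driving a car without a brake pedal: you don't realize how badly you need it until you reach for it and it's not there. The whole point of promises is to give us back the language fundamentals we lost when we went async: return, throw, and the stack. But you have to know how to use promises correctly in order to take advantage of them."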
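
As a minimal sketch of this practice, assuming you are stuck with a callback-style function you cannot rewrite (the `getData` below is hypothetical and only mimics one), Node's built-in `util.promisify` can wrap it so that a failure in any step flows into a single `.catch`:

```javascript
const { promisify } = require('util');

// Hypothetical callback-style function, for illustration only.
function getData(id, callback) {
    setImmediate(() => {
        if (id < 0) return callback(new Error('id must not be negative'));
        callback(null, { id: id });
    });
}

// Wrap the (err, result) callback API in a promise-returning function.
const getDataAsync = promisify(getData);

// One catch handles a failure from either step of the chain.
function loadTwo(firstId, secondId) {
    return getDataAsync(firstId)
        .then((first) => getDataAsync(secondId).then((second) => [first, second]))
        .catch((err) => ({ failed: true, message: err.message }));
}

loadTwo(1, 2).then((result) => console.log(result));
loadTwo(1, -1).then((result) => console.log(result));
```

Turning the error into a `{ failed, message }` object is just one possible policy for the sketch; rethrowing from the `.catch` or passing the error to a central handler would work equally well.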

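Number2: Use only the built-in Error object

TL;DR: It is pretty common to see code that throws errors as a string or as a custom type. This complicates the error handling logic and the interoperability between modules. Whether you reject a promise, throw an exception, or emit an error, using the Node.JS built-in Error object increases uniformity and prevents loss of error information.

Otherwise: When executing some module, being uncertain which type of error comes back makes it much harder to reason about the coming exception and handle it. Even worse, using custom types to describe errors might lead to the loss of critical error information such as the stack trace!

Code example - doing it right

// throwing an Error from a typical function, whether sync or async
if (!productToAdd)
    throw new Error("How can I add new product when no value provided?");
// 'throwing' an Error from an EventEmitter
const myEmitter = new MyEmitter();
myEmitter.emit('error', new Error('whoops!'));
// 'throwing' an Error from a Promise
return new Promise(function (resolve, reject) {
    DAL.getProduct(productToAdd.id).then((existingProduct) => {
        if (existingProduct != null)
            return reject(new Error("Why fooling us and trying to add an existing product?"));
        // otherwise carry on with adding the product
    });
});

Code example - anti-pattern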
//throwing a String lacks any stack trace information and other important properties
if(!productToAdd)
    throw ("How can I add new product when no value provided?");
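
Blog quote: "A string is not an error"
(From the blog devthought, ranked 6 for the keywords "Node.JS error object")

"…passing a string instead of an error results in reduced interoperability between modules. It breaks contracts with APIs that might be performing instanceof Error checks, or that want to know more about the error. Error objects, as we'll see, have very interesting properties in modern JavaScript engines besides holding the message passed to the constructor."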
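
To make the difference concrete, here is a small sketch (the `capture` helper is a hypothetical name introduced only for this illustration) that throws the same message once as a string and once as an Error, then inspects what the catch block actually receives:

```javascript
// Returns whatever the given function throws, or undefined if nothing is thrown.
function capture(fn) {
    try {
        fn();
    } catch (thrown) {
        return thrown;
    }
    return undefined;
}

const fromString = capture(() => { throw 'no value provided'; });
const fromError = capture(() => { throw new Error('no value provided'); });

console.log(fromString instanceof Error); // false
console.log(typeof fromString.stack);     // 'undefined': a string carries no stack trace
console.log(fromError instanceof Error);  // true
console.log(typeof fromError.stack);      // 'string' on V8, the engine Node.js uses
```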

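Number3: Distinguish operational vs programmer errors

TL;DR: Operational errors (e.g. an API received invalid input) are known cases where the impact of the error is fully understood and can be handled thoughtfully. Programmer errors (e.g. trying to read an undefined variable), on the other hand, are unknown code failures that call for gracefully restarting the application.

Otherwise: You could always restart the application when an error appears, but why let down ~5000 online users because of a minor, predicted error (an operational error)? The opposite is not ideal either: keeping the application up after an unknown issue (a programmer error) might lead to unpredictable behavior. Differentiating the two lets you act tactfully and apply a balanced approach based on the given context.

Code example - doing it right
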
// throwing an Error from a typical function, whether sync or async
if (!productToAdd)
    throw new Error("How can I add new product when no value provided?");
// 'throwing' an Error from an EventEmitter
const myEmitter = new MyEmitter();
myEmitter.emit('error', new Error('whoops!'));
// 'throwing' an Error from a Promise
return new Promise(function (resolve, reject) {
    DAL.getProduct(productToAdd.id).then((existingProduct) => {
        if (existingProduct != null)
            return reject(new Error("Why fooling us and trying to add an existing product?"));
        // otherwise carry on with adding the product
    });
});

Code example - marking an error as operational (trusted)

//marking an error object as operational 
var myError = new Error("How can I add new product when no value provided?");
myError.isOperational = true;
//or if you're using some centralized error factory (see other examples at the bullet "Use only the built-in Error object")
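function appError(commonType, description, isOperational) {
    Error.call(this);
    Error.captureStackTrace(this, appError);
    this.message = description;
    this.commonType = commonType;
    this.description = description;
    this.isOperational = isOperational;
}
//inherit from Error so that instanceof Error checks keep working
appError.prototype = Object.create(Error.prototype);
appError.prototype.constructor = appError;
throw new appError(errorManagement.commonErrors.InvalidInput, "Describe here what happened", true);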
//error handling code within middleware
process.on('uncaughtException', function(error) {
    if(!error.isOperational)
        process.exit(1);
});
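
Blog quote: "Otherwise you risk the state"
(From the blog debugable, ranked 3 for the keywords "Node.JS uncaught exception")

"…By the very nature of how throw works in JavaScript, there is almost never any way to safely "pick up where you left off" without leaking references or creating some other sort of undefined brittle state. The safest way to respond to a thrown error is to shut down the process. Of course, in a normal web server you might have many connections open, and it is not reasonable to abruptly shut those down because an error was triggered by someone else. The better approach is to send an error response to the request that triggered the error, while letting the others finish in their normal time, and stop listening for new requests in that worker."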
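
On current Node.js versions the same marking can be expressed with a class. The following is a sketch under assumptions, not the answer's own implementation: the names `AppError` and `isTrustedError` and the `'InvalidInput'` type string are illustrative.

```javascript
// Hypothetical error class; the name AppError and its fields are illustrative.
class AppError extends Error {
    constructor(commonType, description, isOperational) {
        super(description); // sets message and captures the stack trace
        this.name = 'AppError';
        this.commonType = commonType;
        this.isOperational = Boolean(isOperational);
    }
}

// Only errors explicitly marked as operational are trusted.
function isTrustedError(error) {
    return error instanceof AppError && error.isOperational;
}

const invalidInput = new AppError('InvalidInput', 'No product supplied', true);
const bug = new TypeError('Cannot read properties of undefined');

console.log(isTrustedError(invalidInput)); // true
console.log(isTrustedError(bug));          // false
```

Anything that is not explicitly marked, such as the `TypeError` above, is treated as a programmer error, which matches the "crash unless trusted" policy described in this section.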

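Number4: Handle errors centrally, through but not within middleware

TL;DR: Error handling logic such as mailing the admin and logging should be encapsulated in a dedicated, centralized object that all end-points (e.g. Express middleware, cron jobs, unit tests) call when an error comes in.

Otherwise: Not handling errors in a single place leads to code duplication and probably to errors that are handled improperly.

Code example - a typical error flow
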
//DAL layer, we don't handle errors here
DB.addDocument(newCustomer, (error, result) => {
    if (error)
        throw new Error("Great error explanation comes here"); //attach other useful parameters to the error as needed
});
//API route code, we catch both sync and async errors and forward to the middleware
try {
    customerService.addNew(req.body).then(function (result) {
        res.status(200).json(result);
    }).catch((error) => {
        next(error)
    });
}
catch (error) {
    next(error);
}
//Error handling middleware, we delegate the handling to the centralized error handler
app.use(function (err, req, res, next) {
    errorHandler.handleError(err).then((isOperationalError) => {
        if (!isOperationalError)
            next(err);
    });
});
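
Blog quote: "Sometimes lower levels can't do anything useful except propagate the error to their caller"
(From the blog Joyent, ranked 1 for the keywords "Node.JS error handling")

"…You may end up handling the same error at several levels of the stack. This happens when lower levels can't do anything useful except propagate the error to their caller, which propagates the error to its caller, and so on. Often, only the top-level caller knows what the appropriate response is, whether that's to retry the operation, report an error to the user, or something else. But that doesn't mean you should try to report all errors to a single top-level callback, because that callback itself can't know in what context the error occurred."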
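
The flow above can be tried without installing Express. This sketch assumes only the Express-style `(err, req, res, next)` middleware signature; the `errorHandler` object, its in-memory `handled` list, and the choice of a 400 response for operational errors are illustrative assumptions.

```javascript
// Hypothetical centralized handler: every entry point (HTTP middleware,
// cron job, test) calls handleError instead of logging on its own.
const errorHandler = {
    handled: [], // kept in memory only so the sketch is observable
    handleError(err) {
        this.handled.push(err.message);
        // A real handler might also log, mail an admin, and so on.
        return Promise.resolve(err.isOperational === true);
    },
};

// Express-style error middleware signature: (err, req, res, next).
function errorMiddleware(err, req, res, next) {
    return errorHandler.handleError(err).then((isOperational) => {
        if (isOperational) {
            res.status(400).json({ error: err.message });
        } else {
            next(err); // let the default handler (or a crash policy) take over
        }
    });
}
```

The middleware itself contains no logging or mailing logic; it only delegates, so a cron job or a test can reuse `errorHandler.handleError` in exactly the same way.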

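Number5: Document API errors using Swagger

TL;DR: Let your API callers know which errors might come back so they can handle them thoughtfully without crashing. This is usually done with REST API documentation frameworks such as Swagger.

Otherwise: An API client might decide to crash and restart only because it received an error it couldn't understand. Note: the caller of your API might be you (very typical in a microservices environment).

Blog quote: "You have to tell your callers what errors can happen"
(From the blog Joyent, ranked 1 for the keywords "Node.JS logging")

"…We've talked about how to handle errors, but when you're writing a new function, how do you deliver errors to the code that called your function? …If you don't know what errors can happen or don't know what they mean, then your program cannot be correct except by accident. So if you're writing a new function, you have to tell your callers what errors can happen and what they mean."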

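Number6: Shut the process gracefully when a stranger comes to town

TL;DR: When an unknown error occurs (a developer error, see best practice number 3), there is uncertainty about the application's health. A common practice is to restart the process carefully using a "restarter" tool such as Forever or PM2.

Otherwise: When an unfamiliar exception is caught, some object might be in a faulty state (e.g. an event emitter that is used globally and no longer fires events because of some internal failure), and all future requests might fail or behave erratically.

Code example - deciding whether to crash
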
//deciding whether to crash when an uncaught exception arrives
//Assuming developers mark known operational errors with error.isOperational=true, read best practice #3
process.on('uncaughtException', function(error) {
 errorManagement.handler.handleError(error);
 if(!errorManagement.handler.isTrustedError(error))
 process.exit(1)
});
//centralized error handler encapsulates error-handling related logic
function errorHandler() {
    this.handleError = function (error) {
        return logger.logError(error).then(sendMailToAdminIfCritical).then(saveInOpsQueueIfCritical).then(determineIfOperationalError);
    };
    this.isTrustedError = function (error) {
        return error.isOperational;
    };
}
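
Blog quote: "There are three schools of thought on error handling"
(From the blog jsrecipes)

"…There are primarily three schools of thought on error handling: 1. Let the application crash and restart it. 2. Handle all possible errors and never crash. 3. A balanced approach between the two."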
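
A graceful variant of "crash and restart" can be sketched with the built-in `http` module alone. The function names `shutdownGracefully` and `installCrashPolicy`, the injectable `exit` parameter, and the 10-second timeout are assumptions made for this illustration; restarting the exited process is left to a tool such as Forever or PM2.

```javascript
const http = require('http');

// Stop accepting new connections, let in-flight requests finish, then exit.
// `exit` is injectable so the sketch can be exercised without killing the process.
function shutdownGracefully(server, exit, timeoutMs) {
    const timer = setTimeout(() => exit(1), timeoutMs); // force exit if close hangs
    timer.unref(); // the timer alone should not keep the process alive
    server.close(() => {
        clearTimeout(timer);
        exit(1);
    });
}

// Hypothetical wiring: crash only on errors that are not marked operational.
function installCrashPolicy(server) {
    process.on('uncaughtException', (error) => {
        console.error(error);
        if (!error.isOperational) {
            shutdownGracefully(server, process.exit, 10000);
        }
    });
}
```

A typical use would be `installCrashPolicy(http.createServer(handler).listen(port))`, where `handler` and `port` are your own.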

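Number7: Use a mature logger to increase error visibility

TL;DR: A set of mature logging tools such as Winston, Bunyan or Log4J will speed up error discovery and understanding. So forget about console.log.

Otherwise: Skimming through console.log output or manually through messy text files, without querying tools or a decent log viewer, might keep you busy at work until late.

Code example - Winston logger in action
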
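//your centralized logger object
//note: new winston.Logger is the winston 2.x API; winston 3.x provides winston.createLogger instead
var winston = require('winston');
var logger = new winston.Logger({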
 level: 'info',
 transports: [
 new (winston.transports.Console)(),
 new (winston.transports.File)({ filename: 'somefile.log' })
 ]
 });
//custom code somewhere using the logger
logger.log('info', 'Test Log Message with some parameter %s', 'some parameter', { anything: 'This is metadata' });
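
Blog quote: "Let's identify a few requirements (for a logger)"
(From the blog strongblog)

"…Let's identify a few requirements (for a logger):
1. Timestamp each log line. This one is pretty self-explanatory: you should be able to tell when each log entry occurred.
2. The logging format should be easily digestible by humans as well as machines.
3. Allow for multiple configurable destination streams. For example, you might be writing trace logs to one file, but when an error is encountered, write to the same file, then into the error file, and send an email at the same time…"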
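
The three requirements can be illustrated without any library. This is a deliberately tiny sketch, not a replacement for Winston or Bunyan; `createLogger` and the destination objects are hypothetical names invented for the example.

```javascript
// Minimal sketch of a logger with timestamps, a JSON line format, and
// several configurable destinations. Names are illustrative, not a library API.
function createLogger(destinations) {
    function log(level, message, meta) {
        const line = JSON.stringify({
            time: new Date().toISOString(), // requirement 1: timestamp
            level: level,
            message: message,
            meta: meta || {},
        }); // requirement 2: readable by humans and machines
        destinations
            .filter((d) => d.levels.includes(level))
            .forEach((d) => d.write(line)); // requirement 3: multiple destinations
        return line;
    }
    return { log: log };
}

const errorLines = [];
const logger = createLogger([
    { levels: ['info', 'error'], write: (line) => console.log(line) },
    { levels: ['error'], write: (line) => errorLines.push(line) }, // stands in for an error file
]);

logger.log('info', 'server started', { port: 8080 });
logger.log('error', 'payment failed', { orderId: 42 });
```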

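Number8: Discover errors and downtime using APM products

TL;DR: Monitoring and performance products (a.k.a. APM) proactively gauge your codebase or API so that they can automatically highlight errors, crashes, and slow parts that you were missing.

Otherwise: You might spend great effort on measuring API performance and downtime, and you will probably never be aware of which code parts are slowest under real-world conditions and how they affect the UX.

Blog quote: "APM products segments"
(From the blog Yoni Goldberg)

"…APM products constitute three major segments:

1. Website or API monitoring - external services that constantly monitor uptime and performance via HTTP requests. Can be set up in a few minutes. A few selected contenders: Pingdom, Uptime Robot, and New Relic
2. Code instrumentation - a product family that requires embedding an agent within the application to benefit from slow code detection, exception statistics, performance monitoring, and more. A few selected contenders: New Relic, App Dynamics
3. Operational intelligence dashboard - this line of products focuses on providing the ops team with metrics and curated content that help them easily stay on top of application performance. This usually involves aggregating multiple sources of information (application logs, DB logs, server logs, etc.) and up-front dashboard design work. A few selected contenders: Datadog, Splunk"

The above is a shortened version - see here for more best practices and examples.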

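Anonymous · Answer 2 · 2022-10-15T20:57:58+00:00

Update: Joyent now has its own guide. The following information is more of a summary:

Safely "throwing" errors

Ideally we'd like to avoid uncaught errors as much as possible. So instead of literally throwing the error, we can safely "throw" it using one of the following methods, depending on our code architecture:

For synchronous code, if an error happens, return the error:

// Define divider as a synchronous function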
var divideSync = function(x,y) {
    // if error condition?
    if ( y === 0 ) {
        // "throw" the error safely by returning it
        return new Error("Can't divide by zero")
    }
    else {
        // no error occurred, continue on
        return x/y
    }
}
// Divide 4/2
var result = divideSync(4,2)
// did an error occur?
if ( result instanceof Error ) {
    // handle the error safely
    console.log('4/2=err', result)
}
else {
    // no error occurred, continue on
    console.log('4/2='+result)
}
// Divide 4/0
result = divideSync(4,0)
// did an error occur?
if ( result instanceof Error ) {
    // handle the error safely
    console.log('4/0=err', result)
}
else {
    // no error occurred, continue on
    console.log('4/0='+result)
}

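For callback-based (i.e. asynchronous) code, the first argument of the callback is err: if an error happens, err is the error; if no error happens, err is null. Any other arguments follow the err argument: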

var divide = function(x,y,next) {
    // if error condition?
    if ( y === 0 ) {
        // "throw" the error safely by calling the completion callback
        // with the first argument being the error
        next(new Error("Can't divide by zero"))
    }
    else {
        // no error occurred, continue on
        next(null, x/y)
    }
}
divide(4,2,function(err,result){
    // did an error occur?
    if ( err ) {
        // handle the error safely
        console.log('4/2=err', err)
    }
    else {
        // no error occurred, continue on
        console.log('4/2='+result)
    }
})
divide(4,0,function(err,result){
    // did an error occur?
    if ( err ) {
        // handle the error safely
        console.log('4/0=err', err)
    }
    else {
        // no error occurred, continue on
        console.log('4/0='+result)
    }
})

For eventful code, where the error may happen anywhere, fire the error event instead of throwing the error:

// Define our Divider Event Emitter
var events = require('events')
var Divider = function(){
    events.EventEmitter.call(this)
}
require('util').inherits(Divider, events.EventEmitter)
// Add the divide function
Divider.prototype.divide = function(x,y){
    // if error condition?
    if ( y === 0 ) {
        // "throw" the error safely by emitting it
        var err = new Error("Can't divide by zero")
        this.emit('error', err)
    }
    else {
        // no error occurred, continue on
        this.emit('divided', x, y, x/y)
    }
    // Chain
    return this;
}
// Create our divider and listen for errors
var divider = new Divider()
divider.on('error', function(err){
    // handle the error safely
    console.log(err)
})
divider.on('divided', function(x,y,result){
    console.log(x+'/'+y+'='+result)
})
// Divide
divider.divide(4,2).divide(4,0)

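Safely "catching" errors

Sometimes, though, there may still be code that throws an error somewhere, which can lead to an uncaught exception and a potential crash of our application if we don't catch it safely. Depending on our code architecture, we can use one of the following methods to catch it:

When we know where the error is occurring, we can wrap that section in a node.js domain (note that the domain module is marked as deprecated in the current Node.js documentation):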

var d = require('domain').create()
d.on('error', function(err){
    // handle the error safely
    console.log(err)
})
// catch the uncaught errors in this asynchronous or synchronous code block
d.run(function(){
    // the asynchronous or synchronous code that we want to catch thrown errors on
    var err = new Error('example')
    throw err
})

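If we know where the error is occurring in synchronous code, and for whatever reason can't use domains (perhaps because of an old version of node), we can use the try catch statement: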

// catch the uncaught errors in this synchronous code block
// try catch statements only work on synchronous code
try {
    // the synchronous code that we want to catch thrown errors on
    var err = new Error('example')
    throw err
} catch (err) {
    // handle the error safely
    console.log(err)
}

However, be careful not to use try...catch around asynchronous code, as an asynchronously thrown error will not be caught:

try {
    setTimeout(function(){
        var err = new Error('example')
        throw err
    }, 1000)
}
catch (err) {
    // Example error won't be caught here... crashing our app
    // hence the need for domains
}

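If you do want to use try..catch together with asynchronous code, you can write your asynchronous functions natively with async/await when running Node 7.6 or higher, where async functions are available without a flag.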
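
As a minimal sketch of that approach, the following assumes a promise-returning variant of the divide example (the names `divideAsync` and `report` are hypothetical). Because the promise is awaited inside the try block, its rejection lands in the catch block:

```javascript
// Hypothetical promise-returning version of the divide example above.
function divideAsync(x, y) {
    return new Promise((resolve, reject) => {
        setTimeout(() => {
            if (y === 0) {
                reject(new Error("Can't divide by zero"));
            } else {
                resolve(x / y);
            }
        }, 10);
    });
}

async function report(x, y) {
    try {
        const result = await divideAsync(x, y);
        return x + '/' + y + '=' + result;
    } catch (err) {
        // Rejections of awaited promises land here, unlike errors thrown
        // from a plain setTimeout callback.
        return x + '/' + y + '=err ' + err.message;
    }
}

report(4, 2).then(console.log); // prints "4/2=2"
report(4, 0).then(console.log); // prints "4/0=err Can't divide by zero"
```

Note that this only works for errors delivered through the promise. An error thrown directly inside the `setTimeout` callback, without calling `reject`, would still escape the try block.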

Another thing to be careful about with try...catch is the risk of wrapping your completion callback inside the try statement, like so:

var divide = function(x,y,next) {
    // if error condition?
    if ( y === 0 ) {
        // "throw" the error safely by calling the completion callback
        // with the first argument being the error
        next(new Error("Can't divide by zero"))
    }
    else {
        // no error occurred, continue on
        next(null, x/y)
    }
}
var continueElsewhere = function(err, result){
        throw new Error('elsewhere has failed')
}
try {
        divide(4, 2, continueElsewhere)
        // ^ the execution of divide, and the execution of 
        //   continueElsewhere will be inside the try statement
}
catch (err) {
        console.log(err.stack)
        // ^ will output the "unexpected" result of: elsewhere has failed
}

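This gotcha is very easy to run into as your code becomes more complex. As such, it is best to either use domains or to return errors, in order to avoid (1) uncaught exceptions in asynchronous code and (2) a try catch catching execution that you didn't intend it to. In languages that allow for proper threading instead of JavaScript's asynchronous event-machine style, this is less of an issue.

Finally, in the case where an uncaught error happens in a place that wasn't wrapped in a domain or a try catch statement, we can keep our application from crashing by using the uncaughtException listener (however, doing so can leave the application in an unknown state):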

// catch the uncaught errors that weren't wrapped in a domain or try catch statement
// do not use this in modules, but only in applications, as otherwise we could have multiple of these bound
process.on('uncaughtException', function(err) {
    // handle the error safely
    console.log(err)
})
// the asynchronous or synchronous code that emits the otherwise uncaught error
var err = new Error('example')
throw err
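
A related case that the listener above is not aimed at is a rejected promise that nobody handles. Node.js reports those through the separate `unhandledRejection` process event, and since Node.js 15 the default behaviour without such a listener is to raise the rejection as an uncaught exception. A small sketch, in which the `seen` array exists only to make the effect observable:

```javascript
// 'uncaughtException' covers errors thrown synchronously or from callbacks;
// rejected promises with no handler are reported through 'unhandledRejection'.
const seen = [];

process.on('unhandledRejection', (reason) => {
    seen.push(reason instanceof Error ? reason.message : String(reason));
    // In a real application: log the reason, then decide whether to exit.
});

// A rejection with no .catch() attached anywhere.
Promise.reject(new Error('nobody caught me'));
```

As with uncaughtException, this belongs in applications rather than in modules, and carrying on afterwards may leave the application in an unknown state.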