..  Licensed to the Apache Software Foundation (ASF) under one
    or more contributor license agreements.  See the NOTICE file
    distributed with this work for additional information
    regarding copyright ownership.  The ASF licenses this file
    to you under the Apache License, Version 2.0 (the
    "License"); you may not use this file except in compliance
    with the License.  You may obtain a copy of the License at

..    http://www.apache.org/licenses/LICENSE-2.0

..  Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.

==================
Logging in PySpark
==================

.. currentmodule:: pyspark.logger

Introduction
============

The :ref:`pyspark.logger` module facilitates structured client-side logging for PySpark users.

This module includes a :class:`PySparkLogger` class that provides several methods for logging messages at different levels in a structured JSON format:

- :meth:`PySparkLogger.info`
- :meth:`PySparkLogger.warning`
- :meth:`PySparkLogger.error`
- :meth:`PySparkLogger.exception`

The logger can be easily configured to write logs to either the console or a specified file.

Customizing Log Format
======================

The default log format is JSON, which includes the timestamp, log level, logger name, and the log message along with any additional context provided.

Example log entry:

.. code-block:: python

    {
      "ts": "2024-06-28 19:53:48,563",
      "level": "ERROR",
      "logger": "DataFrameQueryContextLogger",
      "msg": "[DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set \"spark.sql.ansi.enabled\" to \"false\" to bypass this error. SQLSTATE: 22012\n== DataFrame ==\n\"divide\" was called from\n/.../spark/python/test_error_context.py:17\n",
      "context": {
        "file": "/path/to/file.py",
        "line": "17",
        "fragment": "divide",
        "errorClass": "DIVIDE_BY_ZERO"
      },
      "exception": {
        "class": "Py4JJavaError",
        "msg": "An error occurred while calling o52.showString.\n: org.apache.spark.SparkArithmeticException: [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set \"spark.sql.ansi.enabled\" to \"false\" to bypass this error. SQLSTATE: 22012\n== DataFrame ==\n\"divide\" was called from\n/path/to/file.py:17 ...",
        "stacktrace": [
          {
            "class": null,
            "method": "deco",
            "file": ".../spark/python/pyspark/errors/exceptions/captured.py",
            "line": "247"
          }
        ]
      }
    }

Setting Up
==========

To start using the PySpark logging module, import the :class:`PySparkLogger` class from the :ref:`pyspark.logger` module.

.. code-block:: python

    from pyspark.logger import PySparkLogger

Usage
=====

Creating a Logger
-----------------

You can create a logger instance by calling :meth:`PySparkLogger.getLogger`. By default, it creates a logger named "PySparkLogger" with an INFO log level.

.. code-block:: python

    logger = PySparkLogger.getLogger()

Logging Messages
----------------

The logger provides three main methods for logging messages: :meth:`PySparkLogger.info`, :meth:`PySparkLogger.warning` and :meth:`PySparkLogger.error`. The extra keyword arguments passed to these methods are recorded in the "context" field of the log entry; :meth:`PySparkLogger.exception` is covered in the sketch after this list.

- **PySparkLogger.info**: Use this method to log informational messages.

  .. code-block:: python

      user = "test_user"
      action = "login"
      logger.info(f"User {user} performed {action}", user=user, action=action)

- **PySparkLogger.warning**: Use this method to log warning messages.

  .. code-block:: python

      user = "test_user"
      action = "access"
      logger.warning(f"User {user} attempted an unauthorized {action}", user=user, action=action)

- **PySparkLogger.error**: Use this method to log error messages.

  .. code-block:: python

      user = "test_user"
      action = "update_profile"
      logger.error(f"An error occurred for user {user} during {action}", user=user, action=action)
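The :meth:`PySparkLogger.exception` method listed in the Introduction is not shown above; it is typically called from inside an `except` block. The following is a minimal sketch, assuming it accepts extra keyword context like the other methods and records the active exception's details in the same way as the standard Python `logging.Logger.exception`:

.. code-block:: python

    from pyspark.logger import PySparkLogger

    logger = PySparkLogger.getLogger()

    user = "test_user"
    action = "divide"
    try:
        1 / 0  # deliberately raise ZeroDivisionError for illustration
    except ZeroDivisionError:
        # Logs the message at ERROR level together with the exception details.
        # The keyword arguments are assumed to be accepted as context,
        # as with info/warning/error above.
        logger.exception(f"An error occurred for user {user} during {action}", user=user, action=action)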
Logging to Console
------------------

.. code-block:: python

    from pyspark.logger import PySparkLogger

    # Create a logger that logs to console
    logger = PySparkLogger.getLogger("ConsoleLogger")

    user = "test_user"
    action = "test_action"

    logger.warning(f"User {user} takes an {action}", user=user, action=action)

This logs the message in the following JSON format:

.. code-block:: python

    {
      "ts": "2024-06-28 19:44:19,030",
      "level": "WARNING",
      "logger": "ConsoleLogger",
      "msg": "User test_user takes an test_action",
      "context": {
        "user": "test_user",
        "action": "test_action"
      }
    }

Logging to a File
-----------------

To log messages to a file, use :meth:`PySparkLogger.addHandler` to add a `FileHandler` from the standard Python logging module to your logger. This approach aligns with standard Python logging practices.

.. code-block:: python

    from pyspark.logger import PySparkLogger
    import logging

    # Create a logger that logs to a file
    file_logger = PySparkLogger.getLogger("FileLogger")
    handler = logging.FileHandler("application.log")
    file_logger.addHandler(handler)

    user = "test_user"
    action = "test_action"

    file_logger.warning(f"User {user} takes an {action}", user=user, action=action)

The log messages will be saved in `application.log` in the same JSON format.
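As noted earlier, the default log level is INFO. Since the logger is used with handlers from the standard Python `logging` module, the usual level controls should also apply; the following is a minimal sketch (the logger name is arbitrary, as in the earlier examples), assuming the inherited `setLevel` behaves as in the standard library:

.. code-block:: python

    from pyspark.logger import PySparkLogger
    import logging

    quiet_logger = PySparkLogger.getLogger("QuietLogger")

    # Raise the threshold so that only ERROR (and above) messages are emitted.
    # This assumes the standard `logging.Logger.setLevel` behavior is inherited.
    quiet_logger.setLevel(logging.ERROR)

    quiet_logger.info("This informational message is suppressed")
    quiet_logger.error("This error message is still emitted")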