1. Why Physics-Informed?
Battery systems are governed by well-established physical principles—mass conservation, charge neutrality, diffusion, electrochemical kinetics. Traditional "black-box" neural networks ignore these constraints and must attempt to rediscover them from scratch using vast amounts of data. This approach suffers from three chronic problems that Physics-Informed Machine Learning (SciML/PINN) directly addresses.
1.1 Overcoming Data Inefficiency
High-quality aging and performance datasets for batteries are expensive and time-consuming to acquire. By encoding known physics into the loss function, we provide a powerful form of regularization. This allows the model to learn from sparse data points, as the physics "fills in the gaps." As a result, PINNs can achieve high accuracy with a fraction of the data required by purely data-driven models.
1.2 Ensuring Physical Plausibility
A purely data-driven model, trained on a limited operational range, can make wildly unphysical predictions when extrapolating. For example, it might predict a battery's capacity increasing during a high-current discharge, violating the laws of thermodynamics. PINNs solve this by constraining the entire solution space. The physics residual in the loss function acts as a "guardrail," ensuring that any prediction, even in unseen domains, must conform to the governing equations.
1.3 Enhancing Interpretability
Standard neural networks are often "black boxes," making it difficult to understand their internal reasoning. In contrast, a PINN learns a continuous surrogate model of the system's state variables. This means we can probe the trained model to visualize and extract hidden states that are difficult or impossible to measure experimentally, such as the Li-ion concentration profile across an electrode or the evolution of SEI layer thickness over time. This transforms the model from a simple predictor into a tool for scientific discovery.
2. PINN Framework — Nuts & Bolts
The core of a PINN is a standard feed-forward neural network, but its training and application are fundamentally different from traditional deep learning. It's designed not just to fit data, but to obey the laws of physics.
2.1 The Neural Network as a Function Approximator
Thanks to the Universal Approximation Theorem, a sufficiently large neural network can approximate any continuous function to arbitrary accuracy. In a PINN, the network \(\text{NN}_\theta(t, \mathbf{x})\) is trained to be a surrogate for the solution of a PDE, \(u(t, \mathbf{x})\). It takes the independent variables (time \(t\), spatial coordinates \(\mathbf{x}\)) as inputs and outputs the predicted value of the solution \(u_\theta\). This approach is mesh-free and provides a solution that is continuous and differentiable everywhere.
2.2 Enforcing Physics via Collocation Points
How do we make the network obey a physical law? We define the law as a differential equation, and its residual, \(r(t, \mathbf{x})\), which should be zero for an exact solution.
\[r(t, \mathbf{x}) := \mathcal{N}[u_\theta(t, \mathbf{x})] \quad (\text{e.g., } \mathcal{N}[u] = \frac{\partial u}{\partial t} - D \frac{\partial^2 u}{\partial x^2})\]
We then sample thousands of collocation points across the entire spatio-temporal domain. These are points where we don't have data, but where we know the physics must hold. The physics loss, \(\mathcal{L}_{\text{PDE}}\), is the mean squared error of the residual at these points, driving it towards zero everywhere.
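As a concrete sketch of this idea, the residual of the 1D diffusion operator above can be evaluated at collocation points with automatic differentiation. This assumes PyTorch; the network architecture and point counts are illustrative, not prescriptive:

```python
import torch

# Illustrative surrogate: a small MLP u_theta(t, x) -> u
net = torch.nn.Sequential(
    torch.nn.Linear(2, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)

def diffusion_residual(t, x, D=1.0):
    """Residual r = u_t - D * u_xx evaluated at collocation points (t, x)."""
    t = t.requires_grad_(True)
    x = x.requires_grad_(True)
    u = net(torch.cat([t, x], dim=1))
    u_t = torch.autograd.grad(u, t, torch.ones_like(u), create_graph=True)[0]
    u_x = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x, x, torch.ones_like(u_x), create_graph=True)[0]
    return u_t - D * u_xx

# Physics loss: mean squared residual over randomly sampled collocation points
t_c = torch.rand(1024, 1)
x_c = torch.rand(1024, 1)
loss_pde = diffusion_residual(t_c, x_c).pow(2).mean()
```

Because `create_graph=True` keeps the derivative computations differentiable, `loss_pde` can itself be backpropagated through to update \(\theta\).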
Collocation Sampling Strategies
The choice of sampling strategy for collocation points is crucial for stable training. While a uniform grid is simple, it can be inefficient. Quasi-random sequences (e.g., Sobol or Latin hypercube sampling) provide much better domain coverage for the same number of points, preventing alignment with coordinate axes and ensuring a more uniform exploration of the solution space.
Advanced methods like Residual-Based Adaptive Refinement (RAR) go a step further, iteratively adding new collocation points in regions where the PDE residual is currently highest, focusing the network's attention where it's struggling the most.
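The RAR idea can be sketched in a few lines. This is a minimal version assuming a unit hyper-rectangle domain and a `pde_residual` function like the one used for the physics loss; the pool and batch sizes are illustrative:

```python
import torch

def refine_collocation(points, pde_residual, n_candidates=10000, n_add=200):
    """Residual-Based Adaptive Refinement (sketch): append the candidate
    points where the current PDE residual magnitude is largest."""
    # Sample a large pool of candidates uniformly over the unit domain.
    candidates = torch.rand(n_candidates, points.shape[1])
    r = pde_residual(candidates).detach().abs().squeeze(-1)
    # Keep the n_add worst offenders and add them to the training set.
    worst = torch.topk(r, n_add).indices
    return torch.cat([points, candidates[worst]], dim=0)
```

In practice this refinement is repeated every few thousand optimizer steps, so the collocation set gradually concentrates where the network is struggling.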
2.3 Handling Boundary and Initial Conditions
Boundary (BC) and Initial (IC) conditions are critical for a unique PDE solution. PINNs can enforce them in two main ways:
Soft Constraints (Penalty Method)
This is the most common approach. We treat the BC/IC as another loss term, penalizing the model for deviating from the required values at the domain boundaries. For a boundary condition \(u(t, x_b) = g(t)\), the loss is:
\[\mathcal{L}_{\text{BC}} = \frac{1}{N_b} \sum_{i=1}^{N_b} \| u_\theta(t_i, x_{b,i}) - g(t_i) \|^2\]
This method is flexible but relies on proper weighting in the composite loss function.
Hard Constraints (By Construction)
A more elegant approach is to design the network's output to satisfy the BC/IC by construction. This is done by multiplying the raw network output by a carefully chosen function that is zero at the boundaries. For example, to enforce a Dirichlet BC \(u(t,0)=A\) and \(u(t,1)=B\) on a domain \(x \in [0,1]\), we can define a transformed output \(\hat{u}_\theta\):
\[ \hat{u}_\theta(t,x) = (1-x)A + xB + x(1-x)\text{NN}_\theta(t,x) \]
This formulation guarantees that \(\hat{u}_\theta(t,0)=A\) and \(\hat{u}_\theta(t,1)=B\) regardless of the output of \(\text{NN}_\theta\), removing the need for a boundary loss term entirely.
# PyTorch pseudocode for a hard-constraint transformation
def forward(self, t, x):
    nn_output = self.network(torch.cat([t, x], dim=1))
    # Enforce u(t,0)=A and u(t,1)=B by construction
    A, B = 0.0, 1.0
    transformed_output = (1 - x) * A + x * B + x * (1 - x) * nn_output
    return transformed_output
2.4 The Scope of Solvable PDEs
The PINN framework is highly general and can be applied to a wide range of differential equations encountered in science and engineering.
| PDE Type | Example Name | Equation | Application Area |
|---|---|---|---|
| 1D, Time-dependent | Heat/Diffusion Eq. | \(\frac{\partial u}{\partial t} = \alpha \frac{\partial^2 u}{\partial x^2}\) | Li-ion diffusion in 1D |
| 2D, Steady-state | Poisson's Eq. | \(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = f(x,y)\) | Electrostatics, heat distribution |
| 2D, Time-dependent | Wave Eq. | \(\frac{\partial^2 u}{\partial t^2} = c^2 \left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right)\) | Acoustics, electromagnetics |
| Nonlinear system | Navier-Stokes Eq. | \(\frac{\partial \mathbf{v}}{\partial t} + (\mathbf{v} \cdot \nabla)\mathbf{v} = -\frac{1}{\rho}\nabla p + \nu \nabla^2 \mathbf{v}\) | Fluid dynamics in electrolytes |
3. Composite Loss Design
A PINN's objective is a multi-task loss, typically a weighted sum of terms for the data, the PDE residual, and the boundary/initial conditions:
\[ \mathcal{L}(\theta) = \lambda_d \mathcal{L}_{\text{data}} + \lambda_p \mathcal{L}_{\text{PDE}} + \lambda_b \mathcal{L}_{\text{BC/IC}} \]
The balance between these terms is the most critical aspect of successful PINN training. If the PDE loss dominates too early, the model might ignore the data and converge to a trivial solution (e.g., zero). If the data loss dominates, the model may overfit and violate physics. Choosing the weights (\(\lambda\)) is a key challenge.
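Assembled in code, the composite objective is just a weighted sum of mean-squared-error terms. A minimal PyTorch sketch (argument names are illustrative):

```python
import torch

def total_loss(u_pred_data, u_data, r_pde, u_pred_bc, g_bc,
               lam_d=1.0, lam_p=1.0, lam_b=1.0):
    """Composite PINN objective: weighted sum of data, PDE-residual,
    and boundary-condition mean-squared errors."""
    loss_d = torch.mean((u_pred_data - u_data) ** 2)  # fit to measurements
    loss_p = torch.mean(r_pde ** 2)                   # physics residual -> 0
    loss_b = torch.mean((u_pred_bc - g_bc) ** 2)      # boundary conditions
    return lam_d * loss_d + lam_p * loss_p + lam_b * loss_b
```

The weighting strategies below are all different answers to the question of how `lam_d`, `lam_p`, and `lam_b` should be chosen or updated.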
3.1 Static Weighting Strategies
- Manual Tuning: The simplest approach, but requires extensive trial and error. It's often difficult to find a single set of weights that works well throughout the entire training process.
- Scheduled Annealing: A more structured approach where weights change over time. A common strategy is to start with a high weight on the data term (\(\lambda_d\)) to anchor the model to observations, and then gradually increase the weight on the physics term (\(\lambda_p\)) to enforce the physical laws.
3.2 Adaptive Weighting Strategies
Modern approaches automate the balancing act by dynamically adjusting the weights during training based on the behavior of the gradients.
Gradient-Normalization (GradNorm)
The core idea is to prevent any single loss term from producing overwhelmingly large gradients that dominate the training updates. GradNorm dynamically adjusts the weights \(\lambda_k\) to keep the gradient norms for each loss term \(\mathcal{L}_k\) on a similar scale.
\[ \text{Goal: } \|\nabla_\theta (\lambda_k \mathcal{L}_k)\| \approx \text{Average Gradient Norm} \]
This ensures a more balanced "tug-of-war" between the different objectives, leading to more stable training.
# Pseudocode for one step of Gradient Normalization
for each loss term L_k:
    grad_norm_k = compute_gradient_norm(L_k, model.parameters())
avg_grad_norm = average(all grad_norm_k)
for each loss term L_k:
    loss_ratio = avg_grad_norm / grad_norm_k
    # Update the weight for this loss term (with some learning rate alpha)
    lambda_k = lambda_k * (loss_ratio ** alpha)
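A runnable version of this step in PyTorch might look as follows; the exponent `alpha` and the epsilon guard are illustrative choices, and this is a sketch of the gradient-balancing idea rather than the full GradNorm algorithm:

```python
import torch

def rebalance_weights(losses, weights, params, alpha=0.5):
    """One gradient-normalization step: nudge each weight lambda_k so that
    ||grad_theta(lambda_k * L_k)|| moves toward the average gradient norm."""
    norms = []
    for L, lam in zip(losses, weights):
        grads = torch.autograd.grad(lam * L, params, retain_graph=True)
        norms.append(torch.sqrt(sum(g.pow(2).sum() for g in grads)))
    avg = sum(norms) / len(norms)
    # Weights for small-gradient terms grow, dominant terms shrink.
    return [lam * float((avg / (n + 1e-12)) ** alpha)
            for lam, n in zip(weights, norms)]
```

Calling this every few hundred steps with the current loss terms keeps the "tug-of-war" between objectives roughly balanced.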
Neural Tangent Kernel (NTK) Weighting
A more advanced technique based on the insight that different loss terms can cause the network to learn at very different speeds. The Neural Tangent Kernel (NTK) can be used to estimate these learning rates. NTK-based weighting adjusts the \(\lambda\) values so that all loss terms contribute roughly equally to the training dynamics, ensuring that the model doesn't get "stuck" learning only the "easiest" part of the problem.
\[ \lambda_k = \frac{\operatorname{tr}(\hat{\Theta})}{\operatorname{tr}(\hat{\Theta}_k)} \]
Here, \(\hat{\Theta}\) is the NTK of the full loss, \(\hat{\Theta}_k\) is the block associated with the individual loss term \(\mathcal{L}_k\), and \(\operatorname{tr}(\cdot)\) denotes the trace, which estimates each term's convergence rate. This method essentially re-weights the losses so that their effective learning rates are equalized.
4. Electrochemical Equations to Encode
While simple ODEs can be encoded, the real power of PINNs in electrochemistry comes from their ability to solve systems of coupled, non-linear PDEs that describe battery behavior. The Doyle-Fuller-Newman (DFN) model, also known as the Pseudo-2D (P2D) model, is the canonical example.
4.1 The Doyle-Fuller-Newman (DFN) Model Framework
The DFN model is not a single equation, but a system of coupled equations describing ion transport and reaction kinetics across the different components of a Li-ion cell: the negative electrode (anode), separator, and positive electrode (cathode).
A Minimal DFN Model for PINNs
A PINN for a DFN model would typically solve for four key state variables: \(c_s(t,x,r)\), \(c_e(t,x)\), \(\phi_s(t,x)\), and \(\phi_e(t,x)\). The governing equations are enforced as residuals in the loss function:
| Component | Physics | Governing Equation |
|---|---|---|
| Solid Phase (Anode/Cathode) | Li-ion diffusion in active material particles (spherical coordinates). | \(\frac{\partial c_s}{\partial t} = \frac{1}{r^2} \frac{\partial}{\partial r} \left( r^2 D_s \frac{\partial c_s}{\partial r} \right)\) |
| Electrolyte Phase | Li-ion transport via diffusion and migration in the electrolyte. | \(\epsilon_e \frac{\partial c_e}{\partial t} = \frac{\partial}{\partial x} \left( D_e^{\text{eff}} \frac{\partial c_e}{\partial x} \right) + \frac{a_s(1-t_+^0)}{F} j_{\text{int}}\) |
| Interface (Pore Walls) | Electrochemical reaction kinetics. | \(j_{\text{int}} = i_0 \left( \exp\left(\frac{\alpha_a F \eta}{RT}\right) - \exp\left(-\frac{\alpha_c F \eta}{RT}\right) \right)\) |
| Potential Fields | Charge conservation (Ohm's law) in solid and electrolyte phases. | \(\nabla \cdot (\sigma_s^{\text{eff}} \nabla \phi_s) = -a_s j_{\text{int}}\) \(\nabla \cdot (\kappa_e^{\text{eff}} \nabla \phi_e + \kappa_D^{\text{eff}} \nabla \ln c_e) = a_s j_{\text{int}}\) |
4.2 Mapping Physics to the Neural Network
To implement this, we define the inputs and outputs of our neural network and the variables it needs to predict.
Variable & Unit Definitions
- \(c_s\): Li-ion concentration in solid (mol·m⁻³)
- \(c_e\): Li-ion concentration in electrolyte (mol·m⁻³)
- \(\phi_s\): Solid phase potential (V)
- \(\phi_e\): Electrolyte phase potential (V)
- \(j_{\text{int}}\): Interfacial (pore-wall) current density (A·m⁻²)
- \(\eta\): Overpotential (V)
- \(D_s, D_e\): Diffusion coefficients (m²·s⁻¹)
Network Input-Output Mapping
A single neural network can be trained to predict all state variables simultaneously. The independent variables of the system are the inputs to the network, and the state variables are the outputs.
| Role | Variables | Description |
|---|---|---|
| Network Inputs | \(t, x, r\) | Time, spatial position across the cell, and radial position within a particle. |
| Network Outputs | \(c_s, c_e, \phi_s, \phi_e\) | The four primary state variables the PINN learns to approximate. |
| Derived Quantities | \(j_{\text{int}}, \eta\), etc. | Calculated from the network outputs and their gradients using the physical equations (e.g., \(\eta = \phi_s - \phi_e - U\)). |
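This mapping can be sketched as a single PyTorch module. The layer sizes are illustrative, and a real implementation would restrict the particle coordinate \(r\) to the electrode regions and post-process outputs into \(j_{\text{int}}\) and \(\eta\):

```python
import torch

class DFNSurrogate(torch.nn.Module):
    """Sketch: one network mapping (t, x, r) to the four DFN state variables
    (c_s, c_e, phi_s, phi_e). Width/depth are illustrative choices."""
    def __init__(self, width=64, depth=4):
        super().__init__()
        layers, d_in = [], 3  # inputs: t, x, r
        for _ in range(depth):
            layers += [torch.nn.Linear(d_in, width), torch.nn.Tanh()]
            d_in = width
        layers.append(torch.nn.Linear(width, 4))  # outputs: the 4 states
        self.net = torch.nn.Sequential(*layers)

    def forward(self, t, x, r):
        out = self.net(torch.cat([t, x, r], dim=1))
        # Split the joint output into the four state variables
        c_s, c_e, phi_s, phi_e = out.split(1, dim=1)
        return c_s, c_e, phi_s, phi_e
```

Derived quantities such as \(\eta = \phi_s - \phi_e - U\) are then computed from these outputs (and their autograd derivatives) inside the loss function.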
5. Hybrid Architectures in SciML
For very stiff or multiscale problems where pure PINNs can struggle, hybrid architectures offer robust and efficient alternatives by combining the strengths of traditional numerical solvers and neural networks.
5.1 Grey-Box Models: Learning the Unknown Physics
In many systems, we have reliable models for some physical processes but not for others. A grey-box model uses a traditional numerical solver (e.g., Finite Volume Method) for the well-understood parts and inserts a neural network to learn a difficult-to-model "closure term."
Example: Learning Battery Kinetics
Consider modeling a porous electrode. The macro-scale diffusion and migration in the electrolyte can be handled efficiently by a standard FVM solver. However, the interfacial reaction kinetics (the Butler-Volmer equation) might be complex, non-ideal, or dependent on unknown degradation states. In a grey-box approach, the FVM solver would compute the concentration and potential fields at each time step and pass them to a neural network. The NN then predicts the local reaction rate (\(j_{int}\)), which is fed back into the solver as a source term for the next time step.
This approach leverages the stability and accuracy of numerical solvers while using the flexibility of neural networks to capture complex, data-driven phenomena where first-principles models are inadequate.
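The solver-network interplay can be sketched as a single time step. Everything here is hypothetical scaffolding: `advance_fvm`, the state layout, and the feature choice stand in for whatever the actual FVM code provides:

```python
import torch

def greybox_step(solver_state, kinetics_net, advance_fvm):
    """One grey-box time step (sketch): the FVM solver owns transport; the
    neural network supplies the closure term j_int. `advance_fvm` and the
    state layout are hypothetical placeholders for a real solver."""
    c_e, phi_s, phi_e = solver_state              # fields from the FVM solver
    # Feed local concentration and potential difference to the kinetics NN
    features = torch.stack([c_e, phi_s - phi_e], dim=-1)
    with torch.no_grad():
        j_int = kinetics_net(features).squeeze(-1)  # learned reaction rate
    # The learned rate re-enters the solver as a source term
    return advance_fvm(solver_state, source=j_int)
```

During training, the `no_grad` guard is dropped so gradients can flow from the solver's output back into the kinetics network.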
5.2 Neural ODEs: Learning the Dynamics
For systems described by Ordinary Differential Equations (ODEs), particularly in time-series analysis, Neural ODEs are a powerful choice. Instead of modeling the state \(x(t)\) directly, a neural network \(f_\theta\) is used to learn its derivative, \(dx/dt = f_\theta(x,t)\). This network is then embedded within a high-quality adaptive ODE solver (like Runge-Kutta) which handles the time integration. This avoids the need for time-based collocation points and can be more robust for stiff or complex temporal dynamics.
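A minimal sketch of the idea: a network parameterizes \(dx/dt\), and a standard integrator advances the state. For clarity this uses a fixed-step RK4 loop in plain PyTorch; production code would typically use an adaptive solver library such as torchdiffeq:

```python
import torch

class Dynamics(torch.nn.Module):
    """Learned right-hand side f_theta(x, t) of dx/dt = f_theta(x, t)."""
    def __init__(self, dim=2, width=32):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim + 1, width), torch.nn.Tanh(),
            torch.nn.Linear(width, dim))

    def forward(self, x, t):
        # Concatenate state and (broadcast) time as network input
        return self.net(torch.cat([x, t.expand(x.shape[0], 1)], dim=1))

def rk4_integrate(f, x0, t0, t1, n_steps=20):
    """Fixed-step 4th-order Runge-Kutta integration of dx/dt = f(x, t)."""
    x, t = x0, torch.tensor([[float(t0)]])
    h = (t1 - t0) / n_steps
    for _ in range(n_steps):
        k1 = f(x, t)
        k2 = f(x + 0.5 * h * k1, t + 0.5 * h)
        k3 = f(x + 0.5 * h * k2, t + 0.5 * h)
        k4 = f(x + h * k3, t + h)
        x = x + (h / 6) * (k1 + 2 * k2 + 2 * k3 + k4)
        t = t + h
    return x
```

Because the whole integration is differentiable, the loss on the final state backpropagates through every RK4 step into the parameters of `Dynamics`.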
5.3 Operator Learners: Learning the Solution Operator
Operator learners like Fourier Neural Operator (FNO) and DeepONet represent a paradigm shift. Instead of solving a single problem instance, they learn the entire solution operator \(\mathcal{G}\) that maps from a space of input functions (e.g., initial conditions, boundary conditions) to the solution function: \(u = \mathcal{G}(a)\).
After a long, offline training phase on thousands of simulation examples, a trained operator can perform inference for a *new* input function almost instantly (\(\ll 1\) second). This makes them incredibly powerful for building fast surrogate models for design optimization, uncertainty quantification, and control applications where many forward solves are needed.
5.4 Performance Trade-offs: Speed vs. Accuracy
The choice of architecture involves a trade-off between inference speed, training cost, and data requirements.
6. Training Strategies & AD
What: Training a PINN involves minimizing a composite loss function using gradient descent.
Why: Standard training can be unstable; specialized strategies for sampling and optimization are crucial for success.
How: Use Automatic Differentiation to compute PDE residuals, sample collocation points wisely, and use modern optimizers.
6.1 Automatic Differentiation (AD)
AD is the engine that makes PINNs possible. It's a technique used by frameworks like PyTorch and JAX to compute exact derivatives of any function, no matter how complex.
① The framework builds a computational graph of all operations. → ② It applies the chain rule backwards through this graph (reverse-mode AD). → ③ This yields the exact gradient of the output (e.g., \(u_\theta\)) with respect to any input (e.g., \(t, x\)) or parameter (\(\theta\)), allowing us to define PDE residuals without manual derivation.
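The three steps above can be seen in a few lines of PyTorch: the derivatives AD returns are exact, matching the analytical ones to machine precision, with no finite differencing involved:

```python
import torch

# Build the computational graph for u = sin(x)
x = torch.linspace(0.0, 3.0, 50, requires_grad=True)
u = torch.sin(x)

# Reverse-mode AD through the graph: du/dx is exactly cos(x)
(du_dx,) = torch.autograd.grad(u.sum(), x, create_graph=True)

# create_graph=True lets us differentiate again: d2u/dx2 is exactly -sin(x)
(d2u_dx2,) = torch.autograd.grad(du_dx.sum(), x)
```

The same mechanism applied to a network output \(u_\theta(t,x)\) yields the derivative terms needed for any PDE residual.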
6.2 Collocation Sampling
Smart sampling of collocation points is vital for stable and efficient training. The goal is to focus the network's attention where it's needed most.
# Generating Sobol sequence points is easy
from scipy.stats import qmc
sampler = qmc.Sobol(d=2, scramble=True)
sample = sampler.random_base2(m=10) # 2^10 = 1024 points
# sample = qmc.scale(sample, l_bounds, u_bounds)  # map [0,1)^d to your domain
# For full code, see github.com/user/repo
6.3 Domain Decomposition
For problems with different physical domains or sharp interfaces (e.g., anode/separator/cathode), a single PINN can struggle. Domain decomposition is a powerful strategy to handle this.
① Train a separate neural network for each subdomain. → ② Enforce the primary physics (PDEs) within each subdomain as usual. → ③ Add extra loss terms that enforce continuity and flux conservation at the interfaces between the subdomains, ensuring a smooth and physically correct global solution.
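Step ③ can be sketched as an extra loss evaluated at interface points. This assumes two subdomain networks taking `(t, x)` inputs and a 1D diffusion-type flux \(D\,\partial u/\partial x\); the diffusivities are illustrative parameters:

```python
import torch

def interface_loss(net_a, net_b, t_if, x_if, D_a=1.0, D_b=1.0):
    """Continuity and flux-matching penalties at an interface located at the
    points (t_if, x_if) between two subdomain networks (sketch)."""
    x = x_if.requires_grad_(True)
    u_a = net_a(torch.cat([t_if, x], dim=1))
    u_b = net_b(torch.cat([t_if, x], dim=1))
    du_a = torch.autograd.grad(u_a.sum(), x, create_graph=True)[0]
    du_b = torch.autograd.grad(u_b.sum(), x, create_graph=True)[0]
    continuity = torch.mean((u_a - u_b) ** 2)            # values must agree
    flux = torch.mean((D_a * du_a - D_b * du_b) ** 2)    # fluxes must balance
    return continuity + flux
```

This term is simply added to each subdomain's composite loss, stitching the local solutions into one globally consistent field.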
6.4 Optimizer & Hyperparameter Tips
The final piece of the puzzle is the optimization algorithm and its settings.
| Hyperparameter | Typical Range | Guideline |
|---|---|---|
| Optimizer | Adam, L-BFGS | Start with Adam for global search, then fine-tune with L-BFGS. |
| Learning Rate | 1e-3 to 1e-4 | Use a scheduler (e.g., exponential decay) to reduce over time. |
| \(\lambda_{PDE}\), \(\lambda_{data}\) | 0.1 to 100 | Use adaptive methods (Ch. 3) or anneal during training. |
| Batch Size | 256 to 4096 | Larger batches give more stable gradients. |
Pre-Training Checklist
- All inputs/outputs are non-dimensionalized and scaled to \(\sim \mathcal{O}(1)\).
- Collocation points cover the entire domain, preferably with a quasi-random sequence.
- The network architecture is sufficiently deep/wide for the problem's complexity.
- Loss weights (\(\lambda\)) are set, either manually or with an adaptive scheme.
- The learning rate scheduler is configured.
PINN training stability depends on two pillars: proper scaling and balanced sampling.
7. Case Study: Hybrid Model for Battery Prognosis
This case study examines a state-of-the-art hybrid model that combines a Variational Autoencoder (VAE) with a PINN to predict battery State-of-Health (SoH) and Remaining Useful Life (RUL).
7.1 The Challenge: Predicting Battery Lifespan
Accurately predicting how a battery will degrade is extremely difficult. Every battery is slightly different due to manufacturing variations, and their aging paths are highly sensitive to usage patterns. Traditional models struggle because they either require vast amounts of run-to-failure data (which is expensive) or they fail to capture the stochastic, cell-to-cell variability.
7.2 The Approach: A VAE-PINN Hybrid
This research introduces a powerful hybrid architecture to tackle the problem by leveraging the strengths of both generative and physics-informed models.
- ① The VAE (Data Interpreter): A Variational Autoencoder first takes high-dimensional battery data (like a voltage curve from a single cycle) and compresses it into a low-dimensional latent space. This space represents the fundamental "health state" of the battery, effectively filtering out noise and capturing the unique signature of each cell.
- ② The PINN (Physics Enforcer): A Physics-Informed Neural Network then operates on this latent health space. It learns the trajectory of the health state over time, but its learning is constrained by a known physical degradation model (e.g., an empirical ODE for capacity fade). The PINN ensures that the predicted degradation path is physically plausible.
In essence, the VAE handles the "what" (what is the current health?), and the PINN handles the "how" (how does this health evolve according to physics?).
7.3 Key Findings and Impact
The results of this hybrid approach demonstrate a significant leap in battery prognosis:
- Exceptional Data Efficiency: The model accurately predicted the RUL of batteries using data from only the first 100 cycles, outperforming traditional data-driven models that require much more data.
- Robustness to Variation: By learning a probabilistic latent space, the VAE component effectively handled the inherent cell-to-cell variations, leading to more reliable predictions across a fleet of batteries.
- Accurate Long-Term Forecasting: The physics constraints from the PINN component prevented the model from making unrealistic long-term predictions, a common failure mode for pure ML models. The model could accurately forecast capacity fade far into the future.
7.4 Source
He, J., He, S., Zhang, S. et al. Variational autoencoder-enhanced physics-informed neural networks for battery state-of-health and remaining useful life prediction. Nat Commun 15, 4088 (2024). https://doi.org/10.1038/s41467-024-48779-z
8. Lab: Modeling Battery Degradation
Goal: We have sparse measurements of a battery's capacity over its cycle life. We will build a PINN to learn a continuous capacity degradation function, \(C(t)\), constrained by a simple physical law: \( \frac{dC}{dt} = -kC \), where \(k\) is an unknown degradation rate constant that the model will also learn as a parameter.
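A minimal sketch of this lab in PyTorch follows. The network, optimizer settings, and the synthetic three-point dataset (generated with a "true" \(k = 0.3\)) are illustrative; \(k\) is learned through a log-parameterization to keep it positive:

```python
import torch

# Surrogate for the continuous capacity curve C(t)
net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
log_k = torch.nn.Parameter(torch.tensor(0.0))  # learnable rate, k = exp(log_k) > 0

def capacity(t):
    return net(t)

def physics_residual(t):
    """Residual of dC/dt + k*C = 0 at collocation times t."""
    t = t.requires_grad_(True)
    C = capacity(t)
    dC_dt = torch.autograd.grad(C, t, torch.ones_like(C), create_graph=True)[0]
    return dC_dt + torch.exp(log_k) * C

opt = torch.optim.Adam(list(net.parameters()) + [log_k], lr=1e-3)
t_data = torch.tensor([[0.0], [0.5], [1.0]])   # sparse measurements (illustrative)
C_data = torch.exp(-0.3 * t_data)              # synthetic fade, true k = 0.3

for step in range(200):
    opt.zero_grad()
    loss = torch.mean((capacity(t_data) - C_data) ** 2) \
         + 0.1 * torch.mean(physics_residual(torch.rand(64, 1)) ** 2)
    loss.backward()
    opt.step()
```

With enough iterations, `exp(log_k)` should drift toward the rate constant implied by the data while `capacity(t)` interpolates smoothly between the sparse measurements.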
9. Practical Tips & Debug Checklist
- Non-dimensionalize Everything: Before training, scale all variables (time, space, concentrations) to be of a similar order of magnitude (\(\sim \mathcal{O}(1)\)). This is the single most important trick for stable training.
- Check Your Gradients: If training is unstable, manually inspect the magnitude of the gradients from each loss term. If one is orders of magnitude larger than the others, it will dominate. Adjust loss weights accordingly.
- Verify on a Known Problem: Before tackling an unknown system, always verify your PINN implementation on a problem with a known analytical solution to ensure the code is correct.
- Use Gradient Checkpointing: For large models or high-order derivatives, use gradient checkpointing (e.g., `torch.utils.checkpoint`) to trade compute for a significant reduction in memory usage.
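The checkpointing tip in the list above amounts to one wrapper call: activations inside the wrapped block are recomputed during the backward pass instead of being stored. The block below is a toy example; the layer sizes are illustrative:

```python
import torch
from torch.utils.checkpoint import checkpoint

# A sub-network whose intermediate activations we choose not to store
block = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.Tanh())

x = torch.rand(128, 64, requires_grad=True)
# checkpoint() recomputes block's forward during backward, saving memory
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
```

The trade is deliberate: roughly one extra forward pass of compute in exchange for not holding the block's activations in memory, which matters most for deep PINNs with high-order derivatives.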
10. Key References & Toolkits
- Raissi, Perdikaris, & Karniadakis, "Physics-informed neural networks...", J. Comp. Phys. 2019.
- Karniadakis et al., "Physics-informed machine learning", Nature Reviews Physics 2021.
- Lu L. et al., "DeepXDE: A deep learning library for solving differential equations", SIAM Review 2021.
- Toolkits: DeepXDE, NVIDIA SimNet, SciML.ai (Julia), SciANN.