Common Problems and Diagnosis
At the bottom of this page you’ll find tables of common SCHISM error messages and their solutions.
If you encounter an error message that is not listed, please report it to the developers along with relevant details about your run (e.g., input files, the command used to run it). This helps us improve the documentation and assist other users who run into the same issue. Alternatively, if you’re comfortable editing the documentation, you can add the error message and solution to the appropriate table yourself; see the Contributing section for details.
Where does SCHISM write error messages?
If SCHISM encounters an error while running, it writes the error message to the outputs/fatal.error file. If this file has any content, the run has failed.
At times you’ll get errors in the job submission system (e.g., SLURM) that are not written to the fatal.error file. In this case, you can check the job’s standard output and error files for any messages.
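
To check this from a script (for example, after launching a batch of runs), a minimal Python sketch is below. It assumes only the outputs/fatal.error location described above; `run_has_failed` and `fatal_messages` are hypothetical helper names, not functions from bdschism or schimpy.

```python
from pathlib import Path


def fatal_error_path(run_dir: str) -> Path:
    """Location of SCHISM's fatal.error file for a run directory."""
    return Path(run_dir) / "outputs" / "fatal.error"


def run_has_failed(run_dir: str) -> bool:
    """True if outputs/fatal.error exists and has any content."""
    path = fatal_error_path(run_dir)
    return path.is_file() and path.stat().st_size > 0


def fatal_messages(run_dir: str) -> list[str]:
    """Non-blank lines of outputs/fatal.error (empty list if the file is absent)."""
    path = fatal_error_path(run_dir)
    if not path.is_file():
        return []
    text = path.read_text(errors="replace")
    return [line.strip() for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    import sys

    run_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    if run_has_failed(run_dir):
        for msg in fatal_messages(run_dir):
            print(msg)
    else:
        # An empty/missing file does not rule out a scheduler-level failure.
        print("fatal.error is empty or missing; also check the job's output files")
```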
Common Problems
SIGSEGV or segmentation fault
This can be due to low disk space or insufficient memory. On Linux, run `ulimit -s unlimited` to remove the stack size limit (in a batch job, put it in the job script before SCHISM is launched). If the problem persists, check the fatal.error file for more details.
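
Because `ulimit` only changes the limits of the shell it runs in and the processes started from it, it can help to confirm the limit from inside the job environment. A minimal sketch using Python’s standard `resource` module (Unix only); the helper name `stack_limit_is_unlimited` is hypothetical:

```python
import resource


def stack_limit_is_unlimited() -> bool:
    """True if this process's soft stack-size limit is unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    return soft == resource.RLIM_INFINITY


if __name__ == "__main__":
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    if stack_limit_is_unlimited():
        print("stack size: unlimited")
    else:
        # getrlimit reports bytes; `ulimit -s` reports kilobytes.
        print(f"stack size limited to {soft // 1024} KB; consider 'ulimit -s unlimited'")
```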
The run seems much slower than normal
Dry boundary
A dry boundary error appears in the fatal.error file with a message along the lines of “ABORT: STEP: wetted cross section length on open bnd”.
This most often occurs at the Clifton Court and Yolo Toedrain boundaries. It happens when a boundary is trying to take water out of the system (a positive value) but cannot, because there is not enough water in the boundary cell. Most boundaries are artificially deepened to avoid this, but the error can still occur in a non-historical (non-hindcast) simulation when the Yolo Toedrain boundary replicates the tidal signal with timing that is out of sync with the rest of the system. Similarly, if the Clifton Court SWP allocated diversion is out of sync with water levels in the system, run the ccf_gate utility in bdschism to adjust the timing of the gates.
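
To find when a flux boundary is withdrawing water, you can scan flux.th for positive values in that boundary’s column. The sketch below assumes a whitespace-separated flux.th with the time in the first field followed by one value per flux boundary; the helper name `withdrawal_times` is hypothetical. Note that the boundary index in the fatal.error message counts every open boundary, while flux.th only has columns for boundaries with a specified flux (e.g., the ocean boundary has none), so the column number can be smaller than the boundary index.

```python
def withdrawal_times(flux_th_text: str, column: int) -> list[float]:
    """Times at which a flux.th column is positive (water leaving the model).

    Assumes each non-blank line is: time, then one value per flux boundary.
    `column` is 1-based and counts flux columns only (time is not column 1).
    """
    times = []
    for line in flux_th_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        time, value = float(fields[0]), float(fields[column])
        if value > 0.0:  # positive = withdrawal, per the sign convention above
            times.append(time)
    return times


if __name__ == "__main__":
    sample = """0.0    -10.0  5.0
900.0  -12.0  0.0
1800.0 -11.0  3.5
"""
    print(withdrawal_times(sample, 2))  # -> [0.0, 1800.0]
```

Comparing the returned times with the time of the abort shows whether the boundary was withdrawing water when it went dry.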
SCHISM Errors & Solutions
| File with Error | Error Message | Solution/Notes |
|---|---|---|
| `<RUNDIR>/outputs/fatal.error` | `0: ABORT: init: nc dim mismatch, 410577 540011 951102 23 2 410577 540011 951102 2 2` | The run is set up as 2D (nvrt = 2), but the hotstart file is 3D (nvrt = 23). |
| `<RUNDIR>/outputs/fatal.error` | `0: ABORT: init: nc dim mismatch, 459056 547088 1006697 23 2 459056 547088 1006697 23 3` | The run is looking for 3 tracers and finding 2 in the hotstart file. |
| `<RUNDIR>/outputs/fatal.error` | `122: ABORT: you must have sflux_inputs_file!` | SCHISM wants a file named sflux_inputs.txt under the sflux directory. Contents: `&sflux_inputs` on the first line and `/` on the second (an empty namelist). |
| `<RUNDIR>/outputs/fatal.error` | `0: ABORT: MISC: elev2D.th.nc` | This turned out to be a capitalization issue (my file was named elev2d.th.nc). The file could also be missing; run the gen_elev2d bdschism utility to create it. |
| `<RUNDIR>/outputs/fatal.error` | `0: ABORT: STEP: tracer nudging nc(2)` | The nudging data has run out. Your obs-coastal run may have finished, in which case you need to restart and switch to coastal-only nudging (see the nudging warm-up documentation). |
| `<RUNDIR>/outputs/fatal.error` | `30: ABORT: Inconsistent # of nodes at open boundary 13 50 n 58` | This error occurred at the Yolo boundary, which has 58 nodes in hgrid, but bctides does not reflect that number. The Yolo boundary is boundary #13; bctides specifies 50 nodes, but hgrid has 58. Either change 50 to 58 in the bctides.in.2d and bctides.in.3d files, or re-run the prepare_schism bctides utility. |
| `<RUNDIR>/outputs/fatal.error` | `0: ABORT: Wind scaling factor must be positive: 200175 -2.00000000000000` | Info from File …: convert the polygons to a shapefile with `convert_polygons --input windfactor.yaml --output windfactor/windfactor.shp`, edit them, and convert back with `convert_polygons --input windfactor/windfactor.shp --output windfactor/windfactor.yaml`. Then copy the edited polygon into the … |
| `<RUNDIR>/outputs/fatal.error` | `26: ABORT: no appropriate time exists for: sflux_air_1 time_now = 2454983.33333333 first time available = 2458246.00000000 last time available = 2458484.95833331 got_bracket = F got_suitable_bracket = F` | When running make_links_full.py, byear/eyear must be set to cover the model time period. |
| `<RUNDIR>/outputs/fatal.error` |  | Clifton Court Forebay (CCF) is drying out. From … |
| `<RUNDIR>/outputs/fatal.error` | `112: ABORT: STEP: wetted cross section length on open bnd <=0; boundary ndx= 14 , length= 0.000000000000000E+000` | Looking at bctides shows that boundary index 14 is north_bay. Check flux.th at this timestep in the 13th column: the ocean boundary has no flux column, so boundary 14 corresponds to flux column 13. |
| `<RUNDIR>/outputs/fatal.error` | `0: ABORT: step: time_series in elev2D.th.nc (2)` | The time indices in elev2D.th.nc aren’t compatible with the run period requested in param.nml. |
| `<RUNDIR>/outputs/fatal.error` | `46: ABORT: MAIN: orphaned block face: 1` | This was caused by a flow line crossing a space with no grid and then entering another grid element (e.g., where there’s a gap between the channel and an island). I needed to modify this line in … |
| `<RUNDIR>/outputs/fatal.error` | `49: ABORT: EVAL_CUBIC: Falied to find: 1 NaN NaN NaN` | This turned out to be vgrid.in.3d not having the appropriate number of layers. Check the min/max layer function that prepare_schism uses to generate vgrid.in.3d; some of mine were 0. You can also check this in VisIt by viewing nlayer.gr3. |
| `<RUNDIR>/outputs/fatal.error` | `IBILINEAR: no roots / QUICKSEARCH: Cannot find a vert. level:` | The run stops after running for a while. In … |
| `<RUNDIR>/outputs/fatal.error` | `82: ABORT: Check block nodes (0):` | Possible causes: (1) the hotstart file wasn’t made for this grid, so the hotstart needs to be regenerated; (2) the error arises in … |
| `<RUNDIR>/outputs/fatal.error` | `86: ABORT: STEP: wrong sign vsink 1 1 1.9980000E-02 9.9900002E-03` | vsink values must have a negative sign at all times. Even zeros need to be written as “-0”. |
| `<RUNDIR>/outputs/fatal.error` | `0: ABORT: MISC: uv3D.th dt wrong` | The time variable in uv3D.th is not correct. Make sure your param.nml aligns with the gen_uv3d output. |
| `<RUNDIR>/outputs/fatal.error` | `21: ABORT: Left out horizontal flux (3): 173 2` | Check that the uv3D.th file is large enough (it should be hundreds of MB). If it isn’t, your interpolate_variables or gen_uv3d command did not succeed. |
| `<RUNDIR>/outputs/fatal.error` | `166: ABORT: INIT: ntrs(3)<=0` | This was caused by calling the wrong SCHISM binary: the error happened when I called pschism_PREC_EVAP_GOTM_GEN_TVD-VL when I should have run pschism_PREC_EVAP_GOTM_TVD-VL. SCHISM is looking for three tracers (salinity, temperature, and a generic tracer) and is not finding an initialization designation. |
| `<RUNDIR>/outputs/fatal.error` | `107: ABORT: Impossible 82: 10848.9750000000` | This arose when I was trying to run a hotstart from the staout files. I think the cause was an incorrect nsteps_from_cold: time, nsteps_from_cold, and iths did not match. Once those were fixed so that the value was a proper multiple of the ihot step and step_nu_tr, the run worked. |
| `<RUNDIR>/outputs/fatal.error` | `0: ABORT: MISC: elev time_series2` | ??? |
| `<RUNDIR>/outputs/fatal.error` | `0: ABORT: INIT: illegal sav_: 48412 2.000000000000000E-002 -957.588019930000` | The sav_*.yaml files might not be suitable for your grid. Check for extreme values interpolated from -999 (NA). |
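| `<RUNDIR>/outputs/fatal.error` | `0: ABORT: wtiminc < dt` | The message means wtiminc (the atmospheric forcing time step in param.nml) is smaller than dt. In this case it had to do with the analysis module. |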
| `<RUNDIR>/outputs/fatal.error` | `51: ABORT: Tr. obc nudging factor wrong: 3.00000000000000 1` | This was with a run using generic tracers. The boundary entry in bctides was:<br>`3 0 1 1 2 2 ! coyote`<br>`1.`<br>`0.074318`<br>`1.`<br>`0. ! Generic Tracer Concentration`<br>`! Generic Tracer Relax` |
| `<RUNDIR>/outputs/fatal.error` | `0: ABORT: Check tracer_nudge.gr3` | Could be a mismatch of … |
| `<RUNDIR>/outputs/fatal.error` | `0: ABORT: MISC: vsource.th start time wrong` |  |
| `<RUNDIR>/outputs/fatal.error` | `0: ABORT: init: nc time1` | Something is wrong with the initialization from the hotstart.nc file. Try recombining the hotstarts and resubmitting the simulation. |
| `<RUNDIR>/outputs/fatal.error` | `260: ABORT: STEP: station elev error 136 -6.7100000000000000` | From … |
| `<RUNDIR>/outputs/fatal.error` | `63: ABORT: init: hotstart.nc not found` | The odd thing is the … |
| `<RUNDIR>/outputs/fatal.error` | `2: ABORT: EQSTATE: Impossible dry (7): -Infinity Infinity 1 50225` | This happened after running the barotropic model and then running the baroclinic model from an interpolated … |
| Batch Job Submission: `<RUNDIR>/*.log` | `0: ABORT: MSGP: need at least 1 compute process` | Likely an MPI issue. Check the task and node settings in your job submission files. |
| Batch Job Submission: `<RUNDIR>/*.log` | `forrtl: severe (24): end-of-file during read, unit 251, file …../simulations/suisun_lhc_1/.//outputs/staout_1` | This typically means that the … |
| Batch Job Submission: `<RUNDIR>/*.log` | `forrtl: severe (24): end-of-file during read, unit 9, file …/schism/azure_dsp_2024_lhc_v3/simulations/cache_lhc_5/.//outputs/flux.out Image PC Routine Line Source pschism_PREC_EVAP 00000000006BDB73 Unknown Unknown Unknown pschism_PREC_EVAP 000000000059E52B other_hot_init_ 222 misc_subs.F90 pschism_PREC_EVAP 0000000000483A33 schism_init_ 7029 schism_init.F90 pschism_PREC_EVAP 00000000004102FE MAIN__ 112 schism_driver.F90 pschism_PREC_EVAP 000000000041012D Unknown Unknown Unknown libc-2.28.so 000014E681019D85 __libc_start_main Unknown Unknown pschism_PREC_EVAP 000000000041004E Unknown Unknown Unknown srun: error: cn01: task 0: Exited with exit code 24` | This is typically due to the … |
| Batch Job Submission: `<RUNDIR>/*.log` | `forrtl: severe (59): list-directed I/O syntax error, unit 32, file …./schism/roundtrip/suisun-suisun/.//fluxflag.prop` | Make sure fluxflag.prop matches your hgrid.gr3. |
| Batch Job Submission: `<RUNDIR>/<RUNNAME>.e<RUNNUMBER>` (e.g., v111_noage.e2568) | `forrtl: severe (59): list-directed I/O syntax error, unit 51, file …./if_202211_agelessdemo/.//flux.th` | Change … |
| Batch Job Submission: `<RUNDIR>/<RUNNAME>.e<RUNNUMBER>` (e.g., v111_noage.e2568) | `forrtl: severe (29): file not found, unit 3011, file ….././/¡9‰A_tra¡9‰Aulve¡9‰A ¡9‰A.th` | I’m not positive this is the solution, but this error arose when I was running the baroclinic model after the internal obs nudging period was done, and the TEM/SAL_nu_roms.nc files (ROMS only) may have been corrupted; I think my machine restarted while writing them. Re-creating those files resolved the issue. |
| Batch Job Submission: `<RUNDIR>/<RUNNAME>.e<RUNNUMBER>` (e.g., v111_noage.e2568) | `forrtl: severe (29): file not found, unit 9, file …./schism_repos/if_202212_jonesdemo/.//outputs/flux.out` | The simulation is a hotstart run (ihot = 2 in param.nml), so it looks for flux.out to build on, and the message means outputs/flux.out was not found. A hotstart run needs a few output files from the previous run; one of them is flux.out, the flow output file. |
| Batch Job Submission: `/var/spool/mail/<USERNAME>` | `PBS Job Id: 2564.hpc4 Job Name: v111_noage Post job file processing error; job 2564.hpc4 on host cn01` | Storage issue on HPC4. |
| Batch Job Submission: `/var/spool/mail/<USERNAME>` | `PBS Job Id: 2569.hpc4 Job Name: v111_noage Execution terminated Exit_status=255` | See next error. |
| Batch Job Submission: console | `qsub: directive error: -M <email_address>, <username>@localhost` | This could be that your … Job script excerpt:<br>`# Change the number of cores you want to use.`<br>`# This needs to agree with what you are asking above in the PBSPro setting.`<br>`n_cores=144`<br>`cd $PBS_O_WORKDIR`<br>`module load schism/5.10_intel2022.1`<br>`mpiexec --rsh=ssh -n $n_cores bash ./schism.sh` |
| Batch Job Submission: console | `Job id Name User Time Use S Queue ---------------- ---------------- ---------------- -------- - ----- 2984.hpc4 v112_RAL_pAge_T <username> 0 Q workq` | The job won’t leave this state (Q = queued). Fix by running … |
| Batch Job Submission: console (from qstat) | `Job id Name User Time Use S Queue ---------------- ---------------- ---------------- -------- - ----- 3424.hpc4 DSP_baseline_20 <username> 00:00:02 R workq` | Stuck in this state (R = running) with no progress in … |
| not noted | [vsource.th missing] | A vsource.th file is needed. Check whether your file is misnamed, or whether you need to generate the elapsed file. |
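
The `no appropriate time exists for: sflux_air_1` row can be diagnosed by converting the day numbers in the message to calendar dates. The sketch below treats them as Julian day numbers whose whole part marks midnight (so 2451545.0 is 2000-01-01 00:00). That convention is an assumption inferred from the whole-number `first time available` in the example message, not taken from SCHISM documentation, so the hours may be off by up to half a day; the dates are what matter for byear/eyear. The function names are hypothetical.

```python
import re
from datetime import datetime, timedelta

# ASSUMPTION: day numbers are Julian day numbers whose whole part marks
# midnight, so 2451545.0 is 2000-01-01 00:00 (standard JD would be noon).
REFERENCE_DAY = 2451545.0
REFERENCE_DATE = datetime(2000, 1, 1)

FIELDS = ("time_now", "first time available", "last time available")


def day_number_to_datetime(day: float) -> datetime:
    """Convert a day number from the sflux abort message to a naive datetime."""
    return REFERENCE_DATE + timedelta(days=day - REFERENCE_DAY)


def sflux_time_report(message: str) -> dict[str, datetime]:
    """Pull the three time fields out of a 'no appropriate time exists' abort."""
    report = {}
    for label in FIELDS:
        match = re.search(re.escape(label) + r"\s*=\s*([0-9.]+)", message)
        if match:
            report[label] = day_number_to_datetime(float(match.group(1)))
    return report


if __name__ == "__main__":
    msg = ("26: ABORT: no appropriate time exists for: sflux_air_1 "
           "time_now = 2454983.33333333 first time available = 2458246.00000000 "
           "last time available = 2458484.95833331 got_bracket = F")
    for label, when in sflux_time_report(msg).items():
        print(f"{label:>22}: {when:%Y-%m-%d}")
```

For the example message, time_now falls on 2009-05-31 while the linked sflux files span 2018-05-07 to 2018-12-31, which points to byear/eyear in make_links_full.py being set for the wrong years.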
SCHISM Utility Errors & Solutions
| Function | Error Message | Solution/Notes |
|---|---|---|
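| ?? | `FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method.` | ?? This is a deprecation warning from pyproj, not an error; it is raised when a projection is specified in the old `+init=<authority>:<code>` form rather than `<authority>:<code>`. |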
| combine_hotstart7 | `forrtl: severe (24): end-of-file during read, unit -5, file Internal List-Directed Read Image PC Routine Line Source combine_hotstart7 000000000041BCF9 for__io_return Unknown Unknown combine_hotstart7 0000000000436CEC for_read_int_lis_ Unknown Unknown combine_hotstart7 00000000004359F3 for_read_int_lis Unknown Unknown combine_hotstart7 0000000000416CDC Unknown Unknown Unknown combine_hotstart7 000000000040B482 Unknown Unknown Unknown combine_hotstart7 000000000040B3A2 Unknown Unknown Unknown libc-2.17.so 00002AE015D983D5 __libc_start_main Unknown Unknown combine_hotstart7 000000000040B2A9 Unknown Unknown Unknown` | Need to go to … and recompile:<br>`ifort -O2 -cpp -CB -mcmodel=medium -assume byterecl -g -traceback -o combine_hotstart7.exe ../UtilLib/argparse.f90 combine_hotstart7.f90 -I$NETCDF/include -I$NETCDF_FORTRAN/include -L$NETCDF_FORTRAN/lib -L$NETCDF/lib -lnetcdf -lnetcdff`<br>Note that your … |
| combine_hotstart7 | `combine_hotstart7.exe: error while loading shared libraries: libnetcdf.so.13: cannot open shared object file: No such file or directory` | Restart the shell and reload the module so the NetCDF library can be found again. |
| combine_hotstart7 | `forrtl: severe (29): file not found, unit 10, file SIMULATIONDIR/outputs/local_to_global_000000 Image PC Routine Line Source combine_hotstart7 0000000000426F7C Unknown Unknown Unknown combine_hotstart7 000000000040C569 combine_hotstart7 83 combine_hotstart7.f90 combine_hotstart7 000000000040C0AB MAIN__ 35 combine_hotstart7.f90 combine_hotstart7 000000000040BFCD Unknown Unknown Unknown libc-2.28.so 00001532DA1ACD85 __libc_start_main Unknown Unknown combine_hotstart7 000000000040BEEE Unknown Unknown Unknown` | This was an issue with Azure not uploading the blob, so outputs/local_to_global_000000 was missing. |
| combine_hotstart7 | `istep= 751200 Global quantities: 456254 380408 837202 nvars= 0 32764 static info: 1.929871795483076E-316 0 0 -1827807652 pre- and post-comb conflict` | For me, this happened because I asked for a hotstart from a timestep that didn’t exist. |
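| combine_hotstart7 | `combine_hotstart7 --iteration 801600 istep= 801600 Global quantities: 565229 470472 1036261 nvars= 21 14 static info: 72144000.0000000 801600 1 340800 last dim wrong: 5 0 1574 2412` | From …:<br>`if(idims(dimids(ndims))/=npse_lcl) then`<br>`  print*, 'last dim wrong:',m,irank,npse_lcl,idims(dimids(ndims))`<br>`  stop`<br>`endif`<br>So the four printed numbers are the variable index (m = 5), the rank (irank = 0), the expected size npse_lcl (1574), and the actual last dimension (2412). The problem: Variable #5 in rank 0’s hotstart file has … |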
| gen_elev2d (bdschism) | `ValueError: pt_reyes has gaps larger than fill limit` | Happened when trying to run:<br>`gen_elev2d --outfile elev2D.th.nc --hgrid=hgrid.gr3 --stime=2009-1-1 --etime=2009-7-1 --slr 0.0 noaa_download/noaa_pryc1_9415020_water_level_2009_2010.csv noaa_download/noaa_mtyc1_9413450_water_level_2009_2010.csv`<br>The first 15 days of January in pryc are empty, in … |
| interpolate_variables7 | `ERROR: No such file or directory extract_mod: (1)` | Run interpolate_variables8 instead. |
| interpolate_variables8 | `ERROR: forrtl: severe (24): end-of-file during read, unit 19, file /scratch/dms/tomkovic/schism_repos/if_202211_agelessdemo/outputs_tropic/vgrid.fg` |  |
| interpolate_variables8 | `0 pts have no immediate parent elements; see fort.12 time= 1800.00000000000 3600.00000000000 5400.00000000000 7200.00000000000 9000.00000000000 10800.0000000000 12600.0000000000 14400.0000000000 16200.0000000000 failed to open(1)` | No U/V velocities written out to … |
| interpolate_variables8 | `forrtl: severe (174): SIGSEGV, segmentation fault occurred` | Two possible solutions: (1) free up disk space (low disk space can cause this); (2) enter … |
| interpolate_variables8 | `forrtl: severe (59): list-directed I/O syntax error, unit 21, file SIMULATIONDIR/outputs_tropic/interpolate_variables.in Image PC Routine Line Source interpolate_varia 000000000043BC7B for_read_seq_lis_ Unknown Unknown interpolate_varia 000000000043B141 for_read_seq_lis Unknown Unknown interpolate_varia 000000000040B5A1 Unknown Unknown Unknown interpolate_varia 000000000040B49D Unknown Unknown Unknown libc-2.28.so 0000145AB09F0D85 __libc_start_main Unknown Unknown interpolate_varia 000000000040B3BE Unknown Unknown Unknown` | I simply had an entirely wrong interpolate_variables.in (the text of a param.nml had been copied into it). |
| interpolate_variables8 | `After header: 416486 357942 774965 23 48 3 303885 303780 303674 9.9999998E-03 652974.100000000 4171586.53000000 3.548943 1800.000 Inverted z-levels at: 0 3 0.0000000E+00 0.0000000E+00 2.993011` | ?? Unresolved. Re-ran the barotropic model. |
| interpolate_variables8 | `failed to open out2d` | ?? unresolved. |
| prepare_schism (schimpy) | In console: `Mesh contains areas smaller than the failure threshold. Consult the log or printout above for areas and warnings`; in prepare_schism.log: `INFO:SCHISM:Checking for elements with small areas. Thresholds: warn=10.0, fail=4.0 WARNING:SCHISM:Global element: 222056 Area: 1.461 Centroid: 616324.9,4207169.2` | Looking at the coordinates, I found small cells (little slivers with areas of ~1 m²) that resulted from dangling arcs in SMS. |
| prepare_schism (schimpy) | In console: `script file 'C:\Users\….\prepare_schism-script.py' is not present` | An issue with the way conda turns scripts into small executables. Re-install schimpy (pip or conda). |
| read_output10_xyz | `The following floating-point exceptions are signalling: IEEE_INVALID_FLAG` | ?? unresolved. |
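
The `pre- and post-comb conflict` row above came down to requesting an iteration with no hotstart output. Before running `combine_hotstart7 --iteration N`, you can list which iterations exist. This sketch assumes the per-rank files in outputs/ are named hotstart_<rank>_<iteration>.nc (e.g., hotstart_000000_801600.nc) and only checks rank 0; `available_iterations` and `check_iteration` are hypothetical helper names.

```python
import re
from pathlib import Path

# ASSUMPTION: per-rank hotstart outputs are named hotstart_<rank>_<iteration>.nc
HOTSTART_RE = re.compile(r"hotstart_(\d+)_(\d+)\.nc$")


def available_iterations(outputs_dir: str) -> list[int]:
    """Sorted iterations for which rank 0 wrote a hotstart file."""
    iterations = set()
    for path in Path(outputs_dir).glob("hotstart_*_*.nc"):
        match = HOTSTART_RE.search(path.name)
        if match and int(match.group(1)) == 0:
            iterations.add(int(match.group(2)))
    return sorted(iterations)


def check_iteration(outputs_dir: str, iteration: int) -> None:
    """Raise ValueError if rank 0 has no hotstart file for `iteration`."""
    iterations = available_iterations(outputs_dir)
    if iteration not in iterations:
        raise ValueError(
            f"no hotstart for iteration {iteration}; available: {iterations}"
        )


if __name__ == "__main__":
    import sys

    outputs = sys.argv[1] if len(sys.argv) > 1 else "outputs"
    print(available_iterations(outputs))
```

A fuller check could also confirm that every rank wrote a file for the chosen iteration, since a missing per-rank file would also make the combine step fail.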