# Common Problems and Diagnosis
At the bottom of this page you’ll find tables of common SCHISM error messages and their solutions.
If you encounter an error message that is not listed, please report it to the developers along with any relevant details about your run (e.g., input files, the command used to run it). This helps us improve the documentation and assists other users who hit the same issue later. Alternatively, if you're comfortable editing the documentation, you can add the error message and solution to the appropriate table yourself; see the Contributing section for details on how to contribute.

## Where does SCHISM write error messages?
If SCHISM encounters an error while running, it writes the error message to the outputs/fatal.error file. If this file has any content, the run has failed.
Sometimes the job submission system (e.g., SLURM) reports errors that are never written to fatal.error. In that case, check the job's standard output and standard error files for messages.
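
For a quick triage from the run directory, the checks below work on a SLURM-style setup; the slurm-*.out filename pattern is an assumption that depends on your submission script, so adjust it for PBS or custom log names.

```bash
# If fatal.error is non-empty, the run has failed; print it.
if [ -s outputs/fatal.error ]; then
    cat outputs/fatal.error
fi

# Otherwise look at the scheduler-side logs (filename pattern assumes SLURM
# defaults; on PBS look at <RUNNAME>.e<RUNNUMBER> / <RUNNAME>.o<RUNNUMBER>).
tail -n 50 slurm-*.out
grep -iE "abort|error|sigsegv" slurm-*.out
```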

## Common Problems

### SIGSEGV or segmentation fault
This can be due to low disk space or insufficient memory. On Linux, use ulimit -s unlimited to allow an unlimited stack size. If the problem persists, check the fatal.error file for more details.
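
Because batch jobs start fresh shells on the compute nodes, the ulimit call belongs in the job script itself, before the MPI launch. A minimal sketch, with the core count as a placeholder and the launch line modeled on the launch.pbs excerpt in the tables below:

```bash
#!/bin/bash
# Scheduler directives (PBS/SLURM) go here; the core count is a placeholder
# and must match your resource request.
n_cores=144

ulimit -s unlimited   # lift the stack-size limit before launching SCHISM

# Launch line mirrors the launch.pbs example later on this page.
mpiexec -n $n_cores bash ./schism.sh
```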

### The run seems very slow compared to normal

### Dry boundary
A dry boundary error will appear in the fatal.error file and reads something like "ABORT: STEP: wetted cross section length on open bnd".
This most often appears at the Clifton Court and Yolo Toedrain boundaries. The issue arises when a boundary is trying to take water out of the system (a positive value) but cannot because there is insufficient water available in the boundary cell. Most boundaries are artificially deepened to avoid this, but the error can still occur in a non-historic/hindcast simulation when the Yolo Toedrain boundary replicates the tidal signal with the wrong timing relative to the rest of the system. Similarly, if the Clifton Court SWP allocated diversion is out of sync with the water levels in the system, you will need to run the ccf_gate utility in bdschism to properly adjust the timing of the gates. A quick way to confirm the diagnosis is sketched below.
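
The abort message itself carries the index of the boundary that dried out. A small sketch; the flux.th column that corresponds to that boundary follows your bctides.in ordering, so the tail command is only illustrative:

```bash
# The abort message includes the open-boundary index (e.g. "boundary ndx= 5").
grep "wetted cross section" outputs/fatal.error

# Inspect the flows being requested near the failure time; match the column
# to the failing boundary using the order of boundaries in bctides.in.
tail -n 5 flux.th
```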

## SCHISM Errors & Solutions

| File with Error | Error Message | Solution/Notes |
|---|---|---|
| <RUNDIR>/outputs/fatal.error | 0: ABORT: init: nc dim mismatch, 410577 540011 951102 23 2 410577 540011 951102 2 2 | The run is looking for 2 dimensions and the hotstart file has 3 (23 nVert). |
| <RUNDIR>/outputs/fatal.error | 0: ABORT: init: nc dim mismatch, 459056 547088 1006697 23 2 459056 547088 1006697 23 3 | The run is looking for 3 tracers and finding 2 in the hotstart file. |
| <RUNDIR>/outputs/fatal.error | 122: ABORT: you must have sflux_inputs_file! | SCHISM expects a file named sflux_inputs.txt under the sflux directory, containing only the namelist header: &sflux_inputs / |
| <RUNDIR>/outputs/fatal.error | 0: ABORT: MISC: elev2D.th.nc | This turned out to be a capitalization issue (the file was named elev2d.th.nc). It can also mean the file is missing; run the gen_elev2d bdschism utility. |
| <RUNDIR>/outputs/fatal.error | 0: ABORT: STEP: tracer nudging nc(2) | Nudging data has run out. Your obs-coastal run may have finished, in which case you need to restart and switch to coastal-only nudging (see the nudging warm-up documentation). |
| <RUNDIR>/outputs/fatal.error | 30: ABORT: Inconsistent # of nodes at open boundary 13 50 n 58 | bctides does not reflect the correct number of nodes at the Yolo boundary (boundary #13): it expects 50 nodes but hgrid has 58. Either change 50 to 58 in the bctides.in.2d and .3d files, or run the prepare_schism bctides utility. |
| <RUNDIR>/outputs/fatal.error | 0: ABORT: Wind scaling factor must be positive: 200175 -2.00000000000000 | The i-th element (200175 and its neighbors) has a value of -2.000. Use convert_polygons to view the windfactor.yaml polygons in GIS and evaluate what needs additional coverage: convert_polygons --input windfactor.yaml --output windfactor/windfactor.shp. Edit and save the shapefile, convert it back with convert_polygons --input windfactor/windfactor.shp --output windfactor/windfactor.yaml, copy the edited polygon into the windfactor.yaml input file, and re-run prepare_schism main_bay_delta.yaml. |
| <RUNDIR>/outputs/fatal.error | 26: ABORT: no appropriate time exists for: sflux_air_1 time_now = 2454983.33333333 first time available = 2458246.00000000 last time available = 2458484.95833331 got_bracket = F got_suitable_bracket = F | When make_links_full.py was run, byear/eyear were not set appropriately for the model time period, so the sflux links do not cover time_now. |
| <RUNDIR>/outputs/fatal.error | 23: ABORT: STEP: wetted cross section length on open bnd <=0; boundary ndx= 5 , length= 0.000000000000000E+000 | Clifton Court Forebay (CCF) is drying out; bctides.in shows that swp is the fifth boundary index. One option is to disable the strict gate-operation time series (based on historical reporting) in hydraulics.in and instead allow the gate to operate on a more relaxed operational rule. A better solution is to run the ccf_gate utility in bdschism. |
| <RUNDIR>/outputs/fatal.error | 112: ABORT: STEP: wetted cross section length on open bnd <=0; boundary ndx= 14 , length= 0.000000000000000E+000 | bctides shows that the 14th index is north_bay. Check flux.th at this timestep, column 13 (there is no flux column for the ocean boundary). |
| <RUNDIR>/outputs/fatal.error | 0: ABORT: step: time_series in elev2D.th.nc (2) | elev2D.th.nc has time indices that are not compatible with the run period requested in param.nml. |
| <RUNDIR>/outputs/fatal.error | 46: ABORT: MAIN: orphaned block face: 1 | In this case a flow line crossed a no-grid space and then entered another grid element, e.g. where there is a gap between the channel and an island. The offending line in flow_station_xsects.yaml needed to be modified. |
| <RUNDIR>/outputs/fatal.error | 49: ABORT: EVAL_CUBIC: Falied to find: 1 NaN NaN NaN | vgrid.in.3d did not have the appropriate number of layers (some were 0). Check the minmax layer function used by prepare_schism to generate the vgrid.in.3d file. This can also be checked in VisIt by looking at nlayer.gr3. |
| <RUNDIR>/outputs/fatal.error | IBILINEAR: no roots QUICKSEARCH: Cannot find a vert. level: | This stops after running for a while. In mirror.out you see "done solver; etatot= NaN", then the next timestep produces the QUICKSEARCH fatal.error message, and the HPC5 stdout shows IBILINEAR. The model has blown up; in this case small/tiny cells in the mesh were the cause. |
| <RUNDIR>/outputs/fatal.error | 82: ABORT: Check block nodes (0): | A hotstart file that was not made for this grid can cause this, in which case re-run the hotstart generation, though in at least one instance that turned out not to be the actual cause. |
| <RUNDIR>/outputs/fatal.error | 86: ABORT: STEP: wrong sign vsink 1 1 1.9980000E-02 9.9900002E-03 | vsink must have a negative sign at all times; even zeros need to be written as "-0". |
| <RUNDIR>/outputs/fatal.error | 0: ABORT: MISC: uv3D.th dt wrong | The time variable in uv3D.th is not correct. Make sure param.nml aligns with the gen_uv3d output. |
| <RUNDIR>/outputs/fatal.error | 21: ABORT: Left out horizontal flux (3): 173 2 | Check that the uv3D.th file is large enough (it should be hundreds of MB). If it is not, your interpolate_variables or gen_uv3d command did not succeed. |
| <RUNDIR>/outputs/fatal.error | 166: ABORT: INIT: ntrs(3)<=0 | This was a case of calling the wrong SCHISM binary: pschism_PREC_EVAP_GOTM_GEN_TVD-VL was called when pschism_PREC_EVAP_GOTM_TVD-VL should have been run. SCHISM is looking for three tracers (salinity, temperature, and a generic tracer) and not finding an initialization designation. |
| <RUNDIR>/outputs/fatal.error | 107: ABORT: Impossible 82: 10848.9750000000 | This arose when hotstarting from the staout files: nsteps_from_cold was not correct, and there was a mismatch between time, nsteps_from_cold, and iths. Once those were fixed and the step was a proper multiple of the ihot step and step_nu_tr, it worked. |
| <RUNDIR>/outputs/fatal.error | 0: ABORT: MISC: elev time_series2 | Unresolved. |
| <RUNDIR>/outputs/fatal.error | 0: ABORT: INIT: illegal sav_: 48412 2.000000000000000E-002 -957.588019930000 | The sav_*.yaml files may not be suitable for your grid. Check for extreme values interpolated from -999 (NA). |
| <RUNDIR>/outputs/fatal.error | 0: ABORT: wtiminc < dt | wtiminc in param.nml must be at least dt. In this case the setting was related to the analysis module. |
| <RUNDIR>/outputs/fatal.error | 51: ABORT: Tr. obc nudging factor wrong: 3.00000000000000 1 | This was a run using generic tracers (pschism_PREC_EVAP_GOTM_GEN_TVD-VL) with no nudging specified for the generic tracer (see the SCHISM source code). For tracers, a constant value must be accompanied by a relax value in bctides, e.g.: "3 0 1 1 2 2 ! coyote 1. 0.074318 1. 0. ! Generic Tracer Concentration ! Generic Tracer Relax" (line breaks not preserved here). |
| <RUNDIR>/outputs/fatal.error | 0: ABORT: Check tracer_nudge.gr3 | The number of elements in hgrid.gr3 does not match that in {TRACER}_nudge.gr3 (simply making an empty GEN_nudge.gr3 did not work). Re-run create_nudging, and compare headers with "head hgrid.gr3". |
| <RUNDIR>/outputs/fatal.error | 0: ABORT: MISC: vsource.th start time wrong | |
| <RUNDIR>/outputs/fatal.error | 0: ABORT: init: nc time1 | Something is wrong with the initialization of the hotstart.nc file. Try recombining hotstarts and resubmitting the simulation. |
| <RUNDIR>/outputs/fatal.error | 260: ABORT: STEP: station elev error 136 -6.7100000000000000 | The source (else !wet ... if(k0==0) then write(errmsg,*)'STEP: station elev error',i,zstal(i); call parallel_abort(errmsg)) indicates the station is wet but k0==0. From station.in: 136 578096.00 4151478.40 -6.71 ! dumbr upper "San Francisco Bay at Old Dumbarton Bridge". This looks like a wet/dry or elevation issue, but the lines in question can handle any healthy state, so it is caused by a NaN or an OS/MPI breakdown. Check the tail of your mirror.out file for infinite/NaN etatot values; otherwise combine hotstarts (combine_hotstart --latest). |
| <RUNDIR>/outputs/fatal.error | 63: ABORT: init: hotstart.nc not found | The odd thing is that hotstart.nc is right there. Check whether it is corrupted with "ncdump -h hotstart.nc". |
| <RUNDIR>/outputs/fatal.error | 2: ABORT: EQSTATE: Impossible dry (7): -Infinity Infinity 1 50225 | This happened when running the baroclinic model from an interpolated uv3D.th after the barotropic run, and appeared to be a hotstart build issue (schimpy was updated past version 1.15.0 and the hotstart script re-run). Check nlayer.gr3 in VisIt for cells with 0 layers, which can result from a bad minmaxlayer.shp file. |
| Batch Job Submission: <RUNDIR>/*.log | 0: ABORT: MSGP: need at least 1 compute process | Likely an MPI issue. Check your job submission files for tasks, nodes, etc. |
| Batch Job Submission: <RUNDIR>/*.log | forrtl: severe (24): end-of-file during read, unit 251, file …../simulations/suisun_lhc_1/.//outputs/staout_1 | This typically means staout_1 does not extend to the timestep set in hotstart.nc. Here, though, the hotstart was at 41472000 and staout reached 41475600, the station.in station numbers matched the staout_* file lengths, and param.nml looked right, so none of those were the issue; the param.nml was actually for the barotropic run, which has a different nspool_sta and expects different staout_* output. |
| Batch Job Submission: <RUNDIR>/*.log | forrtl: severe (24): end-of-file during read, unit 9, file …/schism/azure_dsp_2024_lhc_v3/simulations/cache_lhc_5/.//outputs/flux.out Image PC Routine Line Source pschism_PREC_EVAP 00000000006BDB73 Unknown Unknown Unknown pschism_PREC_EVAP 000000000059E52B other_hot_init_ 222 misc_subs.F90 pschism_PREC_EVAP 0000000000483A33 schism_init_ 7029 schism_init.F90 pschism_PREC_EVAP 00000000004102FE MAIN__ 112 schism_driver.F90 pschism_PREC_EVAP 000000000041012D Unknown Unknown Unknown libc-2.28.so 000014E681019D85 __libc_start_main Unknown Unknown pschism_PREC_EVAP 000000000041004E Unknown Unknown Unknown srun: error: cn01: task 0: Exited with exit code 24 | This is typically because flux.out is not long enough for the requested hotstart date, which can happen when a cloud service fails to copy out the full flux.out before the hotstart is generated. Evaluate flux.out and the staout files to find an appropriate timestep to create a hotstart from. |
| Batch Job Submission: <RUNDIR>/*.log | forrtl: severe (59): list-directed I/O syntax error, unit 32, file …./schism/roundtrip/suisun-suisun/.//fluxflag.prop | Make sure fluxflag.prop matches your hgrid.gr3. |
| Batch Job Submission: <RUNDIR>/<RUNNAME>.e<RUNNUMBER> (e.g. v111_noage.e2568) | forrtl: severe (59): list-directed I/O syntax error, unit 51, file …./if_202211_agelessdemo/.//flux.th | Rename flux.th to flux_datetime.th, then regenerate the elapsed-time version: model_time to_elapsed --start '2009-5-1' flux_datetime.th --out flux.th. Do the same for salt.th and temp.th. |
| Batch Job Submission: <RUNDIR>/<RUNNAME>.e<RUNNUMBER> (e.g. v111_noage.e2568) | forrtl: severe (29): file not found, unit 3011, file ….././/¡9‰A_tra¡9‰Aulve¡9‰A ¡9‰A.th | Not certain this is the solution, but the error arose when running baroclinic after the internal obs nudging period was done, and the TEM/SAL_nu_roms.nc files (ROMS-only) may have been corrupted, possibly by a machine restart while they were being written. Re-creating them resolved the issue. |
| Batch Job Submission: <RUNDIR>/<RUNNAME>.e<RUNNUMBER> (e.g. v111_noage.e2568) | forrtl: severe (29): file not found, unit 9, file …./schism_repos/if_202212_jonesdemo/.//outputs/flux.out | The simulation must be a hotstart run (ihot = 2 in param.nml), because it is looking for flux.out to build on, and flux.out is not found under outputs. A hotstart run needs a few output files from a previous run; one of them is flux.out, a flow output file. |
| Batch Job Submission: /var/spool/mail/<USERNAME> | PBS Job Id: 2564.hpc4 Job Name: v111_noage Post job file processing error; job 2564.hpc4 on host cn01 | Storage issue on HPC4. |
| Batch Job Submission: /var/spool/mail/<USERNAME> | PBS Job Id: 2569.hpc4 Job Name: v111_noage Execution terminated Exit_status=255 | See the next row. |
| Batch Job Submission: console | qsub: directive error: -M <email_address>, <username>@localhost | Your launch.pbs file may have errors. Check that n_cores (e.g. n_cores=144) agrees with what you request in the PBSPro directives, that the script does cd $PBS_O_WORKDIR and module load schism/5.10_intel2022.1, and that the launch line is mpiexec -n $n_cores bash ./schism.sh (or mpiexec --rsh=ssh -n $n_cores bash ./schism.sh). |
| Batch Job Submission: console | Job id Name User Time Use S Queue ... 2984.hpc4 v112_RAL_pAge_T <username> 0 Q workq | The job never leaves the queued state (S = Q). This can often be fixed by running "ulimit -s unlimited". If that does not work, check that the number of nodes in launch.pbs is appropriate for the machine you are on; for HPC4: #PBS -l select=6:ncpus=24:mpiprocs=24. |
| Batch Job Submission: console (from qstat) | Job id Name User Time Use S Queue ... 3424.hpc4 DSP_baseline_20 <username> 00:00:02 R workq | The job is stuck in this state with no progress in mirror.out and no error messages elsewhere. qdel the job, then read the .e3424 error messages (in this case SCHISM could not find the hgrid.gr3 file). |
| not noted | [vsource.th missing] | A vsource.th file is needed. Check whether your file is mis-named, or whether you need to generate the elapsed-time version. |
| not noted | The error arises in schism_init.f90 within the hydraulics.in handling, while checking structures | The hydraulics.in was for a different grid; replace it with one built for the updated grid. Alternatively, the space between the Grantline partial and barrier (a no-mesh/blank area) may have been accidentally included as a mesh element. |
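
Several of the hotstart rows above ("nc dim mismatch", "init: nc time1", "hotstart.nc not found") can be triaged the same way before resubmitting, by dumping the netCDF header; a sketch:

```bash
# Print the hotstart header: confirm the file is readable at all (a corrupted
# file fails here) and that nVert and the tracer count match what the chosen
# pschism binary expects.
ncdump -h hotstart.nc | head -n 40
```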

## SCHISM Utility Errors & Solutions

| Function | Error Message | Solution/Notes |
|---|---|---|
| ?? | FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. | This is a deprecation warning (from pyproj), not a fatal error; it can generally be ignored, or silenced by switching CRS specifications to the '<authority>:<code>' form (e.g. 'EPSG:26910'). |
| combine_hotstart7 | forrtl: severe (24): end-of-file during read, unit -5, file Internal List-Directed Read Image PC Routine Line Source combine_hotstart7 000000000041BCF9 for__io_return Unknown Unknown combine_hotstart7 0000000000436CEC for_read_int_lis_ Unknown Unknown combine_hotstart7 00000000004359F3 for_read_int_lis Unknown Unknown combine_hotstart7 0000000000416CDC Unknown Unknown Unknown combine_hotstart7 000000000040B482 Unknown Unknown Unknown combine_hotstart7 000000000040B3A2 Unknown Unknown Unknown libc-2.17.so 00002AE015D983D5 __libc_start_main Unknown Unknown combine_hotstart7 000000000040B2A9 Unknown Unknown Unknown | Go to /home/<USERNAME>/schism_build/src/Utility/Combining_Scripts/combine_hotstart7.f90 and recompile: ifort -O2 -cpp -CB -mcmodel=medium -assume byterecl -g -traceback -o combine_hotstart7.exe ../UtilLib/argparse.f90 combine_hotstart7.f90 -I$NETCDF/include -I$NETCDF_FORTRAN/include -L$NETCDF_FORTRAN/lib -L$NETCDF/lib -lnetcdf -lnetcdff. Note that the NETCDF and NETCDF_FORTRAN variables must be set in your .bash_profile in /home/<USERNAME>; then run source <PATH>/.bash_profile. |
| combine_hotstart7 | combine_hotstart7.exe: error while loading shared libraries: libnetcdf.so.13: cannot open shared object file: No such file or directory | Restart the shell and reload the module. |
| combine_hotstart7 | forrtl: severe (29): file not found, unit 10, file SIMULATIONDIR/outputs/local_to_global_000000 Image PC Routine Line Source combine_hotstart7 0000000000426F7C Unknown Unknown Unknown combine_hotstart7 000000000040C569 combine_hotstart7 83 combine_hotstart7.f90 combine_hotstart7 000000000040C0AB MAIN__ 35 combine_hotstart7.f90 combine_hotstart7 000000000040BFCD Unknown Unknown Unknown libc-2.28.so 00001532DA1ACD85 __libc_start_main Unknown Unknown combine_hotstart7 000000000040BEEE Unknown Unknown Unknown | This looked like Azure failing to upload the blob ./outputs contents to the compute node, and in essence it was: the compute node could not allocate enough memory to start the job correctly. |
| combine_hotstart7 | istep= 751200 Global quantities: 456254 380408 837202 nvars= 0 32764 static info: 1.929871795483076E-316 0 0 -1827807652 (pre- and post-combine conflict) | A hotstart was requested for a timestep that does not exist. |
| combine_hotstart7 | From combine_hotstart7.f90: if(idims(dimids(ndims))/=npse_lcl) then | Variable #5 in rank 0's hotstart file has nResident_elem (2412) as its last dimension when it should have nResident_side (1574). Marching back through earlier hotstarts until the error disappeared worked in this case. |
| gen_elev2d (bdschism) | ValueError: pt_reyes has gaps larger than fill limit | Happened when running gen_elev2d --outfile elev2D.th.nc --hgrid=hgrid.gr3 --stime=2009-1-1 --etime=2009-7-1 --slr 0.0 noaa_download/noaa_pryc1_9415020_water_level_2009_2010.csv noaa_download/noaa_mtyc1_9413450_water_level_2009_2010.csv. The first 15 days of January at pryc are empty, and max_gap is set to 5 in schimpy/gen_elev2d.py. Use replacement data in the meantime; otherwise the missing data need to be filled. |
| interpolate_variables7 | ERROR: No such file or directory extract_mod: (1) | Run interpolate_variables8 instead. |
| interpolate_variables8 | ERROR: forrtl: severe (24): end-of-file during read, unit 19, file /scratch/dms/tomkovic/schism_repos/if_202211_agelessdemo/outputs_tropic/vgrid.fg | vgrid.in.3d (symbolically linked to vgrid.fg) was written in 5.8 format. This was due to a typo in main_bay_delta.yaml: vgrid_version: 5.10 needed to be vgrid_version: '5.10'. |
| interpolate_variables8 | 0 pts have no immediate parent elements; see fort.12 / failed to open(1) | No U/V velocities were written to out2d_*.nc ("failed to open(1)" was the actual error): interpolate_variables8.f90 (schism-dev/schism on GitHub) was not finding an .nc file it expected. The iof_hydro flags in param.nml have been re-interpreted to mean different things, and "ncdump out2d_1.nc" confirmed there were no U/V outputs. Re-run the barotropic model with the correct iof_hydro specification (this worked), unless the horizontalVel_X and _Y .nc files are simply missing from the outputs directory. |
| interpolate_variables8 | forrtl: severe (174): SIGSEGV, segmentation fault occurred | Two solutions were noted but not recorded; as a starting point, see the SIGSEGV entry under Common Problems above (e.g. ulimit -s unlimited). |
| interpolate_variables8 | forrtl: severe (59): list-directed I/O syntax error, unit 21, file SIMULATIONDIR/outputs_tropic/interpolate_variables.in Image PC Routine Line Source interpolate_varia 000000000043BC7B for_read_seq_lis_ Unknown Unknown interpolate_varia 000000000043B141 for_read_seq_lis Unknown Unknown interpolate_varia 000000000040B5A1 Unknown Unknown Unknown interpolate_varia 000000000040B49D Unknown Unknown Unknown libc-2.28.so 0000145AB09F0D85 __libc_start_main Unknown Unknown interpolate_varia 000000000040B3BE Unknown Unknown Unknown | The interpolate_variables.in was entirely wrong (the text of a param.nml had been copied into it). |
| interpolate_variables8 | After header: 416486 357942 774965 23 48 3 303885 303780 303674 9.9999998E-03 652974.100000000 4171586.53000000 3.548943 1800.000 Inverted z-levels at: 0 3 0.0000000E+00 0.0000000E+00 2.993011 | Unresolved; re-running the barotropic model worked around it. |
| interpolate_variables8 | failed to open out2d | Unresolved. |
| prepare_schism (schimpy) | In console: "Mesh contains areas smaller than the failure threshold. Consult the log or printout above for areas and warnings"; in prepare_schism.log: INFO:SCHISM:Checking for elements with small areas. Thresholds: warn=10.0, fail=4.0 WARNING:SCHISM:Global element: 222056 Area: 1.461 Centroid: 616324.9,4207169.2 | The coordinates pointed to small sliver cells (areas ~1 m²) left by dangling arcs in the SMS .map file; removing the dangling arcs and rebuilding the grid fixed those. For cells just below the threshold, create arcs around the small elements to force the grid to generate areas > 4 m². |
| prepare_schism (schimpy) | In console: script file 'C:\Users\…\prepare_schism-script.py' is not present | An issue with the way conda turns scripts into mini executables. Re-install schimpy (pip or conda). |
| read_output10_xyz | The following floating-point exceptions are signalling: IEEE_INVALID_FLAG | Unresolved. |
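
For the combine_hotstart7 compile line above, the NETCDF and NETCDF_FORTRAN variables must point at your site's netCDF installs before sourcing .bash_profile. An illustrative fragment; the paths below are placeholders, not real locations:

```bash
# Illustrative ~/.bash_profile additions -- substitute your cluster's netCDF
# paths (or the paths exported by your module system).
export NETCDF=/share/apps/netcdf-c
export NETCDF_FORTRAN=/share/apps/netcdf-fortran
export LD_LIBRARY_PATH=$NETCDF/lib:$NETCDF_FORTRAN/lib:$LD_LIBRARY_PATH
```

After editing, run source ~/.bash_profile (or log out and back in) so the variables are visible to the compile command.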