General troubleshooting
After diagnosing the problem If you needed to go down to the instrument please correct/update any connection information to Specific Device IOC page.
My IOC does not show up in the IOC list in IBEX
To update the list of IOCs in IBEX (which is stored in the database) you will need to run make iocstartups
from C:\Instrument\Apps\EPICS\
- this will concatenate all the config.xml
files in the ioc
directory for each IOC, then filter it and push it to the database.
My IOC fails to make with “Permission Denied”
This is likely not to do with permissions but that the file is in use elsewhere. Try running stop_ibex_server.bat and if that doesn’t work find out what else may be using the file (LockHunter is quite useful).
My IOC fails to build with epics/base/master/configure/RULES.Db:201: *** target pattern contains no '%'
This could be caused by the IOC attempting to copy in invalid db files during the build check in XXIOC-01App/Db/Makefile
that any dbs listed there exist.
I am having an Issue with a Specific IOC
Some IOC have important details which may help see if it is listed in Specific Device IOC.
checking stream device/asyn serial port settings
At the IOC command prompt, type:
dbior("drvAsyn",2)
ioc fails to connect to serial COM port
Errors like
2022/03/29 17:19:08.570 L0 -1 autoConnect could not connect: \\.\COM106 Can't open: Access is denied.
access is denied probably means another process (an IOC or terminal emulator like hyperterm) is already using the port. After you stop the offending process, it may be a few seconds before you can connect.
It doesn’t work What Should I Do?
Connect over hyperterminal (see Connect over hyperterminal)
Check command set is correct (look at documentation)
Start the IOC and check it is not in SIM mode
Set the IOC mask (see What is Passing between the IOC and the Stream Device)
If you don’t think records are processing correctly, try setting the
TPRO
(trace processing) field of the relevant records to a non-zero value. This will print a message to the IOC log whenever the record processes.
Any write to the PV fails
If any caput command to a pv fails with a message similar to
Old : TE:NDW1801:ICEFRDGE_01:MIMIC:SKIP:SP SKIP0
New : CA.Client.Exception..................................TE:NDW1801:ICEFRDGE_01:MIMIC:SKIP.:SP. .SKIP0.
.........
Warning: "Channel write request failed"
Context: "op=1, channel=TE:NDW1801:ICEFRDGE_01:MIMIC:SKIP:SP, type=DBR_STRING, count=1, ctx="TE:NDW1801:ICEFRDGE_01:MIMIC:SKIP:SP""
Source File: ../oldChannelNotify.cpp line 160
Current Time: Tue Nov 19 2019 09:56:21.490881300
...............................................................…
Then you may want to check if said PV is disabled somewhere in your db file through its DISP field by another PV.
Connect over hyperterminal
Start hyper-terminal set up connection to device. Find a command that is returns something or that changes something on the display. Run that command look for the change. Try
Reading
Writing
If it doesn’t work check:
Protocol (RS232, RS422, RS482)
Baud Rate
Com Port
Stop Bits
Parity
Line ending/Termination characters.
Flow control
If this doesn’t work check the moxa logging (see data over the moxa connection). If this doesn’t work there is a device that can be plugged into a serial port to intercept all traffic.
It won’t work in the IOC initially, but with no changes works after connecting to hyperterminal
If the IOC doesn’t work on startup, but does after using hyperterminal, at a command prompt run a mode com1
on the appropriate com port (example is for COM1) before and after running hyperterminal. If there is a change that isn’t duplicated by starting the IOC, then there is possibly a setting missing in asyn.
It did work and then freezes
This may be due to a low control issue e.g. Xon
/Xoff
has been incorrectly specified and some binary character
happens to match Xoff
. You should see asyn still sending character when you enable trace, but non getting through. If you run the dbior
command at level 2 you may see waiting as seen Xoff
.
The device is sensitive to input commands
Some devices has sensitivity to the rate at which you poll them. Typically the devices manual will detail appropriate command spacing, and the ability to use wait [time in ms];
after in
or out
commands in the protocol file is available. However, it is important to ensure your records and their types and error handling have been correctly defined. For instance in the event of a type error, since this is handled outside of the protocol file, it will bypass any structures you have in place to slow polling rates.
What is Passing between the IOC and the Stream Device
It is possible to put stream into a debug mode where everything sent and received is written to the console. To do this simply add to you st.cmd file (defined on your asyn port) :
asynSetTraceMask("L0",-1,0x9)
asynSetTraceIOMask("L0",-1,0x2)
where drvAsynSerialPortConfigure(<port>...)
This will include the terminator character, if you don’t see it it is not being sent or received. If no reply is given this will include a message “No reply from device in XXXms”
See Asyn trace mask for more details on specifying trace masks
To stop printing these commands to the log use:
asynSetTraceMask("L0",-1,0x0)
asynSetTraceIOMask("L0",-1,0x0)
Is the MOXA seeing anything?
It is sometime useful to see if the moxa is seeing any traffic. If you are in the cabin you can look at the flashing lights (remember to take away 4 to convert from com number to port number). There is one light for send and one for receive both should flash.
If you want to do it remotely then you can by using the moxa web page. The address is on the nagios page (standard log on). Look at Hosts -> MOXA_
Environment Variable not getting set from MASTER_RELEASE
Variables are transferred from ...EPICS\configure\MASTER_RELEASE
to ...EPICS\ioc\master\<IOCNAME>\iocBoot\<IOCNAME>\envPaths
when the ioc is made. You will have to delete the file to get the newest macros in and the paths have to exist.
I get a UDF
alarm on by bi
record that’s getting its value from a calcout
This was observed whilst programming the GEMORC IOC. We had records like so:
record(calc, "$(P)CALC")
{
field(INPA, "$(P)LONGIN1")
field(INPB, "$(P)LONGIN2")
field(CALC, "A+B==0")
field(OUT, "$(P)BI")
}
record(bi, "$(P)BI")
{
field(ZNAM, "No")
field(ONAM, "Yes")
}
The result was a UDF
alarm on the bi
because the RVAL
wasn’t being updated in line with the VAL
. We tried a lot of things to sort this. In the end we changed the OUT
field of the CALC
to a FLNK
and added the following field to the bi
: field(INP, "$(P)CALC")
. We don’t know why this solution worked over the many others we tried.
Log shows currentTime::getCurrentTime(): XXX sec time discontinuity detected
The log for all IOCs shows the message:
currentTime::getCurrentTime(): XXX sec time discontinuity detected
It is caused by the time going backwards due to a correction; the underlying cause is unknown. However, this seems to disappear once the host machine is restarted.
My IOC is not starting/reacting at all when launching from console/ProcServ
If your IOC running in procserv is unresponsive (does not start up, ctrl + x
from console does nothing) but you can still start the IOC manually from the IOCBoot directory, it is possible another process is using the port allocated to this IOC by procserv.
You can find the port procserv is trying to use for a given IOC under
EPICS\iocstartup\procserv.bat
In a command window, type
netstat -ano
. This gives you a list of ports currently in use and the IDs of processes using themSearch that list for the port from step 1. This will give you the ID of the process using it.
Find the process by it’s ID in task manager.
Asyn reports an asynStatus
value of greater than 5
The asynStatus
enum only contains values 0 to 5. But these are extended in asyn/asynPortDriver/paramErrors.h
to include a few more useful statuses.
Autosave
Lost autosave values (especially on Galils)
This is based on ticket: https://github.com/ISISComputingGroup/IBEX/issues/2180
Possible symptoms include:
Autosaved values mysteriously changing when the IOC is restarted
This includes Galils not retaining their position
Errors of the form
[DATE] dbFindRecord for 'MY_PV.FIELD' failed
Autosave files containing just a header and
<END>
tag
This has been observed primarily on Galils since they create custom monitor sets but it is feasible the issue could be seen elsewhere. The Galils pass macros to their monitor sets. If no macros are passed (e.g. if GALILADDR0n
is not set) then no monitor will be created and no values will be saved. This will mean the device (e.g. the motor) will start with its default values. If the macro is reintroduced then those default values will become the new autosave values. The previous values can be recovered by restoring a previous autosave file (e.g. copy GALIL_02_settings.sav_170309-144116
to GALIL_02_settings.sav
and restart the IOC).
Galils also will not autosave any settings or values if they are set to DEVSIM
or RECSIM
mode.
comparing current to previous autosaved values
When an IOC starts, it copies the last autosave values to a date stamped file such as GALIL_02_settings.sav_170309-144116
You can compare the current running IOC values to any of these files by using the asVerify
command that is built as part of the autosave
module and should be in your path after a config_env
e.g.
asVerify C:\Instrument\Var\autosave\GALIL_02\GALIL_02_settings.sav_170309-144116
There is no command to apply an autosave file to the current PVS, this is because it may be a bit dangerous as boot time autosave can apply values without forcing a process of a PV, whereas using a caput
could have different results. Thus it is better to stop ioc, update autosave file, start ioc as described for galil above.
Motors
Limit switches not active at the limit position
https://github.com/ISISComputingGroup/IBEX/issues/2174
In the motor record (C:\Instrument\Apps\EPICS\support\motor\master\motorApp\MotorSrc\motorRecord.cc
) the limit flag is only activated if the motor’s limit switch bit is active and motor’s command direction (CDIR) is correct. Correct in this context means that the motor is moving out of range. That is, the motor is moving in a positive direction past an upper limit or in a negative direction past the lower limit. The command direction is only set when a command (e.g. home, move) is sent. If the IOC is restarted, the value isn’t saved (and cannot be auto saved) so the limit flag behaviour will depend on the initial value of the command direction.
In summary, being at the limit position is insufficient to cause the limit flag to be active. This is expected behaviour, though can be slightly unintuitive at first.
Readbacks from device keep twitching
This may be because of variances in the signal returned from the motor. This is true, for example, on the jaws set on Merlin. It also causes the motor status label on the bumpstrip to keep switching. Note that leaving it in this state can increase dramatically the amount of information sent to the archiver.
To fix, use caput for the relevant PV in the motor record:
IN:[INST]:MOT:MTR0n0m.MDEL
: Sets the motor deadband. This will stop the position readback from oscillatingIN:[INST]:MOT:MTR0n0m.ADEL
: Sets the archiver deadband. This will stop values being written to the archiver unless the value changes by more than the deadband valueIN:[INST]:MOT:MTR0n0m_EDEL_SP
: Sets the encoder deadband. No readbacks will be updated unless the encoder varies by this amount.
McLennan OPI controls all red
Does the panel on the front of the controller display a letter?
r
Can happen owing to a communication issue when the IOC starts if the baud rate/stop bits aren’t set correctly
t
Tracking abort. Usually happens if the encoder resolution is incorrect. Is the
ERES
macro fraction the right way around?
n
Observed once on Merlin after the McLennan had been idle for a long time. Only happened after homing the motor. Restarting the controller (with the IOC stopped) fixed the issue
No
Try restarting the IOC, then move the motor through a significant operation (a home or jog to limit). Sometimes restarting the IOC is enough, sometimes the move causes the underlying issue to reveal itself (see
n
above)
which process owns a serial port
From an administrator (i.e. gamekeeper) prompt, run the sysinternals
handle command like:
handle.exe -a | findstr "Serial pid:"
This will print a list of PIDs and any matches to a serial port allocated, take the pid number above the relevant \Device\Serial line
Logging
I Changed the logging setting but it uses old settings
To sync up changes in logging from the DB file to the logging you must:
Restart the IOC so the logging info fields are pushed to the database
If any new PVs are being archived or settings have been changed the ARINST needs restarting
The ARACCESS needs restarting so it picks up the configuration changes from the database
There was an error which I have fixed but I missed a log file
To regenerate log files set the date in the c:\Logs\LOG_last_active_time
to the time you want to start generating files from, if this is more than delta days ago (2nd line of the log file) increase this number too. Then restart the ARACCESS
I want to Generate a Log file from some PVs Now
This can be done between two dates see ...EPICS\ISIS\inst_servers\master\ArchiverAccess\log_file_generator.py
as an example.
Value logs from IOC not produced/ARACCESS not creating log files/ENGINX Stress Rig not writing log files
Certain IOCs can be made to generate log files using the ARACCESS component, if these are no longer being produced then check:
ARCCESS console (restart it) This should show a little blurb and should have start and stop lines, e.g.
[2020-09-24 09:20:21] @@@ @@@ @@@ @@@ @@@ [2020-09-24 09:20:21] @@@ Received a sigChild for process 25052. The process was killed by signal 9 [2020-09-24 09:20:21] @@@ Current time: 2020-09-24 09:20:21 [2020-09-24 09:20:21] @@@ Child process is shutting down, a new one will be restarted shortly [2020-09-24 09:20:21] @@@ ^R or ^X restarts the child, ^Q quits the server [2020-09-24 09:20:22] @@@ Restarting child "ARACCESS" [2020-09-24 09:20:22] @@@ (as C:\Windows\system32\cmd.exe) [2020-09-24 09:20:22] @@@ The PID of new child "ARACCESS" is: 20124 [2020-09-24 09:20:22] @@@ @@@ @@@ @@@ @@@ [2020-09-24 09:20:23] [1600935623.82] INFO: Creating a new connection pool: DBSVR_127.0.0.1_archive_report [2020-09-24 09:20:24] [1600935624.74] INFO: Creating a new connection pool: DBSVR_127.0.0.1_iocdb_iocdb [2020-09-24 09:20:25] [1600935625.00] INFO: Reading config for ioc: INSTRON_01 [2020-09-24 09:20:25] [1600935625.00] INFO: Logging configuration (pvs as read from the archive) [2020-09-24 09:20:25] - file (log on end): C:\logs\INSTRON_01\INSTRON_01_{start_time}.dat [2020-09-24 09:20:25] - file (continuous): C:\logs\INSTRON_01\INSTRON_01_{start_time}_continuous.dat [2020-09-24 09:20:25] - trigger pv: IN:ENGINX:INSTRON_01:LOG:RECORD:SP.VAL [2020-09-24 09:20:25] - trigger pv: Logging from pv IN:ENGINX:INSTRON_01:LOG:SCAN.VAL with a default on error of 1s [2020-09-24 09:20:25] - file headers: [u'Cross Sectional Area = {1:.6f}', u'Gauge Length for Strain = {2:.6f}', u'RB Number = {0}', u''] [2020-09-24 09:20:25] - pvs in fileheader ['IN:ENGINX:ED:RBNUMBER.VAL', 'IN:ENGINX:INSTRON_01:STRESS:AREA.VAL', 'IN:ENGINX:INSTRON_01:STRAIN:LENGTH.VAL'] [2020-09-24 09:20:25] - table headers: ['Date/time', u'Run Number', u'Position (mm)', u'Load (MPa)', u'Strain (%)'] [2020-09-24 09:20:25] - table line: {time} {2} {0:.6f} {1:.6f} {3:.6f} [2020-09-24 09:20:25] - pvs in table line ['IN:ENGINX:INSTRON_01:POS.VAL', 'IN:ENGINX:INSTRON_01:STRESS.VAL', 'IN:ENGINX:DAE:RUNNUMBER.VAL', 'IN:ENGINX:INSTRON_01:STRAIN.VAL'] [2020-09-24 09:20:25] [1600935625.01] INFO: Last active: 2020-09-24T09:19:31 (13469489)
If this doesn’t show both start and stop lines which that it is set to auto start in the default block component
Start and stop the logging and see if it says it is creating a file, e.g.
INFO: Writing log file '...'
is this in the expected placeCheck the PVs it says it is logging and monitoring do they exist?
Next check the ARINST level is it up (check
http://localhost:4812/groups
)if not restart it, this means that there is no data stored for the run
Next step is to check that the data is getting into the database (probably plot on log plotter)
Write tip on Wiki for importing a database dump file into local database. Commands and output:
Reading historical PV values from local database after it has been backed up and truncated
This can be done by importing the backed-up database (usually on a network drive) into a local developer’s copy, then reading/plotting from there. Example command from a request to read chopper values on MERLIN:
mysql.exe -u root -p < [insert network back-up path here]\ndxMERLIN\ibex_database_backup_2024_01_30\ibex_db_sqldump_2024_01_30.sql
Enter password: ************
This script can be used to extract data into CSV file(s) for e.g. diagnostic purposes.
The local database can then be truncated to remove the import once work is complete.
EPICS_CA_MAX_ARRAY_BYTES
The EPICS_CA_MAX_ARRAY_BYTES
constant is used throughout ibex to specify the size of the largest waveform (which is usually produced by the dae). Large objects are need especially for (Live View)[/specific_iocs/dae/DAE-Live-View]. Currently the largest know version of this is on POLREF who want to see their linear detector with a time of flight 2D view. This has the size of:
5001 (time channels) * 640 * 8 ~ 20M