Procedures¶
Intended audience: Anyone who is administering Data Transfer Monitoring.
Deployment¶
Deployment is through a Makefile in the code repository.
Maintenance¶
Backup¶
No backups needed as the application does not store state.
Cold Startup¶
No specific cold startup procedures needed.
Cold Shutdown¶
No specific cold shutdown procedures needed.
Reproduce Service¶
Deploy another instance.
Calculate File Counts¶
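Do not rely on File Notifications for accurate file counts, because a resent file generates duplicate notifications for the same file. Instead, list the directory recursively and pipe the output to wc. The listing prints one line per file, so the first number wc reports (the line count) is the file count. Example below.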
mc ls -r rubin-summit/rubin-summit/LSSTCam/20250921 | wc
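When the count needs to be scripted, the same line count can be taken in Python. The following is a minimal sketch: count_listed_files is a hypothetical helper, the sample listing lines are placeholders rather than real output, and the only assumption about the listing is that it prints one line per file.

```python
def count_listed_files(listing: str) -> int:
    """Count non-blank lines in recursive listing output (one line per file)."""
    return sum(1 for line in listing.splitlines() if line.strip())


# Placeholder text standing in for the captured output of the mc ls -r
# command shown above; these are not real listing lines.
sample_listing = """\
placeholder-entry-1
placeholder-entry-2
placeholder-entry-3
"""

print(count_listed_files(sample_listing))
```

Blank lines are skipped so that a trailing newline in the captured output does not inflate the count.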
Review End Readout and File Transfer Logs¶
End Readout messages and Summit to USDF transfer times are written to the logs, which can be reviewed manually to validate transfer times. The End to End Dashboard Logs provides a filter variable. Enter the date and sequence number in YYYYMMDD_000<sequence number> format, for example 20251120_000122.
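The filter value can also be built programmatically. This is a sketch only: build_image_filter is a hypothetical helper that is not part of the application, and it assumes, based on the example value above, that the sequence number is zero-padded to six digits.

```python
def build_image_filter(day: str, sequence: int) -> str:
    """Return a filter value such as 20251120_000122.

    Assumption: the sequence number is zero-padded to six digits.
    """
    if len(day) != 8 or not (day.isascii() and day.isdigit()):
        raise ValueError("day must be eight digits in YYYYMMDD form")
    if not 0 <= sequence <= 999_999:
        raise ValueError("sequence must fit in six digits")
    return f"{day}_{sequence:06d}"


print(build_image_filter("20251120", 122))  # 20251120_000122
```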
The output below is an example.
INFO:listeners.end_readout:end readout message: EndReadoutModel(private_sndStamp=1764038140.1819582, private_rcvStamp=0.0, private_efdStamp=1764038103.1819582, private_kafkaStamp=1764038140.1819582, private_seqNum=174564, private_revCode='834c6ef5', private_identity='ocs-bridge', private_origin=309651235, additional_keys='imageType:groupId:testType:stutterRows:stutterNShifts:stutterDelay:reason:program', additional_values='OBJECT:2025-11-25T02\\:34\\:56.976:OBJECT:0:0:0.0:FixedChaosMonkeyTest:BLOCK-T644', images_in_sequence=1, image_name='MC_O_20251124_000100', image_index=1, image_source='MC', image_controller='O', image_date='20251124', image_number=100, timestamp_acquisition_start=1764038100.6716127, requested_exposure_time=30.0, timestamp_end_of_readout=1764038140.1770334)
INFO:shared.metrics.s3_metrics:MC_O_20251124_000100 Summit to USDF transfer time: 4.31 seconds
INFO:shared.metrics.s3_metrics:MC_O_20251124_000100 end readout timestamp: 2025-11-25 02:35:03.177033+00:00
INFO:shared.metrics.s3_metrics:MC_O_20251124_000100 newest S3 file timestamp: 2025-11-25 02:35:07.491000+00:00
INFO:shared.s3_client:found LSSTCam/20251124/MC_O_20251124_000100/MC_O_20251124_000100_expectedSensors.json expected sensors file
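For reviewing many log lines at once, the transfer times can be pulled out with a short script. The sketch below assumes the log lines keep the exact wording shown in the example output; extract_transfer_times is a hypothetical helper, not part of the application. It also recomputes the difference between the two logged timestamps, which agrees with the logged transfer time in this example.

```python
import re
from datetime import datetime

# Matches lines of the form shown in the example output above.
TRANSFER_RE = re.compile(
    r"^INFO:shared\.metrics\.s3_metrics:(?P<image>\S+) "
    r"Summit to USDF transfer time: (?P<seconds>\d+(?:\.\d+)?) seconds$"
)


def extract_transfer_times(log_text: str) -> dict[str, float]:
    """Map each image name to its logged Summit to USDF transfer time."""
    times = {}
    for line in log_text.splitlines():
        match = TRANSFER_RE.match(line.strip())
        if match:
            times[match.group("image")] = float(match.group("seconds"))
    return times


sample = """\
INFO:shared.metrics.s3_metrics:MC_O_20251124_000100 Summit to USDF transfer time: 4.31 seconds
INFO:shared.metrics.s3_metrics:MC_O_20251124_000100 end readout timestamp: 2025-11-25 02:35:03.177033+00:00
"""
print(extract_transfer_times(sample))  # {'MC_O_20251124_000100': 4.31}

# Cross-check using the two timestamps from the example output.
end_readout = datetime.fromisoformat("2025-11-25 02:35:03.177033+00:00")
newest_file = datetime.fromisoformat("2025-11-25 02:35:07.491000+00:00")
print(round((newest_file - end_readout).total_seconds(), 2))  # 4.31
```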
Using Data Transfer Monitoring Metrics in Queries and Dashboards¶
Prometheus metrics carry a day label so that they can be filtered by observation day. To filter, set the day label to the observation day in YYYY-MM-DD format. An example query is dtm_file_messages_received_total{day="2025-11-22"}
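When queries are generated from a script, the selector can be built from a date object so that the label value is always in YYYY-MM-DD format. This is a minimal sketch; day_selector is a hypothetical helper, and the metric name is the one from the example query above.

```python
from datetime import date


def day_selector(metric: str, day: date) -> str:
    """Build a PromQL selector that filters a metric by its day label."""
    # date.isoformat() always yields YYYY-MM-DD.
    return f'{metric}{{day="{day.isoformat()}"}}'


print(day_selector("dtm_file_messages_received_total", date(2025, 11, 22)))
# dtm_file_messages_received_total{day="2025-11-22"}
```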
Enable Debug Logs¶
To enable debug logs, change the DEBUG_LOGS environment variable from false to true.
Changing Late File Time Parameter¶
The MAX_FILE_LATE_TIME value defines the threshold at which files are identified as late. To use a different threshold, change this value in the Data Transfer Monitoring Kubernetes deployment manifest.