Software update: HTCondor 10.5.0 / 10.0.5
The HTCondor Team at the University of Wisconsin-Madison has released new feature and long-term support versions of its workload management system, HTCondor. The version numbers are now 10.5.0 and 10.0.5. HTCondor manages compute-intensive jobs and can distribute them across the connected nodes. The user submits a job to HTCondor, which handles it according to the configured policies and the availability of connected resources, and finally returns the results to the user. HTCondor can, for example, drive a dedicated Beowulf cluster, but also ordinary desktops that happen to be idle. At SC16, Google, Fermilab and the HTCondor Team demonstrated a 160k-core cloud-based elastic compute cluster, and in 2020 the National Science Foundation selected HTCondor as part of its Partnership to Advance Throughput Computing. The summarized changes in these releases are as follows:
Version 10.5.0 - Feature Channel
- Can now define DAGMan save points to be able to rerun DAGs from there
- Expand default list of environment variables passed to the DAGMan manager
- Administrators can prevent users using “getenv = true” in submit files
- Improved throughput when submitting a large number of ARC-CE jobs
- Execute events contain the slot name, sandbox path, resource quantities
- Can add attributes of the execution point to be recorded in the user log
- Enhanced condor_transform_ads tool to ease offline job transform testing
- Fixed a bug where memory limits over 2 GiB might not be correctly enforced

Version 10.0.5 - Long Term Support Channel
- Rename upgrade9to10checks.py script to condor_upgrade_check
- Fix spurious warning from condor_upgrade_check about regexes with spaces

Version 10.0.4 - Long Term Support Channel
- Provides script to assist updating from HTCondor version 9 to version 10
- Fixes a bug where rarely an output file would not be transferred back
- Fixes counting of submitted jobs, so MAX_JOBS_SUBMITTED works correctly
- Fixes SSL Authentication failure when PRIVATE_NETWORK_NAME was set
- Fixes rare crash when SSL or SCITOKENS authentication was attempted
- Can allow client to present an X.509 proxy during SSL authentication
- Fixes issue where a user's jobs were ignored by the HTCondor-CE on restart
- Fixes issues where some events that HTCondor-CE depends on were missing

Version 10.0.3 - Long Term Support Channel
- GPU metrics continue to be reported after the startd is reconfigured
- Fixed issue where GPU metrics could be wildly over-reported
- Fixed issue that kept jobs from running when installed on Debian or Ubuntu
- Fixed DAGMan problem when retrying a proc failure in a multi-proc node

Version 10.0.2 - Long Term Support Channel
- HTCondor can optionally create intermediate directories for output files
- Improved condor_schedd scalability when a user runs more than 1,000 jobs
- Fix issue where condor_ssh_to_job fails if the user is not in /etc/passwd
- The Python Schedd.query() now returns the ServerTime attribute for Fifemon
- VM Universe jobs pass through the host CPU model to support newer kernels
- HTCondor Python wheel is now available for Python 3.11
- Fix issue that prevented HTCondor installation on Ubuntu 18.04

Version 10.0.1 - Long Term Support Channel
- Add Ubuntu 22.04 (Jammy Jellyfish) support
- Add file transfer plugin that supports stash:// and osdf:// URLs
- Fix bug where cgroup memory limits were not enforced on Debian and Ubuntu
- Fix bug where forcibly removing DAG jobs could crash the condor_schedd
- Fix bug where Docker repository images cannot be run under Singularity
- Fix issue where blahp scripts were missing on Debian and Ubuntu platforms
- Fix bug where curl file transfer plugins would fail on Enterprise Linux 8

Version 10.0.0 - Long Term Support Channel
- Users can prevent runaway jobs by specifying an allowed duration
- Able to extend submit commands and create job submit templates
- Initial implementation of htcondor command line interface
- Initial implementation of Job Sets in the htcondor CLI tool
- Add Container Universe
- Support for heterogeneous GPUs
- Improved File transfer error reporting
- GSI Authentication method has been removed
- HTCondor now utilizes ARC-CE’s REST interface
- Support for ARM and PowerPC for Enterprise Linux 8
- For IDTOKENS, signing key not required on every execution point
- Trust on first use ability for SSL connections
- Improvements against replay attacks
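
For readers unfamiliar with the submit, match and run cycle described in the introduction, the idea can be illustrated with a toy sketch. This is not HTCondor's API or its actual matchmaking algorithm: the `Node` and `Job` classes, the `match_and_run` function and the first-fit policy on CPU count are all hypothetical, chosen only to show how jobs might be assigned to idle resources while unmatched jobs stay queued.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A hypothetical execution resource (not an HTCondor type)."""
    name: str
    cpus: int
    idle: bool = True


@dataclass
class Job:
    """A hypothetical compute job with a CPU requirement."""
    name: str
    cpus_needed: int
    result: Optional[str] = None


def match_and_run(jobs, nodes):
    """Assign each job to the first idle node with enough CPUs.

    Returns a dict mapping job name to node name. Jobs that cannot be
    matched are left out of the dict, as if they stayed in the queue.
    """
    assignments = {}
    for job in jobs:
        for node in nodes:
            if node.idle and node.cpus >= job.cpus_needed:
                node.idle = False  # the node is now busy
                job.result = f"{job.name} ran on {node.name}"
                assignments[job.name] = node.name
                break
    return assignments


# Assumed example pool: one cluster node and one idle desktop.
nodes = [Node("cluster-01", cpus=16), Node("desktop-07", cpus=4)]
jobs = [Job("simulate", 8), Job("render", 2), Job("train", 32)]
assignments = match_and_run(jobs, nodes)
```

In this sketch the 32-CPU job finds no suitable node and keeps `result` set to `None`; a real scheduler would retry it once matching resources become available.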
Source:
Tweakers.net