Sahara job binary copy times out when the file is too big

Bug #1705762 reported by Telles Mota Vidal Nóbrega
This bug affects 1 person
Affects: Sahara
Status: In Progress
Importance: Medium
Assigned to: Telles Mota Vidal Nóbrega

Bug Description

Sahara times out while copying a job binary into the cluster if the binary is too big. The file tested was around 115 MB, and the copy took around 15 minutes.
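For scale, those approximate figures work out to an effective throughput of only about 128 kB/s (taking MB as 10^6 bytes):

```latex
\frac{115\ \text{MB}}{15 \times 60\ \text{s}} = \frac{115\ \text{MB}}{900\ \text{s}} \approx 0.128\ \text{MB/s} \approx 128\ \text{kB/s}
```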

The copy is done with paramiko SFTP in sahara/utils/ssh_remote.py:

def _write_fl(sftp, remote_file, data):
    fl = sftp.file(remote_file, 'w')
    fl.write(data)
    fl.close()

Some initial research suggests that increasing the transfer window size could help, but this needs deeper investigation.
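A minimal sketch of what raising the window could look like, using paramiko's `Transport(..., default_window_size=...)` and `SFTPClient.from_transport(..., window_size=...)` parameters. The helper name `open_sftp_with_large_window` and the 128 MiB value are hypothetical and untested, and the sketch does not reuse Sahara's own connection setup in ssh_remote.py. One caveat for the investigation: under SSH flow control (RFC 4254) a window limits how much the advertising side is willing to receive, so a larger window on the paramiko side mainly helps downloads; for an upload like this one, the server's advertised window bounds how fast paramiko can send.

```python
# Sketch only: WINDOW_SIZE is a hypothetical, unmeasured value.
WINDOW_SIZE = 2 ** 27  # 128 MiB; paramiko's default window is 2 MiB


def open_sftp_with_large_window(host, port, username, pkey,
                                window_size=WINDOW_SIZE):
    """Open an SFTP session whose SSH channel advertises a larger window."""
    import paramiko  # imported lazily so this module loads without paramiko

    # default_window_size applies to every channel this Transport opens.
    transport = paramiko.Transport((host, port),
                                   default_window_size=window_size)
    # NOTE: connect() without hostkey= skips host key verification.
    transport.connect(username=username, pkey=pkey)
    # The SFTP channel is opened with the same (larger) window.
    return paramiko.SFTPClient.from_transport(transport,
                                              window_size=window_size)
```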

Another possible solution is to replace the SFTP write with an SFTP put.

Changed in sahara:
assignee: nobody → Telles Mota Vidal Nóbrega (tellesmvn)
importance: Undecided → Medium
Jeremy Freudberg (jfreud) wrote:

Another possible solution is to move job binary retrieval into the cluster itself instead of doing it in Sahara.

Changed in sahara:
status: New → Triaged
OpenStack Infra (hudson-openstack) wrote: Fix proposed to sahara (master)

Fix proposed to branch: master
Review: https://review.openstack.org/552256

Changed in sahara:
status: Triaged → In Progress
Telles Mota Vidal Nóbrega (tellesmvn) wrote:

True, Jeremy. IIRC, the problem with that solution is that we would need to pass the Swift credentials into the cluster, which is not the safest thing to do. But once we have confirmed that communication between Sahara and the cluster uses SSL/TLS, we can certainly revisit this and make the change.
