Commit 3ecccd22 authored by Millian Poquet

Major commit: multiple workloads.

This commit is a big step forward to handle multiple workloads at the same time.

Protocol update. Jobs are no longer identified only by a unique number, but
by a workload_name and a unique number within this workload_name. The separator
between these two fields is '!'. The default workload, the one read from the
input JSON file, is called "static". If the scheduler refers to a job by its
unique number only, the 'static' workload is assumed, for compatibility
reasons.
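
As an illustration of the identifier rule above, here is a minimal scheduler-side sketch in Python. The function name `parse_job_id` and the constants are hypothetical (they do not come from Batsim or pybatsim), and the sketch assumes the first '!' is the separator.

```python
DEFAULT_WORKLOAD = "static"  # workload assumed when only a number is given
SEPARATOR = "!"              # separator between workload name and job number


def parse_job_id(text):
    """Split 'WLOAD!NUMBER' into (workload_name, job_number).

    A bare 'NUMBER' is accepted and mapped to the default workload,
    which mirrors the compatibility rule described above.
    """
    if SEPARATOR in text:
        workload_name, number = text.split(SEPARATOR, 1)
    else:
        workload_name, number = DEFAULT_WORKLOAD, text
    return workload_name, int(number)


print(parse_job_id("static!4"))  # new syntax
print(parse_job_id("5"))         # old syntax, 'static' is assumed
```

Returning a tuple keeps the result hashable, so it can be used directly as a dictionary key on the scheduler side.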

Batsim code update. There are no longer "Jobs" and "Profiles" roaming in the
BatsimContext. Now, Jobs and Profiles are grouped together inside one Workload
instance. Different Workload instances can be stored in a Workloads (note
the 's'!), which is just a map<string, Workload*> with wrapping methods
to simplify job handling. A Workloads is instantiated in the BatsimContext.
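
The grouping described above can be modelled in a few lines. This is only an illustrative Python model, not Batsim's C++ classes: the method names `insert_workload`, `exists` and `at` are borrowed from calls visible in this diff, while `job_at` and the dictionary-based job description are assumptions made for the example.

```python
class Workload:
    """Groups the jobs and the profiles that belong together."""

    def __init__(self):
        self.jobs = {}      # job number -> job description
        self.profiles = {}  # profile name -> profile description


class Workloads:
    """A thin wrapper around a mapping of workload name -> Workload."""

    def __init__(self):
        self._workloads = {}

    def insert_workload(self, name, workload):
        self._workloads[name] = workload

    def exists(self, name):
        return name in self._workloads

    def at(self, name):
        return self._workloads[name]

    def job_at(self, name, number):
        # Hypothetical convenience wrapper: look a job up by both keys.
        return self.at(name).jobs[number]


workloads = Workloads()
static = Workload()
static.jobs[1] = {"id": 1, "res": 4}
workloads.insert_workload("static", static)
print(workloads.exists("static"), workloads.job_at("static", 1))
```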

Batsim code update. Jobs are now identified by a JobIdentifier structure.
This struct is just a string (the workload name) and an integer (the unique
job number within its workload). JobIdentifiers are used within most
ipp messages now. Furthermore, each Job knows which Workload it belongs to,
and each Jobs container knows which Workload it belongs to.
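
For readers on the scheduler side, a Python mirror of this structure might look like the sketch below. The field names follow the C++ struct in this commit; the Python class itself and the "dynamic" workload name are assumptions used only for illustration.

```python
from typing import NamedTuple


class JobIdentifier(NamedTuple):
    workload_name: str  # name of the workload the job belongs to
    job_number: int     # unique number of the job inside its workload

    def to_string(self):
        # Output format is WORKLOAD_NAME!JOB_NUMBER
        return f"{self.workload_name}!{self.job_number}"


# The same job number can appear in two workloads without colliding,
# because the workload name is part of the key.
allocations = {
    JobIdentifier("static", 1): [0, 1],
    JobIdentifier("dynamic", 1): [2],
}
print(JobIdentifier("static", 4).to_string())
```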

Batsim tests pass with this commit, but a Redis server should be launched to
avoid assertion failures. Redis is not yet used on the scheduler side;
this part has not been tested yet. Travis has not been updated yet to
install and execute Redis, so the tests won't work on Travis at the moment.
parent 6139096c
......@@ -38,35 +38,35 @@ the job ID of the job which just completed. This part is not mandatory, it depen
# Message Stamps #
| Proto. version | Stamp | Direction | Content syntax | Meaning
|---------------- |-------|-------------- |------------------------- |-------------
| 0+ | S | Batsim->Sched | JOB_ID | Job submission: one (static) job is available and can now be allocated to resources.
| 0+ | C | Batsim->Sched | JOB_ID | Job completion: one (static) job finished its execution.
| 0+ | J | Sched->Batsim | JID1=MID1,MID2,MIDn[;...]| Job allocation: tells to put job JID1 on machines MID1, ..., MIDn. Many jobs might be allocated in the same event. Each MIDk part can be a single machine ID or a closed interval MIDa-MIDb where MIDa <= MIDb
| 0+ | N | Both | No content | NOP: tells to do nothing / nothing happened.
| 1+ | P | Sched->Batsim | MID1,MID2,MIDn=PSTATE | Asks to change the power state of some machines. Each MIDk part can be a single machine ID or a closed interval MIDa-MIDb where MIDa <= MIDb
| 1+ | p | Batsim->Sched | MID1,MID2,MIDn=PSTATE | Tells the scheduler that the power state of one or several machines has changed. Each MIDk part can be a single machine ID or a closed interval MIDa-MIDb where MIDa <= MIDb. There is one and only one 'p' message for each 'P' message.
| 1+ | R | Sched->Batsim | JOB_ID | Job rejection: the scheduler tells that one (static) job will not be computed.
| 1+ | n | Sched->Batsim | TIME | NOP me later: the scheduler asks to be awakened at the given simulation time TIME.
| 1+ | E | Sched->Batsim | No content | Asks Batsim about the total consumed energy (from time 0 to now) in Joules. Works only in energy mode.
| 1+ | e | Batsim->Sched | CONSUMED_ENERGY | Batsim tells the total consumed energy (from time 0 to now) in Joules. Works only in energy mode. There is one and only one 'e' message for each 'E' message.
| Proto. version | Stamp | Direction | Content syntax | Meaning
|---------------- |-------|-------------- |-------------------------------- |-------------
| 2+ | S | Batsim->Sched | WLOAD!JOB_ID | Job submission: job JOB_ID of workload WLOAD is available and can now be allocated to resources.
| 2+ | C | Batsim->Sched | WLOAD!JOB_ID | Job completion: job JOB_ID of workload WLOAD finished its execution.
| 2+ | J | Sched->Batsim | WLOAD!JID1=MID1,MID2,MIDn[;...] | Job allocation: tells to put job JID1 of workload WLOAD on machines MID1, ..., MIDn. Many jobs might be allocated in the same event. Each MIDk part can be a single machine ID or a closed interval MIDa-MIDb where MIDa <= MIDb
| 0+ | N | Both | No content | NOP: tells to do nothing / nothing happened.
| 1+ | P | Sched->Batsim | MID1,MID2,MIDn=PSTATE | Asks to change the power state of some machines. Each MIDk part can be a single machine ID or a closed interval MIDa-MIDb where MIDa <= MIDb
| 1+ | p | Batsim->Sched | MID1,MID2,MIDn=PSTATE | Tells the scheduler that the power state of one or several machines has changed. Each MIDk part can be a single machine ID or a closed interval MIDa-MIDb where MIDa <= MIDb. There is one and only one 'p' message for each 'P' message.
| 2+ | R | Sched->Batsim | WLOAD!JOB_ID | Job rejection: the scheduler tells that job JOB_ID of workload WLOAD will not be computed.
| 1+ | n | Sched->Batsim | TIME | NOP me later: the scheduler asks to be awakened at the given simulation time TIME.
| 1+ | E | Sched->Batsim | No content | Asks Batsim about the total consumed energy (from time 0 to now) in Joules. Works only in energy mode.
| 1+ | e | Batsim->Sched | CONSUMED_ENERGY | Batsim tells the total consumed energy (from time 0 to now) in Joules. Works only in energy mode. There is one and only one 'e' message for each 'E' message.
# Message Examples #
## Static Job Submission ##
Batsim -> Scheduler
0:10.000015|10.000015:S:1
0:13|12:S:2|12.5:S:3|13:S:4
0:10.000015|10.000015:S:static!1
0:13|12:S:2|12.5:S:3|13:S:static!4
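
The example messages above can be split into events with a few lines of Python. This is a hedged sketch: `parse_events` is a hypothetical helper, the first '|'-separated part is treated as an opaque header and skipped (as the pybatsim loop in this commit does), and an event without content is assumed to be written as `TIME:STAMP`.

```python
def parse_events(message):
    """Return [(timestamp, stamp, content), ...] for one protocol message."""
    parts = message.split("|")
    events = []
    for part in parts[1:]:  # parts[0] is the message header, skipped here
        fields = part.split(":", 2)
        timestamp = float(fields[0])
        stamp = fields[1]
        # Events such as NOP carry no content (assumed form: TIME:STAMP).
        content = fields[2] if len(fields) > 2 else ""
        events.append((timestamp, stamp, content))
    return events


for event in parse_events("0:13|12:S:static!2|12.5:S:static!3|13:S:static!4"):
    print(event)
```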
## Static Job Completion ##
Batsim -> Scheduler
0:15.836694|15.836694:C:1
0:40.001320|25:C:2|38.002565:C:3
0:15.836694|15.836694:C:static!1
0:40.001320|25:C:2|38.002565:C:static!3
## Static Job Allocation ##
Scheduler -> Batsim
0:15.000015|15.000015:J:1=1,2,0,3;2=3
0:45.00132|45.00132:J:4=3,1,2,0
0:15.000015|15.000015:J:static!1=1,2,0,3;static!2=3
0:45.00132|45.00132:J:static!4=3,1,2,0
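
The content of a 'J' event, as described in the table above, can be decoded as follows. This is a sketch under assumptions: the helper names are hypothetical, a bare job number is mapped to the "static" workload, and each machine part is either a single ID or a closed interval MIDa-MIDb.

```python
def parse_machine_ids(text):
    """Expand 'MID1,MID2,MIDa-MIDb' into a list of machine IDs."""
    machines = []
    for part in text.split(","):
        if "-" in part:
            first, last = (int(x) for x in part.split("-"))
            if first > last:
                raise ValueError(f"invalid interval '{part}'")
            machines.extend(range(first, last + 1))  # closed interval
        else:
            machines.append(int(part))
    return machines


def parse_allocations(content):
    """Parse 'W!J1=M1,M2;W!J2=M3' into {(workload, job_number): [machine IDs]}."""
    allocations = {}
    for item in content.split(";"):
        job_text, machines_text = item.split("=")
        if "!" in job_text:
            workload_name, number = job_text.split("!", 1)
        else:
            # Old syntax: only the job number is given.
            workload_name, number = "static", job_text
        allocations[(workload_name, int(number))] = parse_machine_ids(machines_text)
    return allocations


print(parse_allocations("static!1=1,2,0,3;static!2=3"))
```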
## NOP ##
Scheduler -> Batsim or Batsim -> Scheduler
......@@ -85,7 +85,7 @@ the job ID of the job which just completed. This part is not mandatory, it depen
## Static Job Rejection ##
Scheduler -> Batsim
0:50|50:R:5
0:50|50:R:static!5
## NOP Me Later ##
Scheduler -> Batsim
......
......@@ -149,6 +149,11 @@ class Batsim(object):
# [ (timestamp, txtDATA), ...]
self._msgs_to_send = []
# TODO: job identifiers are now WORKLOAD!JOB_NUMBER.
# The hack in the next loop allows pybatsim to still work with static
# jobs, but the new syntax should be handled so pybatsim also handles
# dynamic jobs.
for i in range(1, len(sub_msgs)):
data = sub_msgs[i].split(':')
if data[1] == 'R':
......@@ -156,10 +161,15 @@ class Batsim(object):
elif data[1] == 'N':
self.scheduler.onNOP()
elif data[1] == 'S':
self.scheduler.onJobSubmission(self.jobs[int(data[2])])
# Received WORKLOAD_NAME!JOB_ID
workload_name, job_id = data[2].split('!')
job_id = int(job_id)
self.scheduler.onJobSubmission(self.jobs[job_id])
self.nb_jobs_recieved += 1
elif data[1] == 'C':
j = self.jobs[int(data[2])]
workload_name, job_id = data[2].split('!')
job_id = int(job_id)
j = self.jobs[job_id]
j.finish_time = float(data[0])
self.scheduler.onJobCompletion(j)
elif data[1] == 'p':
......
......@@ -289,9 +289,12 @@ int main(int argc, char * argv[])
context.allow_space_sharing = mainArgs.allow_space_sharing;
context.trace_schedule = mainArgs.enable_schedule_tracing;
// Loading the static workload
int nb_machines_by_workload;
load_json_workload(&context, mainArgs.workloadFilename, nb_machines_by_workload);
context.jobs.setProfiles(&context.profiles);
const string static_workload_name = "static";
Workload * static_workload = new Workload;
static_workload->load_from_json(mainArgs.workloadFilename, nb_machines_by_workload);
context.workloads.insert_workload(static_workload_name, static_workload);
int limit_machines_count = -1;
if ((mainArgs.limit_machines_count_by_workload) && (mainArgs.limit_machines_count > 0))
......@@ -305,7 +308,7 @@ int main(int argc, char * argv[])
XBT_INFO("The number of machines will be limited to %d", limit_machines_count);
XBT_INFO("Checking whether SMPI is used or not...");
context.smpi_used = context.jobs.containsSMPIJob();
context.smpi_used = static_workload->jobs->containsSMPIJob();
if (!context.smpi_used)
{
XBT_INFO("SMPI will NOT be used.");
......@@ -314,7 +317,7 @@ int main(int argc, char * argv[])
else
{
XBT_INFO("SMPI will be used.");
register_smpi_applications(&context);
static_workload->register_smpi_applications();
SMPI_init();
}
......@@ -380,7 +383,8 @@ int main(int argc, char * argv[])
XBT_INFO("Creating jobs_submitter process...");
JobSubmitterProcessArguments * submitterArgs = new JobSubmitterProcessArguments;
submitterArgs->context = &context;
MSG_process_create("jobs_submitter", job_submitter_process, (void*)submitterArgs, masterMachine->host);
submitterArgs->workload_name = static_workload_name;
MSG_process_create("jobs_submitter", static_job_submitter_process, (void*)submitterArgs, masterMachine->host);
XBT_INFO("The jobs_submitter process has been created.");
XBT_INFO("Creating the uds_server process...");
......
......@@ -12,6 +12,7 @@
#include "export.hpp"
#include "pstate.hpp"
#include "storage.hpp"
#include "workload.hpp"
/**
* @brief The Batsim context
......@@ -20,8 +21,7 @@ struct BatsimContext
{
UnixDomainSocket socket; //!< The UnixDomainSocket
Machines machines; //!< The machines
Jobs jobs; //!< The jobs
Profiles profiles; //!< The profiles
Workloads workloads; //!< The workloads
PajeTracer paje_tracer; //!< The PajeTracer
PStateChangeTracer pstate_tracer; //!< The PStateChangeTracer
EnergyConsumptionTracer energy_tracer; //!< The EnergyConsumptionTracer
......
......@@ -492,41 +492,48 @@ void exportJobsToCSV(const std::string &filename, const BatsimContext *context)
xbt_assert(f.is_open(), "Cannot write file '%s'", filename.c_str());
// write headers
f << "jobID,submission_time,requested_number_of_processors,requested_time,success,starting_time,execution_time,finish_time,waiting_time,turnaround_time,stretch,consumed_energy,allocated_processors\n";
f << "job_number,workload_name,submission_time,requested_number_of_processors,requested_time,success,starting_time,execution_time,finish_time,waiting_time,turnaround_time,stretch,consumed_energy,allocated_processors\n";
const auto & jobs = context->jobs.jobs();
for (const auto & mit : jobs)
for (const auto mit : context->workloads.workloads())
{
Job * job = mit.second;
string workload_name = mit.first;
const Workload * workload = mit.second;
if (job->state == JobState::JOB_STATE_COMPLETED_SUCCESSFULLY || job->state == JobState::JOB_STATE_COMPLETED_KILLED)
const auto & jobs = workload->jobs->jobs();
for (const auto & mit : jobs)
{
char * buf = nullptr;
int success = (job->state == JobState::JOB_STATE_COMPLETED_SUCCESSFULLY);
xbt_assert(job->runtime >= 0);
int ret = asprintf(&buf, "%d,%lf,%d,%lf,%d,%lf,%lf,%lf,%lf,%lf,%lf,%Lf,", // finished by a ',' because the next part is written after asprintf
job->id,
job->submission_time,
job->required_nb_res,
job->walltime,
success,
job->starting_time,
job->runtime,
job->starting_time + job->runtime, // finish_time
job->starting_time - job->submission_time, // waiting_time
job->starting_time + job->runtime - job->submission_time, // turnaround_time
(job->starting_time + job->runtime - job->submission_time) / job->runtime, // stretch
job->consumed_energy
);
(void) ret; // Avoids a warning if assertions are ignored
xbt_assert(ret != -1, "asprintf failed (not enough memory?)");
f << buf;
free(buf);
xbt_assert((int)job->allocation.size() == job->required_nb_res);
f << job->allocation.to_string_hyphen(" ") << "\n";
Job * job = mit.second;
if (job->state == JobState::JOB_STATE_COMPLETED_SUCCESSFULLY || job->state == JobState::JOB_STATE_COMPLETED_KILLED)
{
char * buf = nullptr;
int success = (job->state == JobState::JOB_STATE_COMPLETED_SUCCESSFULLY);
xbt_assert(job->runtime >= 0);
int ret = asprintf(&buf, "%d,%s,%lf,%d,%lf,%d,%lf,%lf,%lf,%lf,%lf,%lf,%Lf,", // finished by a ',' because the next part is written after asprintf
job->number, // job_id
workload_name.c_str(), // workload_name
job->submission_time, // submission_time
job->required_nb_res, // requested_number_of_processors
job->walltime, // requested_time
success, // success
job->starting_time, // starting_time
job->runtime, // execution_time
job->starting_time + job->runtime, // finish_time
job->starting_time - job->submission_time, // waiting_time
job->starting_time + job->runtime - job->submission_time, // turnaround_time
(job->starting_time + job->runtime - job->submission_time) / job->runtime, // stretch
job->consumed_energy // consumed energy
);
(void) ret; // Avoids a warning if assertions are ignored
xbt_assert(ret != -1, "asprintf failed (not enough memory?)");
f << buf;
free(buf);
xbt_assert((int)job->allocation.size() == job->required_nb_res);
f << job->allocation.to_string_hyphen(" ") << "\n";
}
}
}
......@@ -552,33 +559,38 @@ void exportScheduleToCSV(const std::string &filename, const BatsimContext *conte
long double seconds_used_by_scheduler = context->microseconds_used_by_scheduler / (long double)1e6;
const auto & jobs = context->jobs.jobs();
for (const auto & mit : jobs)
for (const auto mit : context->workloads.workloads())
{
Job * job = mit.second;
const Workload * workload = mit.second;
if (job->state == JobState::JOB_STATE_COMPLETED_SUCCESSFULLY || job->state == JobState::JOB_STATE_COMPLETED_KILLED)
const auto & jobs = workload->jobs->jobs();
for (const auto & mit : jobs)
{
nb_jobs_finished++;
Job * job = mit.second;
if (job->runtime < min_job_execution_time)
min_job_execution_time = job->runtime;
if (job->runtime > max_job_execution_time)
max_job_execution_time = job->runtime;
if (job->state == JobState::JOB_STATE_COMPLETED_SUCCESSFULLY || job->state == JobState::JOB_STATE_COMPLETED_KILLED)
{
nb_jobs_finished++;
if (job->state == JobState::JOB_STATE_COMPLETED_SUCCESSFULLY)
nb_jobs_success++;
else
nb_jobs_killed++;
if (job->runtime < min_job_execution_time)
min_job_execution_time = job->runtime;
if (job->runtime > max_job_execution_time)
max_job_execution_time = job->runtime;
double completion_time = job->starting_time + job->runtime;
double turnaround_time = job->starting_time + job->runtime - job->submission_time;
if (job->state == JobState::JOB_STATE_COMPLETED_SUCCESSFULLY)
nb_jobs_success++;
else
nb_jobs_killed++;
if (completion_time > makespan)
makespan = completion_time;
double completion_time = job->starting_time + job->runtime;
double turnaround_time = job->starting_time + job->runtime - job->submission_time;
if (turnaround_time > max_turnaround_time)
max_turnaround_time = turnaround_time;
if (completion_time > makespan)
makespan = completion_time;
if (turnaround_time > max_turnaround_time)
max_turnaround_time = turnaround_time;
}
}
}
......
......@@ -151,3 +151,8 @@ IPMessage::~IPMessage()
data = nullptr;
}
string JobIdentifier::to_string() const
{
return workload_name + '!' + std::to_string(job_number);
}
......@@ -14,6 +14,22 @@
struct BatsimContext;
/**
* @brief A simple structure used to identify one job
*/
struct JobIdentifier
{
std::string workload_name; //!< The name of the workload the job belongs to
int job_number; //!< The job unique number inside its workload
/**
* @brief Returns a string representation of the JobIdentifier.
* @details Output format is WORKLOAD_NAME!JOB_NUMBER
* @return A string representation of the JobIdentifier.
*/
std::string to_string() const;
};
/**
* @brief Stores the different types of inter-process messages
*/
......@@ -40,7 +56,7 @@ enum class IPMessageType
*/
struct JobSubmittedMessage
{
int job_id; //!< The job ID
JobIdentifier job_id; //!< The JobIdentifier
};
/**
......@@ -48,7 +64,7 @@ struct JobSubmittedMessage
*/
struct JobCompletedMessage
{
int job_id; //!< The job ID
JobIdentifier job_id; //!< The JobIdentifier
};
/**
......@@ -56,7 +72,7 @@ struct JobCompletedMessage
*/
struct JobRejectedMessage
{
int job_id; //!< The job ID
JobIdentifier job_id; //!< The JobIdentifier
};
/**
......@@ -64,7 +80,7 @@ struct JobRejectedMessage
*/
struct SchedulingAllocation
{
int job_id; //!< The job unique number
JobIdentifier job_id; //!< The JobIdentifier
MachineRange machine_ids; //!< The IDs of the machines on which the job should be allocated
std::vector<msg_host_t> hosts; //!< The corresponding SimGrid hosts
};
......@@ -177,6 +193,7 @@ struct SwitchPStateProcessArguments
struct JobSubmitterProcessArguments
{
BatsimContext * context; //!< The BatsimContext
std::string workload_name; //!< The name of the workload the submitter should use
};
/**
......
......@@ -14,7 +14,7 @@
using namespace std;
int job_submitter_process(int argc, char *argv[])
int static_job_submitter_process(int argc, char *argv[])
{
(void) argc;
(void) argv;
......@@ -22,13 +22,19 @@ int job_submitter_process(int argc, char *argv[])
JobSubmitterProcessArguments * args = (JobSubmitterProcessArguments *) MSG_process_get_data(MSG_process_self());
BatsimContext * context = args->context;
xbt_assert(context->workloads.exists(args->workload_name),
"Error: a static_job_submitter_process is in charge of workload '%s', "
"which does not exist", args->workload_name.c_str());
Workload * workload = context->workloads.at(args->workload_name);
send_message("server", IPMessageType::SUBMITTER_HELLO);
double previousSubmissionDate = MSG_get_clock();
vector<const Job *> jobsVector;
const auto & jobs = context->jobs.jobs();
const auto & jobs = workload->jobs->jobs();
for (const auto & mit : jobs)
{
const Job * job = mit.second;
......@@ -46,8 +52,17 @@ int job_submitter_process(int argc, char *argv[])
if (job->submission_time > previousSubmissionDate)
MSG_process_sleep(job->submission_time - previousSubmissionDate);
// Let's put the metadata about the job into the data storage
string job_id_string = args->workload_name + "!" + to_string(job->number);
string job_key = "job_" + job_id_string;
string profile_key = "profile_" + job_id_string;
context->storage.set(job_key, job->json_description);
context->storage.set(profile_key, workload->profiles->at(job->profile)->json_description);
// Let's now continue the simulation
JobSubmittedMessage * msg = new JobSubmittedMessage;
msg->job_id = job->id;
msg->job_id.workload_name = args->workload_name;
msg->job_id.job_number = job->number;
send_message("server", IPMessageType::JOB_SUBMITTED, (void*)msg);
previousSubmissionDate = MSG_get_clock();
......
......@@ -6,9 +6,9 @@
#pragma once
/**
* @brief The process in charge of submitting jobs
* @brief The process in charge of submitting static jobs (those described before running the simulations)
* @param argc The number of arguments
* @param argv The argument values
* @return 0
*/
int job_submitter_process(int argc, char *argv[]);
int static_job_submitter_process(int argc, char *argv[]);
......@@ -16,6 +16,8 @@
#include <simgrid/msg.h>
#include <rapidjson/document.h>
#include <rapidjson/writer.h>
#include <rapidjson/stringbuffer.h>
#include "profiles.hpp"
......@@ -42,6 +44,11 @@ void Jobs::setProfiles(Profiles *profiles)
_profiles = profiles;
}
void Jobs::setWorkload(Workload *workload)
{
_workload = workload;
}
void Jobs::load_from_json(const Document &doc, const string &filename)
{
(void) filename; // Avoids a warning if assertions are ignored
......@@ -53,6 +60,7 @@ void Jobs::load_from_json(const Document &doc, const string &filename)
for (SizeType i = 0; i < jobs.Size(); i++) // Uses SizeType instead of size_t
{
Job * j = new Job;
j->workload = _workload;
j->starting_time = -1;
j->runtime = -1;
j->state = JobState::JOB_STATE_NOT_SUBMITTED;
......@@ -63,26 +71,32 @@ void Jobs::load_from_json(const Document &doc, const string &filename)
xbt_assert(job.HasMember("id"), "Invalid JSON file '%s': one job has no 'id' field", filename.c_str());
xbt_assert(job["id"].IsInt(), "Invalid JSON file '%s': one job has a non-integral 'id' field ('%s')", filename.c_str(), job["id"].GetString());
j->id = job["id"].GetInt();
j->number = job["id"].GetInt();
xbt_assert(job.HasMember("subtime"), "Invalid JSON file '%s': job %d has no 'subtime' field", filename.c_str(), j->id);
xbt_assert(job["subtime"].IsNumber(), "Invalid JSON file '%s': job %d has a non-number 'subtime' field", filename.c_str(), j->id);
xbt_assert(job.HasMember("subtime"), "Invalid JSON file '%s': job %d has no 'subtime' field", filename.c_str(), j->number);
xbt_assert(job["subtime"].IsNumber(), "Invalid JSON file '%s': job %d has a non-number 'subtime' field", filename.c_str(), j->number);
j->submission_time = job["subtime"].GetDouble();
xbt_assert(job.HasMember("walltime"), "Invalid JSON file '%s': job %d has no 'walltime' field", filename.c_str(), j->id);
xbt_assert(job["walltime"].IsNumber(), "Invalid JSON file '%s': job %d has a non-number 'walltime' field", filename.c_str(), j->id);
xbt_assert(job.HasMember("walltime"), "Invalid JSON file '%s': job %d has no 'walltime' field", filename.c_str(), j->number);
xbt_assert(job["walltime"].IsNumber(), "Invalid JSON file '%s': job %d has a non-number 'walltime' field", filename.c_str(), j->number);
j->walltime = job["walltime"].GetDouble();
xbt_assert(job.HasMember("res"), "Invalid JSON file '%s': job %d has no 'res' field", filename.c_str(), j->id);
xbt_assert(job["res"].IsInt(), "Invalid JSON file '%s': job %d has a non-number 'res' field", filename.c_str(), j->id);
xbt_assert(job.HasMember("res"), "Invalid JSON file '%s': job %d has no 'res' field", filename.c_str(), j->number);
xbt_assert(job["res"].IsInt(), "Invalid JSON file '%s': job %d has a non-number 'res' field", filename.c_str(), j->number);
j->required_nb_res = job["res"].GetInt();
xbt_assert(job.HasMember("profile"), "Invalid JSON file '%s': job %d has no 'profile' field", filename.c_str(), j->id);
xbt_assert(job["profile"].IsString(), "Invalid JSON file '%s': job %d has a non-string 'profile' field", filename.c_str(), j->id);
xbt_assert(job.HasMember("profile"), "Invalid JSON file '%s': job %d has no 'profile' field", filename.c_str(), j->number);
xbt_assert(job["profile"].IsString(), "Invalid JSON file '%s': job %d has a non-string 'profile' field", filename.c_str(), j->number);
j->profile = job["profile"].GetString();
xbt_assert(!exists(j->id), "Invalid JSON file '%s': duplication of job id %d", filename.c_str(), j->id);
_jobs[j->id] = j;
// Let's get the JSON string which describes the job
rapidjson::StringBuffer buffer;
rapidjson::Writer<rapidjson::StringBuffer> writer(buffer);
job.Accept(writer);
j->json_description = buffer.GetString();
xbt_assert(!exists(j->number), "Invalid JSON file '%s': duplication of job id %d", filename.c_str(), j->number);
_jobs[j->number] = j;
}
}
......@@ -100,6 +114,16 @@ const Job *Jobs::operator[](int job_id) const
return it->second;
}
Job *Jobs::at(int job_id)
{
return operator[](job_id);
}
const Job *Jobs::at(int job_id) const
{
return operator[](job_id);
}
bool Jobs::exists(int job_id) const
{
auto it = _jobs.find(job_id);
......@@ -124,7 +148,7 @@ void Jobs::displayDebug() const
vector<string> jobsVector;
for (auto & mit : _jobs)
{
jobsVector.push_back(std::to_string(mit.second->id));
jobsVector.push_back(std::to_string(mit.second->number));
}
// Let us create the string that will be sent to XBT_INFO
......
......@@ -13,6 +13,7 @@
#include "machine_range.hpp"
class Profiles;
class Workload;
/**
* @brief Contains the different states a job can be in
......@@ -32,12 +33,15 @@ enum class JobState
*/
struct Job
{
int id; //!< The unique job number
Workload * workload = nullptr; //!< The workload the job belongs to
int number; //!< The job unique number within its workload
std::string profile; //!< The job profile name. The corresponding profile tells how the job should be computed
double submission_time; //!< The job submission time: The time at which the job becomes available
double walltime; //!< The job walltime: if the job is executed for more than this amount of time, it will be killed
int required_nb_res; //!< The number of resources the job is requested to be executed on
std::string json_description; //!< The JSON description of the job
long double consumed_energy; //!< The sum, for each machine on which the job has been allocated, of the consumed energy (in Joules) during the job execution time (consumed_energy_after_job_completion - consumed_energy_before_job_start)
double starting_time; //!< The time at which the job starts to be executed.
......@@ -75,6 +79,12 @@ public:
*/
void setProfiles(Profiles * profiles);
/**
* @brief Sets the Workload within which this Jobs instance exists
* @param[in] workload The Workload
*/
void setWorkload(Workload * workload);
/**
* @brief Loads the jobs from a JSON document
* @param[in] doc The JSON document
......@@ -96,6 +106,20 @@ public:
*/
const Job * operator[](int job_id) const;
/**
* @brief Accesses one job thanks to its unique number
* @param[in] job_id The job unique number
* @return A pointer to the job associated to the given job number
*/
Job * at(int job_id);
/**
* @brief Accesses one job thanks to its unique number (const version)
* @param[in] job_id The job unique number
* @return A (const) pointer to the job associated to the given job number
*/
const Job * at(int job_id) const;
/**
* @brief Allows to know whether a job exists
* @param[in] job_id The unique job number
......@@ -122,5 +146,6 @@ public:
private:
std::map<int, Job*> _jobs; //!< The std::map which contains the jobs
Profiles * _profiles; //!< The profiles associated with the jobs
Profiles * _profiles = nullptr; //!< The profiles associated with the jobs
Workload * _workload = nullptr; //!< The Workload the jobs belong to
};
......@@ -49,10 +49,11 @@ int smpi_replay_process(int argc, char *argv[])
int execute_profile(BatsimContext *context,
const std::string & profile_name,
const SchedulingAllocation * allocation,
double *remaining_time)
double * remaining_time)
{
Job * job = context->jobs[allocation->job_id];
Profile * profile = context->profiles[profile_name];
Workload * workload = context->workloads.at(allocation->job_id.workload_name);
Job * job = workload->jobs->at(allocation->job_id.job_number);
Profile * profile = workload->profiles->at(profile_name);
int nb_res = job->required_nb_res;
if (profile->type == ProfileType::MSG_PARALLEL_HOMOGENEOUS)
......@@ -85,7 +86,7 @@ int execute_profile(BatsimContext *context,
}
}
string task_name = "phg " + to_string(job->id) + "'" + job->profile + "'";
string task_name = "phg " + to_string(job->number) + "'" + job->profile + "'";
XBT_INFO("Creating task '%s'", task_name.c_str());
msg_task_t ptask = MSG_parallel_task_create(task_name.c_str(),
......@@ -129,7 +130,7 @@ int execute_profile(BatsimContext *context,
memcpy(computation_amount, data->cpu, sizeof(double) * nb_res);
memcpy(communication_amount, data->com, sizeof(double) * nb_res * nb_res);
string task_name = "p " + to_string(job->id) + "'" + job->profile + "'";
string task_name = "p " + to_string(job->number) + "'" + job->profile + "'";
XBT_INFO("Creating task '%s'", task_name.c_str());
msg_task_t ptask = MSG_parallel_task_create(task_name.c_str(),
......@@ -204,7 +205,7 @@ int execute_profile(BatsimContext *context,
for (int i = 0; i < nb_res; ++i)
{
char *str_instance_id = NULL;
int ret = asprintf(&str_instance_id, "%d", job->id);
int ret = asprintf(&str_instance_id, "%d", job->number);
xbt_assert(ret != -1, "asprintf failed (not enough memory?)");
char *str_rank_id = NULL;
......@@ -212,7 +213,7 @@ int execute_profile(BatsimContext *context,
xbt_assert(ret != -1, "asprintf failed (not enough memory?)");
char *str_pname = NULL;
ret = asprintf(&str_pname, "%d_%d", job->id, i);
ret = asprintf(&str_pname, "%d_%d", job->number, i);
xbt_assert(ret != -1, "asprintf failed (not enough memory?)");