TANSIV issues
https://gitlab.inria.fr/tansiv/tansiv/-/issues
Updated: 2023-03-14T10:32:54+01:00

https://gitlab.inria.fr/tansiv/tansiv/-/issues/27
Make tansiv provide the socket address as an argument when deploying tansiv clients
Updated: 2023-03-14T10:32:54+01:00
Author: RILLING Louis

The socket address of the `tansiv` coordinator is currently [hardcoded in tansiv](src/vsg/vsg.h#L8) but is taken as a parameter by `tansiv-client` (see the [implementation](src/client/tansiv-client/src/config.rs#L10-12) and usage in [Qemu](https://gitlab.inria.fr/msimonin/qemu/-/blob/tantap/vl.c#L4273) as well as in [sample programs](examples/send/send.cpp#L55)). We should keep this address as a parameter to anticipate evolutions with clients deployed on remote machines and communicating over TCP.
To make the current code more consistent, we should let `tansiv` provide its clients with the socket address as a parameter when deploying them.

Assignee: SIMONIN Matthieu

https://gitlab.inria.fr/tansiv/tansiv/-/issues/22
Split vsg.h between protocol types and test-specific C functions and vsg/* to tests/
Updated: 2022-03-25T18:01:47+01:00
Author: RILLING Louis

Currently [vsg.h](src/vsg/vsg.h) includes both VSG protocol types definitions and declarations of test-specific functions.
To make it cleaner we should:
- [ ] Move protocol types definitions to `src/include/vsg_types.h`
- [ ] Move the remaining parts to `src/tests/vsg.h`
Finally, to empty the [vsg](src/vsg) directory, we should:
- [x] Remove the usage of [log.h](src/vsg/log.h) in [vsg.c](src/vsg/vsg.c)
- [x] Remove [log.h](src/vsg/log.h) and [log.c](src/vsg/log.c)
- [ ] Move [vsg.c](src/vsg/vsg.c) to [tests](src/tests)

https://gitlab.inria.fr/tansiv/tansiv/-/issues/26
Rework the wire-level protocol of VSG for better safety
Updated: 2022-03-25T17:57:57+01:00
Author: RILLING Louis

The wire-level protocol of VSG just writes raw C structs to UNIX sockets. While this is simple, a major drawback is that it is easy to introduce nasty bugs in the implementation of any party of the protocol. The Rust implementation currently relies on the custom [binser](src/binser) crate to get some safety while encoding and decoding protocol messages. This brings some complexity, while other approaches could be more reliable:
- [ ] (No change in the wire protocol) Check if the [abomonation crate](https://github.com/TimelyDataflow/abomonation) or some wrapper could provide the same safety guarantees in `tansiv-client`
- [x] Consider changing the wire protocol using a cross-language message-passing serialization framework, like:
  - [Protocol Buffers](https://developers.google.com/protocol-buffers) (Rust implementation with crate [protobuf](https://github.com/stepancheg/rust-protobuf/))
  - [Cap'n Proto](https://capnproto.org/) (Rust implementation with crates [capnp*](https://github.com/capnproto/capnproto-rust))
  - [FlatBuffers](https://google.github.io/flatbuffers/) (features C, C++, Rust, Python... implementations)

Assignee: SIMONIN Matthieu

https://gitlab.inria.fr/tansiv/tansiv/-/issues/28
[tansiv-client] TestActor::check has no effect on the tests outcome => possible reason TestActor::check might be never called
Updated: 2022-02-21T14:10:02+01:00
Author: SIMONIN Matthieu

Steps to reproduce:
- Force a failure somewhere
```
diff --git a/src/client/tansiv-client/src/connector/unix.rs b/src/client/tansiv-client/src/connector/unix.rs
index ad2ecbc..b870906 100644
--- a/src/client/tansiv-client/src/connector/unix.rs
+++ b/src/client/tansiv-client/src/connector/unix.rs
@@ -600,7 +600,7 @@ mod test {
let seconds = ref_send_time.as_secs();
let useconds = ref_send_time.subsec_micros();
TestActor::check_eq(msg.send_time.seconds, seconds, "Received wrong value for Time::seconds")?;
- TestActor::check_eq(msg.send_time.useconds, useconds as u64, "Received wrong value for Time::useconds")?;
+ TestActor::check_eq(msg.send_time.useconds, (useconds + 1) as u64, "Received wrong value for Time::useconds")?;
let payload_len = msg.packet.size as usize;
TestActor::check_eq(&buffer[..payload_len], ref_payload.deref(), "Received wrong payload")
} else {
```
My understanding is that:
- Child Actors exit with status code of 0 (https://gitlab.inria.fr/tansiv/tansiv/-/blob/76fdca497c7866fa2f13ee64d3cef2a3e82bdcd9/src/client/tansiv-client/src/connector/unix.rs#L108 )
- Parent process doesn't care about the actual exit code of the child: https://gitlab.inria.fr/tansiv/tansiv/-/blob/76fdca497c7866fa2f13ee64d3cef2a3e82bdcd9/src/client/tansiv-client/src/connector/unix.rs#L136
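To illustrate the second point, here is a minimal sketch of a parent that does take the child's exit status into account. It is not the code from `unix.rs`: it uses `std::process::Command` with a shell as a stand-in for the forked actor, and the names `ActorOutcome`, `outcome_of` and `run_child` are hypothetical.

```rust
use std::process::{Command, ExitStatus};

/// Outcome of a child test actor, derived from its exit status.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
enum ActorOutcome {
    Passed,
    Failed(i32),
    Killed,
}

/// Map an exit status to an outcome: only exit code 0 counts as success.
fn outcome_of(status: ExitStatus) -> ActorOutcome {
    match status.code() {
        Some(0) => ActorOutcome::Passed,
        Some(code) => ActorOutcome::Failed(code),
        // On Unix, `code()` is `None` when the child was terminated by a signal.
        None => ActorOutcome::Killed,
    }
}

/// Stand-in for a child actor: run `script` in `sh`, wait for it,
/// and report its outcome instead of ignoring the exit status.
fn run_child(script: &str) -> std::io::Result<ActorOutcome> {
    let status = Command::new("sh").arg("-c").arg(script).status()?;
    Ok(outcome_of(status))
}
```

With this shape, a child that reports a failed check through a non-zero exit code (for example `run_child("exit 3")`) is seen by the parent as `ActorOutcome::Failed(3)`, and a child killed by a signal is distinguished from one that finished normally.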
EDIT:
Looks like there's a race here:
- https://gitlab.inria.fr/tansiv/tansiv/-/blob/76fdca497c7866fa2f13ee64d3cef2a3e82bdcd9/src/client/tansiv-client/src/connector/unix.rs#L284-290
If the child actor is busy working it will be killed (gracefully) by its parent :(
Maybe we need to fix that somehow (I was busy curating the tests for the flatbuffers when I ran into this).

Assignee: RILLING Louis

https://gitlab.inria.fr/tansiv/tansiv/-/issues/24
CI: Remove build stage
Updated: 2021-12-06T12:19:20+01:00
Author: SIMONIN Matthieu

https://gitlab.inria.fr/tansiv/tansiv/-/issues/23
Fix test "VSG send piggyback port"
Updated: 2021-12-06T12:19:20+01:00
Author: RILLING Louis

See failure of Job [#1469717](https://gitlab.inria.fr/tansiv/tansiv/-/jobs/1469717#L2562)

https://gitlab.inria.fr/tansiv/tansiv/-/issues/21
Give more consistent names to tansiv components in the source tree
Updated: 2021-12-06T12:19:20+01:00
Author: RILLING Louis

The following directories should be renamed:
- [x] [src/fake-vm/fake_vm](src/fake-vm/fake_vm) -> `src/fake-vm/tanproc`
- [x] [src/fake-vm/fake_vm_capi](src/fake-vm/fake_vm_capi) -> `src/fake-vm/tanproc_capi`
- [x] [src/fake-vm](src/fake-vm) -> `src/client`
- [x] [src/simgrid](src/simgrid) -> `src/coordinator`
Moreover, Rust crates that are only custom dependencies of `tansiv-client` should be moved into a dedicated sub-directory:
- [x] Create directory `src/rust-deps` and populate it with [src/binser](src/binser), [src/crossbeam](src/crossbeam), [src/libc_timer](src/libc_timer), [src/seq_lock](src/seq_lock)

Assignee: RILLING Louis

https://gitlab.inria.fr/tansiv/tansiv/-/issues/20
Fix logic of Test "VSG receive one message"
Updated: 2021-12-06T12:19:20+01:00
Author: RILLING Louis

Test [VSG receive one message](src/tests/tests.cpp#L69-91) sometimes fails like [this](https://gitlab.inria.fr/quinson/2018-vsg/-/jobs/1464903#L2536).
The test logic is actually flawed. Because of timing uncertainty, [scenario.cpp::recv_one()](src/tests/scenario.cpp#L137-139) should accept an arbitrary number of `AtDeadline` messages before expecting `SendPacket`.

https://gitlab.inria.fr/tansiv/tansiv/-/issues/19
Build is broken due to `ld`
Updated: 2021-12-06T12:19:20+01:00
Author: SIMONIN Matthieu

https://gitlab.inria.fr/quinson/2018-vsg/-/jobs/1430477#L3737

https://gitlab.inria.fr/tansiv/tansiv/-/issues/18
CI / docker: make docker tagging a bit more thorough
Updated: 2021-12-06T12:19:20+01:00
Author: SIMONIN Matthieu

Currently any commit (whatever the branch) overrides the latest docker tag. This can be very confusing!
So we propose to
1. Tag based on branch name and keep the information of the corresponding commit id (use docker labels)
This should be triggered on every push.
- [x] Change the .gitlab-ci.yml to name the docker image after the branch name: `...tansiv-$branch:latest`
2. On tag, rebuild (if necessary) the docker image to make it persistent.
- [x] On tag, build the docker image and name it after the tag name: `...tansiv:$tag`
- [x] Optimize by checking whether an image is already built on any branch that contains the current commit (maybe use `git branch --contains` to find the branches)

https://gitlab.inria.fr/tansiv/tansiv/-/issues/9
global `all_existing_models` doesn't exist anymore in simgrid
Updated: 2021-12-06T12:19:20+01:00
Author: SIMONIN Matthieu

This commit https://github.com/simgrid/simgrid/commit/e22da6010c6499813ff88c76041cf499ffbf2b67 cleans up the use of globals in simgrid.
But currently there's no way to get the list of models (all or by types) in the simgrid API.
I have a local fix for that, by the way.

https://gitlab.inria.fr/tansiv/tansiv/-/issues/8
Running flent for some time leads to send_time to be out of bound (client side)
Updated: 2021-12-06T12:19:20+01:00
Author: SIMONIN Matthieu

```
[nova-2.lyon.grid5000.fr:sender:(31328) 228.519751] [vm_coordinator/INFO] sending (size 1512) from vm [192.168.120.11], to vm [192.168.120.10] (on pm [nova-1.lyon.grid5000.fr])
2021-01-27 08:54:58,535 ERROR [tansiv_client] send_time = 228.519894001s is beyond current_deadline = 228.519894s! Aborting
```
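The two values in the error message differ by a single nanosecond. The sketch below only reproduces that comparison with `std::time::Duration`; the function names are hypothetical and this is not the check as implemented in `tansiv-client`.

```rust
use std::time::Duration;

/// Hypothetical check mirroring the logged condition
/// "send_time is beyond current_deadline".
fn is_beyond_deadline(send_time: Duration, current_deadline: Duration) -> bool {
    send_time > current_deadline
}

/// The two values copied from the log, as (send_time, current_deadline).
fn logged_values() -> (Duration, Duration) {
    let send_time = Duration::new(228, 519_894_001);
    let current_deadline = Duration::new(228, 519_894_000);
    (send_time, current_deadline)
}
```

At nanosecond resolution the send time is past the deadline, while both values have the same microsecond component, so the two are indistinguishable at microsecond resolution.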
Steps to reproduce on 7d5582d7e9926171fb90c0ef9f216ca35f070606:
```
python g5k.py deploy
python g5k.py flent
```

https://gitlab.inria.fr/tansiv/tansiv/-/issues/7
Initial iperf tests fail with: seg fault + [tansiv_client] recv failed: failed to fill whole buffer and a Seg fault
Updated: 2021-12-06T12:19:19+01:00
Author: SIMONIN Matthieu

Steps to reproduce:
```
python g5k.py deploy ../packer/packer-debian-10.3.0-x86_64-qemu/debian-10.3.0-x86_64.qcow2 inputs/nova_cluster.xml inputs/deployment_2.xml --cluster paravance
# then manually
10.0.0.10) iperf -s
10.0.0.11) iperf -c tantap10
```
Note: this also happens with more conservative parameters: `-b 1k --mss 500` (1kb/s transfer and 500-byte MSS).
What we got from the logs (`docker logs tansiv`):
```
Receive a message [src_decode=192.168.120.11] -> transfering to NIC
Segmentation fault.
2020-12-16 15:36:18,703 ERROR [tansiv_client] recv failed: failed to fill whole buffer
2020-12-16 15:36:18,703 ERROR [tansiv_client] recv failed: failed to fill whole buffer
```
NOTE: Sending (small) data over TCP works using netcat:
```
10.0.0.10) nc -l -s tantap10 -p 1234
10.0.0.11) nc tantap10 1234
```
NOTE: Using UDP datagrams is OK:
```
10.0.0.10) iperf -u -s
10.0.0.11) iperf -u -c tantap10
------------------------------------------------------------
Client connecting to tantap10, UDP port 5001
Sending 1470 byte datagrams, IPG target: 11215.21 us (kalman adjust)
UDP buffer size: 208 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.120.11 port 37551 connected with 192.168.120.10 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 1.25 MBytes 1.05 Mbits/sec
[ 3] Sent 892 datagrams
[ 3] Server Report:
[ 3] 0.0-10.0 sec 1.25 MBytes 1.05 Mbits/sec 0.001 ms 0/ 892 (0%)
```

https://gitlab.inria.fr/tansiv/tansiv/-/issues/4
Add build and test of fake-vm to CI
Updated: 2021-12-06T12:19:19+01:00
Author: RILLING Louis

This should be easy since the Cargo book provides an [example](https://doc.rust-lang.org/cargo/guide/continuous-integration.html#gitlab-ci) of how to do that.
We just have to take care that a [top-level Makefile](src/fake-vm/Makefile) is used to build and run tests in [fake-vm](src/fake-vm).

https://gitlab.inria.fr/tansiv/tansiv/-/issues/3
add a gitlab-ci.yml
Updated: 2021-12-06T12:19:19+01:00
Author: SIMONIN Matthieu

In a first iteration, it's been suggested to
- start from a simgrid image
- install what's missing (build-essentials ... )
- compile tansiv
- compile qemu with our modified version of libslirp

Assignee: SIMONIN Matthieu

https://gitlab.inria.fr/tansiv/tansiv/-/issues/2
libvsg: add a `dest` parameter to `vsg_send`
Updated: 2021-12-06T12:19:19+01:00
Author: SIMONIN Matthieu

In the vsg protocol the `dest` must be prepended to the application message.
Currently it's up to the caller of `vsg_send` to craft the vsg message this way; adding a `dest` parameter might help here.

Assignee: SIMONIN Matthieu

https://gitlab.inria.fr/tansiv/tansiv/-/issues/1
fake-vm: Fix tests with deadline signal reaching wrong test threads
Updated: 2021-12-06T12:19:19+01:00
Author: RILLING Louis

Since internal tests in fake-vm create a thread to mock an actor, the deadline signal may reach one of these actor threads instead of the application thread. Besides failing to interrupt the application thread, in some cases this causes the test to deadlock, which both makes testing inefficient and is bad for automated testing and CI.
To fix this, instead of creating a thread to mock an actor, just fork a new process using the nix crate API.

Assignee: RILLING Louis