TODO.md



After the Initial Release


Without taking the trouble of using the fuzzer to test the code
in Emulated, we could perhaps develop a suite of cram tests.


Create an iterator directly on a short sequence and avoid the
indirection in ShortPersistentSequence.


Can we compile in -unsafe mode? (Only in release mode.)
Does it make a difference?


Decide if iter_segments should take three separate parameters or
a 3-tuple (an array segment).


Expose a module Segment, with functions such as Segment.iter?


For a stack usage, i.e., a sequence accessed only at back side, the
code can be tweaked so that there is no need to allocate the front
chunk. Moreover, a few tests can be saved, e.g. in is_empty.
The functor settings "back_access_only" should be added to control
this behavior. At any time, the structure can be turned back into
a normal structure by allocating a proper front chunk. We need to
decide whether the front chunk would then be silently restored when
needed, or whether we want to force the programmer to call an explicit
conversion function to restore the front chunk of the sequence.


Should we expose both iter2_lax and iter2,
or just the lax version?
Exposing both is useful if we consider that the user is allowed to
catch Invalid_argument.
If we consider that the user is not allowed to do this, then
we can expose just iter_lax and call it iter.
Similar question for other functions, e.g. exists2, for_all2.
equal is for_all2 (strict).
Addition: I think the best solution for iter2 is to define just one function,
which raises a user exception (not Invalid_argument). Unfortunately, this
does not work for map2, fold_left2, etc. because raising an exception
prevents us from returning a result.
Addition bis: we could expose just one version (the lax version, which
is more flexible) and let the user handle size mismatches however they
wish; they can do so cheaply.


P.filter naturally returns an ephemeral sequence. This is more
powerful than returning a persistent sequence, but it can be a
pain to manually call snapshot_and_clear when a persistent
sequence is desired.
Same question for several other functions.
flatten_map is doubly-tricky. Ideally, its argument f
should have type 'a -> 'b Seq.t. That would be more general
than either a -> 'b Ephemeral.t or a -> 'b Persistent.t.


make and fill in PSeq could share all their schunks.


Document the time complexity of the new operations (iterators, etc.).


Should all of the conversion functions (of_array, to_array, etc.)
take a direction parameter?


Restructure the code so that it compiles with OCaml < 4.08.


Check if specialisation of pov's and measures is performed as expected.
Eta-expansion at the outermost level may be necessary to trigger it.


Check if Segment.iter is properly inlined/specialised in find_weight_index.


Add a constructor Single to ShareableSequence.


Document, dynamically-check, and exploit the invariant that an ephemeral
sequence may or may not be owner of all of its schunks. Maybe add a field
in the ephemeral sequence record to keep track of the fact that the user
wishes to maintain this property, or (better?) just of the fact that this
property is currently true or not. Maintaining the invariant on concat
operations requires particular care.


Consider other values for max_length_of_free_list; benchmark with
LIFO and FIFO scenarios.


Benchmark make and init versus iterated pushes,
out of curiosity. Document the result.


Benchmark: show point clouds or confidence intervals.


Think about reference counting. Think about transactions (take a snapshot,
do some updates, then decide to revert or commit; can we keep unique
ownership?). Think about backtracking scenarios.
Can we implement checkpoint and revert?
What dynamic checks are required in order to detect incorrect usage?


Once odoc issue #417 is fixed,
merge the branch supply_default. Or find a way of reformulating the code
to work around the issue. (Avoid include followed with shadowing.)


What about an approximate_split function that splits near the middle, and
is more efficient than split? (Be careful to control the ratio of
imbalance. Maybe sufficient to simply never a split a chunk at the outermost
level.)


Implement take and drop, i.e., specialized versions of split where
only one part of the sequence is actually constructed.


Benchmark to see if the free list is useful (in some scenarios)
and how much it costs (in general). Remove it if it is useless.
If we keep it, tune its maximum length.


Think about setting up sequential and parallel benchmarks under Multicore OCaml.


Benchmark to measure the benefits and costs of inner buffers.


Emulate Array.


Emulate some Vector (which one?).
Benchmark against CCVector, CCFun_vec, BatVect, Clarity.Vector...
See the benchmarks in ocaml-containers.


Complete the Stack and Queue wrappers with to_seq, add_seq, of_seq.


Benchmark our emulated Stack, Queue, Vector, Array, List, Buffer
against the originals. (Measure time and space.)


Should we expose update, xchg, get_and_update?
Maybe not necessary if we have an iterator with set.


Implement rev on ephemeral sequences. Two variants: in place, not in place.


Think about adding a sense field to obtain O(1) reversal of sequences.


Document iterator invalidation. Detect it at runtime?
Add a version field in ephemeral sequences.
Use a functor parameter to enable these dynamic checks (false by default?).
Note that an attempt by f to modify the collection while iter f is
running should also be detected. The test can be performed just once at
the end of iteration (either upon normal termination, or upon exceptional
termination).


Implement pushn_back to push a range of values from an external array into
the sequence. Likewise, popn to pop and blit into an external array.
(We already have of_array_segment. Avoid excessive redundancy.)


Allow blitting a subsequence into a subarray (to_array_segment).
(Use the iterator to do this?) (Related to above.)


Add blit functions between Sek objects, with a version that
uses sharing and a version that does not introduce any sharing.


Encoding sub in terms of split can be expensive;
propose direct implementations of sub. (Related to above.)


Does the generated code change when we expose the fact that
the type segment is equal to int?
This might allow the compiler to remove the write barrier
when updating a view field.
Is performance affected? Benchmark.


Add relaxed operations where the data structure does not represent
a sequence, but a bag. In that case, all push operations affect
the back chunk. One can maintain the invariant that all chunks in
the middle sequence are full. The bag_concat function can be
simplified and optimised. A bag_split operation splits the bag
into two parts of a specified size.


map and mapi functions are missing.
Add in-place map and mapi for ephemeral sequences.
(The name transform comes to mind.)


Define insert, delete, etc. (either using split and concat, or
directly).


Evaluate the interest of having owners on nodes from ShareableSequences.
The benefit would be to update weights in place on insertion operations.


Report odoc bug where a heading in an included signature (@inline)
does not appear in the table of contents of the current page.


Using an iterator and init, can implement compact.
Or do it by popping segments off one sequence and pushing them into another.


Think about more aggressive density invariants (e.g. fuse 3 chunks into 2,
etc.).