Commit 1ed89eb8 authored by Jussi Lindgren

Documentation: Miscellaneous fixes

parent ccb6efe7
......@@ -37,6 +37,7 @@ Settings description
__________________________________________________________________
* |OVP_DocBegin_BoxAlgorithm_IndependentComponentAnalysisFastICA_Settings|
* |OVP_DocEnd_BoxAlgorithm_IndependentComponentAnalysisFastICA_Settings|
*
* |OVP_DocBegin_BoxAlgorithm_IndependentComponentAnalysisFastICA_Setting1|
* Number of independent components to extract (equals PCA dimension reduction)
......@@ -83,11 +84,8 @@ __________________________________________________________________
* |OVP_DocBegin_BoxAlgorithm_IndependentComponentAnalysisFastICA_Setting11|
* Should the matrix W be saved to a file?
* |OVP_DocEnd_BoxAlgorithm_IndependentComponentAnalysisFastICA_Setting11|
*
* |OVP_DocEnd_BoxAlgorithm_IndependentComponentAnalysisFastICA_Settings|
__________________________________________________________________
Examples description
......
......@@ -35,8 +35,10 @@ IF(doxygen_bin)
# tokens. These tokens will later be included in the skeleton documentation
# files (.dox-skeleton)
GET_PROPERTY(CURRENT_PROJECTS GLOBAL PROPERTY OV_PROP_CURRENT_PROJECTS)
FOREACH(current_project ${CURRENT_PROJECTS})
# MESSAGE(STATUS " [ OK ] Project ${current_project}")
STRING(REGEX REPLACE " +$" "" current_project ${current_project})
# updates the doxyfile variable for input directories
......@@ -46,27 +48,26 @@ IF(doxygen_bin)
ENDIF(EXISTS "${current_project}/include")
IF(EXISTS "${current_project}/src")
# MESSAGE(STATUS " [ OK ] Candidate src directory found ${current_project_src}")
SET(current_project_src "${current_project}/src")
SET(ov_doxy_input "${ov_doxy_input} \\\"${current_project_src}\\\"")
ENDIF(EXISTS "${current_project}/src")
IF(EXISTS "${current_project}/doc")
# MESSAGE(STATUS " [ OK ] Candidate doc directory found ${current_project_doc}")
SET(current_project_doc "${current_project}/doc")
SET(ov_doxy_input "${ov_doxy_input} \\\"${current_project_doc}\\\"")
ENDIF(EXISTS "${current_project}/doc")
#MESSAGE(STATUS " [ OK ] Candidate doc directory found ${current_project_src}")
# looks for resources and stores them in a list
FILE(GLOB_RECURSE resource_files_tmp "${current_project_doc}/*.png" "${current_project_doc}/*.svg" "${current_project_doc}/*.css" "${current_project_doc}/*.php")
SET(resource_files ${resource_files} ${resource_files_tmp})
# looks for resources and stores them in a list
FILE(GLOB_RECURSE resource_files_tmp "${current_project_doc}/*.png" "${current_project_doc}/*.svg" "${current_project_doc}/*.css" "${current_project_doc}/*.php")
SET(resource_files ${resource_files} ${resource_files_tmp})
# looks for partial hand written documentation
FILE(GLOB_RECURSE doxs "${current_project_doc}/*.dox-part")
SET(DOX_PART_FILES "${DOX_PART_FILES};${doxs}")
# looks for partial hand written documentation
FILE(GLOB_RECURSE doxs "${current_project_doc}/*.dox-part")
SET(DOX_PART_FILES "${DOX_PART_FILES};${doxs}")
ENDIF(EXISTS "${current_project}/doc")
ENDFOREACH(current_project)
# look for box snapshots generated by the plugin inspector
FILE(GLOB_RECURSE resource_files_tmp "${CMAKE_CURRENT_BINARY_DIR}/*.png")
SET(resource_files ${resource_files} ${resource_files_tmp})
......@@ -80,7 +81,7 @@ IF(doxygen_bin)
IF(WIN32)
STRING(REPLACE "/" "\\\\" ov_doxy_final ${ov_doxy_final})
ENDIF(WIN32)
# these two lines configure the variables used to configure the doxyfile
SET(ov_doxy_input "${ov_doxy_input} \\\"${CMAKE_CURRENT_SOURCE_DIR}\\\"")
SET(ov_doxy_input "${ov_doxy_input} \\\"${CMAKE_CURRENT_BINARY_DIR}\\\"")
......@@ -97,7 +98,7 @@ IF(doxygen_bin)
# updates the doxyfile variable for input directories
SET(ov_plugin_inspector_load_path "${ov_plugin_inspector_load_path}:${current_project}")
ENDFOREACH(current_project)
# create folder to put the output from doxygen to
file(MAKE_DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/../doc")
file(MAKE_DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/../doc/html")
......@@ -133,8 +134,6 @@ ELSEIF(UNIX)
)
ENDIF(WIN32)
# does not work, for some reason
MESSAGE(STATUS "CURRENT SOURCE: ${CMAKE_CURRENT_SOURCE_DIR}")
INSTALL(DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}/../doc/" DESTINATION ${CMAKE_INSTALL_FULL_DOCDIR} PATTERN ".svn" EXCLUDE)
ELSE(doxygen_bin)
......@@ -142,3 +141,5 @@ ELSE(doxygen_bin)
MESSAGE(STATUS " FAILED to find doxygen...")
ENDIF(doxygen_bin)
......@@ -118,7 +118,7 @@ This option can be used to resample the dataset to feature all classes equally.
if the box is used for incremental learning, where all classes may not be equally represented in the training data
obtained so far, even if the design itself is balanced. Note that enabling this will make the cross-validation
results optimistic. In most conditions, the feature should be disabled.
* |OVP_DocBegin_BoxAlgorithm_ClassifierTrainer_Setting11|
* |OVP_DocEnd_BoxAlgorithm_ClassifierTrainer_Setting11|
__________________________________________________________________
......@@ -138,6 +138,8 @@ Miscellaneous description
__________________________________________________________________
* |OVP_DocBegin_BoxAlgorithm_ClassifierTrainer_Miscellaneous|
The box supports various multiclass strategies and classifiers as plugins.
\par Available strategy:
Strategy refers to how feature vectors are routed to one or more classifiers, which possibly can handle only 2 classes themselves.
......@@ -195,7 +197,7 @@ Note that feature vectors are normalized between -1 and 1 (using the min/max of
\par
This algorithm provides both hyperplane distance (identity of output layer) and probabilities (softmax function on output layer).
Cross Validation
\par Cross Validation
In this section, we will detail how the k-fold test is implemented in this box. For the k-fold test to be performed, you
have to choose more than 1 partition in the related settings. Suppose you chose \c n partitions. Then when trigger stimulation
......@@ -243,16 +245,16 @@ In conclusion, be careful when choosing this k-fold test setting. Typical value
test on 25% - 4 times) to 10 partitions (train on 90% of the feature vectors and test on 10% - 10 times).
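The proportions quoted above follow from the partition count alone. The sketch below is only an illustration, assuming equal-sized contiguous partitions; the function name `kfold_partitions` is hypothetical, and the box itself may assign feature vectors to partitions differently.

```python
def kfold_partitions(n_vectors, k):
    """Yield (train_indices, test_indices) for each of k partitions.

    Assumption: contiguous, near-equal partitions. This is not taken
    from the OpenViBE sources; it only illustrates the arithmetic.
    """
    if k < 2:
        raise ValueError("the k-fold test needs more than 1 partition")
    # Partition boundaries, e.g. [0, 25, 50, 75, 100] for 100 vectors, k=4.
    bounds = [i * n_vectors // k for i in range(k + 1)]
    for i in range(k):
        lo, hi = bounds[i], bounds[i + 1]
        test = list(range(lo, hi))
        train = [j for j in range(n_vectors) if j < lo or j >= hi]
        yield train, test


if __name__ == "__main__":
    for k in (4, 10):
        train, test = next(kfold_partitions(100, k))
        print(f"k={k}: train on {len(train)}%, test on {len(test)}%, {k} times")
```

With 100 feature vectors, 4 partitions give 75 training and 25 test vectors per fold, and 10 partitions give 90 and 10.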
Note that the cross-validation performed by the classifier trainer box in OpenViBE may be optimistic.
The cross-validation is working as it should, but it cannot take into account what happens outside it.
In OpenViBE scenarios, there may be e.g. time overlap from epoching, examples drawn from the same epoch
ending up in the same cross-validation partition, and (supervised) preprocessing
such as CSP or XDawn potentially overfitting the data before its given to the classifier trainer.
Such situations are not compatible with the theoretical assumption that the classified examples are
The cross-validation computation is working as it should, but it cannot take into account what happens outside
the classifier trainer box. In OpenViBE scenarios, there may be e.g. time overlap from epoching, feature
vectors drawn from the same epoch ending up in the same cross-validation partition, and (supervised)
preprocessing such as CSP or xDAWN potentially overfitting the data before it is given to the classifier trainer.
Such situations are not compatible with the theoretical assumption that the feature vectors are
independent and identically distributed (the typical iid assumption in machine learning) across
train and test. To do cross-validation controlling for such issues, we have provided
a more advanced cross-validation tutorial in the OpenViBE web documentation.
a more advanced cross-validation tutorial as part of the OpenViBE web documentation.
Confusion Matrices
\par Confusion Matrices
At the end of the training, the box will print one or two confusion matrices, depending if cross-validation
was used: one matrix for the cross-validation, the other for the training data. Each matrix will contain true
......@@ -264,7 +266,7 @@ source, the signal processing chains for the different classes, or the classifie
then to be investigated. Also, if very low accuracies are observed in these matrices, it may give reason
to suspect that prediction accuracies on fresh data might be likewise lacking -- or worse.
Incremental Learning
\par Incremental Learning
The box can also be used for simple incremental (online) learning. To achieve this, simply send the box the training
stimulation and it will train a classifier with all the data it has received so far. You can give it more
......
......@@ -15,15 +15,51 @@ Output:
1) If the outputs of the box are raw numeric values, the box first sends every connecting client eight variables of uint32: format version number (in network byte order), endianness of the stream (in network byte order, 0==unknown, 1==little, 2==big, 3==pdp), sampling frequency of the signal, the number of channels, the number of samples per chunk and three variables of padding, 8*4=32 bytes in total. The last 6 variables are in the byte order of the stream. Note that only those variables will be non-zero that are meaningful for the input in question.
Header layout as a table,
\verbatim
| Name                | Type   | Bytes from start |
| ------------------- | ------ | ---------------- |
| Format version      | uint32 | 0                |
| Endianness          | uint32 | 4                |
| Sampling frequency  | uint32 | 8                |
| Number of channels  | uint32 | 12               |
| Samples per chunk   | uint32 | 16               |
| Reserved0           | uint32 | 20               |
| Reserved1           | uint32 | 24               |
| Reserved2           | uint32 | 28               |
\endverbatim
1b) If the output is chosen as hex string or descriptive string (these are valid for Stimulation input only), no header is sent.
2) After the possible global header, the data itself is sent. The data is a stream of float64 chunks for Signal and StreamedMatrix.
Each chunk is a matrix [nChannels x nSamples], sent in row-major order, i.e. all samples for one channel are sent in a sequence (a row),
then all samples of the next sample (next row), and so on. For Stimulations, the data is uint64 if the user chooses raw,
or char strings otherwise.
Multiple clients can connect to the socket of the box. The box keeps sending data to each client until either the scenario is stopped or the client disconnects. The box does not guarantee that the client starts to receive the input stream from any particular location. When kernel calls box::process() at time t, all clients connected at time t get forwarded the chunks given to box::process() at t. However, if a client establish a connection during box::process(), it may get a partial chunk of t and the whole chunk of t+1 and so on.
then all samples of the next channel (next row), and so on. This is the same order that OpenViBE uses internally for signal chunks.
Signal/matrix data layout as a table (k = nSamples, n = nChannels),
\verbatim
| Name                  | Type    | Bytes from start   |
| --------------------- | ------- | ------------------ |
| Channel 1, sample 1   | float64 | 32 + (k*0+0)*8     |
| Channel 1, sample 2   | float64 | 32 + (k*0+1)*8     |
| ...                   | ...     | ...                |
| Channel 1, sample k   | float64 | 32 + (k*0+(k-1))*8 |
| Channel 2, sample 1   | float64 | 32 + (k*1+0)*8     |
| Channel 2, sample 2   | float64 | 32 + (k*1+1)*8     |
| ...                   | ...     | ...                |
| Channel 1, sample k+1 | float64 | 32 + (k*n+0)*8     |
| ...                   | ...     | ...                |
\endverbatim
For Stimulations, the data is a sequence of uint64 if the user chooses raw output, or char strings otherwise.
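On the client side, the header and chunk layouts described above could be decoded roughly as follows. This is a minimal sketch, not part of OpenViBE: the function names `parse_header` and `decode_chunk` are hypothetical, and it assumes that the float64 samples use the byte order announced in the header and that only little- and big-endian streams need to be handled.

```python
import struct

HEADER_SIZE = 32  # eight uint32 fields, 8 * 4 bytes

# Endianness codes from the header description: 1 == little, 2 == big.
# Codes 0 (unknown) and 3 (pdp) are rejected in this sketch.
_PREFIX = {1: "<", 2: ">"}


def parse_header(buf):
    """Parse the 32-byte global header sent before raw numeric data."""
    if len(buf) < HEADER_SIZE:
        raise ValueError("header needs 32 bytes")
    # The first two fields are always in network (big-endian) byte order.
    version, endianness = struct.unpack("!II", buf[:8])
    if endianness not in _PREFIX:
        raise ValueError(f"unsupported endianness code {endianness}")
    prefix = _PREFIX[endianness]
    # The last six fields are in the byte order of the stream.
    rate, n_channels, n_samples, _r0, _r1, _r2 = struct.unpack(
        prefix + "6I", buf[8:HEADER_SIZE]
    )
    return {
        "version": version,
        "prefix": prefix,
        "sampling_rate": rate,
        "n_channels": n_channels,
        "n_samples": n_samples,
    }


def decode_chunk(payload, n_channels, n_samples, prefix):
    """Decode one [n_channels x n_samples] float64 chunk (row-major)."""
    count = n_channels * n_samples
    if len(payload) != count * 8:
        raise ValueError("payload size does not match chunk dimensions")
    flat = struct.unpack(f"{prefix}{count}d", payload)
    # Each row holds all samples of one channel.
    return [list(flat[c * n_samples:(c + 1) * n_samples])
            for c in range(n_channels)]
```

A client would read exactly 32 bytes for the header once, then repeatedly read `n_channels * n_samples * 8` bytes and pass each block to `decode_chunk`.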
Multiple clients can connect to the socket of the box. The box keeps sending data to
each client until either the scenario is stopped or that client disconnects. When the
kernel calls box::process() at time t, all clients connected before or at t
are forwarded the chunks that are pending in box::process() at time t. Note that
TCP Writer does not currently relay how much time has elapsed between the acquisition
or scenario startup and the client connection.
* |OVP_DocEnd_BoxAlgorithm_TCPWriter_Description|
__________________________________________________________________
......@@ -81,7 +117,8 @@ __________________________________________________________________
Detected transmission errors will cause a disconnection of the client.
Streamed Matrix can be recognized from the TCPWriter header by the sampling rate 0. If the stream is a signal, the sampling rate is a positive number.
Streamed Matrix can be recognized from the TCPWriter header by the sampling rate 0. If the stream is a signal,
the sampling rate is a positive number. Raw stimulation streams have channel and sample counts per buffer 0 as well.
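The rule above can be turned into a small client-side check. The following is a sketch only; the function name `classify_stream` and the returned labels are hypothetical, and the inputs are assumed to be the already decoded header fields.

```python
def classify_stream(sampling_rate, n_channels, n_samples):
    """Guess the stream type from decoded TCPWriter header fields.

    Assumption: the fields were already converted from the stream's
    byte order; the labels returned here are illustrative only.
    """
    if sampling_rate > 0:
        return "signal"
    # Sampling rate 0: raw stimulations also have zero channel and sample counts.
    if n_channels == 0 and n_samples == 0:
        return "raw stimulations"
    return "streamed matrix"
```

For example, a header with sampling rate 512 is treated as a signal, while sampling rate 0 with nonzero dimensions is treated as a streamed matrix.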
Known issues: Note that it can be difficult to time-synchronize signals and stimulations exactly on the client side when the client receives data from
two TCP Writer boxes. Maintaining such synchronization was not a design goal of this box. If you need synchronized streams, it is advised to build
......