Commit dbfe3c39 authored by xtof

Merge branch 'master' of gitlab.inria.fr:cerisara/olkisite

parents 093e21d1 2aeeeb59
Pipeline #107252 passed with stage
in 5 seconds
......@@ -21,11 +21,15 @@ theme = "timer-hugo"
[[menu.nav]]
name = "Dates"
url = "/dates"
weight = 7
weight = 6
[[menu.nav]]
name = "Platform"
url = "/platform"
weight = 7
[[menu.nav]]
name = "Intranet"
url = "/intranet"
weight = 8
# Site Params
......
---
title: "Machine Learning on copyrighted data"
date: 2019-11-19T12:02:00+06:00
description : "Machine learning and copyrighted data"
type: draft
image:
author: Christophe Cerisara
tags: ["AI", "deep learning"]
---
## Training algorithms on copyrighted data is not illegal, according to the United States Supreme Court
According to a recent Supreme Court decision in the USA, it is allowed to
train machine learning models on copyrighted data:
https://towardsdatascience.com/the-most-important-supreme-court-decision-for-data-science-and-machine-learning-44cfc1c1bcaf
This decision has a strong impact on the current debate about data ownership and privacy, but it certainly should not
be interpreted in an oversimplified way: a number of recent deep learning models can regenerate parts of their source data,
which is likely to moderate these conclusions in many practical cases...
---
title: "Federated Deep Learning"
date: 2019-10-23T12:02:00+06:00
description : "On federated deep learning"
type: draft
image:
author: Christophe Cerisara
tags: ["AI", "deep learning"]
---
## Challenge of training deep learning models
Training deep learning models has always been a challenge.
Not only because of the large quantity of data to train on,
or because of the large number of model parameters to optimize,
but also because training a deep learning model
involves finding its way through a huge variety of related paths,
which requires testing many different combinations of hyper-parameters
and model topologies, without any certainty, relying only on basic intuition
and human knowledge and expertise.
As a result, not only may a single training run be very long even on
many GPUs, but we actually have to perform many such runs before finding a good path.
So, globally, the cost - in time, human resources and money - of training deep learning models is very high.
In order to speed up this process, the common trend is to deploy models
in high-performance computing clusters, equipped with powerful latest-generation GPUs.
But such clusters are extremely costly, and become outdated after only a few years.
This is why another paradigm is currently emerging under the name **federated deep learning**.
## Distributed training
Let us first review the various strategies to train deep learning models, centralized or distributed:
- On a single GPU on a single machine: this is the easiest nowadays, thanks to efficient implementations of CUDA operations in PyTorch and TensorFlow.
- Across GPUs on a single machine: this is a bit more difficult to realize, because it often involves modifying the code of the model to efficiently distribute it on multiple GPUs. More and more libraries are however proposed to try and automate this distribution.
- Across GPUs on multiple nodes in a single cluster: this is more difficult, because central memory is not shared anymore, and latency between nodes is higher.
- Across low-end devices: this is the so-called **federated deep learning**. Latency is however a major issue there, and synchronous SGD is not an option any more.
So the idea is to train on each device for multiple epochs using only the data hosted locally, before updating the global model.
This is also presented as a way to protect privacy, as device-dependent data is not transferred to another node.
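
The local-training-then-update idea can be illustrated with a minimal sketch in the spirit of federated averaging. Everything below is an assumption made for illustration: the toy model `y = w * x`, the synthetic per-device data, and the function names `local_train` and `federated_round` do not come from any specific library or from the OLKi project.

```python
import random


def local_train(w, data, epochs=5, lr=0.05):
    """Run several epochs of plain SGD on one device, using only its local data.

    Toy model (assumption): y = w * x with a squared-error loss.
    """
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x  # derivative of (w*x - y)^2 w.r.t. w
            w -= lr * grad
    return w


def federated_round(global_w, devices):
    """One communication round: every device starts from the global weight,
    trains locally, and the server averages the results weighted by the
    number of local samples. Only weights travel, never the raw data."""
    total = sum(len(data) for data in devices)
    return sum(local_train(global_w, data) * len(data) for data in devices) / total


def make_device(n, rng, true_w=3.0):
    """Synthetic local dataset (hypothetical): n noiseless samples of y = true_w * x."""
    return [(x, true_w * x) for x in (rng.uniform(-1, 1) for _ in range(n))]


rng = random.Random(0)
devices = [make_device(n, rng) for n in (20, 50, 10)]

w = 0.0
for _ in range(20):  # 20 communication rounds
    w = federated_round(w, devices)
```

Because each device runs several local epochs between two communication rounds, the number of exchanges with the server stays small, which is what makes the scheme tolerant to high latency.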
## Federated, but not in the sense of OLKi
The term **federated deep learning** in the sense given above involves a central model, and every node is still working to update this central model.
In that sense, it fundamentally differs from the use of federated in the **federated OLKi platform**, where there is no central model
and every node has its own objectives, but still voluntarily shares knowledge with other nodes in its community.
---
title: "The 4 meanings of AI"
date: 2019-05-21T12:02:00+06:00
description : "AI Abuse"
type: post
image:
author: Christophe Cerisara
tags: ["project"]
draft: True
---
The term Artificial Intelligence has multiple meanings and it is important to distinguish them and understand their differences.
The four main meanings of AI are:
- The **historical meaning**: A long time ago, starting from 1980, AI was only a research domain, which aimed at mimicking within machines a few human cognitive processes, especially those related to
perception: speech recognition and vision, but also playing games like chess, interacting with the environment with robots and a few other ones.
There were no concrete applications at that time, because performance was not good enough, and researchers' interest was mostly targeted at increasing scientific knowledge.
A variety of methods were investigated: formal methods, for instance based on logics or grammars, but also many types of statistical approaches, including machine learning.
- The **deep learning meaning**: Starting from 2010, deep learning methods have brought impressive improvements in performance in the domain, largely surpassing those of formal methods and traditional
machine learning and making industrialisation of the domain possible. Modern scientists and companies usually refer to deep learning methods when they talk about AI; these methods have become so useful
that they are also used in application domains that have nothing to do with the original view of mimicking cognitive processes, but everything to do with processing huge amounts of data,
every kind of data: weather prediction, machine and industrial sensors, satellite, sonar and X-ray images, etc.
- The **GAFAM meaning**:
- The **science fiction meaning**:
......@@ -10,7 +10,62 @@ tags: ["project"]
OLKi is a research project. An outcome of the project will be a platform, which is:
- For **everybody**, to share resources of any kind (files, programs, datasets...), and to communicate. You stay in control of your resources, and you benefit from the federation. You don't need to install anything and may choose to be hosted, if you prefer.
<br>
<br>
It's not only for everybody, it's also for scientists, professionals and amateurs; it's
- <a href="../../detpage/socialnet/">A **social network** for scientists, and citizens as well</a>
- <a href="../../detpage/services/">A network that will support **services** for scientists</a>
- <a href="../../detpage/ethics/">A practical solution towards an **ethical** transformation of science</a>
----
The vision of OLKi is a federation for scientists, i.e. a decentralized network
potentially composed of many University servers, each one managing its own community.
As it implements the W3C ActivityPub standard, this scientific network
will also be part of the Fediverse, a community-managed multimodal social
network that hosts 2 million citizens, hence enabling direct interactions
between scientists and citizens.
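
To give an idea of what travels between federated nodes, here is a hedged sketch of an ActivityStreams 2.0 `Create` activity announcing a shared resource. The vocabulary (`@context`, `Create`, `Document`, the public addressing URI) comes from the W3C specifications; the actor and resource URLs are hypothetical, and how OLKi actually models its resources is not specified here.

```python
import json

# Special collection defined by ActivityPub for publicly addressed activities.
AS_PUBLIC = "https://www.w3.org/ns/activitystreams#Public"


def announce_resource(actor, name, url):
    """Build a Create activity wrapping a Document object (illustrative only)."""
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Create",
        "actor": actor,
        "to": [AS_PUBLIC],
        "object": {
            "type": "Document",
            "name": name,
            "url": url,
            "attributedTo": actor,
        },
    }


activity = announce_resource(
    actor="https://node.example.org/users/alice",  # hypothetical actor URL
    name="Example corpus",
    url="https://node.example.org/resources/42",   # hypothetical resource URL
)
payload = json.dumps(activity, indent=2)  # JSON body a node would deliver to followers' inboxes
```

Since any ActivityPub server can parse such an object, a post made on a scientific node can be followed from other parts of the Fediverse.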
Two services will initially be deployed on OLKi:
1- Resource sharing: every user may upload (under the control of the hosting
node) and share resources (datasets, papers, programs, models, videos...),
or import them from data repositories (such as ORTOLANG, Zenodo, CLARIN,
Dataverse, arXiv, HAL...) via OAI-PMH. Note that OLKi does not offer persistent
storage, and thus is not a data repository. OLKi is complementary to data
repositories: it makes their metadata visible on a global decentralized
social network to scientists and citizens; and it handles sharing short-term
resources.
2- Instantaneous scientific communication: beyond traditional conferences,
journals and emails, some scientists have expressed the need to communicate
more quickly on social media (Twitter, Reddit, Researchgate, Academia...).
OLKi provides open-source solutions for that (including math rendering, referencing
resources...) on a global federated social network already used by citizens,
while keeping all of their data under their control on their local node.
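
As an illustration of the metadata import mentioned in the first service, the sketch below builds an OAI-PMH `ListRecords` request and extracts Dublin Core titles from a response. The verb, the `oai_dc` metadata prefix and the XML namespaces are part of the OAI-PMH and Dublin Core specifications; the endpoint URL and the sample response are made up for the example, and this is not the OLKi implementation.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Dublin Core element namespace used inside oai_dc records.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the URL of a ListRecords request for a given OAI-PMH endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def extract_titles(xml_text):
    """Return every dc:title found in an OAI-PMH response document."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.iterfind(".//dc:title", NS)]


# Hand-written sample response (hypothetical), trimmed to the relevant elements.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example corpus</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("https://repo.example.org/oai")  # hypothetical endpoint
titles = extract_titles(SAMPLE)
```

Only metadata is harvested this way; the resource itself stays in the repository, which matches the statement that OLKi is not a data repository.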
Facilities to easily implement and deploy new services over the OLKi platform
will be provided; future services include:
- Bots for scientific watch and literature review
- Access to NLP tools API hosted on specialized nodes (ORTOLANG...)
- Federated deep learning to enable multi-task transfer learning of AI models between Universities
Compared to most existing solutions, the OLKi vision presents the following two main sets of advantages:
- As a decentralized solution, the costs are shared amongst participating
actors, there is no single point of failure, the network is resilient to
attacks, scalability and sustainability are unlimited, joining the network is
free and open without any control...
- As a federation, the platform is community-managed, transparent; the software
and APIs are open-source; hosting and connection **policies are defined per
node/University**; and the platform is ethical because there is no privileged
node that has more information or control than any other,
and data providers stay in control of their own data on their local node.
<img src="../../images/olkiplat.svg" width="80%" style="display:block; margin-left:auto; margin-right:auto"/>
......@@ -23,7 +23,7 @@ DATE | EVENT
7 jan 2019 | presentation of OLKi to the Comité des Projets INRIA Grand-Est
10 jan 2019 | meeting: comop OLKi
14 jan 2019 | presentation of OLKi at the ATILF laboratory
15 jan 2019 | interviews post-doc Virginie
15 jan 2019 | interviews post-doc
17 jan 2019 | presentation of OLKi to the lycée Poincaré Math. Sup. students
24 jan 2019 | presentation of OLKi to the Ecole des Mines students
29 jan 2019 | presentation of OLKi at the LORIA laboratory
......@@ -44,5 +44,20 @@ DATE | EVENT
26 apr 2019 | meeting: comop OLKi
29 apr 2019 | meeting: collaboration ANR w/ ORTOLANG
24 may 2019 | workshop: ethics
28 may 2019 | Angeliki Monnier talk
29 may 2019 | interviews PhD
03 jun 2019 | meeting with dir. doc & edition
05 jun 2019 | meeting: comop OLKi
13 jun 2019 | meeting: platform working group
18 jun 2019 | meeting with Meetup organisers
26 jun 2019 | meeting: comop OLKi
02 jul 2019 | meeting: platform working group
04 jul 2019 | meeting with Numerev
10 jul 2019 | meeting about the future of OLKi
09 aug 2019 | meeting with dir CLARIN
07 sep 2019 | ActivityPub conference in Prague
30 sep 2019 | OLKi at CLARIN conference Bazaar in Leipzig
18 oct 2019 | presentation of OLKi at Telecom hackathon
18 oct 2019 | presentation of OLKi at "nuit des 80 ans du CNRS"
</div>
---
title: "Etherpads"
date: 2019-05-21T12:02:00+06:00
description : "(Only for project members)"
type: post
image:
author: Christophe Cerisara
tags: ["project"]
---
List of our etherpads:
- main etherpad: http://etherpadlite.univ-lorraine.fr/p/y740q8h75l
---
title: "Workshop 19 Nov. 2019 in Metz"
type: portfolio
date: 2019-11-04T10:59:54+06:00
description : "CREM Workshop"
caption: 19th, November 2019, Metz, France
image: images/portfolio/journeeCREM.png
category: ["AI"]
client: CREM
submitDate: 19th Nov. 2019
location: Metz
liveLink: http://crem.univ-lorraine.fr/journalistic-practices-facing-computation-and-automationles-pratiques-journalistiques-face-la
---
### Journalistic Practices Facing Computation and Automation
Until recently, journalists were not concerned by automation threats. Like other creative practices, news writing was thought to profit from computers without running the risk of delegating too much of its value to them. However, advanced computing (or artificial intelligence) is beginning to attract the attention of professionals as well as scholars working on journalism. Algorithms are trained to write articles, built on raw information found on the Internet. Automated social media accounts (newsbots) operate automatic selections of news, contributing to shape their global circulation, and possibly creating echo chambers and communities of readers. Major experiments in automatic news selection, such as Google News, have created opportunities for new business models for the press, challenging the traditional ones. Search engines, especially when they assign importance to news based on digital social ties, influence access to information, and potentially create confining filter bubbles. All these phenomena, and more, require the attention of academia and professionals alike. This one-day workshop brings together researchers specialising in these issues to discuss the impacts of automation on journalistic practices, at the crossroads of sociological, technical and ethical considerations.
More information [here](http://crem.univ-lorraine.fr/journalistic-practices-facing-computation-and-automationles-pratiques-journalistiques-face-la)
......@@ -20,4 +20,5 @@ The most famous "part" of the fediverse is Mastodon, a micro-blogging server con
Our objective is to add another "modality"/"part" into the fediverse dedicated to scientific resources, such as
linguistic corpora, research papers, scientific videos, software tools...
See the current status of this OLKi platform [here](https://olki.loria.fr/platform).
......@@ -11,9 +11,14 @@ submitDate: 2019
location: Nancy
---
The OLKi project joins with the GDR LIFT to propose the
The OLKi project has joined with the GDR LIFT to propose the
**[Python4NLP Summer school](https://synalp.loria.fr/python4nlp/)**
that will take place during the last week of August 2019, in Nancy, France.
that has taken place during the last week of August 2019, in Nancy, France.
[Join us here !](https://synalp.loria.fr/python4nlp/)
The week was dedicated to teaching and experimenting with various aspects of
textual corpus processing, from scraping to computing word embeddings.
We further enjoyed two engaging and high-quality invited talks, which greatly
contributed to the success of the summer school.
Thank you very much to everyone who joined or helped us during this week!
......@@ -4,4 +4,11 @@
# rsync -av --delete -e "ssh -i /home/gitlab-runner/.ssh/id_rsa" public/ xtof@olkihost.loria.fr:/var/www/html/website/
rsync -av --delete public/ olkihost:/var/www/html/website/
rsync -av xtof/python4nlp.php olkihost:/var/www/html/website/
rm -f pp
touch pp
echo 'AuthType Basic' >> pp
echo 'AuthName "Restricted Content"' >> pp
echo 'AuthUserFile /etc/apache2/.htpasswd' >> pp
echo 'Require valid-user' >> pp
rsync -av pp olkihost:/var/www/html/website/intranet/.htaccess
......@@ -8,7 +8,6 @@
<div class="block">
<h2>{{ .Title }}</h2>
<div class="portfolio-meta">
<span>{{ .Date.Format "2006-01-02" }}</span>|
<span> Category: {{ delimit .Params.category ", " }}</span>|
<span> website:
<a href="{{ .Params.liveLink }}">{{ .Params.liveLink }}</a>
......