Issue when loading two datasets
I tried loading two datasets (one XML, one CSV) one after the other. The first loaded without error, but the second produced errors in the UI, and the log in catalina.out stops partway through the second import:
10:13:15,394 INFO ConnectionLens:1217 - Computing CL tail.
10:13:15,394 INFO PostgresFullTextIndex:123 - creating index on nodes using normalabel.
10:13:15,407 INFO ConnectionLens:1250 - Computing stats.
10:13:15,551 INFO ConnectionLens:1253 - Done with graph statistic computations: 144
10:13:15,551 INFO ConnectionLens:1257 - Persisting extraction contexts.
10:13:15,624 INFO ConnectionLens:1262 - Creating Normalised Graph
10:13:15,625 INFO ExtractionPolicy:99 - 218 extraction rules.
10:13:15,626 INFO Normalization:48 - Creating the normalized graph.
10:13:15,709 INFO Normalization:156 - Created contexts table: CREATE TABLE norm_extraction_contexts AS SELECT * FROM extraction_contexts
10:13:15,712 INFO RelationalGraph:635 - Indexing model: POSTGRES_FULLTEXT
10:13:15,712 INFO PostgresFullTextIndex:107 - max_matches_per_kwd: -1
10:13:15,712 INFO ConnectionLens:1264 - Creating Normalised Graph Stats
10:13:15,896 INFO PostgresFullTextIndex:123 - creating index on norm_nodes using normalabel.
10:13:15,904 INFO ConnectionLens:302 - Loaded 1 data sources
10:13:15,904 INFO ImportServlet:143 - [XML(1,file:/Users/mmohanty/Work/connection-studio/Softwares/apache-tomcat-9.0.54/webapps/webservices/WEB-INF/uploads/hatvp-small.xml)]
10:13:15,904 INFO ImportServlet:146 - Filehatvp-small.xmlsuccessfully added
10:13:50,761 INFO SaveParametersServlet:141 - Current graph value of search_stopper_topk is: -1
10:13:50,762 INFO SaveParametersServlet:150 - Save config: {"extract_policy":""} to db, and update CL. Current extractor is: Cache(gptNER)
10:13:50,762 INFO SaveParametersServlet:154 - search_stopper_topk value updated is : -1
10:14:12,008 INFO ImportServlet:56 - Starting import
10:14:12,010 INFO ImportServlet:168 - Save import locally: /Users/mmohanty/Work/connection-studio/Softwares/apache-tomcat-9.0.54/webapps/webservices/WEB-INF/uploads/Cac40.csv
10:14:12,010 INFO ImportServlet:125 - will start to register
10:14:12,010 INFO ImportServlet:128 - will register source file:/Users/mmohanty/Work/connection-studio/Softwares/apache-tomcat-9.0.54/webapps/webservices/WEB-INF/uploads/Cac40.csv
10:14:12,010 INFO ConnectionLens:295 - Going into source registration
10:14:12,011 INFO ConnectionLens:405 - Drop CL tail.
10:14:12,047 INFO RelationalGraph:2395 - Drop tables norm_nodes, norm_edges, norm_specificity
10:14:12,096 INFO RelationalGraph:2415 - Create tables: norm_nodes, norm_edges, weak_same_as, norm_specificity
When I loaded the CSV directly with CL, using Studio's local.settings, it worked fine:
mmohanty@MAC-11107417 connection-lens % java -jar core/target/connection-lens-core-full-3.6-SNAPSHOT-develop-8748b6d-20240704-1027.jar -n -i /Users/mmohanty/Desktop/Cac40.csv -c /Users/mmohanty/Desktop/local.settings -DRDBMS_DBName=cl_hatvp_bkup
Overriding cl_default with cl_hatvp_bkup
2024-07-04 10:31:56 INFO SimilarPairProcessor:35 - Registering comparators: URIPairProcessor PersonPairProcessor LocationPairProcessor OrganizationPairProcessor DatePairProcessor EmailPairProcessor HashTagPairProcessor
2024-07-04 10:31:56 WARN ConnectionLens:244 - ERROR: database "cl_hatvp_bkup" already exists
2024-07-04 10:31:56 INFO FlairNERExtractor:53 - This is the Flair Named Entity Extractor (NER), locale is: fr
2024-07-04 10:31:56 INFO PythonUtils:76 - Python installation path: /Users/mmohanty/Work/connection-studio/connection-studio-dir/cs_env/bin/python
2024-07-04 10:31:56 INFO PythonUtils:77 - Running the script: /Users/mmohanty/Work/connection-studio/connection-studio-dir/scripts/Flair_NER_tool/flask_Flair_NER.py
2024-07-04 10:31:56 INFO PythonUtils:91 - Starting french Flair NER service on port 5001 (re-check every 2000 ms until 150000 ms)...
2024-07-04 10:31:56 INFO PythonUtils:94 - /Users/mmohanty/Work/connection-studio/connection-studio-dir/cs_env/bin/python /Users/mmohanty/Work/connection-studio/connection-studio-dir/scripts/Flair_NER_tool/flask_Flair_NER.py -l french -d /Users/mmohanty/Work/connection-studio/connection-studio-dir/scripts -bs 1 -p 5001
2024-07-04 10:32:34 INFO PythonUtils:110 - Started Flair in 38000 ms.
2024-07-04 10:32:34 INFO NERExtractorFamily:205 - Batch size: 1
2024-07-04 10:32:34 INFO CacheExtractor:82 - Created Cache(flairNER) of size 100000 in memory and with unbounded cache on disk, locale: en
2024-07-04 10:32:34 INFO ConnectionLens:180 - Creating graph from pre-existing catalog
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
2024-07-04 10:32:34 INFO RelationalGraph:635 - Indexing model: POSTGRES_FULLTEXT
2024-07-04 10:32:34 INFO PostgresFullTextIndex:107 - max_matches_per_kwd: -1
2024-07-04 10:32:34 INFO RelationalGraph:635 - Indexing model: POSTGRES_FULLTEXT
2024-07-04 10:32:34 INFO PostgresFullTextIndex:107 - max_matches_per_kwd: -1
2024-07-04 10:32:34 INFO Experiment:293 - Registration test starts: 38507
2024-07-04 10:32:34 INFO Experiment:294 - PER_INSTANCE value creation, PER_GRAPH entity creation
2024-07-04 10:32:34 INFO ConnectionLens:295 - Going into source registration
2024-07-04 10:32:34 INFO ConnectionLens:405 - Drop CL tail.
2024-07-04 10:32:34 INFO RelationalGraph:2395 - Drop tables norm_nodes, norm_edges, norm_specificity
2024-07-04 10:32:34 INFO RelationalGraph:2415 - Create tables: norm_nodes, norm_edges, weak_same_as, norm_specificity
2024-07-04 10:32:55 INFO ConnectionLens:1217 - Computing CL tail.
2024-07-04 10:32:55 INFO PostgresFullTextIndex:123 - creating index on nodes using normalabel.
2024-07-04 10:32:55 INFO ConnectionLens:1250 - Computing stats.
2024-07-04 10:32:55 INFO ConnectionLens:1253 - Done with graph statistic computations: 330
2024-07-04 10:32:55 INFO ConnectionLens:1257 - Persisting extraction contexts.
2024-07-04 10:32:55 INFO ConnectionLens:1262 - Creating Normalised Graph
2024-07-04 10:32:55 INFO Normalization:48 - Creating the normalized graph.
2024-07-04 10:32:55 INFO Normalization:156 - Created contexts table: CREATE TABLE norm_extraction_contexts AS SELECT * FROM extraction_contexts
2024-07-04 10:32:55 INFO RelationalGraph:635 - Indexing model: POSTGRES_FULLTEXT
2024-07-04 10:32:55 INFO PostgresFullTextIndex:107 - max_matches_per_kwd: -1
2024-07-04 10:32:55 INFO ConnectionLens:1264 - Creating Normalised Graph Stats
2024-07-04 10:32:56 INFO PostgresFullTextIndex:123 - creating index on norm_nodes using normalabel.
2024-07-04 10:32:56 INFO ConnectionLens:302 - Loaded 1 data sources
2024-07-04 10:32:56 INFO Experiment:302 - Only reporting and possible drawing left: 60166
2024-07-04 10:32:56 INFO GraphPrintingByRepresentative:74 - Drawing in /Users/mmohanty/Work/connection-studio/connection-studio-dir/tmp
2024-07-04 10:32:56 INFO Experiment:324 - Actual total registration time: 21659
2024-07-04 10:32:56 INFO CacheExtractor:225 - Flush memory cache
2024-07-04 10:32:56 INFO ConnectionLens:1134 - Closing ConnectionLens for cl_hatvp_bkup
Total time: 60198 ms.
2024-07-04 10:32:56 INFO PythonUtils:194 - Stopping Flair extractor running on port 5001...
This needs further investigation to determine whether there is indeed a problem, and to ensure that Studio can load two datasets one after the other.
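
As a possible starting point for the investigation, here is a minimal Python sketch that scans a log such as catalina.out for registrations that start but never finish. It assumes, based only on the log lines quoted above, that "Going into source registration" marks the start of a registration and "Loaded N data sources" marks its end; this is not documented CL behaviour, and the helper name find_unfinished_registrations is hypothetical.

```python
import re

# Markers copied from the log lines quoted in this issue. Treating them as the
# start and end of a registration is an assumption, not documented behaviour.
START = re.compile(r"ConnectionLens:\d+ - Going into source registration")
END = re.compile(r"ConnectionLens:\d+ - Loaded \d+ data sources")


def find_unfinished_registrations(lines):
    """Return (line_number, text) for each registration start with no matching end."""
    unfinished = []
    pending = None  # most recent start marker that has not been closed yet
    for number, line in enumerate(lines, start=1):
        if START.search(line):
            if pending is not None:
                # A new registration began before the previous one finished.
                unfinished.append(pending)
            pending = (number, line.rstrip("\n"))
        elif END.search(line):
            pending = None
    if pending is not None:
        # The log ended while a registration was still open.
        unfinished.append(pending)
    return unfinished
```

To use it, pass an open file object (for example open("catalina.out", errors="replace")) or any list of log lines. On the Studio log above it should flag the registration started at 10:14:12,010, since no "Loaded ... data sources" line follows it.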