Commit fb15ed78 authored by Vidjil Team's avatar Vidjil Team
Browse files

vidjil_utils: extract_fields_from_json can now take a partial JSON in input.

JSON files produced by Vidjil can be huge (dozens or hundreds of megabytes) and we
do not want to load them all. We just load a short prefix of the file.
Since the json module expects a well-formed JSON string we use the
cleanup_json_sample to close the JSON input
parent 6aa3e5f1
......@@ -252,7 +252,7 @@ def extract_value_from_json_path(json_path, json):
return elem
def extract_fields_from_json(json_fields, pos_in_list, filename):
def extract_fields_from_json(json_fields, pos_in_list, filename, max_bytes = None):
Takes a map of JSON fields (the key is a common name
and the value is a path) and return a similar map
......@@ -263,10 +263,14 @@ def extract_fields_from_json(json_fields, pos_in_list, filename):
get all of them)
json_dict = json.loads(open(filename).read())
if max_bytes is None:
json_dict = json.loads(cleanup_json_sample(open(filename).read()))
json_dict = json.loads(cleanup_json_sample(open(filename).read(max_bytes)))
except IOError:
json_dict = {}
except ValueError as e:
matched_keys = {}
for field in json_fields:
value = extract_value_from_json_path(json_fields[field], json_dict)
Supports Markdown
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment