In the last session we learned about Flux modules. Flux modules can do a lot of things. They configure the “high-level” transformation pipeline.
But the main transformation of incoming data at record, element and value level is usually done by the transformation modules Fix or Morph as one step in the pipeline.
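By transformation we mean changes to the data itself, such as filtering, renaming, or manipulating records, elements and values.
Changing the serialization is not a transformation in this sense; that is part of encoding and decoding.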
In this tutorial we focus on Fix. If you want to learn about Morph have a look at this presentation and the great documentation by Swiss Bib.
So let’s dive into Metafacture Fix and get back to the Playground.
Clear it if needed and paste the following Flux in the Flux-File area.
"https://openlibrary.org/books/OL2838758M.json"
| open-http
| as-lines
| decode-json
| fix ("retain('title')")
| encode-yaml
| print
;
You should end up with something like:
---
title: "Ordinary vices"
The fix module in Metafacture is used to manipulate the input data, here by filtering for the fields we would like to see. Only one Fix function was used: retain, which throws away all the data from the input except the stated "title" field. Normally all incoming data is passed through, unless it is somehow manipulated or a retain function is used.
HINT: As long as you embed the Fix functions in the Flux workflow, you have to use double quotes to fence the Fix functions,
and single quotes inside the Fix functions. As we did here: fix ("retain('title')")
Now let us additionally keep the info that is given in the element "publish_date", the subfield "value" in "notes" and the subfield "key" in "type" by adding 'publish_date', 'notes.value', 'type.key' to retain:
"https://openlibrary.org/books/OL2838758M.json"
| open-http
| as-lines
| decode-json
| fix ("retain('title', 'publish_date', 'notes.value', 'type.key')")
| encode-yaml
| print
;
You should now see something like this:
---
title: "Ordinary vices"
publish_date: "1984"
notes:
  value: "Bibliography: p. 251-260.\nIncludes index."
When manipulating data you often need to create many fixes to process a data file into the format and structure you need. With a text editor you can write all Fix functions in a single separate Fix file.
The Playground has a transformationFile-content area that can be used as if the Fix were in a separate file.
In the Playground we use the variable transformationFile to address the Fix file.
Like this.
Fix:
retain("title", "publish_date", "notes.value", "type.key")
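On the Flux side, the inline Fix string is then replaced by the variable. A minimal sketch, assuming the workflow from above is otherwise unchanged and the Fix lines sit in the transformationFile-content area:

```
"https://openlibrary.org/books/OL2838758M.json"
| open-http
| as-lines
| decode-json
| fix (transformationFile)
| encode-yaml
| print
;
```

Because the Fix no longer lives inside a Flux string, double quotes can be used inside the Fix functions themselves.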
Using a separate Fix file is recommended if you need to write many Fix functions. It will keep the Flux workflow clear and legible.
To add more fixes we can again edit the Fix file. Let's add this line in front of the retain function:
move_field("type.key", "pub_type")
Also change the retain function so that you keep the new element "pub_type" instead of the nested "key" element, which no longer exists.
move_field("type.key","pub_type")
retain("title", "publish_date", "notes.value", "pub_type")
The output should be something like this:
---
title: "Ordinary vices"
publish_date: "1984"
pub_type: "/type/edition"
notes:
  value: "Bibliography: p. 251-260.\nIncludes index."
With move_field we moved and renamed an existing element.
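If the original nested element should stay in place, a copy can be made instead of a move. A minimal sketch, assuming the Fix function copy_field from the Fix function documentation (it is not used elsewhere in this lesson) takes the same source and target arguments as move_field:

```
# Copy type.key to a new top-level element and keep the original.
copy_field("type.key", "pub_type")
# Keep both the original nested element and the copy.
retain("title", "publish_date", "type.key", "pub_type")
```

For the rest of this lesson we stay with move_field, since we only need the new top-level element.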
As a next step, add the following function before the retain function.
replace_all("pub_type","/type/","")
If you execute your last workflow with the “Process” button again, you should now see this output:
---
title: "Ordinary vices"
publish_date: "1984"
pub_type: "edition"
notes:
  value: "Bibliography: p. 251-260.\nIncludes index."
We cleaned up the value of the "pub_type" element for better readability.
See the example in the playground.
Metafacture contains many Fix functions to manipulate data. Also there are many Flux commands/modules that can be used.
Check the documentation to get a complete list of Flux commands and Fix functions. This post only presented a short introduction into Metafacture. In the next posts we will go deeper into its capabilities.
Besides Fix functions you can also add as many comments and linebreaks as you want to a Fix.
Adding comments will save you a lot of time and effort when you look at your code in the future.
Comments in Fix start with a hash mark #, while in Flux they start with //.
Example:
# Make type.key a top level element.
move_field("type.key","pub_type")
# Clean the value of `pub_type`
replace_all("pub_type","/type/","")
# Keep only specific elements.
retain("title", "publish_date", "pub_type")
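Flux comments work the same way, only with // instead of #. A sketch of the workflow from this lesson with comments added, assuming the Fix above sits in the transformationFile area:

```
// Fetch one record from Open Library.
"https://openlibrary.org/books/OL2838758M.json"
| open-http
| as-lines
| decode-json
// Apply the Fix from the transformationFile area.
| fix (transformationFile)
| encode-yaml
| print
;
```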
1) Additionally keep the "by_statement". Hint: Add something to retain.
2) Add a field with today's date called "map_date".
Have a look at the Fix functions: https://metafacture.org/metafacture-documentation/docs/fix/Fix-functions.html (Hint: you could use add_field or timestamp. And don’t forget to add the new element to retain.)
Next lesson: 04 Fix Path