Develop your Node
A node is a directory with a unique <node_id> that adheres to the following minimal structure:
- The code folder contains the node's source code, with main.py serving as the entry point.
- The data folder stores output results generated by the node.
- The logs.txt file records execution logs.
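The layout above can be checked with a few lines of Python. This is a minimal sketch, not part of fusionpipe: the helper name missing_node_entries is hypothetical, and it only assumes the three entries listed above.

```python
from pathlib import Path


def missing_node_entries(node_dir):
    """Return the expected entries that are absent from a node directory."""
    node_dir = Path(node_dir)
    # Layout taken from the list above: code/main.py, data/, logs.txt.
    expected = {
        "code": (node_dir / "code").is_dir(),
        "code/main.py": (node_dir / "code" / "main.py").is_file(),
        "data": (node_dir / "data").is_dir(),
        "logs.txt": (node_dir / "logs.txt").is_file(),
    }
    return [name for name, present in expected.items() if not present]
```

Calling missing_node_entries with the path of a node returns an empty list when every expected entry is present.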
The main.py script is executed by the pipeline when the node runs. It can access data from parent nodes and may include calls to Python scripts, Jupyter notebooks, MATLAB scripts, or other executable code.
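As an illustration of main.py delegating to another script, the sketch below runs a Python file in a child process. It is one possible approach under stated assumptions, not the template that fusionpipe generates: the helper name run_python_script is hypothetical, and notebooks or MATLAB scripts would need a different launcher.

```python
import subprocess
import sys
from pathlib import Path


def run_python_script(script_path):
    """Run a Python script with the interpreter that is executing this file."""
    # Resolve first so the path stays valid after the working directory changes.
    script_path = Path(script_path).resolve()
    # check=True raises CalledProcessError when the script exits with a
    # non-zero code, so a failing step also fails the caller.
    subprocess.run(
        [sys.executable, str(script_path)],
        check=True,
        cwd=script_path.parent,
    )
```

Inside main.py, a call such as run_python_script(Path(__file__).parent / "analysis.py") would run a script stored next to it; analysis.py is a made-up file name.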
Initial Setup
When a node is created from the UI, a dedicated Python virtual environment is automatically created using uv. This environment is located in a .venv folder inside your node's directory.
To work on your node's code, you first need to navigate to its directory and activate the virtual environment.
- Navigate to the node folder: You can copy the path from the GUI, then change to it in your terminal with cd.
- Activate the virtual environment: From within the node's directory, activate the environment in the .venv folder (on Linux or macOS, typically with source .venv/bin/activate).
Your terminal prompt should now indicate that you are in the virtual environment.
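If the prompt is ambiguous, the interpreter itself can confirm whether a virtual environment is active. The sketch below relies only on the standard library; the function name in_virtualenv is chosen here for illustration.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a virtual environment, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix
```

Printing sys.executable alongside the result shows which interpreter is in use; after activation it should point inside the node's .venv folder.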
Development Workflow
Here’s a typical workflow for developing the logic for your node:
- Develop your code: You can write your analysis, simulation, or machine learning code in a Jupyter notebook, a Python script, or even a MATLAB script. You can find examples in the examples directory of the project.
- Integrate with main.py: The main.py file is the entry point for your node's execution within the pipeline. You need to modify it to call the code you developed. The file already contains examples of how to call different types of scripts.
- Test your node locally: Before running the node as part of the full pipeline, you can test it in isolation. From your node's code directory, run main.py with the node's dedicated virtual environment, which simulates how the pipeline will execute it. Make sure you have all the necessary dependencies installed in the virtual environment.
- Run the node from the pipeline: Once you are satisfied with your local tests, you can run the node from the pipeline's user interface. This will execute the node in the correct order based on its dependencies.
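The local test step can also be scripted. The sketch below is an assumption-laden illustration rather than the command the pipeline uses: it assumes the .venv folder sits directly in the node directory with the standard virtual environment layout, and the helper names venv_python and run_main_locally are hypothetical.

```python
import os
import subprocess
from pathlib import Path


def venv_python(node_dir):
    """Return the interpreter path inside the node's .venv folder."""
    venv = Path(node_dir) / ".venv"
    # Standard layout: Scripts\python.exe on Windows, bin/python elsewhere.
    if os.name == "nt":
        return venv / "Scripts" / "python.exe"
    return venv / "bin" / "python"


def run_main_locally(node_dir):
    """Run code/main.py with the node's own interpreter and return its exit code."""
    # Resolve the node directory only; resolving the interpreter symlink
    # would bypass the virtual environment.
    node_dir = Path(node_dir).resolve()
    result = subprocess.run(
        [str(venv_python(node_dir)), "main.py"],
        cwd=node_dir / "code",
    )
    return result.returncode
```

A return value of 0 from run_main_locally means main.py finished without error in the node's environment.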
User API
To access data from other nodes or manage the current node's data, fusionpipe provides a simple API. Here are the main functions you can use in your scripts:
- get_node_id(): Retrieves the ID of the current node. The ID follows the format n_<datetime>_<random_4digit_integers>.
- get_all_parent_node_folder_paths(node_id): Returns a list of folder paths for all parent nodes of the specified node. This is how you access the output data from the nodes that run before yours.
- get_folder_path_node(): Retrieves the folder path of the current node. This is useful for saving your node's output to its data subfolder.
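The three functions above are typically combined to read parent outputs and write new results. The sketch below shows only the path handling around them and does not import fusionpipe, because the module path of these functions is not stated here. The helpers find_parent_outputs and output_file are hypothetical, and the sketch assumes that each returned folder path is a node directory whose outputs live in its data subfolder.

```python
from pathlib import Path


def find_parent_outputs(parent_folders, pattern="*"):
    """List files matching `pattern` in the data subfolder of each parent node."""
    matches = []
    for folder in parent_folders:
        data_dir = Path(folder) / "data"
        # Skip parents that have not produced a data folder.
        if data_dir.is_dir():
            matches.extend(sorted(p for p in data_dir.glob(pattern) if p.is_file()))
    return matches


def output_file(node_folder, filename):
    """Return a path inside the node's data subfolder, creating the folder if needed."""
    data_dir = Path(node_folder) / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    return data_dir / filename


# Assumed wiring inside main.py, once the three functions are imported from fusionpipe:
#   parents = get_all_parent_node_folder_paths(get_node_id())
#   inputs = find_parent_outputs(parents, "*.csv")
#   target = output_file(get_folder_path_node(), "result.csv")
```

Keeping the path logic in plain functions like these makes it possible to exercise them outside the pipeline with temporary folders.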
Using Jupyter
If you prefer to develop using Jupyter Notebook or JupyterLab, you can set up a dedicated kernel for your node. This ensures that your notebook uses the same environment and dependencies as the pipeline.
To create and set up the Jupyter kernel, navigate to your node's code directory and install a kernel from the node's virtual environment.
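One common way to register a kernel is the ipykernel install command. The sketch below only builds that command. It is not necessarily the command fusionpipe documents, it assumes ipykernel is installed in the node's environment, and the helper name kernel_install_command and the display-name format are made up for illustration.

```python
import sys


def kernel_install_command(node_id, python=sys.executable):
    """Build an ipykernel command that registers a kernel named after the node."""
    return [
        python, "-m", "ipykernel", "install",
        "--user",                              # install into the per-user kernel directory
        "--name", node_id,                     # internal kernel name
        "--display-name", f"node {node_id}",   # label shown in the Jupyter UI
    ]
```

Passing the returned list to subprocess.run with check=True, while the node's virtual environment is active, registers the kernel so it can be selected from Jupyter Notebook or JupyterLab.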
For more advanced topics, such as developing a node in conjunction with a custom Python package, see the Best practices for pipeline and package development.