API Reference

SciTeX Notebook — Jupyter notebook verification and compilation.

Provides tools to verify, compile, convert, and check Jupyter notebooks for reproducibility using the Clew verification system.

Key Concept: Notebooks can be executed in any cell order. SciTeX records actual execution order via timestamps, then reconstructs the dependency DAG afterward (“do what you want, organize later”).

Examples

>>> from scitex_notebook import verify_notebook, compile_notebook
>>> results = verify_notebook("experiment.ipynb")
>>> compiled = compile_notebook("experiment.ipynb")
>>> print(compiled.to_mermaid())  # DAG visualization
>>> print(compiled.to_script())   # DAG-ordered .py
class scitex_notebook.CompiledNotebook(notebook_path: str, execution_order: List[str] = <factory>, dag: Dict[str, List[str]] = <factory>, runs: List[Dict] = <factory>)

Bases: object

Result of compiling a notebook’s execution history.

notebook_path

Path to the source notebook.

Type:

str

execution_order

Session IDs in actual execution order (by timestamp).

Type:

list of str

dag

Adjacency list: {session_id: [dependent_session_ids]}.

Type:

dict

runs

Run records sorted by execution time.

Type:

list of dict

dag: Dict[str, List[str]]
execution_order: List[str]
notebook_path: str
runs: List[Dict]
to_mermaid() -> str

Generate a Mermaid DAG diagram of execution flow.
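
As a rough illustration of how an adjacency list of the form {session_id: [dependent_session_ids]} can be rendered as Mermaid text, here is a minimal sketch. The helper name adjacency_to_mermaid and the example session IDs are hypothetical, and the actual output of to_mermaid() may be laid out differently.

```python
from typing import Dict, List


def adjacency_to_mermaid(dag: Dict[str, List[str]]) -> str:
    """Render {node: [dependents]} as a Mermaid flowchart (illustrative only)."""
    lines = ["graph TD"]
    for node, dependents in dag.items():
        if not dependents:
            # Declare nodes without outgoing edges so they still appear.
            lines.append(f"    {node}")
        for dep in dependents:
            lines.append(f"    {node} --> {dep}")
    return "\n".join(lines)


# Hypothetical session IDs; real IDs may need quoting to be valid Mermaid.
example_dag = {"load": ["clean"], "clean": ["plot"], "plot": []}
mermaid_text = adjacency_to_mermaid(example_dag)
print(mermaid_text)
```

Session IDs that contain spaces or punctuation would need to be escaped or mapped to safe node names before being written into a diagram.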

to_script() -> str

Generate a .py script with sessions in DAG order.
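
"DAG order" here presumably means a topological order, in which every session appears after the sessions it depends on. The sketch below shows one way to derive such an order from the {session_id: [dependent_session_ids]} adjacency list using the standard library (Python 3.9+). The function name dag_order is hypothetical, and this is not necessarily how to_script() orders its output.

```python
from graphlib import TopologicalSorter
from typing import Dict, List


def dag_order(dag: Dict[str, List[str]]) -> List[str]:
    """Return nodes so that each one comes after everything it depends on."""
    sorter = TopologicalSorter()
    for node, dependents in dag.items():
        sorter.add(node)
        for dep in dependents:
            # TopologicalSorter.add(node, *predecessors): dep depends on node.
            sorter.add(dep, node)
    return list(sorter.static_order())


# Hypothetical diamond-shaped history: a feeds b and c, which both feed d.
diamond = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
order = dag_order(diamond)
print(order)
```

The adjacency list points from a session to its dependents, while TopologicalSorter expects predecessors, so each edge is added in the reverse direction.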

scitex_notebook.check(path: str | Path) -> List[Dict]

Find cells with scitex.io calls not wrapped in @scitex.session.

Parameters:

path (str or Path) – Path to the .ipynb file.

Returns:

One dict per cell with untracked IO, with keys index, has_load, has_save, and has_session.

Return type:

list of dict
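
The kind of scan described above can be approximated with a few regular expressions over each cell's source. This is a minimal sketch under stated assumptions: the patterns, the function name find_untracked_io, and the input cell shape (dicts with index and source keys) are all hypothetical, and the library's own detection rules may differ.

```python
import re
from typing import Dict, List

# Hypothetical patterns; the real check may recognise other spellings.
LOAD_RE = re.compile(r"\b(?:stx|scitex)\.io\.load\s*\(")
SAVE_RE = re.compile(r"\b(?:stx|scitex)\.io\.save\s*\(")
SESSION_RE = re.compile(r"@(?:stx|scitex)\.session\b")


def find_untracked_io(cells: List[Dict]) -> List[Dict]:
    """Flag cells that call load/save but carry no session decorator."""
    flagged = []
    for cell in cells:
        source = cell["source"]
        has_load = bool(LOAD_RE.search(source))
        has_save = bool(SAVE_RE.search(source))
        has_session = bool(SESSION_RE.search(source))
        if (has_load or has_save) and not has_session:
            flagged.append(
                {
                    "index": cell["index"],
                    "has_load": has_load,
                    "has_save": has_save,
                    "has_session": has_session,
                }
            )
    return flagged


example_cells = [
    {"index": 0, "source": 'df = stx.io.load("raw.csv")'},
    {"index": 1, "source": '@stx.session\ndef main():\n    stx.io.save(df, "out.csv")'},
    {"index": 2, "source": "print('no io here')"},
]
untracked = find_untracked_io(example_cells)
print(untracked)
```

A text scan like this cannot tell whether a call sits inside a comment or string literal; an AST-based pass would be stricter.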

scitex_notebook.compile(path: str | Path, *, db=None) -> CompiledNotebook

Compile a notebook’s execution history into a DAG.

Queries the clew DB for all sessions associated with this notebook, sorts by timestamp, and builds a dependency DAG based on shared input/output files.

Parameters:
  • path (str or Path) – Path to the .ipynb file.

  • db (optional) – Pre-resolved clew DB handle. If omitted, the default DB from scitex_clew.get_db() is used. Exposed for testability so tests can inject a hand-rolled fake without patching internals.

Returns:

Compiled execution history with DAG and execution order.

Return type:

CompiledNotebook
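
To make the "sort by timestamp, then link sessions through shared input/output files" idea concrete, here is a minimal sketch. The run-record fields started_at, inputs, and outputs are assumptions made for illustration (this reference does not specify the record schema), and build_dag is a hypothetical helper, not the library's implementation.

```python
from typing import Dict, List


def build_dag(runs: List[Dict]) -> Dict[str, List[str]]:
    """Link each run to later runs that read a file it wrote."""
    ordered = sorted(runs, key=lambda r: r["started_at"])
    dag: Dict[str, List[str]] = {r["session_id"]: [] for r in ordered}
    last_writer: Dict[str, str] = {}  # file path -> most recent writer
    for run in ordered:
        sid = run["session_id"]
        for path in run["inputs"]:
            producer = last_writer.get(path)
            if producer is not None and producer != sid and sid not in dag[producer]:
                dag[producer].append(sid)
        for path in run["outputs"]:
            last_writer[path] = sid
    return dag


# Hypothetical records, deliberately listed out of execution order.
example_runs = [
    {"session_id": "plot", "started_at": 3.0, "inputs": ["clean.csv"], "outputs": ["fig.png"]},
    {"session_id": "load", "started_at": 1.0, "inputs": [], "outputs": ["raw.csv"]},
    {"session_id": "clean", "started_at": 2.0, "inputs": ["raw.csv"], "outputs": ["clean.csv"]},
]
example_dag = build_dag(example_runs)
execution_order = list(example_dag)
print(execution_order)
print(example_dag)
```

Because runs are processed in timestamp order, an edge can only point from an earlier run to a later one, so the result is acyclic by construction.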

scitex_notebook.convert(path: str | Path, output: str | Path | None = None, order: str = 'cell', mode: str = 'per_cell', *, db=None) -> str

Convert a .ipynb notebook to a .py script with @stx.session.

Parameters:
  • path (str or Path) – Path to the .ipynb file.

  • output (str or Path, optional) – Output .py file path. If None, returns string only.

  • order (str) – Cell ordering: "cell" (notebook order) or "dag" (execution order from clew DB timestamps).

  • mode (str) – Conversion mode:

    • "per_cell": Each code cell becomes a separate @stx.session function (default).

    • "unified": All cells merged into a single @stx.session main() function. Markdown cells become comments, imports are hoisted, and common notebook patterns (plt.show, pd.read_csv, etc.) are converted to SciTeX equivalents (stx.io.save/load).

  • db (optional) – Pre-resolved clew DB handle, used only when order='dag'. Exposed for testability so tests can inject a hand-rolled fake without patching internals.

Returns:

The generated Python script content.

Return type:

str
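
The "per_cell" mode can be pictured as wrapping each code cell's source in its own decorated function. The sketch below only illustrates that idea: the generated function names, the `import scitex as stx` line, and the exact layout are guesses, since this reference does not specify the shape of the generated script.

```python
import textwrap
from typing import Dict, List


def per_cell_script(cells: List[Dict]) -> str:
    """Wrap each code cell in its own decorated function (illustrative)."""
    parts = ["import scitex as stx  # assumed import alias", ""]
    for cell in cells:
        # An empty cell still needs a body to be valid Python.
        body = textwrap.indent(cell["source"].rstrip() or "pass", "    ")
        parts.append("@stx.session")
        parts.append(f"def cell_{cell['index']}():  # hypothetical naming scheme")
        parts.append(body)
        parts.append("")
    return "\n".join(parts)


example_cells = [
    {"index": 0, "source": "x = 1\nprint(x)\n"},
    {"index": 1, "source": ""},
]
script_text = per_cell_script(example_cells)
print(script_text)
```

Note that variables defined in one cell are not visible in another once each cell becomes a separate function; how the real converter handles state shared between cells is not covered here.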

scitex_notebook.get_code_cells(path: str | Path) -> List[Dict]

Parse notebook and return only code cells.

Parameters:

path (str or Path) – Path to the .ipynb file.

Returns:

Code cells only.

Return type:

list of dict

scitex_notebook.get_notebook_name(path: str | Path) -> str

Return the notebook stem name without extension.

scitex_notebook.parse_notebook(path: str | Path) -> List[Dict]

Parse a .ipynb file and extract code cells.

Parameters:

path (str or Path) – Path to the .ipynb file.

Returns:

Code cells with keys: index, source, cell_id, cell_type.

Return type:

list of dict
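
Since a .ipynb file is plain JSON, the parsing described above needs nothing beyond the standard library. The following sketch reads a notebook and returns code cells with the four documented keys. The function name parse_code_cells is hypothetical, and two details are assumptions: that index is the cell's position in the whole notebook, and that cell_id comes from the nbformat "id" field.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Union


def parse_code_cells(path: Union[str, Path]) -> List[Dict]:
    """Illustrative stand-in for notebook parsing with stdlib json."""
    notebook = json.loads(Path(path).read_text(encoding="utf-8"))
    cells = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):
            # nbformat stores source as a list of lines or a single string.
            source = "".join(source)
        cells.append(
            {
                "index": index,
                "source": source,
                "cell_id": cell.get("id"),
                "cell_type": cell["cell_type"],
            }
        )
    return cells


# Minimal nbformat-4 style document used only for this demonstration.
demo = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "id": "intro", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "id": "c1",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["x = 1\n", "print(x)"],
        },
    ],
}

with tempfile.TemporaryDirectory() as tmp:
    nb_path = Path(tmp) / "experiment.ipynb"
    nb_path.write_text(json.dumps(demo), encoding="utf-8")
    code_cells = parse_code_cells(nb_path)
    notebook_name = nb_path.stem  # stem name without the .ipynb extension

print(notebook_name, code_cells)
```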

scitex_notebook.verify(path: str | Path, *, db=None, verify_run_fn=None) -> List[Dict]

Verify all clew sessions associated with a notebook.

Finds all runs in the clew DB whose metadata contains this notebook’s path, then runs L1 (cache) verification on each.

Parameters:
  • path (str or Path) – Path to the .ipynb file.

  • db (optional) – Pre-resolved clew DB handle. If omitted, the default DB from scitex_clew.get_db() is used. Exposed for testability so tests can inject a hand-rolled fake without patching internals.

  • verify_run_fn (optional) – Callable (session_id) -> Verification. If omitted, scitex_clew.verify_run is used. Exposed for testability.

Returns:

Verification results per session.

Return type:

list of dict
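
The db and verify_run_fn parameters follow a plain dependency-injection pattern: a test passes in stand-ins instead of patching module internals. The sketch below demonstrates the pattern with entirely hypothetical pieces. FakeDB, its list_runs() method, the run-record fields, and verify_sketch are invented for illustration and do not reflect the real clew DB interface or the library's implementation.

```python
from typing import Callable, Dict, List


class FakeDB:
    """Hand-rolled stand-in; list_runs() is a hypothetical method name."""

    def __init__(self, runs: List[Dict]):
        self._runs = runs

    def list_runs(self) -> List[Dict]:
        return list(self._runs)


def verify_sketch(
    path: str, *, db: FakeDB, verify_run_fn: Callable[[str], object]
) -> List[Dict]:
    """Verify every run whose metadata mentions the notebook path."""
    results = []
    for run in db.list_runs():
        if path in run.get("metadata", ""):
            session_id = run["session_id"]
            results.append({"session_id": session_id, "result": verify_run_fn(session_id)})
    return results


fake_db = FakeDB(
    [
        {"session_id": "s1", "metadata": '{"notebook": "experiment.ipynb"}'},
        {"session_id": "s2", "metadata": '{"notebook": "other.ipynb"}'},
    ]
)
calls: List[str] = []


def fake_verify(session_id: str) -> str:
    # Record the call so a test can assert which sessions were verified.
    calls.append(session_id)
    return "ok"


results = verify_sketch("experiment.ipynb", db=fake_db, verify_run_fn=fake_verify)
print(results)
```

Keeping both collaborators as keyword-only arguments means production callers can ignore them while tests stay free of monkeypatching.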

Parse

Parse Jupyter notebook files using stdlib json (no nbformat dependency).

scitex_notebook._parse.get_code_cells(path: str | Path) -> List[Dict]

Parse notebook and return only code cells.

Parameters:

path (str or Path) – Path to the .ipynb file.

Returns:

Code cells only.

Return type:

list of dict

scitex_notebook._parse.get_notebook_name(path: str | Path) -> str

Return the notebook stem name without extension.

scitex_notebook._parse.parse_notebook(path: str | Path) -> List[Dict]

Parse a .ipynb file and extract code cells.

Parameters:

path (str or Path) – Path to the .ipynb file.

Returns:

Code cells with keys: index, source, cell_id, cell_type.

Return type:

list of dict

Verify

Verify notebook sessions and check for untracked IO.

scitex_notebook._verify.check_notebook(path: str | Path) -> List[Dict]

Find cells with scitex.io calls not wrapped in @scitex.session.

Parameters:

path (str or Path) – Path to the .ipynb file.

Returns:

One dict per cell with untracked IO, with keys index, has_load, has_save, and has_session.

Return type:

list of dict

scitex_notebook._verify.verify_notebook(path: str | Path, *, db=None, verify_run_fn=None) -> List[Dict]

Verify all clew sessions associated with a notebook.

Finds all runs in the clew DB whose metadata contains this notebook’s path, then runs L1 (cache) verification on each.

Parameters:
  • path (str or Path) – Path to the .ipynb file.

  • db (optional) – Pre-resolved clew DB handle. If omitted, the default DB from scitex_clew.get_db() is used. Exposed for testability so tests can inject a hand-rolled fake without patching internals.

  • verify_run_fn (optional) – Callable (session_id) -> Verification. If omitted, scitex_clew.verify_run is used. Exposed for testability.

Returns:

Verification results per session.

Return type:

list of dict

Compile

Compile notebook execution history into a DAG from clew DB timestamps.

class scitex_notebook._compile.CompiledNotebook(notebook_path: str, execution_order: List[str] = <factory>, dag: Dict[str, List[str]] = <factory>, runs: List[Dict] = <factory>)

Result of compiling a notebook’s execution history.

notebook_path

Path to the source notebook.

Type:

str

execution_order

Session IDs in actual execution order (by timestamp).

Type:

list of str

dag

Adjacency list: {session_id: [dependent_session_ids]}.

Type:

dict

runs

Run records sorted by execution time.

Type:

list of dict

to_mermaid() -> str

Generate a Mermaid DAG diagram of execution flow.

to_script() -> str

Generate a .py script with sessions in DAG order.

scitex_notebook._compile.compile_notebook(path: str | Path, *, db=None) -> CompiledNotebook

Compile a notebook’s execution history into a DAG.

Queries the clew DB for all sessions associated with this notebook, sorts by timestamp, and builds a dependency DAG based on shared input/output files.

Parameters:
  • path (str or Path) – Path to the .ipynb file.

  • db (optional) – Pre-resolved clew DB handle. If omitted, the default DB from scitex_clew.get_db() is used. Exposed for testability so tests can inject a hand-rolled fake without patching internals.

Returns:

Compiled execution history with DAG and execution order.

Return type:

CompiledNotebook

Convert

Convert Jupyter notebooks to SciTeX-compatible Python scripts.

scitex_notebook._convert.convert_notebook(path: str | Path, output: str | Path | None = None, order: str = 'cell', mode: str = 'per_cell', *, db=None) -> str

Convert a .ipynb notebook to a .py script with @stx.session.

Parameters:
  • path (str or Path) – Path to the .ipynb file.

  • output (str or Path, optional) – Output .py file path. If None, returns string only.

  • order (str) – Cell ordering: "cell" (notebook order) or "dag" (execution order from clew DB timestamps).

  • mode (str) – Conversion mode:

    • "per_cell": Each code cell becomes a separate @stx.session function (default).

    • "unified": All cells merged into a single @stx.session main() function. Markdown cells become comments, imports are hoisted, and common notebook patterns (plt.show, pd.read_csv, etc.) are converted to SciTeX equivalents (stx.io.save/load).

  • db (optional) – Pre-resolved clew DB handle, used only when order='dag'. Exposed for testability so tests can inject a hand-rolled fake without patching internals.

Returns:

The generated Python script content.

Return type:

str