turbo_broccoli.guard
If a block of code produces a JSON file, say out/foo.json, and the block does
not need to be rerun when the output file already exists, then a guarded block
handler is an alternative to
if not Path("out/foo.json").exists():
    ...
    if success:
        tb.save_json(result, "out/foo.json")
else:
    result = tb.load_json("out/foo.json")
A guarded block handler lets you guard an entire block of code, and even a loop on a per-iteration basis.
Guarding a simple block
Use it as follows:
h = GuardedBlockHandler("out/foo.json")
for _ in h:
    # This whole block will be skipped if out/foo.json exists
    # If not, don't forget to set the results:
    h.result = ...
# In any case, the results of the block are available in h.result
I know the syntax isn't the prettiest. It would be more natural to use a
with h: syntax, but Python doesn't allow for context managers that don't
yield, i.e. that skip the body of the with block.
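To see why a for loop can do what a with block cannot, it may help to sketch the mechanism: iterating over a generator that yields either once or not at all runs the loop body once or skips it entirely. The following is a minimal, stdlib-only illustration and not the library's implementation; MiniGuard and run_once are hypothetical names, and plain json stands in for turbo_broccoli's saving and loading.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path
from typing import Any, Iterator


class MiniGuard:
    """Hypothetical stand-in for a guarded block handler (JSON only)."""

    def __init__(self, file_path: str | Path) -> None:
        self.file_path = Path(file_path)
        self.result: Any = None

    def __iter__(self) -> Iterator["MiniGuard"]:
        if self.file_path.is_file():
            # Output exists: load it and yield nothing, so the body is skipped
            self.result = json.loads(self.file_path.read_text())
            return
        yield self  # the body of the for loop runs exactly once
        # Back from the body: save only if a result was actually set
        if self.result is not None:
            self.file_path.parent.mkdir(parents=True, exist_ok=True)
            self.file_path.write_text(json.dumps(self.result))


def run_once(path: Path) -> tuple[Any, bool]:
    """Runs a guarded block; returns the result and whether the body ran."""
    h, ran = MiniGuard(path), False
    for _ in h:
        ran = True
        h.result = {"answer": 42}
    return h.result, ran


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "out" / "foo.json"
    first = run_once(target)   # file is missing: the body runs and saves
    second = run_once(target)  # file exists: the body is skipped
```

In this sketch, both calls return the same result, but only the first one executes the body.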
The handler's result is None by default. If h.result is left as None, no output
file is created. This allows for scenarios like
h = GuardedBlockHandler("out/foo.json")
for _ in h:
    ...  # Guarded code
    if success:
        h.result = ...
It is also possible to use "native" saving/loading methods:
h = GuardedBlockHandler("out/foo.csv")
for _ in h:
    ...
    h.result = some_pandas_dataframe
See turbo_broccoli.native.save and turbo_broccoli.native.load. Finally, if
the actual result of the block is not needed, use:
h = GuardedBlockHandler("out/large.json", load_if_skip=False)
for _ in h:
    ...
# If the block was skipped (out/large.json already exists), h.result is
# None instead of the content of out/large.json
Guarding a loop
Let's say you have a loop
for x in an_iterable:
    ...  # expensive code that produces a result you want to save
You can guard the loop as follows:
h = GuardedBlockHandler("out/foo.json")
for i, x in h(an_iterable):  # an_iterable is always enumerated!
    # h.result is already a dict, no need to initialize it
    ...  # expensive code that produces a result you want to save
    h.result[x] = ...
The contents of h.result are saved to out/foo.json at the end of every
iteration. However, if out/foo.json already exists, the loop will skip all
iterations that are already saved. In detail, let's say that the contents of
out/foo.json are
{"a": "aaa", "b": "bbb"}
Then the content of the following loop is only executed for "c":
for i, x in h(["a", "b", "c"]):
    h.result[x] = x * 3
# h.result is now {"a": "aaa", "b": "bbb", "c": "ccc"}
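The per-iteration skipping described above can be sketched with the standard library alone. This is an illustration under assumptions, not the library's code: guarded_items is a hypothetical helper, the result dict is passed in explicitly rather than stored on a handler, and json replaces turbo_broccoli's native saving and loading.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path
from typing import Any, Iterable, Iterator


def guarded_items(
    path: Path, it: Iterable[Any], result: dict
) -> Iterator[tuple[int, Any]]:
    """Yields (index, item) for items not yet in `result`, saving as it goes."""
    if path.is_file():
        result.update(json.loads(path.read_text()))
    for i, x in enumerate(it):
        if x in result:
            continue  # already computed in a previous run
        yield i, x
        # Control returns here once the loop body has finished this iteration
        path.write_text(json.dumps(result))


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "foo.json"
    path.write_text(json.dumps({"a": "aaa", "b": "bbb"}))  # earlier results
    result: dict = {}
    executed = []
    for i, x in guarded_items(path, ["a", "b", "c"], result):
        executed.append((i, x))
        result[x] = x * 3
    saved = json.loads(path.read_text())
```

Only "c" reaches the loop body, and it does so with index 2, because the whole iterable is enumerated before skipped items are filtered out.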
If you want h.result to be a list instead of a dict, use:
h = GuardedBlockHandler("out/foo.json")
for i, x in h(an_iterable, result_type="list"):
    # h.result is already a list, no need to initialize it
    ...  # expensive code that produces a result you want to save
    h.result.append(...)
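In list mode, the handler's source code skips iterations by position (i < len(self.result)) rather than by key. The sketch below mimics that rule with the standard library only; guarded_positions is a hypothetical helper, and json stands in for the library's own saving and loading.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path
from typing import Any, Iterable, Iterator


def guarded_positions(
    path: Path, it: Iterable[Any], result: list
) -> Iterator[tuple[int, Any]]:
    """Yields (index, item) for positions not yet covered by `result`."""
    if path.is_file():
        result.extend(json.loads(path.read_text()))
    for i, x in enumerate(it):
        if i < len(result):
            continue  # position i was already filled by a previous run
        yield i, x
        path.write_text(json.dumps(result))  # save after every iteration


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "foo.json"
    path.write_text(json.dumps([1, 4]))  # results of the first two iterations
    squares: list = []
    executed = []
    for i, x in guarded_positions(path, [1, 2, 3, 4], squares):
        executed.append(i)
        squares.append(x * x)
    saved = json.loads(path.read_text())
```

Under this rule, the loop body presumably needs to append exactly one item per iteration; otherwise the positions of later iterations no longer line up with the saved list.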
Caveats
- Recall that in the case of simple blocks, setting h.result to None (or
  leaving it as None) is understood as a failed computation:
  for _ in h:
      h.result = None
  for _ in h:  # This block isn't skipped
      h.result = "Hello world"
  In the case of loops however, if an entry of h.result is set to None, the
  corresponding iteration is not treated as failed. For example:
  for i, x in h(["a", "b", "c"]):
      h.result[x] = x * 3 if x != "b" else None
  # h.result is now {"a": "aaa", "b": None, "c": "ccc"}
  for i, x in h(["a", "b", "c"]):
      h.result[x] = x * 3
  # The second loop has been completely skipped, h.result is still
  # {"a": "aaa", "b": None, "c": "ccc"}
- When guarding a loop, the load_if_skip constructor argument has no effect,
  meaning that the JSON file is always loaded if it exists. If you want some
  level of laziness, consider the following trick:
  from turbo_broccoli.context import EmbeddedDict

  h = GuardedBlockHandler("out/foo.json", nodecode_types=["embedded"])
  for i, x in h(["a", "b", "c"]):
      y = ...  # a dict that is expensive to compute
      h.result[x] = EmbeddedDict(y)
  By changing the type of y from a dict to an EmbeddedDict, and setting the
  "embedded" type in the nodecode_types of the guarded block handler's internal
  context, results that were already present in the JSON file will not be
  decoded.
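The first caveat comes down to how a dict membership test behaves: a key whose value is None is still present, so the corresponding iteration counts as done. The snippet below is a plain-Python illustration of that point, using no library calls; the variable names are hypothetical.

```python
# Contents of h.result after the first loop of the caveat's example
saved = {"a": "aaa", "b": None, "c": "ccc"}

# Iterations a guarded dict loop would still run: keys absent from the result
pending = [x for x in ["a", "b", "c"] if x not in saved]

# Entries stored as None, which have to be looked for by value instead
stored_as_none = [key for key, value in saved.items() if value is None]
```

Here pending is empty even though "b" holds no useful value, which is why the second loop is skipped in full.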
"""
If a block of code produces a JSON file, say `out/foo.json`, and the block does
not need to be rerun when the output file already exists, then a guarded block
handler is an alternative to

```py
if not Path("out/foo.json").exists():
    ...
    if success:
        tb.save_json(result, "out/foo.json")
else:
    result = tb.load_json("out/foo.json")
```

A guarded block handler lets you *guard* an entire block of code, and even a
loop on a per-iteration basis.

## Guarding a simple block

Use it as follows:

```py
h = GuardedBlockHandler("out/foo.json")
for _ in h:
    # This whole block will be skipped if out/foo.json exists
    # If not, don't forget to set the results:
    h.result = ...
# In any case, the results of the block are available in h.result
```

I know the syntax isn't the prettiest, it would be more natural to use a `with
h:` syntax but python doesn't allow for context managers that don't yield...
The handler's `result` is `None` by default. If `h.result` is left as `None`,
no output file is created. This allows for scenarios like

```py
h = GuardedBlockHandler("out/foo.json")
for _ in h:
    ...  # Guarded code
    if success:
        h.result = ...
```

It is also possible to use ["native" saving/loading
methods](https://altaris.github.io/turbo-broccoli/turbo_broccoli/native.html#save):

```py
h = GuardedBlockHandler("out/foo.csv")
for _ in h:
    ...
    h.result = some_pandas_dataframe
```

See `turbo_broccoli.native.save` and `turbo_broccoli.native.load`. Finally, if
the actual result of the block is not needed, use:

```py
h = GuardedBlockHandler("out/large.json", load_if_skip=False)
for _ in h:
    ...
# If the block was skipped (out/large.json already exists), h.result is
# None instead of the content of out/large.json
```

## Guarding a loop

Let's say you have a loop

```py
for x in an_iterable:
    ...  # expensive code that produces a result you want to save
```

You can guard the loop as follows:

```py
h = GuardedBlockHandler("out/foo.json")
for i, x in h(an_iterable):  # an_iterable is always enumerated!
    # h.result is already a dict, no need to initialize it
    ...  # expensive code that produces a result you want to save
    h.result[x] = ...
```

The contents of `h.result` are saved to `out/foo.json` at the end of every
iteration. However, if `out/foo.json` already exists, the loop will skip all
iterations that are already saved. In detail, let's say that the contents of
`out/foo.json` are

```json
{"a": "aaa", "b": "bbb"}
```

Then the content of the following loop is only executed for `"c"`:

```py
for i, x in h(["a", "b", "c"]):
    h.result[x] = x * 3
# h.result is now {"a": "aaa", "b": "bbb", "c": "ccc"}
```

If you want `h.result` to be a list instead of a dict, use:

```py
h = GuardedBlockHandler("out/foo.json")
for i, x in h(an_iterable, result_type="list"):
    # h.result is already a list, no need to initialize it
    ...  # expensive code that produces a result you want to save
    h.result.append(...)
```

### Caveats

- Recall that in the case of simple blocks, setting/leaving `h.result` to
  `None` is understood as a failed computation:

  ```py
  for _ in h:
      h.result = None
  for _ in h:  # This block isn't skipped
      h.result = "Hello world"
  ```

  In the case of loops however, if an entry of `h.result` is set to `None`, the
  corresponding iteration is not treated as failed. For example:

  ```py
  for i, x in h(["a", "b", "c"]):
      h.result[x] = x * 3 if x != "b" else None
  # h.result is now {"a": "aaa", "b": None, "c": "ccc"}
  for i, x in h(["a", "b", "c"]):
      h.result[x] = x * 3
  # The second loop has been completely skipped, h.result is still
  # {"a": "aaa", "b": None, "c": "ccc"}
  ```

- The `load_if_skip` constructor argument has no effect, meaning that the JSON
  file is always loaded if it exists. If you want some level of laziness,
  consider the following trick:

  ```py
  from turbo_broccoli.context import EmbeddedDict

  h = GuardedBlockHandler("out/foo.json", nodecode_types=["embedded"])
  for i, x in h(["a", "b", "c"]):
      y = ...  # a dict that is expensive to compute
      h.result[x] = EmbeddedDict(y)
  ```

  By changing the type of `y` from a dict to an `EmbeddedDict`, and setting the
  `"embedded"` type in the guarded block handler's internal context's
  `nodecode_types`, results that were already present in the JSON file will not
  be decoded.
"""

from pathlib import Path

try:
    from loguru import logger as logging
except ModuleNotFoundError:
    import logging  # type: ignore

from typing import Any, Generator, Iterable, Literal

from .context import Context
from .native import load as native_load
from .native import save as native_save


class GuardedBlockHandler:
    """See module documentation"""

    block_name: str | None
    context: Context
    file_path: Path
    load_if_skip: bool
    result: Any = None

    def __call__(
        self, it: Iterable, **kwargs
    ) -> Generator[tuple[int, Any], None, None]:
        """Alias for `GuardedBlockHandler.guard` with an iterable"""
        yield from self.guard(it, **kwargs)

    def __init__(
        self,
        file_path: str | Path,
        block_name: str | None = None,
        load_if_skip: bool = True,
        context: Context | None = None,
        **kwargs,
    ) -> None:
        """
        Args:
            file_path (str | Path): Output file path.
            block_name (str, optional): Name of the block, for logging
                purposes. Can be left to `None` to suppress such logs.
            load_if_skip (bool, optional): Whether to load the output file if
                the block is skipped.
            context (turbo_broccoli.context.Context, optional): Context to use
                when saving/loading the target JSON file. If left to `None`, a
                new context is built from the kwargs.
            **kwargs: Forwarded to the `turbo_broccoli.context.Context`
                constructor. Ignored if `context` is not `None`.
        """
        self.file_path = kwargs["file_path"] = Path(file_path)
        self.block_name, self.load_if_skip = block_name, load_if_skip
        self.context = context if context is not None else Context(**kwargs)

    def __iter__(self) -> Generator[Any, None, None]:
        """
        Alias for `GuardedBlockHandler.guard` with no iterable and no kwargs
        """
        yield from self.guard()

    def _guard_iter(
        self,
        it: Iterable,
        result_type: Literal["dict", "list"] = "dict",
        **kwargs,
    ) -> Generator[tuple[int, Any], None, None]:
        if self.file_path.is_file():
            self.result = native_load(self.file_path)
        else:
            self.result = {} if result_type == "dict" else []
        if result_type == "dict":
            yield from self._guard_iter_dict(it, **kwargs)
        else:
            yield from self._guard_iter_list(it, **kwargs)

    def _guard_iter_dict(
        self, it: Iterable, **__
    ) -> Generator[tuple[int, Any], None, None]:
        for i, x in enumerate(it):
            if x in self.result:
                if self.block_name:
                    logging.debug(
                        f"Skipped iteration '{str(x)}' of guarded loop "
                        f"'{self.block_name}'"
                    )
                continue
            yield (i, x)
            self._save()

    def _guard_iter_list(
        self, it: Iterable, **__
    ) -> Generator[tuple[int, Any], None, None]:
        for i, x in enumerate(it):
            if i < len(self.result):
                if self.block_name:
                    logging.debug(
                        f"Skipped iteration {i} of guarded loop "
                        f"'{self.block_name}'"
                    )
                continue
            yield (i, x)
            self._save()

    def _guard_no_iter(self, **__) -> Generator[Any, None, None]:
        if self.file_path.is_file():
            self.result = (
                native_load(self.file_path) if self.load_if_skip else None
            )
            if self.block_name:
                logging.debug(f"Skipped guarded block '{self.block_name}'")
            return
        yield self
        if self.result is not None:
            self._save()
            if self.block_name is not None:
                logging.debug(
                    f"Saved guarded block '{self.block_name}' results to "
                    f"'{self.file_path}'"
                )

    def _save(self):
        """Saves `self.result`"""
        self.file_path.parent.mkdir(parents=True, exist_ok=True)
        native_save(self.result, self.file_path)

    def guard(
        self, it: Iterable | None = None, **kwargs
    ) -> Generator[Any, None, None]:
        """See `turbo_broccoli.guard.GuardedBlockHandler`'s documentation"""
        if it is None:
            yield from self._guard_no_iter(**kwargs)
        else:
            yield from self._guard_iter(it, **kwargs)
See module documentation
Args:
    file_path (str | Path): Output file path.
    block_name (str, optional): Name of the block, for logging purposes. Can be left as None to suppress such logs.
    load_if_skip (bool, optional): Whether to load the output file if the block is skipped.
    context (turbo_broccoli.context.Context, optional): Context to use when saving/loading the target JSON file. If left as None, a new context is built from the kwargs.
    **kwargs: Forwarded to the turbo_broccoli.context.Context constructor. Ignored if context is not None.
See turbo_broccoli.guard.GuardedBlockHandler's documentation