turbo_broccoli.guard
If a block of code produces a JSON file, say out/foo.json, and the block does
not need to be rerun when that file already exists, then a guarded block
handler is an alternative to
if not Path("out/foo.json").exists():
    ...
    if success:
        tb.save_json(result, "out/foo.json")
else:
    result = tb.load_json("out/foo.json")
A guarded block handler lets you guard an entire block of code, or even a loop on a per-iteration basis.
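For reference, the manual pattern that the handler replaces can be written with the standard library alone. This is only a sketch and not turbo_broccoli code: json.dump and json.load stand in for tb.save_json and tb.load_json, and compute(), run_once() and the temporary path are hypothetical names.

```python
import json
import tempfile
from pathlib import Path


def compute() -> dict:
    """Hypothetical stand-in for the expensive guarded code."""
    return {"answer": 42}


def run_once(path: Path) -> dict:
    """Run compute() only if `path` does not exist yet, otherwise load it."""
    if not path.exists():
        result = compute()
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("w", encoding="utf-8") as fp:
            json.dump(result, fp)
    else:
        with path.open("r", encoding="utf-8") as fp:
            result = json.load(fp)
    return result


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "out" / "foo.json"
    first = run_once(out)   # computes and writes the file
    second = run_once(out)  # file exists: loads it instead of recomputing
```

The existence check, the save and the load are all spelled out by hand here, which is the boilerplate a guarded block handler is meant to absorb.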
Guarding a simple block
Use it as follows:
h = GuardedBlockHandler("out/foo.json")
for _ in h:
    # This whole block will be skipped if out/foo.json exists
    # If not, don't forget to set the results:
    h.result = ...
# In any case, the results of the block are available in h.result
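The for _ in h form relies on a general property of Python generators: a generator may yield zero times or exactly once, so a for loop over it behaves like a conditional block with code that runs afterwards. The sketch below illustrates the mechanism in isolation; run_if is a hypothetical helper, not part of turbo_broccoli.

```python
from typing import Generator


def run_if(condition: bool) -> Generator[None, None, None]:
    """Yield exactly once if `condition` holds, otherwise not at all."""
    if condition:
        yield
        # Code placed here runs once the loop body has finished normally,
        # which is where a handler could save its result.


executed = []
for _ in run_if(True):
    executed.append("ran")  # the body runs once
for _ in run_if(False):
    executed.append("never reached")  # the body is skipped entirely
```

One consequence of plain generator semantics is that the code after the yield is not reached if the loop body exits early with break or an exception.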
I know the syntax isn't the prettiest. It would be more natural to use a
with h: block, but Python doesn't allow context managers that don't yield.
The handler's result is None by default. If h.result is left as None,
no output file is created. This allows for scenarios like
h = GuardedBlockHandler("out/foo.json")
for _ in h:
    ...  # Guarded code
    if success:
        h.result = ...
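To make the "no result, no file" rule concrete, here is a simplified stand-in for the simple-block case. It is an assumption-laden sketch: MiniGuard is a hypothetical class that handles plain JSON only (through the json module) and ignores contexts, logging and native formats.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Generator


class MiniGuard:
    """Simplified, hypothetical stand-in for GuardedBlockHandler."""

    def __init__(self, file_path: Path, load_if_skip: bool = True) -> None:
        self.file_path = file_path
        self.load_if_skip = load_if_skip
        self.result: Any = None

    def __iter__(self) -> Generator["MiniGuard", None, None]:
        if self.file_path.is_file():
            # Output exists: skip the block, optionally loading the result
            if self.load_if_skip:
                self.result = json.loads(self.file_path.read_text())
            return
        yield self  # run the guarded block exactly once
        if self.result is not None:
            # Only a non-None result is written to disk
            self.file_path.parent.mkdir(parents=True, exist_ok=True)
            self.file_path.write_text(json.dumps(self.result))


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "foo.json"
    attempts = []
    for success in (False, True, True):
        h = MiniGuard(path)
        for _ in h:
            attempts.append(success)
            if success:
                h.result = {"status": "ok"}
    file_written = path.is_file()
    final = h.result
```

The first attempt leaves the result as None, so nothing is saved and the second attempt runs the block again. The third attempt finds the file and skips the block, loading the saved result instead.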
It is also possible to use "native" saving/loading methods:
h = GuardedBlockHandler("out/foo.csv")
for _ in h:
    ...
    h.result = some_pandas_dataframe
See turbo_broccoli.native.save and turbo_broccoli.native.load. Finally, if
the actual result of the block is not needed, use:
h = GuardedBlockHandler("out/large.json", load_if_skip=False)
for _ in h:
    ...
# If the block was skipped (out/large.json already exists), h.result is
# None instead of the content of out/large.json
Guarding a loop
Let's say you have a loop
for x in an_iterable:
    ...  # expensive code that produces a result you want to save
You can guard the loop as follows:
h = GuardedBlockHandler("out/foo.json")
for i, x in h(an_iterable):  # an_iterable is always enumerated!
    # h.result is already a dict, no need to initialize it
    ...  # expensive code that produces a result you want to save
    h.result[x] = ...
The contents of h.result are saved to out/foo.json at the end of every
iteration. If out/foo.json already exists, the loop skips all iterations
whose results are already saved. In detail, say that out/foo.json contains
{"a": "aaa", "b": "bbb"}
Then the body of the following loop is only executed for "c":
for i, x in h(["a", "b", "c"]):
    h.result[x] = x * 3
# h.result is now {"a": "aaa", "b": "bbb", "c": "ccc"}
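The per-iteration skipping can be sketched with a small generator. This is a simplified stand-in under stated assumptions: guarded_loop is a hypothetical function, it uses plain json instead of turbo_broccoli's native save/load, and the caller passes the result dict in explicitly.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Generator, Iterable


def guarded_loop(
    path: Path, it: Iterable, result: dict
) -> Generator[tuple[int, Any], None, None]:
    """Yield (index, item) for items not yet in `result`, saving after each."""
    if path.is_file():
        result.update(json.loads(path.read_text()))
    for i, x in enumerate(it):
        if x in result:
            continue  # already computed in a previous run
        yield i, x
        path.write_text(json.dumps(result))  # save after every iteration


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "foo.json"
    path.write_text(json.dumps({"a": "aaa", "b": "bbb"}))
    result: dict = {}
    executed = []
    for i, x in guarded_loop(path, ["a", "b", "c"], result):
        executed.append((i, x))
        result[x] = x * 3
    on_disk = json.loads(path.read_text())
```

Note that the index paired with "c" is its position in the full iterable, since skipped items are still enumerated.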
If you want h.result to be a list instead of a dict, use:
h = GuardedBlockHandler("out/foo.json")
for i, x in h(an_iterable, result_type="list"):
    # h.result is already a list, no need to initialize it
    ...  # expensive code that produces a result you want to save
    h.result.append(...)
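In list mode there are no keys to compare, so the source listing on this page skips an iteration when its index is smaller than the current length of the saved list. The sketch below mirrors that rule with a hypothetical guarded_list_loop function and plain json; it assumes the iterable is traversed in the same order on every run and that each iteration appends exactly one element.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Generator, Iterable


def guarded_list_loop(
    path: Path, it: Iterable, result: list
) -> Generator[tuple[int, Any], None, None]:
    """Yield (index, item) for indices beyond what `result` already holds."""
    if path.is_file():
        result.extend(json.loads(path.read_text()))
    for i, x in enumerate(it):
        if i < len(result):
            continue  # an entry for this position was already saved
        yield i, x
        path.write_text(json.dumps(result))  # save after every iteration


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "squares.json"
    path.write_text(json.dumps([1, 4]))  # results of a previous, partial run
    squares: list = []
    executed = []
    for i, x in guarded_list_loop(path, [1, 2, 3, 4], squares):
        executed.append((i, x))
        squares.append(x * x)
    on_disk = json.loads(path.read_text())
```

Because the skip decision depends only on positions, reordering the iterable between runs would silently pair old results with new items.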
Caveats
- Recall that in the case of simple blocks, setting h.result to None, or
  leaving it as None, is understood as a failed computation:

  for _ in h:
      h.result = None
  for _ in h:  # This block isn't skipped
      h.result = "Hello world"

  In the case of loops however, if an entry of h.result is set to None, the
  corresponding iteration is not treated as failed. For example:

  for i, x in h(["a", "b", "c"]):
      h.result[x] = x * 3 if x != "b" else None
  # h.result is now {"a": "aaa", "b": None, "c": "ccc"}
  for i, x in h(["a", "b", "c"]):
      h.result[x] = x * 3
  # The second loop has been completely skipped, h.result is still
  # {"a": "aaa", "b": None, "c": "ccc"}

- When guarding a loop, the load_if_skip constructor argument has no effect:
  the JSON file is always loaded if it exists. If you want some level of
  laziness, consider the following trick:

  from turbo_broccoli.context import EmbeddedDict

  h = GuardedBlockHandler("out/foo.json", nodecode_types=["embedded"])
  for i, x in h(["a", "b", "c"]):
      y = ...  # a dict that is expensive to compute
      h.result[x] = EmbeddedDict(y)

  By changing the type of y from a dict to an EmbeddedDict, and setting the
  "embedded" type in the nodecode_types of the guarded block handler's
  internal context, results that were already present in the JSON file will
  not be decoded.
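The first caveat follows from how the dict-mode skip test works in the source listing: an iteration is skipped when its key is in the result dict, whatever the stored value. The snippet below shows that membership test on plain dicts, along with one possible workaround (dropping None entries before resuming). The workaround is a suggestion, not a turbo_broccoli feature.

```python
# State left behind by a run in which the "b" iteration stored None
result = {"a": "aaa", "b": None, "c": "ccc"}
keys = ["a", "b", "c"]

# Key membership ignores the value, so nothing is considered missing
to_run = [x for x in keys if x not in result]

# Possible workaround: filter out None entries before guarding the loop again
cleaned = {k: v for k, v in result.items() if v is not None}
to_rerun = [x for x in keys if x not in cleaned]
```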
from pathlib import Path

try:
    from loguru import logger as logging
except ModuleNotFoundError:
    import logging  # type: ignore

from typing import Any, Generator, Iterable, Literal

from .context import Context
from .native import load as native_load
from .native import save as native_save


class GuardedBlockHandler:
    """See module documentation"""

    block_name: str | None
    context: Context
    file_path: Path
    load_if_skip: bool
    result: Any = None

    def __call__(
        self, it: Iterable, **kwargs
    ) -> Generator[tuple[int, Any], None, None]:
        """Alias for `GuardedBlockHandler.guard` with an iterable"""
        yield from self.guard(it, **kwargs)

    def __init__(
        self,
        file_path: str | Path,
        block_name: str | None = None,
        load_if_skip: bool = True,
        context: Context | None = None,
        **kwargs,
    ) -> None:
        """
        Args:
            file_path (str | Path): Output file path.
            block_name (str, optional): Name of the block, for logging
                purposes. Can be left to `None` to suppress such logs.
            load_if_skip (bool, optional): Whether to load the output file if
                the block is skipped.
            context (turbo_broccoli.context.Context, optional): Context to use
                when saving/loading the target JSON file. If left to `None`, a
                new context is built from the kwargs.
            **kwargs: Forwarded to the `turbo_broccoli.context.Context`
                constructor. Ignored if `context` is not `None`.
        """
        self.file_path = kwargs["file_path"] = Path(file_path)
        self.block_name, self.load_if_skip = block_name, load_if_skip
        self.context = context if context is not None else Context(**kwargs)

    def __iter__(self) -> Generator[Any, None, None]:
        """
        Alias for `GuardedBlockHandler.guard` with no iterable and no kwargs
        """
        yield from self.guard()

    def _guard_iter(
        self,
        it: Iterable,
        result_type: Literal["dict", "list"] = "dict",
        **kwargs,
    ) -> Generator[tuple[int, Any], None, None]:
        if self.file_path.is_file():
            self.result = native_load(self.file_path)
        else:
            self.result = {} if result_type == "dict" else []
        if result_type == "dict":
            yield from self._guard_iter_dict(it, **kwargs)
        else:
            yield from self._guard_iter_list(it, **kwargs)

    def _guard_iter_dict(
        self, it: Iterable, **__
    ) -> Generator[tuple[int, Any], None, None]:
        for i, x in enumerate(it):
            if x in self.result:
                if self.block_name:
                    logging.debug(
                        f"Skipped iteration '{str(x)}' of guarded loop "
                        f"'{self.block_name}'"
                    )
                continue
            yield (i, x)
            self._save()

    def _guard_iter_list(
        self, it: Iterable, **__
    ) -> Generator[tuple[int, Any], None, None]:
        for i, x in enumerate(it):
            if i < len(self.result):
                if self.block_name:
                    logging.debug(
                        f"Skipped iteration {i} of guarded loop "
                        f"'{self.block_name}'"
                    )
                continue
            yield (i, x)
            self._save()

    def _guard_no_iter(self, **__) -> Generator[Any, None, None]:
        if self.file_path.is_file():
            self.result = (
                native_load(self.file_path) if self.load_if_skip else None
            )
            if self.block_name:
                logging.debug(f"Skipped guarded block '{self.block_name}'")
            return
        yield self
        if self.result is not None:
            self._save()
            if self.block_name is not None:
                logging.debug(
                    f"Saved guarded block '{self.block_name}' results to "
                    f"'{self.file_path}'"
                )

    def _save(self):
        """Saves `self.result`"""
        self.file_path.parent.mkdir(parents=True, exist_ok=True)
        native_save(self.result, self.file_path)

    def guard(
        self, it: Iterable | None = None, **kwargs
    ) -> Generator[Any, None, None]:
        """See `turbo_broccoli.guard.GuardedBlockHandler`'s documentation"""
        if it is None:
            yield from self._guard_no_iter(**kwargs)
        else:
            yield from self._guard_iter(it, **kwargs)
See module documentation
Args:
    file_path (str | Path): Output file path.
    block_name (str, optional): Name of the block, for logging
        purposes. Can be left to None to suppress such logs.
    load_if_skip (bool, optional): Whether to load the output file if
        the block is skipped.
    context (turbo_broccoli.context.Context, optional): Context to use
        when saving/loading the target JSON file. If left to None, a
        new context is built from the kwargs.
    **kwargs: Forwarded to the turbo_broccoli.context.Context
        constructor. Ignored if context is not None.
See turbo_broccoli.guard.GuardedBlockHandler's documentation