[comment {--- punk::docgen generated from inline doctools comments ---}]
[comment {--- punk::docgen DO NOT EDIT DOCS HERE UNLESS YOU REMOVE THESE COMMENT LINES ---}]
[comment {--- punk::docgen overwrites this file ---}]
[manpage_begin punkshell_module_punk::fileline 0 0.1.0]
[copyright "2024"]
[titledesc {file line-handling utilities}] [comment {-- Name section and table of contents description --}]
[moddesc {punk fileline}] [comment {-- Description at end of page heading --}]
[require punk::fileline]
[keywords module text parse file]
[description]
[para] -
[section Overview]
[para]Utilities for in-memory analysis of text file data, as both line data and byte/char-counted data, whilst preserving the line-endings (even if mixed).
[para]This matters for text files where the exact number of characters/bytes is significant.
[para]For example, Windows .cmd/.bat files need some byte counting to determine whether labels lie on chunk boundaries and need to be moved.
[para]Despite the word 'file' in its name, the library doesn't read from or write to the filesystem. It operates on text-file-like data.
[subsection Concepts]
[para]A chunk of textfile data (possibly representing a whole file - but usually at least a complete set of lines) is loaded into a punk::fileline::class::textinfo instance at object creation.
[example_begin]
package require punk::fileline
package require fileutil
set rawdata [lb]fileutil::cat -translation binary data.txt[rb]
punk::fileline::class::textinfo create obj_data $rawdata
puts stdout [lb]obj_data linecount[rb]
[example_end]
[subsection Notes]
[para]Line records are referred to by a zero-based index instead of a one-based index as is commonly used when displaying files.
[para]This is for programming consistency and convenience, and the module user should do their own conversion to one-based indexing for line display or messaging if desired.
[para]Lone carriage-returns are not interpreted as line-endings.
[para]CR line-endings that are intended to be interpreted as such should be mapped to something else before the data is supplied to this module.
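[para]The following is a minimal core-Tcl sketch of such a mapping. It assumes that lone CRs should become LF before the data is handed to this module. The proc name map_lone_cr is hypothetical and is not part of punk::fileline. It uses the negative lookahead supported by Tcl's advanced regular expressions, so CRLF pairs are left intact.
[example_begin]
# Hypothetical helper, not part of punk::fileline:
# replace every CR that is not immediately followed by LF with LF.
proc map_lone_cr {data} {
    regsub -all {\r(?!\n)} $data "\n" result
    return $result
}
# "a\rb\r\nc" becomes "a\nb\r\nc" (the CRLF pair is untouched)
set mapped [lb]map_lone_cr "a\rb\r\nc"[rb]
[example_end]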
[subsection dependencies]
[para] Packages needed by punk::fileline
[list_begin itemized]
[item] [package {Tcl 8.6}]
[list_end] [comment {- end dependencies list -}]
[subsection {optional dependencies}]
[para] Packages that add functionality but aren't strictly required
[list_begin itemized]
[item] [package {punk::ansi}]
[para] - recommended for class::textinfo [method chunk_boundary_display]
[item] [package {punk::char}]
[para] - recommended for class::textinfo [method chunk_boundary_display]
[item] [package {overtype}]
[para] - recommended for class::textinfo [method chunk_boundary_display]
[list_end] [comment {- end optional dependencies list -}]
[section API]
[subsection {Namespace punk::fileline::class}]
[para] class definitions
[list_begin enumerated]
[enum] CLASS [class textinfo]
[list_begin definitions]
[para] [emph METHODS]
[call class::textinfo [method constructor] [arg datachunk] [opt {option value...}]]
[para] Constructor for a textinfo object, which represents a chunk of a file or the whole file.
[para] For full functionality, datachunk should contain the file data with its line-endings as-is. When loading the data, read it in binary mode, using something like:
[example_begin]
set fd [lb]open <filename> r[rb]
fconfigure $fd -translation binary
set chunkdata [lb]read $fd[rb]
close $fd
# or, using tcllib's fileutil (options must precede the file name):
package require fileutil
set chunkdata [lb]fileutil::cat -translation binary <filename>[rb]
[example_end]
[call class::textinfo [method chunk] [arg chunkstart] [arg chunkend]]
[para]Return a range of bytes from the underlying raw chunk data.
[para]For example, [cmd {objName chunk 0 end}] retrieves the entire chunk.
[call class::textinfo [method chunklen]]
[para]Return the number of bytes/characters in the raw chunk data.
[call class::textinfo [method chunk_boundary_display]]
[para]Return a string displaying the boundaries at every chunksize bytes between chunkstart and chunkend.
[para]ANSI colour is used by default if the punk::ansi package is available. Use -ansi 0 to disable colour.
[call class::textinfo [method linecount]]
[para]Return the number of lines in the raw data, counted according to the policy in effect.
[call class::textinfo [method line] [arg lineindex]]
[para]Reconstruct and return the raw line using the payload and the stored per-line line-ending metadata.
[para]A 'line' may be returned without a line-ending if the underlying chunk had trailing data without a line-ending (or the chunk was loaded under a non-standard -policy setting).
[para]Whilst such data may not conform to some definitions (e.g. POSIX) of the terms 'textfile' and 'line', it is useful here to represent it as a line whose le metadata is set to "none".
[para]To return just the line data without its line-ending, which is more commonly needed when dealing with lines, use the [method linepayload] method.
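[para]Conceptually, [method line] returns the payload with its stored line-ending appended. The core-Tcl sketch below models only that relationship, using the le values crlf, lf and none documented under [method linemeta]. The proc name rebuild_line is hypothetical, and this is not the module's implementation.
[example_begin]
# Hypothetical sketch, not the punk::fileline implementation:
# rebuild a raw line from its payload and its 'le' metadata value.
proc rebuild_line {payload le} {
    switch -- $le {
        crlf    { return "$payload\r\n" }
        lf      { return "$payload\n" }
        none    { return $payload }
        default { error "unknown line-ending type: $le" }
    }
}
set raw [lb]rebuild_line "echo hello" crlf[rb]
[example_end]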
[call class::textinfo [method linepayload_find_glob] [arg globsearch] [opt {option value...}]]
[para]Return a lineinfolist (see [method lineinfo] and [method lineinfolist]) of lines where payload matches the [arg globsearch] string
[para]To limit the returned results, use the -limit n option, where -limit 0 means return all matches.
[para]For example: [method linepayload_find_glob] "*test*" -limit 1
[para]The result is always a list of lineinfo dictionaries, even if only one item is returned.
[para]The -limitfrom option accepts start or end.
[para]-limitfrom end means that only the last -limit matches are returned.
[para]Results are always in the order in which they occur in the data, even if -limitfrom end is specified.
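[para]The interaction of -limit and -limitfrom can be modelled on a plain list of payloads. The following is a hedged core-Tcl sketch of the documented semantics: matches are kept in data order, -limit 0 means all, and -limitfrom end keeps the last n. It returns line indices rather than lineinfo dicts. The proc find_glob_indices is hypothetical, and it assumes the same glob rules as [cmd lsearch] -glob.
[example_begin]
# Hypothetical model of -limit / -limitfrom, not the library code.
# Returns zero-based indices of payloads matching the glob, in data order.
proc find_glob_indices {payloads glob {limit 0} {limitfrom start}} {
    set hits [lb]lsearch -all -glob $payloads $glob[rb]
    if {$limit == 0 || $limit >= [lb]llength $hits[rb]} {
        return $hits
    }
    if {$limitfrom eq "end"} {
        # keep only the last $limit matches, still in data order
        return [lb]lrange $hits end-[lb]expr {$limit - 1}[rb] end[rb]
    }
    return [lb]lrange $hits 0 [lb]expr {$limit - 1}[rb][rb]
}
find_glob_indices {test1 other test2 test3} *test* 2 end   ;# returns 2 3
[example_end]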
[para]Note that because glob patterns accept [lb]chars[rb] to match any single character in the set given by chars, literal square brackets should be searched for by escaping the bracket with a backslash.
[para]This is true even if only a single square bracket is being searched for. For example, {*[lb]file*} will not find a left square-bracket followed by the word file, even though the pattern doesn't close the square bracket.
[para]In the above case, the literal search should be {*\[lb]file*}.
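[para]The escaping rule can be checked with core Tcl's [cmd {string match}], assuming [method linepayload_find_glob] uses the same glob syntax. The helper glob_escape is hypothetical. It backslash-escapes the glob metacharacters of a literal string so that the string can be embedded in a search pattern.
[example_begin]
# Payload containing a literal left square-bracket followed by "file"
set payload {set data [lb]file normalize x}
# The escaped bracket is matched literally: returns 1
string match {*\[lb]file*} $payload
# Hypothetical helper: escape \ * ? [lb] [rb] so the text matches literally
proc glob_escape {text} {
    return [lb]string map {\\ \\\\ * \\* ? \\? [lb] \\[lb] [rb] \\[rb]} $text[rb]
}
string match "*[lb]glob_escape {[lb]file}[rb]*" $payload
[example_end]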
[call class::textinfo [method linepayload] [arg lineindex]]
[para]Return the text of the line indicated by the zero-based lineindex
[para]The line-ending is not returned in the data - but is still stored against this lineindex
[para]Line metadata, such as the line-ending for a particular line and the byte/character range it occupies within the chunk, can be retrieved with the [method linemeta] method.
[para]To retrieve both the line text and metadata in a single call the [method lineinfo] method can be used
[para]To retrieve an entire line including line-ending use the [method line] method.
[call class::textinfo [method linepayloads] [arg startindex] [arg endindex]]
[para]Return a list of just the payloads in the specified line-index range, with no metadata.
[call class::textinfo [method linemeta] [arg lineindex]]
[para]Return a dict of the metadata for the line indicated by the zero-based lineindex
[para]Keys returned include
[list_begin itemized]
[item] le
[para] A string representing the type of line-ending: crlf|lf|none
[item] linelen
[para] The number of characters/bytes in the whole line including line-ending if any
[item] payloadlen
[para] The number of characters/bytes in the line excluding line-ending
[item] start
[para] The zero-based index into the associated raw file data indicating at which byte/character index this line begins
[item] end
[para] The zero-based index into the associated raw file data indicating at which byte/character index this line ends
[para] This end-point corresponds to the last character of the line-ending if any - not necessarily the last character of the line's payload
[list_end]
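[para]The relationship between these keys can be illustrated with a small core-Tcl sketch. It splits raw data on LF and records the same metadata for each line. This is a hypothetical model, not the module's implementation. The proc name split_with_meta is an assumption, and the sketch handles only CRLF, LF and a final unterminated line, as documented above.
[example_begin]
# Hypothetical model of per-line metadata; not the punk::fileline code.
# Handles CRLF, LF and a final line with no line-ending (le none).
proc split_with_meta {data} {
    set result {}
    set start 0
    set len [lb]string length $data[rb]
    while {$start < $len} {
        set lfpos [lb]string first \n $data $start[rb]
        if {$lfpos < 0} {
            # trailing data without a line-ending
            set end [lb]expr {$len - 1}[rb]
            set le none
            set payload [lb]string range $data $start $end[rb]
        } elseif {$lfpos > $start && [lb]string index $data $lfpos-1[rb] eq "\r"} {
            set end $lfpos
            set le crlf
            set payload [lb]string range $data $start $lfpos-2[rb]
        } else {
            set end $lfpos
            set le lf
            set payload [lb]string range $data $start $lfpos-1[rb]
        }
        set info [lb]dict create le $le start $start end $end payload $payload[rb]
        dict set info linelen [lb]expr {$end - $start + 1}[rb]
        dict set info payloadlen [lb]string length $payload[rb]
        lappend result $info
        set start [lb]expr {$end + 1}[rb]
    }
    return $result
}
# Line 0 of "ab\r\ncd\nef": le crlf, start 0, end 3, linelen 4, payloadlen 2
set meta [lb]split_with_meta "ab\r\ncd\nef"[rb]
[example_end]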
[call class::textinfo [method lineinfo] [arg lineindex]]
[para]Return a dict of the metadata and text for the line indicated by the zero-based lineindex
[para]This returns the same info as the [method linemeta] with an added key of 'payload' which is the text of the line without line-ending.
[para]The 'payload' value is the same as is returned from the [method linepayload] method.
[call class::textinfo [method lineinfolist] [arg startidx] [arg endidx]]
[para]Return a list of lineinfo dicts, one for each line in the line-index range startidx to endidx.
[call class::textinfo [method linerange_to_chunkrange] [arg startidx] [arg endidx]]
[call class::textinfo [method linerange_to_chunk] [arg startidx] [arg endidx]]
[call class::textinfo [method lines] [arg startidx] [arg endidx]]
[call class::textinfo [method linepayloads] [arg startidx] [arg endidx]]
[call class::textinfo [method chunkrange_to_linerange] [arg chunkstart] [arg chunkend]]
[call class::textinfo [method chunkrange_to_lineinfolist] [arg chunkstart] [arg chunkend] [opt {option value...}]]
[para]Return a list of dicts each with structure like the result of the [method lineinfo] method - but possibly with extra keys for truncation information if -show_truncated 1 is supplied
[para]The truncation key in a lineinfo dict may be returned for the first and/or last line in the resulting list.
[para]The truncation value shows the shortened part of the entire line (missing bytes on the left and/or right side), potentially including a line-ending or even a partial line-ending.
[para]Note that this truncation info is only in the return value of this method - and will not be reflected in [method lineinfo] queries to the main chunk.
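[para]The idea of a truncated line can be sketched by clipping a line's start/end range to the requested chunk range. The helper clip_line is hypothetical and not part of the API. It returns only the visible text, and it does not show the key names or structure actually used by [method chunkrange_to_lineinfolist].
[example_begin]
# Hypothetical helper: return the part of a line (including any
# line-ending bytes) that falls inside chunk range chunkstart..chunkend.
proc clip_line {data linestart lineend chunkstart chunkend} {
    set from [lb]expr {max($linestart, $chunkstart)}[rb]
    set to   [lb]expr {min($lineend, $chunkend)}[rb]
    return [lb]string range $data $from $to[rb]
}
# Line "hello\r\n" occupies 0..6; a chunk range of 2..5 cuts off
# "he" on the left and the final LF on the right, leaving "llo\r".
set part [lb]clip_line "hello\r\nworld" 0 6 2 5[rb]
[example_end]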
[call class::textinfo [method numeric_linerange] [arg startidx] [arg endidx]]
[para]A helper that converts any Tcl-style end or end-x values supplied for startidx or endidx to specific line indices, based on the current state of the underlying line data.
[para]This is used internally by API functions such as [method line] to enable it to accept more expressive indices
[call class::textinfo [method numeric_chunkrange] [arg startidx] [arg endidx]]
[para]A helper that converts any Tcl-style end or end-x values supplied for startidx or endidx to specific chunk indices, based on the current state of the underlying chunk data.
[call class::textinfo [method normalize_indices] [arg startidx] [arg endidx] [arg max]]
[para]A utility to convert some Tcl-style list-index expressions, such as end and end-1, to valid indices in the range 0 to the supplied max.
[para]Basic addition and subtraction expressions such as 4-1 and 5+2 are accepted.
[para]A startidx greater than endidx is allowed.
[para]Unlike Tcl's index expressions, an error is raised if a calculated index falls outside the bounds 0 to max.
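[para]The following is a hedged core-Tcl sketch of the rules above: end, end-n, simple a+b / a-b arithmetic, no ordering requirement between the two indices, and an error when a result falls outside 0..max. The procs normalize_index_sketch and normalize_range_sketch are hypothetical. The real method's parsing may accept other forms.
[example_begin]
# Hypothetical sketch of the documented index rules (not the library code).
proc normalize_index_sketch {idx max} {
    if {![lb]regexp {^(end|\d+)(?:([lb]+-[rb])(\d+))?$} $idx -> base op n[rb]} {
        error "bad index '$idx'"
    }
    if {$base eq "end"} {
        set base $max
    } else {
        scan $base %d base   ;# read as decimal even with leading zeros
    }
    if {$op ne ""} {
        scan $n %d n
        if {$op eq "+"} {
            incr base $n
        } else {
            incr base -$n
        }
    }
    if {$base < 0 || $base > $max} {
        error "index '$idx' resolves to $base, outside 0..$max"
    }
    return $base
}
# startidx may be greater than endidx; each index is normalized independently
proc normalize_range_sketch {startidx endidx max} {
    list [lb]normalize_index_sketch $startidx $max[rb] [lb]normalize_index_sketch $endidx $max[rb]
}
normalize_range_sketch end-1 4-1 9   ;# returns 8 3
[example_end]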
[call class::textinfo [method regenerate_lines]]
[para]Generate a list of lines from the current state of the stored raw data chunk, and keep a map of line-endings indexed by lineindex.
[para]This is called automatically by the constructor during object creation.
[para]It is exposed in the API experimentally, while chunk and line manipulation functions are being considered.
[para]TODO - review whether such manual control will be necessary/desirable
[list_end]
[list_end] [comment {--- end class enumeration ---}]
[subsection {Namespace punk::fileline}]
[para] Core API functions for punk::fileline
[list_begin definitions]
[list_end] [comment {--- end definitions namespace punk::fileline ---}]
[subsection {Namespace punk::fileline::lib}]
[para] Secondary functions that are part of the API
[list_begin definitions]
[call [fun lib::range_spans_chunk_boundaries] [arg start] [arg end] [arg chunksize] [opt {-offset int}]]
[para]Takes start and end offset, generally representing bytes or character indices, and computes a list of boundaries at multiples of the chunksize that are spanned by the start and end range.
[list_begin arguments]
[arg_def integer start]
[para] zero-based start index of range
[arg_def integer end]
[para] zero-based end index of range
[arg_def integer chunksize]
[para] Number of bytes/characters in a chunk; must be greater than 0
[list_end]
[para]returns a dict with the keys is_span and boundaries
[para]is_span 0|1 indicates if the range specified spans a boundary of chunksize
[para]boundaries contains a list of the spanned boundaries - which are always multiples of the chunksize
[para]For example:
[example_begin]
range_spans_chunk_boundaries 10 1750 512
is_span 1 boundaries {512 1024 1536}
[example_end]
[para]The -offset <int> option shifts each boundary by the given amount:
[example_begin]
range_spans_chunk_boundaries 10 1750 512 -offset 2
is_span 1 boundaries {514 1026 1538}
[example_end]
[para] This function automatically uses lseq (if Tcl >= 8.7) when the number of boundaries spanned is greater than approximately 75.
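[para]The boundary computation can be modelled in a few lines of core Tcl. This sketch assumes that a boundary b is spanned when start < b <= end, and that boundaries lie at k*chunksize + offset. It takes the offset as an optional positional argument rather than the -offset option. The proc name spans_sketch is hypothetical. The sketch reproduces the two documented results above, but it is not the library's implementation (which may, for example, use lseq).
[example_begin]
# Hypothetical model of range_spans_chunk_boundaries (not the library code).
proc spans_sketch {start end chunksize {offset 0}} {
    if {$chunksize <= 0} {
        error "chunksize must be > 0"
    }
    # smallest boundary of the form k*chunksize + offset that is > start
    # (Tcl integer division rounds towards negative infinity)
    set b [lb]expr {(($start - $offset) / $chunksize + 1) * $chunksize + $offset}[rb]
    set boundaries {}
    for {} {$b <= $end} {incr b $chunksize} {
        lappend boundaries $b
    }
    return [lb]dict create is_span [lb]expr {[lb]llength $boundaries[rb] > 0}[rb] boundaries $boundaries[rb]
}
spans_sketch 10 1750 512     ;# is_span 1 boundaries {512 1024 1536}
spans_sketch 10 1750 512 2   ;# is_span 1 boundaries {514 1026 1538}
[example_end]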
[list_end] [comment {--- end definitions namespace punk::fileline::lib ---}]
[section Internal]
[subsection {Namespace punk::fileline::system}]
[para] Internal functions that are not part of the API
[subsection {Namespace punk::fileline::ansi}]
[para]These are ANSI functions imported from punk::ansi, or no-ops if that package is unavailable.
[para]See [package punk::ansi] for documentation
[list_begin definitions]
[call [fun ansi::a]]
[call [fun ansi::a+]]
[call [fun ansi::stripansi]]
[list_end] [comment {--- end definitions namespace punk::fileline::ansi ---}]
[manpage_end]