So I'm writing a lexer in Rust, and it has a bunch of test cases it needs to pass. The problem right now is that none of the test cases involving strings pass, while the test cases for the other tokens all work.
So for example, here is a test that uses "hello world":

```rust
fn test_03() {
    assert_eq!(
        lex("hello world"),
        vec![
            Token::Alpha(b'h'), Token::Alpha(b'e'), Token::Alpha(b'l'),
            Token::Alpha(b'l'), Token::Alpha(b'o'), Token::WhiteSpace,
            Token::Alpha(b'w'), Token::Alpha(b'o'), Token::Alpha(b'r'),
            Token::Alpha(b'l'), Token::Alpha(b'd'), Token::EOF,
        ]
    );
}
```
The test fails with this error:

```
thread 'test_03' panicked at 'assertion failed: `(left == right)`
  left: `[Alpha(111), WhiteSpace, Alpha(100), EOF]`,
 right: `[Alpha(104), Alpha(101), Alpha(108), Alpha(108), Alpha(111), WhiteSpace, Alpha(119), Alpha(111), Alpha(114), Alpha(108), Alpha(100), EOF]`', tests/lexer.rs:15:3
```

Here `right` has the correct values and `left` is what my code produces. So basically it only emits one `Alpha` per word and then jumps straight to the whitespace, instead of producing a token for every letter.
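To make sure I was reading those numbers right, I decoded the byte values from the failing output back to characters with a quick snippet:

```rust
fn main() {
    // Decode the byte values from the failing assertion back to chars.
    for b in [104u8, 101, 111, 100] {
        println!("{} -> {:?}", b, b as char);
    }
    // prints:
    // 104 -> 'h'
    // 101 -> 'e'
    // 111 -> 'o'
    // 100 -> 'd'
}
```

So `left` only contains the *last* letter of each word ('o' from "hello", 'd' from "world") plus the whitespace.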
I've tried debugging small pieces of the code with print statements, but no luck. I even tried moving the string-handling case to the top of the match, and that didn't change anything either. Any pointers on what I could fix would be much appreciated, since I'm new to Rust and not really sure what the mistake might be!

Here is my code:
```rust
// Derives needed for assert_eq! and the != comparison in strip_whitespace.
#[derive(Debug, PartialEq)]
pub enum Token {
    Keyword(Vec<u8>),
    Alpha(u8),
    Digit(u8),
    LeftParen,
    RightParen,
    LeftCurly,
    RightCurly,
    Equal,
    Plus,
    Dash,
    Quote,
    WhiteSpace,
    Semicolon,
    Comma,
    Other,
    EOF,
}

//LEX FUNCTION:
pub fn lex(input: &str) -> Vec<Token> {
    let bytes = input.as_bytes();
    let mut tokens = vec![];
    let mut count = 0;
    while count < bytes.len() {
        let token = match bytes[count] {
            //handling string input-- not working?
            0x22 => {
                let mut string = String::new(); // use String instead of Vec<u8>
                count += 1;
                while count < bytes.len() && bytes[count] != 0x22 {
                    string.push(bytes[count] as char); // append the character to the string
                    count += 1;
                }
                count += 1;
                Token::Keyword(string.into_bytes()) // convert the String back to Vec<u8>
            }
            0x41..=0x5A | 0x61..=0x7A => {
                let mut keyword = vec![bytes[count]];
                while count + 1 < bytes.len()
                    && (bytes[count + 1] >= 0x41 && bytes[count + 1] <= 0x5A
                        || bytes[count + 1] >= 0x61 && bytes[count + 1] <= 0x7A)
                {
                    keyword.push(bytes[count + 1]);
                    count += 1;
                }
                match &keyword[..] {
                    b"true" => Token::Keyword(keyword),
                    b"false" => Token::Keyword(keyword),
                    b"fn" => Token::Keyword(keyword),
                    b"return" => Token::Keyword(keyword),
                    b"let" => Token::Keyword(keyword),
                    _ => Token::Alpha(bytes[count]),
                }
            }
            0x30..=0x39 => Token::Digit(bytes[count]),
            0x28 => Token::LeftParen,
            0x29 => Token::RightParen,
            0x7B => Token::LeftCurly,
            0x7D => Token::RightCurly,
            0x3D => Token::Equal,
            0x2B => Token::Plus,
            0x2D => Token::Dash,
            0x20 | 0xA | 0x9 => Token::WhiteSpace, //whitespace error?
            0x3B => Token::Semicolon,
            0x2C => Token::Comma,
            _ => Token::Other,
        };
        tokens.push(token);
        count += 1;
    }
    tokens.push(Token::EOF);
    tokens
}

pub fn strip_whitespace(tokens: Vec<Token>) -> Vec<Token> {
    let mut new: Vec<Token> = vec![];
    for token in tokens {
        if token != Token::WhiteSpace {
            new.push(token);
        }
    }
    new
}
```
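And here's a tiny standalone program I wrote while debugging: it's just a stripped-down copy of the letter-scanning loop from that match arm, run on a single word, so I could print what `keyword` and `count` end up as (the variable names are the same as in my lexer):

```rust
fn main() {
    // Stripped-down copy of the alpha-scanning loop from lex(), run on one word.
    let bytes = "hello".as_bytes();
    let mut count = 0;
    let mut keyword = vec![bytes[count]];
    while count + 1 < bytes.len()
        && (bytes[count + 1] >= 0x41 && bytes[count + 1] <= 0x5A
            || bytes[count + 1] >= 0x61 && bytes[count + 1] <= 0x7A)
    {
        keyword.push(bytes[count + 1]);
        count += 1;
    }
    // After the loop, count has advanced to the index of the LAST letter.
    println!("keyword = {:?}, count = {}", String::from_utf8(keyword).unwrap(), count);
    // prints: keyword = "hello", count = 4
}
```

So the loop itself does collect the whole word into `keyword`; I still can't see why only one `Alpha` comes out of the lexer.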